Tuesday, March 29, 2016

Goldengate Monitor Memory Usage


Goldengate Monitor is Oracle's GUI solution for administrators to monitor and alert on Goldengate targets.  I really think this product has some great features; however, it is not without fault.  Recently we noticed that our Monitor server was frequently failing.  From the server log:

<Error> (thread=PacketListener1, member=1): Stopping cluster due to unhandled exception: java.lang.OutOfMemoryError: Java heap space

We decided to take a look at resource utilization using jconsole via a VNC session.  If you've never used it before, it's a great tool for getting an idea of how your Java processes are performing.

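[Screenshot: jconsole memory graph for the Monitor WebLogic process, showing heap usage trending upward]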

In the screenshot above you can see the upward trend of memory usage by the Monitor WebLogic process, which suggests a memory leak.  By default this process is able to consume up to 2GB of memory.  If you're monitoring a somewhat large environment, it's likely you'll see frequent crashes of this process.

As a workaround we increased the memory allocation during startup, like so:

nohup ./startManagedWebLogic.sh MONITORSERVER_server1 http://someserver.domain.com:7002 -Xms8192m -Xmx8192m &

In this example I've given it 8GB of memory, which you can confirm in jconsole from the Max and Committed fields under the Memory tab.

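[Screenshot: jconsole Memory tab for the Monitor process after about 20 days, with heap usage approaching the 8GB maximum]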

Unfortunately this really is just a workaround.  In the screenshot above you can see that over the course of about 20 days we were already pretty close to our 8GB limit.  As the administrator, you'll need to restart the WebLogic processes periodically or set up a job to do so automatically.
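If you go the scheduled-job route, a restart script could look roughly like the sketch below.  Treat it as a starting point under a few assumptions: DOMAIN_BIN is a placeholder path, the function names are made up, and it assumes your domain's bin directory has the stock stopManagedWebLogic.sh next to startManagedWebLogic.sh and that the stop script can authenticate without prompting.  The server name, admin URL and heap flags are the ones from the command above.

```shell
#!/bin/sh
# Sketch of a scheduled restart for the Monitor managed server.
# DOMAIN_BIN is a hypothetical placeholder; point it at your domain's bin directory.
DOMAIN_BIN="${DOMAIN_BIN:-/path/to/monitor_domain/bin}"
SERVER_NAME="MONITORSERVER_server1"
ADMIN_URL="http://someserver.domain.com:7002"
MEM_ARGS="-Xms8192m -Xmx8192m"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

restart_monitor() {
    # Stop first; carry on even if this fails, since the server may already be down.
    run "$DOMAIN_BIN/stopManagedWebLogic.sh" "$SERVER_NAME" "$ADMIN_URL"
    # MEM_ARGS is left unquoted on purpose so it splits into two separate flags.
    run nohup "$DOMAIN_BIN/startManagedWebLogic.sh" "$SERVER_NAME" "$ADMIN_URL" $MEM_ARGS &
}

# Only restart when explicitly asked to, e.g.: ./restart_monitor.sh --run
if [ "${1:-}" = "--run" ]; then
    restart_monitor
fi
```

Running it once with DRY_RUN=1 and --run prints the two commands without touching the server, which is a cheap way to check the paths before handing the script to cron or another scheduler.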

We attempted to use JDK 1.8 in the hope that its memory usage would be more efficient, but unfortunately Goldengate Monitor requires JDK 1.7 and will not run under 1.8.  Monitor has the potential to be a really great tool for administrators and developers.  Hopefully Oracle addresses this memory issue in a future update.

As always, please let me know if you have any questions and be sure to subscribe for frequent updates on all things Oracle Goldengate!

Wednesday, March 16, 2016

Goldengate Monitor Stops Sending E-mail Alerts

We use Goldengate Monitor in our environment for the purpose of managing and alerting on Goldengate.  Recently we discovered that Monitor stopped sending email alerts.  From the Monitor log:


[2016-02-04T10:50:15.278-06:00] [JAGENT] [ERROR] [OGGMON-20269] [com.goldengate.monitor.jagent.jmx.MBeansContainerImpl] [tid: agentRegistration] [ecid: 0000LAh8ior5qY95zfL6iW1MgrqA000007,0] java.lang.IllegalArgumentException: id must be positive: -2069441695


After raising an SR the issue became clear.  The underlying Monitor database uses a sequence to generate IDs for alerts.  The datatype for this particular sequence is a 32-bit signed integer, meaning that the largest ID that can be generated is 2147483647.  Once that limit is reached the IDs become negative, and Monitor is unable to handle this.  Oracle has accepted this as a design flaw and hopes to have this datatype changed to LONG in a future release (Bug 22346275 - Monitor Server not sending out Email alerts).
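To make the wraparound concrete, here is a small shell sketch of what a signed 32-bit integer does once a counter passes 2147483647.  The function name wrap32 is made up for illustration, and the last example assumes the ID is simply the counter truncated to 32 bits after wrapping once; Monitor's internals may well differ.

```shell
# Print the value a signed 32-bit integer would hold for a non-negative counter $1.
wrap32() {
    echo $(( ( ($1 + 2147483648) % 4294967296 ) - 2147483648 ))
}

wrap32 2147483647   # prints 2147483647 (the last valid positive ID)
wrap32 2147483648   # prints -2147483648 (one step further and the sign flips)
wrap32 2225525601   # prints -2069441695 (the negative ID from the log above)
```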

In the meantime their solution was to completely reinstall Monitor and the underlying database.  This is less than ideal, so we decided to see if we could fix it ourselves.  It took a bit of digging, but we were able to find all references to negative alert IDs with the following queries.


SELECT COUNT (*)
  FROM DEV_OGGMON.CONNECTIONS
 WHERE INPUT_ID <= 0;

SELECT COUNT (*)
  FROM DEV_OGGMON.CONNECTIONS
 WHERE OUTPUT_ID <= 0;

SELECT COUNT (*)
  FROM DEV_OGGMON.MPS
 WHERE OBJ_ID <= 0;

SELECT COUNT (*)
  FROM DEV_OGGMON.GGS_OBJECTS
 WHERE ID <= 0;


Once identified, these entries need to be removed.


DELETE FROM DEV_OGGMON.CONNECTIONS
      WHERE INPUT_ID <= 0;

DELETE FROM DEV_OGGMON.CONNECTIONS
      WHERE OUTPUT_ID <= 0;

DELETE FROM DEV_OGGMON.MPS
      WHERE OBJ_ID <= 0;

DELETE FROM DEV_OGGMON.GGS_OBJECTS
      WHERE ID <= 0;


The following tables also need to be emptied completely.


DELETE FROM DEV_OGGMON.ALERT_NOTICES;

DELETE FROM DEV_OGGMON.ALERTS;

DELETE FROM DEV_OGGMON.MPS_HISTORY;

DELETE FROM DEV_OGGMON.MPS_HISTORY_COMPOSITE_VALUES;


Next you need to reset the sequence used to generate alert IDs.


UPDATE DEV_OGGMON.SEQUENCE
   SET SEQ_COUNT = 1;


Lastly, let's get the instance re-added to the Monitor Server with the following steps:


Delete Jagent logs
Recreate Datastore
Remove target instance from Monitor
Restart Monitor Server
Restart local Jagent


This is a viable workaround until Oracle fixes this bug in a future release, but you'll need to keep an eye on the sequence.  The more targets you monitor, the quicker this issue is likely to surface.
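One way to keep an eye on it is to compare the current SEQ_COUNT from DEV_OGGMON.SEQUENCE against the 2147483647 ceiling.  The sketch below covers only the arithmetic: the function names and the 80 percent default threshold are my own choices for illustration, and it assumes you fetch the SEQ_COUNT value yourself with whatever SQL client you normally use.

```shell
# Largest value a signed 32-bit integer can hold.
INT_MAX=2147483647

# Print the percentage (rounded down) of the 32-bit range used by SEQ_COUNT value $1.
seq_pct_used() {
    echo $(( $1 * 100 / $INT_MAX ))
}

# Succeed when usage is at or above the threshold percentage $2 (default 80).
seq_needs_attention() {
    [ "$(seq_pct_used "$1")" -ge "${2:-80}" ]
}

seq_pct_used 1073741824   # prints 50

if seq_needs_attention 1932735283; then
    echo "SEQ_COUNT is above 80% of the 32-bit limit"
fi
```

Hooked up to a daily job, a check like this would give you some warning before the IDs go negative again, rather than finding out when the alerts stop.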

Feel free to let me know if you have any questions.