Basic monitoring recommendations
Here are some monitoring recommendations that are relatively easy to implement.
Consider monitoring the following items with a monitoring tool such as check_MK, Zenoss, Zyrion, or IBM/Tivoli. Poll every five minutes.
Node | What you should monitor | Why you should monitor it |
---|---|---|
On all nodes | | These checks help you monitor all the basics and should be useful for troubleshooting. We recommend performing each of the following checks every five minutes on each server. |
Jive web applications | We recommend running a synthetic health check against your Jive application by using a tool such as WebInject. | WebInject interacts with the web application to verify basic functionality, which goes beyond simply confirming that a port is listening. Running the checks against each individual server as well as against the load balancer also verifies that the load balancer is behaving correctly. We recommend running these checks every five minutes initially. To minimize false alarms, we require two failures before an alert is sent. If these settings produce too many false alarms, adjust them accordingly. We recommend setting up WebInject tests that perform the following: For an example of WebInject XML code that performs all of the above, see WebInject code example. |
Cache server | | JMX provides a way to check the Java Virtual Machine's heap size and detect excessive garbage collection. Disk space checks ensure that logging can continue. |
Databases (Activity Engine, Analytics, and web application) | | Database checks reveal potential problems in the web application server that can consume resources at the database layer, such as an excessive number of open connections to the database. |
Document conversion | | The service statistics are exposed as JMX MBeans and can be accessed the same way as the JMX data from the Tomcat Java Virtual Machine on the web application node. |
Activity Engine | | JMX provides a way to check the Java Virtual Machine's heap size and detect excessive garbage collection. Disk space checks ensure that logging can continue. |
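The web application row recommends a synthetic health check that alerts only after two failures. The following Python sketch is not WebInject and does not reproduce its XML test format. It is a minimal stand-in, assuming a plain HTTP GET that passes when the page returns HTTP 200 and contains an expected string, and assuming that "two failures" means two consecutive failed polls. The names `check_page`, `AlertGate`, and `monitor` are hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Mapping


def check_page(url: str, expected_text: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers HTTP 200 and its body contains `expected_text`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and expected_text in body
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError (4xx, 5xx) and timeouts are OSError subclasses;
        # a malformed URL raises ValueError. All count as a failed check.
        return False


class AlertGate:
    """Suppress alerts until `threshold` consecutive checks have failed (assumed semantics)."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, ok: bool) -> bool:
        """Record one result; return True only when the failure streak first reaches the threshold."""
        if ok:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures == self.threshold


def monitor(targets: Mapping[str, str], expected_text: str, interval_s: float = 300.0) -> None:
    """Poll every target forever, once every `interval_s` seconds (300 s = five minutes)."""
    gates = {name: AlertGate(threshold=2) for name in targets}
    while True:
        for name, url in targets.items():
            if gates[name].record(check_page(url, expected_text)):
                # Hypothetical alert hook: replace with your monitoring tool's notifier.
                print(f"ALERT: {name} ({url}) failed {gates[name].threshold} checks in a row")
        time.sleep(interval_s)
```

For example, passing one entry per web application server plus one for the load balancer URL to `monitor` polls each of them every five minutes; the URLs and expected text you supply are your own. Firing only when the streak first reaches the threshold sends one alert per outage rather than one on every poll.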
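The cache server and Activity Engine rows pair JMX heap checks with disk space checks so that logging can continue. JMX needs a Java client, so the following Python sketch covers only the disk space half. It assumes warning and critical thresholds of 80% and 90% used (example values, not taken from this guide) and prints a line in what we assume is check_MK's local-check layout: state, service name, performance data, then text. The names `classify_usage`, `disk_status`, `local_check_line`, and the service name `Jive_log_disk` are hypothetical.

```python
import shutil
from typing import Tuple

# Nagios-style plugin states, which check_MK also understands.
OK, WARNING, CRITICAL = 0, 1, 2
STATUS_TEXT = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify_usage(used_pct: float, warn_pct: float = 80.0, crit_pct: float = 90.0) -> int:
    """Map a disk-usage percentage to a monitoring state (thresholds are assumptions)."""
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def disk_status(path: str, warn_pct: float = 80.0, crit_pct: float = 90.0) -> Tuple[int, float]:
    """Return (state, percent used) for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    return classify_usage(used_pct, warn_pct, crit_pct), used_pct


def local_check_line(path: str, warn_pct: float = 80.0, crit_pct: float = 90.0) -> str:
    """Format '<state> <service> <perfdata> <text>' (assumed check_MK local-check layout)."""
    state, used_pct = disk_status(path, warn_pct, crit_pct)
    perfdata = f"used_pct={used_pct:.1f};{warn_pct:g};{crit_pct:g}"
    return f"{state} Jive_log_disk {perfdata} {STATUS_TEXT[state]} - {used_pct:.1f}% used on {path}"
```

Point `local_check_line` at the directory your node actually writes its logs to. Because `shutil.disk_usage` reports on the whole filesystem, the result reflects everything sharing that volume, not just the logs.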