A typical Nagios setup consists of a single server that runs all the checks by itself. Once that server reaches 2000 checks or more (and especially if you run ndoutils on the same server), you will see the processor going wild. At this point it’s time for a change.
So let’s start debugging: which processes eat most of your system resources?
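One way to answer that question is to sum CPU usage per command name, so that many short-lived check processes show up as a single line. The snippet below is only a minimal sketch: it assumes a `ps` that accepts `-eo pcpu,comm` (the common Linux and BSD variants do), and the function names are my own, not part of Nagios.

```python
import subprocess
from collections import defaultdict


def read_ps():
    """Return raw `ps` output: CPU share and command name per process."""
    result = subprocess.run(
        ["ps", "-eo", "pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def cpu_by_command(ps_output, top=5):
    """Sum the CPU column per command and return the biggest consumers."""
    totals = defaultdict(float)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        try:
            cpu = float(parts[0])
        except ValueError:
            continue  # header line or unparsable value
        totals[parts[1].strip()] += cpu
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]
```

Calling `cpu_by_command(read_ps())` on the Nagios server returns a list of `(command, total_cpu)` pairs, biggest first. Run it a few times during a busy period, since one snapshot can be misleading.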
Once you have identified the guilty parties, it’s time to attack: “divide et impera”.
Let’s see the options:
– See the tuning section in the Nagios documentation (http://nagios.sourceforge.net/docs/3_0/tuning.html)
– Do not install fancy web plug-ins for your Nagios web server (for example NEXSM)
– Setting a longer check interval will help (for example, going from 5 to 15 minutes can drop your processor usage from 5.xx to 3.xx)
– In Nagios 2, users reported that with lots of persistent comments the status.dat file kept growing, which caused page rendering problems.
– Think about moving to the check_mk plug-in, which gets all the checks for a host in one request (http://mathias-kettner.de/check_mk)
– NSCA: “Allows you to submit passive service checks results to another server on the network that is running Nagios”. I think it is complicated to set up, and it requires yet another Nagios install (here is the doc link: http://nagios.sourceforge.net/download/contrib/documentation/misc/NSCA_Setup.pdf)
– The DNX project: an interesting approach to a flexible Nagios architecture, using worker nodes that get jobs from the central Nagios server (http://dnx.sourceforge.net/ and http://dnx.sourceforge.net/DNX_Workflow.pdf)
– Try to set up roles in your monitoring setup by giving NRPE agents specific tasks (e.g. one machine to check the SNMP services, one to check the Linux services, etc.).
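To put a rough number on the check-interval option from the list above, here is a small back-of-the-envelope calculation. It is a sketch that assumes the checks are spread evenly over the interval, which a real scheduler only approximates, and the function name is made up for illustration.

```python
def checks_per_second(total_checks, interval_minutes):
    """Average number of checks the server must start each second."""
    return total_checks / (interval_minutes * 60)


# 2000 checks, first every 5 minutes, then every 15 minutes.
for minutes in (5, 15):
    rate = checks_per_second(2000, minutes)
    print(f"{minutes:>2} min interval: {rate:.2f} checks/s")
```

Tripling the interval cuts the average check rate to a third (about 6.67 down to about 2.22 checks per second for 2000 checks). The load on the machine will not necessarily drop by the same factor, because some of it does not depend on the check rate.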
Personally, I would test the last two options in the list because, depending on your environment, you could choose between the more flexible check system of DNX and the more specific one of specialized NRPE agents.
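Before touching any Nagios configuration, it can help to plan the specialized-agent split on paper. The sketch below groups checks by role; the host names, the categories and the `assign_checks` helper are all hypothetical, and it only illustrates the idea rather than generating real Nagios or NRPE config.

```python
from collections import defaultdict

# Hypothetical role table: check category -> agent host that runs it.
ROLES = {
    "snmp": "agent-snmp.example.org",
    "linux": "agent-linux.example.org",
}
# Checks without a dedicated agent stay on the central server (assumption).
DEFAULT_AGENT = "nagios-central.example.org"


def assign_checks(checks):
    """Group (check_name, category) pairs by the agent that should run them."""
    plan = defaultdict(list)
    for name, category in checks:
        plan[ROLES.get(category, DEFAULT_AGENT)].append(name)
    return dict(plan)


checks = [
    ("switch01-ifstatus", "snmp"),
    ("web01-disk", "linux"),
    ("web01-load", "linux"),
    ("intranet-http", "http"),
]
for agent, names in assign_checks(checks).items():
    print(agent, len(names), names)
```

Counting the checks per agent this way shows quickly whether one role would still be overloaded and needs a second machine.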
What about you? What are your solutions?
Leave a comment with your personal experience.
Note: Copying this article to your website is strictly NOT allowed.