NCPA is an advanced, cross-platform agent that we can install on Windows, Linux, AIX, and Mac OS X machines.
In this context, we shall look into how to automatically restart problematic services on Linux servers using the Nagios Cross-Platform Agent (NCPA).
How to Restart Linux Services With NCPA?
NCPA must already be installed and configured on the Linux machine whose services we want to restart.
1. Create Restart Script
First, we create a service_restart.sh script in the /usr/local/ncpa/plugins directory that will perform the service restart command:
# vi /usr/local/ncpa/plugins/service_restart.sh
Then we paste the following code into the terminal session:
#!/bin/bash
sudo service "$1" restart
Once done, we save the changes and close the file.
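The script above passes its argument straight to sudo service. As a rough sketch of a more defensive variant, the version below rejects empty or unusual service names and reports failures using the usual Nagios plugin exit codes. The function names, the RESTART_CMD override, and the allowed character set are assumptions made for this illustration; they are not part of NCPA.

```shell
#!/bin/bash
# Hypothetical hardened variant of plugins/service_restart.sh (illustrative only).

# Succeed only for non-empty names built from letters, digits, and _ . @ -
valid_service_name() {
    case "$1" in
        ''|*[!A-Za-z0-9_.@-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Restart the named service and map the outcome to a Nagios exit code.
# RESTART_CMD is an assumed override for dry runs; it defaults to "sudo service".
restart_service() {
    if ! valid_service_name "$1"; then
        echo "UNKNOWN - invalid or missing service name"
        return 3
    fi
    if ${RESTART_CMD:-sudo service} "$1" restart; then
        return 0
    fi
    echo "CRITICAL - restart of $1 failed"
    return 2
}

# Act only when a service name was passed on the command line.
if [ "$#" -gt 0 ]; then
    restart_service "$1"
    exit $?
fi
```

Setting RESTART_CMD=echo allows a dry run that prints the command arguments instead of restarting anything.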
2. Grant NCPA Permission to Restart Services
The Nagios user needs permission to execute the service command.
We execute the following commands as root to give NCPA permission to restart services:
# echo "nagios ALL = NOPASSWD: `which service`" >> /etc/sudoers
# echo 'Defaults:nagios !requiretty' >> /etc/sudoers
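Appending to /etc/sudoers with echo leaves no chance to catch a syntax error before it takes effect. As an alternative sketch, the same two lines can be generated into a separate file and checked with visudo before being installed. The sudoers_snippet function name and the nagios-ncpa file name are hypothetical, and the approach assumes that the distribution's /etc/sudoers includes the /etc/sudoers.d directory.

```shell
# Print the two sudoers lines for the given path to the service command.
sudoers_snippet() {
    printf 'nagios ALL = NOPASSWD: %s\n' "$1"
    printf 'Defaults:nagios !requiretty\n'
}

# Possible usage as root (file names are illustrative):
#   sudoers_snippet "$(command -v service)" > /tmp/nagios-ncpa
#   visudo -cf /tmp/nagios-ncpa && install -m 0440 /tmp/nagios-ncpa /etc/sudoers.d/nagios-ncpa
```

Here visudo -c only checks the syntax of the file named with -f, so a malformed entry is rejected before sudo ever reads it.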
3. Test the Commands from Nagios XI Server
Now we will test if the script we just created on the Linux server is working.
The example below will restart the crond service as it is unlikely to cause any issues:
# cd /usr/local/nagios/libexec
# ./check_ncpa.py -H 10.25.13.30 -P 5693 -t Str0ngT0k3n -M 'plugins/service_restart.sh' -a crond
[root@xi-c7x-x64 libexec]# ./check_ncpa.py -H 10.25.13.30 -P 5693 -t Str0ngT0k3n -M 'plugins/service_restart.sh' -a crond
Stopping crond: [ OK ]
Starting crond: [ OK ] | 'status'=0;1;2;
Since we received back the results from the service_restart.sh command, it appears to work.
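Besides reading the output, we can also look at the exit code of the command. The helper below is a small sketch that translates the standard Nagios plugin exit codes into state names. The function name nagios_state_name is hypothetical, and the usage shown in the comments assumes that check_ncpa.py follows the usual plugin convention of exiting with the state code.

```shell
# Translate a Nagios plugin exit code into its state name.
nagios_state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "UNEXPECTED($1)" ;;
    esac
}

# Possible usage right after the test command:
#   ./check_ncpa.py -H 10.25.13.30 -P 5693 -t Str0ngT0k3n -M 'plugins/service_restart.sh' -a crond
#   echo "result: $(nagios_state_name $?)"
```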
4. Create Event Handler Script
Next, we create a script for Nagios XI to use as the event handler. It is also called service_restart.sh and is located in the /usr/local/nagios/libexec/ directory on the Nagios XI server:
# vi /usr/local/nagios/libexec/service_restart.sh
Then we paste the following:
#!/bin/bash
case "$1" in
CRITICAL) /usr/local/nagios/libexec/check_ncpa.py -H "$2" -P 5693 -t "$3" -M 'plugins/service_restart.sh' -a "$4" ;;
esac
Finally, we save the changes and close the file.
Now, to set the correct permissions, we execute the following commands:
# chown apache:nagios /usr/local/nagios/libexec/service_restart.sh
# chmod 775 /usr/local/nagios/libexec/service_restart.sh
Then we test the script:
# /usr/local/nagios/libexec/service_restart.sh CRITICAL 10.25.13.30 Str0ngT0k3n crond
When the script runs, it receives four arguments, referenced as $1, $2, $3, and $4 in the script.
$1 = The state of the service.
$2 = The host address of the Linux server.
$3 = The NCPA Token on the Linux server.
$4 = The name of the service being restarted.
Note that the script runs the service_restart.sh plugin only when the service is in a CRITICAL state.
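To make the role of each argument easier to see, here is a sketch of the same event handler logic written as a function with every state listed explicitly. The handle_event function and the CHECK_NCPA override are assumptions introduced for this illustration so that the logic can be tried without contacting a real host.

```shell
#!/bin/bash
# Sketch of the event handler logic; CHECK_NCPA is an assumed override for dry runs.
CHECK_NCPA="${CHECK_NCPA:-/usr/local/nagios/libexec/check_ncpa.py}"

# $1 = service state, $2 = host address, $3 = NCPA token, $4 = service name
handle_event() {
    case "$1" in
        OK|WARNING|UNKNOWN)
            # Nothing to do for these states.
            return 0
            ;;
        CRITICAL)
            # Ask NCPA on the remote host to run the restart plugin.
            "$CHECK_NCPA" -H "$2" -P 5693 -t "$3" -M 'plugins/service_restart.sh' -a "$4"
            ;;
        *)
            echo "unexpected state: $1" >&2
            return 1
            ;;
    esac
}

# Act only when all four arguments were supplied on the command line.
if [ "$#" -eq 4 ]; then
    handle_event "$@"
fi
```

Pointing CHECK_NCPA at a harmless stub shows exactly which options would be passed to check_ncpa.py for a given state.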
5. Create Event Handler
Moving ahead, we create an event handler on the Nagios XI server to be used by our services.
For that, we navigate to Configure > Core Config Manager.
Select Commands from the list on the left, click the >_ Commands link, and then click Add New.
Then we populate the fields with the following values:
Command Name: Service Restart – Linux
Command Line: $USER1$/service_restart.sh $SERVICESTATE$ $HOSTADDRESS$ Str0ngT0k3n $_SERVICESERVICE$
Make sure the Active check box is checked.
Finally, click Save and then Apply Configuration.
6. Add a Service Check
Now we need to create a Service using the NCPA Monitoring Wizard. To do so, we select the crond service from the list of Services.
Then we finish the wizard to create the new service.
7. Update Service With Event Handler
Here, we need to do two things:
a. Select Event Handler
b. Add the name of the service we want to restart as a custom variable to the service object.
To implement this:
i. Navigate to Configure > Core Config Manager > Monitoring > Services.
ii. Click the Service status for: crond to edit the service.
iii. Then click the Check Settings tab.
iv. From the Event handler drop-down list, select the option Service Restart – Linux.
v. For Event handler enabled click On.
vi. Click the Misc Settings tab and then click the Manage Free Variables button.
vii. Add a custom variable so that the event handler knows the name of the service to restart. Because the command uses the $_SERVICESERVICE$ macro, the variable must be named _SERVICE, and its value is the service name (crond in this example).
viii. Click Insert to add the variable to the list on the right.
ix. Then click Close >> Save.
x. Finally, Apply Configuration for the changes to take effect.
To test, we force the service to stop on the Linux machine:
# service crond stop
We wait for the Nagios service to go to a critical state or force the next check.
Once the Nagios XI Cron Scheduling Daemon service is in a critical state, the event handler executes and the Linux crond service restarts.
The next time Nagios XI checks the Cron Scheduling Daemon service, it will return to an OK state.
However, if the event handler does not work properly, check the /usr/local/nagios/var/nagios.log file for any errors.
 SERVICE ALERT: 10.25.13.34;Cron Scheduling Daemon;CRITICAL;SOFT;1;crond is stopped
 wproc: SERVICE EVENTHANDLER job 7 from worker Core Worker 12627 is a non-check helper but exited with return code 13
 wproc: early_timeout=0; exited_ok=1; wait_status=3328; error_code=0;
 wproc: stderr line 01: execvp(/usr/local/nagios/libexec/service_restart.sh, …) failed. Errno is 13: Permission denied
Here, we can see that the worker did not have permission to execute the service_restart.sh script. Re-running the chown and chmod commands from step 4 should correct this.
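When troubleshooting, it can help to filter the log for event handler messages and to confirm that the handler script is executable. The two helpers below are a minimal sketch of both checks; the function names are hypothetical, and the usage in the comments assumes that the Nagios workers run as the nagios user.

```shell
# Print event handler failures and errno messages from a Nagios log file.
handler_errors() {
    grep -E 'EVENTHANDLER|Errno' "$1"
}

# Report whether the current user can execute the given script.
check_handler_perms() {
    if [ -x "$1" ]; then
        echo "executable: $1"
    else
        echo "NOT executable: $1"
        return 1
    fi
}

# Possible usage on the Nagios XI server:
#   handler_errors /usr/local/nagios/var/nagios.log
#   sudo -u nagios test -x /usr/local/nagios/libexec/service_restart.sh && echo ok
```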