

Clear Solaris Service Maintenance Status in Nagios - Troubleshoot and Resolve

Are you trying to clear Solaris Service Maintenance Status?

This guide is for you.


Generally, when the Nagios Core service verifies the configuration files and finds an invalid configuration, the core service will not start.

To resolve the problem, you must fix whatever Nagios Core is complaining about. This is normal Nagios Core behavior and is not specific to Solaris.

Here at Ibmi Media, as part of our Server Management Services, we regularly help our customers with Nagios-related queries.

In this context, we shall look into how to clear the Solaris maintenance state on a service, specifically the Nagios Core or NRPE service.


Service Maintenance Status in Solaris

Refusing to start with an invalid configuration is normal Nagios Core behavior; it is not specific to Solaris.

However, when the service fails to start several times, Solaris will put the service into a Maintenance State.

This prevents a small problem from becoming a bigger one.

Even after fixing the problem Nagios Core is complaining about, we must clear the maintenance state on the service before Solaris allows it to start again.
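To find out which services Solaris has currently placed in maintenance, the state column of `svcs` can be filtered. The following is a minimal sketch, assuming a POSIX-compatible shell and awk; the function name `list_maintenance` is hypothetical and used only for illustration.

```shell
# list_maintenance: read "STATE FMRI" pairs on stdin and print the FMRI
# of every service whose state is "maintenance".
# (Hypothetical helper name, not part of Solaris.)
list_maintenance() {
    awk '$1 == "maintenance" { print $2 }'
}

# Typical use on a Solaris host
# (-a: all services, -H: no header line, -o: select columns):
#   svcs -a -H -o state,fmri | list_maintenance
```

Keeping the filter separate from the `svcs` call makes it easy to try out on saved output before running it against a live system.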


How to fix this Nagios error?

Generally, this issue is noticed after a Nagios user reboots the Solaris server and the Nagios service does not come back up.

To troubleshoot, you need to know the reason why the service did not start.

To see detailed status information, we run:

$ svcs -xv nagios

Output:

svc:/application/nagios:default (?)
State: maintenance since March 31, 2021 04:57:38 PM EST
Reason: Start method failed repeatedly, last exited with status 8.
See: http://support.oracle.com/msg/SMF-8000-KS
See: /var/svc/log/application-nagios:default.log
Impact: This service is not running.

From the output above, you can see that the service is in a maintenance state. However, it gives no detail about the cause beyond the fact that the start method failed repeatedly.
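When this check is scripted, the state and the log path can be pulled out of the `svcs -xv` output rather than read by eye. The sketch below assumes a POSIX-compatible shell and awk, and that the output follows the layout shown above; `svc_state` and `svc_logs` are hypothetical helper names.

```shell
# svc_state: print the value of the first "State:" line found in
# `svcs -xv` output read on stdin (e.g. "maintenance" or "online").
svc_state() {
    awk '$1 == "State:" { print $2; exit }'
}

# svc_logs: print every local file path listed on a "See:" line,
# skipping URLs such as the Oracle knowledge article.
svc_logs() {
    awk '$1 == "See:" && $2 ~ /^\// { print $2 }'
}

# Typical use on a Solaris host:
#   svcs -xv nagios | svc_state
#   svcs -xv nagios | svc_logs
```

Matching on the first field instead of the start of the line keeps the helpers working whether or not the labels are indented.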

The output also mentions a log file, /var/svc/log/application-nagios:default.log.

To perform further troubleshooting, we execute:

$ tail -20 /var/svc/log/application-nagios:default.log
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Error: Invalid max_check_attempts value for host 'test'
Error: Could not register host (config file '/usr/local/nagios/etc/objects/localhost.cfg', starting on line 33)
Error processing object config files!
***> One or more problems was encountered while processing the config files...
Check your configuration file(s) to ensure that they contain valid
directives and data definitions. If you are upgrading from a previous
version of Nagios, you should be aware that some variables/definitions
may have been removed or modified in this version. Make sure to read
the HTML documentation regarding the config files, as well as the
'Whats New' section to find out what has changed.
[ Mar 31 16:57:38 Method "start" exited with status 8. ]

A common Nagios Core problem is an object definition in the configuration file that is missing required directives.

Here, the template was forgotten in the host definition, and hence all the common options it would have supplied were missing.
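On a busy log, the lines that name the offending object and file are easier to spot when everything else is filtered out. A minimal sketch, assuming the error lines start with the word "Error" as in the log excerpt above; `nagios_log_errors` is a hypothetical helper name.

```shell
# nagios_log_errors: print only the lines of the given log file that
# begin with "Error". In the excerpt above these are the lines that
# identify the host, the config file, and the line number.
nagios_log_errors() {
    grep '^Error' "$1"
}

# Typical use on a Solaris host:
#   nagios_log_errors /var/svc/log/application-nagios:default.log
```

Because the SMF log accumulates output from every start attempt, older errors may also be listed; the timestamps on the surrounding "Method" lines show which attempt they belong to.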

Before we proceed, we fix the error. Once done, we can run the verify command to check:

$ /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Successful verification will end like this:

Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
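If the verification is part of a script, the "Total Errors" line can be checked automatically before the service is touched. This is only a sketch, assuming a POSIX awk and the summary format shown above; `preflight_ok` is a hypothetical helper name.

```shell
# preflight_ok: read the output of `nagios -v` on stdin and return
# success only if a "Total Errors:" line is present and reports 0.
preflight_ok() {
    awk '/^Total Errors:/ { found = 1; errors = $3 }
         END { exit !(found && errors == 0) }'
}

# Typical use:
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg \
#       | preflight_ok && echo "configuration verified"
```

Treating a missing summary line as a failure is a deliberately cautious choice: if the pre-flight check aborts early, the helper does not report success.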

At this point, however, if we try to restart the service, it will not start:

$ svcadm enable nagios
$ svcs -xv nagios
svc:/application/nagios:default (?)
State: maintenance since March 31, 2021 04:57:38 PM EST
Reason: Start method failed repeatedly, last exited with status 8.
See: http://support.oracle.com/msg/SMF-8000-KS
See: /var/svc/log/application-nagios:default.log
Impact: This service is not running.

This does not mean there is still a Nagios configuration issue.

If we pay attention to the date and time on the State line, we can see it has not changed.

Solaris will refuse to start the service until we clear the maintenance state.


How to Clear Solaris Service Maintenance Status?

In order to clear the maintenance state, we execute:

$ svcadm clear nagios

Then to start Nagios, we run:

$ svcadm enable nagios

Now we check the state of the service:

$ svcs -xv nagios

The output should resemble this:

svc:/application/nagios:default (?)
State: online since March 31, 2021 05:17:21 PM EST
See: /var/svc/log/application-nagios:default.log
Impact: None.

From the output, we can see that Nagios Core is now running again.
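The three commands above can be combined into one small function for repeated use. This is a sketch under stated assumptions: a POSIX-compatible shell, a service name that resolves to a single instance, and a hypothetical function name `clear_and_start`. It should be run only after the configuration has been fixed and verified.

```shell
# clear_and_start: clear the maintenance state of a service if it is in
# one, enable it, and print its detailed status.
# $1 is the service name or FMRI (assumed to match a single instance).
clear_and_start() {
    svc=$1
    # -H omits the header line, -o state prints only the state column.
    state=$(svcs -H -o state "$svc") || return 1
    if [ "$state" = "maintenance" ]; then
        svcadm clear "$svc" || return 1
    fi
    svcadm enable "$svc" || return 1
    svcs -xv "$svc"
}

# Typical use on a Solaris host:
#   clear_and_start nagios
```

The service may take a moment to come online after it is enabled, so the status printed by the final `svcs -xv` can still show a transitional state; running `svcs -xv nagios` again shortly afterwards confirms the result.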




Conclusion

This article covers how to clear the Solaris service maintenance state for Nagios. When the Nagios Core service finds an invalid configuration, it will not start.

To resolve this, you must fix whatever Nagios Core is complaining about.

This is normal Nagios Core behavior and is not specific to Solaris.

However, on Solaris, after a service has failed to start several times, Solaris puts the service into what is called a Maintenance State. This state prevents a small problem from becoming a bigger one.

Even after fixing the problem Nagios Core is complaining about, you must also clear the maintenance state on the service before Solaris allows it to be started again.

The output of svcs -xv nagios shows that the service is in a maintenance state, but it gives little detail about the cause beyond the fact that the start method failed repeatedly.

It does, however, provide the name of a log file, /var/svc/log/application-nagios:default.log.

Execute the following command to perform further troubleshooting:

$ tail -20 /var/svc/log/application-nagios:default.log


To Clear Maintenance State on Nagios:

1. Run the following command to clear the maintenance state:

$ svcadm clear nagios

2. Execute the following command to start Nagios:

$ svcadm enable nagios

3. Now check the state of the service:

$ svcs -xv nagios