Nagios NDOUtils Message Queue Exceeded – Fix it Now








Are you facing the Nagios error, NDOUtils: Message Queue Exceeded?

We can help you.


NDOUtils uses the operating system's kernel message queue. As the number of messages increases, the kernel settings need to be tuned to allow more messages to be queued and processed.

If they are not, the queue fills up and we get the error, NDOUtils: Message Queue Exceeded.

Here at Ibmi Media, as part of our Server Management Services, we regularly help our customers resolve NDOUtils-related issues.


Nature of the Nagios error, NDOUtils: Message Queue Exceeded

In Nagios, we may experience the following symptoms:

i. Missing hosts or services or status data

ii. Long time to restart the Nagios process

iii. Unusually high CPU load


We may also see a flood of ndo2db messages in /var/log/messages, such as:

ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may need to be tuned. See README.
ndo2db: Warning: queue send error, retrying...

In addition, we may see multiple queues for the Nagios user while executing:

ipcs -q
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xee070002 1409024    nagios     600        100672512    98313
0x50070002 1441793    nagios     600        0            0
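
To make the backlog easier to read, we can total the queues per owner. The following is a minimal sketch and not part of NDOUtils: summarize_queues is a hypothetical helper name, and it assumes the six-column ipcs -q layout shown above.

```shell
# Summarize the message queues owned by one user.
# Reads `ipcs -q` output on stdin and prints:
#   <queue count> <total messages> <total used-bytes>
# summarize_queues is a hypothetical helper, not an NDOUtils command.
summarize_queues() {
    awk -v owner="$1" '
        $3 == owner { queues++; bytes += $5; msgs += $6 }
        END { printf "%d %d %d\n", queues, msgs, bytes }
    '
}

# Typical use on the Nagios server:
#   ipcs -q | summarize_queues nagios
```

More than one queue, or a message count that keeps growing, points to the problem described in this article.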


How to resolve Nagios error, NDOUtils: Message Queue Exceeded?

To begin, we need to identify the values currently set in /etc/sysctl.conf:

grep 'kernel.msgmnb' /etc/sysctl.conf
grep 'kernel.msgmax' /etc/sysctl.conf
grep 'kernel.msgmni' /etc/sysctl.conf

Example output:

kernel.msgmnb = 131072000
kernel.msgmax = 131072000
kernel.msgmni = 256000

If a setting is not already defined, grep returns no output for it, and we need to add it to the /etc/sysctl.conf file.
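
Note that grep only shows what is written in /etc/sysctl.conf; the running kernel can hold different values if the file was edited without being reloaded. As a rough sketch, we can compare both. The conf_value helper below is a hypothetical name introduced for illustration, while sysctl -n is the standard way to print a running value.

```shell
# Print the value assigned to a key in a sysctl.conf-style file.
# Prints an empty line if the key is not set; commented-out lines are ignored.
# conf_value is a hypothetical helper name.
conf_value() {
    awk -F= -v key="$2" '
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); val = v }
        END { print val }
    ' "$1"
}

# Typical use on the Nagios server:
#   for key in kernel.msgmnb kernel.msgmax kernel.msgmni; do
#       echo "$key configured=$(conf_value /etc/sysctl.conf "$key") running=$(sysctl -n "$key")"
#   done
```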

For msgmnb and msgmax, we need to use the same value.

The recommended values are 131072000 or 262144000.

On the other hand, for msgmni, we recommend 512000.

Unless we have a high-performance server, values higher than these are unlikely to solve the problem.
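
To get a feel for these numbers: msgmnb is the maximum number of bytes a single queue can hold, msgmax is the maximum size of one message, and msgmni is the maximum number of queues. The short calculation below is only an estimate. It assumes every message has the size seen in the sample ipcs -q row earlier (used-bytes divided by messages), which may not hold on every system.

```shell
# Bytes per queued message, taken from the sample `ipcs -q` row
# (used-bytes / messages). This is an observation from that one sample.
used_bytes=100672512
messages=98313
msg_size=$((used_bytes / messages))
echo "message size: $msg_size bytes"

# How many messages of that size fit under each recommended msgmnb value.
for msgmnb in 131072000 262144000; do
    echo "msgmnb=$msgmnb holds about $((msgmnb / msg_size)) messages ($((msgmnb / 1024 / 1024)) MiB)"
done
```

With 1024-byte messages, the two recommended values allow roughly 128000 and 256000 queued messages, so the sample backlog of 98313 messages was already close to the lower limit.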


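For msgmnb and msgmax, the following commands will update the existing entries in /etc/sysctl.conf with increased values:

sed -i 's/^kernel\.msgmnb.*/kernel\.msgmnb = 262144000/g' /etc/sysctl.conf
sed -i 's/^kernel\.msgmax.*/kernel\.msgmax = 262144000/g' /etc/sysctl.conf

These commands only change entries that already exist. If grep returned no output for msgmnb or msgmax, we append the entry with echo instead, in the same way as shown for msgmni below.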


The commands below are for the msgmni option.

Based on the grep command we executed previously:

1. If it does not return output, this command will add the setting to the /etc/sysctl.conf file:

echo 'kernel.msgmni = 512000' >> /etc/sysctl.conf

2. On the other hand, if it does, this command will update the setting in the /etc/sysctl.conf file:

sed -i 's/^kernel\.msgmni.*/kernel\.msgmni = 512000/g' /etc/sysctl.conf
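
The two cases above can be folded into one step that works for any of the three settings. This is a minimal sketch, assuming GNU sed (for the -i option already used above); set_sysctl_key is a hypothetical helper name, not a standard command. Running it twice leaves a single entry in the file.

```shell
# Set "key = value" in a sysctl.conf-style file: rewrite the line if the key
# is already present, otherwise append it.
# set_sysctl_key is a hypothetical helper name.
set_sysctl_key() {
    file="$1"; key="$2"; value="$3"
    # Escape the dots so they match literally in the regular expressions.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^${pattern}[[:space:]]*=" "$file"; then
        sed -i "s/^${pattern}[[:space:]]*=.*/${key} = ${value}/" "$file"
    else
        echo "${key} = ${value}" >> "$file"
    fi
}

# Typical use (as root) on the Nagios server:
#   set_sysctl_key /etc/sysctl.conf kernel.msgmnb 262144000
#   set_sysctl_key /etc/sysctl.conf kernel.msgmax 262144000
#   set_sysctl_key /etc/sysctl.conf kernel.msgmni 512000
```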

Once done, we execute the following command to load the settings from /etc/sysctl.conf into the running kernel:

sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 262144000
kernel.msgmax = 262144000
kernel.shmmax = 4294967295
kernel.shmall = 268435456
kernel.msgmni = 512000

The output shows increased values have been applied to the kernel.


Then we need to restart services:

On RHEL 6|CentOS 6|Oracle Linux 6|Ubuntu 14,

$ service nagios stop
$ service ndo2db restart
$ service nagios start

On RHEL 7|CentOS 7|Oracle Linux 7|Debian|Ubuntu 16/18,

$ systemctl stop nagios.service
$ systemctl restart ndo2db.service
$ systemctl start nagios.service

Finally, we should check the message queues:

ipcs -q

If we see more than one queue for the nagios user, we execute the following to clear them:

for i in `ipcs -q | grep nagios | awk '{print $2}'`; do ipcrm -q $i; done
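
Because grep nagios matches the word anywhere in a line, a slightly stricter variant is to match the owner column exactly and to list the queue IDs before removing anything. This is a sketch under the assumption that the owner is in the third column of ipcs -q, as in the output shown earlier; queue_ids is a hypothetical helper name.

```shell
# Print the msqid of every queue whose owner column is exactly the given user.
# Reads `ipcs -q` output on stdin. queue_ids is a hypothetical helper name.
queue_ids() {
    awk -v owner="$1" '$3 == owner { print $2 }'
}

# Dry run: show which queues would be removed.
#   ipcs -q | queue_ids nagios
# Then remove each of them:
#   for id in $(ipcs -q | queue_ids nagios); do ipcrm -q "$id"; done
```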

We watch the queues for 10-15 minutes to ensure they process:

watch ipcs -q

To stop watching the queues, we hit Ctrl + C.


If we find the message queue does not process quickly, the problem may relate to MySQL/MariaDB.

Ensure that the DB server has enough CPU and memory resources.

In addition, if the database runs on the same server as Nagios, we should look at offloading it to a dedicated server.





Conclusion

This article covers how to resolve the Nagios error, NDOUtils: Message Queue Exceeded, which occurs when the number of queued messages grows beyond what the kernel allows.

NDOUtils uses the operating system's kernel message queue. As the number of messages increases, the kernel settings need to be tuned to allow more messages to be queued and processed.

