Sometimes, Nagios users are unable to log in to the Nagios XI web interface, and the problem has to be diagnosed from a session on the Nagios XI server opened with an SSH tool such as PuTTY.
Here at Ibmi Media, as part of our Server Management Services, we regularly help our customers solve Nagios-related errors.
In this context, we shall look into the causes of this error and how to fix each of them.
The following factors can trigger this error:
i. The password for the nagiosadmin user is incorrect or lost.
ii. SELinux is enabled.
iii. The Apache service is not running.
iv. The firewall is blocking port 80.
v. The mysqld service is not running, or there are crashed database tables.
vi. The postgresql service is not running, or the database is not accepting commands.
vii. Other installed products that use Postgres may need their databases vacuumed.
The most common reason for a failed login to the web interface is a wrong password for nagiosadmin.
In this case, you will need to reset the nagiosadmin password.
To accomplish this, open an SSH or direct console session to the Nagios XI host and run the following command:
/usr/local/nagiosxi/scripts/reset_nagiosadmin_password.php --password=newpassword
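Rather than typing a guessable value in place of newpassword, a random password can be generated first. The following is a minimal sketch, assuming /dev/urandom is available on the host. The helper name gen_password is hypothetical, and the reset command is only printed for review, not executed:

```shell
# Hypothetical helper: print a random alphanumeric password of the given length.
gen_password() {
    # 512 random bytes yield roughly 120 alphanumeric characters on average,
    # far more than the handful kept by the final head.
    head -c 512 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c "${1:-16}"
}

new_pass=$(gen_password 16)

# Print the reset command for review instead of running it directly.
echo "/usr/local/nagiosxi/scripts/reset_nagiosadmin_password.php --password=$new_pass"
```

Store the generated password somewhere safe before running the printed command, since it is not recorded anywhere else.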
We need the policycoreutils package installed to check the status of SELinux. Install it with the command for your distribution.
For RHEL|CentOS|Oracle Linux, run the following command;
yum install -y policycoreutils
For Debian | Ubuntu, run the command below;
apt-get install -y policycoreutils
Next, check the status of SELinux by running one of these commands;
sestatus
OR
getenforce
To disable SELinux, run the two commands below:
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
The first command switches SELinux to permissive mode immediately, and the second edits the configuration file so that SELinux stays disabled after the server reboots.
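Note that the sed substitution only changes the file when it currently contains SELINUX=enforcing, so it can be worth confirming what the configuration file says afterwards. A minimal sketch of such a check follows; the helper name selinux_config_mode is hypothetical:

```shell
# Hypothetical helper: print the SELINUX= value from a config file
# (defaults to /etc/selinux/config). Comment lines and SELINUXTYPE= are ignored
# because the pattern is anchored at the start of the line.
selinux_config_mode() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "${1:-/etc/selinux/config}"
}

# Usage on the Nagios XI host:
#   getenforce             # runtime mode: Enforcing, Permissive or Disabled
#   selinux_config_mode    # mode that applies after the next reboot
```

If the helper still prints enforcing or permissive, the file was not changed by the substitution and needs to be edited by hand.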
If the Apache web server is not running, the browser shows an error message instead of the web UI.
In Firefox, you will see;
"Unable to connect"
In Google Chrome, you will see;
"This webpage is not available"
Now, check the status of Apache by using one of the commands below;
For RHEL 6|CentOS 6|Oracle Linux 6, run the command below;
service httpd status
For RHEL 7|CentOS 7|Oracle Linux 7, run the command below;
systemctl status httpd.service
For Ubuntu 14 System, run the command below;
service apache2 status
For Debian | Ubuntu 16/18 based system, run the command;
systemctl status apache2.service
To start/restart the apache service, run the command below:
For RHEL 6|CentOS 6|Oracle Linux 6, execute;
service httpd start
or
service httpd restart
For RHEL 7|CentOS 7|Oracle Linux 7 machines, run one of the following commands;
systemctl start httpd.service
or
systemctl restart httpd.service
For Ubuntu 14 systems, run either of the commands below;
service apache2 start
or
service apache2 restart
For Debian|Ubuntu 16/18 based systems, execute either of the following commands;
systemctl start apache2.service
or
systemctl restart apache2.service
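The only difference between the distribution families above is the service name (httpd versus apache2). On systemd hosts this choice can be made automatically. The following is a minimal sketch, assuming /etc/os-release exists and that its ID field is one of rhel, centos, ol, debian or ubuntu; the helper name apache_service_for is hypothetical:

```shell
# Hypothetical helper: map an os-release ID to the Apache service name.
apache_service_for() {
    case "$1" in
        rhel|centos|ol) echo httpd ;;
        debian|ubuntu)  echo apache2 ;;
        *) echo "unknown distribution: $1" >&2; return 1 ;;
    esac
}

# Usage on a systemd host:
#   . /etc/os-release
#   systemctl restart "$(apache_service_for "$ID").service"
```

Older releases such as RHEL 6 or Ubuntu 14 do not fit this sketch and still need the service commands listed above.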
Let us now check the steps to unblock the port in iptables firewall. Start by checking the status of the firewall by running the following command;
service iptables status
Now look through the output for a line like the one below, which shows that a firewall rule exists allowing inbound TCP traffic on port 80:
7 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:80
If this firewall rule DOES NOT exist, then it can be added by executing the following command:
iptables -I INPUT -p tcp --dport 80 -j ACCEPT
Then to save this rule, execute;
service iptables save
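The check and the conditional insert can be combined so that the rule is only added when it is missing. A minimal sketch follows, assuming the rule listing has the column layout shown above; the helper name has_http_accept_rule is hypothetical:

```shell
# Hypothetical helper: succeed if the iptables listing on stdin contains
# an ACCEPT rule for TCP destination port 80 (and not, say, port 8080).
has_http_accept_rule() {
    grep -Eq 'ACCEPT[[:space:]]+tcp[[:space:]].*dpt:80([[:space:]]|$)'
}

# Usage (as root); -n keeps ports numeric so the rule shows as dpt:80:
#   iptables -L INPUT -n | has_http_accept_rule \
#       || iptables -I INPUT -p tcp --dport 80 -j ACCEPT
```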
Ubuntu manages firewall rules with the Uncomplicated Firewall (ufw), which is not enabled by default.
To check if it is enabled, run the following command;
ufw status
If the status says it is inactive, then we can enable the firewall on boot and start it using the command below:
ufw enable
Now we need to add rules for different ports as the default configuration denies all incoming connections.
To list the firewall rules execute the following command;
ufw status verbose
You will get an output such as this;
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)
New profiles: skip
To                         Action      From
--                         ------      ----
80                         ALLOW IN    Anywhere
80 (v6)                    ALLOW IN    Anywhere (v6)
The above output shows that firewall rules exist allowing inbound TCP traffic on port 80.
If this firewall rule does not exist, then it can be added by executing the following commands:
ufw allow http
ufw reload
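The ufw rule listing can also be checked from a script. The following is a minimal sketch, assuming the column layout shown above, where the first field is the port and the second is the action. The helper name ufw_allows_port is hypothetical, and it also accepts the 80/tcp form in which a rule added by service name may be listed:

```shell
# Hypothetical helper: succeed if `ufw status` output on stdin allows
# inbound traffic to the given port (default 80), with or without "/tcp".
ufw_allows_port() {
    awk -v port="${1:-80}" '
        ($1 == port || $1 == port "/tcp") && $2 == "ALLOW" { found = 1 }
        END { exit !found }
    '
}

# Usage (as root):
#   ufw status | ufw_allows_port 80 || { ufw allow http; ufw reload; }
```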
When the mysqld service is not running or there are crashed database tables
If there is an issue with the MySQL database when we try to log in to the web interface, we will most probably see an error message similar to this one:
Message: A database connection error has been detected, we are attempting to repair the server, if the repair does not resolve the issue, please contact Nagios support.
Such database corruption is usually caused by power outages, running out of disk space, or improperly shutting down the Nagios XI server.
The correct way to shut down the Nagios XI server is to issue the following command in the command line;
shutdown -h now
If the Nagios XI machine has insufficient disk space then we may see errors like this when the repair database script is run:
/usr/local/nagiosxi/scripts/repairmysql.sh: line 59: 11735 Segmentation fault (core dumped) $cmd $t --sort_buffer_size=256M
Timeout error occurred trying to start MySQL Daemon.
Starting mysqld: [FAILED]
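Since low disk space makes the repair script fail in this way, free space can be checked before running it. A minimal sketch follows, assuming the MySQL data lives under /var/lib/mysql (the path used elsewhere in this article). The helper name free_kb and the 1 GiB threshold are illustrative choices, not values from Nagios:

```shell
# Hypothetical helper: print free space in KiB for the filesystem holding a path.
free_kb() {
    # -P forces one line per filesystem; -k reports 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Illustrative threshold: warn below 1 GiB free.
min_kb=1048576
path=/var/lib/mysql
[ -d "$path" ] || path=/    # fall back to the root filesystem

if [ "$(free_kb "$path")" -lt "$min_kb" ]; then
    echo "low disk space on $path; free some space before running the repair script"
fi
```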
To repair the nagios, nagiosql, and nagiosxi databases, run the following commands in the command line as the root user;
/usr/local/nagiosxi/scripts/repairmysql.sh nagios
/usr/local/nagiosxi/scripts/repairmysql.sh nagiosql
/usr/local/nagiosxi/scripts/repairmysql.sh nagiosxi
Alternatively, if you are using Nagios XI 2014 or a newer version, you can use the following commands;
cd /usr/local/nagiosxi/scripts/
./repair_databases.sh
If the automatic repair of a table fails, you will see an error such as this;
SQL: DELETE FROM nagios_logentries WHERE logentry_time < FROM_UNIXTIME(1293570334)
SQL: SQL Error [ndoutils] : Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
CLEANING ndoutils TABLE 'notifications'...
You can force repair the tables by using the commands below:
For RHEL 6|CentOS 6|Oracle Linux 6, run the following commands;
service mysqld stop
cd /var/lib/mysql/nagios
myisamchk -r -f nagios_<corrupted_table>
service mysqld start
rm -f /usr/local/nagiosxi/var/dbmaint.lock
php /usr/local/nagiosxi/cron/dbmaint.php
For RHEL 7|CentOS 7|Oracle Linux 7|Debian 9, run the following commands;
systemctl stop mariadb.service
cd /var/lib/mysql/nagios
myisamchk -r -f nagios_<corrupted_table>
systemctl start mariadb.service
rm -f /usr/local/nagiosxi/var/dbmaint.lock
php /usr/local/nagiosxi/cron/dbmaint.php
For Ubuntu 14 system, run the following commands;
service mysql stop
cd /var/lib/mysql/nagios
myisamchk -r -f nagios_<corrupted_table>
service mysql start
rm -f /usr/local/nagiosxi/var/dbmaint.lock
php /usr/local/nagiosxi/cron/dbmaint.php
For Debian 8|Ubuntu 16/18 based systems, run the following commands;
systemctl stop mysql.service
cd /var/lib/mysql/nagios
myisamchk -r -f nagios_<corrupted_table>
systemctl start mysql.service
rm -f /usr/local/nagiosxi/var/dbmaint.lock
php /usr/local/nagiosxi/cron/dbmaint.php
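The table to pass to myisamchk in place of nagios_<corrupted_table> is the one named in the "marked as crashed" error shown earlier. The following is a minimal sketch for pulling those names out of the repair output, assuming the message format from the example above; the helper name crashed_tables is hypothetical:

```shell
# Hypothetical helper: extract table names from "marked as crashed" errors on stdin.
# The pattern expects the form: Table './<database>/<table>' is marked as crashed
crashed_tables() {
    sed -n "s|.*Table '\./[^/]*/\([^']*\)' is marked as crashed.*|\1|p" | sort -u
}

# Usage:
#   cd /usr/local/nagiosxi/scripts/
#   ./repair_databases.sh 2>&1 | crashed_tables
```

Each name printed can then be used with myisamchk -r -f while the database service is stopped, as in the commands above.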
In other cases, you might need to truncate (empty) one or more tables. The following commands show how to truncate both the "nagios_logentries" and "nagios_notifications" tables in the nagios MySQL database;
mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_notifications'
The above commands will help to clear all entries from the affected tables. After truncating the tables, we should repeat the repair process outlined above.
When postgresql service is not running or the database is not accepting commands
For PostgreSQL issues, we will see an additional message, which would look something like this;
SQL: SQL Error [nagiosxi] : Database connection failed
SQL: SQL Error [nagiosxi] : Database connection failed
SQL: SQL Error [nagiosxi] : Database connection failed
Message: A database connection error has been detected, we are attempting to repair the server, if the repair does not resolve the issue, please contact Nagios support.
When you see the above message, check the following;
i. The server is not running out of disk space (check with the df command).
ii. PostgreSQL is running and we can actually log in to the database manually.
iii. PostgreSQL starts normally when started or restarted with one of the commands below:
For RHEL 6|CentOS 6|Oracle Linux 6|Ubuntu 14, run the command below;
service postgresql start
or
service postgresql restart
For RHEL 7|CentOS 7|Oracle Linux 7|Debian|Ubuntu 16/18, run the command below;
systemctl start postgresql.service
or
systemctl restart postgresql.service
In some cases, you need to run a vacuum on the Postgres database.
Start by determining the version of Postgres with the following command:
postgres -V
Based on that output, execute the commands specific to the version:
For Versions BEFORE 9
For RHEL 6|CentOS 6|Oracle Linux 6|Ubuntu 14, run the following commands;
echo "vacuum;vacuum analyze;"|psql nagiosxi postgres
service postgresql restart
For RHEL 7|CentOS 7|Oracle Linux 7|Debian|Ubuntu 16/18, run the following commands;
echo "vacuum;vacuum analyze;"|psql nagiosxi postgres
systemctl restart postgresql.service
For Versions 9 onwards
For RHEL 6|CentOS 6|Oracle Linux 6|Ubuntu 14, run the command;
echo "vacuum;vacuum analyze;vacuum full;"|psql nagiosxi postgres
service postgresql restart
For RHEL 7|CentOS 7|Oracle Linux 7|Debian|Ubuntu 16/18, execute;
echo "vacuum;vacuum analyze;vacuum full;"|psql nagiosxi postgres
systemctl restart postgresql.service
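The version check and the choice of vacuum statements can be combined in one step. A minimal sketch follows, assuming the version output has the usual form "postgres (PostgreSQL) 9.2.24"; the helper names pg_major and vacuum_sql_for are hypothetical:

```shell
# Hypothetical helper: print the major version, i.e. the first run of digits
# in a `postgres -V` style string.
pg_major() {
    printf '%s\n' "$1" | sed -n 's/^[^0-9]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: print the vacuum statements this article uses for that version.
vacuum_sql_for() (
    major=$(pg_major "$1")
    [ -n "$major" ] || { echo "cannot parse version: $1" >&2; exit 1; }
    if [ "$major" -ge 9 ]; then
        echo "vacuum;vacuum analyze;vacuum full;"
    else
        echo "vacuum;vacuum analyze;"
    fi
)

# Usage:
#   vacuum_sql_for "$(postgres -V)" | psql nagiosxi postgres
```

The PostgreSQL service still needs to be restarted afterwards with the command for your distribution.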
To manually log into Postgres, execute the command below;
psql nagiosxi nagiosxi
In some cases, you will get an error message such as:
psql: FATAL: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and use a standalone backend to vacuum database "postgres".
We may notice either high CPU usage by the postmaster process or a repeated error message in the log files under /var/lib/pgsql/data/pg_log:
transaction ID wrap limit is 2147484146
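To find out whether this message is present, the log files can be searched for it. A minimal sketch follows, assuming pg_log is a directory of plain-text log files under /var/lib/pgsql/data; the helper name wraparound_logs is hypothetical:

```shell
# Hypothetical helper: list log files in a directory that mention the wrap limit.
# The exit status is non-zero when no file matches.
wraparound_logs() {
    grep -l 'transaction ID wrap limit' "${1:-/var/lib/pgsql/data/pg_log}"/* 2>/dev/null
}

# Usage (as root or the postgres user):
#   wraparound_logs && echo "wraparound warning found; a standalone vacuum is needed"
```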
To solve this issue, run the following commands in an SSH session;
Versions BEFORE PostgreSQL 9
For RHEL 6|CentOS 6|Oracle Linux 6|Ubuntu 14, run the following commands;
service postgresql stop
su postgres
echo "VACUUM;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
postgres -D /var/lib/pgsql/data postgres < /tmp/fix.sql
postgres -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
service postgresql start
For RHEL 7|CentOS 7|Oracle Linux 7|Debian|Ubuntu 16/18, execute;
systemctl stop postgresql.service
su postgres
echo "VACUUM;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
postgres -D /var/lib/pgsql/data postgres < /tmp/fix.sql
postgres -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
systemctl start postgresql.service
Other products installed that use Postgres may need their databases vacuumed
If we have another piece of software installed on the Nagios XI server that uses Postgres, such as Nagios Fusion, we may need to vacuum the databases of that software as well as those mentioned in previous steps.
In the case of Fusion specifically, one of the following commands needs to be run in addition to those in the previous steps:
postgres -D /var/lib/pgsql/data nagiosfusion < /tmp/fix.sql
or, on PostgreSQL versions where single-user mode is selected with the --single option:
postgres --single -D /var/lib/pgsql/data nagiosfusion < /tmp/fix.sql
In summary, Nagios users can be unable to log in to the Nagios XI web interface for a number of reasons, ranging from a wrong password to the SELinux policy.