Are you experiencing scheduled backup failures in Nagios? This guide will help you fix them.
Here at Ibmi Media, as part of our Server Management Services, we regularly help our customers solve Nagios-related errors.
In this context, we shall look into what causes this error and how to resolve it.
In some cases, a scheduled backup in Nagios XI simply fails.
For example, during a recent scheduled backup, we found that the .tar.gz archive was not created successfully.
You might also find many leftover folders in "/store/backups/nagiosxi/" that were not removed as the backup limit setting requires.
A common cause of backup failures in Nagios is database corruption.
To find out why backups are failing, start by running a manual backup over SSH on the Nagios XI server. This generates verbose output that helps determine why the backup is failing.
Start by connecting to your Nagios XI server as the root user via a terminal or an SSH client such as PuTTY.
Next, run the script below to create a backup of your Nagios XI:
/usr/local/nagiosxi/scripts/backup_xi.sh
If the backup succeeds, the output ends with the following message:
===============
BACKUP COMPLETE
===============
The backup is stored at /store/backups/nagiosxi/2xxxxx.tar.gz.
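Since the verbose output is what you need when a backup fails, it can help to save it to a file and check it for the completion banner. The following is a minimal sketch, not something shipped with Nagios XI: the function name backup_completed and the log path /tmp/backup_xi.log are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios XI): succeeds only if the saved
# output of backup_xi.sh contains the "BACKUP COMPLETE" banner.
backup_completed() {
    # -q: print nothing, just set the exit status; -F: match a fixed string
    grep -qF 'BACKUP COMPLETE' "$1"
}

# Usage, as root on the Nagios XI server (the log path is an assumption):
#   /usr/local/nagiosxi/scripts/backup_xi.sh 2>&1 | tee /tmp/backup_xi.log
#   backup_completed /tmp/backup_xi.log && echo "backup OK" || echo "backup FAILED"
```

Redirecting stderr with 2>&1 keeps the tar and mysqldump messages in the same file as the rest of the output.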
You can also create manual backups in the web UI under "Admin - System Backups - Local Backup Archives".
There, click the "Create Backup" button to start the backup process. The backup status does not appear on the page; you will know the backup has completed once the .tar.gz file appears in the list of backups.
As stated earlier, database corruption is one of the main reasons backups fail. Here is the output of a backup that failed due to database corruption, which we came across recently:
Running configuration check...
Stopping nagios: done.
Starting nagios: done.
Backing up Core Config Manager (NagiosQL)...
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
Backing up Nagios Core...
tar: Removing leading `/' from member names
tar: /usr/local/nagios/var/ndo.sock: socket ignored
tar: /usr/local/nagios/var/rw/nagios.qh: socket ignored
Backing up Nagios XI...
tar: Removing leading `/' from member names
Backing up MRTG...
tar: Removing leading `/' from member names
Backing up NRDP...
tar: Removing leading `/' from member names
Backing up Nagvis...
tar: Removing leading `/' from member names
Backing up MySQL databases...
mysqldump: Got error: 130: Incorrect file format 'nagios_flappinghistory' when using LOCK TABLES
Error backing up MySQL database 'nagios' - check the password in this script!
In the output above, the message "check the password in this script" is generic and is not the cause. The mysqldump line before it shows the real problem: the nagios_flappinghistory table has an incorrect file format, which means it is corrupted.
To fix this problem, we repaired the nagios, nagiosql, and nagiosxi databases. To do this, run the following commands as the root user:
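When the backup output is long, the mysqldump line is easy to miss. The sketch below assumes the output was saved to a file and that mysqldump reports table names in straight single quotes, as it does on a terminal; the function names mysqldump_errors and corrupted_tables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for reading saved backup_xi.sh output.

# Print only the lines written by mysqldump itself, skipping the generic
# "check the password in this script" message that follows them.
mysqldump_errors() {
    grep '^mysqldump:' "$1"
}

# Print the table names that mysqldump reported as "Incorrect file format".
corrupted_tables() {
    sed -n "s/.*Incorrect file format '\([^']*\)'.*/\1/p" "$1"
}

# Usage (the log path is an assumption):
#   mysqldump_errors /tmp/backup_xi.log
#   corrupted_tables /tmp/backup_xi.log
```

The table names printed by corrupted_tables are the ones to repair.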
/usr/local/nagiosxi/scripts/repairmysql.sh nagios
/usr/local/nagiosxi/scripts/repairmysql.sh nagiosql
/usr/local/nagiosxi/scripts/repairmysql.sh nagiosxi
If you are running Nagios XI 2014 or later, you can use the following instead:
cd /usr/local/nagiosxi/scripts/
./repair_databases.sh
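Then force a repair on the corrupted tables with myisamchk. The four command sets below are identical except for how the database service is stopped and started. Run only the set that matches the service name and init system on your server, and replace nagios_<corrupted_table> with the name of the corrupted table (nagios_flappinghistory in the example above).
If the service is named mysqld and is managed with the service command:
service mysqld stop
cd /var/lib/mysql/nagios
myisamchk -r -f nagios_<corrupted_table>
service mysqld start
rm -f /usr/local/nagiosxi/var/dbmaint.lock
php /usr/local/nagiosxi/cron/dbmaint.php
If the service is mariadb.service and is managed with systemctl:
systemctl stop mariadb.service
cd /var/lib/mysql/nagios
myisamchk -r -f nagios_<corrupted_table>
systemctl start mariadb.service
rm -f /usr/local/nagiosxi/var/dbmaint.lock
php /usr/local/nagiosxi/cron/dbmaint.php
If the service is named mysql and is managed with the service command:
service mysql stop
cd /var/lib/mysql/nagios
myisamchk -r -f nagios_<corrupted_table>
service mysql start
rm -f /usr/local/nagiosxi/var/dbmaint.lock
php /usr/local/nagiosxi/cron/dbmaint.php
If the service is mysql.service and is managed with systemctl:
systemctl stop mysql.service
cd /var/lib/mysql/nagios
myisamchk -r -f nagios_<corrupted_table>
systemctl start mysql.service
rm -f /usr/local/nagiosxi/var/dbmaint.lock
php /usr/local/nagiosxi/cron/dbmaint.php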
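On systemd servers, one way to decide which of the variants above applies is to look at the installed service units. This is only a sketch under assumptions: the function name pick_db_service is hypothetical, and the preference order (mariadb, then mysqld, then mysql) is a choice made for illustration, not something Nagios XI prescribes. Servers that use the service command without systemd need to be checked by hand.

```shell
#!/bin/sh
# Hypothetical helper: read service unit names on stdin (one per line) and
# print the first database service found, preferring mariadb, then mysqld,
# then mysql. Returns non-zero if none of them is present.
pick_db_service() {
    units=$(cat)
    for name in mariadb mysqld mysql; do
        # -x: whole-line match, -F: fixed string, -q: exit status only
        if printf '%s\n' "$units" | grep -qxF "${name}.service"; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# Usage on a systemd server:
#   systemctl list-unit-files --type=service --no-legend \
#       | awk '{print $1}' | pick_db_service
```

The printed name is the one to use in the matching systemctl stop and systemctl start commands.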
After repairing the database, run a manual backup again to confirm that the issue is solved.
Then remove the directories left behind by failed backups by executing the following command:
find /store/backups/nagiosxi/ -mindepth 1 -maxdepth 1 -type d -exec rm -rf '{}' \;
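Because rm -rf cannot be undone, you may want to preview what the command above would delete before running it. The sketch below uses the same find filters but only prints the matches; the function name list_backup_dirs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the top-level directories under a backup
# location without deleting anything. It uses the same filters as the
# removal command, so finished .tar.gz archives (regular files) are not listed.
list_backup_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -print
}

# Usage:
#   list_backup_dirs /store/backups/nagiosxi/
```

If the list contains only leftover directories from failed backups, the removal command can be run as shown.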
If you want to see exactly what happens while the scheduled backups run, you can watch the output live by executing the following command:
tail -f /usr/local/nagiosxi/var/cmdsubsys.log
Press CTRL + C when you have finished.
This article showed how to fix Nagios scheduled backup failures that occur when the database is corrupted.