Backup and Restore Nagios Log Server

Are you trying to back up and restore Nagios Log Server?

This guide is for you.

Sometimes we lose an instance and, with it, our data. In such cases, the backup and restore features of Nagios Log Server help us keep our data, because copies are held on other instances.

Here at Ibmi Media, as part of our Server Management Services, we regularly help our customers with Nagios-related queries.

In this context, we shall look into the steps to back up and restore Nagios Log Server.

Backup and Restore Nagios Log Server

Log Server is a powerful log monitoring and management application. It allows organizations to quickly view, sort, and configure logs from any source, on any given network.

Before we back up and restore a Nagios Log Server cluster, we need to understand how its backup mechanisms work. Let us look at them in detail now.

Backup Nagios Log Server

Since Nagios Log Server is a cluster-oriented application, its backups are slightly different from regular backup methods.

The backup methodology in Nagios Log Server ensures that backup data is held on all instances in the Nagios Log Server cluster. Thus, in the event that we lose an instance, we won't lose any data, as it is held on other instances.

Nagios Log Server has several backup methods. Let us see a few methods to do this.

Snapshots

Snapshots are point-in-time backups of the log data that exists in the Elasticsearch database. A snapshot is performed on the entire cluster. During the snapshot and maintenance job, one node runs the commands to create a new snapshot.

These snapshots are stored in a Snapshot Repository, accessible by all instances in the Nagios Log Server cluster.

Since the snapshot is of indexes that have shards allocated to different instances, we need an NFS or CIFS share so that those instances can store their data in the snapshot.

If we have a single instance Nagios Log Server cluster, it is important that the snapshot repository is on another server. This avoids data loss if the instance were to be completely destroyed.

Config Snapshots

Config snapshots are created automatically whenever we apply the configuration. It is also possible to create config snapshots manually. They are backups of the Inputs, Filters, and Outputs for Logstash.

They are stored in the /usr/local/nagioslogserver/snapshots/ location on every instance in the cluster. Because each instance has its own copy of the config snapshots in this location, the others still have a copy if one instance goes down.

Config snapshots allow us to roll back to a point in time when we do not like changes made previously. Manual snapshots will remain until we choose to delete them.

Config snapshots can be found by navigating to Configure > Configure > Config Snapshots.

Manual and auto-created snapshots allow us to:

i. Download the .tar.gz file to the computer

ii. Restore the snapshot to all instances in the cluster

System Backups

System Backups contain the configuration settings, dashboards, users, internal logs, and alerts. They also include the Inputs, Filters, and Outputs for Logstash.

The location of these backups is /store/backups/nagioslogserver/ and the file name is based on the current date and epoch value.

For example, nagioslogserver.2017-05-09.1494303122.tar.gz.
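As a minimal sketch, the date and epoch value can be pulled out of a backup file name with plain shell parameter expansion. The variable names below are our own and are only illustrative.

```shell
# Example file name taken from the text above.
backup_name="nagioslogserver.2017-05-09.1494303122.tar.gz"

stem="${backup_name%.tar.gz}"   # strip the .tar.gz suffix
epoch="${stem##*.}"             # text after the last dot: the epoch value
prefix="${stem%.*}"             # drop the epoch part again
backup_date="${prefix#*.}"      # text after the first dot: the date

echo "date=$backup_date epoch=$epoch"
```

On systems with GNU date, running date -u -d "@$epoch" should turn the epoch value into a readable timestamp.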

These backups are on every instance in the log server cluster. Whenever we schedule a backup job to run, each instance will create a local copy of the backup.

Thus, if we were to lose an instance in the cluster, another instance will have a copy of this backup. However, this does not protect us if all of the instances were to be lost in a disaster.

We recommend periodically copying the system backup to an external location to ensure we can restore the Nagios Log Server.
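One possible way to do this is a small script that picks the newest system backup and copies it off the instance. The sketch below assumes the file naming described above, so that sorting by name also sorts by time. The helper name latest_backup and the scp destination are hypothetical and need to be adapted.

```shell
# latest_backup DIR: print the newest system backup in DIR (empty if none).
# Relies on the date and epoch in the file name sorting chronologically.
latest_backup() {
    ls -1 "$1"/nagioslogserver.*.tar.gz 2>/dev/null | sort | tail -n 1
}

BACKUP_DIR="/store/backups/nagioslogserver"
newest="$(latest_backup "$BACKUP_DIR")"

if [ -n "$newest" ]; then
    echo "Newest backup: $newest"
    # Hypothetical destination; replace with a real external host and path:
    # scp "$newest" backupuser@backup.example.com:/srv/nls-backups/
else
    echo "No system backups found in $BACKUP_DIR"
fi
```

Run from cron on one instance, a script like this would keep an external copy reasonably current.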

System backups are configured to run once a day as a system job. By default, the time they run is based on when we installed the first instance in the Nagios Log Server cluster.

We can find the backups system job at Admin > System > Command Subsystem. From here we can change the frequency of the backup and also initiate one.

There is no location in the Nagios Log Server GUI to view the system backups. To ensure the backups exist, we need to establish a terminal session to a Nagios Log Server instance and check the directory.
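A quick check from the terminal might look like the sketch below. It only tests whether a backup file modified within the last day exists in the backup directory. The helper name backup_is_recent is our own.

```shell
# backup_is_recent DIR: succeed if DIR holds a system backup modified
# less than one day ago.
backup_is_recent() {
    [ -n "$(find "$1" -name 'nagioslogserver.*.tar.gz' -mtime -1 2>/dev/null)" ]
}

BACKUP_DIR="/store/backups/nagioslogserver"
if backup_is_recent "$BACKUP_DIR"; then
    echo "A system backup from the last 24 hours exists in $BACKUP_DIR"
else
    echo "No recent system backup found in $BACKUP_DIR"
fi
```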

What Happens In A Disaster?

Multiple Instance Cluster – Losing One Instance

When we have multiple instances in the Nagios Log Server cluster and we lose one of those instances, generally the impact is minimal.

The cluster will continue to function, as the Elasticsearch data is spread across the instances.

Any log data that is being sent to the down instance will not be received.

We can attempt to repair the problem causing the instance to fail in order to return the cluster back to a healthy state. Once the instance re-connects to the cluster Elasticsearch will automatically update with the rest of the data in the cluster.
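One way to watch the cluster return to a healthy state is to ask Elasticsearch for its cluster health. The sketch below assumes Elasticsearch listens on localhost:9200, as in the commands later in this guide. The helper name cluster_status is our own.

```shell
# cluster_status: read cluster health JSON on stdin and print the value of
# its "status" field (green, yellow or red).
cluster_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

ES_URL="localhost:9200"
status="$(curl -s "$ES_URL/_cluster/health?pretty" 2>/dev/null | cluster_status || true)"

# Prints "unknown" if Elasticsearch cannot be reached from this host.
echo "Cluster status: ${status:-unknown}"
```

A green status generally means all shards are allocated again, while yellow or red suggests the cluster is still recovering.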

If we expect the instance to be down for some time we should reconfigure the devices to send data to another instance.

If updating every device to send data to another instance is too time-consuming, we can run up a fresh install of Nagios Log Server that uses the existing IP address of the old instance and add it to the existing cluster.

It is worth mentioning that any one instance in the cluster is no more important than another.

Multiple Instance Cluster – Losing Multiple Instances

The effect is fundamentally the same as losing one instance. This has to do with how the Elasticsearch data is spread across the instances.

To recover, follow the steps outlined in the previous section.

Single Instance Cluster OR Multiple Instance Cluster – Losing ALL Instances

This scenario is more likely to occur when we have a single instance cluster.

However, the recovery steps are the same for a multi-instance cluster.

i. Run up a fresh install of Nagios Log Server on a new instance

ii. Then restore the System Backup

iii. Mount the Snapshot Repository

iv. Finally, restore the indexes to recover the log data

In a multi-instance scenario, once we have the first instance running then it is a simple matter of running up additional instances and adding them to the cluster.

Single Instance OR Multiple Instance Cluster – Losing ALL Instances and NO SYSTEM BACKUP

This is the worst-case scenario, and it is more likely to occur when we have a single instance cluster.

It requires additional manual steps to recover if we don't have a copy of the System Backup. All of the data in the System Backup is also included in the snapshots. So, as long as we have the snapshot repository available, we will be able to recover.

i. Initially run a fresh install of Nagios Log Server on a new instance and install it as a new cluster

ii. Then mount the Snapshot Repository

iii. Subsequently, restore the recent snapshots of kibana-int, nagioslogserver, and nagioslogserver_log

iv. Then restore the indexes to recover the log data

v. Finally, reset the backend jobs

In a multi-instance scenario, once we have the first instance running, it is just a matter of running up additional instances and adding them to the cluster.

Restore Nagios Log Server

Do not follow these steps if we have lost an instance in a multi-instance cluster and need to repair it.

However, we recommend these steps if we lose all instances in the cluster (single instance or multiple instances).

This method is also a good way to test restoring a system backup. We do not need to perform the test in an isolated network, as the restored cluster has a different ID and won't affect the production cluster.

1. Fresh Install Of Nagios Log Server

The first step is to run up a fresh install of Nagios Log Server on the existing hardware of an instance that died.

However, it is recommended to perform a clean install of the RHEL or CentOS operating system.

There are two methods for installing Nagios Log Server, quick and manual. Both perform a full installation.

Quick installation:

For a quick install, we run:

# curl https://assets.nagios.com/downloads/nagios-log-server/install.sh | sh

This command will download and install Nagios Log Server. Then, proceed to the Finalize Installation section.

Manual Download:

Alternatively, we can install Nagios Log Server by issuing the following commands in the terminal session:

# cd /tmp
# wget https://assets.nagios.com/downloads/nagios-log-server/nagioslogserver-latest.tar.gz
# tar xzf nagioslogserver-latest.tar.gz
# cd nagioslogserver
# ./fullinstall

To install a specific version of Nagios Log Server, we visit https://assets.nagios.com/downloads/nagios-log-server/versions.php to obtain the URL and use that in the wget command above.

Finalize Installation

Once the installation is complete we will receive the following message:

Nagios Log Server Installation Success!

You can finish the final setup steps for Nagios Log Server by visiting:

http://<server_ip_address>/nagioslogserver

i. Navigate to the user interface by using the URL provided in the terminal session.

ii. In the Final Installation Steps screen, select Install if this is the first server in the Nagios Log Server cluster, or select Connect if we want to add this server to an existing Nagios Log Server cluster. This will direct us to a page of fields to populate before proceeding.

iii. If we have already purchased Nagios Log Server, we can add the license key here.

iv. Under Admin Account Setup, populate the fields to continue.

v. Once ready, click Finish Installation to save these settings.

Please wait while the settings are applied to the server. Once complete, we will see the Login screen with the status Installation Complete.

Here, we type the username and password to log in to Nagios Log Server and then click the login button to begin.

We will then be logged in to Nagios Log Server and placed at the home screen.

When restoring from a system backup, however, DO NOT navigate to the user interface to complete the final installation steps once the install is complete. Leave the terminal open and continue with the steps below.

Restore The System Backup

Before the final installation step, we need to transfer the system backup to the /store/backups/nagioslogserver/ directory on this instance.

We can use a program like WinSCP to do the transfer or use SCP.
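After the transfer, it can be worth confirming that the archive arrived intact before restoring it. The sketch below lists the archive contents with tar and treats any error as a failed transfer. The helper name verify_backup and the source host in the commented scp line are hypothetical.

```shell
# Hypothetical transfer from an external backup host:
# scp backupuser@backup.example.com:/srv/nls-backups/nagioslogserver.2017-05-10.1494373596.tar.gz /store/backups/nagioslogserver/

# verify_backup FILE: succeed only if FILE exists and can be read as a
# gzip-compressed tar archive.
verify_backup() {
    [ -f "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}

backup="/store/backups/nagioslogserver/nagioslogserver.2017-05-10.1494373596.tar.gz"
if verify_backup "$backup"; then
    echo "Archive looks intact: $backup"
else
    echo "Archive missing or unreadable: $backup"
fi
```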

In order to restore the system backup, we execute:

# cd /usr/local/nagioslogserver/scripts/
# ./restore_backup.sh /store/backups/nagioslogserver/nagioslogserver.2017-05-10.1494373596.tar.gz

The backup file in this example is nagioslogserver.2017-05-10.1494373596.tar.gz. We need to change this to match the name of our own system backup.

We know the restore was a success when we receive the "Restore Complete!" message.

At this point, we should open the web GUI to this instance and log in to check that it is OK. Dashboards, inputs, filters, users, and other settings should exist. However, there will not be any log data available to query against.

Mount The Snapshot Repository

Now we mount the snapshot repository that contains the existing snapshots.

For example, here, we will mount it to /snapshot_repository.
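How the repository is mounted depends on the storage in use. As a rough sketch for an NFS share, the commented mount command below uses a hypothetical server and export path, and the small check afterwards confirms the mount point exists and is writable for the current user. The helper name repo_path_ok is our own, and the check is best run as the user Elasticsearch runs as.

```shell
# Hypothetical NFS export; substitute the real server and export path:
# mount -t nfs nfs.example.com:/exports/nls_snapshots /snapshot_repository

# repo_path_ok DIR: succeed if DIR exists and the current user can write to it.
repo_path_ok() {
    [ -d "$1" ] && [ -w "$1" ]
}

if repo_path_ok /snapshot_repository; then
    echo "/snapshot_repository is present and writable"
else
    echo "/snapshot_repository is missing or not writable"
fi
```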

i. Once the repository is mounted, open the web GUI and navigate to Admin > System > Snapshots & Maintenance.

ii. Then click the Create Repository button and populate the fields for the new repository

iii. The Location field will be /snapshot_repository in this example

iv. Click the Add Repository button to create the repository

Now that we have the repository, the Snapshots list will populate with the existing snapshots that can be restored.

Restore Indexes

To restore the existing log data we need to restore the indexes from the snapshot repository.

i. Click the Restore icon to restore the required snapshot

ii. Then select the indexes that we want to restore and then click the Restore Indexes button

iii. The restore process will run in the background

iv. To confirm, navigate to Admin > System > Index Status

This completes the process of restoring Nagios Log Server from a system backup. At this point, we can add more instances to the cluster if necessary.

Restoring WITHOUT A System Backup

Our Support Techs suggest following these steps if we lose all instances in the cluster, AND we do not have a copy of the system backup.

We should only follow these steps in a worst-case scenario or for a test.

Fresh Install Of Nagios Log Server

The first step is to run up a fresh install of Nagios Log Server. This can be on the existing hardware of an instance that died.

However, it is recommended that we perform a clean install of the RHEL or CentOS operating system. Once the install is complete navigate to the user interface and complete the final installation steps.

We need to have a functioning Nagios Log Server cluster for this method to work. Leave the terminal open.

Mount The Snapshot Repository

Now mount the snapshot repository that contains the existing snapshots.

For example, here we will mount to /snapshot_repository.

i. Once done, open the web GUI and navigate to Admin > System > Snapshots & Maintenance.

ii. Click the Create Repository button and populate the fields for the new repository

iii. The Location field will be /snapshot_repository in this example

iv. Then click the Add Repository button to create the repository

Now that the repository has been created, the Snapshots list will be populated with the existing snapshots that can be restored. DO NOT attempt to restore any indexes at this point.

Restore kibana-int, nagioslogserver and nagioslogserver_log

These steps will restore everything that would have been restored normally using a system backup.

We need to open the terminal to perform these steps.

i. First, we need the name of the snapshot repository by executing the command:

# curl -XGET "localhost:9200/_snapshot?pretty"

Our output will be similar to:

{
  "snapshot_repository" : {
    "type" : "fs",
    "settings" : {
      "compress" : "true",
      "location" : "/snapshot_repository"
    }
  }
}

ii. We require the name snapshot_repository for the next command. This command will show all available snapshots taken in the past 5 days:

# curator show snapshots --repository snapshot_repository --newer-than 5 --time-unit days

For example, our output will be similar to:

2017-05-10 16:47:21,198 INFO Job starting: show snapshots
2017-05-10 16:47:21,219 INFO Matching snapshots:
curator-20170509213607
curator-20170508714688
curator-20170507212685
curator-20170506318623
curator-20170505263402

Our snapshot target in this example is curator-20170509213607. Now we need to confirm that this snapshot contains the required indexes.
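Picking the most recent snapshot can also be scripted. The sketch below assumes snapshot names follow the curator-<timestamp> pattern shown above, so that sorting them as text puts the newest last. The helper name newest_snapshot is our own, and the input is the sample output from this guide.

```shell
# newest_snapshot: read `curator show snapshots` output on stdin and print
# the snapshot name that sorts last.
newest_snapshot() {
    grep '^curator-' | sort | tail -n 1
}

# Sample output from above, fed in through a here-document.
newest_snapshot <<'EOF'
2017-05-10 16:47:21,198 INFO Job starting: show snapshots
2017-05-10 16:47:21,219 INFO Matching snapshots:
curator-20170509213607
curator-20170508714688
curator-20170507212685
curator-20170506318623
curator-20170505263402
EOF
```

Against a live cluster, the output of the curator command above could be piped into newest_snapshot in the same way.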

iii. The following command requires the name of the repository snapshot_repository and the snapshot curator-20170509213607:

# curl -XGET 'localhost:9200/_snapshot/snapshot_repository/curator-20170509213607?pretty'

The output will be like this:

{
  "snapshots" : [ {
    "snapshot" : "curator-20170509213607",
    "version_id" : 1070699,
    "version" : "1.7.6",
    "indices" : [ "kibana-int", "logstash-2017.05.09", "nagioslogserver", "nagioslogserver_log" ],
    "state" : "SUCCESS",
    "start_time" : "2017-05-09T21:36:07.261Z",
    "start_time_in_millis" : 1494365767261,
    "end_time" : "2017-05-09T21:36:07.976Z",
    "end_time_in_millis" : 1494365767976,
    "duration_in_millis" : 715,
    "failures" : [ ],
    "shards" : {
      "total" : 16,
      "failed" : 0,
      "successful" : 16
    }
  } ]
}

iv. Specifically, this line is what we are after:

"indices" : [ "kibana-int", "logstash-2017.05.09", "nagioslogserver", "nagioslogserver_log" ],

It tells us that it has recent snapshots of the kibana-int, nagioslogserver, and nagioslogserver_log indexes.

Now that we have confirmed the repository contains the indexes required we can now restore them.
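The same confirmation can be sketched as a small check instead of reading the output by eye. It assumes the pretty-printed response keeps the whole "indices" array on one line, as in the output above. The helper name snapshot_has_indexes is our own.

```shell
# snapshot_has_indexes INDEX...: read snapshot JSON on stdin and succeed only
# if every named index appears, in quotes, on the "indices" line.
snapshot_has_indexes() {
    indices_line="$(grep '"indices"')"
    for wanted in "$@"; do
        printf '%s\n' "$indices_line" | grep -qF "\"$wanted\"" || return 1
    done
}

if curl -s 'localhost:9200/_snapshot/snapshot_repository/curator-20170509213607?pretty' 2>/dev/null |
    snapshot_has_indexes kibana-int nagioslogserver nagioslogserver_log; then
    echo "Snapshot contains all three required indexes"
else
    echo "Snapshot is missing a required index, or Elasticsearch is unreachable"
fi
```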

v. Each index requires two commands to perform a restore. While the first command closes the index, the second command restores the index.

This command closes the kibana-int index:

# curl -XPOST 'localhost:9200/kibana-int/_close?pretty'

Which should output:

{
  "acknowledged" : true
}

This command restores the kibana-int index:

# curl -XPOST 'localhost:9200/_snapshot/snapshot_repository/curator-20170509213607/_restore?pretty' -d '{ "indices":"kibana-int"}'

Which should result in:

{
  "accepted" : true
}

vi. The following commands will restore the nagioslogserver and nagioslogserver_log indexes:

# curl -XPOST 'localhost:9200/nagioslogserver/_close?pretty'
# curl -XPOST 'localhost:9200/_snapshot/snapshot_repository/curator-20170509213607/_restore?pretty' -d '{ "indices":"nagioslogserver"}'
# curl -XPOST 'localhost:9200/nagioslogserver_log/_close?pretty'
# curl -XPOST 'localhost:9200/_snapshot/snapshot_repository/curator-20170509213607/_restore?pretty' -d '{ "indices":"nagioslogserver_log"}'
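The close and restore pairs for the three indexes can also be wrapped in a small loop. This is only a sketch: restore_index and the DRY_RUN switch are our own illustrative names. By default it just prints the commands, and it runs them only when DRY_RUN is set to 0.

```shell
ES_URL="localhost:9200"
REPO="snapshot_repository"
SNAPSHOT="curator-20170509213607"

# restore_index NAME: close the index NAME, then restore it from the snapshot.
# With DRY_RUN unset or 1 (the default here) the commands are only printed.
restore_index() {
    close_url="$ES_URL/$1/_close?pretty"
    restore_url="$ES_URL/_snapshot/$REPO/$SNAPSHOT/_restore?pretty"
    body="{ \"indices\": \"$1\" }"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "curl -XPOST '$close_url'"
        echo "curl -XPOST '$restore_url' -d '$body'"
    else
        curl -XPOST "$close_url"
        curl -XPOST "$restore_url" -d "$body"
    fi
}

for idx in kibana-int nagioslogserver nagioslogserver_log; do
    restore_index "$idx"
done
```

Printing the commands first gives us a chance to check the repository and snapshot names before anything is closed or restored.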

Once these commands execute, we should refresh the Nagios Log Server interface. Navigate around to confirm the restoration of dashboards, inputs, filters, users, and other settings is a success.

Restore Indexes

To restore the existing log data, we need to restore the indexes from the snapshot repository.

i. Navigate to Admin > System > Snapshots & Maintenance

ii. Then click the Restore icon to restore the required snapshot

iii. Subsequently, select the indexes to restore and then click the Restore Indexes button

iv. The restore process will run in the background

v. Finally, to confirm the restoration, navigate to Admin > System > Index Status.

This completes the process of restoring Nagios Log Server without a system backup. At this point, we can add more instances to the cluster if required.