Having a Continuous Integration and Continuous Delivery solution is a good thing. It is even a prerequisite for an Agile team. But we must also make sure that it stays operational and that people are alerted when problems occur. This is precisely the role that Nagios fulfills.


We can classify monitoring tools into two categories: black-box tools (like Nagios), which simply check that services respond to pings or that there is enough disk space, and white-box tools (like Prometheus), which go further by verifying that the services work properly.

Nagios is a server health monitoring solution first released in 1999. It has a large community and a large number of plugins and extensions. Its name is a recursive acronym: "Nagios Ain't Gonna Insist On Sainthood".

Nagios lets you monitor hosts and services and reports problems (for example, an unreachable host or a full disk on a host).

There are two types of checks:

  • active checks: initiated by Nagios, which runs them at regular intervals. They are typically used to monitor HTTP services, SSH or databases;
  • passive checks: initiated by a system other than Nagios, which sends the result of the check to Nagios as a notification. They are typically used to monitor asynchronous services or services behind a firewall.
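
To make the passive case more concrete, here is a minimal Python sketch of how an external system could format a result for Nagios. It assumes that external commands are enabled and uses the PROCESS_SERVICE_CHECK_RESULT external command; the function names, the host and service names, and the path of the command file are hypothetical and depend on your installation (see the command_file setting in nagios.cfg).

```python
import time

# Nagios return codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def passive_result_line(host, service, return_code, output, timestamp=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    if timestamp is None:
        timestamp = int(time.time())
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


def submit(line, command_file):
    """Write the line to the Nagios external command file (a named pipe).

    The path is an assumption: pass the value of command_file from nagios.cfg.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

For example, passive_result_line("my_client", "Nightly backup", OK, "Backup finished") builds the line that a backup job could pass to submit() once it has finished.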

For example, Nagios can ping a web server every 10 minutes and trigger an action, such as sending an email to a predefined contact group, if the server does not respond within the time limit.

The configuration of Nagios is done via configuration files.

In this article, we will see how to set up Nagios with Docker. For that, we will use the jasonrivers/nagios image. We will follow this scenario:

  • A container runs an instance of Nagios.
  • Another container runs an instance of the Nginx Web server.
  • When the Nginx container is stopped, the Nagios instance must issue an alert.

To deploy these Docker containers, we will use Docker Compose, with a docker-compose.yml file like this:

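version: '3'
services:
  nagios:
    image: jasonrivers/nagios
    ports:
      - "8081:80"   # Nagios web interface on http://localhost:8081
    environment:
      - NAGIOSADMIN_USER=nagiosadmin
      - NAGIOSADMIN_PASS=nagios
    volumes:
      - ./etc/:/opt/nagios/etc/   # Nagios configuration files
  nginx:
    image: nginx
    ports:
      - "8082:80"   # Nginx on http://localhost:8082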

This file defines two services: nagios, which is exposed on port 8081, and nginx, which is exposed on port 8082. It also defines the Nagios username and password (nagiosadmin / nagios) and maps the /opt/nagios/etc directory of the Nagios container to the local ./etc subdirectory. This is the directory where we store the Nagios configuration files.

We are now ready to launch the nagios and nginx services:

docker-compose up

You can verify that both services are running with a web browser by going to:

  • http://localhost:8081 to verify that the Nagios instance has started successfully,
  • http://localhost:8082 to verify that the Nginx Web Server instance has started successfully.
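
The same verification can be scripted instead of done by hand in a browser. The following Python sketch, which uses only the standard library, reports whether a URL answers at all; the function name is_up is hypothetical, and an HTTP error status (such as the 401 returned by a page that asks for a login) is deliberately treated as "up", since the server did answer.

```python
import http.client
import urllib.error
import urllib.request


def is_up(url, timeout=5.0):
    """Return True if the URL answers with an HTTP response, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status such as 401.
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, malformed response...
        return False
```

Calling is_up("http://localhost:8081") and is_up("http://localhost:8082") after docker-compose up should then tell you whether both containers respond.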

The main Nagios configuration file is nagios.cfg, located in the ./etc subdirectory. It is good practice to split the object definitions into separate files, for example: hostgroups.cfg, hosts.cfg, and services.cfg.

To do so, you must declare them in the nagios.cfg file. In the "OBJECT CONFIGURATION FILE(S)" section, add these lines:

cfg_file=/opt/nagios/etc/objects/hostgroups.cfg
cfg_file=/opt/nagios/etc/objects/hosts.cfg
cfg_file=/opt/nagios/etc/objects/services.cfg

The hostgroups.cfg file is used to define host groups. It's very convenient when you manage a server farm. Here is an example of content:

define hostgroup{
  hostgroup_name      test-group
  alias               Test Servers
  members             my_client
}

The hosts.cfg file is used to define the hosts. Here is an example of the contents of the hosts.cfg file:

define host{
  use                 generic-host
  host_name           my_client
  address             IP_ADDRESS
  contact_groups      admins
  max_check_attempts  1
  notes               Test my_client
}

Replace "IP_ADDRESS" with the IP address of the host to monitor (for example 192.168.204.31).

Finally, we define the services to be tested in the services.cfg file:

define service{
  use                 generic-service
  hostgroup_name      test-group
  service_description Ping
  check_command       check_ping!200.0,20%!600.0,60%
}

define service{
  use                 generic-service
  host_name           my_client
  service_description Nginx Web server
  check_command       check_http_port!8082
}

define command{
  command_name        check_http_port
  command_line        $USER1$/check_http -I $HOSTADDRESS$ -p $ARG1$
}

What are we doing here? We define two services:

  • a first "Ping" service, which pings the servers of the test-group host group,
  • a second "Nginx Web server" service, which tests the connection to the HTTP server on port 8082.
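
The two arguments passed to check_ping in the first service are the warning and critical thresholds, each written as "round-trip time in milliseconds, packet loss in percent". As a rough illustration, the Python sketch below shows how such thresholds translate into a state. The function names are hypothetical, and the comparison is assumed to be inclusive (a value equal to the threshold triggers the state); this is a simplified model, not the plugin's actual code.

```python
def parse_threshold(spec):
    """Split '200.0,20%' into (round-trip time in ms, packet loss in percent)."""
    rta, loss = spec.split(",")
    return float(rta), float(loss.rstrip("%"))


def ping_state(rta_ms, loss_pct, warning="200.0,20%", critical="600.0,60%"):
    """Classify a ping measurement against warning and critical thresholds."""
    warn_rta, warn_loss = parse_threshold(warning)
    crit_rta, crit_loss = parse_threshold(critical)
    # Critical is tested first so that it takes precedence over warning.
    if rta_ms >= crit_rta or loss_pct >= crit_loss:
        return "CRITICAL"
    if rta_ms >= warn_rta or loss_pct >= warn_loss:
        return "WARNING"
    return "OK"
```

With the thresholds used above, a 35 ms round trip without loss is OK, a 250 ms round trip is a warning, and 60% packet loss is critical.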

We use the plugin "check_http". The basic syntax is:

check_command check_http

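However, this plugin tests port 80 by default. Since our server is exposed on port 8082, we have to change the way we call the plugin and pass it the port explicitly. This is the purpose of the check_http_port command defined above: the value after the "!" in check_command (8082) replaces the $ARG1$ macro in command_line.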
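
To illustrate how the arguments separated by "!" in check_command end up in the command line, here is a small Python sketch of the macro substitution. It is a simplified model, not the actual Nagios implementation; the function name expand_command and the plugin directory used for $USER1$ in the test values are assumptions.

```python
def expand_command(check_command, command_line, macros):
    """Expand a 'name!arg1!arg2' check_command into a full command line.

    macros maps macro names such as '$HOSTADDRESS$' to their values.
    """
    name, *args = check_command.split("!")
    values = dict(macros)
    # The first argument becomes $ARG1$, the second $ARG2$, and so on.
    for index, arg in enumerate(args, start=1):
        values[f"$ARG{index}$"] = arg
    for macro, value in values.items():
        command_line = command_line.replace(macro, value)
    return name, command_line
```

With check_http_port!8082 and a host address of 192.168.204.31, the resulting command line ends with "check_http -I 192.168.204.31 -p 8082".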

If you open the Nagios web interface, after a short while you will see that the service checks succeed and that their status turns green.

Now, let's stop the Nginx service with this command:

docker-compose stop nginx

Back in Nagios, after a short delay and a page refresh, you will see that the service check fails and that its status turns red.

Now, it would be nice to be alerted by email. To do this, you must add a contact or simply modify the predefined one. Open the contacts.cfg file and set the recipient's address on the line starting with "email":

define contact{
  contact_name        nagiosadmin
  use                 generic-contact
  alias               Nagios Admin
  email               XXXXXX
}

Here are some examples of plugins to monitor services.

First, there are plugins to verify that various kinds of servers are working properly. Essentially, they check that the server returns the expected header when a connection is opened:

  • check_http to monitor an HTTP server,
  • check_ftp to monitor an FTP server,
  • check_ssh to monitor an SSH server,
  • check_smtp to monitor an SMTP server,
  • check_pop to monitor a POP3 server,
  • check_imap to monitor an IMAP server.

But there are also other plugins that can monitor servers:

  • check_local_disk to check the available disk space: for example, you can raise a warning notification if less than 20% of the space is free and a critical notification if less than 10% is free;
  • check_local_users to check the number of users currently logged in;
  • check_local_procs to check the number of processes running on a server: for example, you can raise a warning notification if there are more than 200 processes and a critical notification if there are more than 400;
  • check_mem to check the available memory;
  • check_local_load to check the server load;
  • check_local_swap to check the use of the swap space;
  • check_raid to check the RAID operation of the hardware;
  • check_ddos to detect DDOS attacks.
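
All these plugins follow the same convention: they print one line of status text and return a code (0 for OK, 1 for WARNING, 2 for CRITICAL). As an illustration of that convention, here is a minimal Python sketch of a disk space check using the 20% and 10% thresholds mentioned above. It is not the code of check_local_disk; the function names and the message format are hypothetical.

```python
import shutil

# Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def disk_state(free_pct, warn_pct=20.0, crit_pct=10.0):
    """Map a percentage of free space to a Nagios state."""
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def check_disk(path="/", warn_pct=20.0, crit_pct=10.0):
    """Return (state, message) for the file system containing path."""
    usage = shutil.disk_usage(path)
    free_pct = 100.0 * usage.free / usage.total
    state = disk_state(free_pct, warn_pct, crit_pct)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[state]
    return state, f"DISK {label} - {free_pct:.1f}% free on {path}"
```

A real plugin built on this sketch would print the message and exit with the state as its exit code, which is how Nagios learns the result of an active check.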

Nagios is one service monitoring solution among others. It has the advantage of being mature and of having a large number of plugins. With a solution like Nagios, supervising hosts and their services becomes a much simpler exercise. But there are other, more modern monitoring tools, which we will look at later.

 



 

Bruno Delb

Agile Coach and DevOps practitioner with experience in the medical device software domain. Management 3.0, Agile games and development (especially on mobile) are my passions.
