Linux Network Monitoring

Network Monitoring

  • Statistics gathering and presentation
    • Usually continuous recording of system stats, but periodic update of the presentation
  • Real-time status
    • e.g. service availability or current CPU load
  • Alerts on critical conditions
    • e.g. email and pager when a disk is almost full
  • Log analysis
    • Usually after the fact (periodic reports), but real-time analyzers do exist

Slide Deck and Source Code Archives

Real-time Status: Applets

[Screenshot: various dockapps showing system status]

As you can see, it is possible to monitor a select number of machines using dockapps. Basically you run the dockapp or gnome/kde applet on the machine you want to monitor, but display it on the 'X' display of the server.

By using ssh, you can initiate the applet/dockapp from the server. I use a command line such as the following.

ssh -aCXY remotehost "asmon"

Note that, at least on Debian, the package xbase-clients must be installed on the machine running the applet in order to run X applets.

A Few Log Tools

  • logwatch: Watches for regexps and creates a daily summary of the syslog for those regexps
  • logcheck: Reports everything except what is specifically excluded (though the exclude lists are fairly comprehensive), on an hourly basis (depending on the amount of logged information).
  • ccze: Fairly comprehensive log colourizer
  • lwatch: Small log colourizer

Munin

What is Munin

  • Client which records statistics on
    • CPU usage & load
    • memory
    • network traffic
    • disk usage
    • I/O throughput
    • and more…
  • Server which periodically retrieves statistics over network (or localhost) and graphs them
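
What the server retrieves from a client is plain text. As a rough sketch, assuming the line-oriented munin-node protocol in which the reply to a fetch command is a series of "field.value number" lines ended by a line holding a single dot, a reply could be parsed as follows. The sample reply and the function name parse_fetch are illustrative only and are not taken from these slides.

```python
def parse_fetch(reply):
    """Parse a munin-node style 'fetch' reply into {field: float}."""
    values = {}
    for line in reply.splitlines():
        line = line.strip()
        if line == ".":          # a lone dot ends the reply
            break
        if not line or line.startswith("#"):
            continue
        key, _, raw = line.partition(" ")
        if key.endswith(".value"):
            field = key[: -len(".value")]
            try:
                values[field] = float(raw)
            except ValueError:
                pass             # e.g. 'U', used for an unknown value
    return values


# Made-up sample reply, as a CPU plugin might produce it
sample = "user.value 1200\nsystem.value 340\nidle.value U\n.\n"
print(parse_fetch(sample))
```

The server stores values like these on every run and regenerates the graphs from the stored history.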

Munin Interface

  • Configuration is performed by editing text files in /etc/munin

  • Statistic graphs are presented as static HTML pages which are automatically regenerated periodically (by default, every five minutes)

  • To see what munin’s graphs look like, you may wish to view a snapshot of munin on my network

Installing Munin: Client

  • On a Debian system, execute the command:

    apt-get install munin-node

  • On a RedHat Enterprise Linux system:

    Get munin-node-1.2.4-5rhel3.noarch.rpm, e.g. from the Munin Sourceforge Project and execute the command:

    rpm -ivh munin-node-1.2.4-5rhel3.noarch.rpm

Configuring Munin: Client

  1. Edit the file /etc/munin/munin-node.conf

  2. Add an ‘allow’ line with the IP address of the server. It must be a Perl regular expression because the Net::Server Perl module, which munin depends on, doesn’t understand CIDR-style network notation. You must include an allow line for every munin server allowed to connect to this client.

     allow ^127\.0\.0\.1$
     allow ^192\.168\.8\.204$
    
  3. Start the munin-node service (on Debian, /etc/init.d/munin-node start)

  4. You may wish to view an example munin-node.conf.
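
The effect of the anchors and escaped dots in the allow lines can be checked with a short sketch. Python's re module is used here as a stand-in for Perl's regular expressions (the two behave the same for patterns this simple). The helper name is_allowed and the extra test addresses are made up for illustration.

```python
import re

# The two patterns from the munin-node.conf example above
ALLOW_PATTERNS = [r"^127\.0\.0\.1$", r"^192\.168\.8\.204$"]


def is_allowed(ip, patterns=ALLOW_PATTERNS):
    """Return True if the peer address matches any allow pattern."""
    return any(re.search(p, ip) for p in patterns)


print(is_allowed("192.168.8.204"))    # the configured server
print(is_allowed("192.168.8.2040"))   # not the configured server
# Without anchors and escapes, "." matches any character and the
# pattern can match inside a longer string:
print(bool(re.search(r"192.168.8.204", "192.168.8.2040")))
```

Leaving out the ^ and $ or the backslashes therefore tends to allow more addresses than intended.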

Installing Munin: Server

  • On a Debian system, execute the command:

    apt-get install munin

  • On a RedHat Enterprise Linux system: Get munin-1.2.4-5rhel3.noarch.rpm, e.g. from the Munin Sourceforge Project and execute the command:

    rpm -ivh munin-1.2.4-5rhel3.noarch.rpm

Configuring Munin: Server

  1. Edit the file /etc/munin/munin.conf, adding records for each client

  2. Records are of the form:

     [descriptive_name]
     address clientname.your.domain
     use_node_name yes
    
    • ‘descriptive_name’ is the name which will appear on the pages for the client
    • ‘clientname.your.domain’ is the IP or domain name of the client
    • ‘use_node_name yes’ tells the server to ask the client for plugins under the node name the client itself reports, so that all the available plugins (each type of statistic has a plugin) are used for the client

The client listing part of my munin.conf is shown below

[revor.fionavar.dd]
    address revor.fionavar.dd
    use_node_name yes

[mornir.fionavar.dd]
    address mornir.fionavar.dd
    use_node_name yes

[darien.fionavar.dd]
    address darien.fionavar.dd
    use_node_name yes

You may also be interested in a full munin.conf example
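
With more than a handful of clients, the records can be generated rather than typed. This is only a sketch: the function name munin_record is hypothetical, and it assumes every client uses the same three-line record shown above.

```python
def munin_record(name, address=None):
    """Render one munin.conf client record in the form shown above."""
    address = address or name
    return "[{0}]\n    address {1}\n    use_node_name yes\n".format(name, address)


hosts = ["revor.fionavar.dd", "mornir.fionavar.dd", "darien.fionavar.dd"]
print("\n".join(munin_record(h) for h in hosts))
```

The output can be pasted into /etc/munin/munin.conf below the global settings.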

Nagios

What is Nagios

  • Host monitoring
    • Host status: up/down, cpu, load, swap, processes, disk usage, and other plugins…
  • Network monitoring: nfs, ldap, ntp, httpd, dns, smtp, and more…
  • Other Monitoring Plugins may easily be written for host, network and other monitoring
  • Alerts (email, pager, sms text messages, IM, audio)
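
To give a feel for how small a plugin can be: a plugin is any program that prints one line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The following is a minimal sketch of a disk usage check. The names classify and check_disk_pct, the default thresholds and the wording of the output are hypothetical; this is not one of the plugins shipped in nagios-plugins.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_pct, warn, crit):
    """Map a usage percentage onto a plugin exit code."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk_pct(path="/", warn=85.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, "DISK UNKNOWN - {0}".format(exc)
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - size of {0} reported as 0".format(path)
    used_pct = 100.0 * usage.used / usage.total
    code = classify(used_pct, warn, crit)
    label = ("OK", "WARNING", "CRITICAL")[code]
    return code, "DISK {0} - {1:.1f}% used on {2}".format(label, used_pct, path)


code, message = check_disk_pct("/")
print(message)
# A real plugin would finish with sys.exit(code) so that Nagios sees the status.
```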

You may wish to view a snapshot of Nagios on my network

Nagios Interface: Overviews

Enter through Nagios’ Main Screen

  • The Tactical Overview attempts to summarize the state of Nagios in one screen. It is less useful than the Status Summary and Status Overview screens when it comes to seeing the state of hosts.

  • A Status Overview displays the host status (up, down, unreachable) as well as a summary of the status of services on the hosts. Quite useful when you have many hosts.

  • Like the Status Overview, the Status Summary summarizes status, but for hostgroups instead of individual hosts. This is useful if you have so many hosts that the Status Overview results in information overload.

Nagios: Host & Service Status

  • With a small number of hosts, the most useful screen is the Service Details which lists the status of everything that is monitored by Nagios.

  • The Status Grid provides a less detailed summary, and is suited to a moderate number of hosts.

  • The monitored hosts’ up/down/unreachable status is summarized in the Host Detail page.

  • Service Problems provides a list of services that are warning or critical.

  • The Host Problems page lists all hosts that are down.

  • There is also a Network Outages page.

Nagios: Reports & Graphs

  • Trends
  • Availability: Indicates how often a host/service is up (available)
  • Alert Histogram: How frequently alerts occurred
  • Alert History: A log of alerts
  • Alert Summary
  • Notifications: A log of notifications
  • Event Log

Installing Nagios: Overview

This portion of the presentation assumes Debian package names and configuration.

  • On the server (collector):
    • Need a web server
    • Need nagios, nagios-plugins, a database backend (e.g. nagios-text), and, if using nagios-nrpe, nagios-nrpe-plugin
  • For host monitoring the clients to be monitored need one or more of nagios-nrpe-server, nagios-plugins, and nagios-statd-server

In the following configuration files, the following definitions apply:

  • for services:

    w = warning, u = unknown, c = critical, r = recovery

  • for hosts:

    d = down, u = unreachable, r = recovery

Also note that for most plugins executing the plugin with the -h option will list the arguments the plugin accepts.

Nagios Config

Overview

  • Define commands (using plugins)

  • Define contacts for alerts and other notifications

  • Define time periods (to alert different people at different times of day/week or to not monitor workstations at night because they're turned off, for example)

  • Define hosts to monitor

  • Define services and optional dependencies (don't report service 2 down if service 1 is down)

Templates

  • Reduce the number of directives in configuration files
  • Basically a regular object definition, except that
    • It has a name directive in the object definition
    • It has a ‘register 0’ directive
  • Can include other templates
  • To use it, one adds a ‘use template_name’ to a regular object or another template (usually before the object-specific parameters as they can override the template)

Example of Templates

define host{
        use                     generic-host        ; Name of host template to use

        name                    fionavar-defaults
        check_command           check-host-alive
        max_check_attempts      20
        notification_interval   60
        notification_period     24x7
        notification_options    d,u,r
        register        0   ; TEMPLATE, NOT REAL HOST
        }

define host {
        use                     fionavar-defaults
        host_name               mornir
        alias                   Mornir
        address                 192.168.8.2
        }
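
How the ‘use’ directive combines a host with its templates can be sketched as a recursive merge in which the object's own directives win over inherited ones. This is a simplified model, not Nagios' actual code: it handles a single template per ‘use’, and the contents of generic-host and the max_check_attempts override on mornir are made up for the example.

```python
def resolve(name, objects):
    """Merge an object with the chain of templates named by 'use'.

    Directives set on the object override those inherited from a template.
    """
    obj = objects[name]
    merged = {}
    parent = obj.get("use")
    if parent is not None:
        merged.update(resolve(parent, objects))
    merged.update(obj)
    # 'name', 'use' and 'register' describe the template itself
    # and are not passed on to the objects that use it
    for key in ("name", "use", "register"):
        merged.pop(key, None)
    return merged


objects = {
    # hypothetical contents; the real generic-host has many more directives
    "generic-host": {"name": "generic-host", "notifications_enabled": "1",
                     "register": "0"},
    "fionavar-defaults": {"use": "generic-host", "name": "fionavar-defaults",
                          "check_command": "check-host-alive",
                          "max_check_attempts": "20", "register": "0"},
    # the override of max_check_attempts is added for illustration
    "mornir": {"use": "fionavar-defaults", "host_name": "mornir",
               "address": "192.168.8.2", "max_check_attempts": "5"},
}

for directive, value in sorted(resolve("mornir", objects).items()):
    print(directive, value)
```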



Main

nagios.cfg

The primary configuration file is /etc/nagios/nagios.cfg

Debian uses include directives in this file to logically divide up the various configuration sections

cfg_file=/etc/nagios/checkcommands.cfg
cfg_dir=/etc/nagios-plugins/config/
cfg_file=/etc/nagios/misccommands.cfg
cfg_file=/etc/nagios/contactgroups.cfg
cfg_file=/etc/nagios/contacts.cfg
cfg_file=/etc/nagios/dependencies.cfg
cfg_file=/etc/nagios/escalations.cfg
cfg_file=/etc/nagios/hostgroups.cfg
cfg_file=/etc/nagios/hosts.cfg
cfg_file=/etc/nagios/services.cfg
cfg_file=/etc/nagios/timeperiods.cfg

Contacts

You will need to edit /etc/nagios/contacts.cfg

# 'efs' contact definition
define contact{
        contact_name                    efs
        alias                           Nagios Admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           efs@mail.fionavar.dd
        }

notification_commands can include commands for pager calls, SMS messages, and so on, while notification_options indicates which types of events this contact is notified about: in this case warning, unknown, critical, and recovery for services, and down, unreachable, and recovery for hosts.

Contact Groups

You will need to edit /etc/nagios/contactgroups.cfg

# 'nagios-admins' contact group definition
define contactgroup{
        contactgroup_name       nagios-admins
        alias                   Nagios Admins
        members                 efs
        }

Hosts

/etc/nagios/hosts.cfg

define host{
        use                     generic-host        ; Name of host template to use

        name                    fionavar-defaults
        check_command           check-host-alive
        max_check_attempts      20
        notification_interval   60
        notification_period     24x7
        notification_options    d,u,r
        register        0   ; TEMPLATE, NOT REAL HOST
        }

define host {
        use                     fionavar-defaults
        host_name               mornir
        alias                   Mornir
        address                 192.168.8.2
        }

Host Groups

/etc/nagios/hostgroups.cfg

define hostgroup{
        hostgroup_name  fionavar
        alias           Daniel's Computers
        contact_groups  nagios-admins
        members         mornir,darien,revor
        }

Note that you need a host definition for every host to monitor, and every host needs to be part of some hostgroup.

Nagios Service Definitions

The heart of Nagios is the service definitions file

  • For host status monitoring
    • Using nagios-statd-server/client is easier
    • Using nagios-nrpe remote plugin execution is a bit more work, but can do things nagios-statd-server can’t, especially for network monitoring
  • For network status monitoring
    • Network services are tested from the Nagios server, unless you use nagios-nrpe

See the files in /etc/nagios-plugins for available commands

Nagios Service Template

# Generic service definition template
define service{
    ; The 'name' of this service template, referenced in other service definitions
    name                generic-service
    active_checks_enabled    1  ; Checked by a nagios command
    passive_checks_enabled   0  ; External prog gives us status in a file
    parallelize_check        1  ; Active checks in parallel
    obsess_over_service      0  ; For distributed monitoring (advanced topic)
    check_freshness          0  ; Has it been too long (esp. passive)?
    notifications_enabled    0  ; Service notifications are disabled
    event_handler_enabled    0  ; Service event handler is disabled
    flap_detection_enabled   0  ; Detect flip-flopping service state
    process_perf_data        1  ; Process performance data
    retain_status_information 1 ; Retain status information across program restarts
    retain_nonstatus_information 1  ; Retain non-status info across program restarts

    register       0    ; DONT REGISTER, ITS NOT A REAL SERVICE, JUST A TEMPLATE!
    }

Nagios Base Template

# Base options for most services
define service{
    use                     generic-service  ; Name of service template to use
    name                    fionavar-service
    is_volatile             0                ; For services to report every time non-OK
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   5
    retry_check_interval    1
    contact_groups          nagios-admins
    notification_interval   240
    notification_period     24x7
    notification_options    w,c,r
    register                0                ; TEMPLATE ONLY
    }

Service Examples

Example 1

This service checks that the hosts specified are up (in this case all hosts in hostgroup fionavar)

define service{
    use fionavar-service
    hostgroup_name         fionavar
    service_description    net-local-ping
    check_command          check_ping!100.0,20%!500.0,60%
    }

Note the use of hostgroup; one could also specify a single host, or a comma-separated list of hosts. Specifying multiple hostnames or a hostgroup results in a line for each host+service in the service details screen.

Example 2

The following service checks that ssh on the local machine (mornir) is working

define service{
    use fionavar-service
    host_name              mornir
    service_description    net-local-ssh
    check_command          check_ssh
    }

Example 3

The following service checks that ssh on darien and revor is available from mornir (the Nagios server)

define service{
    use fionavar-service
    host_name              darien,revor
    service_description    net-intra-ssh
    check_command          check_ssh
    }
Adding New Service Commands

/etc/nagios/checkcommands.cfg is used to define local service commands, for example:

define command{
        command_name    check_privoxy
        command_line    /usr/lib/nagios/plugins/check_http -I $ARG1$ -p $ARG2$ -u $ARG3$
        } ; $ARG1$, $ARG2$ and $ARG3$ are replaced by the arguments you
          ; specify in the service definition with `!' (bang)

define command{
        command_name    check_ifstatus_router
        command_line    /usr/lib/nagios/plugins/check_ifstatus -H $HOSTADDRESS$ -x $ARG1$
        } ; $HOSTADDRESS$ is always a single IP address, derived from
          ; the service definition and the hostgroups.cfg and hosts.cfg files
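
The way a check_command value from a service definition is turned into a command line can be modelled roughly as below. This is a sketch of the substitution only, not of how Nagios implements it. The function name expand and the argument values passed to check_privoxy and check_ifstatus_router are hypothetical.

```python
def expand(check_command, commands, host_address):
    """Expand 'name!arg1!arg2...' into the command line that would be run."""
    name, *args = check_command.split("!")
    line = commands[name].replace("$HOSTADDRESS$", host_address)
    for i, arg in enumerate(args, start=1):
        line = line.replace("$ARG{0}$".format(i), arg)
    return line


# command_line values as defined in checkcommands.cfg above
commands = {
    "check_privoxy":
        "/usr/lib/nagios/plugins/check_http -I $ARG1$ -p $ARG2$ -u $ARG3$",
    "check_ifstatus_router":
        "/usr/lib/nagios/plugins/check_ifstatus -H $HOSTADDRESS$ -x $ARG1$",
}

# Hypothetical service definition value and host address
print(expand("check_privoxy!192.168.8.2!8118!http://mornir.fionavar.dd/",
             commands, "192.168.8.2"))
```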

Activating Notifications

Once you have configured your services, you may want to be alerted, e.g. by email, when warning or critical conditions are reached

  1. Load http://nagiosserver/nagios
  2. Choose ‘Status Overview’ or ‘Status Summary’ from the sidebar
  3. Click on hostgroup name (fionavar in this case)
  4. Click ‘Enable notifications for all hosts in this hostgroup’
  5. Click ‘Commit’
  6. Now the notifications defined in contacts.cfg and services.cfg will be active (e.g. email, pager on reaching critical status)

Monitoring

Local (Server) Monitoring

You can monitor disk, swap, etc on the server. For example, to monitor the load averages on the Nagios server

define service{
    use                   fionavar-service
    host_name             mornir
    service_description   stats-load
    check_command         check_load!1.5!2!2!2!2.5!2.5
    }

The first three numbers are the levels at which a warning is issued (for the load average over 1 minute, 5 minutes, and 15 minutes), and the last three are the levels at which a critical alert is issued.
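
The comparison behind these six numbers can be sketched as follows. The function name load_status and the sample load values are made up, and the sketch does not try to reproduce how the real check_load plugin treats a value that sits exactly on a threshold.

```python
def load_status(load, warn, crit):
    """Compare (1, 5, 15 minute) load averages against two threshold triples."""
    if any(value > limit for value, limit in zip(load, crit)):
        return "CRITICAL"
    if any(value > limit for value, limit in zip(load, warn)):
        return "WARNING"
    return "OK"


# Thresholds from check_load!1.5!2!2!2!2.5!2.5
WARN = (1.5, 2.0, 2.0)
CRIT = (2.0, 2.5, 2.5)

print(load_status((0.4, 0.6, 0.5), WARN, CRIT))
print(load_status((1.7, 0.9, 0.6), WARN, CRIT))
print(load_status((3.0, 2.6, 1.0), WARN, CRIT))
# On the monitored machine itself, os.getloadavg() would supply the triple.
```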

Disk Space

If you have installed nagios-statd-server on the clients (hosts) and nagios-statd on the server, you can monitor the hosts’ cpu, load, disk, etc.

nagios-statd requires the use of a real device not a symlink.

For all commands ! is the separator for arguments to the command.

define service{
    use                   fionavar-service
    host_name             darien
    service_description   disk-root
    check_command check_disk_statd_level!/dev/ide/host0/bus0/target0/lun0/part1!85!90
    }

Swap, # of processes

define service{
    use                   fionavar-service
    hostgroup_name        fionavar
    service_description   stats-swap
    check_command         check_swap_statd
    }

Notice the notification_options override below:

define service{
    use                   fionavar-service
    host_name             darien,revor
    service_description   stats-num-proc
    check_command         check_procs_statd
    notification_options  c,r
    }

Nagios-NRPE Overview

  • Execute a plugin on a client and have the results returned as if the plugin were executed on the server.

  • Con: There are no pre-configured commands

  • Con: There is no secure way to specify arguments to the commands in a service definition, so it is recommended that you define the commands on a per-client basis.

  • Pro: You can test the availability of network services to the client instead of the server

    • To debug a firewall
    • To check the status of hosts and services visible to the (Nagios) client, but not the server
  • Since nagios-statd uses the same port as nrpe, you will want to use the equivalent local plugins on the server instead.

Nagios-NRPE Config

On the Server

The default nrpe.cfg can be boiled down to:

server_port=5666
# SERVER ADDRESS
# Address that nrpe should bind to in case there are more than one interface
# and you do not want nrpe to bind on all interfaces.
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd

#server_address=192.168.1.1

# Hosts allowed to talk to nrpe
allowed_hosts=127.0.0.1
nrpe_user=nagios
nrpe_group=nagios
# If this is 1 (true), allow passing of arguments to commands
dont_blame_nrpe=0
# Kill command after this many seconds
command_timeout=60
# So we make our changes in nrpe_local.cfg not nrpe.cfg
include=/etc/nagios/nrpe_local.cfg

Client nrpe.cfg

On the client you need to specify that the server is allowed to connect (in this case the server is 192.168.8.2)

server_port=5666
# SERVER ADDRESS
# Address that nrpe should bind to in case there are more than one interface
# and you do not want nrpe to bind on all interfaces.
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd

server_address=192.168.8.1

# Hosts allowed to talk to nrpe
allowed_hosts=127.0.0.1,192.168.8.2
nrpe_user=nagios
nrpe_group=nagios
# If this is 1 (true), allow passing of arguments to commands
dont_blame_nrpe=0
# Kill command after this many seconds
command_timeout=60
# So we make our changes in nrpe_local.cfg not nrpe.cfg
include=/etc/nagios/nrpe_local.cfg

Client nrpe_local.cfg

(command definitions)

command[check_load2]=/usr/lib/nagios/plugins/check_load --warning=5,4.5,4 --critical=7,6.5,5.5
command[check_ping_mornir]=/usr/lib/nagios/plugins/check_ping -H mornir.fionavar.dd -w 100.0,20% -c 500.0,60%
command[check_disk_mornir]=/usr/lib/nagios/plugins/check_disk -w 10% -c 5% -p /dev/ide/host0/bus0/target0/lun0/part1
command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 20% -c 10%
command[check_privoxy_local]=/usr/lib/nagios/plugins/check_http -I darien.fionavar.dd -p 8118 -u http://mornir.fionavar.dd
command[check_httpd_mornir]=/usr/lib/nagios/plugins/check_http -H mornir.fionavar.dd -u http://mornir.fionavar.dd/

As you can see, the parameters are hard-coded. This is because there is no secure way to pass parameters over the network (with nagios).
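
Because the server only ever asks for a command by name, the client side amounts to a lookup table from name to a fixed command line. The sketch below parses such definitions to make that explicit; the function name parse_nrpe_commands is hypothetical, and this is not how the nrpe daemon itself reads its configuration.

```python
import re

# Matches lines of the form command[name]=command line
COMMAND_RE = re.compile(r"^command\[([^\]]+)\]=(.+)$")


def parse_nrpe_commands(text):
    """Return {command_name: command_line} from nrpe-style config text."""
    commands = {}
    for line in text.splitlines():
        match = COMMAND_RE.match(line.strip())
        if match:
            commands[match.group(1)] = match.group(2)
    return commands


config = """\
# comment lines and other settings are ignored
command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 20% -c 10%
command[check_load2]=/usr/lib/nagios/plugins/check_load --warning=5,4.5,4 --critical=7,6.5,5.5
"""

for name, line in parse_nrpe_commands(config).items():
    print(name, "->", line)
```

A request for a name that is not in the table cannot run anything, which is the point of hard-coding the parameters.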

Table of Contents

  1. Network Monitoring
    1. Slide Deck and Source Code Archives
    2. Real-time Status: Applets
    3. A Few Log Tools
  2. Munin
    1. What is Munin
    2. Munin Interface
    3. Installing Munin: Client
    4. Configuring Munin: Client
    5. Installing Munin: Server
    6. Configuring Munin: Server
  3. Nagios
    1. What is Nagios
    2. Nagios Interface: Overviews
    3. Nagios: Host & Service Status
    4. Nagios: Reports & Graphs
    5. Installing Nagios: Overview
    6. Nagios Config
      1. Overview
      2. Templates
        1. Example of Templates
      3. Main
        1. nagios.cfg
        2. Contacts
        3. Contact Groups
        4. Hosts
        5. Host Groups
      4. Nagios Service Definitions
        1. Nagios Service Template
        2. Nagios Base Template
        3. Service Examples
          1. Example 1
          2. Example 2
          3. Example 3
        4. Adding New Service Commands
      5. Activating Notifications
      6. Monitoring
        1. Local (Server) Monitoring
        2. Disk Space
        3. Swap, # of processes
        4. Nagios-NRPE Overview
        5. Nagios-NRPE Config
          1. On the Server
          2. Client nrpe.cfg
          3. Client nrpe_local.cfg
  4. Table of Contents