Nagios
I'm hooking up Nagios to the Simple Event Correlator (SEC). Here is how it works.
I currently monitor many machines spread across a few datacenters, and many of them run the same services. I have a Nagios check defined for each service on each server. Occasionally a service becomes unavailable on several machines in the same datacenter (e.g. because a back-end system in that datacenter is down), and I don't want to get paged once for every affected server. What I want is a single page that tells me how many servers in that datacenter are having problems.
This is where SEC comes in. First, I set up a new passive check in Nagios for each service in each datacenter; these passive checks are updated by SEC. Each service running on the individual servers in a datacenter is dependent on the SEC passive check for that service in that datacenter.
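With many servers, writing those dependencies by hand is repetitive. The following Perl sketch (not part of the original setup) prints one Nagios servicedependency definition for a given service and datacenter. The `dependency_block` helper, the host name `sec` that owns the passive checks, and the `<service> <datacenter>` naming of the passive check are all hypothetical; the directives themselves are standard Nagios servicedependency options.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original setup): builds a Nagios
# servicedependency that makes each server's check for $service depend on
# the SEC passive check for that service in datacenter $dc. The "sec" host
# and the "<service> <dc>" description are made-up naming conventions.
sub dependency_block {
    my ($service, $dc, @hosts) = @_;
    return join "\n",
        'define servicedependency{',
        '    host_name                      sec',
        "    service_description            $service $dc",
        '    dependent_host_name            ' . join(',', @hosts),
        "    dependent_service_description  $service",
        '    notification_failure_criteria  w,c',
        '}',
        '';
}

# Example: both dc1 web servers depend on the "HTTP dc1" SEC check.
my $cfg = dependency_block('HTTP', 'dc1', qw(web01 web02));
print $cfg;
```

With `notification_failure_criteria w,c`, Nagios does not send notifications for the dependent per-server checks while the SEC check is in a WARNING or CRITICAL state, which is what keeps you from being paged once per server.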
I whipped up a little Perl script called nagios_tail.pl (inspired by nagtail, which didn't compile on my Solaris box). This script monitors the Nagios status.dat file for status changes to any service and generates a line of output like this:
SERVICE, hostname, service description, status, output
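The original nagios_tail.pl isn't shown here. As a rough illustration only, this is a minimal polling sketch of the same idea. It assumes the Nagios 3 status.dat layout (`servicestatus { ... }` blocks containing `host_name`, `service_description`, `current_state` and `plugin_output` keys) and that the status field is printed as a state name; the function names and the 10-second poll interval are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch of the nagios_tail.pl idea, not the original script.
# Assumes Nagios 3 status.dat, where each service is a "servicestatus { ... }"
# block of key=value lines.
my @STATE_NAME = qw(OK WARNING CRITICAL UNKNOWN);

# Parse status.dat text into { "host;description" => { key => value, ... } }.
sub parse_status {
    my ($text) = @_;
    my (%svc, $cur);
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*servicestatus\s*\{/) {
            $cur = {};                                   # start of a service block
        }
        elsif ($cur && $line =~ /^\s*\}/) {
            my $key = join ';', map { $_ // '' } @{$cur}{qw(host_name service_description)};
            $svc{$key} = $cur;                           # end of the block
            undef $cur;
        }
        elsif ($cur && $line =~ /^\s*(\w+)=(.*)$/) {
            $cur->{$1} = $2;                             # key=value inside a block
        }
    }
    return \%svc;
}

# Return one "SERVICE, host, description, STATE, output" line per service whose
# current_state differs from the previous snapshot (or that is new).
sub changes {
    my ($old, $new) = @_;
    my @out;
    for my $key (sort keys %$new) {
        my $s    = $new->{$key};
        my $prev = $old->{$key};
        next if $prev && ($prev->{current_state} // '') eq ($s->{current_state} // '');
        my $state = $STATE_NAME[ $s->{current_state} // 3 ] // 'UNKNOWN';
        push @out, join ', ', 'SERVICE', $s->{host_name} // '',
            $s->{service_description} // '', $state, $s->{plugin_output} // '';
    }
    return @out;
}

# Run as: nagios_tail.pl /path/to/status.dat   (polls until killed)
if (@ARGV) {
    my $file = shift @ARGV;
    $| = 1;                  # flush each line so SEC sees it immediately
    my $old = {};            # the first pass reports every service once
    while (1) {
        if (open my $fh, '<', $file) {
            my $text = do { local $/; <$fh> };
            close $fh;
            my $new = parse_status($text // '');
            print "$_\n" for changes($old, $new);
            $old = $new;
        }
        sleep 10;
    }
}
```

Polling the whole file is the simplest approach, since Nagios rewrites status.dat periodically rather than appending to it.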
When SEC starts up, it spawns nagios_tail.pl and reads its output. If a service goes to a WARNING or CRITICAL state on more than one server, the SEC rules update the passive check with the current state and the number of servers having a problem. Because the datacenter name is part of the passive check's service description, I still get separate notifications if multiple datacenters are down. And since the checks on the individual servers depend on the SEC check, when the SEC check goes to a WARNING or CRITICAL state you are notified only about the SEC check instead of getting one notification per server.
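The SEC rules themselves are written in SEC's own rule language. To make the counting logic concrete, here is a Perl sketch of what those rules compute; it is an illustration, not SEC configuration. The host-to-datacenter map, the `sec` host that owns the passive checks, the `<service> <datacenter>` naming and the `correlate` function are all hypothetical. The output uses the standard Nagios `PROCESS_SERVICE_CHECK_RESULT` external command.

```perl
use strict;
use warnings;

# Hypothetical illustration of the correlation the SEC rules perform; it is
# not SEC configuration. Assumptions (all made up for this sketch): the
# host-to-datacenter map below, a Nagios host named "sec" that owns the
# passive checks, and passive checks named "<service> <datacenter>".
my %DC_OF = (web01 => 'dc1', web02 => 'dc1', web03 => 'dc2');
my %failing;    # "service datacenter" => { host => state }

# Feed one nagios_tail.pl line. Returns a Nagios external command that
# updates the passive check, or undef if the line cannot be used.
sub correlate {
    my ($line, $now) = @_;
    my ($tag, $host, $desc, $state) = split /,\s*/, $line, 5;
    return unless defined $state && $tag eq 'SERVICE';
    my $dc = $DC_OF{$host} or return;          # unknown host: ignore it
    my $check = "$desc $dc";
    if ($state eq 'WARNING' || $state eq 'CRITICAL') {
        $failing{$check}{$host} = $state;
    }
    else {
        delete $failing{$check}{$host};
    }
    my @states = values %{ $failing{$check} || {} };
    my $n      = @states;
    my $crit   = grep { $_ eq 'CRITICAL' } @states;
    # More than one failing server: CRITICAL (2) if any is critical, else
    # WARNING (1). Otherwise report OK (0) so single-server alerts go through.
    my $code = $n > 1 ? ($crit ? 2 : 1) : 0;
    return sprintf '[%d] PROCESS_SERVICE_CHECK_RESULT;sec;%s;%d;%d server(s) with problems',
        $now, $check, $code, $n;
}
```

In a real setup the resulting line would be written to the Nagios external command file (the `command_file` setting in nagios.cfg), for example with SEC's `write` action. Note that this sketch splits on commas, so it assumes service descriptions and host names contain none.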
To get Nagios notifications as Growl pop-ups on my Mac, I use the following command definitions:
# growl
define command{
command_name notify-by-growl
command_line /usr/bin/printf "%b" "Notification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/local/nagios/libexec/growl -t "***** Nagios *****" -s
}
define command{
command_name host-notify-by-growl
command_line /usr/bin/printf "%b" "Notification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\nHost $HOSTSTATE$ alert for $HOSTNAME$!" | /usr/local/nagios/libexec/growl -t "***** Nagios *****" -s
}
And here is the growl script that these commands call. It uses the Mac::Growl module:
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(:config gnu_getopt);
use Mac::Growl;

# Command-line options processing
my ($verbose, $message, $title, $nostick, $stdin);
my $appname = "growlpl";    # application name registered with Growl

GetOptions(
    'v|verbose!'  => \$verbose,
    'm|message=s' => \$message,
    't|title=s'   => \$title,
    'n|nostick!'  => \$nostick,
    's|stdin!'    => \$stdin,
) or die "Usage: $0 -t title [-m message | -s] [-n] [-v]\n";

die "ERROR: no title specified\n" unless $title;

# Notifications are sticky unless --nostick is given.
my $sticky = $nostick ? 0 : 1;

# With --stdin, read the whole message from standard input.
if ($stdin) {
    local $/;               # slurp mode, restored after this block
    $message = <STDIN>;
}

die "ERROR: no message specified\n" unless defined $message && length $message;

if ($verbose) {
    print "TITLE: $title\n";
    print "MESSAGE:\n$message\n";
}

# Register the application and its single notification type, then post.
Mac::Growl::RegisterNotifications($appname, [$appname], [$appname]);
Mac::Growl::PostNotification($appname, $appname, $title, $message, $sticky);