Too Quiet
Due to family matters, I haven’t had much desire to post anything for a long time. Hopefully, I can get myself into the mood to write more often. I would especially like to get the Ass-Hate of the Week Award updated. For the moment, though, I’ll just mention the latest cool Linux tool.
A few weeks ago one of the servers I managed went down on a Sunday. It’s a bad thing when the owner of the site sends you an urgent email asking about it. The problem was easily resolved, but the experience convinced me that some sort of automated solution was needed to keep an eye on things.
When I asked around, the overwhelming winner was Nagios, an exceptionally full-featured package for the automated monitoring of systems and services. With this free program, I was able to configure my home server to keep an eye on four Linux servers. It doesn’t just ping them and report if there is no response. It can check specific services. For example, all four servers are running Apache, so I added a specific check to make sure Apache is alive and well on all systems. I also added tests for SMTP, IMAP, and POP3 (email services) on a couple of the systems.
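For anyone curious what a service check boils down to, here is a minimal sketch in Python. It is not how the stock Nagios plugins are written. It only illustrates the idea of connecting to a port and reporting a status using the Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name check_tcp and the host name in the usage note are made up for illustration.

```python
import socket

# Nagios plugin exit-code convention
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Try a plain TCP connect and return (status, message).

    This only proves that something is listening on the port. It does
    not speak HTTP, SMTP, IMAP, or POP3 to the service behind it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepted the connection"
    except OSError as exc:
        # Covers refused connections, timeouts, and DNS failures.
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"
```

Calling check_tcp("www.example.com", 80) would cover the Apache case in the crudest way, and ports 25, 143, and 110 would do the same for SMTP, IMAP, and POP3. A real plugin would print the message and exit with the status code.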
Some of the services cannot be checked directly. For example, MySQL is on all the systems, but cannot be accessed remotely. For that type of service, I set up cron jobs on the remote systems that report back to my home server every thirty minutes. If a system fails to report within a reasonable time, my server forces a check by a secondary means.
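The report-back idea can be sketched roughly like this. The first function builds a line in the format of Nagios's PROCESS_SERVICE_CHECK_RESULT external command, which is how passive results are submitted. The second decides whether a report is overdue. How the line travels from the remote system to the monitoring server is not covered here, and the 45-minute threshold, the function names, and the host name are assumptions for illustration only.

```python
import time

REPORT_INTERVAL = 30 * 60      # the cron jobs report every thirty minutes
FRESHNESS_THRESHOLD = 45 * 60  # assumed grace period, not a value from the post


def format_passive_result(host, service, code, output, now=None):
    """Build one passive service check result line for Nagios."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}"


def is_stale(last_report, now, threshold=FRESHNESS_THRESHOLD):
    """True when the last report is older than the allowed threshold."""
    return (now - last_report) > threshold
```

A cron job on the remote system would test MySQL locally, then send something like format_passive_result("web1", "MySQL", 0, "MySQL OK") home. On the monitoring side, is_stale is the point at which a fallback check would be forced.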
Notifications can be handled in a number of ways. For important things like the main ecommerce site not responding, it sends a text message to my cell phone. For less important services that wouldn’t normally require immediate attention, I have the notification turned off. The web interface will still show there’s a problem, so I can deal with it eventually.
A single system monitoring multiple remote systems has a single point of failure: itself. In fact, this happened a few days ago. My ISP had an equipment problem which resulted in the loss of my internet connection for a short time (less than thirty minutes), but it was enough to cause multiple false alerts. I had stupidly configured Nagios to ping my router to test its own internet connection which, in fact, only tested my local network. I changed that to ping Google instead. Now if I lose my internet connection, Nagios will know. Since I made all the remote servers children of (dependent on) my own internet connection, their tests will be skipped while the connection is down.
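The parent and child logic is easier to see in a few lines of code. This is a simplified model of the idea, not Nagios's actual implementation: a host that fails while all of its parents are also failing is treated as unreachable rather than down, so no alert goes out for it. The host names and function names are hypothetical.

```python
def classify(host, parents, failed):
    """Return 'UP', 'DOWN', or 'UNREACHABLE' for a host.

    parents maps a host to the hosts it depends on (assumed to have
    no cycles). failed is the set of hosts whose check did not pass.
    """
    if host not in failed:
        return "UP"
    host_parents = parents.get(host, [])
    # If every parent is itself failing, the problem is upstream.
    if host_parents and all(
        classify(p, parents, failed) != "UP" for p in host_parents
    ):
        return "UNREACHABLE"
    return "DOWN"


def should_notify(host, parents, failed):
    """Alert only for hosts that are really down, not merely cut off."""
    return classify(host, parents, failed) == "DOWN"
```

With parents = {"web1": ["internet"], "web2": ["internet"]} and all three failing, only "internet" produces an alert. If "internet" is up and "web1" fails, the alert is for "web1".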
One final addition completed the important tests. I set up Nagios on one of the remote systems to monitor my own server. It checks to make sure both Nagios and SMTP are running and sends me a text message if either is down. The SMTP test is needed remotely because if SMTP is dead, my own server can’t send an alert to me.
There are a number of websites supporting Nagios, as well as a very active mailing list and developer community. There are even companies that offer Nagios monitoring as a service. If you have more than one system you need to keep an eye on, I highly recommend you take a serious look at Nagios.


