Nagios
I’m on call this week and next week (a screw-up somewhere) to keep the web sites of the electricity company I work for running. That means I take my laptop everywhere, and they gave me a UMTS card so I can get online wherever I am. I also got another cell phone, and on that one I get an SMS if a system goes down somewhere.
Nothing happened all week…
I mentioned this at work and people responded that I shouldn’t say it, because the systems were bound to crash now. And they did, or so it seemed. At 23:10 last night a lot of servers couldn’t connect to databases and hosts seemed to be unreachable.
I was on the phone until 3am before I gave up and went to bed. I slept for 7 hours and at 10 in the morning I continued the search. Now at 1:30pm everything is finally back up again. Turns out some guys turned off two databases they shouldn’t have turned off. They were doing maintenance on their systems this weekend. We connect through those databases to them. So if they’re down we still get data. Well, at least everything is back up again.
Nagios is the open source tool that sent me the messages saying something was wrong. It runs periodic checks to see whether servers and services are still running. I’ve never played around with this kind of software before, so it was very interesting to see how it works.
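To give a rough idea of what such a check boils down to, here is a minimal Python sketch. It is not how Nagios itself is implemented, and the names `check_tcp`, `run_checks` and `notify` are my own inventions for illustration. The only thing borrowed from Nagios is the plugin exit-code convention, where 0 means OK and 2 means CRITICAL. A real monitoring tool would run checks like this on a schedule; the sketch only does a single pass.

```python
import socket

# Nagios plugin exit-code convention: 0 = OK, 2 = CRITICAL
OK, CRITICAL = 0, 2


def check_tcp(host, port, timeout=3.0):
    """Try one TCP connection and return (status, message)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} is accepting connections"
    except OSError as exc:
        # Covers refused connections, timeouts and unresolvable host names.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"


def run_checks(targets, notify):
    """Check every (host, port) pair once; call notify(message) per failure.

    `notify` is a hypothetical callback, e.g. something that sends an SMS.
    Returns the list of (host, port) pairs that failed.
    """
    failures = []
    for host, port in targets:
        status, message = check_tcp(host, port)
        if status != OK:
            notify(message)
            failures.append((host, port))
    return failures
```

Calling something like `run_checks([("db.example.com", 5432)], print)` every few minutes (the host name is made up) would print a line whenever that database port stops answering, which is roughly the kind of message that ended up on my phone.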