Many years ago I wrote a simple system and network monitoring tool. It's been developed on and off over the years, but has now reached a major impasse.
Basically, I designed it wrong 10 years ago. I started out with a 2 state system. If the status is 0, then it's fine. If the status is 1, it's broken and needs fixing. Sounds reasonable, right?
Then I realized I needed to add another state, so I defined it so that if the status is 2, there's a warning condition. And all worked well for a few years.
The problem with this scheme is that the severity of the problem isn't a linear function of the status. So I end up playing all sorts of games trying to analyze the status codes trying to work out just how bad the situation really is. It would be much easier if I could simply retrieve the maximum status out of the database - no fiddling required! And I can order problems simply by sorting on the status.
Thinking about this a bit more, this is the obvious thing to do. So obvious, in fact, that I was a dullard for not thinking about this at the start. (But, when I started writing this particular monitoring tool, I wasn't thinking about what version 3 would look like 10 years down the line. And I started out by using the return code from scripts as the status, which is where 0 and 1 came from.)
Of course, I now have to consider what the best scheme might be. Do I simply have 0 for good, 1, for warning, 2 for dead? I think the 0 for good is fine. But should I do something like 255 for dead, 128 for warning, leaving me some room to add finer levels of granularity in the future?
Decisions, decisions...
No comments:
Post a Comment