The Trouble with Tribbles...: March 2010

Sunday, March 14, 2010

Beyond sar

The old standby for recording historical system activity is sar, the system activity reporter. There are many alternatives, both free and commercial, but sar has the advantage that it comes with the OS, in pretty much any version of any (unix-like) OS.

Because it's there, we use sar, saving its output into a big archive and using tools like sar2rrd to produce charts. (It's not the only thing we use, of course.)

The problem is that, particularly on Solaris, sar is terrible. The data it collects is woefully incomplete - network data is the worst, being completely absent, but there's much more missing. Some of what is present is aggregated away so that much of the detail is lost. And the list of what's present is fixed, so the whole framework is completely non-extensible.

So, I'm fed up with that, and need to do better. Note that most tools out there don't help with capturing all the data, as they have their own preconceived notions of what data might be useful (although they are generally far more complete than sar).

Enter kar - the kstat activity reporter. This is really amazingly simple. Given that (almost) all the performance data you want is obtained from kstats, simply save all the kstats on a regular basis. The implementation I have here is to save kstat -p output into files inside zip archives. Now, that's not ideal, but it has some advantages: it's almost zero effort, it gives complete coverage, and it's naturally extensible. If it works out and is found to be useful, more optimal mechanisms could be defined.
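As a rough illustration of the idea (not kar's actual code), saving one snapshot per run into a zip archive might look like the Python sketch below. The function names and the timestamped entry naming are assumptions made up for this example; only the kstat -p command and the use of zip archives come from the description above.

```python
import subprocess
import time
import zipfile


def capture_kstats(command=("kstat", "-p")):
    """Run the capture command and return everything it prints as text."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def append_snapshot(archive_path, text, when=None):
    """Store one snapshot in the zip archive and return its entry name."""
    when = time.time() if when is None else when
    # Hypothetical naming scheme: one entry per capture, named by UTC time.
    entry = time.strftime("kstat-%Y%m%dT%H%M%S", time.gmtime(when))
    # Mode "a" appends to an existing archive, or creates a new one.
    with zipfile.ZipFile(archive_path, "a", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(entry, text)
    return entry
```

Run from cron every few minutes, something like append_snapshot(path, capture_kstats()) would build up a day's archive. Text this repetitive compresses well, which is part of why the zip approach is tolerable as a first cut.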

I've said it a couple of times above, but I'm going to say it again: the key advantage here is that the data is complete and thereby naturally extensible. I don't want to enhance sar by trying to cherry-pick interesting statistics (and we could all argue for months about what might go on the list). By saving everything you automatically pick up anything new that's added. And you let consumers decide which of the statistics are interesting when you get to the post-processing phase. Say I wanted to look at the historical behaviour of the zfs ARC - no problem, it's all there in the kstats.

Using kstat -p is a convenient shortcut, but it does have other advantages. Because the output is textual, all your favourite analysis tools - awk, sed, perl, grep, python, whatever - can munge the data with no effort. And you can chuck the data into your graphing application of choice.

If that wasn't enough, jkstat 0.35 has support for reading in the output of kar in both the browser and chart builder.

./jkstat browser -z /var/adm/ka/ka-2010-03-01.zip

./jkstat chartbuilder -z /var/adm/ka/ka-2010-03-01.zip

will do the trick.

Friday, March 05, 2010

JKstat 0.34

I've just pushed out a minor update to JKstat.

The change here (apart from a couple of minor bugfixes and a jnetloadfx example to remind me what JavaFX looked like) is the addition of a class to update multiple accessories together. Previously, in demos such as iostat and cpustate, each item had its own timer loop and was responsible for handling its own updates. This was especially apparent in the kmemalloc example in SolView - it was obvious that separate widgets weren't being updated simultaneously.

Now I can just have a single update loop that updates multiple accessories. Not only does it look neater, but there are noticeable improvements in memory and cpu usage from only having one timer instead of many.
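The pattern itself is simple enough to sketch. The Python below is only an illustration of one loop driving several displays; it is not JKstat's API (JKstat is Java), and the class and method names are made up for the example.

```python
import time


class Updater:
    """Drive many displays from one loop instead of one timer each."""

    def __init__(self, read_snapshot):
        # read_snapshot is any callable that returns the current data.
        self._read_snapshot = read_snapshot
        self._listeners = []

    def add(self, listener):
        """Register a callable that will be handed each new snapshot."""
        self._listeners.append(listener)

    def tick(self):
        """Read the data once, then give the same snapshot to everyone."""
        snapshot = self._read_snapshot()
        for listener in self._listeners:
            listener(snapshot)
        return snapshot

    def run(self, interval, ticks):
        """Call tick() the given number of times, sleeping in between."""
        for _ in range(ticks):
            self.tick()
            time.sleep(interval)
```

Because every listener is handed the same snapshot within a single tick, the widgets cannot drift apart, and the data is read once per interval rather than once per widget.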

Coming up next is more related work. The ability to read historical kstat -p output works fine, but requires some changes so that you step through the data rather than continuously updating in time. (If you think about it for a moment, the class I mentioned above is one example of updating the time and then telling the world to update, so it's - albeit only tangentially - related.) These changes are likely to be a bit complex, so I also decided to cut a version before starting to make more significant changes to the code.