Monday, November 22, 2010

KAR at LOSUG

Last Wednesday I gave a short talk about KAR at LOSUG.

Apart from the fact that my laptop and the projector refused to get along, it went OK. That disagreement meant that the interactive show and tell (aka the demo) got skipped, but that was always designed as an extra. I hope I got people thinking - the real aim here is to try and make people think about what we do, rather than spoonfeed solutions to them (that's what marketing is for).

For the talk I released a new version of KAR, and in preparing the talk and getting some things ready I stumbled across one or two issues I thought worthy of further fixes.

Part of the problem is my implementation, but using JKstat to parse the KAR archives has never been exactly quick: it takes several minutes for a full day's data from a decent server. My original implementation of graph generation repeated that parse for every graph, so generating the 100+ graphs took rather a long time.

Here follows a completely gratuitous graph: network traffic on my home machine over the course of a day.



The parser in JKstat was only partially finished, so I tidied it up a bit. It now caches parsed data, lets consumers step backwards as well as forwards and rewind, and shares the cache across multiple instances. This allows you to parse the data just once and then analyze it to your heart's content. The upshot is that generating all the graphs takes only a little longer than generating the first one. With 100+ graphs to generate, that makes the whole run roughly 100 times faster.

(Which proves the point: fixing truly stupid implementations can give you huge performance gains. There is no way I can imagine getting a hundred-fold performance improvement by tweaking the parser. That's not to say I'm happy with performance now - far from it - but it's made one particular common task viable, which it wasn't before.)
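
As a rough illustration of the parse-once approach, here is a minimal Java sketch. It is not JKstat's actual code or API: the class name CachedArchive, its methods, and the stand-in parse step are all hypothetical, and the real parser may be structured quite differently. It only shows the shape of the idea: a cache shared across instances, with each instance keeping its own position so it can step forwards, step backwards, and rewind.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a parse-once archive reader; not JKstat's real API. */
public class CachedArchive {

    // Parsed data shared by every instance, keyed by archive name.
    private static final Map<String, List<String>> CACHE = new ConcurrentHashMap<>();

    // Counts how often the expensive parse actually ran (for illustration only).
    private static final AtomicInteger PARSE_COUNT = new AtomicInteger();

    private final List<String> snapshots;
    private int position = -1; // -1 means "before the first snapshot"

    public CachedArchive(String archiveName) {
        // The parse runs only for the first instance that asks for this archive.
        snapshots = CACHE.computeIfAbsent(archiveName, CachedArchive::parse);
    }

    // Stand-in for the slow parse: fabricates three placeholder snapshots.
    private static List<String> parse(String archiveName) {
        PARSE_COUNT.incrementAndGet();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            result.add(archiveName + "#" + i);
        }
        return Collections.unmodifiableList(result);
    }

    public static int parseCount() {
        return PARSE_COUNT.get();
    }

    public boolean hasNext() {
        return position < snapshots.size() - 1;
    }

    public boolean hasPrevious() {
        return position > 0;
    }

    // Step forwards and return the snapshot now under the cursor.
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException("already at the last snapshot");
        }
        return snapshots.get(++position);
    }

    // Step backwards and return the snapshot now under the cursor.
    public String previous() {
        if (!hasPrevious()) {
            throw new NoSuchElementException("already at the first snapshot");
        }
        return snapshots.get(--position);
    }

    // Go back to before the first snapshot.
    public void rewind() {
        position = -1;
    }
}
```

In this sketch, each graph would create its own CachedArchive for the same archive name and walk the snapshots independently, while the slow parse happens only once however many graphs are drawn.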

So, a new release of both KAR and JKstat is now available.

The next step for KAR is to identify additional measurements that would be generally useful, and then add them to the list of graphs generated (and maybe have specific utilities for specific data).