Tuesday, October 28, 2008

Scaling administration

Commenting on my last SolView post, somebody asked a question I had asked myself:
does it gracefully handle the situation where you have thousands of zfs filesystems?
And I don't actually know - because I haven't actually tried it.

The original code got the list of zfs filesystems by calling zfs list (which is now all it does) and then retrieved all the properties for each one - whether you viewed them or not. I soon scrapped that loop, as it was obvious that it doesn't scale. So I think my code is about as efficient as it can be - it's going to scale as well as the underlying tools do.
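As a rough illustration of that change (not SolView's actual code), here is a minimal sketch of fetching properties only when a filesystem is actually viewed. The class name LazyZfsProperties and the fetcher function are assumptions made up for this example; in a real tool the fetcher would wrap whatever retrieves the properties for one filesystem.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: keeps the list of names, fetches properties on demand. */
final class LazyZfsProperties {
    private final List<String> names;
    // Assumed callback that retrieves the properties of a single filesystem.
    private final Function<String, Map<String, String>> fetcher;
    private final Map<String, Map<String, String>> cache = new HashMap<>();
    private int fetchCount;

    LazyZfsProperties(List<String> names,
                      Function<String, Map<String, String>> fetcher) {
        this.names = List.copyOf(names);
        this.fetcher = fetcher;
    }

    /** The cheap part: just the names, no per-filesystem work. */
    List<String> names() {
        return names;
    }

    /** The expensive part: runs once per filesystem, and only when asked for. */
    Map<String, String> propertiesOf(String name) {
        Map<String, String> props = cache.get(name);
        if (props == null) {
            props = fetcher.apply(name);
            fetchCount++;
            cache.put(name, props);
        }
        return props;
    }

    /** How many times the fetcher has been invoked so far. */
    int fetchCount() {
        return fetchCount;
    }
}
```

With this shape, the cost of showing the list depends only on the number of names; the per-filesystem cost is paid only for the entries somebody actually looks at.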

However, one of the things I've given some thought to - and one of the reasons for writing SolView in the first place - is how to get a handle on systems as they scale up. I'm not talking about managing large numbers of systems (that's an entirely separate problem), I'm talking about looking at a single system where the number of instances of an object may be measured in the tens, hundreds, or thousands.

For example, my T5140s have 128 processor threads. I have systems with 100 virtual network interfaces. Many people have systems with thousands of zfs filesystems. Zones encourage consolidation of multiple applications onto a single system (so do other virtualization technologies, but in those other cases you tend to manage the instances independently), so you may be looking at a system with dozens of zones and thousands of processes running. A thumper has 48 disks, and that's small. Using SMF, a machine typically has a couple of hundred services.

The common thread here is that the number of objects under consideration is larger than you can fit on screen (or in a terminal window, at any rate) in one go. And is thus larger than you can actually see at once. How does your brain cope with reading the output from running df on 10,000 filesystems?

As we move into this brave new world, we're going to need better tools in the areas of sorting, aggregation, and filtering.

A couple of examples from SolView and (originally) JKstat:

I wrote a lookalike of xcpustate for JKstat. That works great on my desktop. But my desktop isn't big enough to show a copy of it running on a T5140, so I wrote an enhanced version (now shipping with SolView) that shows the aggregate statistics for cores and chips, and allows you to hide the threads or cores, which makes the amount of information thrown at your eyeballs at any given time rather more manageable.
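The aggregation idea can be sketched roughly as follows. This is not the JKstat or SolView code: the ThreadLoad type, its fields, and the choice of a simple average are all assumptions for illustration, and the busy percentages would come from whatever statistics source is in use.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical sketch: roll per-thread load up to cores and chips. */
final class CpuAggregate {
    /** One hardware thread: which chip and core it belongs to, and how busy it is. */
    static final class ThreadLoad {
        final int chip;
        final int core;
        final double busyPercent;

        ThreadLoad(int chip, int core, double busyPercent) {
            this.chip = chip;
            this.core = core;
            this.busyPercent = busyPercent;
        }
    }

    /** Average busy percentage per chip, keyed by chip id. */
    static Map<Integer, Double> averageByChip(List<ThreadLoad> threads) {
        return threads.stream().collect(Collectors.groupingBy(
            t -> t.chip, TreeMap::new,
            Collectors.averagingDouble(t -> t.busyPercent)));
    }

    /** Average busy percentage per core, keyed by "chip:core". */
    static Map<String, Double> averageByCore(List<ThreadLoad> threads) {
        return threads.stream().collect(Collectors.groupingBy(
            t -> t.chip + ":" + t.core, TreeMap::new,
            Collectors.averagingDouble(t -> t.busyPercent)));
    }
}
```

A display can then show one row per chip by default and only expand to cores or threads on request, so 128 threads collapse to a handful of lines.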

Another example is that the original view of SMF services in SolView was just a linear list. I then wrote a tree view, based on the (apparently) hierarchical names of the services. I found that the imposition of structure - even a structure that's mostly artificial - helps the brain focus on the information rather than be overwhelmed by a flat unstructured list. And that structure breaks the services down into chunks that are small enough for the user to handle easily.

So back to the example of huge numbers of ZFS filesystems. The plan is to show them grouped in the same hierarchy as the filesystems themselves, rather than as a plain list, and to show snapshots as children of their parent filesystem. In other words, to do everything possible to break things down into more manageable chunks.
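To make the grouping concrete, here is a minimal sketch of turning a flat list of names into a tree. It is an illustration only, not what SolView does internally: the ZfsTree and Node names are invented for the example, and it assumes the usual naming convention where filesystem components are separated by "/" and a snapshot is written as filesystem@snapshot.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: group flat dataset names into a tree. */
final class ZfsTree {
    static final class Node {
        final String name;
        final Map<String, Node> children = new TreeMap<>();

        Node(String name) {
            this.name = name;
        }

        /** Returns the named child, creating it if it does not exist yet. */
        Node child(String childName) {
            return children.computeIfAbsent(childName, Node::new);
        }
    }

    /** Builds a tree whose root is an unnamed node holding the top-level pools. */
    static Node build(List<String> datasetNames) {
        Node root = new Node("");
        for (String full : datasetNames) {
            String filesystem = full;
            String snapshot = null;
            int at = full.indexOf('@');
            if (at >= 0) {
                filesystem = full.substring(0, at);
                // Keep the '@' so a snapshot cannot collide with a child filesystem.
                snapshot = full.substring(at);
            }
            Node node = root;
            for (String part : filesystem.split("/")) {
                node = node.child(part);
            }
            if (snapshot != null) {
                node.child(snapshot);
            }
        }
        return root;
    }

    /** Appends an indented listing of the tree below the given node. */
    static void render(Node node, int depth, StringBuilder out) {
        for (Node child : node.children.values()) {
            out.append("  ".repeat(depth)).append(child.name).append('\n');
            render(child, depth + 1, out);
        }
    }
}
```

The same approach would work for the SMF service names mentioned above, splitting on the separators in the service name instead.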

This relies on the underlying data being structured. I'm assuming that when someone has 100,000 filesystems, they are structured somehow - whether by department or hashed by name or whatever - rather than being a great unstructured mess. I can't create order out of chaos, but the tools we use should do everything they can to use what order they can find to create a structure that's easy to comprehend.

Monday, October 27, 2008

SolView moves ahead

I've been working on a few new features in SolView, and it's about time for a new release. So that's version 0.45 out of the door.

The major feature this time is a sneak peek of a prototype I've put together for the System Explorer. See the image here for a sample:

You can see the left hand panel containing a tree view of the various bits of the system that SolView has found. Selecting any of them shows whatever information I can find - either by running external commands, or by using JKstat to grab statistics.

This is both skeletal and a prototype. But it does try to answer the question: what's in my system, and how do the bits relate to each other? It's the relationships that I'm trying to capture: so I have a disk, but what's it used for, and where's the load on it coming from? And there's clearly a lot more to do in this regard.

Sunday, October 26, 2008

MilaX - Wow!

I've been playing around on my laptop today. It's a fairly basic model - with a pretty large screen - that I normally just use for simple connectivity. So the fact that it's running Vista doesn't bother me, as it can launch firefox, VNC, and putty just fine.

I wanted to play around with OpenSolaris on it, which is a bit tricky - it doesn't work right on the metal (I can manually get the wired network to function, but never got the wireless going). So I'm back to running stuff under VirtualBox. Which would be easy if I had a decent amount of memory to play with, but the laptop has 768M and Sun's OpenSolaris distro needs pretty much all of that, so it's not going to work.

Enter MilaX. Not only is the download tiny, but it claims to boot graphically in 256M, and CLI in 128M. So I gave it 384M in VirtualBox, and it works just fine!

It's a fabulous little distro. There's no room in that footprint for all the bloat we're used to - no desktop environment, no office suite, no java - but it works, and is really very slick.

In many ways I feel right at home: a standalone window manager and individual applications, all very lightweight. It reminds me of the energy of the 90s, before the big desktop environments turned computing into a desolate wasteland.

Monday, October 20, 2008

Refactored JKstat

As I mentioned about a month ago, both JKstat and SolView were in line for some major refactoring.

I've done a bit of a spring clean on JKstat, so there's a new version - 0.25 - which has a lot of the cleanups and shuffling about that I had in mind. The class hierarchy has been restructured, code cleanup continues, and the more complex demos have been moved to SolView. The restructuring also allows the easy construction of a jar file containing just the API, so you don't need to drag in all the bloat associated with the browser, gui components, and demos.

The associated SolView release will follow shortly.

And thanks to Mike Duigou for a bunch of helpful fixes and suggestions!