Friday, December 04, 2009

JKstat 0.33

There are a number of features that have been suggested for JKstat that I've been thinking of including, and I've just added one of the more interesting ones.

I think it was Phil Harman who suggested this at LOSUG. Someone may have archived kstat -p output, so why not allow JKstat to read that?

So, as of version 0.33, JKstat can do exactly that. You can either save kstat -p output into files in a directory, or simply put those files into a zipfile. (The reader is pretty dumb right now - no other files or subdirectories.) It uses the timestamps on the files as the time the measurement was taken.
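Each line of kstat -p output has the form module:instance:name:statistic, a tab, then the value. A minimal sketch of parsing one such line might look like this - the class and method names are invented for illustration, not JKstat's actual API:

```java
// Hypothetical sketch of parsing one line of kstat -p output.
// Format: module:instance:name:statistic<TAB>value
public final class KstatLineParser {
    // Returns { module, instance, name, statistic, value }.
    public static String[] parse(String line) {
        int tab = line.indexOf('\t');
        // The key has exactly four colon-separated fields.
        String[] key = line.substring(0, tab).split(":", 4);
        return new String[] { key[0], key[1], key[2], key[3],
                              line.substring(tab + 1).trim() };
    }
}
```

A reader built on this just applies it line by line to each archived file, using the file's timestamp as the sample time.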

The browser can handle the input, but needs a bit more work to be useful - it currently steps through the data in real time, which isn't ideal for what could be days of data sampled at multi-minute intervals. Ideally it would be able to step through the data, pause, and rewind. Next release, perhaps.

The chart and areachart subcommands can read in data and display a chart. For example, the cpu behaviour of my home desktop yesterday evening, from kstat -p output saved every 5 minutes, could be shown with:

./jkstat areachart -z /var/adm/ka/ \
cpu_stat:: user kernel idle
which looks like:

(Yes, there's an anomaly on the first datapoint.)

This is a preliminary implementation, so there are plenty of rough edges in need of a bit of polish, but it proves that the idea of reading archival kstat data is viable.

Wednesday, November 25, 2009

Is package management interesting?

My desktop workstation running Solaris 10 has almost 1600 packages installed. Many of the development systems I use have something close to 1000 packages installed. My Ubuntu (8.04, to be exact) desktop, installed fresh off the CD, has just over 1100.

Given 1600 packages, can a sysadmin manage that, or even concisely describe what is installed and what its function is? I suspect not. It's actually so bad that I suspect most of us don't even bother.

Tools don't really help here. If anything, by giving the illusion of ease of use, they encourage the growth in the number of packages, making the underlying problem worse.

Really, though, is managing packages interesting? I submit that it's not, and that looking at the problem as one of managing packages is completely the wrong question.

Instead, we should be managing applications at a higher level of abstraction. Rather than managing the 250 packages that comprise Gnome, we need to collapse that into a single item, which we can then open up into manageable chunks. A package may be the right level of granularity for a packaging system, but it's the wrong level of granularity for an administrator.

We should be thinking of applications and function, not the detail of how packages deliver that functionality. I want to be able to go to a system and ask "What do you do?" and have it come back and say "I'm a web proxy server and mail relay"; I don't want to sift through 500 packages and try to work out which of them are relevant. If I want to set up a wiki for collaborative document editing that authenticates against my Active Directory infrastructure, then I want to phrase the requirement in those terms rather than try to work out the list of components that I need for that task.

From this, the packaging details become uninteresting. What a package contains, the packaging software, the package names: all of these are less important, because they're just internal implementation detail.

The old Solaris installer had some of this - it defined clusters and metaclusters. The implementation didn't really help much - the definition of the contents of clusters and metaclusters was poor, and there was no support for the clusters once you were managing an already installed system. Also, what you really want is a system that allows clusters to be structured hierarchically (so you could take something like Gnome and either manage it as a single unit, or have the option of dealing with subunits like games or libraries), and to overlap (for example, you could imagine that the apache web server would appear in a whole lot of clusters).

One might be tempted to construct packages that use dependency information to bring in the packages they need. This approach is flawed: it doesn't cleanly separate groups from packages; it doesn't allow you to omit subgroups; and it makes removal of a group excessively difficult. Software clusters need to be cleanly layered above packages in order to allow each layer to best meet its own requirements.
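One way to picture that layering: a cluster is just a named set of package names plus optional subclusters, so clusters can nest (Gnome containing games and libraries) and overlap (apache appearing in many clusters), without the packaging layer knowing anything about them. This is a rough sketch with invented names, not any real tool's data model:

```java
import java.util.*;

// Hypothetical sketch of clusters layered above packages: a cluster is a
// named set of package names plus optional subclusters, so clusters can
// nest and overlap independently of the underlying packaging system.
public final class Cluster {
    final String name;
    final Set<String> packages = new LinkedHashSet<>();
    final List<Cluster> subclusters = new ArrayList<>();

    Cluster(String name) { this.name = name; }

    // All packages delivered by this cluster, including its subclusters.
    Set<String> allPackages() {
        Set<String> all = new LinkedHashSet<>(packages);
        for (Cluster c : subclusters) {
            all.addAll(c.allPackages());
        }
        return all;
    }
}
```

Because the same package name can appear in several clusters, removal of a group is a set operation at this layer rather than a fight with package dependencies.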

Beyond simply delivering files (via packages), a cluster could also contain the details of how to actually take the various bits and pieces and put them together into a functioning system. In fact, the whole area of application configuration is one desperately in need of more attention.

A quick summary, then: package management shouldn't be interesting, and we need to move forward to managing applications and configuration.

Tuesday, November 17, 2009

Not so lucky any more?

The Google home page has seen some changes of late. One of these is the removal of the "I'm feeling lucky" button.

One of the things I've noticed over the past few months is that searching online has become dramatically worse. Google is increasingly failing to find useful results, and when it does find something it will rather present you with multiple instances of the same thing (the same news article syndicated to different sources, for example) rather than the more useful list of independent answers. Commonly I end up going to subsequent search pages, and often don't get to anything useful at all.

Is Google losing its touch? Has dumb search had its day? Perhaps the "feeling lucky" option was removed because it almost never works any more.

Monday, November 16, 2009

So what is wrong with SVR4 packaging, really?

So, as might be predicted I suppose, some people wilfully disregarded the thrust of my argument, and turned it into a debate over specific packaging technologies.

OK, so that brings us to the question: what is so bad about SVR4 packaging, really?

I could go on for pages. One of the reasons I found OpenSolaris attractive was the prospect of being able to fix all the bad things with Installation and Packaging in Solaris. Let's take some of the comments, though:

old and clunky

Guilty as charged. Really, it is old. It is clunky. It's been neglected and unloved. It needs fixing. Those reasons alone aren't enough to dismiss it - the key question should be whether it can actually do the job.

missing lots of features

OK, so what features does dpkg have that SVR4 packaging doesn't? That's really the comparison - versus the dpkg or rpm commands.

enabler of dim-sum patching

Completely and utterly false. The problem with Solaris patching lies fairly and squarely in the domain of patching. This could trivially be solved without any changes to tools - either deliver whole packages, or simply institute a policy saying that you can't deliver changes to a package (or related set of packages) in independent patches. This is a process problem, not something inherent to the underlying package system.

Then there are more material objections:

no repository support

Actually, SVR4 packaging does have the ability to fetch and install packages from remote locations. (A crippling limitation of IPS is that it can't do anything else.) What's wrong with this picture is the lack of repositories - blastwave aside. Wouldn't it have been easier to simply make existing packages available on a web site, without having to retool everything?

lack of dependency resolution

As the success of dpkg/apt demonstrates, having your underlying packager do all the work is neither necessary nor desirable. What that does demonstrate is the requirement for more powerful tools above the base packager. Actually, separating the fancy interface from the tools doing the low-level work is probably a really good thing - it enables compatibility of the system over time even if the higher-level tools change, and it enables innovation by allowing independent components to be developed independently.

arbitrary scripting

Actually, one of the key weaknesses of SVR4 packaging is not that it supports scripting, but that the support it offers is pretty poor. There is no real support - there ought to be a strong scripting framework with a well-defined environment, and predefined functionality for common tasks. Oh, and trigger scripts would be nice. Banning scripting (yet allowing it to sneak in through the back door) fails to solve the problem, and encourages bad workarounds.

poor package names

The actual package names are pretty much immaterial. Assigning real significance to them would be false. What matters is that they're unique and follow a scheme. As a user, you might want to install "php" - all that means is that the software studies the package metadata and works out what packages to really install. Actually having a package with that name isn't necessary, and probably not even desirable (it locks you into current names and prevents evolution).
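That lookup could be sketched as a simple metadata table mapping a user-facing name to whatever packages actually deliver it. The NameResolver class here (and the SUNW package names in any usage) are made up for illustration:

```java
import java.util.*;

// Hypothetical sketch: resolve a user-facing name like "php" through
// metadata to the real packages, so no package need actually be called
// "php" and the real names can evolve freely.
public final class NameResolver {
    private final Map<String, List<String>> provides = new HashMap<>();

    // Record that a virtual name is delivered by these real packages.
    public void register(String virtualName, String... realPackages) {
        provides.computeIfAbsent(virtualName, k -> new ArrayList<>())
                .addAll(Arrays.asList(realPackages));
    }

    // What to really install; empty if nothing provides the name.
    public List<String> resolve(String virtualName) {
        return provides.getOrDefault(virtualName, Collections.emptyList());
    }
}
```

The point is that the mapping lives in metadata, so renaming the underlying packages never breaks what the user types.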

So, beyond a recognition in the codebase that the 21st century has arrived, and the lack of an apt-get/synaptic style front-end - all of which could fairly easily be remedied - what is really wrong with SVR4 packaging?

Friday, November 13, 2009

JKstat 0.32

Friday the 13th, and I wonder whether to hold off for a day.

But no, another JKstat release comes out.

Nothing major here, still the continuing cleanup process. The one change here is better exception handling, particularly in Client-Server mode. It's not perfect, by any means, but in the past I just dropped errors and exceptions straight in the bin.

The reason for doing so was that I wasn't keen on declaring Exceptions to be thrown - it seriously clutters up the whole API. But Fabrice Bacchella (thanks!) pointed out the obvious, that if I were to throw a RuntimeException then I wouldn't have to declare that I threw it. So the fix is to create a subclass of RuntimeException and throw that, and consumers that need to know can check for that (and can check for the specific failure - RuntimeException is far too generic).
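The pattern described is roughly the following - note that KstatException is an assumed name for illustration, not necessarily the class JKstat actually uses:

```java
// Unchecked exception: because it extends RuntimeException, methods that
// throw it need no throws clause, so the API stays uncluttered - yet
// callers that care can still catch this specific type rather than the
// far-too-generic RuntimeException.
public class KstatException extends RuntimeException {
    public KstatException(String message) {
        super(message);
    }

    public KstatException(String message, Throwable cause) {
        super(message, cause);
    }
}
```

A consumer in client-server mode would then wrap its calls in a try/catch for KstatException and inspect the message or cause for the specific failure.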

Tuesday, November 10, 2009

Into the sunset?

The announcement to EOL Solaris Express Community Edition (SXCE) was telegraphed well in advance, and we're coming to the end of the road with only a handful of planned releases left to look forward to.

But, is this just the end of the road for SXCE, or is it something bigger that's at stake here?

Read the Sun marketing and you might believe this is a glorious new dawn for the Solaris/OpenSolaris world. The reality may be more like sailing off into the sunset and disappearing from view.

The fundamental difference between the old and the new is installation and packaging, which have been ripped out wholesale and incompatibly replaced. Even if the replacements had been perfect (and, quite frankly, they fall a huge distance short) this would have been a huge challenge. Organisations (and individuals) are under huge pressure to retrench and consolidate. Adding additional technologies that they're expected to support is an uphill battle. Adding brand new (and essentially untested) technologies that they're going to have to learn from scratch makes it doubly hard.

If the next version of Solaris had been based around SXCE, with traditional deployment technologies - traditional packaging and jumpstart - then customers would have been able to start rolling it out tomorrow. Everything a customer knew, all their existing investment in skills and tools, would be preserved. New customers would be able to leverage the skills and expertise of existing customers. All the great features and functionality present in OpenSolaris would be there to be taken advantage of.

Contrast that with the planned OpenSolaris transition. You have to retrain all your staff, replace your entire toolset, and rebuild your entire systems deployment and administration infrastructure. Most environments are heterogeneous, so this means you now have an entire extra set of infrastructure to support - you aren't going to transition everything to the new scheme immediately, so you're going to have to shoulder the burden of supporting the extra scheme in parallel for years. Isn't the most likely course of action for a cash-strapped IT department with a CIO breathing down their neck to simply reject that and migrate everything over to RHEL?

Solaris and OpenSolaris contain fantastic technologies that make them a great choice for IT departments - ZFS, Crossbow, CIFS support, zones (especially sparse root zones), Dtrace, and many more - but by making deployment such an unattractive proposition we're making it far less likely that customers will try or use these technologies, and are giving organizations and managers every excuse to ignore Solaris and OpenSolaris as an option.

The best thing Oracle could do for Solaris and OpenSolaris would be to scrap the OpenSolaris distribution (but not the rest of OpenSolaris) and redirect our energies into building a better Solaris based on SXCE. If not, then I fear that Solaris will ride off into the sunset and be consigned to the wastebasket of superior technologies that failed due to bad strategic decisions, and that's a prospect that truly saddens me.

Sunday, November 08, 2009

Tinkering with the SVR4 packaging source

Take one look at the SVR4 packaging source in Solaris/OpenSolaris and it's clear that it's suffering from serious neglect. It's old. It's fragile. It's had stuff bolted onto the side over and over until it's a wonder that it works at all.

Yet, it's survived for the best part of two decades and every Solaris system uses it.

I recently had to fix up some packages on Solaris 10 that were in the deadly embrace of webstart. If you tried to remove them with prodreg, it said "oh no, you must use pkgrm", yet when you tried to use pkgrm it said "oh no, you must use prodreg".

So, we have access to the source, right? And, while I could have manually gone in and wiped out the errant packages by hand, I had a look at the SVR4 source to see if I could put together a version that actually did what it was told.

This was pretty easy. It took a little effort to get enough together that would compile cleanly on Solaris 10 again. And, having done that, and solved the initial problem, I did a bit more tinkering to remove some of the more obviously redundant code and apply some of the performance improvements that are sorely needed.

After thinking up a meaningless acronym, I've made the code I have available here: SPRATE. You're free to use it; anyone feeling masochistic enough to work on it is free to contact me.

Wednesday, October 21, 2009

Migrating old servers

Buried in a far recess of some organisations you'll find some really old systems. I've just acquired a new playground - not a Solaris 10 box in sight. What I do have is a lot of much older stuff, going back to:

SunOS jawdropped 5.5.1 Generic_103640-29 sun4u sparc SUNW,Ultra-1
SunOS gobsmacked 5.6 Generic_105181-39 sun4m sparc SUNW,SPARCstation-10

Ouch! I thought I had long seen the last of such hardware or Solaris versions.

Now, Solaris has excellent binary compatibility. I have some old SunOS 4 binaries from the 1980s that I use on a daily basis. So I'm expecting to be able to take a lot of what's present and simply move it onto new systems.

I'm also planning on using Solaris Containers to help make the transition easier. I would much prefer to do a proper move of the application to a newer system, but that takes time. The advantage of the Containers approach is that you can lift up a working system lock, stock, and barrel, and migrate the whole lot in one fell swoop - otherwise it can be quite a lot of effort to cleanly extract an application from an old (and usually undocumented) system.

Unfortunately there isn't a Containers implementation for Solaris 7, 2.6, or 2.5.1. So those have to be done the hard way.

Tuesday, October 20, 2009

java + picl = fail

Solaris has PICL - the Platform Information and Control Library - as a mechanism for storing and retrieving platform-specific information. It's structured as a tree, so a natural thought would be to load the information into java and use a swing JTree to present a graphical representation. (And would make a nice add-on to SolView.)

I wrote such a tool several years ago. It uses JNI to interface with the native libpicl, and it never worked properly. It still doesn't work properly - it crashes far more often than not. The crashes are consistent in the sense that if it crashes on a machine it will always crash in the same place (retrieving the same object from the picl tree), but are random in the sense that apparently nonsensical changes to the code will either kill or cure it.

Anyway, I'm just tossing it out there. It's called piclbrowser, and is available here. Anyone's free to take this and play with it, and is welcome to try and fix it.

Monday, October 19, 2009

Appalling Virgin "Broadband"

I've had a home broadband connection from Virgin - formerly NTL - at home for years. Until recently, the quality of the service was excellent - I could almost always get the full quoted speed, and we only had the occasional minor outage.

Recently, though, the connection has been terrible. It's been several weeks, almost a month now. I'm supposed to get 10M/s (that's 10 megabits). Trying the various speed tests that are available - including the ones recommended by Virgin themselves - I get 5% at best, usually rather less. And I suspect that those aren't telling the full picture, as the actual rate I'm getting in practice is worse. Web pages can take minutes to load. Watching TV programmes or gaming is simply a non-starter. I tried downloading something an hour ago and it's not even managed to get a single byte yet. Packet loss is 10% and up. It's not usage or time dependent.

(Fortunately, gmail seems to remain moderately responsive, but pretty much anything else is essentially down. Attempting to actually work from home is a disaster.)

Virgin Media's online support is awful. For starters, their web form is completely incapable of working out who you are even if you're logged in - so you have to fill in all sorts of account details. The first one I sent was acknowledged and then ignored. The next one got a standardized answer. I did what was asked, got given some more things to try, and did those too. (My connection is sufficiently poor that some of the tests don't even work.) Then no response in over a week. No response to my following up.

So I tried the telephone. Oh dear. After the usual rigmarole of giving them every single piece of information they already know we go through the standardized script. Have I rebooted the modem? Tried that, several times. Have I rebooted my computer? Yes. Have I checked for spyware? Yes. Am I running XP or Vista? Well, I have both, and they're slow as well, but they can't even spell Solaris. Can I run a speed test? I try that, and get almost 5% of what I'm paying for, which is actually better than most of the times I've tried.

The poor chap then has the temerity to tell me that my broadband connection is working perfectly, and that I need to call the PC helpline who'll help me fix the problem with my computer. I think I managed to stay civilized throughout the resulting exchange.

It's possible that there's a problem somewhere in the house, of course. I've tried to exclude that, and can't, because the test required to do that - direct connection of a machine into the modem - simply fails completely. I regard that as an interesting data point, but I'm only a sysadmin not a call-centre automaton reading from a script.

Faced with a combination of a terrible broadband connection, and useless customer service, I investigated alternative providers. Which leaves me stuck between a rock and a hard place, because my house must be a long way from the nearest exchange, so that using the BT line (which is the only other connectivity option) would take me down to 3M if I'm lucky. And I really do need the 10M I'm paying for.


Sunday, October 18, 2009

SNMP from java - updated

I've just uploaded Jangle 0.05, a minor update to my graphical java snmp client.

It's not just graphical now; I've added an snmpwalk subcommand to walk the snmp tree from the command line.

There's also the normal clutch of bug fixes and general code improvements.

One thing I've been playing with is trying to automate discovery of relationships. The snmp information is stored in a tree, but the last two nodes in the tree are inverted. Consider ifInOctets as an example: it has a child node for each interface, as does ifOutOctets. So just looking at ifInOctets gives you a list of interfaces, whereas what you really want is to generate a list of interfaces and then get all the data for that interface. You can get this by parsing MIB files, but I would like to be able to do this just by looking for patterns in the OID tree, and then look at the MIB afterwards to describe the structure. (One thing I want to avoid is having to ship all the MIBs for every possible device.)
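The inversion can be expressed as a simple regrouping: the walk gives you statistic -> (interface index -> value), and what you actually want is interface index -> (statistic -> value). A sketch with invented names (this isn't Jangle's actual code):

```java
import java.util.*;

// Hypothetical sketch of inverting the last two levels of the snmp tree:
// turn statistic -> (index -> value) into index -> (statistic -> value),
// giving one map of counters per interface.
public final class OidInverter {
    public static Map<String, Map<String, Long>> invert(
            Map<String, Map<String, Long>> byStatistic) {
        Map<String, Map<String, Long>> byInterface = new TreeMap<>();
        for (Map.Entry<String, Map<String, Long>> stat
                : byStatistic.entrySet()) {
            for (Map.Entry<String, Long> e : stat.getValue().entrySet()) {
                byInterface.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                           .put(stat.getKey(), e.getValue());
            }
        }
        return byInterface;
    }
}
```

Detecting which subtrees share this shape (the same set of child indices under sibling nodes) is the pattern-spotting part; the MIB then only needs to be consulted afterwards, to put names to the structure.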

OpenSolaris Constitution Update

The OpenSolaris Constitution is known to have a number of flaws. It's not well matched to the way the community actually works; it conflates local functional roles with global governance roles; and it's relatively long and complex.

At the last annual election, a revised constitution was proposed that aimed to eliminate those shortcomings - greatly simplified and separating governance from operations. It failed to pass. (Only narrowly, but it failed.)

The current OpenSolaris Governing Board decided early in its term of office that we would again offer a revised constitution to the OpenSolaris community, after taking an opportunity to correct any errors or issues that community members might be concerned about.

As announced, revisions to the constitution are now available for comment, and we invite all interested OpenSolaris community members to send us their comments. (Reviewing the release notes would be a sensible starting point.)

Sunday, October 04, 2009

SolView 0.48

It looks like my SolView 0.47 announcement was a tad premature. Testing was somewhat incomplete.

Version 0.48 fixes the problem, and is now available.

(The technical goof I made is like this: the solview wrapper script sets things like the LD_LIBRARY_PATH and CLASSPATH. However, if you just run solview with no arguments it pulls the CLASSPATH out of the jar file manifest instead. I had updated the wrapper correctly, but forgotten the manifest. I normally run the individual views in isolation, so hadn't noticed that one of the jar files was missing, resulting in the process view generating a stack trace rather than the useful output it was supposed to.)

So, to anyone who grabbed 0.47, you'll want to get 0.48. Sorry about that.

Saturday, September 26, 2009

JProc 0.2

I've been doing a little more work on JProc - a java jni interface to the Solaris procfs filesystem - to make it a little more complete and robust.

This has reached a convenient stopping point, so I've released version 0.2.

There are a couple of features that have been lifted from JKstat, namely the idea (and parts of the implementation thereof) that it's useful to manage a Set of processes as a unit, and that it's useful to be able to filter on process characteristics.

As a start, I've added filtering by user and zone. The demo application now has dropdown lists of users and zones, so you can select processes owned by a user, or running in a specific zone (or both) interactively. (They don't currently update as the list of users and zones changes - that's work left for the future.)
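Such a filter amounts to a pair of optional predicates over the process set. A minimal sketch, where the JProcess class is a stand-in for illustration rather than JProc's actual API:

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch of filtering a Set of processes by user and zone.
public final class ProcessFilter {
    public static final class JProcess {
        final int pid;
        final String user;
        final String zone;
        public JProcess(int pid, String user, String zone) {
            this.pid = pid; this.user = user; this.zone = zone;
        }
    }

    // A null user or zone means "don't filter on that attribute",
    // so the two dropdowns can be applied independently or together.
    public static Set<JProcess> filter(Set<JProcess> procs,
                                       String user, String zone) {
        return procs.stream()
            .filter(p -> user == null || p.user.equals(user))
            .filter(p -> zone == null || p.zone.equals(zone))
            .collect(Collectors.toSet());
    }
}
```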

I've also added lookups of username, group, and zone name. This is a simple jni interface to getpwnam and friends.

One other feature I added was to convert a project id to a name. This proved rather more entertaining, as I discovered that the lookup functions are buried off in libproject rather than libc. While that of itself isn't necessarily an issue, libproject drags in quite a lot of subsidiary libraries. While in normal use they use lazy loading so that you don't drag in the extra libraries until you need them, it looks like jni drags them all in regardless. I'm not sure how useful project name/id lookups are going to be, so I may simply remove that bit of functionality.

SolView 0.47

One of the things I've been trying to do with SolView is to provide a user with multiple views of a Solaris system from all angles. I'm not going to claim I've succeeded, but I've released another SolView update with a couple of new features.

The first is to add charts from JKstat that show behaviour over time - visible in the processor and network view. I've put quite a bit into the chart support in JKstat, and was pleased that it was so easy to use it to add charts to SolView. (It's not entirely perfect - it would be nice if the colors in the cpu chart matched the cpustate widget, for example. And it's a shame that I don't collect history so that each time you look at a chart it starts from scratch.)

The second is a Process view from JProc. Which is what I wrote JProc for in the first place.

Apart from that there's the usual raft of bugfixes, and there's a little more polish. Enjoy!

Sunday, September 13, 2009

JKstat - interactive chart builder

In the latest update to JKstat I've introduced a new piece of functionality I've been wanting to get done for a while.

JKstat is pretty good at producing charts, but - until now - you've either had to take what it gives you or delve into the code. No longer. As of version 0.30, if you

./jkstat chartbuilder
you'll get a window just like this:

A kstat can be selected from the tree; you can then choose the statistic or statistics of interest, and then get a chart of either the rates of change or the values of the chosen statistic(s).

You can also choose whether to produce a line chart of the various statistics, or a stacked chart, with statistics in solid color stacked atop each other.

You can then choose whether to just show this individual instance of a kstat, or to show all the available instances (so you can, for example, produce a chart showing all the disks or all the cpus, just by picking one).

And you can decide whether to show the individual kstats, or whether to show an aggregate over all the kstats.

So, the following chart is generated by selecting cpu:0:sys from the tree, checking all instances, and asking for an aggregated, stacked chart of the cpu_nsec_idle, cpu_nsec_user, and cpu_nsec_kernel statistics, showing the relative proportions of user, system, and idle time on my system.

(If you're interested, the spike of user activity at about 21:19 is a build running; the huge splodge of kernel time just after 21:27 is VirtualBox starting up a VM.)

Tuesday, September 08, 2009

Everything Virtual

Virtualization is one of those hot IT topics, but the reality is that - in the general sense - virtualization has been around for what seems like forever.

I define virtualization as the abstraction of logical resources away from physical resources.

For example, NFS (the Network File System) allows you to access your data from any system, not just the one that happens to be physically connected to the disks on which the data is stored.

The X Window System allows you to display applications on a different host to the one where they might be running.

VNC (Virtual Network Computing) is one of many technologies that allow you to access a complete desktop on a different system to that where it's running.

The various types of Virtual Desktops supported by X11 window managers allow you to manage and display applications independently of the physical screen that you may be sitting in front of.

Solaris Zones allow you to construct an operating system instance that, while any particular instantiation is locked to a piece of hardware, is logically distinct. It's relatively trivial to pack up a Zone and redeploy it on new hardware, and the application layer doesn't notice. Again, a level of abstraction from the physical.

Interestingly, the current virtualization fad doesn't quite work the way you would expect based on the definition I started with. Looking at the various virtual machine technologies currently in vogue, while the virtual machines are abstract from the underlying hardware, the abstraction that's presented to the user is that of physical systems. This doesn't introduce anything new, it just gives us the same old IT infrastructure we've always had, just replicated one layer up in the stack. Useful, but not revolutionary.

And the cloud doesn't really change that, because all you're getting with a cloud is an extra layer (and a whole new charging model) underneath your virtualization layer. Everything may be virtual, but in many cases all we're doing is finding new ways to implement the physical.

Wednesday, September 02, 2009

xfce-taskmanager on Solaris

XFCE has a little task manager application. It's not top or prstat, but it's reasonably lightweight and integrated.

(It also allows you to click on a column heading to sort. That's one of the things that JProc does as well and that you take for granted in a graphical application.)

The original task manager was linux-specific, and the way it wandered through /proc was never going to work on Solaris. But it was easy enough to produce a Solaris version that seems to work well enough for me. Source here.

Monday, August 17, 2009

JProc - procfs from java

As if accessing kstats from Java wasn't enough, I've recently been playing with accessing process information - specifically the /proc filesystem on Solaris - from java. Thus was born JProc.

The idea, eventually, is to link this with JKstat and SolView to give a more complete view of what's happening on a Solaris system.

The obligatory screenshot:

I think the correct description here is "work in progress".

Wednesday, July 22, 2009

Sane terminal behaviour with PgUp and friends

In gnome-terminal on Solaris 10, if you hit the PgUp key, it scrolls the window up. The PgDn, Home, and End keys similarly move the scrollbar.

That's the behaviour I want. It's also what you've traditionally had with xterm, dtterm, and the like.

Recently, it's become fashionable to pass PgUp and friends through to the application, so you get the application running inside the terminal emulator to do its own interpretation, and require the shift key to be pressed in order to get the terminal window itself to scroll.

I find that incredibly disruptive. I don't run anything inside a terminal that does its own scrolling (other than more, and I don't need to reach as far as the PgUp key to manipulate that). Most of the more advanced applications I run are graphical anyway.

So, how to fix? In my exploration of xfce, the Terminal application uses a newer version of vte which suffers from the above problem. So I set out to fix it, and it was relatively simple.

I used vte-0.20.5, and if you look at around line 5055 of vte.c, you'll see some code that looks like:

case GDK_KP_Page_Up:
case GDK_Page_Up:
        if (modifiers & GDK_SHIFT_MASK) {
                vte_terminal_scroll_pages(terminal, -1);
                scrolled = TRUE;
                handled = TRUE;
                suppress_meta_esc = TRUE;
        }
        break;

simply remove the test for the shift:

case GDK_KP_Page_Up:
case GDK_Page_Up:
        vte_terminal_scroll_pages(terminal, -1);
        scrolled = TRUE;
        handled = TRUE;
        suppress_meta_esc = TRUE;
        break;

for all four keys of interest, and rebuild.

My life is so much better as a result.

It's a shame that's not the default, but there are others who rely on the opposite behaviour. It would be nice if it was customizable - it's OK for me to build a separate copy for my own use on my own machine, but that's not very flexible. It would be neat if it could autodetect the terminal mode and do the right thing.

XFCE on Solaris 10

I'm getting somewhat dissatisfied with the various desktop environments available. The desktop is currently in the doldrums, presenting a dull and dreary landscape.

So, I thought I would give XFCE another look. It turns out to be a little bit of work to get it running on Solaris 10: there's a bunch of prerequisites you need to build first, and a couple of tweaks. A bit tedious, but I got it working without much trouble. My install notes are available to help others down the same path.

What I've got is essentially functional. I've not worked out why some of the menu and application icons don't appear, and a few of the window manager themes don't quite work (you get strange visual artifacts). But it meets my requirements (which may differ from other people's) of being easy to use, has a reasonable choice of themes, isn't as heavyweight as some of the other desktops, and shows promise for the future.

Friday, July 10, 2009

Testing ZFS compression

One of my servers (it's a continuous build server) generates a significant amount of logfiles. About 150G so far, growing at over a gigabyte each day.

Now, logfiles tend to be compressible, so in preparation for migrating it to a new host (it's a zone, so moving from one physical host to another is trivial) I tested ZFS compression to see how well that would work.


Now, the default lzjb gives you fair compression. That saves 80% of my storage, so that's pretty good.

Going to gzip compression, the space saving is spectacular. Unfortunately the machine runs like it's stuck in treacle, so that's not exactly practical.

Which is why I tried gzip-1. You save much more space than lzjb, although not as much as the default gzip (which is gzip-6). Unfortunately, while the machine doesn't quite go as comatose as it does with the regular gzip compression, the performance impact is still noticeable.

Overall, while I wouldn't mind the extra compression, the performance impact makes it hard, so I'm going to stick with the regular lzjb.
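As a footnote (not from the original post), the "saves 80%" figure above is just the compressratio property that ZFS reports, expressed the other way round. A small awk sketch converts between the two; the 5.06x input is an illustrative value, since a ratio just over 5x corresponds to roughly 80% saved:

```shell
# Convert a ZFS compressratio value (as reported by `zfs get compressratio`,
# e.g. "5.06x") into the percentage of space saved.
ratio_to_saving() {
    echo "$1" | awk '{ sub(/x$/, ""); printf "%.0f\n", (1 - 1/$1) * 100 }'
}

ratio_to_saving 5.06x    # prints 80
```
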

Sunday, June 14, 2009

JKstat, meet JavaFX

One of the announcements out of CommunityOne was the availability of JavaFX for Solaris and OpenSolaris - if you're lucky enough to be running on x86, anyway.

So the first thing that occurred to me was - could I use JavaFX to make a graphical front-end to JKstat?

(And the same thought occurred to Ben Rockwood as well, who asked me how easy it would be and pushed the thought from idle fancy to a must try, especially after I glibly said - "sure, dead easy".)

So the above image is jcpustate using the JKstat core api rendered with JavaFX. Of itself, it's nothing fancy, but it's served its purpose: I've got a rudimentary understanding of JavaFX out of it; I've worked out how to integrate JKstat and JavaFX; and it validates that the JKstat core api does its job.

You can now download JKstat 0.29. Look in the jkstatfx directory; the source code for the above example is JCpuStateFX.fx.

I had a couple of issues getting this to work. I found I had to put the JNI shared library directly into the right javafx lib directory - setting LD_LIBRARY_PATH didn't seem to work. The other was that JavaFX doesn't natively support either Java Collections or Generics. It clearly understands them - it tracks the types of the objects in my Collections correctly - but it doesn't understand how to cast objects back to the right type. I ended up having to use the toArray(T[] a) method to create a Sequence of the right type.

I have to say that this wasn't the feature I had originally planned for this version of JKstat. I've been working on an interactive chart builder so you're not limited to the fixed and limited set of charts that JKstat creates on its own, but that will have to wait a while for the next release.

Friday, May 29, 2009

Heading off to CommunityOne

Need to start packing as I'm off to CommunityOne.

It's going to be a busy week. Starting off, Sunday I'll be at the Open HA Cluster Summit, on Monday is CommunityOne, and the OGB will be out in force. If you're around, there's an OGB Town Hall at 6pm, before the OpenSolaris party.

Tuesday there's more CommunityOne, including the Deep Dives and the Crossbow BOF, then on Wednesday I'm hoping to meet up with the OSUG Bootcamp.

And, perhaps the most important thing about getting to events like this, I hope to meet as many people as possible in the few days I'm there.

Friday, May 15, 2009

Reports from the community for CommunityOne

The OGB will be at CommunityOne in a few weeks. If you would like to get involved, there's still time (just) to register - OpenSolaris will have a big presence!

On the Monday, June 1st, we're holding a Town Hall at 6pm, and we invite any and all members of the OpenSolaris community to come along, meet us, talk to us, give us feedback about OpenSolaris and how we the OGB can help in making the OpenSolaris community better.

(It's somewhat unfortunate that there's something else going on at 6pm that involves free alcohol or something like that. Hard choice, huh?)

As part of the OGB presence at CommunityOne, we're looking to display reports from communities, projects, and user groups that make up our community. What I'm thinking of here is lots of individual highlight slides we can put on display boards (so they ought to be self-contained on a single page). Bullet points, and pictures make it look better. I'll also put them all together into a single presentation (we might have that running in a permanent loop on a screen, for example) that anyone can take and use.

Longer term, I see this as a first run of a regular reporting process. The OpenSolaris community is doing great work, and we ought to all be aware of the great work that's going on and highlighting our achievements so they're more widely known.

Saturday, April 18, 2009

What can the OGB do for you?

It's a great honour to be elected to the OGB, and I want to thank everyone who took the trouble to vote.

So, what is the OGB going to do this year?

Let's start by looking at the Charter. This defines what the OGB has to do. And if you look at that, then the job of the OGB - above all else - is to construct and maintain the OpenSolaris Constitution. Much of the Charter is about the Constitution, and there's also quite a bit about the structure of the OGB itself (and a lot of that is the initial bootstrapping of the OGB when the OpenSolaris community was created and we started with the CAB).

Past that, the OGB then manages the community according to the Constitution. There are clear problems with the current Constitution, which is why a revised version was proposed. Having failed to pass, we're going to have to do something, because the current Constitution is an obstacle that's getting in the way of developing the community, but how we deal with that is another matter.

Note that the OGB is quite constrained in its operations. Both the Charter and the Constitution are reasonably specific about what we are supposed to do. In particular, it's pretty explicitly clear that the OpenSolaris we govern is not OpenSolaris the Sun distribution, nor is it OpenSolaris the codebase, it's just the OpenSolaris community.

However, I'm going to interpret the Charter slightly more broadly. At the beginning, it says to "manage and direct an OpenSolaris community in its efforts to improve upon and advocate in favor of OpenSolaris". There are two phrases there that are key.

The first is manage and direct - and clearly we the OGB cannot in fact direct any member of the community to do anything. What we can do is put into place management processes that will allow the community to do its work more easily; it's up to the community to use them.

One of the things we have talked about (it's come up before, but I think we want to push it again) is some level of reporting. We should expect every structural part of the community to produce regular status reports, so that we know what's going on. At the very least, that a project or community is still alive. Beyond that, these reports would promote wider awareness of what each group is doing, and would also give groups an opportunity to make bottlenecks known so that they can be acted upon. There are problems that are well known already, and we need to make sure that progress in those areas is maintained - and is more visible so that people don't think the problems are being ignored.

The second key phrase is advocate, and I would like to see the OGB more prominent as advocates and champions of OpenSolaris. On a personal note, this is difficult for a quiet reserved Englishman such as myself, but it needs doing. And at least some of the other OGB members (I'm not going to say all, because I certainly can't presume I can speak for them) feel that the OGB should be taking a more active role. If we don't, then call us to account.

Going back to the title - and given the limitations on the real power that the OGB has - what can the OGB do for you?

Friday, April 17, 2009

Backups done right?

I'm fed up with NetBackup. It vaguely works most of the time, but not all (which makes it difficult to justify it as a backup solution). And it's sucking up a lot of time - management and configuration takes a lot of work.

My experience with Legato has been great. It's incredibly easy to manage, just works, and sits there in the background without needing me to constantly mother it. It's not perfect - the latest java-based GUI is pretty horrid, and my experience of it in non-unix (ie. Windows) environments is that it's a duck out of water.

So, I'm looking around. I'll probably give Amanda a whirl. So if anyone has experience of both Amanda (or Zmanda) *and* one or both the above options, then feel free to comment - I'm particularly interested in comparisons.

Sunday, March 22, 2009

Newer != Better

We all know that just because something is new doesn't mean that it's better than what's gone before. A couple of examples I've had the misfortune to experience first-hand recently emphasize this:

Firefox 3 is dramatically inferior to Firefox 2. Not only does it feel much more sluggish, but the URL bar (in particular the drop-down menu) is terrible. Yes, the oldbar extension removes some of the maddening irritations of the look and feel, but the list of URLs presented is completely broken, to the point where it's worse than useless. I've not upgraded every machine I have, and the overall experience on the machines I have upgraded is pretty poor - so much so that I'm tempted to revert.

I've used emacs for decades. (And EDT/EVE/TPU before that.) For many years I've stayed with emacs 19.34, because it worked. Recently I've switched to emacs 22, because it's newer, maintained, and is what tends to be found on other systems. Again, the experience is staggeringly poor. Not only is it much slower but some of its features are just plain stupid. One of the more idiotic features I came across recently was that it assumes that a file named with a "zone" extension must be a DNS zone. (As a Solaris sysadmin, a zone is likely to be something else.) And if the file doesn't parse as it thinks a DNS zone file should then you are completely unable to even save the file - it's so smugly superior.

The world is full of other examples. Why do we put up with it?

Saturday, March 21, 2009

Agile SysAdmin

A while back we had a talk at LOSUG about Agile System Administration. While presenting a particular view, I think Gordon was also trying to get us systems administrators to think more about the theory underlying the work that we do.

Development has formal techniques - Agile, XP, Scrum, pair programming, test-first. And note that these are at a different layer from the actual programming skills required to write the software.

Does Systems Administration have the same sorts of practices? Does it need them? Do the concepts of Agile Development translate into Systems Administration?

Gordon has a blog discussing some of his thoughts on the subject. I encourage those of you interested in the subject to read and contribute. (The home page of the blog is just the welcome; you'll need to look at the Archives and Categories to find the actual blog entries.)

For what it's worth, I think somewhat differently, preferring flexible and lightweight (preferably zero-touch) minimalist administration over mandated standardization and heavyweight processes.

Sunday, March 08, 2009

SNMP GUI - Jangle 0.03

A little bit more work (thanks to all for comments and encouragement!) and a new version of Jangle is available.

Some highlights:
  • Prototype tree view
  • User-defined MIB location (from the File menu)
  • Some error handling
  • Work done in the background for faster startup and better responsiveness
  • The graph uses the human-readable name, where we can find one, rather than the numeric OID


Monday, March 02, 2009


One of the comments I got yesterday was that my new Java SNMP browser didn't work on Ubuntu. That wasn't entirely a surprise - with such a rough first cut I was astonished that it worked at all, even for me.

But anyway, that was easy enough to fix and I've released a new version that does run on Ubuntu (verified under 8.10) and that ought to be more sane and portable generally. One thing I need to work out is where the MIB files tend to be found on different systems.

I've renamed it to Jangle. It's possible to think of this as

Java Assistant for Networked Graphs that's, Like, Easy

but I wouldn't want you to read too much into it. Really.

Download, and run ./jangle browser

(The new name has more zip, is less generic, and less likely to conflict with other tools.)

Sunday, March 01, 2009

A Java SNMP client

I've been looking at SNMP a bit more closely recently. One thing that has struck me is that while there are a number of java based SNMP clients, none of them seemed to be complete, and I couldn't obviously find one that could produce charts easily.

This is easy, right? So, a little while later, snmpgui was written.

OK, so this is version 0.01. I've been able to retrieve a list of things to monitor (that's the long list of OIDs above). Clearly that needs prettifying a bit - a tree view would be a nice next step. And I can retrieve values and produce pretty charts. That's incoming bytes on my main network interface.

It's a simple proof of concept - but it proves that something can be done pretty easily, and that it works.

SolView 0.46

I've uploaded a new version of SolView. This broadly corresponds to the version I talked about at LOSUG, with one new feature.

I've improved the Jumpstart Profile Builder by adding the ability to import an existing jumpstart profile. The idea is to use this as an initial customization of the set of installed packages that you can then modify further.

In practice, this didn't work out anything like as well as I wanted it to. One of the constraints I place upon the construction of a profile is that the package selection be self consistent: package dependencies must be satisfied. (Indeed, it was the desire to be able to construct such profiles that led to writing the profile builder in the first place.) However, most existing profiles have customizations that lead to unsatisfied package dependencies. Often, fixing up such profiles as they are imported ends up simply undoing all your customizations.

The current implementation is just a first pass, so I could improve it. For example, if it encounters a dependency error on import, it could ask you whether you want to keep the customization (and add or remove other packages as necessary) or ignore it. The problem is that this gets you into a horrible rat's nest very quickly.

In practice, I find it easier to build a new profile based on an existing one by eye - looking at what I'm trying to add or remove and replicating that in the jumpstart profile builder.

So I'm tempted not to invest too much time on this feature. However, one thing I should be able to do relatively easily is to take the import code and use it to build a profile based on a straight list of packages. For example, this could be used to generate a jumpstart profile that would reproduce an existing installed system.
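As a rough sketch of that last idea (the package names here are made up; the "package NAME add" lines are standard jumpstart profile syntax), turning a straight list of packages into profile entries is essentially a one-liner:

```shell
# Turn a plain list of package names (one per line) into the
# "package NAME add" entries a jumpstart profile expects.
# SUNWcsu and SUNWcsr are illustrative package names; on a live system
# the list could come from pkginfo output instead of printf.
printf 'SUNWcsu\nSUNWcsr\n' | awk '{ print "package", $1, "add" }'
```
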

Friday, February 27, 2009

JKstat 0.27

I pushed out JKstat 0.27 a few days ago. This is just a minor update to 0.26.

I've been playing with the client-server mode a little, and while the basic code hasn't changed it's now a bit cleaner. I added a jkstat.bat and have been running the client on a Vista laptop quite successfully. I also enabled the ability to embed the client inside tomcat in addition to running the basic XML-RPC webserver.

I also fixed a regression spotted when I updated SolView. I had changed the name of the shared library in 0.25, but 0.26 had the old name again. The reason is simple forgetfulness. I develop on an x86 machine, which normally isn't a problem, but if I do any work on the native code then I need to copy across to a sparc machine to do a recompile. I did that just before uploading, and must have forgotten to copy the final version back to my development system. Doh!

Sunday, February 15, 2009

Simple Web Services

One of the things I like about XML-RPC is that it really is astonishingly easy to implement. I like to be able to understand the code I write - and that includes the things I reuse from elsewhere. And with XML-RPC, it's even better - it's normally possible to parse the XML that comes back by eye.

Contrast that with the monstrosity that SOAP has evolved into - a huge morass of complexity, massive code size, bloat everywhere, and it's really turned into a right mess. There's been nothing simple about SOAP for a long time now.

The snag with XML-RPC is that there are a couple of rather crippling limitations. In particular, you're stuck with 32-bit integers. I've had to enable vendor extensions in order to use 64-bit longs, and while that's fine for my client code, it makes the JKstat server far less useful for clients using other frameworks and languages.
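To illustrate (the i4 form is from the XML-RPC specification; the 64-bit tag is how Apache's vendor extensions serialize longs - shown here as ex:i8, though the exact form depends on the library version):

```xml
<!-- standard XML-RPC: integers are 32-bit only -->
<value><i4>42</i4></value>

<!-- Apache vendor extension for 64-bit longs; other XML-RPC
     implementations won't understand this -->
<value><ex:i8>4294967296</ex:i8></value>
```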

So I'm stuck in a hole. I'm going to stick with XML-RPC for now, as it will be good enough to test with and is really easy to develop against. (And the way I've implemented the remote access makes it trivial to put in new or replacement access services.)

What's the way forward? I'm not at all keen on building a full web-services stack - that defeats the whole object of the exercise. There has been some discussion recently of exposing the kstat hierarchy via SNMP, but that's clunky and depends on SNMP (great if you use SNMP already, less great if you don't). I've been looking at things like JSON-RPC and RESTful web services as alternatives. The simplest approach may just be to encode everything as Strings and convert back.

Thursday, February 12, 2009

Client-server JKstat

One of the things I've wanted to add to JKstat for a while is some sort of client-server capability: the ability to run a server process on a machine and connect to it from a remote client which runs the GUI.

This separation of responsibility offloads the graphical processing associated with displaying the output from the server onto your desktop device. And that desktop device need not be running Solaris. Although I haven't tested it myself, there's no reason why you can't now run the client on - say - Windows.

So, how do you make this work? Having downloaded the latest (0.26) version of JKstat, unpack it, and on the server machine you wish to monitor:

./jkstat server -p 7367

where you can choose a port number, and on the client machine where you want the kstat browser to run

./jkstat remotebrowser -s http://server_name:7367/

so it knows where to look for the server.

The current implementation is very much an experimental proof of concept. There's no access control implemented at present, for example. And the way it operates is bound to change.

At the moment, it's using Apache XML-RPC. I like XML-RPC because it's very simple to use, pretty fast, and lightweight. There are also - in principle - implementations for other languages so you don't have to use Java.

And the Apache XML-RPC implementation can either be standalone (that's the way I'm using it right now) or embedded in a servlet container. The latter would allow you to put pretty well any access control mechanism you like in place.

I'm not particularly fixated on either XML-RPC or this specific implementation, though. It was chosen first because I've used it before and know it's easy to implement - allowing me to concentrate on the JKstat-specific part and see how well (or badly) client-server works at all before investing a lot of effort.

In practice, XML-RPC is a pretty good match. It implements Lists (or arrays) and Maps (or hashes) as native constructs. I don't need any more complex data structures. The weakness is that while all I need to pass in those structures are Strings and Long values, the latter require vendor extensions, which I've had to turn on, breaking the portability aspect.

Tuesday, January 27, 2009

Compression going really fast

The Sun coolthreads boxes are great for throughput, but not all that hot at single-threaded performance. One thing that seems to go particularly badly is compression.

Enter pbzip2

# pbzip2 foo.tar
50187.77u 156.31s 6:39.39 12605.2%

It can fill up all 128 threads, and something that would otherwise take over half a day can be done in just over six minutes.
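The arithmetic behind that claim falls straight out of the time(1) output above - 50187.77 seconds of user CPU squeezed into 6:39.39 (399.39 seconds) of wall-clock time:

```shell
# Average number of hardware threads kept busy during the run
awk 'BEGIN { printf "%.0f threads busy\n", 50187.77 / 399.39 }'

# What the same work would cost single-threaded, in hours
awk 'BEGIN { printf "%.1f hours single-threaded\n", 50187.77 / 3600 }'
```

That's around 126 threads busy on average, and nearly 14 hours of CPU time - hence "over half a day" done in minutes.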

Friday, January 02, 2009

Zones, multiple interfaces, and routing

Some things are reasonably obvious in hindsight. This was one of them.

I've been consolidating some old applications into zones on a Solaris server.

Some of them were on physical servers, some were already in zones on other hardware. It turned out that the applications I was consolidating lived on two different subnets, and I didn't really want to go to the trouble of changing IP addresses.

No problem. The T5140 I was using has multiple interfaces, so I connected one of the unused interfaces to the second subnet and gave it an address (the server's primary interface was already in the first subnet I was using).

Then configure up the zones, remembering that you need to choose the correct network device depending on which subnet the zone is in.

And the zones didn't work. Bother. What did I forget? This:

At least one of the network interfaces used by a zone needs to have a default route associated with it.

Specifically, that second network interface needs to have a default route added to it. For the main host, it didn't matter - it will route packets over whichever interface it needs to. But if a zone is only associated with the second network interface, it can't use the default route associated with the first interface.

I add routes explicitly, so just a quick manual

route add net default

to add a default route for the second interface did the trick - you can have multiple default routes and Solaris will always use the right one.

To make this permanent, just add multiple lines to the /etc/defaultrouter file.
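For example (both gateway addresses below are made up for illustration), /etc/defaultrouter ends up with one gateway per line, one for each subnet:

```
192.168.1.1
192.168.2.1
```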