Monday, May 20, 2013

Sparse root zones in Tribblix

Zones were one of the pillars of Solaris 10 (the others being DTrace, SMF, and ZFS). Lightweight virtualization enabled deployment flexibility and significant consolidation.

The original implementation was heavily integrated with packaging; in many ways, it broke the packaging system. In OpenSolaris and Solaris 11, packaging was completely replaced and the zone implementation is very different, but it suffers from the same fundamental flaw: it's integrated at the heart of packaging.

Furthermore, sparse-root zones (where most of the operating system is shared between zones, with just configuration and transient files being unique to each zone) do not exist in the new world order; each zone is now a separate OS instance. The downside to this, apart from requiring significantly more RAM and disk, is that you then have to manage many instances of the OS rather than just one.

In Tribblix, I have reimplemented sparse-root (and whole-root) zones, so that they look very similar to what you had in Solaris 10. The implementation is completely different, though, in that it expects zones to understand packaging rather than expecting packaging to understand zones.

Read here for how to create a sparse-root zone using Tribblix. What follows are some of the under-the-hood details of the implementation I've put together.

First, zone configurations are stored in /etc/zones. If you look on a system that supports zones, you'll see a number of XML files in that directory. Some correspond to the zones configured on the system; others are templates. For a sparse-root zone in Solaris 10, there will be some inherited-pkg-dir entries. In the Tribblix implementation, these simply become loopback mounts, handled just like any other mount.
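As a rough illustration of that translation, the Python sketch below rewrites inherited-pkg-dir elements in a zone XML document into read-only lofs filesystem entries. The element and attribute names (inherited-pkg-dir, filesystem, special, directory, type, fsoption) follow the Solaris 10 zone XML format; the function name ipd_to_lofs is hypothetical, and mounting read-only simply mirrors how inherited-pkg-dirs behaved. This is not the actual Tribblix code.

```python
import xml.etree.ElementTree as ET


def ipd_to_lofs(zone_xml: str) -> str:
    """Replace each <inherited-pkg-dir directory="X"/> with a read-only
    lofs <filesystem> entry for the same path (hypothetical helper)."""
    root = ET.fromstring(zone_xml)
    # findall() returns a list, so modifying root inside the loop is safe.
    for ipd in root.findall("inherited-pkg-dir"):
        directory = ipd.get("directory")
        fs = ET.Element("filesystem", {
            "special": directory,    # path in the global zone
            "directory": directory,  # mount point inside the zone
            "type": "lofs",          # loopback mount
        })
        ET.SubElement(fs, "fsoption", {"name": "ro"})
        # Put the new element where the old one was, keeping document order.
        position = list(root).index(ipd)
        root.remove(ipd)
        root.insert(position, fs)
    return ET.tostring(root, encoding="unicode")
```

For example, passing in a zone element containing `<inherited-pkg-dir directory="/usr"/>` returns the same zone with a lofs filesystem element for /usr in its place.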

Then under /usr/lib/brand you will find a number of directories containing scripts to manage zones. Some of these are shared; others are specific to a given brand. I've created a sparse-root and a whole-root brand, along with the scripts to build zones of the correct type.

The key script is called pkgcreatezone, which is the script called to actually populate an empty zone with the bits that will make it work. (It's not called that in Solaris 10 - there you'll find a binary that calls another binary from Live Upgrade to do the work. But in OpenSolaris and Tribblix it's just a script.)

For the ipkg brand, the pkgcreatezone script sets a bunch of IPS variables and creates an IPS image followed by a bit of cleanup. Really, it's nothing complicated.

For the sparse-root brand, the main /lib, /usr, /platform, and /sbin directories are mounted from the global zone, so you can ignore those. Some standard directories you can simply create. And then all I do is cpio the /etc and /var directories into the zone's file system, and that's it. Well, not quite. I actually use the SVR4 contents file to provide the list of files and directories to copy, so that I copy only what's supposed to be there rather than random junk. One advantage of SVR4 packaging here is that it saves a pristine copy of editable files, so I put that copy in the zone rather than the modified one. All in all, it takes a couple of seconds to install a zone on a physical system, which is far quicker than the traditional zone creation method.
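A simplified sketch of the selection step, assuming the usual SVR4 contents file layout where each line starts with the path and the file type (links written as path=target). Only those first two fields are parsed; zone_copy_list, copy_source, and the pristine_dir parameter are hypothetical names, and pristine_dir stands in for wherever the saved pristine copies actually live.

```python
from typing import Iterable, List, NamedTuple, Optional


class Entry(NamedTuple):
    path: str
    ftype: str              # SVR4 type: d=dir, f=file, e=editable, v=volatile, s=symlink, l=hardlink
    target: Optional[str]   # link target for s and l entries, else None


def zone_copy_list(contents_lines: Iterable[str],
                   roots=("/etc", "/var")) -> List[Entry]:
    """Select entries under the given roots from an SVR4 contents file.
    Simplified parser: only the path and type fields are read."""
    selected = []
    for line in contents_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue
        path, ftype = fields[0], fields[1]
        target = None
        if ftype in ("s", "l") and "=" in path:
            path, target = path.split("=", 1)
        # Match the root itself or anything beneath it, but not e.g. /etcetera.
        if any(path == r or path.startswith(r + "/") for r in roots):
            selected.append(Entry(path, ftype, target))
    return selected


def copy_source(entry: Entry, pristine_dir: str) -> str:
    """Editable files come from the saved pristine copy (hypothetical
    location); everything else is copied from the live path."""
    if entry.ftype == "e":
        return pristine_dir.rstrip("/") + entry.path
    return entry.path
```

The resulting list could then drive the copy itself, for instance by feeding the paths to cpio in pass mode (cpio -pdm).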

I stumbled across an unfortunate gotcha while doing this. SMF manifests used to be in /var (which was always an odd place to put what are configuration files). They're now in /lib, which is again a very odd place to put configuration files. The unfortunate consequence is that, because /lib is loopback mounted into a zone, all the SMF manifests in the global zone get imported into the zone, even though many of them are for services that aren't relevant to a zone, and some of them fail outright with errors. So I had to create a clone of /lib, delete all the manifests that aren't relevant, and use that as the source for the zone's /lib (that's what the /zonelib directory is about, by the way).
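The /zonelib step might look something like the following Python sketch. It assumes manifests live under svc/manifest inside the /lib tree and that the caller supplies the set of manifests to keep; build_zonelib and the keep set are hypothetical, since the post doesn't list which services are relevant to a zone.

```python
import os
import shutil


def build_zonelib(lib_src: str, zonelib: str, keep: set) -> list:
    """Clone lib_src to zonelib (which must not exist yet), then delete every
    .xml manifest under svc/manifest whose path relative to that directory
    is not in keep. Returns the removed manifests, sorted."""
    shutil.copytree(lib_src, zonelib, symlinks=True)  # preserve symlinks as links
    manifest_root = os.path.join(zonelib, "svc", "manifest")
    removed = []
    for dirpath, _dirs, files in os.walk(manifest_root):
        for name in files:
            if not name.endswith(".xml"):
                continue
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, manifest_root)
            if rel not in keep:
                os.remove(full)  # only the clone is touched; lib_src is unchanged
                removed.append(rel)
    return sorted(removed)
```

Working on a copy means the global zone's own /lib keeps every manifest, while the zone sees only the pruned set.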

When creating a whole-root zone, I simply cpio the /lib, /usr, /platform, and /sbin directories as well (cleaning up the SMF manifests as before). That takes a few minutes, but it's still a lot quicker than the old whole-root creation in Solaris 10.
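To summarize the difference between the two brands, this hypothetical helper lists, for each brand, which global-zone directories end up loopback mounted and which are copied. The brand strings are placeholders taken from the prose, not necessarily the exact brand identifiers, and sourcing a whole-root zone's /lib from /zonelib is an assumption about how the manifest cleanup is applied.

```python
def zone_layout(brand: str) -> list:
    """Return (global-zone source, in-zone path, method) triples.
    Hypothetical summary of the sparse-root and whole-root brands."""
    # /lib comes from the cleaned /zonelib clone, not the real /lib.
    shared = [("/zonelib", "/lib"), ("/usr", "/usr"),
              ("/platform", "/platform"), ("/sbin", "/sbin")]
    # Per-zone configuration and transient data, always copied.
    private = [("/etc", "/etc"), ("/var", "/var")]
    if brand == "sparse-root":
        return ([(s, d, "lofs") for s, d in shared]
                + [(s, d, "copy") for s, d in private])
    if brand == "whole-root":
        return [(s, d, "copy") for s, d in shared + private]
    raise ValueError("unknown brand: " + brand)
```

The only difference between the brands is whether the shared directories are mounted or copied; /etc and /var are private to the zone either way.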

Once I had zone creation figured out and the /lib shuffle sorted, the remaining problem was zone uninstall. I haven't changed anything in the uninstall scripts themselves, but I did need a bit of extra work in system installation.

# beadm list -H

In the output of beadm list -H, the second field is a UUID that uniquely identifies a boot environment. This is a ZFS property, named org.opensolaris.libbe:uuid, that's set on the ZFS dataset corresponding to the root filesystem of the BE. When you create a zone, its file systems are tagged with the property org.opensolaris.libbe:parentbe, set to the same value. When you uninstall a zone, the uninstall process finds all the file systems that belong to the zone and checks that they correspond to the currently running boot environment by comparing the UUIDs. I hadn't set this property, so nothing matched and uninstall wasn't removing the zone file systems. From now on, the Tribblix installer will set that property, so everything that depends on it will just work.
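The matching rule can be sketched as a small Python function. It assumes the properties have already been read into dictionaries (for example with zfs get -H -o value org.opensolaris.libbe:uuid on the BE's root dataset); the function zone_datasets_to_remove and the dataset names in the usage are hypothetical. It also shows why an unset uuid makes uninstall remove nothing.

```python
from typing import Dict, List

BE_UUID = "org.opensolaris.libbe:uuid"          # set on the BE root dataset
PARENT_BE = "org.opensolaris.libbe:parentbe"    # set on each zone dataset


def zone_datasets_to_remove(zone_datasets: Dict[str, Dict[str, str]],
                            active_be_props: Dict[str, str]) -> List[str]:
    """Return the zone datasets whose parentbe equals the running BE's uuid.
    If the BE has no uuid, nothing can match, so nothing is returned."""
    be_uuid = active_be_props.get(BE_UUID)
    if be_uuid is None:
        return []
    return sorted(ds for ds, props in zone_datasets.items()
                  if props.get(PARENT_BE) == be_uuid)
```

With the uuid missing from the boot environment, the function returns an empty list, which is exactly the "uninstall leaves the file systems behind" symptom.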

(As an aside, I ended up writing a quick and dirty script to generate the UUID, as Illumos doesn't actually have a tool for that. It runs in a minimalist install context, which I didn't want to bloat, so something that takes a SHA1 digest of some data from /dev/random and mocks up the correct form does the trick nicely.)
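A Python equivalent of that quick and dirty approach might look like this: hash some random bytes with SHA-1 and lay out the first 128 bits as 8-4-4-4-12 hex. Two details are assumptions rather than what the original script does: os.urandom stands in for reading /dev/random directly, and the version-4 and RFC 4122 variant bits are set so the result parses as a well-formed random UUID.

```python
import hashlib
import os


def mock_uuid() -> str:
    """Build a UUID-shaped string from a SHA-1 digest of random data."""
    seed = os.urandom(32)                            # stands in for /dev/random
    b = bytearray(hashlib.sha1(seed).digest()[:16])  # keep 16 of the 20 digest bytes
    b[6] = (b[6] & 0x0F) | 0x40   # version nibble = 4 (random)
    b[8] = (b[8] & 0x3F) | 0x80   # RFC 4122 variant bits = 10
    h = b.hex()
    return f"{h[0:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:32]}"
```

The value could then be attached to a boot environment's root dataset with zfs set org.opensolaris.libbe:uuid=VALUE DATASET, where VALUE and DATASET are placeholders.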

So the next release of Tribblix, the 0m6 prerelease, includes support for traditional whole-root and sparse-root zones. The point here isn't merely to replicate what's gone before, useful as that is. It also shows that, freed from the predefined constraints of a packaging system, you can generate completely arbitrary zone configurations, opening up a whole new array of possibilities.
