Monday, May 20, 2013

Sparse root zones in Tribblix

Zones were one of the pillars of Solaris 10 (the others being DTrace, SMF, and ZFS). Lightweight virtualization enabled deployment flexibility and significant consolidation.

The original implementation was heavily integrated with packaging. In many ways, it broke the packaging system. In OpenSolaris and Solaris 11, packaging was completely replaced and the zone implementation is very different, but it suffers from the same fundamental flaw - it's integrated at the heart of packaging.

Furthermore, sparse-root zones - where most of the operating system is shared between zones, with just configuration and transient files being unique to a zone - do not exist in the new world order, with each zone now being a separate OS instance. The downside to this, apart from requiring significantly more RAM and disk, is that you then have to manage many instances of the OS, rather than just the one.

In Tribblix, I have reimplemented sparse-root (and whole-root) zones, so that they look very similar to what you had in Solaris 10. The implementation is completely different, though, in that it expects zones to understand packaging rather than expecting packaging to understand zones.

Read here on how to create a sparse-root zone using Tribblix. What follows are some of the under-the-hood details of the implementation I've put together.

First, zone configurations are stored in /etc/zones. If you look on a system that supports zones you'll see a number of xml files in that directory. Some correspond to the zones configured on the system; others are templates. For a sparse-root zone in Solaris 10, there will be some inherited-pkg-dir entries. In the Tribblix implementation, these become simply loopback mounts, handled no differently than any other mount.
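To illustrate what "just a loopback mount" means, here is a minimal sketch of adding a read-only lofs mount to a zone configuration by hand with zonecfg. The function name, the zone name, and the choice of /usr are assumptions for illustration only; the Tribblix brand templates may express the same thing differently.

```shell
# Hypothetical helper: share a global-zone directory into a zone as a
# read-only loopback (lofs) mount. Run in the global zone as root.
add_ro_lofs() {
    zone=$1     # zone name (placeholder)
    dir=$2      # directory to share, mounted at the same path in the zone
    zonecfg -z "$zone" "add fs; set dir=$dir; set special=$dir; set type=lofs; add options [ro]; end"
}

# Example (not run here): add_ro_lofs myzone /usr
```

The point is that nothing in this entry refers to packages at all, which is the difference from an inherited-pkg-dir entry.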

Then under /usr/lib/brand you will find a number of directories containing scripts to manage zones. Some of it is shared, some specific to a given brand. I've created a sparse-root and a whole-root brand, and created the scripts to build zones of the correct type.

The key script is called pkgcreatezone, which is the script called to actually populate an empty zone with the bits that will make it work. (It's not called that in Solaris 10 - there you'll find a binary that calls another binary from Live Upgrade to do the work. But in OpenSolaris and Tribblix it's just a script.)

For the ipkg brand, the pkgcreatezone script sets a bunch of IPS variables and creates an IPS image followed by a bit of cleanup. Really, it's nothing complicated.

For the sparse-root brand, you get the main /lib, /usr, /platform, and /sbin directories mounted from the global zone, so you can ignore those. Some standard directories you can simply create. And then all I do is cpio the /etc and /var directories into the zone's file system, and that's it. Well, not quite. I actually use the SVR4 contents file to provide the list of files and directories to copy, so that I don't start copying random junk and only have what's supposed to be there. And one advantage of SVR4 packaging here is that it saves a pristine copy of editable files, so I put that in the zone rather than the modified one. All in all, it takes a couple of seconds or so to install a zone on a physical system, which is far quicker than the traditional zone creation method.
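As a rough sketch of the contents-file idea (not the actual pkgcreatezone script), the list of /etc and /var entries can be pulled out of an SVR4 contents file with a little awk. The function name and the zone root path in the usage comment are made up for illustration, and the sketch does not attempt the pristine-copy substitution for editable files.

```shell
# List the /etc and /var entries recorded in an SVR4 contents file.
# Each entry starts with the path; links are recorded as path=target,
# so the =target part is stripped before matching.
contents_paths() {
    awk '{ p = $1; sub(/=.*/, "", p) }
         p ~ /^\/(etc|var)(\/.*)?$/ { print p }' "$1"
}

# Example (not run here), with a placeholder zone root:
#   ( cd / && contents_paths /var/sadm/install/contents | cpio -pdm /export/zones/z1/root )
```

Because cpio in pass mode copies only the names it is given, anything under /etc or /var that no package owns is left behind.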

I stumbled across an unfortunate gotcha while doing this. SMF manifests used to be in /var (which was always an odd place to put what are configuration files). They're now in /lib, which is again a very odd place to put configuration files. But this has the unfortunate consequence that, as /lib is loopback mounted into a zone, all the SMF manifests in the global zone will be imported, even though many of them are for services that aren't relevant to a zone, and some flat out fail with errors. So what I had to do was create a clone of /lib, delete all the manifests that aren't relevant, and use that as the source for the zone (that's what the /zonelib directory is about, by the way).
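The pruning step might look something like the sketch below. It assumes the clone of /lib already exists and that the manifests to drop are named in a list file; both the function name and the list file are hypothetical, and which manifests actually need removing is not something this sketch tries to answer.

```shell
# Remove unwanted SMF manifests from a cloned copy of /lib.
# $1: the cloned lib directory (manifests live under svc/manifest)
# $2: hypothetical list file, one manifest path per line,
#     relative to svc/manifest
prune_manifests() {
    clone=$1
    list=$2
    while IFS= read -r m; do
        if [ -n "$m" ]; then
            rm -f "$clone/svc/manifest/$m"
        fi
    done < "$list"
}
```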

When creating a whole-root zone, I simply cpio the /lib, /usr, /platform, and /sbin directories as well. (Cleaning up the SMF manifests as before.) So that takes a few minutes, but is a lot quicker than the old whole-root creation in Solaris 10.

Once I had the zone creation figured out, and the /lib shuffle sorted, the remaining problem was zone uninstall. I haven't changed anything for this, but I did need a bit of extra work in system installation.

# beadm list -H
tribblix;51f2d0f4-df6e-6e48-dc0a-a74f37e14930;NR;/;3387047936;static;1361968342


What you see here is the output from beadm list -H. That second field is a UUID that uniquely identifies a boot environment. This is a ZFS property, named org.opensolaris.libbe:uuid, that's set on the ZFS dataset that corresponds to the root filesystem of the specified BE. If you create a zone, its file systems are tagged with the property org.opensolaris.libbe:parentbe that has the same value. When you uninstall a zone, it finds all the file systems that belong to the zone, and checks that they correspond to the currently running boot environment by comparing the UUIDs. I hadn't set this, so nothing matched and uninstall wasn't removing the zone file systems. In the future, the Tribblix installer will set that property and everything that needs it will just work.
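To make the comparison concrete, here is a small sketch that pulls the UUID of the running boot environment out of beadm list -H output, so it can be compared against the parentbe property on zone datasets. The function name is made up, and the pool name in the usage comment is a placeholder.

```shell
# Print the UUID (second field) of the currently running boot environment,
# reading `beadm list -H` output on stdin. The third field holds the
# active flags; "N" marks the BE that is active now.
running_be_uuid() {
    awk -F';' '$3 ~ /N/ { print $2 }'
}

# Example (not run here), assuming a pool called rpool:
#   beadm list -H | running_be_uuid
#   zfs get -r -H -o name,value org.opensolaris.libbe:parentbe rpool
```

Any zone dataset whose parentbe value differs from the first command's output would be skipped by uninstall.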

(As an aside, I ended up writing a quick and dirty script to generate the UUID, as Illumos doesn't actually have one. This is run in a minimalist install context, which I didn't want to bloat, so something that does a SHA1 digest of some data from /dev/random and mocks up the correct form does the trick nicely.)
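Something along these lines would do it. This is a sketch of the approach rather than the script itself: the function name and the 32 bytes of input are arbitrary choices, it reads /dev/urandom instead of /dev/random so that it never blocks, and it assumes either digest(1) or sha1sum is on the path.

```shell
# Mock up a UUID-shaped string from a SHA1 digest of random data.
mkuuid() {
    if command -v digest >/dev/null 2>&1; then
        hex=$(dd if=/dev/urandom bs=32 count=1 2>/dev/null | digest -a sha1)
    else
        hex=$(dd if=/dev/urandom bs=32 count=1 2>/dev/null | sha1sum | cut -d' ' -f1)
    fi
    # SHA1 gives 40 hex digits; keep the first 32, split 8-4-4-4-12
    echo "$hex" | sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\).*$/\1-\2-\3-\4-\5/'
}
```

The result has the right shape for the property but makes no attempt to set the version or variant bits of a real UUID.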

So, the next release of Tribblix, the 0m6 prerelease, includes support for traditional whole-root and sparse-root zones. The point here isn't merely to replicate what's gone before, useful as that is. What this also shows is that, freed from the predefined constraints of a packaging system, you can generate completely arbitrary zone configurations, opening up a whole new array of possibilities.





Monday, May 06, 2013

Seeking the golden turd

Certain trends in IT become popular. The next big thing, as it were.

That's according to the pundits. Who often have a product to sell that they've slapped the latest trendy label on, or a professional services arm ready to take a wad of your cash on a consulting engagement.

Take Big Data, as an example. (Even the name is an oxymoron.) Let me summarize:

Big Data is all about wading through a cesspit of data searching for a useful nugget of information.

The related trend of Analytics is about polishing what you find until it shines.

Businesses can be fooled into thinking they have a valuable nugget; break it open and you discover it's just a turd.

Sunday, April 21, 2013

Tribblix 0m5 - solidification

In Tribblix Milestone 5, there's the dual element of increasing solidity and new development.

First, the new development: ZAP is a simple network package install utility. As in, really simple. Use it like so (as root):

zap install-overlay openexr

or

zap install TRIBpekwm

It should be obvious that it's nowhere near finished, but the necessary first step of having the command exist and the packages be available on the network has been achieved.

As part of that, the funky pkgs.zlib file on the iso that used to be lofi mounted for package installation has gone. Instead, there's a directory with packages (in zap format) inside it. This is far simpler, and is also much quicker. With a little extra care in package construction, it's also smaller.

Next, a reversion. I've reverted the compiler and toolchain back to gcc3, as in earlier versions and matching OpenIndiana. Migrating to gcc4 is still a target (and is necessary for some newer software) but it has to be done right, and I'm not entirely happy with the gcc4 builds I've been testing. Get the system compiler and toolchain wrong, and it's a mistake you have to live with for years.

And there's some polish. Most of this is covered by the change list. Many packages have been rebuilt, which can bring them up to date, optimize their space usage, or build them to my standards rather than importing them from OpenIndiana. Firefox is current, which is important. And there are little things, like including some themes for WindowMaker.

I've said before that there's no real roadmap or release schedule - this is, after all, largely a hobby project. And two months between milestones is rather longer than I would have liked. But to give you a flavour of what might be coming up - gcc4 done right, upgrades, LibreOffice, and working zones are all targets. (Of course, there's significant work in all those areas.)

Sunday, April 14, 2013

Zip Archive Packaging

Under the hood, Tribblix uses the traditional SVR4 packaging utilities. There are a number of reasons for this - compatibility, simplicity, and a low footprint are among them. They're also good enough to get the job done. (And my strong belief is that the underlying package tools should become invisible and thus their implementation irrelevant, so the simpler and smaller the better.)

While SVR4 packaging does support installation of packages from networked locations over http, the support isn't great. The native support was almost never used in practice and its implementation is pretty poor (so much so that I would much rather just rip it out to simplify the code).

Allowing package installation from network repositories is expected of any modern system. However, the packaging system itself doesn't need to do so natively. There are any number of utilities and toolkits to do the network retrieval part - curl, wget, and essentially every modern scripting language will do the job.

Which leaves only the question as to what format to use in putting the data on your networked repository. The requirements here are:
  • A package is packed up into a single file, to allow easy and efficient transfer using any medium
  • The package should be compressed
  • The contents of the package should be easily accessible on any platform without special tools
  • A file should be able to contain multiple packages

If you look at SVR4 packaging, it has two native formats - filesystem and datastream. The former is simply all the files in the package laid out in a directory hierarchy; the latter is a single-file format. However, package datastream isn't generally suitable - it isn't natively compressed, and it's a private format that can't be easily accessed without the SVR4 tools.

The alternative solution I'm using is to simply zip up the filesystem format into a zip file. Hence, Zip Archive Packaging or zap for short.
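Building such a file takes very little. The sketch below zips one or more filesystem-format package directories out of a spool directory into a single archive; the function name, the .zap extension, and the paths in the usage comment are placeholders rather than Tribblix's actual conventions.

```shell
# Zip one or more filesystem-format packages into a single archive.
# $1: spool directory holding the package directories
# $2: output file (use an absolute path, since we cd into the spool)
# remaining arguments: package directory names
make_zap() {
    spool=$1
    out=$2
    shift 2
    ( cd "$spool" && zip -9 -q -r "$out" "$@" )
}

# Example (not run here):
#   make_zap /var/spool/pkg /tmp/TRIBpekwm.zap TRIBpekwm
```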

This has the following advantages:
  • Single file, can contain multiple packages
  • Natively compressed
  • Widespread support to unpack the archives
  • Efficient random access
  • Efficient extraction of list of contents
  • Widely used in other contexts (e.g. jar and war files)
  • Some level of data integrity checking
  • No need for any additional tools
  • Supports extensibility for additional functionality later

Now, the standard widely used versions of zip don't support much compression beyond DEFLATE. Newer versions do, but availability isn't universal. So I limit myself to basic DEFLATE - although other tools can squeeze DEFLATE output smaller than the regular zip command does.

So installing a package from a network repo in Tribblix is down to a very simple shell script that runs curl + unzip + pkgadd.
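In outline, that script amounts to something like the following sketch. It is not the real zap script: ZAP_REPO, the function name, and the one-file-per-package naming are all assumptions made for the example.

```shell
# Placeholder repository URL; the real repo layout may differ.
ZAP_REPO=${ZAP_REPO:-http://repo.example.com/pkgs}

# Fetch a zap file, unpack it, and hand the result to pkgadd.
zap_fetch_install() {
    pkg=$1
    work=$(mktemp -d) || return 1
    # curl -f fails on HTTP errors instead of saving an error page;
    # unzip -d unpacks the filesystem-format package into the work dir;
    # pkgadd -d installs the named package from that directory.
    if curl -f -s -o "$work/$pkg.zap" "$ZAP_REPO/$pkg.zap" &&
       unzip -q "$work/$pkg.zap" -d "$work" &&
       pkgadd -d "$work" "$pkg"; then
        rc=0
    else
        rc=1
    fi
    rm -rf "$work"
    return $rc
}

# Example (not run here, as root): zap_fetch_install TRIBpekwm
```

All the network handling lives in curl, so the packaging tools never need to know where the package came from.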

Thursday, March 28, 2013

Zipping up tighter

I've recently been creating a lot of zip files. Now, for this purpose the output has to be a regular zip file - readable by all the zip tools out there, including older versions and the jar utility. Change format and you can get better compression, for sure, but you're not compatible with all the existing tools. That rules out the bzip2 support in newer versions of zip and unzip, as well.

Creating a zip file with the zip command is basically:

zip -9 -q -r output.zip input_files ...
 

Now, p7zip can also create zip files (and others) that are absolutely compatible.

7za a -tzip -mx=9 -mfb=256 output.zip input_files ...


On my test data, this gives an additional 4% over the best that zip can do. Might not sound much, but on a CD-sized iso image that's an additional 30M of data you can squeeze in.

Saturday, March 02, 2013

Tribblix 0m4 - wake up and smell the coffee

For Tribblix, I don't have a formal development or release schedule.

What I do have is a set of targets or Milestones, which may be features, software, or part of the build process. What I don't have is any dates associated with these, or any specific order in which they might get worked on.

As a rough summary of the milestones so far:

  • Milestone 0 simply proved that I could make a distribution that worked
  • Milestone 1 added Xfce
  • Milestone 2 used packages from an Illumos build, rather than indirectly via OpenIndiana
  • Milestone 3 added Enlightenment E17, went up to gcc 4.7.2 as the base compiler, and included LZ4 compression for ZFS

I've now made the Milestone 4 build available for download. The new feature here is that Java is available, courtesy of OpenJDK.

This allows me to include the other tools I've developed, JKstat, KAR, JProc, and SolView as part of the distribution.

Time to put the kettle on and enjoy the coffee.

Monday, February 25, 2013

1.0 - jkstat, kar, jproc, and solview

After working on them for ages, I've finally released JKstat, KAR, JProc, and SolView as version 1.0.

There are not many changes and no earth-shattering new features; actually, very little has changed. And that's largely the point - development has slowed, and what's there is largely stable and unlikely to change. So it's time to call it 1.0 and have done with it.

A second reason is that there are a number of changes I would like to make that require breaking compatibility. There are changes in Solaris and the open-source Illumos derivatives that would make JKstat in particular incompatible, and I would like to migrate to a more recent Java as a baseline. So the 1.0 versions (and any micro releases to fix problems) will remain compatible with Solaris 10 and Java 5, while new development will focus on a forthcoming version 2.0 that will require something newer than Solaris 10 (possibly compatible with recent Solaris 10 updates) and will jump to Java 7.