Friday, November 29, 2013

Tribblix - making PXE boot work

One of the key changes in the latest milestone of Tribblix is the ability to boot and install a system over the network, using PXE. I've covered how to set this up elsewhere, but here I'll talk a little about how this is implemented under the covers.

Essentially, the ISO image has 3 pieces.
  1. The platform directory contains the kernel, and the boot archive. This is what's loaded at boot.
  2. The file solaris.zlib is a lofi compressed file containing an image of the /usr filesystem.
  3. The pkgs directory contains additional SVR4 packages that can be installed.
When booting from CD, grub loads the boot archive into memory and hands over control. There's then a little bit of magic where it tries to mount every device it can find looking for the CD image - it actually checks that the .volsetid file found on a device matches the one in the boot archive to ensure it gets the right device, but once that's done it mounts the CD image in a standard place and from then on knows precisely where to find everything.

When you boot via PXE, you can't blindly search everywhere in the network for the location of solaris.zlib, so the required location is set as a boot argument in menu.lst, and the system extracts the required value from the boot arguments.

What it will get back is a URL of a server, so it appends solaris.zlib to that and retrieves it using wget. The file is saved to a known location and then lofi mounted. Then boot proceeds as normal.
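In outline, the fetch-and-mount step looks something like the sketch below. This is an illustration rather than the actual Tribblix boot script: the function names, the /tmp download location, and the hsfs filesystem type are all assumptions.

```sh
#!/bin/sh
# Sketch only. Function names, the download path and the filesystem type
# are assumptions for illustration, not the values Tribblix really uses.
FSTYPE=${FSTYPE:-hsfs}

zlib_url() {
    # Strip one trailing slash, if present, so we never build "//".
    printf '%s/solaris.zlib\n' "${1%/}"
}

fetch_and_mount_usr() {
    url=`zlib_url "$1"`
    # Save the image to a known location.
    wget -q -O /tmp/solaris.zlib "$url" || return 1
    # lofiadm -a prints the block device it allocated, e.g. /dev/lofi/1
    dev=`lofiadm -a /tmp/solaris.zlib` || return 1
    # Mount the real /usr over the minimal one from the boot archive.
    mount -F "$FSTYPE" -o ro "$dev" /usr
}
```

Calling fetch_and_mount_usr with the server URL taken from the boot arguments would then do the whole job; nothing runs until you call it.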

Note that you can use any dhcp/tftp server for the PXE part, and any http server. There's no requirement on the server side for a given platform, configuration, or software. (And it doesn't even have to be http, as long as it's a protocol built into wget.)

It's actually very simple. There are, of course, a few wrinkles along the way.
  • There are some files in /usr that are needed to mount /usr, so the boot archive contains a minimally populated copy of /usr that allows you to bootstrap the system until you mount the real /usr over the top of it
  • For PXE boot, you need more such files in the boot archive than you do for booting from CD. In particular, I had to add prtconf (used in parsing boot arguments) and wget (to do the retrieve over http)
  • I add wget rather than curl, as the wget package is much smaller than the curl package, even though I had previously standardised on curl for package installation
  • Memory requirements are a little higher than for a CD boot, as the whole of solaris.zlib is copied into memory. However, because it's in memory, the system is really fast
It's possible to avoid this by simply putting all of /usr into the boot archive. The downside to that is that it's quite slow to load - you haven't got a fully-fledged OS running at that point, tftp isn't as reliable as it should be and can fail when retrieving larger files, and it pushes up the hard memory requirement for a CD boot. So I've stuck with what I have, and it works reasonably well.

The final piece of the ISO image is the additional packages. If you tell the system nothing, it will go off to the main repositories to download any packages. (Please, don't do this. I'm not really set up to deliver that much traffic.) But you can copy the pkgs directory from the iso image and specify that location as a boot argument so the installer knows where the packages are. What it actually does underneath is set that location up as the primary repository temporarily during the install.

The present release doesn't have automation - booting via PXE is just like booting from CD, and you have to run the install interactively. But all the machinery is now in place to build a fully automated install mechanism (think like jumpstart, although it'll achieve the same goals via completely different means).

One final note. Unlike the OpenSolaris/Solaris 11/OpenIndiana releases which have separate images for server, desktop, and network install, Tribblix has a single image that does all 3 in one. The ability to define installed packages eliminates the need for separate desktop (live) and server (text) images, and the PXE implementation described here means you can take the regular iso and use that for network booting.

Tribblix - getting boot arguments

This explains how I handled boot arguments for Tribblix, but it's generally true for all illumos and similar distributions. This is necessary for things like PXE boot and network installation, where you need to be able to tell the system critical information without baking it into the source.

And this particular mechanism described here is for x86 only. It's unfortunate that the boot mechanism is architecture specific.

Anyway, back to boot arguments. Using grub, you use the menu.lst file to determine how the system boots. In particular, the kernel$ line specifies which kernel to boot, and you can pass boot arguments. For example, it might say

kernel$ /platform/i86pc/kernel/$ISADIR/unix -B console=ttya

and, in this case, what comes after -B is the boot arguments. This is a list of key=value pairs, comma separated.

Another example, from my implementation of PXE boot, might be:

-B install_pkgs=

So that's how they're defined, and you can really define anything you like. It's up to the system to interpret them as it sees fit.
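To make the format concrete, here's a small sketch that splits a -B style string into its key=value pairs in plain shell. It only illustrates the syntax; it isn't how the kernel parses the arguments, and the function names are made up for the example.

```sh
#!/bin/sh
# Illustration only: split_bootargs and bootarg_value are hypothetical
# helper names, not part of illumos or Tribblix.
split_bootargs() {
    # Turn the commas into newlines so each pair sits on its own line.
    printf '%s\n' "$1" | tr ',' '\n'
}

bootarg_value() {
    # $1 is the -B string, $2 the key to look up.
    # Prints nothing if the key is absent.
    split_bootargs "$1" | while IFS='=' read -r key value; do
        if [ "$key" = "$2" ]; then
            printf '%s\n' "$value"
        fi
    done
}
```

So bootarg_value 'console=ttya,foo=bar' console prints ttya. Note that this naive split would break on a value that itself contains a comma.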

When the system boots, how do you access these parameters? They're present in the system configuration as displayed by prtconf. In particular

prtconf -v /devices

gets you the information you want - containing a bunch of standard information and the boot arguments. Try this on a running system, and you'll see things like what program actually got booted:

        name='bootprog' type=string items=1

So, all you have to do to find the value of a boot argument is look through the prtconf output for the name of the boot argument you're after, and then pick the value off the next line. Going back to my example earlier, we just look for install_pkgs and get the value. This little snippet does the job:

PKGMEDIA=`/usr/sbin/prtconf -v /devices | \
    /usr/bin/sed -n '/install_pkgs/{;n;p;}' | \
    /usr/bin/cut -f 2 -d \'`

(Breaking this down, sed -n outputs nothing by default, looks for the pattern in /install_pkgs/, then the {;n;p;} skips to the next line and prints it, then cut grabs the second word, split by the quote. Ugly as heck.)

At this point, you can test whether the argument was defined, and use it in your scripts.
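As a sketch of that last step, the lookup can be wrapped in a function and the result checked for emptiness. The function name is hypothetical, and it matches on name='...' rather than the bare argument name, which is slightly stricter than the snippet above.

```sh
#!/bin/sh
# Sketch: pull one boot argument out of prtconf-style output on stdin.
# bootarg_from_prtconf is a hypothetical name.
bootarg_from_prtconf() {
    # Find the line naming the property, step to the next line (the value),
    # then take the text between the first pair of single quotes.
    sed -n "/name='$1'/{n;p;}" | cut -d "'" -f 2
}

# On a live system, assuming the same prtconf call as above:
#   PKGMEDIA=`/usr/sbin/prtconf -v /devices | bootarg_from_prtconf install_pkgs`
#   if [ -n "$PKGMEDIA" ]; then
#       echo "installing packages from $PKGMEDIA"
#   else
#       echo "install_pkgs not set, falling back to the default"
#   fi
```

An empty result means the argument wasn't passed, so the script can fall back to whatever default makes sense.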

Friday, June 14, 2013

Do we hate our users?

As part of my job, I get to deal with all sorts of oddball systems and setups. Whether this is something we've inherited through acquisition, trying to resurrect or repair some antique legacy system, or needing to make some strange application nobody has ever heard of work, it tends to veer in my direction.

As a result, I've had the misfortune to use and fix a wide variety of systems and applications, obviously all built by someone else.

Based on this, I can only come to one conclusion: most Unix Administrators hate their users, and do everything they can to make their lives miserable.

That's a pretty grim statement, and I'm hoping that most of the people reading here won't fall into that category. But here's just one example today:

I have to migrate an application, so was given a login to the system so I could check it out. What interactive shell do I get? They've given me, and most users by the looks of it, /bin/sh, on a Solaris 8 box.

Sheesh. I've been using an interactive shell that supports command line recall and editing, not to mention completion and spell-checking, since the 1980s. There is absolutely no excuse in the 21st century not to give users a decent shell. If it's not deliberate hatred of your users, then it's either laziness or incompetence.

It goes beyond that, of course. There's no excuse not to provide users with a properly configured environment, install the tools they need to do their job, and provide enough disk space to store their data. (OK. Here's another example: how many storage shops still allocate itty-bitty storage measured in gigabytes?) Yet I see too many systems set up in such a way that they're completely painful to use.

Worse, users (and developers) assume that the systems are intrinsically rubbish and the IT department incompetent. OK, the second part might be true. But that's one reason they go off and try to provide resources for themselves.

As I said earlier, I'm preaching to the converted, right?

Tuesday, May 28, 2013

The disappearance of packaging

One key differentiator between different Linux distributions has been the packaging system used. The same is happening in the world of Illumos distributions: some use IPS, some Debian packaging, SmartOS uses pkgsrc, and Tribblix sticks true to the retro feel of Solaris by using SVR4.

Overall, there's been a huge amount of effort expended on packaging. Consider the replacement of SVR4 packaging with IPS - a huge multi-year multi-person effort, that required almost the whole of Solaris to be retooled to fit. And yet, this is all wasted effort.

When choosing a packaging system for Tribblix I deliberately chose SVR4 for 3 reasons: it was compatible with what had gone before, it was something I was familiar with, and it was reasonably lightweight and simple. If I had come from a Linux background, I may well have just gone with rpm or dpkg. The key is simplicity and minimal footprint.

What of packaging in the future? I see it largely disappearing. You can see this in the consumerization of applications: it's the App Store, not a package repository. Package management is conspicuous by its absence in the modern world of IT. Looking at where Ubuntu are heading, you can see the same thing. That's not the only initiative - look at AppStream for another example.

And that brings me back to using SVR4 in Tribblix - it's about the lightest weight packaging option I have available. And frankly, it's still far too bloated and complex. But it's merely an implementation detail that is largely invisible, and I want it to become less visible than it is at present.

The point here is that packages aren't relevant to users. Applications are. Which is why the notion of overlays is central to Tribblix - at their simplest, overlays are simply collections of packages (I could have used the term cluster, but that already has meaning to the Solaris installer, although it was never exposed to administrators later which was a terrible design), but the idea is that you manage software at the level of abstraction of an overlay, rather than at a package level.

Even as a unit of delivery, packages aren't that useful - they normally arise as build artifacts, which don't necessarily map well to user needs. And that's another thing - what constitutes a useful component of a package isn't fixed, but is very much context dependent. Worse, the possible contexts in which a package can be used aren't known ahead of time, so the packager cannot enumerate all the possible uses of the software they're packaging. And an individual package is almost never useful in isolation - most working applications are the leaf nodes of a large complex tree. Dependency management is another game where, if you play, you lose. Rather than tightly-coupled systems with strong dependency management, I'm looking for loosely coupled, largely self-contained units of delivery. If necessary, application bundles manage their own dependencies rather than relying on the system to do so.

Despite the title, it's not that packaging will disappear, but it will (I hope) become largely invisible.

Monday, May 20, 2013

Sparse root zones in Tribblix

Zones was one of the pillars of Solaris 10 (the others being DTrace, SMF, and ZFS). Lightweight virtualization enabled deployment flexibility and significant consolidation.

The original implementation was heavily integrated with packaging. In many ways, it broke the packaging system. In OpenSolaris and Solaris 11, packaging was completely replaced and the zone implementation is very different, but it suffers from the same fundamental flaw - it's integrated at the heart of packaging.

Furthermore, sparse-root zones - where most of the operating system is shared between zones, with just configuration and transient files being unique to a zone - do not exist in the new world order, with each zone now being a separate OS instance. The downside to this, apart from requiring significantly more RAM and disk, is that you then have to manage many instances of the OS, rather than just the one.

In Tribblix, I have reimplemented sparse-root (and whole-root) zones, so that they look very similar to what you had in Solaris 10. The implementation is completely different, though, in that it expects zones to understand packaging rather than expecting packaging to understand zones.

Read here on how to create a sparse-root zone using Tribblix. What follows is some of the under-the-hood details of the implementation I've put together.

First, zone configurations are stored in /etc/zones. If you look on a system that supports zones you'll see a number of xml files in that directory. Some correspond to the zones configured on the system; others are templates. For a sparse-root zone in Solaris 10, there will be some inherited-pkg-dir entries. In the Tribblix implementation, these become simply loopback mounts, handled no differently than any other mount.

Then under /usr/lib/brand you will find a number of directories containing scripts to manage zones. Some of it is shared, some specific to a given brand. I've created a sparse-root and a whole-root brand, and created the scripts to build zones of the correct type.

The key script is called pkgcreatezone, which is the script called to actually populate an empty zone with the bits that will make it work. (It's not called that in Solaris 10 - there you'll find a binary that calls another binary from Live Upgrade to do the work. But in OpenSolaris and Tribblix it's just a script.)

For the ipkg brand, the pkgcreatezone script sets a bunch of IPS variables and creates an IPS image followed by a bit of cleanup. Really, it's nothing complicated.

For the sparse-root brand, you get the main /lib, /usr, /platform, and /sbin directories mounted from the global zone, so you can ignore those. Some standard directories you can simply create. And then all I do is cpio the /etc and /var directories into the zone's file system, and that's it. Well, not quite. I actually use the SVR4 contents file to provide the list of files and directories to copy, so that I don't start copying random junk and only have what's supposed to be there. And one advantage of SVR4 packaging here is that it saves a pristine copy of editable files, so I put that in the zone rather than the modified one. All in all, it takes a couple of seconds or so to install a zone on a physical system, which is far quicker than the traditional zone creation method.
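The contents-file step can be sketched roughly as follows. This is not the real pkgcreatezone script: the function name and ZONEROOT are assumptions, and it leaves out the substitution of pristine copies for edited files.

```sh
#!/bin/sh
# Sketch only: list the paths under /etc and /var that an SVR4 contents
# file (read from stdin) knows about. zone_paths is a hypothetical name.
zone_paths() {
    # Field 1 is the path; links are recorded as path=target, so drop
    # everything from the "=" onwards before printing.
    awk '$1 ~ /^\/(etc|var)\// { sub(/=.*/, "", $1); print $1 }'
}

# Assumed usage, with ZONEROOT standing in for the zone's root directory:
#   zone_paths < /var/sadm/install/contents | cpio -pdm "$ZONEROOT"
```

Driving cpio from the package database rather than a blind recursive copy is what keeps stray files in the global zone's /etc and /var out of the new zone.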

I stumbled across an unfortunate gotcha while doing this. SMF manifests used to be in /var (which was always an odd place to put what are configuration files). They're now in /lib, which is again a very odd place to put configuration files. But this has the unfortunate consequence that, as /lib is loopback mounted into a zone, all the SMF manifests in the global zone will be imported, even though many of them are for services that aren't relevant to a zone, and some of which flat out fail with errors. So what I had to do was create a clone of /lib, delete all the manifests that aren't relevant, and use that as the source for the zone (that's what the /zonelib directory is about, by the way).

When creating a whole-root zone, I simply cpio the /lib, /usr, /platform, and /sbin directories as well. (Cleaning up the SMF manifests as before.) So that takes a few minutes, but is a lot quicker than the old whole-root creation in Solaris 10.

Once I had the zone creation figured, and the /lib shuffle sorted, the remaining problem was zone uninstall. I haven't changed anything for this, but I did need a bit of extra work in system installation.

# beadm list -H

What you see here is the output from beadm list -H. That second field is a UUID that uniquely identifies a boot environment. This is a ZFS property, named org.opensolaris.libbe:uuid, that's set on the ZFS dataset that corresponds to the root filesystem of the specified BE. If you create a zone, its file systems are tagged with the property org.opensolaris.libbe:parentbe that has the same value. When you uninstall a zone, it finds all the file systems that belong to the zone, and checks that they correspond to the currently running boot environment by comparing the UUIDs. I hadn't set this, so nothing matched and uninstall wasn't removing the zone file systems. In the future, the Tribblix installer will set that property and everything that needs it just works.

(As an aside, I ended up writing a quick and dirty script to generate the UUID, as Illumos doesn't actually have one. This is run in a minimalist install context, which I didn't want to bloat, so something that does a SHA1 digest of some data from /dev/random and mocks up the correct form does the trick nicely.)
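Something along these lines would do it. This is a sketch of the idea rather than the script I actually use: the function names are made up, it reads /dev/urandom instead of /dev/random so it can't block, and it falls back to sha1sum on systems without digest.

```sh
#!/bin/sh
# Sketch: fake up a UUID-shaped string from a SHA1 digest of random data.
# sha1_hex and mock_uuid are hypothetical names.
sha1_hex() {
    # illumos ships digest(1); fall back to sha1sum where that is missing.
    if command -v digest >/dev/null 2>&1; then
        digest -a sha1
    else
        sha1sum | cut -d ' ' -f 1
    fi
}

mock_uuid() {
    # Random bytes -> 40 hex digits -> keep 32 -> split as 8-4-4-4-12.
    dd if=/dev/urandom bs=32 count=1 2>/dev/null | sha1_hex | cut -c 1-32 |
        sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/'
}
```

The result only has the right shape; it makes no attempt to set the version or variant bits a real UUID would carry.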

So, the next release of Tribblix, the 0m6 prerelease, includes support for traditional whole-root and sparse-root zones. The point here isn't merely to simply replicate what's gone before, useful as that is. What this also shows is that, freed from the predefined constraints of a packaging system, you can generate completely arbitrary zone configurations, opening up a whole new array of possibilities.

Monday, May 06, 2013

Seeking the golden turd

Certain trends in IT become popular. The next big thing, as it were.

That's according to the pundits. Who often have a product to sell that they've slapped the latest trendy label on, or a professional services arm ready to take a wad of your cash on a consulting engagement.

Take Big Data, as an example. (Even the name is an oxymoron.) Let me summarize:

Big Data is all about wading through a cesspit of data searching for a useful nugget of information.

The related trend of Analytics is about polishing what you find until it shines.

Businesses can be fooled into thinking they have a valuable nugget; break it open and you discover it's just a turd.

Sunday, April 21, 2013

Tribblix 0m5 - solidification

In Tribblix Milestone 5, there's the dual element of increasing solidity and new development.

First, the new development: ZAP is a simple network package install utility. As in, really simple. Use it like so (as root):

zap install-overlay openexr


zap install TRIBpekwm

It should be obvious that it's nowhere near finished, but the necessary first step of having the command exist and the packages be available on the network has been achieved.

As part of that, the funky pkgs.zlib file on the iso that used to be lofi mounted for package installation has gone. Instead, there's a directory with packages (in zap format) inside it. This is far simpler, and is also much quicker. With a little extra care in package construction, it's also smaller.

Next, a reversion. I've reverted the compiler and toolchain back to gcc3, as in earlier versions and matching OpenIndiana. Migrating to gcc4 is still a target (and is necessary for some newer software) but it has to be done right, and I'm not entirely happy with the gcc4 builds I've been testing. Get the system compiler and toolchain wrong, and it's a mistake you have to live with for years.

And there's some polish. Most of this is covered by the change list. Many packages have been rebuilt, which can bring them up to date, optimize their space usage, or build them to my standards rather than importing them from OpenIndiana. Firefox is current, which is important. And there are little things, like including some themes for WindowMaker.

I've said before that there's no real roadmap or release schedule - this is, after all, largely a hobby project. And two months between milestones is rather longer than I would have liked. But to give you a flavour of what might be coming up - gcc4 done right, upgrades, LibreOffice, and working zones are all targets. (Of course, there's significant work in all those areas.)

Sunday, April 14, 2013

Zip Archive Packaging

Under the hood, Tribblix uses the traditional SVR4 packaging utilities. There are a number of reasons for this - compatibility, simplicity, and a low footprint are among them. They're also good enough to get the job done. (And my strong belief is that the underlying package tools should become invisible and thus their implementation irrelevant, so the simpler and smaller the better.)

While SVR4 packaging does support installation of packages from networked locations over http, the support isn't great. The native support was almost never used in practice and its implementation is pretty poor (so much so that I would much rather just rip it out to simplify the code).

Allowing package installation from network repositories is expected of any modern system. However, the packaging system itself doesn't need to do so natively. There are any number of utilities and toolkits to do the network retrieval part - curl, wget, and essentially every modern scripting language will do the job.

Which leaves only the question as to what format to use in putting the data on your networked repository. The requirements here are:
  • A package is packed up into a single file, to allow easy and efficient transfer using any medium
  • The package should be compressed
  • The contents of the package should be easily accessible on any platform without special tools
  • A file should be able to contain multiple packages
If you look at SVR4 packaging, it has two native formats - filesystem and datastream. The former is simply all the files in the package laid out in a directory hierarchy, the latter is a single-file format. However, package datastream isn't generally suitable - it isn't natively compressed, and it's a private format that can't be easily accessed without the SVR4 tools.

The alternative solution I'm using is to simply zip up the filesystem format into a zip file. Hence, Zip Archive Packaging or zap for short.

This has the following advantages:
  • Single file, can contain multiple packages
  • Natively compressed
  • Widespread support to unpack the archives
  • Efficient random access
  • Efficient extraction of list of contents
  • Widely used in other contexts (eg. jar, war files)
  • Some level of data integrity checking
  • No need for any additional tools
  • Supports extensibility for additional functionality later
Now, the standard widely used versions of zip don't support much compression beyond DEFLATE. Newer versions do, but availability isn't universal. So I limit myself to basic DEFLATE - although you can compress better than regular zip.

So installing a package from a network repo in Tribblix is down to a very simple shell script that runs curl + unzip + pkgadd.
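A stripped-down sketch of such a script is shown below. It isn't the real zap code: the repository URL, the one-package-per-file .zap naming, and the function names are assumptions for illustration.

```sh
#!/bin/sh
# Sketch of a curl + unzip + pkgadd installer. REPO, the .zap file naming
# and the function names are assumptions, not the real zap script.
REPO=${REPO:-http://repo.example.com/pkgs}

zap_file_url() {
    printf '%s/%s.zap\n' "$REPO" "$1"
}

install_zap() {
    pkg=$1
    work=/tmp/zap.$$
    url=`zap_file_url "$pkg"`
    mkdir -p "$work/spool" || return 1
    # -f makes curl fail on HTTP errors instead of saving the error page.
    curl -f -s -o "$work/$pkg.zap" "$url" || return 1
    # Unpacking gives a filesystem-format package directory under spool/.
    unzip -q "$work/$pkg.zap" -d "$work/spool" || return 1
    # pkgadd installs straight from the unpacked directory.
    pkgadd -d "$work/spool" "$pkg"
    status=$?
    rm -rf "$work"
    return $status
}
```

Run as root, install_zap TRIBpekwm would fetch, unpack, and install that one package. All the network handling lives in curl, so the packaging tools never need to know where the files came from.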

Thursday, March 28, 2013

Zipping up tighter

I've recently been creating a lot of zip files. Now, for this purpose the output has to be a regular zip file - readable by all the zip tools out there, including older versions and the jar utility. Change format and you can get better compression, for sure, but you're not compatible with all the existing tools. That rules out the bzip2 support in newer versions of zip and unzip, as well.

To create a zipfile with the zip command is basically:

zip -9 -q -r output.zip input_files ...

Now, p7zip can also create zip files (and others) that are absolutely compatible.

7za a -tzip -mx=9 -mfb=256 output.zip input_files ...

On my test data, this gives an additional 4% over the best that zip can do. Might not sound much, but on a CD-sized iso image that's an additional 30M of data you can squeeze in.

Saturday, March 02, 2013

Tribblix 0m4 - wake up and smell the coffee

For Tribblix, I don't have a formal development or release schedule.

What I do have is a set of targets or Milestones, which may be features, software, or part of the build process. What I don't have is any dates associated with these, or any specific order in which they might get worked on.

As a rough summary of the milestones so far:

  • Milestone 0 simply proved that I could make a distribution that worked
  • Milestone 1 added Xfce
  • Milestone 2 used packages from an Illumos build, rather than indirectly via OpenIndiana
  • Milestone 3 added Enlightenment E17, went up to gcc 4.7.2 as the base compiler, and included LZ4 compression for ZFS
I've now made the Milestone 4 build available for download. The new feature here is that Java is available, courtesy of OpenJDK.

This allows me to include the other tools I've developed, JKstat, KAR, JProc, and SolView as part of the distribution.

Time to put the kettle on and enjoy the coffee.

Monday, February 25, 2013

1.0 - jkstat, kar, jproc, and solview

After working on them for ages, I've finally released JKstat, KAR, JProc, and SolView as version 1.0.

There are not many changes and no earth-shattering new features; actually, very little has changed. And that's largely the point - development has slowed, and what's there is largely stable and unlikely to change. So it's time to call it 1.0 and have done with it.

A second reason is that there are a number of changes that I would like to make that are incompatible. There are changes in Solaris and the open-source Illumos derivatives that would make JKstat in particular incompatible, and I would like to migrate to a more recent Java as a baseline. So the 1.0 versions (and any micro releases to fix problems) will remain compatible with Solaris 10 and Java 5, while new development will focus on a forthcoming version 2.0 that will require something newer than Solaris 10 (possibly compatible with recent Solaris 10 updates) and will jump to Java 7.