Sunday, December 04, 2016

Tribblix and the new illumos loader

Recently, a new boot loader was added to illumos, which will in time replace the old and venerable grub that we've been using for about a decade.

I've been looking at how this will impact Tribblix.

The boot loader's arrival was heralded long in advance. I actually released Tribblix milestone 18 when I did to ensure I didn't have to deal with any loader issues. Not that I was expecting any issues, but just in case.

The first step in looking at the impact of the new loader was to build a current copy of illumos. I had a couple of issues due to recent illumos changes. The first was that the transition to Python 2.7 didn't work with my copy of Python (I need to build a dual 32/64-bit installation), so I used the old copy of Python 2.6. The second was that the loader wants /usr/sfw/bin/gstrip, which I've never had, but a quick symlink set that straight.

The loader is a new package. The first thing I tried was to build an ISO exactly as I always have. This ISO knows nothing about the new loader, doesn't have the loader present, and uses grub just as it always has. If you pretend the new loader doesn't exist, everything just works the way it did before. That's encouraging as a fallback position.

The next step was to add the package for the new loader, and persuade the ISO to boot from it. This was very easy: you just need to change the path to the boot image when calling mkisofs. For grub, it was

-b boot/grub/stage2_eltorito

and for the new loader it becomes

-b boot/cdboot
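As a minimal sketch of that switch, assuming a hypothetical helper called boot_image_for in the ISO build script: only the two paths come from the text above, and every other mkisofs option stays whatever the build already passes.

```shell
#!/bin/sh
# Hypothetical helper: pick the boot image path to hand to mkisofs -b.
# Only the two paths themselves are taken from the text above.
boot_image_for() {
    case "$1" in
        grub)   echo "boot/grub/stage2_eltorito" ;;
        loader) echo "boot/cdboot" ;;
        *)      echo "unknown boot loader: $1" >&2; return 1 ;;
    esac
}

# Print the option that would be added to the mkisofs command line.
printf '%s\n' "-b $(boot_image_for loader)"
```

Keeping the grub branch around matches the fallback position described earlier: the same build script can still produce a grub ISO.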

That should be it, but it then tripped up on a Tribblix customization. The loader needs to know where the kernel and the boot archive are. The defaults are reasonable, but use $ISADIR to pick up a 32 or 64-bit image as required. On live media, Tribblix has a single merged boot archive, so I need to override the boot_archive_name to not use $ISADIR. So I create a file /boot/loader.conf.local that contains



boot_archive_load="YES"
boot_archive_type="rootfs"
boot_archive_name="/platform/i86pc/boot_archive"

boot_archive.hash_load="NO"
boot_archive.hash_type="hash"
boot_archive.hash_name="/platform/i86pc/${ISADIR}/boot_archive.hash"


and then make sure that I delete that file on the installed image, where things will look like a regular system again.
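Here is a rough sketch of how a build script might drop that file into the image, assuming a hypothetical helper write_live_loader_conf that takes the root of the image being assembled. The file contents are exactly the lines shown above; the helper name and the scratch directory are only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write the live-media override under an image root.
# The quoted 'EOF' stops the shell from expanding ${ISADIR}, so the
# variable reaches the file untouched.
write_live_loader_conf() {
    root="$1"
    mkdir -p "$root/boot" || return 1
    cat > "$root/boot/loader.conf.local" <<'EOF'
boot_archive_load="YES"
boot_archive_type="rootfs"
boot_archive_name="/platform/i86pc/boot_archive"

boot_archive.hash_load="NO"
boot_archive.hash_type="hash"
boot_archive.hash_name="/platform/i86pc/${ISADIR}/boot_archive.hash"
EOF
}

# Demo against a scratch directory rather than a real image.
demo_root=$(mktemp -d)
write_live_loader_conf "$demo_root"
cat "$demo_root/boot/loader.conf.local"
rm -rf "$demo_root"
```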

Thinking about this, it would have been more sensible to drop a file into /boot/conf.d, which is another location that the loader uses for customization. I already use this for something else: I create a file /boot/conf.d/chaindisk containing

chain_disk="disk0:"

and the loader menu will have a "boot from hard disk" entry, which I think you do need on media. Again, this gets deleted from the installed system where it doesn't make any sense.
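The add-on-media, remove-on-install pattern for these files might look something like the following. This is a sketch only: the helper names are made up, and the directory argument stands in for either the live image root or the root of the freshly installed system.

```shell
#!/bin/sh
# Hypothetical helpers; $1 is the root of the image being worked on.

# Live media: add the "boot from hard disk" menu entry.
add_chaindisk() {
    mkdir -p "$1/boot/conf.d" || return 1
    echo 'chain_disk="disk0:"' > "$1/boot/conf.d/chaindisk"
}

# Installed system: remove the customizations that only make sense on media.
remove_live_only_conf() {
    rm -f "$1/boot/conf.d/chaindisk" "$1/boot/loader.conf.local"
}

# Demo against a scratch directory.
demo_root=$(mktemp -d)
add_chaindisk "$demo_root"
cat "$demo_root/boot/conf.d/chaindisk"
remove_live_only_conf "$demo_root"
rm -rf "$demo_root"
```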

Something else you can do is tweak the branding. I've played with changing the illumos name on the boot screen with Tribblix (look at the ASCII art in /boot/forth/brand-illumos.4th for example).

Making the installed system bootable used to involve messing with installgrub; now bootadm can manage it for you. That's just

/sbin/bootadm install-bootloader -M -P rpool

and it should handle pools with multiple drives correctly.

The only other thing the installer needs to do, as far as I can tell, is initialize the list of boot environments. This is similar to grub, and involves putting 2 lines into /rpool/boot/menu.lst, for example

title Tribblix 0.19
bootfs rpool/ROOT/tribblix

and there you are. Some relatively simple changes and Tribblix is ready to use the new loader.
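The menu.lst step is simple enough to sketch as a small function. The name init_menu_lst and its arguments are assumptions for illustration; an installer would call it with /rpool/boot after the bootadm step.

```shell
#!/bin/sh
# Hypothetical helper: initialize the list of boot environments.
# $1 = directory that holds menu.lst (e.g. /rpool/boot)
# $2 = menu title, $3 = root dataset of the boot environment
init_menu_lst() {
    mkdir -p "$1" || return 1
    printf 'title %s\nbootfs %s\n' "$2" "$3" > "$1/menu.lst"
}

# Demo against a scratch directory rather than a real pool.
demo_dir=$(mktemp -d)
init_menu_lst "$demo_dir/boot" "Tribblix 0.19" "rpool/ROOT/tribblix"
cat "$demo_dir/boot/menu.lst"
rm -rf "$demo_dir"
```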

Well, almost. This needs to be packaged up and polished, and I still need to change and test the UFS installer, SPARC builds, and installation into an existing pool.

Sunday, October 09, 2016

zfs receive oddity

Every so often, even a system as good as zfs will throw you a curveball. This one threw me for a while, and here's a simplified example.

All I'm trying to do here is replicate one file system. So I create it, touch a file so I know it's made it.

zfs create rpool/t1
touch /rpool/t1/1

OK, snapshot it and send it.

zfs snapshot rpool/t1@t1s1
zfs send rpool/t1@t1s1 | zfs recv rpool/t2

Create another file, and create a snapshot at both source and destination.

touch /rpool/t1/2
zfs snapshot rpool/t1@t1s2
zfs snapshot rpool/t2@t2s2

And now send an incremental stream from the original.

zfs send -i rpool/t1@t1s1 rpool/t1@t1s2 | zfs recv -F rpool/t2

That works; the whole point of the -F flag is to discard any subsequent changes. (You'll usually need this if the file system is mounted at the receiver, because even access time updates count as updates that will need to be discarded.) It will roll back rpool/t2 to the original @t1s1 snapshot, discarding the local @t2s2 snapshot, then update the rpool/t2 file system to the @t1s2 snapshot.

So far so good.

Now a minor variation.

I create it, touch a file so I know it's made it.

zfs create rpool/t1
touch /rpool/t1/1

OK, snapshot it and send it.




zfs snapshot rpool/t1@s1
zfs send rpool/t1@s1 | zfs recv rpool/t2


Create another file, and create a snapshot at both source and destination.



touch /rpool/t1/2
zfs snapshot rpool/t1@s2
zfs snapshot rpool/t2@s2

And now send the incremental stream just like last time.

zfs send -i rpool/t1@s1 rpool/t1@s2 | zfs recv -F rpool/t2

Kaboom. This fails, reporting:

cannot restore to rpool/t2@s2: destination already exists

What? The problem is hinted at in the zfs man page, where the description of -F says:

If receiving an incremental replication stream (for example,
one generated by zfs send -R [-i|-I]), destroy snapshots and
file systems that do not exist on the sending side.

The problem, then, is that zfs won't destroy the @s2 snapshot that exists at the receiver, because a snapshot of the same name exists in the source. It's not the same snapshot, of course, but it has the same name. This prevents the rollback, and the receive fails.

Snapshot name collisions are pretty common. We have an automatic snapshot regime, so pretty much every file system we have has a daily snapshot that embeds the date, and being automatic, they all have the same name.

What this means in practice is that if you have snapshots created on the receiving side, you'll have to explicitly roll the file system back to the snapshot you sent to previously, to avoid hitting name collisions.
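As a sketch of that workaround, here is a small function that prints the commands rather than running them. The function name and argument order are made up for illustration; the dataset and snapshot names are the ones from the failing example.

```shell
#!/bin/sh
# Hypothetical helper: print the commands for an incremental update that
# first rolls the destination back to the last snapshot it received.
# $1 = source fs, $2 = destination fs,
# $3 = last snapshot already sent, $4 = new snapshot to send
plan_incremental() {
    # rollback -r also destroys destination snapshots newer than the target,
    # which removes the colliding snapshot before the receive starts
    echo "zfs rollback -r $2@$3"
    echo "zfs send -i $1@$3 $1@$4 | zfs recv -F $2"
}

plan_incremental rpool/t1 rpool/t2 s1 s2
```

Printing the plan first makes it easy to check which snapshot is about to be rolled back to, since the rollback is destructive on the receiving side.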

I think this behaviour is wrong, although I'm not quite confident enough to call it a bug. The point is that on the receiving side, any snapshots created after the one that was sent are irrelevant - it shouldn't matter what their names are, and I'm not at all sure why zfs even bothers checking the names of snapshots that ought to be deleted.



Wednesday, October 05, 2016

Cats versus Petals

It's become common to talk about Pets versus Cattle as the "new way" of thinking about servers.

Of course, "the new way" isn't really new - many IT shops in the mid 1990s had fully automated, reproducible, and disposable infrastructure. It's just the term that has recently become trendy, and I don't think the analogy is necessarily right.

In the original analogy, the claim was that a Pet is precious, so you care for and feed it specially. If it's sick, you nurse it back to health. Whereas if one of your herd of Cattle gets sick, you take it out back and shoot it. This is based purely on emotional attachment, and makes little business sense. The truth is more that most Pets have little financial value, whereas Cattle are intrinsically valuable. Whether sick Cattle are nursed back to health should be a pure business decision based on the value of a healthy animal compared to the cost of treating it.

Currently, I think a more appropriate analogy would be Cats versus Petals.

Let me explain.

A Cat system has a mind of its own. In fact, it isn't at all clear whether you own the system or the system owns you. Cat systems tend to be solitary and not integrate or interoperate well with others. If you have many Cat systems, they will tend to wish to go their own ways.

In contrast, Petals will be small, simple systems. You will have many, and they will be the same. While a Petal may have some value of its own, their true beauty is only visible when they are put together into larger units - flowers, for example. Different flowers are made up of different types of petals.

One point here is that if you're thinking about Pets and Cattle, you're still thinking of individual animals. With Petals, the role of holistic thinking and orchestration in producing a larger object (the flower, or even the garden) becomes clear.

In terms of terminology, your business is a garden; the services you provide are flowers; they are constructed from containers as the petals via an orchestration service that provides the stems and branches. Your job is to ensure good soil, water and light, prune, remove pests and weeds - not to create each individual Petal by hand.

If you're still herding Cats, it's time to stop and tend gardens instead.

Tuesday, September 27, 2016

Tribblix - updates versus upgrades

Having released a new version of Tribblix, I thought it worth writing a little on how I see updates and upgrades in the Tribblix world, and how they differ.

After all, one thing I said about the Tribblix philosophy of keeping current is that Tribblix is essentially a rolling release, in that new versions of applications are continuously added. You can just update and you'll get the latest version of applications.

So, what defines an upgrade is that it's when the illumos components are updated. In fact, the only way to update any of the illumos packages is via an upgrade.

This is mostly for purely practical reasons. The way a package is updated is to remove it (using pkgrm underneath) and then install the new version (using pkgadd underneath). This is problematic in several ways: you don't want a system problem half way through to leave you with a critical package uninstalled; you can't operate at all with libc removed; and you want the system packages to be updated together as a coherent unit rather than individually. It might be possible to think of a horrendously complex system to solve these problems; it's much better just to do it another easier way.

As for implementation, the illumos packages live in their own software repo, and there's one illumos repo per release. No updates ever get applied to that repo; if there are updates, a new repo gets created. The process of doing an upgrade is to clone the system to a new BE (boot environment), change that BE to point to the new repo, update all the packages in the new BE, then reboot into it.
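The shape of that upgrade can be sketched as a printed plan. This assumes beadm is the tool managing boot environments, and it deliberately leaves the two Tribblix-specific steps as placeholder comments rather than guessing at the real commands; the function name, BE name, and mount point are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the outline of an upgrade into a new BE.
# $1 = name of the new boot environment, $2 = temporary mount point
plan_upgrade() {
    be="$1"
    mnt="$2"
    echo "beadm create $be"
    echo "beadm mount $be $mnt"
    # The next two steps are distro-specific, so they stay as placeholders.
    echo "# placeholder: point the repo definitions under $mnt at the new release"
    echo "# placeholder: update all packages under $mnt"
    echo "beadm unmount $be"
    echo "beadm activate $be"
    echo "init 6"
}

plan_upgrade newbe /a
```

The running system is never modified: everything happens in the clone, and the old BE remains available to boot back into.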

In practice, the main Tribblix repo is also versioned per-release. Originally that was because it contains the zap package, which is where the repos are defined. However, it turns out that creating a new repo is an administrative convenience as well. The new repo at the point of a release contains the most up to date version of each package. (They're just hardlinks, so don't take any space.) This provides an easy way to claim back some space when I retire an old version of a repo, as you just delete the repo and any packages that aren't duplicated in other repos get deleted with it. It also means that an upgraded system cannot see old package versions, so you naturally prevent users getting out of date and incompatible versions.
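The hard link behaviour is easy to demonstrate in a scratch directory. The repo and package file names below are invented for the demo and have nothing to do with the real repo layout.

```shell
#!/bin/sh
# Demo only: $1 is an empty scratch directory; all names are made up.
retire_old_repo_demo() {
    work="$1"
    mkdir "$work/repo-old" "$work/repo-new" || return 1
    echo "current package" > "$work/repo-old/foo.1.0.pkg"
    echo "superseded package" > "$work/repo-old/foo.0.9.pkg"

    # The new repo gets a hard link to the current version only,
    # which costs no extra space.
    ln "$work/repo-old/foo.1.0.pkg" "$work/repo-new/foo.1.0.pkg"

    # Retiring the old repo frees foo.0.9.pkg, while foo.1.0.pkg survives
    # because the new repo still holds a link to the same data.
    rm -rf "$work/repo-old"
}

scratch=$(mktemp -d)
retire_old_repo_demo "$scratch"
ls "$scratch/repo-new"
rm -rf "$scratch"
```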

Whether this approach is viable in the longer term is another matter. If there are stable releases that get "support" long term, then I'll have to keep old package versions and old repos for longer. But it's worked well so far.

By and large, once I've cut a new release, the older releases don't get updates. This isn't completely true, security updates (openssl, for example, and bind today) do get updated in the prior release, at least for a while. This means keeping an old machine around for the build (a simple VM is fine).

Saturday, September 17, 2016

Tribblix Milestone 18

Time for another Tribblix release, this one following the sequence and called Milestone 18.

The list of changes is pretty dry. Let me add a little colour to that.

On the desktop, MATE has been updated to the current 1.14 release. This provoked a little investigation into desktop caches, because adding MATE broke things. (I've just now added another little change to my MATE packaging which should catch another problem. Sigh.) I also added the EDE desktop as another fast and light option.

I finally got around to building my own copy of libtiff (rather than the old binary version I had inherited from OpenIndiana). This involved a major version bump, and then rebuilding anything that depended on the old version. I created a compatibility package containing the old shared libraries as a stopgap, while working my way through the list. One of the applications that needed updating was gdk-pixbuf, and then there are applications that link against both gdk-pixbuf and libtiff directly.

Very little of the software I ship needs or wants GTK3, so I'm happy with GTK2 (which I did a minor update of). But at some point I'm going to have to update to GTK3. So I tried to update to a later version than I had, in accordance with the Tribblix philosophy of keeping current. Because I don't actually use GTK3 much, it was well behind. Unfortunately, getting completely up to date involved updating Cairo, Pango, GLib, ATK, D-Bus, returning ETOOMUCHWORK. I went to an intermediate step of version 3.14.15, which involved updating ATK. As part of that, I had to update D-Bus, another component I had previously inherited from OpenIndiana. As it's pretty foundational, that required some care and attention to detail, but after working out the appropriate tweaks to match how it had been built before, that went very smoothly. The Linux community (rightly) gets a lot of stick for not caring about compatibility, but I have been very pleased at how good binary compatibility has been with the various desktop components.

As I was going through the various version bumps, I realized that almost everything using LCMS now used lcms2, so I made sure that the one holdout, gimp, was forced to use lcms2 rather than the lcms1 that it picked by default.

It's not only the desktop. Tribblix isn't just a desktop distro; that's just rather more visible (and sometimes more fun). Some of the work here tends to follow a theme - for example on load balancers. Reading between the lines you might be able to detect that I've been working on antivirus (clamav and c-icap). There are other cases where I've used Tribblix to build, package, and test components that might be useful elsewhere.


There are some isolated new packages that don't obviously make sense. Sometimes, I have to build and package prerequisites as part of building something else. For example, I had a look at pitivi. While building pitivi itself wasn't successful, I needed to get tools like meson and ninja and nose built, and components like pycairo. As I've gone to the effort of packaging, I'll keep them - they'll be useful in the future when I return to pitivi, and may well be useful for other tools. The same is true for snort, which is why libdnet and daq have been added, even though snort itself isn't there yet.

There was a mailing list thread on shells, which mentioned Plan9. So I went and added Plan9 from User Space because, well, I could, and it was an interesting opportunity to play with something different. I've also removed csh, it's now a link to tcsh. That wasn't a result of the thread, it was something I had meant to do for the last 2 releases but had forgotten in the build.

User feedback is always good. It tends to catch the cases I've never encountered myself. I've added an editor to the live environment, there's nano there now, if ever you need to edit any files.

Friday, September 16, 2016

Tribblix philosophy - software fidelity and currency

One of the key things about Tribblix is that it is very light-touch.

Partly this is out of necessity - this is a part-time endeavour for one person, and I do what I can to minimize the amount of effort I have to put in.

So, as far as possible, I don't change what I get from upstream.

For illumos, I'm as vanilla illumos-gate as you can get (I make one change to the SVR4 packaging tools, important to me as I'm the only distro based on illumos-gate who uses SVR4 packaging).

For other packages, apart from setting the install prefix, I only make changes necessary for applications to build and run. I make no real attempts to tweak or flavour them for Tribblix, you get as much as possible what the original author intended.

This is deliberate, who am I to decide to change the behaviour of somebody else's software?

Also, it makes maintenance easier, as I don't have to try and port patches forward to new versions.

Talking about new versions, I try and keep current. Yes, this implies a rolling release model, of sorts. If there's a problem with a package, I'll roll it forward to a newer release. I won't backport fixes to an older version, I'll simply push out the newer version.

If I can, that is. Sometimes a newer version is broken and doesn't work or won't build. Sometimes a newer version requires an update somewhere else, so it gets stalled and dumped back on the TODO list until the other component gets updated. Sometimes, especially for libraries, the new version isn't API compatible, so anything using it will also need to be rebuilt. This tends to get blocked, although I occasionally go in and update a whole dependency tree at once (which is a pain to do).

Fortunately, most of the core packages have learnt the value of compatibility. Things such as X11, Glib, D-Bus, Cairo, Pango, Gtk, most of the core desktop stack are never much of an issue. (Although because of their position in the dependency tree, I tend to be fairly cautious when I do have to touch them.)

Monday, August 29, 2016

The Tribblix filesystem layout

On Solarish systems, the filesystem(5) manpage gives a good description of where in the directory tree you might find the various files associated with a piece of software.

The version in illumos is largely broken, in that many of the directories referenced make no sense at all for illumos itself, and are largely wrong for the various illumos distributions. In particular, some of the directories are very specific to the old Solaris Java Desktop System, or JDS, and relate to GNOME.

Now, how does Tribblix handle all this?

For anything inherited from illumos-gate, I simply put files wherever illumos-gate put them.

For anything I build and ship, I normally build with a --prefix of /usr. And, for most packages, that's the only thing I set. What this means is that for most packages, --sysconfdir is /usr/etc and --localstatedir would be /usr/var. I do not redirect --sysconfdir to /etc by default. In most cases I think I've done the right thing, to be honest, as often the files that would have been put into /etc aren't meaningfully editable in any case.

In those cases where the application does expect user-editable configuration, I will set --sysconfdir to /etc. This covers things like BIND, samba, cups, openssh, and the like.
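In build-script terms, the two cases might be sketched like this. The function name and the "editable" keyword are made up for illustration; the options themselves are the standard autoconf ones mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: choose configure options for a package.
# $1 = "editable" when the package expects user-editable configuration.
configure_args() {
    if [ "$1" = "editable" ]; then
        echo "--prefix=/usr --sysconfdir=/etc"
    else
        # With only the prefix set, autoconf defaults put sysconfdir in
        # /usr/etc and localstatedir in /usr/var.
        echo "--prefix=/usr"
    fi
}

echo "./configure $(configure_args default)"
echo "./configure $(configure_args editable)"
```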

Laying things out like this helps with things like sparse-root zones. I'm loopback-mounting /usr read-only, and that neatly catches everything (and ensures the parts of a package are consistent).

On the subject of zones, in a sparse-root zone /lib is inherited, which causes a problem. The SMF manifests and method scripts are now stored under /lib, and some are only relevant to the global zone. To handle this, I make a fixed copy of /lib for sparse-root zones to use, that doesn't have any errant SMF services present.

In order to be able to add my own services to zones, I make sure the manifests live under /var, which is unique to a zone.

I also handle /opt specially. According to filesystem(5), this is the "Root of a subtree for add-on application packages." The idea has always been that 3rd-parties pick a directory there and have that as their own dedicated prefix. (As an aside, I've always found the use of /etc/opt/foo and /var/opt/foo to be incredibly confusing, as it basically splatters the files associated with a given application all over the filesystem, making it very hard to keep track of things. Which is one of the reasons I just specify the prefix and put everything under the one root if I can.)

And what I do with /opt is mandate that it's not inherited by zones. Anything installed in /opt won't automatically be inherited by a zone. If you want it in a zone, you need to make sure it gets added there.

For my own applications designed for zones - particularly services, I put them under /opt/tribblix, so that an application foobar lives in /opt/tribblix/foobar, its configuration under /opt/tribblix/foobar/etc, and the like. Again, it's easier to see everything clearly if there's only one place to look. This layout makes it easy to run services in sparse-root zones, as the OS in /usr is read-only and the application never needs to touch that.

Modulo dependencies, anyway. That's a problem I haven't really solved, as some applications depend on packages that live in /usr, so I need some way to ensure that the right packages are installed in the global zone (or the zone template).

Solaris also had the notion of subsystems. For example, CDE (the dt subsystem) lived under /usr/dt, /var/dt, /etc/dt and the like. Again, I don't follow that. (Although there is the one exception which is that I install CDE under /usr/dt, because that's where it's always lived.) Most things are either generic (so live directly in /usr) or are services that live under /opt/tribblix for zone support.

The exception to this is packages that live under /usr/versions in Tribblix. The main idea here is for things that might come in more than 1 version. For example, python 2 vs python 3. Or the various versions of Node.js or Java. Here the convention is that the application lives in a versioned directory under /usr/versions, allowing multiple versions of an application to coexist. (One or two things end up under /usr/versions even though there's no meaningful need to ever support multiple versions, when I need to put something in its own directory hierarchy rather than directly in /usr, just to avoid having to create another standard location. Sort of like subsystems, but more tightly managed.) I'll generally put convenience links in the default path, although sometimes that involves picking a default version.
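As an illustration of the convenience links, here is a sketch run against a scratch root. The helper name, the versioned directory name, and the command name are all invented for the demo; the real directory names under /usr/versions may well differ.

```shell
#!/bin/sh
# Hypothetical helper: link one command from a versioned directory into
# the default path.
# $1 = root of the tree, $2 = versioned directory name, $3 = command name
link_version() {
    mkdir -p "$1/usr/bin" || return 1
    # A relative link keeps working wherever the tree is mounted.
    ln -s "../versions/$2/bin/$3" "$1/usr/bin/$3"
}

# Demo: a fake application in a scratch root.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/usr/versions/app-2.0/bin"
echo "app version 2.0" > "$demo_root/usr/versions/app-2.0/bin/app"
link_version "$demo_root" app-2.0 app
cat "$demo_root/usr/bin/app"
rm -rf "$demo_root"
```

Picking a default version then comes down to which versioned directory the link points at.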

This all mirrors how I used to install software on Solaris 10 with zones many years ago. It's designed with zones in mind, and has been pretty successful.