The Trouble with Tribbles...: January 2022

Thursday, January 20, 2022

Tribblix updates and https

One good thing to have happened recently is the rise of Let's Encrypt, bringing https to all websites without all the hassle you previously had to go through to get a certificate.

One not quite so good event recently was the switch by Let's Encrypt to certificates signed by their own ISRG Root X1, and more excitingly the expiry of the prior DST Root CA X3 signing certificate.

My experience of this is that most things just worked, but I'm still seeing odd cases where clients can't connect. Generally, browsers work just fine; CLI tools are a bigger issue.

This might be due to a couple of issues. Sometimes the software itself gets certificate chain validation wrong (older openssl 1.0.2, for example, can still try to validate via the expired root); sometimes the system's CA bundle of trusted root certificates needs updating.

For a while now, the Tribblix package repositories have been served over https and the zap tool for package management has been configured to use https. There are cases where it falls foul of the above issues.

This might occur on older Tribblix releases - I've seen this on m22, for example.

It turns out that curl fails, but wget works. Again, that's an example of the inconsistency in behaviour that I see. You need to update the CA bundle on m22, but if the package update tool is broken that's a bit tricky.

There's an ugly hack, though, because zap will try wget if it can't find curl. So just move curl out of the way temporarily:

mv /usr/bin/curl /usr/bin/curl.t
zap refresh
zap update TRIBca-bundle
mv /usr/bin/curl.t /usr/bin/curl

and you should be good to go again.
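The fallback that makes this hack work can be sketched roughly as follows. This is not zap's actual code, just a minimal illustration of "use curl if it's there, otherwise wget"; the function name pick_downloader is made up for the example.

```shell
#!/bin/sh
# Hypothetical sketch, not zap's real implementation: choose a download
# tool by looking for curl first and falling back to wget.
pick_downloader() {
    if command -v curl >/dev/null 2>&1; then
        echo "curl"
    elif command -v wget >/dev/null 2>&1; then
        echo "wget"
    else
        echo "no download tool found" >&2
        return 1
    fi
}

# Report which tool would be used on this system (if any).
pick_downloader || true
```

With curl moved aside, a lookup like this no longer finds it, so the wget branch is taken.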

There's another way, of course: edit the *.repo files in /etc/zap/repositories to change the URL from https to http. That's not particularly recommended (although the packages are signed and the signatures are checked).
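If you did want to go down that route, the edit could be scripted along these lines. This is only a sketch: the function name switch_repo_scheme is invented, and the sample file contents are an assumed layout rather than a copy of a real repo file. It works on a scratch directory and keeps a .bak copy of each file so the change is easy to revert.

```shell
#!/bin/sh
# Sketch only: rewrite the URL scheme in every *.repo file in a directory.
# The file contents used in the demo below are an assumed layout, not
# copied from a real Tribblix system.
switch_repo_scheme() {
    dir=$1 from=$2 to=$3
    for f in "$dir"/*.repo; do
        [ -f "$f" ] || continue
        cp "$f" "$f.bak"        # keep a backup so the change can be undone
        sed "s|${from}://|${to}://|" "$f.bak" > "$f"
    done
}

# Demo against a scratch directory rather than /etc/zap/repositories.
demo=$(mktemp -d)
printf 'NAME=example\nURL=https://pkgs.example.org/repo/\n' > "$demo/example.repo"
switch_repo_scheme "$demo" https http
cat "$demo/example.repo"
```

Calling the function again with the arguments swapped (http https) puts the URLs back once the CA bundle has been updated.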

One thing that last hack demonstrates is the value in using simple text files.

Tuesday, January 18, 2022

Inside zone installation

How do zones actually get put together on Solaris and illumos? Specifically, how does a zone get installed?

There are various types of zones. The term used here is brand. A zone's brand defines how it gets installed and managed and its properties. Often, this is mapped to a zone template which is the default configuration for a zone of that type or brand.

(By the way, this overlap between template and brand can be seen in the create subcommand of zonecfg. You do "create -t SUNWlx" to build a zone from a template, which is where the -t comes from. It's not the create that sets the brand, it's the template.)

The templates are stored as xml files in /etc/zones. As are the configured zones, which is a bit confusing. So in theory, if you wanted to generate a custom template to save adding so much to your zonecfg each time, you could add your own enhanced template here. The actual zone list is in /etc/zones/index.

In fact, Tribblix has template zones, which are sparse-root zones built from a different image to the global zone. They are implemented by building an OS image that provides the file systems to be mounted read only, and a template xml file configured appropriately.

One of the things in the template is the brand. That maps to a directory under /usr/lib/brand. So, for example, the TRIBsparse template in /etc/zones/TRIBsparse.xml sets the brand to be sparse-root, in addition to having the normal lofs mounts for /usr, /lib, and /sbin that you expect for a sparse-root zone. There's a directory /usr/lib/brand/sparse-root that contains everything necessary to manage a sparse-root zone.

In there you'll find a couple more xml files - platform.xml and config.xml. A lot of what's in those is internal to zones. Of the two, config.xml is the more interesting here, because it has entries that match the zoneadm subcommands. And one of those is the install entry. For TRIBsparse, it is

/usr/lib/brand/sparse-root/pkgcreatezone -z %z -R %R

When you invoke zoneadm install, this script gets run, and you get the zone name (-z) and zonepath (-R) passed in automatically. There's not much else that you can specify for a sparse-root zone. If you look at the installopts property in config.xml, there's just an h, which means that the user can specify -h (and will get the help).
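To make the substitution concrete, here's a small sketch that fills in the %z and %R tokens by hand. The real expansion is done inside the zones framework; expand_brand_cmd is a made-up helper, and the zone name and zonepath are just example values.

```shell
#!/bin/sh
# Illustrative only: mimic how the %z and %R tokens in a brand's install
# entry get filled in. expand_brand_cmd is a made-up helper, not part of
# the zones tooling.
expand_brand_cmd() {
    template=$1 zonename=$2 zonepath=$3
    printf '%s\n' "$template" |
        sed -e "s|%z|$zonename|g" -e "s|%R|$zonepath|g"
}

# Example values: a zone called web1 living under /export/zones.
expand_brand_cmd \
    '/usr/lib/brand/sparse-root/pkgcreatezone -z %z -R %R' \
    web1 /export/zones/web1
```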

For a whole-root zone the install entry is similar, but installopts is now o:O:h - this is like getopts, so it's saying that you can pass the -o and -O flags, and that each must have an argument. These flags are used to define what overlays get installed in a whole-root zone. Having the installopts defined here means that zoneadm can validate the install command.
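On the script side, handling those flags is plain getopts. The following is a sketch of what the option parsing in an install script might look like, not a copy of any shipped script: the option string z:R:o:O:h is my assumption (the -z and -R that come from config.xml plus the o:O:h set from installopts), and the function name, usage text and overlay names are invented.

```shell
#!/bin/sh
# Sketch of option handling in a hypothetical brand install script.
# "z:R:o:O:h" is an assumed option string: -z and -R as supplied from
# config.xml, plus the o:O:h set that installopts lets the user pass.
parse_install_args() {
    zonename="" zonepath="" opt_o="" opt_O=""
    OPTIND=1
    while getopts "z:R:o:O:h" opt; do
        case $opt in
            z) zonename=$OPTARG ;;
            R) zonepath=$OPTARG ;;
            o) opt_o="${opt_o:+$opt_o }$OPTARG" ;;   # collect repeated -o
            O) opt_O="${opt_O:+$opt_O }$OPTARG" ;;   # collect repeated -O
            h) echo "usage: install [-h] [-o arg] [-O arg]"; return 0 ;;
            *) return 1 ;;                           # unknown flag
        esac
    done
    echo "zone=$zonename path=$zonepath o=[$opt_o] O=[$opt_O]"
}

# Example call with made-up overlay names.
parse_install_args -z web1 -R /export/zones/web1 -o overlay-a -o overlay-b
```

The colon after o and O is what says those flags take an argument; h has no colon, so it stands alone.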

So, for a given brand, we've now seen from config.xml what command will be called when you install a zone, and what options it's allowed.

The point is that there's nothing special here. You can build a custom brand by writing your own install script, and if you need to pass arguments to it you can easily do so as long as you set installopts to match. When building all the zone brands for Tribblix, that's all I did.

To reiterate, the install script is completely open. For existing ones, you can see exactly what it's going to do. If you want to create one, you can have it do anything you like in order to lay down the files you want in the layout you want.

As a crazy example, a long time ago I created a brand that built a sparse-root zone on a system using IPS packaging.

There's a little bit of boilerplate (if you're going to create your own brands, it's probably easier to start with a copy of an existing one so you pick up the common actions that all zone installs do), but after that, the world's your oyster.

Consider the alien-root zone in Tribblix. If you look at the installer for that, it's just dumping the contents of an iso image, tarball, or zfs send stream into the zone root. It does some cleanup afterwards, but generally it doesn't care what's in the files you give it - you can create an arbitrary software installation, tar it up, and install a zone from it.

(In fact, I probably won't create more native zone types for Tribblix - the alien-root is sufficiently generic that I would extend that.)

This generality in scripting goes beyond the install. For example, the prestate and poststate scripts are called before or after the zone transitions from one state to another, and you can therefore get your zone brand to do interesting things triggered by a zone transitioning state. One of the coolest uses here is the way that OmniOS implements on-demand vnics - the prestate script creates a vnic for a zone before a zone boots, and the poststate script tears it down after it halts. (Tribblix uses zap to manage vnics outside of zoneadm, so they're persistent rather than on-demand; it's just a different way of doing things.)
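The shape of an on-demand vnic hook can be sketched like this. It's heavily simplified and is not the OmniOS implementation: a real prestate or poststate script gets its arguments from the zones framework, whereas here I just pass a phase, a zone name and a plain "boot" or "halt" word. The link name and vnic naming scheme are assumptions, and the dladm commands are only printed, not run.

```shell
#!/bin/sh
# Dry-run sketch of on-demand vnic handling around zone state changes.
# Simplified and hypothetical: a real brand hook gets its arguments from the
# zones framework, and this version only prints the dladm commands.
vnic_hook() {
    phase=$1 zonename=$2 action=$3
    link=net0                      # assumed physical link name
    vnic="${zonename}_vnic0"       # assumed vnic naming scheme
    case "$phase:$action" in
        pre:boot)  echo "dladm create-vnic -l $link $vnic" ;;
        post:halt) echo "dladm delete-vnic $vnic" ;;
        *)         : ;;            # nothing to do for other transitions
    esac
}

vnic_hook pre web boot
vnic_hook post web halt
```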

As you can see, you aren't limited to the zone types supplied by your distribution. With enough imagination, you can extend zones in arbitrary ways.

Monday, January 17, 2022

Are software ecosystems a good thing?

One way to judge the health or strength of a product might be to look at the ecosystem surrounding that product. But is this diagnostic?

Note that there are several concepts here that are similar to the ecosystem. I'm not referring to the community, those people who might use or support the product. Nor am I talking about a marketplace, which is a source of artefacts that might be consumed by the product. Those are important in their own right, but they aren't what I mean when I'm talking about an ecosystem.

No, an ecosystem is the set of other services or software that spring up to support or integrate with the product.

There's one immediate problem here that's obvious if you think about it. Much of the ecosystem thus exists to address flaws or gaps in the product. Something that is more polished, more mature, and more finished will provide fewer opportunities for other products to add value.

What this means, then, is that a thriving ecosystem is often a sign of weakness and immaturity, not strength. A good product will not need the extras and hangers-on that come with an ecosystem.

The notion of an ecosystem is tied in with that of MVP - Minimum Viable Product. The current trend is to launch a startup with just an MVP, rely on first mover advantage, and hope to actually finish the offering at a later date. By definition, an MVP cannot be complete, and will need a surrounding ecosystem in order to function at all. This is much more common now than in the past, when products - especially proprietary products - were not launched until they were in some sense done.

Over time, too, an ecosystem will - or should - naturally diminish, as bugs are fixed and missing features filled in. The partners in the ecosystem will get frozen out, as their offerings become irrelevant (think ClusterHQ).

As an example from the past, consider the ecosystems that built up around Windows and DOS. Whole industries were built on things like TCP stacks and centralized nameservices and storage (PC-NFS, even NetWare). These were products reliant on fundamental failings of the product they supported. (Don't even get me started on antivirus software.)

Fast forward, and I can't be the only one to recognise the CNCF landscape as a disaster area.