The Trouble with Tribbles...: March 2023

Wednesday, March 22, 2023

SPARC Tribblix m26 - what's in a number?

I've just released Tribblix m26 for SPARC.

The release history on SPARC looks a little odd - m20, m20.6, m22, m25.1, and now m26. Do these release versions mean anything?

Up to and including m25.1, the illumos commit that the SPARC version was built from matched the corresponding x86 release. This is one reason there might be a gap in the release train - that commit might not build or work on SPARC.

As of m26, the version numbers start to diverge between SPARC and x86. In terms of illumos-gate, this release is closer to m25.2, but the added packages are generally fairly current, closer to m29. So it's a bit of a hybrid.

But the real reason this is a full release rather than an m25 update is to set a new baseline, which allows me to establish compatibility guarantees and roll over versions of key components; in this case, it allows me to upgrade perl.

In the future, the x86 and SPARC releases are likely to diverge further. Clearly SPARC can't track the x86 releases perfectly, as SPARC support is being removed from the mainline source following IPD 19, and many of the recent changes in illumos simply aren't relevant to SPARC anyway. So future SPARC releases are likely to simply increment independently.

Sunday, March 12, 2023

How I build the Tribblix AMIs

I run Tribblix on AWS, and make some AMIs available. They're only available in London (eu-west-2) by default, because that's the only place where I use them, and it costs money to have them available in other regions. If you want to run them elsewhere, you can copy the AMI.

It's not actually that difficult to create the AMIs, once you've got the hang of it. Certainly some of the instructions you might find can seem a little daunting. So here's how I do it. Some of the details here are very specific to my own workflow, but the overall principles are fairly generic. The same method would work for any of the illumos distributions, and you could customize the install however you wish.

The procedure below assumes you're running Tribblix m29 and have bhyve installed.

The general process is to boot and install an instance into bhyve, then boot that and clean it up, save that disk as an image, upload to S3, and register an AMI from that image.

You need to use the minimal ISO (I actually use a custom, even more minimal ISO, but that's just a convenience for myself). Just launch that as root:

zap create-zone -t bhyve -z bhyve1 \
-x 192.168.0.236 \
-I /var/tmp/tribblix-0m29-minimal.iso \
-V 8G

Note that this creates an 8G zvol, which is the starting size of the AMI.

Then, as root, run socat to give you a VNC socket to connect to:

socat TCP-LISTEN:5905,reuseaddr,fork UNIX-CONNECT:/export/zones/bhyve1/root/tmp/vm.vnc

and, as yourself, run the VNC viewer:

vncviewer :5
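The port and the display number are linked: a VNC viewer pointed at display :N connects to TCP port 5900+N, which is why socat listens on 5905 and the viewer uses :5. If you run more than one guest, a small wrapper could derive the port from the display. This is only a sketch; vnc_proxy is a made-up name, and it assumes the /export/zones/ZONE path used above.

```shell
# Hypothetical wrapper around the socat command above: proxy a bhyve
# zone's VNC unix socket to the TCP port for a given VNC display.
vnc_proxy() {
    zone=$1
    display=${2:-5}              # VNC display :N listens on port 5900+N
    port=$((5900 + display))
    # Assumes the zone lives under /export/zones, as in the example above
    socat TCP-LISTEN:"$port",reuseaddr,fork \
        UNIX-CONNECT:/export/zones/"$zone"/root/tmp/vm.vnc
}

# Example: as root run "vnc_proxy bhyve1 5", then as yourself "vncviewer :5"
```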

Once it's finished booting, log in as root and install with the ec2-baseline overlay, which makes sure the system has the pieces it needs to work on EC2:

./live_install.sh -G c1t0d0 ec2-baseline

Back as root on the host, press ^C to get out of socat, then remove the ISO image and reboot, so that the guest boots from the newly installed disk:

zap remove-cd -z bhyve1 -r

Restart socat and vncviewer, and log in to the guest again.

What I then do is remove any configuration or other data from the guest that we don't want in the final system. (This is similar to the old sys-unconfig, which many of us who used Solaris will be familiar with.)

zap unconfigure -a

I usually also ensure that a functional resolv.conf exists, just in case dhcp doesn't create it correctly.

echo "nameserver 8.8.8.8" > /etc/resolv.conf
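If you'd rather not clobber a resolv.conf that dhcp did manage to write, a guarded variant could add the fallback only when no nameserver line is present. A sketch: ensure_nameserver is a hypothetical helper, and 8.8.8.8 is simply the default taken from the command above.

```shell
# Hypothetical guarded alternative to the echo above: add a fallback
# nameserver only if the file doesn't already contain one.
ensure_nameserver() {
    conf=${1:-/etc/resolv.conf}
    server=${2:-8.8.8.8}
    # grep fails if the file is missing or has no nameserver line
    if ! grep -q '^nameserver' "$conf" 2>/dev/null; then
        echo "nameserver $server" >> "$conf"
    fi
}

# Example, as root in the guest:
#   ensure_nameserver
```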

Back on the host, shut the instance down by halting the bhyve zone it's running in:

zoneadm -z bhyve1 halt

Now the zfs volume you created contains a suitable image. All you have to do is get it to AWS. First copy the image into a plain file:

dd if=/dev/zvol/rdsk/rpool/bhyve1_bhvol0 of=/var/tmp/tribblix-m29.img bs=1048576
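Before getting rid of the zone, it may be worth a quick check that dd copied the whole volume: the file should be exactly the size of the zvol, 8G here. This is a sketch; check_image_size is a hypothetical helper, and the example relies on zfs get with -H (no header) and -p (exact byte values).

```shell
# Hypothetical sanity check: the raw image written by dd should be exactly
# as large as the zvol it was copied from.
check_image_size() {
    img=$1
    expected=$2
    # wc -c may pad its output with spaces on some systems, so strip them
    actual=$(wc -c < "$img" | tr -d ' ')
    if [ "$actual" != "$expected" ]; then
        echo "$img is $actual bytes, expected $expected" >&2
        return 1
    fi
}

# Example, on the host, before the zvol is destroyed:
#   check_image_size /var/tmp/tribblix-m29.img \
#       "$(zfs get -Hp -o value volsize rpool/bhyve1_bhvol0)"
```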

At this point you don't need the zone any more, so you can get rid of it:

zap destroy-zone -z bhyve1

The raw image isn't in a form you can use, and needs converting. There's a useful tool - the VMDK stream converter (there's also a download here) - just untar it and run it on the image:

python2 ./VMDK-stream-converter-0.2/VMDKstream.py /var/tmp/tribblix-m29.img /var/tmp/tribblix-m29.vmdk

Now copy that vmdk file (which is also a lot smaller than the raw img file) up to S3. In the following, you need to change the bucket name from mybucket to one of your own:

aws s3 cp --cli-connect-timeout 0 --cli-read-timeout 0 \
/var/tmp/tribblix-m29.vmdk s3://mybucket/tribblix-m29.vmdk

Now you can import that image into a snapshot:

aws ec2 import-snapshot --description "Tribblix m29" \
--disk-container file://m29-import.json

where the file m29-import.json looks like this:

{
    "Description": "Tribblix m29 VMDK",
    "Format": "vmdk",
    "UserBucket": {
        "S3Bucket": "mybucket",
        "S3Key": "tribblix-m29.vmdk"
    }
}
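Since the bucket name and release appear in several places, one option is to generate this file rather than edit it by hand each time. A minimal sketch: write_import_json is a hypothetical helper, and it assumes the S3 key follows the tribblix-VERSION.vmdk naming used in the upload step.

```shell
# Hypothetical helper: generate the disk-container JSON for
# aws ec2 import-snapshot, so the bucket and release are set in one place.
write_import_json() {
    bucket=$1
    version=$2
    cat <<EOF
{
    "Description": "Tribblix $version VMDK",
    "Format": "vmdk",
    "UserBucket": {
        "S3Bucket": "$bucket",
        "S3Key": "tribblix-$version.vmdk"
    }
}
EOF
}

# Example:
#   write_import_json mybucket m29 > m29-import.json
```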

The command will give you an import task id, which looks like import-snap-081c7e42756d7456b, and you can follow the progress of the import with:

aws ec2 describe-import-snapshot-tasks --import-task-ids import-snap-081c7e42756d7456b
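Rather than rerunning that by hand, you could poll until the import finishes and then pick out the snapshot id. This is a sketch: wait_for_snapshot is a hypothetical helper, it uses the aws CLI's --query and --output text options, and it assumes the task status is reported as completed on success or deleting/deleted on failure.

```shell
# Hypothetical helper: poll an import-snapshot task and print the resulting
# snapshot id once it completes. Assumes the aws CLI is configured.
wait_for_snapshot() {
    task_id=$1
    interval=${2:-30}            # seconds between polls
    while :; do
        status=$(aws ec2 describe-import-snapshot-tasks \
            --import-task-ids "$task_id" \
            --query 'ImportSnapshotTasks[0].SnapshotTaskDetail.Status' \
            --output text) || return 1
        case $status in
            completed)
                aws ec2 describe-import-snapshot-tasks \
                    --import-task-ids "$task_id" \
                    --query 'ImportSnapshotTasks[0].SnapshotTaskDetail.SnapshotId' \
                    --output text
                return 0 ;;
            deleting|deleted)
                echo "import task $task_id failed" >&2
                return 1 ;;
        esac
        sleep "$interval"        # still active, try again later
    done
}

# Example: capture the task id from import-snapshot, then wait on it
#   task=$(aws ec2 import-snapshot --description "Tribblix m29" \
#       --disk-container file://m29-import.json \
#       --query ImportTaskId --output text)
#   snap=$(wait_for_snapshot "$task")
```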

When that's finished, it will give you the snapshot id itself, such as snap-0e0a87acc60de5394. From that you can register an AMI with:

aws ec2 register-image --cli-input-json file://m29-ami.json

where the m29-ami.json file looks like:

{
    "Architecture": "x86_64",
    "Description": "Tribblix, the retro illumos distribution, version m29",
    "EnaSupport": false,
    "Name": "Tribblix-m29",
    "RootDeviceName": "/dev/xvda",
    "BlockDeviceMappings": [
        {
            "DeviceName": "/dev/xvda",
            "Ebs": {
                "SnapshotId": "snap-0e0a87acc60de5394"
            }
        }
    ],
    "VirtualizationType": "hvm",
    "BootMode": "legacy-bios"
}

If you want to create a Nitro-enabled AMI, change "EnaSupport" from "false" to "true", and "BootMode" from "legacy-bios" to "uefi".
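The two variants differ only in a couple of fields, so the registration file can also be generated. A sketch under some assumptions: write_ami_json is a hypothetical helper, and because AMI names must be unique within a region, the Nitro variant here gets a -nitro suffix that is an arbitrary choice for this example.

```shell
# Hypothetical helper: write the register-image JSON for a release and
# snapshot id. Pass "nitro" as the third argument for ENA plus UEFI boot.
write_ami_json() {
    version=$1
    snapshot=$2
    ena=false
    boot=legacy-bios
    name="Tribblix-$version"
    if [ "${3:-}" = nitro ]; then
        ena=true
        boot=uefi
        # AMI names must be unique in a region; the suffix is arbitrary
        name="$name-nitro"
    fi
    cat <<EOF
{
    "Architecture": "x86_64",
    "Description": "Tribblix, the retro illumos distribution, version $version",
    "EnaSupport": $ena,
    "Name": "$name",
    "RootDeviceName": "/dev/xvda",
    "BlockDeviceMappings": [
        {
            "DeviceName": "/dev/xvda",
            "Ebs": {
                "SnapshotId": "$snapshot"
            }
        }
    ],
    "VirtualizationType": "hvm",
    "BootMode": "$boot"
}
EOF
}

# Example:
#   write_ami_json m29 snap-0e0a87acc60de5394 > m29-ami.json
#   aws ec2 register-image --cli-input-json file://m29-ami.json
```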

Saturday, March 11, 2023

What, no fsck?

There was a huge amount of resistance early on to the fact that zfs didn't have an fsck. Or, rather, a separate fsck.

I recall being in Sun presentations introducing zfs where question after question was about how to repair zfs when it got corrupted.

People were so used to shoddy file systems, implemented so badly that a separate utility was needed to repair errors caused by fundamental design and implementation flaws in the file system itself, that the idea that the file system driver ought to take responsibility for managing the state of the file system was totally alien.

If you think about ufs, for example, there were a number of known failure modes, and what you did was take the file system offline, run the checker against it, and it would detect the known errors and modify the bits on disk in a way that would hopefully correct the problem. (In reality, if you needed it, there was a decent chance it wouldn't work.) Doing it this way was simple laziness - it would be far better to just fix ufs so it wouldn't corrupt the data in the first place (ufs logging went a long way towards this, eventually). And you were only really protecting against known errors, where you understood exactly the sequence of events that would cause the file system to end up in a corrupted state, so that random corruption was either undetectable or unfixable, or both.

The way zfs thought about this was very different. To start with, eliminate all known behaviour that can cause corruption. The underlying copy on write design goes a long way, and updates are transactional, so they either complete or don't happen at all. If you find a new failure mode, fix that in the file system proper. And then, correction is built in rather than separate, which means that it doesn't need manual intervention by an administrator, and all repairs can be done without taking the system offline.

Thankfully we've moved on, and I haven't heard this particular criticism of zfs for a while.