Monday, July 31, 2017

Building a Tribblix AMI - hvm mode

After having created a Tribblix AMI to prove that Tribblix basically works on EC2, I then moved on to the next issue - how to create an AMI that will run in hvm mode?

As a reminder, pv mode AMIs are deprecated, aren't supported by all instance types, and don't work in all regions. So you really need something that runs in hvm mode.

The first thought might be to convert the existing pv image to a hvm image. I've tried that and, while you can do the conversion, the image doesn't actually work. The problem here is that ZFS has the physical paths of the devices it's installed on embedded in the pool metadata. Changing from pv to hvm mode changes the emulated hardware, in particular the disk paths, so the ZFS pool isn't where it thought it was and the system panics. If you have a mismatch between the disk layout where the pool was created and where you're running you'll get a panic something like this:



If you had console access and could boot from media you could fix this, but AWS doesn't provide that. (And if you could boot from media you could just do a regular install without all the shenanigans involved in producing an AMI.)

So, you have to create the image on a system that looks like EC2. Which means using xen.

Fortunately, this road has been travelled before. These instructions are exactly what you need. They're for OpenIndiana, but will apply to any illumos distribution. And they're the process used by the OpenZFS project to do their testing. (I'll also mention that the OpenZFS folks have put a number of fixes back into illumos that improve the EC2 experience for us.)

I'm not going to repeat those instruction, that would be boring, so I'll talk about what I had to do or change to make those instructions work for me.

I got one of my spare desktop PCs out and installed Ubuntu 16.04 on it. (I must be spoilt by Tribblix, the Ubuntu install was horrendously slow and very high maintenance.) And then installed xen, rebooted as dom0, and set up the bridge networking.

That was my first pothole. There's this thing called systemd that's come along, and it changes the way network configuration is done. Much cussing and googling, but I got it right first time.

Then I discover that there's a new toolstack here. It's all xl not xm, but otherwise seems the same.

I then tried to start a VM, only to be given a completely meaningless and unhelpful error message. Why tell the user what's wrong when you can just vomit a stack trace?

After a bit of head-scratching I worked out that the system didn't actually support hvm mode. If you run xl info and look for virt_caps, it should mention hvm. That's a bit odd, the sticker on the front of the box looks right.

Manufacturers ship hardware with VT-x disabled in the BIOS, it appears. Into the BIOS we go, to find that the relevant settings are greyed out and you need a BIOS password to get into them. Open the box and start looking for jumpers. Fortunately I found a helpful article - the key here was the bit about the jumper being blue, little details like that make all the difference.

OK, so having wiped the BIOS password, gone into the BIOS and enabled VT-x, I go back to xen. Looking at virt_caps now shows hvm, as it should, and my domain starts.

The idea here is that you connect to the console with VNC. Easy enough, but by the time I had got my ssh tunnel set up and started up my VNC client, my VM had gone. I started it again, it starts booting just fine but then issues a few warnings and then a kernel panic. It's all over pretty quick.

In order to catch what it said, I then used vnc2flv. Someone asked me about screen recorders a while back, and I suggested they did what they wanted to do in a vnc session and use vnc2flv to record it. But it's the same here. Once I had the session recorded I can watch the movie and pause it to see what errors it's spitting out.



This, I think, is related to illumos bug 7186. It looks like we can't handle the network presented by newer versions of xen.

To get round this I simply disabled the network interface in the VM definition. Then the VM boots just fine and can be installed. You're a little bit limited in that you can't do updates but, as long as nwam is enabled then it will get itself on the network when you do run it on something that does have a compatible network.

For OmniOS, this means you have to manually enable nwam, as they have networking switched off by default. And remember that you must have networking enabled if you're running on EC2 as there's no other way to access your system.

What you'll also need to ensure at this point is that you have a functional user account you can get in to via ssh. With Tribblix and OpenIndiana you have jack, other distros might need to create a user. You wouldn't want that on a production AMI, of course, but you need to be able to log in to the system the first time in order to complete any configuration and add the various bits of AWS integration that you'll need.

Having got my image installed I followed the instructions through and got an AMI that works just fine.

The configuration file I used is:

builder='hvm'
name='ami-template'
vcpus=1
memory=1024
disk=[  'file:/var/tmp/tribblix-0m20.1.iso,hdb:cdrom,r',
        'file:/root/ami-template.img,xvda,w' ]
boot='d'
vnc=1
vnclisten='0.0.0.0'
vncconsole=1
on_crash='preserve'
xen_platform_pci=1
serial='pty'
on_reboot='destroy'


The one crucial thing here, apart from not having a vif line to create a network, is that you must use xvda for the disk. That's what EC2 will present to you, if you use something else you'll get the same panic on boot that I saw when attempting to convert a pv image.

We're almost done. Next time I'll talk about how to go from something that minimally boots up to something that's done well.

Running illumos on AWS - the first Tribblix AMI

I've run Tribblix on all sorts of hardware - desktops, servers, even the occasional laptop. I've had success running it on some of the smaller cloud providers that allow you to install from a custom ISO, or iPXE, such as my adventures with Vultr.

However, running on AWS has eluded me. You might wonder why you would want to, but the reality is that AWS is a huge player, with many people turning to it as their default (and often only) option. So giving everyone who uses AWS access to Tribblix would be a good thing, and would also offer an easy route for people who might want to play with Tribblix to do so.

The first thing to realize is that AWS is not so much a single cloud as a set of independent clouds. Each region is independent, and has a different set of capabilities. For example, EFS is only available in a few regions. These differences can affect us.

On AWS, there are 2 different types of guest. We have pv (the older, paravirtualized) and hvm (the newer, hardware assisted). Any given AMI (Amazon Machine Image) will only run as either a pv or a hvm guest. And some EC2 instance types are pv, others hvm. Newer regions (such as London) are exclusively hvm, so pv isn't an option.

Building an AMI from scratch looked a little daunting, so I looked to see what other illumos distributions might have made AMIs available. If you go to the community AMI page when launching an instances, the only one you'll find is OmniOS. They even have a page explaining how it was done. The snag is that all their images are pv. For my first set of experiments then, I was operating in the Dublin region.

The OmniOS AMI boots up just fine and works pretty much as you would expect. No problems there. How to get Tribblix running though?

The answer lies in the beauty of ZFS and Boot Environments. The basic approach here is to take a running OmniOS image, create a new Boot Environment, install Tribblix into that Boot Environment, and make the Tribblix Boot Environment the one to boot from next time. Once I've successfully booted the Tribblix image, I can clean up and delete the original OmniOS files.

One of the advantages of Tribblix is that I have my own installer. It's quite a bit simpler than some of the other distros, and thus much easier to mangle to do things in new environments. I decided to use the iPXE image as used in my Vultr experiment, because it was easy and I had it to hand. I then wrote a modified installer script (source here) called img_install that was based on my over_install script used to drop Tribblix into an existing ZFS pool. The difference is that the old over_install was run in the context of a Tribblix Live CD; the new img_install is run in the context of an alternative distro. The other thing in that script is that I don't do any boot loader fiddling - the pv instances have a special pv-grub, which I'm careful not to touch.

(By the way, the same trick will work for other illumos distributions. You just need a source archive of some sort and a script to unpack it. For example, I have a script to unpack some of the ISO images in the tribblix-zones repo, which I use to create alien-root zones. It's the same idea of installing an image in a alternate path.)

So all that was involved was to:
  • Start up an OmniOS instance (a micro instance on the free tier works fine)
  • Run the img_install script to create the alternate BE
  • Reboot, so you boot into Tribblix
  • Delete the old OmniOS BE
  • Finish off the install and apply updates
Then you can do the normal create an image trick on the AWS console, and you have a nice shiny Tribblix AMI.

That all worked out just beautifully. Tribblix runs on EC2 just fine.

In the next article, I'll describe how to create a hvm AMI.

Saturday, July 22, 2017

Mucking around with IPv6 and illumos zones

The world is running out of IPv4 addresses, and it's time to move to IPv6.

I remember that story being told over and over at conferences in the mid 1990s. Yet, here we are in 2017 and while there has been progress, we're definitely not there yet.

With zones, illumos (and Solaris) give you virtualized application environments (containers is the trendy term - we tend not to use that in the Solarish context because it got polluted by Sun marketing). Those environments (usually) need to be networked, so why not with IPv6.

So here goes with a few notes on the subject.

Shared-IP zones


With zones, the original networking model was a shared-ip stack, where the zone is given a fully configured network that is just a virtual IP configured on an existing interface. All the setup is done in the global zone, which makes it very easy.

(By the way, this was the cause of the limit of 8192 zones per system, because you can only have 8192 virtual addresses on a single physical interface.)

And configuring an IPv4 address is just a case of adding a net section to the zone configuration:

add net
set physical=aggr1
set address=172.18.1.172/24
end

It's exactly the same for IPv6, the only interesting issue is what the IPv6 address would be. Let's start with the link-local address - the one that starts with fe80:: - as you will generally need that even if you don't have a routed IPv6 network. For a physical interface, the IPv6 address is usually derived from the MAC address. We can't use that one, because we're sharing the interface and the global zone has already grabbed it. So the convention here is to construct something from the IPv4 address. It's then guaranteed to be unique in a broadcast network, which is all that matters for a link-local address. So all we have to do is convert the IPv4 address to hex, for example with printf:

printf "%x%x:%x%x\n" 172 18 1 172

which gives ac12:1ac, so the link-local address would be configured as:

add net
set physical=aggr1
set address=fe80::ac12:1ac/10
end

You're pretty much done here, if you do that your global and non-global zones will be able to communicate using IPv6 on the local subnet.

If you had a routable prefix, then the same scheme can be applied. Just put the fragment onto the end of your prefix.

add net
set physical=aggr1
set address=XXXX:XXXX:XXXX:XXXX::ac12:1ac/64
end


Of course, if you're assigned specific IPv6 addresses then you can use those directly. The above scheme is pretty trivial to script, though (and it actually makes it fairly easy to keep your DNS zone files up to date too).

Exclusive-IP zones


For an exclusive-ip zone, you just hand over a network interface to a zone and let it go figure. So it can assign whatever addresses it likes.

In particular, the zone can use the normal MAC address scheme to generate its IPv6 link-local address.

Originally in older Solaris, you needed to use a genuine physical interface. Which limited you a little bit as there are only so many network cards you can jam into a server. OpenSolaris introduced full network virtualization in the form of Crossbow, so any illumos distribution or Solaris 11 can create fully virtualized network stacks and present those to zones in the same way.

In Tribblix, I use zap to manage zones, and it takes care of creating the appropriate vnics and, if appropriate, etherstubs, and wiring things together. I also poke functional /etc/hostname.* and /etc/defaultrouter files into the zone so the networking at least gets configured when the zone boots. Adding IPv6 to the zone is simply a case of creating matching empty /etc/hostname6.* files (one for each vnic) and the IPv6 addresses will get autoconfigured.

There's one wrinkle with exclusive-ip that deserves a whole section, that of restricting the zone to only using addresses that you've set.

Restricting with allowed-address


Remember that an exclusive-ip zone can manage the network interface. So it could allocate the wrong address and generally cause havoc on the network. To prevent this, set the allowed-address property on the interface. For example:

add net
set physical=vnic1
set allowed-address=172.18.1
.172/24
end

The zone manages the interface, but an attempt to configure an invalid address will be thwarted.

You can see what's happening under the hood by running dladm show-linkprop. You'll see that the protection and allowed-ips properties are set.

(As an aside, it would be fantastic if illumos got the configure-allowed-address feature that Solaris 11 has, which would bypass my trickery in having to poke the network setup files in the zone.)

The same thing works for IPv6. The first problem I discovered is that (unlike Solaris 11) illumos won't accept multiple addresses in a list. Initially I thought this was something about IPv6, but it turns out you need to specify each address you want to add in a separate block.

The next question is going to be - what is the IPv6 address going to be? It's derived from the MAC address, so isn't fixed in advance.

The first step is to get the properties of the vnic. Running dladm show-vnic will give you the properties you need, including the MAC address. If you just want the one field, that's fairly easy too.

# dladm show-vnic -p -o MACADDRESS vnic1
2:8:20:c6:71:d

The IPv6 address is made up from that as a EUI-64 address, which is basically the fe80:: prefix, the first 3 octets, then ff:fe, then the last 3 octets. Oh, and the 7th bit gets flipped. And conventionally the leading zero gets suppressed. An ugly way of scripting this in ksh looks like:

/usr/sbin/dladm show-vnic -p -o MACADDRESS $VNIC | \
    /usr/bin/sed 's=:= =g' | read o1 o2 o3 o4 o5 o6
integer -i2 vi=16#$o1
integer -i2 nvi
nvi=$(($vi ^ 2#00000010))
integer -i16 xv=$nvi
no1=${xv/16#/}
if [ "$no1" = "0" ]; then
    no1=""

fi
if [ "$no3" = "0" ]; then
    no3=""

fi
if [ "$no5" = "0" ]; then
    no5=""

fi
printf "fe80::%s%s:%sff:fe%s:%s%s/10" "$no1" "$o2" "$o3" "$o4" "$o5" "$o6"

So, what you want to do is add something like:

add net
set physical=vnic1
set allowed-address=fe80::8:20ff:fec6:71d/10
end

to your zone configuration. And if you have a routable IPv6 address, you'll need to duplicate the block again for that.

In practice, this didn't quite work for me. If you don't set the allowed-address properties then the addresses get configured correctly, but with the protection set the address doesn't come up properly. If you try it then you get:

# ifconfig vnic1 inet6 up
ifconfig: setifflags: SIOCSLIFFLAGS: vnic1: Invalid argument

However, if you explicitly set the address, running ifconfig by hand:

# ifconfig vnic1 inet6 fe80::8:20ff:fec6:71d/10 up

then it works perfectly.

Thursday, July 13, 2017

What gets into Tribblix?

The software available for Tribblix is a bit of an eclectic mix. How do I choose what software to package?

There are actually a number of different reasons why you get a particular package.

The basics


Some packages are just basic,and you expect to find them. Much of the GNU stack comes in this way. And often things like Perl and Python are a foundational requirement for a lot of other tools.

What I want personally


There are a number of areas where I have specific interests - I'm a bit of a magpie when it comes to X11 window managers, for example. And I need to open office documents, so I had to get LibreOffice working. There a few games or emulators that I like. This also explains why some things might not be present too - I have no real interest in video or multimedia, for example, so that's an area with relatively little coverage.

Oh, that looks cool


I'm often interested in new stuff. (Even if it's actually old stuff that's just new to me.) So if I come across a piece of software and think "that looks cool" I'll often try and build and package it. If it works, fine, it ends up in the repository. Even if I might not end up doing anything with it, I've gone to the effort of making a package so it may as well stay there and somebody else might make use of it.

I have a $DAYJOB


Yes, I have a day job (a very good one, thank you very much), and it involves running applications on illumos. If I'm going to evaluate software I'll do it on Tribblix first. Building stuff on Tribblix is much easier than on, say, OmniOS - I have total control of the environment, and many more tools and prerequisite packages to give me a head start. So I can easily screen out any software that simply isn't going to work, and identify any patches or modifications necessary, before heading into the rather more constrained work environment.

Can you make X or Y or Z available


I get requests from users. I'll pretty much always at least try to add the software asked for - the fact that someone's bothered to ask indicates it might be useful, and I might find it interesting as well. This doesn't always work, of course, and I'll have to punt.

I got bored one day


Sometimes I get a bit of free time (no, this doesn't happen very often), and start looking for packages that might be worth adding. Sometimes this involves looking at other systems to see what they ship.

It's a prerequisite for something else


Dependency hell is a fact of life, so a lot of the time is actually spent building prerequisites. This is one reason for speculatively trying things out - it identifies prerequisites, and they're often going to be needed by other packages too. Although what you'll find is a number of packages with no obvious consumers, because the software I wanted didn't actually work. As I mentioned before, though, I'll keep those packages I built, and they might come in useful later.

Thursday, July 06, 2017

Running LX zones with Tribblix

I mentioned a few months ago a little project I had been working on - nicknamed omnitribblix, it's regular Tribblix with the illumos components coming from illumos-omnios (now via OmniOS Community Edition) rather than vanilla illumos-gate.

One of the changes I made in the recent Milestone 20 update was to split out the release packages to give more flexibility.

Thiis allowed me to release a micro update to Milestone 20 (imaginatively called m20.1 or update 1), which updates the illumos bits but shares the same main package repository as the main Milestone 20 release.

And the other thing I can now do is build variant releases. So Tribblix has an LX variant!

You can download the omnitribblix ISO image from the Tribblix download page. It installs, operates, and is packaged just like regular Tribblix. If you don't use LX zones, you probably wouldn't notice the difference.

(It's versioned as m20lx.1 - the update 1 there means that it's a parallel release to the regular Tribblix Milestone 20 update 1.)

You can also update to the LX variant from either the regular Milestone 20 or Milestone 20 update 1 releases, in the normal way. It's a micro update, or sidegrade perhaps, but uses the same upgrade process as regular upgrades.

And, because of the magic of boot environments, if there's a problem you can roll back.

Anyway, once you have omnitribblix installed, how do you create an LX zone? Very easily, in the same way you create and destroy other zones on Tribblix, using the zap utility.

Before you can do that, though, you need a Linux image of some sort to install.

I've been using the same images I use under Docker. So, for example, if I want Alpine then I would go:

docker run alpine uname -a

and then get the name of the container

docker ps -a

and then export that with

docker export romantic_galileo > alpine.tar

Then copy the alpine.tar file to your omnitribblix system. If you want something a bit richer, then Ubuntu will work. But generally exporting a Docker container like this will work, and the image characteristics will be a good fit for a zone.

And then all you do to create the zone is use zap, specifying that it's an lx brand and telling it where the tarball is:

zap create-zone -z alpine -t lx \
-x 10.0.2.99 -I /tmp/alpine.tar

and just zlogin to it as normal.

There are constraints around networking - you have to be exclusive-ip (the -x flag) and zap will create (and destroy) the vnic for you automatically. But the networking in the zone won't actually be configured. (While you specify the IP address in the command, that just tells zap how to configure the network plumbing and the vnic.) You'll have to log in to the zone and use the native tools to identify and configure the network, like so:

/native/sbin/ifconfig -a
/native/sbin/ifconfig znic0 inet 10.0.2.99 up
/native/usr/sbin/route add net default 10.0.2.2

And off you go. Sitting on an illumos box with all its goodness, with access to the wide variety of the Linux ecosystem at your fingertips.