Tuesday, December 27, 2016

Minimal illumos zones

Zones, meet MVI. MVI, meet Zones.

In Tribblix, zones can be the traditional Solaris sparse-root or whole-root style, or variations such as partial-root or alien-root. There's also the option to boot a blank zone - one in which nothing (or as close to nothing as possible) is running.

In parallel development, minimal viable illumos allows you to boot illumos in 48M of RAM, or to build single purpose bootable images.

So what happens if you combine these strands of thought? Minimal illumos zones, that's what.

The general idea here is that you can use the (new) zmvix.sh script in mvi to build a tarball containing a filesystem image. This image is designed for use in zones, so contains none of the kernel components. And there's no point building an ISO image, as it never needs to be bootable of itself.

The alien-root brand in Tribblix was originally designed to build a zone from an installation ISO. The minimal zone is a similar concept, although quite a bit simpler. Unpacking a tarball is far more direct than dissecting a bootable ISO. Furthermore, it's not necessary to undo the live media customizations present on an installation image. So the zone installer just has a simple branch to either the tarball unpacker or the ISO unpacker, depending on the filename.

The whole premise of mvi is that it's minimal. However, what counts as the bare minimum depends on context.

For example, a zone whose networking is provided via a shared-ip stack has no need for networking tools, as all the networking is configured for it by the global zone. So that's a major potential simplification.

On the other hand, getting zlogin working was a bit of a challenge. The first problem is that you need getent to be present in the zone. This is defined by the user_cmd element of the zone brand's config.xml file. So my zmvix.sh script explicitly adds /usr/bin/getent to the image. That's enough to get zlogin -S to work.

A full zlogin is a bit more work. That calls /usr/bin/login, which has a bunch more dependencies, including a number of pam modules. The list of files needed a bit of trial and error to obtain. So you can make a full zlogin work, but you don't need to.
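
If you want to work out that list for yourself, ldd will show the direct library dependencies of login, although it won't show the pam modules, which are loaded at runtime according to /etc/pam.conf - so check there too. Something like:

ldd /usr/bin/login
grep '^login' /etc/pam.conf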

While I was doing this I had a look through the zlogin source, and to say it's a massive kludge is a bit of an understatement. And when I read comments like:
It's truly amazing that there is no library function in OpenSolaris to do this for us.
then I get alarmed. There is truly weird stuff going on here, and I'm clearly not supposed to understand it.

The result of all this is that if you create an image from mvi with the command:

./zmvix.sh nonet node

then you end up with an 11M image file, which you can use to create a zone with

zap create-zone -z zmvi -t alien \
-I /var/tmp/zmvi.tar.gz \
-i 192.168.1.234

and if you point a browser at the zone's IP address, port 8000, you get back the page from the node server.

You can do this yourself if you check out mvi and are running a fully updated Tribblix m18.

In all, there are 4 processes associated with the zone. There's zsched, init, and the console shell, plus node. That's it.
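
To check that from the global zone, something like the following will list them:

ps -f -z zmvi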

Of course, this isn't the only way to do it. Another option would be to use the partial-root zone installer and get it to construct the zone's filesystem image the same way that mvi does, bypassing the tarball creation and unpacking.

Sunday, December 04, 2016

Tribblix and the new illumos loader

Recently, a new boot loader was added to illumos, which will in time replace the old and venerable grub that we've been using for about a decade.

I've been looking at how this will impact Tribblix.

The boot loader's arrival was heralded long in advance. I actually released Tribblix milestone 18 when I did to ensure I didn't have to deal with any loader issues. Not that I was expecting any issues, but just in case.

The first step in looking at the impact of the new loader was to build a current copy of illumos. I had a couple of issues due to recent illumos changes. The first was that the transition to Python 2.7 didn't work with my copy of python (I need to build a dual 32/64 bit installation), so I used the old copy of python 2.6. The second was that the loader wants /usr/sfw/bin/gstrip, which I've never had, but a quick symlink set that straight.

The loader is a new package. The first thing I tried was to build an ISO exactly as I always have. This ISO knows nothing about the new loader, doesn't have the loader present, and uses grub just as it always has. If you pretend the new loader doesn't exist, everything just works the way it did before. That's encouraging as a fallback position.

The next step was to add the package for the new loader, and persuade the ISO to boot from it. This was very easy: you just need to change the path to the boot image when calling mkisofs. For grub, it was

-b boot/grub/stage2_eltorito

and for the new loader it becomes

-b boot/cdboot
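
For illustration only - the exact set of flags Tribblix passes may differ - the El Torito part of the mkisofs call ends up looking something like:

mkisofs -o tribblix.iso -R \
  -b boot/cdboot -no-emul-boot -boot-load-size 4 \
  /path/to/image_root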

That should be it, but it then tripped up on a Tribblix customization. The loader needs to know where the kernel and the boot archive are. The defaults are reasonable, but use $ISADIR to pick up a 32 or 64-bit image as required. On live media, Tribblix has a single merged boot archive, so I need to override the boot_archive_name to not use $ISADIR. So I create a file /boot/loader.conf.local that contains

boot_archive_load="YES"
boot_archive_type="rootfs"
boot_archive_name="/platform/i86pc/boot_archive"

boot_archive.hash_load="NO"
boot_archive.hash_type="hash"
boot_archive.hash_name="/platform/i86pc/${ISADIR}/boot_archive.hash"

and then make sure that I delete that file on the installed image, where things will look like a regular system again.

Thinking about this, it would have been more sensible to drop a file into /boot/conf.d, which is another location that the loader uses for customization. I do use this for something else: I create a file /boot/conf.d/chaindisk containing

chain_disk="disk0:"

and the loader menu will have a "boot from hard disk" entry, which I think you do need on media. Again, this gets deleted from the installed system where it doesn't make any sense.

Something else you can do is tweak the branding. I've played with changing the illumos name on the boot screen with Tribblix (look at the ascii art in /boot/forth/brand-illumos.4th for example).

Making the installed system bootable used to involve messing with installgrub; now bootadm can manage it for you. That's just

/sbin/bootadm install-bootloader -M -P rpool

and it should handle pools with multiple drives correctly.

The only other thing the installer needs to do, as far as I can tell, is initialize the list of boot environments. This is similar to grub, and involves putting 2 lines into /rpool/boot/menu.lst, for example

title Tribblix 0.19
bootfs rpool/ROOT/tribblix

and there you are. Some relatively simple changes and Tribblix is ready to use the new loader.

Well, almost. This needs to be packaged up and polished, and I still need to change and test the UFS installer, SPARC builds, and installation into an existing pool.

Sunday, October 09, 2016

zfs receive oddity

Every so often, even a system as good as zfs will throw you a curveball. This one threw me for a while, and here's a simplified example.

All I'm trying to do here is replicate one file system. So I create it, touch a file so I know it's made it.

zfs create rpool/t1
touch /rpool/t1/1

OK, snapshot it and send it.

zfs snapshot rpool/t1@t1s1
zfs send rpool/t1@t1s1 | zfs recv rpool/t2

Create another file, and create a snapshot at both source and destination.

touch /rpool/t1/2
zfs snapshot rpool/t1@t1s2
zfs snapshot rpool/t2@t2s2

And now send an incremental stream from the original.

zfs send -i rpool/t1@t1s1 rpool/t1@t1s2 | zfs recv -F rpool/t2

That works - the whole point of the -F flag is to discard any subsequent changes. (You'll usually need this if the file system is mounted at the receiver, because even access time updates count as changes that will need to be discarded.) It will roll back rpool/t2 to the original @t1s1 snapshot, discarding the local @t2s2 snapshot, then update the rpool/t2 file system to the @t1s2 snapshot.

So far so good.

Now a minor variation.

I create it, touch a file so I know it's made it.

zfs create rpool/t1
touch /rpool/t1/1

OK, snapshot it and send it.


zfs snapshot rpool/t1@s1
zfs send rpool/t1@s1 | zfs recv rpool/t2

Create another file, and create a snapshot at both source and destination.

touch /rpool/t1/2
zfs snapshot rpool/t1@s2
zfs snapshot rpool/t2@s2

And now send the incremental stream just like last time.

zfs send -i rpool/t1@s1 rpool/t1@s2 | zfs recv -F rpool/t2

Kaboom. This fails, reporting:

cannot restore to rpool/t2@s2: destination already exists

What? The problem is hinted at in the zfs man page, where the description of -F says:

If receiving an incremental replication stream (for example,
one generated by zfs send -R [-i|-I]), destroy snapshots and
file systems that do not exist on the sending side.

The problem, then, is that zfs won't destroy the @s2 snapshot that exists at the receiver, because a snapshot of the same name exists in the source. It's not the same snapshot, of course, but it has the same name. This prevents the rollback, and the receive fails.

Snapshot name collisions are pretty common. We have an automatic snapshot regime, so pretty much every file system we have has a daily snapshot that embeds the date, and being automatic, they all have the same name.

What this means in practice is that if you have snapshots created on the receiving side, you'll have to explicitly roll the file system back to the snapshot you previously sent, to avoid hitting name collisions.
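
In the example above, that means rolling the destination back by hand before the incremental receive; zfs rollback -r will destroy the later snapshots at the receiver for you:

zfs rollback -r rpool/t2@s1
zfs send -i rpool/t1@s1 rpool/t1@s2 | zfs recv -F rpool/t2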

I think this behaviour is wrong, although I'm not quite confident enough to call it a bug. The point is that on the receiving side, any snapshots created after the one that was sent are irrelevant - it shouldn't matter what their names are, and I'm not at all sure why zfs even bothers checking the names of snapshots that ought to be deleted.

Wednesday, October 05, 2016

Cats versus Petals

It's become common to talk about Pets versus Cattle as the "new way" of thinking about servers.

Of course, "the new way" isn't really new - many IT shops in the mid 1990s had fully automated, reproducible, and disposable infrastructure. It's just the term that has recently become trendy, and I don't think the analogy is necessarily right.

In the original analogy, the claim was that a Pet is precious, so you care and feed for it specially. If it's sick, you nurse it back to health. Whereas if one of your herd of Cattle gets sick, you take it out back and shoot it. This is based purely on emotional attachment, and makes little business sense. The truth is more that most Pets have little financial value, whereas Cattle are intrinsically valuable. Whether sick Cattle are nursed back to health should be a pure business decision based on the value of a healthy animal compared to the cost of treating it.

Currently, I think a more appropriate analogy would be Cats versus Petals.

Let me explain.

A Cat system has a mind of its own. In fact, it isn't at all clear whether you own the system or the system owns you. Cat systems tend to be solitary and not integrate or interoperate well with others. If you have many Cat systems, they will tend to wish to go their own ways.

In contrast, Petals will be small, simple systems. You will have many, and they will be the same. While a Petal may have some value of its own, their true beauty is only visible when they are put together into larger units - flowers, for example. Different flowers are made up of different types of petals.

One point here is that if you're thinking about Pets and Cattle, you're still thinking of individual animals. With Petals, the role of holistic thinking and orchestration in producing a larger object (the flower, or even the garden) becomes clear.

In terms of terminology, your business is a garden; the services you provide are flowers; they are constructed from containers as the petals via an orchestration service that provides the stems and branches. Your job is to ensure good soil, water and light, prune, remove pests and weeds - not to create each individual Petal by hand.

If you're still herding Cats, it's time to stop and tend gardens instead.

Tuesday, September 27, 2016

Tribblix - updates versus upgrades

Having released a new version of Tribblix, I thought it worth writing a little on how I see updates and upgrades in the Tribblix world, and how they differ.

After all, one thing I said about the Tribblix philosophy of keeping current is that Tribblix is essentially a rolling release, in that new versions of applications are continuously added. You can just update and you'll get the latest version of applications.

So what defines an upgrade is that the illumos components get updated. In fact, the only way to update any of the illumos packages is via an upgrade.

This is mostly for purely practical reasons. The way a package is updated is to remove it (using pkgrm underneath) and then install the new version (using pkgadd underneath). This is problematic in several ways: you don't want a system problem half way through to leave you with a critical package uninstalled; you can't operate at all with libc removed; and you want the system packages to be updated together as a coherent unit rather than individually. It might be possible to think of a horrendously complex system to solve these problems; it's much better just to do it another easier way.

As for implementation, the illumos packages live in their own software repo, and there's one illumos repo per release. No updates ever get applied to that repo, if there are updates a new repo gets created. The process of doing an upgrade is to clone the system to a new BE (boot environment), change that BE to point to the new repo, update all the packages in the new BE, then reboot into it.
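
Stripped of the details of the package update step itself, the boot environment side of that is roughly the following (the BE name is just an example):

beadm create tribblix-new
beadm mount tribblix-new /a
# point the new BE at the new illumos repo and update its packages under /a
beadm activate tribblix-new
init 6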

In practice, the main Tribblix repo is also versioned per-release. Originally that was because it contains the zap package, which is where the repos are defined. However, it turns out that creating a new repo is an administrative convenience as well. The new repo at the point of a release contains the most up to date version of each package. (They're just hardlinks, so don't take any space.) This provides an easy way to claim back some space when I retire an old version of a repo, as you just delete the repo and any packages that aren't duplicated in other repos get deleted with it. It also means that an upgraded system cannot see old package versions, so you naturally prevent users getting out of date and incompatible versions.

Whether this approach is viable in the longer term is another matter. If there are stable releases that get "support" long term, then I'll have to keep old package versions and old repos for longer. But it's worked well so far.

By and large, once I've cut a new release, the older releases don't get updates. This isn't completely true: security updates (openssl, for example, and bind today) do get applied to the prior release, at least for a while. This means keeping an old machine around for the build (a simple VM is fine).

Saturday, September 17, 2016

Tribblix Milestone 18

Time for another Tribblix release, this one following the sequence and called Milestone 18.

The list of changes is pretty dry. Let me add a little colour to that.

On the desktop, MATE has been updated to the current 1.14 release. This provoked a little investigation into desktop caches, because adding MATE broke things. (I've just now added another little change to my MATE packaging which should catch another problem. Sigh.) I also added the EDE desktop as another fast and light option.

I finally got around to building my own copy of libtiff (rather than the old binary version I had inherited from OpenIndiana). This involved a major version bump, and then rebuilding anything that depended on the old version. I created a compatibility package containing the old shared libraries as a stopgap, while working my way through the list. One of the applications that needed updating was gdk-pixbuf, and then there are applications that link against both gdk-pixbuf and libtiff directly.

Very little of the software I ship needs or wants GTK3, so I'm happy with GTK2 (which I did a minor update of). But at some point I'm going to have to update to GTK3. So I tried to update to a later version than I had, in accordance with the Tribblix philosophy of keeping current. Because I don't actually use GTK3 much, it was well behind. Unfortunately, getting completely up to date involved updating Cairo, Pango, GLib, ATK, D-Bus, returning ETOOMUCHWORK. I went to an intermediate step of version 3.14.15, which involved updating ATK. As part of that, I had to update D-Bus, another component I had previously inherited from OpenIndiana. As it's pretty foundational, that required some care and attention to detail, but after working out the appropriate tweaks to match how it had been built before, that went very smoothly. The Linux community (rightly) gets a lot of stick for not caring about compatibility, but I have been very pleased at how good binary compatibility has been with the various desktop components.

As I was going through the various version bumps, I realized that almost everything using LCMS now used lcms2, so I made sure that the one holdout, gimp, was forced to use lcms2 rather than the lcms1 that it picked by default.

It's not only the desktop. Tribblix isn't just a desktop distro, that's just rather more visible (and sometimes more fun). Some of the work here tends to follow a theme - for example on load balancers. Reading between the lines you might be able to detect that I've been working on antivirus (clamav and c-icap); there are other cases where I've used Tribblix to build, package, and test components that might be useful elsewhere.

There are some isolated new packages that don't obviously make sense. Sometimes, I have to build and package prerequisites as part of building something else. For example, I had a look at pitivi. While building pitivi itself wasn't successful, I needed to get tools like meson and ninja and nose built, and components like pycairo. As I've gone to the effort of packaging, I'll keep them - they'll be useful in the future when I return to pitivi, and may well be useful for other tools. The same is true for snort, which is why libdnet and daq have been added, even though snort itself isn't there yet.

There was a mailing list thread on shells, which mentioned Plan9. So I went and added Plan9 from User Space because, well, I could, and it was an interesting opportunity to play with something different. I've also removed csh; it's now a link to tcsh. That wasn't a result of the thread, it was something I had meant to do for the last 2 releases but had forgotten in the build.

User feedback is always good. It tends to catch the cases I've never encountered myself. I've added an editor to the live environment: nano is there now, if you ever need to edit any files.

Friday, September 16, 2016

Tribblix philosophy - software fidelity and currency

One of the key things about Tribblix is that it is very light-touch.

Partly this is out of necessity - this is a part-time endeavour for one person, and I do what I can to minimize the amount of effort I have to put in.

So, as far as possible, I don't change what I get from upstream.

For illumos, I'm as vanilla illumos-gate as you can get (I make one change to the SVR4 packaging tools, important to me as Tribblix is the only distro based on illumos-gate that uses SVR4 packaging).

For other packages, apart from setting the install prefix, I only make changes necessary for applications to build and run. I make no real attempts to tweak or flavour them for Tribblix, you get as much as possible what the original author intended.

This is deliberate: who am I to decide to change the behaviour of somebody else's software?

Also, it makes maintenance easier, as I don't have to try and port patches forward to new versions.

Talking about new versions, I try and keep current. Yes, this implies a rolling release model, of sorts. If there's a problem with a package, I'll roll it forward to a newer release. I won't backport fixes to an older version, I'll simply push out the newer version.

If I can, that is. Sometimes a newer version is broken and doesn't work or won't build. Sometimes a newer version requires an update somewhere else, so it gets stalled and dumped back on the TODO list until the other component gets updated. Sometimes, especially for libraries, the new version isn't API compatible, so anything using it will also need to be rebuilt. This tends to get blocked, although I occasionally go in and update a whole dependency tree at once (which is a pain to do).

Fortunately, most of the core packages have learnt the value of compatibility. Things such as X11, Glib, D-Bus, Cairo, Pango, Gtk, most of the core desktop stack are never much of an issue. (Although because of their position in the dependency tree, I tend to be fairly cautious when I do have to touch them.)

Monday, August 29, 2016

The Tribblix filesystem layout

On Solarish systems, the filesystem(5) manpage gives a good description of where in the directory tree you might find the various files associated with a piece of software.

The version in illumos is largely broken, in that many of the directories referenced make no sense at all for illumos itself, and are largely wrong for the various illumos distributions. In particular, some of the directories are very specific to the old Solaris Java Desktop System, or JDS, and relate to GNOME.

Now, how does Tribblix handle all this?

For anything inherited from illumos-gate, I simply put files wherever illumos-gate put them.

For anything I build and ship, I normally build with a --prefix of /usr. And, for most packages, that's the only thing I set. What this means is that for most packages, --sysconfdir is /usr/etc and --localstatedir would be /usr/var. I do not redirect --sysconfdir to /etc by default. In most cases I think I've done the right thing, to be honest, as often the files that would have been put into /etc aren't meaningfully editable in any case.

In those cases where the application does expect user-editable configuration, I will set --sysconfdir to /etc. This covers things like BIND, samba, cups, openssh, and the like.
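
As an illustration, the two cases look like this at configure time:

./configure --prefix=/usr                     # config ends up in /usr/etc, state in /usr/var
./configure --prefix=/usr --sysconfdir=/etc   # for BIND, samba, cups, openssh, and the like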

Laying things out like this helps with things like sparse-root zones. I'm loopback-mounting /usr read-only, and that neatly catches everything (and ensures the parts of a package are consistent).

On the subject of zones, in a sparse-root zone /lib is inherited, which causes a problem. The SMF manifests and method scripts are now stored under /lib, and some are only relevant to the global zone. To handle this, I make a fixed copy of /lib for sparse-root zones to use, that doesn't have any errant SMF services present.

In order to be able to add my own services to zones, I make sure the manifests live under /var, which is unique to a zone.

I also handle /opt specially. According to filesystem(5), this is the "Root of a subtree for add-on application packages." The idea has always been that 3rd-parties pick a directory there and have that as their own dedicated prefix. (As an aside, I've always found the use of /etc/opt/foo and /var/opt/foo to be incredibly confusing, as it basically splatters the files associated with a given application all over the filesystem, making it very hard to keep track of things. Which is one of the reasons I just specify the prefix and put everything under the one root if I can.)

And what I do with /opt is mandate that it's not inherited by zones. Anything installed in /opt won't automatically be inherited by a zone. If you want it in a zone, you need to make sure it gets added there.

For my own applications designed for zones - particularly services - I put them under /opt/tribblix, so that an application foobar lives in /opt/tribblix/foobar, its configuration under /opt/tribblix/foobar/etc, and the like. Again, it's easier to see everything clearly if there's only one place to look. This layout makes it easy to run services in sparse-root zones, as the OS in /usr is read-only and the application never needs to touch that.

Modulo dependencies, anyway. That's a problem I haven't really solved, as some applications depend on packages that live in /usr, so I need some way to ensure that the right packages are installed in the global zone (or the zone template).

Solaris also had the notion of subsystems. For example, CDE (the dt subsystem) lived under /usr/dt, /var/dt, /etc/dt and the like. Again, I don't follow that. (Although there is the one exception which is that I install CDE under /usr/dt, because that's where it's always lived.) Most things are either generic (so live directly in /usr) or are services that live under /opt/tribblix for zone support.

The exception to this is packages that live under /usr/versions in Tribblix. The main idea here is for things that might come in more than one version. For example, python 2 vs python 3. Or the various versions of Node.js or Java. Here the convention is that the application lives in a versioned directory under /usr/versions, allowing multiple versions of an application to coexist. (One or two things end up under /usr/versions even though there's no meaningful need to ever support multiple versions, when I need to put something in its own directory hierarchy rather than directly in /usr, just to avoid having to create another standard location. Sort of like subsystems, but more tightly managed.) I'll generally put convenience links in the default path, although sometimes that involves picking a default version.

This all mirrors how I used to install software on Solaris 10 with zones many years ago. It's designed with zones in mind, and has been pretty successful.

Saturday, August 27, 2016

Updating desktop caches and a tale of woe

I recently updated some of the MATE components for Tribblix. On testing, various bits of MATE didn't work. Worse, various bits of Xfce didn't work.

The first issue that was fairly easy to solve was that MATE was looking for its menus under /etc/xdg/menus, whereas it had installed them under /usr/etc/xdg/menus. I had to set XDG_CONFIG_DIRS=/usr/etc/xdg in the MATE session startup scripts, and the menus reappeared.

Slightly trickier was that all the mime associations had stopped working. For everything - MATE, Xfce, and any other desktop that uses the shared desktop mime infrastructure.

There are various desktop caches that have to be kept up to date. Each cache is handled by an SMF service, so if you know a cache needs to be updated, then you just get a package to kick the service whenever it's installed or uninstalled. I had inherited these from OpenIndiana which got them from OpenSolaris, so they have gnome in the package name even though they're really more generic.
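
The kick itself is just an SMF restart from the package's postinstall (and postremove) scripts. For the icon cache, for example, it's something along these lines - the exact FMRI depends on what the desktop-cache package actually delivers:

/usr/sbin/svcadm restart svc:/application/desktop-cache/icon-cache:default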

Each of these SMF services has a method script that follows the same pattern. First check to see if anything needs doing, then update the cache if necessary.

This logic is, in some cases, plain broken. There's a python script that does the check, and in most cases the check is much more expensive than actually updating the cache. For a couple of the services, I had already simply ignored the check and blindly done the update. In particular, for updating the icon cache, which is the most common case.

Another problem with this check is that if you add an old package or untar some old files, there's the possibility that the "new" files you've just added get ignored because they have older timestamps than the cache. This shouldn't be a problem because the search looks for ctime, but some things reset that as well.

This time, the mime caches got broken. This only happened because there's normally one package - shared-mime-info - providing the mime types. Nothing else updates it. Until MATE comes along.

I had a bit of a dig, and this python script turns out to have python2.4 in its shebang. Oops! I've never shipped python older than 2.7, so this never worked. The method scripts redirect errors to /dev/null so they're never seen. The fact that this stuff worked at all was a complete accident, with the cache file being created the first time and never updated since.

There's a desktop mime cache, and that's pretty quick, so that was easy - just do the update without checking, and it's at least twice as quick.

The main mime cache is the one where the update itself is very expensive - it's almost 3s, which would be an eternity during every boot if it's done unconditionally. So for that I had to fix the python shebang and keep the logic intact.
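
For reference, the underlying commands here (from desktop-file-utils and shared-mime-info, assuming the usual /usr prefix) are roughly:

/usr/bin/update-desktop-database /usr/share/applications   # the quick desktop mime cache
/usr/bin/update-mime-database /usr/share/mime              # the expensive main mime cache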

As an aside, the package that delivers the SMF scripts and manifests hadn't been updated in ages, and I hadn't documented how it was built. Updating is easy - just take the existing package, modify, and repackage it. But I put together a desktop-cache repo on github to hold it so I don't have to go dumpster-diving next time.

While I was at it I discovered that the package dependencies weren't quite right (the SMF methods clearly depend on the packages supplying the binaries that update the caches), and the way I had put these into the Tribblix overlays wasn't quite right, so I sorted that out too.

All in all, a lot of effort to sort out something that shouldn't have been broken in the first place. Hence the tale of woe.

Oh, and there's a third bug I haven't yet tracked down. Sometimes the MATE file manager, caja, goes into a loop trying to open a png file under ${prefix}. Yes, the actual open() call includes a literal ${prefix}, so something hasn't been substituted in the code correctly.

Sunday, July 31, 2016

Minimizing apache and PHP

Recently I was looking at migrating a simple website in which every page but one was static.

The simplest thing here would be to use nginx. It's simple, fast, modern, and should make it dead easy to get an A+ on the Qualys SSL test.

But that non-static page? A trivial contact form. Fill in a box, the back-end sends the content of the box as an email message.

The simplest thing here in days gone by would have been to put together a trivial CGI script. Only nginx doesn't do CGI, at least not directly. Not only that, but writing the CGI script and doing it well is pretty hard.

So, what about PHP? Now, PHP has gotten itself a not entirely favourable reputation on the security front. Given the frequent security updates, not entirely undeserved. But could it be used for this?

For such a task, all you need is the mail() function. Plus maybe a quick regex and some trivial string manipulation. All that is in core, so you don't need very much of PHP at all. For example, you could use the following flags to build:

--disable-all
--disable-cli
--disable-phpdbg

So, no modules. Far less to go wrong. On top of that, you can disable a bunch of things in php.ini:

file_uploads = Off [change]
allow_url_fopen = Off [change]
allow_url_include = Off [default]
display_errors = Off [default]
expose_php = Off [change]

Furthermore, you could start disabling functions to your heart's content:

disable_functions = php_uname, getmyuid, getmypid, passthru, leak, listen, diskfreespace, tmpfile, link, ignore_user_abort, shell_exec, dl, set_time_limit, exec, system, highlight_file, source, show_source, fpassthru, virtual, posix_ctermid, posix_getcwd, posix_getegid, posix_geteuid, posix_getgid, posix_getgrgid, posix_getgrnam, posix_getgroups, posix_getlogin, posix_getpgid, posix_getpgrp, posix_getpid, posix_getppid, posix_getpwnam, posix_getpwuid, posix_getrlimit, posix_getsid, posix_getuid, posix_isatty, posix_kill, posix_mkfifo, posix_setegid, posix_seteuid, posix_setgid, posix_setpgid, posix_setsid, posix_setuid, posix_times, posix_ttyname, posix_uname, proc_open, proc_close, proc_get_status, proc_nice, proc_terminate, phpinfo

Once you've done that, you end up with a pretty hardened PHP install. And if all it does is take in a request and issue a redirect to a static target page, it doesn't even need to create any html output.

Then, how to talk to PHP? The standard way to integrate PHP with nginx is using FPM. Certainly, if this was a high or even moderate traffic site, then that would be fine. But that involves leaving FPM running permanently, and is a bit of a pain and a resource hog for one page that might get used once a week.

So how about forwarding to apache? Integration using mod_php is an absolute doddle. OK, it's still running permanently, but you can dial down the process count and it's pretty lightweight. But we have a similar issue to the one we faced with PHP - the default build enables a lot of things we don't need. I normally build apache with:

--enable-mods-shared=most
--enable-ssl

but in this case you can reduce that to:

--enable-modules=few
--disable-ssl

Now, there is the option of --enable-modules=none, but I couldn't actually get apache to start at all with that - some modules appear to be essential (mod_authz_host, mod_dir, mod_mime, and mod_log_config at least), and going below the "few" setting is entering unsupported territory.

You can restrict apache even further with configuration: just enable PHP, return an error for any other page, and listen only on localhost. (I like the concept of the currently experimental mod_allowmethods, as we might only want POST for this case. Normally, disabling methods with current apache versions involves mod_rewrite, which is one of the more complex modules.)
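
A sketch of that sort of configuration - the paths and the script name here are made up - might look like:

Listen 127.0.0.1:8080

<Directory "/usr/apache2/htdocs">
    Require all denied
</Directory>

<Files "contact.php">
    SetHandler application/x-httpd-php
    Require all granted
</Files>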

In the end, we elected to solve the problem a different way, but it was still an instructive exercise.

The above would be suitable for one particular use case. For a general service, it would be completely useless. Most providers and distributions tend to build with the kitchen sink enabled, because you don't know what your users or customers might need at runtime. They might build everything as a shared module, and could package every module in a separate package (although this ends up being a pain to manage); or you rely on the user to explicitly enable and disable modules as necessary.

In Tribblix, I've tended to avoid breaking something like apache or PHP up into multiple packages. There's one exception, which is that the PHP interface to postgresql is split out into a separate package. This is simply because it links against the postgresql shared library, so I ship that part separately to avoid forcing postgresql to be installed as a dependency.

Saturday, July 30, 2016

Building Tribblix packages

Software in Tribblix is delivered in packages, which come from one of three sources - an illumos build, a bootstrap distribution (OpenIndiana or OpenSXCE depending on hardware architecture), and native Tribblix packages.

The illumos packages are converted from the IPS repo created during a build of illumos-gate, using the repo2svr4 script in the tribblix-build repo. There's also a script, ips2svr4, in the same repo that's used to construct an SVR4 package from what's installed on a system using IPS packaging, such as OpenIndiana. The OpenSXCE packages are shipped as-is.

(The use of another distro to provide components was expedient during early bootstrapping. Over time, the fraction of the OS provided by that other distribution has shrunk dramatically. At the present time, it's mostly X11.)

What of the other packages, those natively built on Tribblix?

Those are described in the build repo.

In the build repo, there are a number of top-level scripts. The key one is dobuild, which is the primary software builder. Basically, it unpacks a source archive, then runs configure, make, and make install. It can apply patches, run scripting before and after the configure step, and knows how to handle most things that are driven by autoconf.
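
To give a flavour of what dobuild automates - this isn't the actual script, and the archive name and staging directory are made up - a typical build boils down to:

gtar xf /path/to/tarballs/widget-1.0.tar.gz
cd widget-1.0
./configure --prefix=/usr
gmake
gmake install DESTDIR=/tmp/widget-staging   # staging for packaging is an assumption here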

There are some other scripts of note. The genpkg and create_pkg scripts go from a build to a package. The pkg_tarball script is an easy way to do a straight conversion of an archive to a package. There are scripts to create the package catalogs.

For each package, there is a directory named after the package, containing the files used in the build.

At the very minimum, you need a pkginfo file (this is a fragment; the process creates the rest of the actual pkginfo file). There's the possibility of using fixit and fixinstall scripts to fix up any errant behaviour from the make install step before actually creating packages. There are depend files listing package dependencies, and alias files listing user-friendly aliases for packages.

However, how do you know how a package was actually built? Even for packages created with the dobuild script, there are a lot of flags that could have been provided. And a lot of software doesn't fit into the configure style of build in any case.

What I actually did was have a big text file containing the commands I used to build each package. Occasionally with some very unprintable comments about some of the steps I had to take to get things to build. (So simply adding that file to the repo was never going to be a sensible way forward.)

So what I've done is split those notes up and created a file build.sh for each package, which contains the instructions used to create that package. It assumes that the THOME environment variable points to the parent of the build repo, and that there's a parallel tarballs directory containing the archives. (Many of the scripts, unfortunately, assume a certain value for that location, which is the location on my own machine. Yes, that should be fixed.)

There are a number of caveats here.

The first is that some packages don't have a build.sh file. Yet. Some of these are my own existing packages, which were built outside Tribblix. Some go back to the very earliest days, and notes as to how they were built have been lost in the mists of time - these will be added whenever that package is next built.

The second is that the build recipe was valid at the time it was last used. If you were to run the recipe now, it might not work, due to changes in the underlying system - packages are not rebuilt unless they need to be, so the recipes can go all the way back to the very first release. It might not generate the same output. (This is really autoconf, which gropes around the system looking for things it can use, so running it again might pull in additional dependencies. Occasionally this causes problems and I need to explicitly enable or disable certain features. In some cases, you have to uninstall packages to make the build run in a sane manner.)

The third is that, while the build recipe looks like a shell script, and in many cases will actually function as such, it's really a recipe that you cut and paste into a terminal. At least, that's what I do. Sometimes it's necessary, because there was some manual hacky workaround I needed that's just in the build script as a comment.

This has been an outstanding TODO item for a while now, so I'm glad to have got it out of the way.

Wednesday, June 22, 2016

Getting to grips with Docker

A while ago, I described how we took an existing application build script and managed to run it inside Docker.

Having played with this inside Docker a little more, it's probably worth scribbling down a few notes I happened to stumble across on the way.

I'm looking at having 2 basic images: as a foundation, Ubuntu with all the packages we want added; then an image that inherits FROM that with our application stack built and installed (but not configured). The idea behind this layering is simply to separate the underlying OS, which is fairly standard, from the unique stuff that is all ours.

Then, you create an instance image from the application image, simply by running a configuration script that you COPY in. Once you've got a configured application instance, you create a volume container from it, and then run the application image using the volume(s) from the instance image. You keep that volume container around, just as a home for your data, essentially forever. And you can run multiple application instances from the same base image, you just need to configure and create a volume container for each instance.
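
Put as commands, with entirely made-up image, file, and container names, that workflow is roughly:

docker build -t example/base .                            # Ubuntu plus the packages we want
docker build -t example/app -f Dockerfile.app .           # FROM example/base, with our application stack
docker build -t example/app-inst1 -f Dockerfile.inst1 .   # runs the COPYed configuration script
docker create --name inst1-data example/app-inst1 true    # the volume container, kept around for the data
docker run -d --volumes-from inst1-data example/app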

That's a brief overview of the workflow, now some tweaks and pitfalls.

We're using Ubuntu, so the first step is to run apt-get with our list of packages. This originally created a 965MB image. It's not going to be small, we need both java and a full development stack to create our application.

However, some of the stuff installed we'll never need. Using the --no-install-recommends flag to apt-get saved us about 150M. The recommends list is stuff that might be useful, but not essential. But remember - our Docker container is only ever going to run a fixed set of applications, so we'll never need any of the optional stuff. The only thing to be careful of here is if you accidentally depend on something in the recommends list without realizing you're only getting it indirectly.
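
In the Dockerfile that's a single RUN line; the package names here are just examples:

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
      openjdk-8-jdk postgresql sudo wget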

We can do slightly better in terms of saving space. We use postgresql, but get it to store the database files in our own locations, so we can remove /var/lib/postgresql/9.X and what's underneath it, saving almost another 40M.

One thing to be aware of is that the list of packages in the official Ubuntu Docker image isn't quite the same as you would get from a regular Ubuntu install. There are one or two packages we didn't bother adding because they were there in a regular install that we need to add with Docker. Things like sudo and wget are on this list, so I needed to add those to the apt-get list.

Another thing to be aware of is that because you're building images afresh each time, you aren't guaranteed that new users will always get the same uid and gid. If you change the list of packages (even by just adding --no-install-recommends), this might change which users exist, and that affects the uid assigned to later users. I got burnt when a later base build ended up giving the postgres user a different uid, so it didn't own its database files on the persistent volume any more. I think the long term fix here is to create the users you need by hand before installing any packages, forcing the uid and gid to known values.
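
Something along these lines, before any apt-get runs, would pin the uid and gid to known values (the numbers are arbitrary):

RUN groupadd -g 5433 postgres && \
    useradd -u 5433 -g postgres -d /var/lib/postgresql -s /bin/bash postgres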

In order to keep image sizes small, you'll often see "rm -rf /var/lib/apt/lists/*" in a Dockerfile. In general, deleting temporary files is a good idea. This includes any files created by your own software deployment stage. Cleaning that up properly saved me another 200M or so in the final image. (Remember to clean up /tmp, that's part of the image too.)

It isn't strictly related to Docker, but I hit an ongoing problem - in some environments I ended up blocking on /dev/random. Search around and you'll find a lot of problems reported, especially related to java and SecureRandom (or, in our case, jruby). Running Docker on my Mac was fine, running it on a server in the cloud gave me 15-minute startup times. The solution here is to add -Djava.security.egd=file:///dev/urandom or -Djava.security.egd=file:/dev/./urandom to your java startup (or JAVA_OPTIONS).

(And, by the way, this illustrates that while Docker can guarantee that your app is the same in all environments, it doesn't magically protect you from differences in the underlying environment that can have a massive impact on your application.)

My application listens on ports 8080 and 8443, which I map on the host to the common ports, with

docker run -p 80:8080 -p 443:8443 ...

This works fine for me in testing, when I'm only running one copy and simply point a browser at the host. Networking gets a whole lot more complicated with multiple containers, although I think something like a load-balancer in front might work.

I've been using the Docker for Mac beta for some of this - while at times it's been beta in terms of stability, generally I can say it's a very impressive piece of work.

Sunday, June 19, 2016

Data Destruction and illumos

When disposing of a computer, you would like to be sure that it has no data on its storage that could be accessed by the direct recipient (or any future recipient). It would be somewhat embarrassing for personal photos to be retrieved; it would be far worse if financial or business data were to be left accessible.

The keywords you're looking for here are data remanence and disk sanitization.

There are three methods to remove data from a disk. Total physical destruction, degaussing, and overwriting the data. The effectiveness of these methods is up for debate; as is the feasibility of a sufficiently determined and well-funded attacker being able to retrieve data.

Here I'm just going to discuss overwriting the disk. For a lot of casual and home purposes that'll be enough, and is a lot better than not bothering at all, or simply reformatting the drive (or reinstalling an OS on it), which will leave a lot of disk sectors untouched and amenable to simply being read off in software.

The standard here seems to be DBAN. However, it's not seen much activity in a while, and was sold to a company that offers a commercial product that's claimed to be much better.

Basically, all DBAN is doing is scribbling over every sector on a drive. That's not hard.

In Solarish systems, format/analyze/purge does essentially the same thing. It's the documented method for wiping hard drives on Solarish style systems.
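
The procedure is interactive: run format, select the target disk from the menu, and then at the prompts:

format> analyze
analyze> purge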

However, it's a little fiddly to use and requires a modest level of expertise to get that far. You can't purge the disk you're booted from; the solution proposed there is to boot from installation media, drop to a shell, and run format from there. That has a couple of problems - it's still very manual, and the install (or live) media are rather large and can take an age to boot.

So I started to think, how hard could it be to create a minimalist illumos boot media that just contains the format command, and a simple script around it to make it easy to run?

I've already done most of the work, as part of the minimal viable illumos project. It was pretty easy to create a new variant.

The idea is to erase disk drives, so the intended target is physical hardware rather than a hypervisor. So I added a number of common storage drivers to the image. (As an aside, I really have no idea as to what storage HBAs are actually in common use, so which drivers to put in this list or on the Tribblix install iso is largely guesswork.)

There should be no need for networking. You really don't want a mechanism for any external access to the system while the disks are being wiped, so networking is simply not there.

And I added a simple wrapper script that enumerates disk drives and runs the appropriate format commands. If you want to see how this works, just look at the wrapper script. All this is in the mvi repo, see the files with "wipe" in their names.

And the iso image I created (14M in size) is also available.

(Why is such a small image good, you might ask? Apart from simply being sure that it's only capable of doing the one function that it's advertised for, if you're trying to wipe a remote system mounting the image over the network, then the smaller the better.)

I tested this in VirtualBox, which exposed a few quirks. For one, the defect list switching you'll see in the docs doesn't work there (I have no idea if it's going to work on any real hardware). The other is that the disk image I was using was a file on a compressed zfs file system. The purge process writes a repeating pattern, which is very compressible, so the 1G disk image I was testing only takes up 16M of disk space.

While I don't think it's really a proper alternative to DBAN, I think it's useful as a real-world example of how to use mvi.

Thursday, June 16, 2016

Connecting to legacy Sun ILOM with modern clients

The bane of many a system administrator's existence is the remote management capability on their servers. In Sun's case, I'm talking about the ILOM.

(Of course, Sun have had RSC and ALOM and eLOM and maybe some other abominations over time.)

Now, for many purposes, you can just ssh to the ILOM and you're done. On Sun boxes anyway, where you often have serial console redirection and the OS using the serial console.

However, if you want to manage the system fully, you need a proper client. There are a couple of common cases. First, if you need the VGA console (either for a broken OS, or to interact with the BIOS), or if you want to do storage redirection (in other words, you want to remotely present a bootable image).

That's where the fun starts, and you get in a tangled relationship with Java. Often, it ends up being a tale of woe.

And that's on the best of days. With legacy hardware - such as the X4150 - it gets a whole lot more interesting.

Now, while the X4150 is legacy and well past end of life now, it turns out that there was an updated firmware release in 2015. (For POODLE, I think.) If you can, apply this, as it should fix some of the UI compatibility issues with newer browsers. (Not all, I suspect, but if you've tried using a current browser and only got half the GUI then you know what I'm talking about.)

However, that doesn't necessarily mean that the Java application is going to work. There are actually a couple of issues here.

The first is that the application is a signed jar, and the certificate used to sign it has expired. Worse, due to Java's rather chequered security history, current versions have draconian checks in place which you'll run into. To fix, go to the Java Control Panel, down to "Perform signed code verification checks" and change it to "Do not check". Generally, disabling security like this is a bad idea, but in this case it's necessary.

Next, if you start up the application, click through the remaining security dialogs, and try to connect to the console, you'll get a cipher suite mismatch failure. The ILOM is pretty old, and uses SSLv3 which is disabled by default in current Java. You'll need to edit the java.security file (in ${JAVA_HOME}/jre/lib/security/java.security[*]) and comment out two lines - the ones with jdk.certpath.disabledAlgorithms and jdk.tls.disabledAlgorithms, then run the application again.

With luck, that will at least enable you to get to the console.

If you want storage redirection, then you're in for more fun. For starters, you need to be running Solaris, Linux, or Windows. If you're on a Mac, it's not going to work. You'll need to get yourself another machine, or run a VM with something else installed.

And the other thing is that you need to be running a 32-bit Java Virtual Machine. If you're running Solaris, this rules out Java 8 - you'll have to go back to Java 7. On other platforms, you'll have to make sure you have a 32-bit JVM, which might not be the default and you might have to manually install it.

Oh, and if you're on Linux or Solaris and running OpenJDK (rather than the Oracle builds) then you'll need IcedTea to get the javaws integration. At least with IcedTea you can ignore the Java Control Panel stuff.

[*: On my Mac I discovered that I had 2 different installations of Java. The one that you get if you type "java" isn't the same one used for browser integration and javaws launching. Running /usr/libexec/java_home gave me the wrong one; I ended up looking at the ps output when running the Control Panel to find out the location of the one I really needed.]

Monday, June 06, 2016

What's present in libc on illumos?

Over time, operating systems such as illumos gain new functionality. For example, new functions are added to libc. But how do you know what's there, when it was added, and whether a newer version contains something that's missing?

So perhaps the first place to start is with elfdump. If you run

elfdump -v /lib/libc.so.1

then you'll get a bunch of lines like so:

     index  version                     dependency
       [1]  libc.so.1                                        [ BASE ]
       [2]  ILLUMOS_0.13                ILLUMOS_0.12        
       [3]  ILLUMOS_0.12                ILLUMOS_0.11        
       [4]  ILLUMOS_0.11                ILLUMOS_0.10        
 ...

      [51]  SUNW_0.8                    SUNW_0.7            
      [52]  SUNW_0.7                    SYSVABI_1.3         
      [53]  SYSVABI_1.3                                     
      [54]  SUNWprivate_1.1                                 

So, these lines correspond to the various released versions of libc. Every time you add a new function to libc, that means a new version of the library. And each version depends on the one before.

These versions are listed in a mapfile, at usr/src/lib/libc/port/mapfile-vers in the illumos source. So, what you can see here is that ILLUMOS_0.13 (which is the version shipped with Tribblix 0m16) is where eventfd got added. If you want strerror_l (which you do if you're building vlc), then you need ILLUMOS_0.14; if you want pthread_attr_get_np (which you need for QT5) then you need ILLUMOS_0.21. (Unfortunately, Tribblix 0m17 only picks up ILLUMOS_0.19.) Looking back, you can see everything exposed by libc and what version of Solaris or illumos it was added in.
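
Another way to answer the "which version added this function" question is pvs, which can list the symbols under each version definition. Something like:

pvs -dos /lib/libc.so.1 | grep eventfd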

Another trick is to run elfdump against a binary. For example:

elfdump -v /usr/gnu/bin/tar

will tell us which versions of which libraries are required. Part of this is:

Version Needed Section:  .SUNW_version
     index  file                        version
       [2]  libnsl.so.1                 SUNW_0.7            
       [3]  libc.so.1                   ILLUMOS_0.8         
       [4]                              ILLUMOS_0.1          [ INFO ]
       [5]                              SUNW_1.23            [ INFO ]
       [6]                              SUNW_1.22.6          [ INFO ]

This is quite informative. It tells you that gtar calls the newlocale stuff from ILLUMOS_0.8, and a bunch of older stuff. But the point here is that it is calling illumos additions to libc, so this binary won't work on Solaris 10 (and probably not on Solaris 11 either). If you're building binaries for distribution across distros, you can use this information to confirm that you haven't accidentally pulled in functions that might not be available everywhere.

Wednesday, May 18, 2016

Installing Tribblix into an existing pool

The normal installation method for Tribblix is the live_install.sh script.

This creates a ZFS pool for installation into, creates file systems, copies the OS, adds packages, makes a few customizations, installs the bootloader, and not much else. It's designed to be simple.

(There's an alternative, to install to a UFS file system. Not much used, but it's kept just to ensure no insidious dependencies creep into the regular installer, and is useful for people with older underpowered systems.)

However, if you've already got an illumos distro installed, then you might already have a ZFS pool, and it might also have some useful data, so you would rather not wipe everything out. Is there not a way to create a brand new boot environment in the existing pool and install Tribblix to that, preserving all your data?

(Remember, also, that ZFS encourages the separation of OS and data. So you should be able to replace the OS without disturbing the data.)

As of Tribblix Milestone 17, this will work. Booting from the ISO and logging in, you'll find a script called over_install.sh in root's home directory. You can use that instead of live_install.sh, like so:

./over_install.sh -B rpool kitchen-sink

You have to give it the name of the existing bootable pool, usually rpool. It will do a couple of sanity checks to be sure this pool is suitable, but will then create a new BE there and install to that.

Arguments after the pool name are overlays, specifying what software to install, just like the regular install.

It will update grub for you, so that you have a grub on the pool that is compatible with the version of illumos you've just added. With -B, it will update the MBR as well.

It copies some files, the minimum that define the system's identity, from the existing bootable system into the new BE. This basically copies across user accounts (group, passwd, shadow files) and the system's ssh keys, but nothing else.

Any existing zfs file systems are untouched, and will be present in the new system. You'll have to import any additional zfs pools, though.

When you boot up after this, the grub menu will just contain the new BE you just created. However, any old boot environments are still present, so you can still see them, and manipulate them, using beadm. In particular, you can mount up an old BE (in case there are important files you need to get back), and activate an old BE so you can boot into the old system if so desired.
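
The old environments are handled with the usual beadm commands, for example (the BE name here is made up):

beadm list
beadm mount old-tribblix /mnt
beadm activate old-tribblix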

Amongst other things, you can use this as a recovery tool, when your existing system has a functioning root pool but won't boot.

I've also used this to "upgrade" older Tribblix systems. While there is an upgrade mechanism, this really only works (a) for very recent releases, and (b) to update one release at a time. With this new mechanism, I can simply stick a new copy of Milestone 17 on an old box, and enjoy the new version while having all my data intact.

Tuesday, May 17, 2016

Updating Tribblix for SPARC

Having just released an updated version of Tribblix for x86, a little commentary on the status of the SPARC version is probably in order.

After a rather extended delay, there is an updated version of Tribblix for SPARC available for download.

This is Milestone 16, so it's at the same level as the prior x86 release. More precisely, it's built from exactly the same illumos source as the x86 Milestone 16 release was. Yes, this means that it's a bit dated, but it's consistent, and there have been a number of breaking changes for SPARC builds introduced (and fixed) in the meantime.

I did want to get a release out of the door, before bumping the version to 0m17 ready for the next x86 build. Part of this was simply to ensure that I could actually still build a release - it gets tested on x86 all the time, but as it happens I had never built a SPARC release on my current infrastructure.

In terms of additional packages, the selection is still rather sparse. I haven't had the time to build up the full list. This isn't helped by the fact that my SPARC kit is rather slow, so building anything for SPARC simply takes longer. Much longer. (Not to mention the fact that they're noisy and power-hungry.)

It's not just time that's the problem. I've had some difficulty building certain packages. I'm not talking about the likes of Go and Node, which I can simply ignore as they're not ported to SPARC at all, but some reasonably common packages would fail with obscure (and unexpected) build errors. If there's a problem with one component, that blocks anything dependent on it too.

Other than expanding the breadth of available packages, the key next steps are (i) to make Tribblix on SPARC self-hosting, like the x86 version has been for a very long time, and (ii) to try and keep the SPARC release more closely aligned so it doesn't drift away from the x86 release and require additional effort to bring back in sync.

Signing Packages in Tribblix

On any computer system you want to know exactly what software is installed and running.

Tribblix uses SVR4 packaging, so you can easily see what's installed. In addition, there are mechanisms - pkgchk - to compare what's on the disk with what the packaging system thinks should be there. But that's just a consistency check, it doesn't verify that the package installed is actually the one you wanted.

Tribblix has had simple integrity checking for a while. The catalog for a package repository includes both the expected size and the md5 checksum of a package. This is largely aimed at dealing with download errors - network drops, application errors, or errant intrusion detection systems mangling the data. In practice, because the downloaded packages are actually zip files, which have inbuilt consistency checking and the catalog at the end of the file, and because SVR4 packaging has its own consistency checks on package contents, the chances of a faulty download getting installed are remote; the checking is there so that the layer above can make smart decisions in the case of failure.
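You can reproduce that check by hand with digest and compare the result against the catalog entry; the package file name here is made up:

digest -a md5 TRIBsomething.0.1.0.zip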

But you want to be sure not only that the package you downloaded has made it across the network intact, but also that the source package is legitimate. So the packages are signed using gnupg, and will be verified upon download in upcoming releases. Initially this is just a warning check while the mechanisms get sorted out.

The actual signing and verification part is the easy bit; it's all the framework around it that takes the time to write and test.
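The core of it is nothing more than gnupg detached signatures - something along these lines, with the file name again made up and the repository tooling wrapped around it:

gpg --armor --detach-sign TRIBsomething.0.1.0.zip
gpg --verify TRIBsomething.0.1.0.zip.asc TRIBsomething.0.1.0.zip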

One possibility would have been to sign the package catalogs, and use that to prove that the checksum is correct. That's not enough, for a couple of reasons. First, the catalog only includes current package versions, so there would be no way to verify prior versions. Second, there's no reason somebody (myself included) couldn't take a subset of packages and create a new repo using them; the modified catalog couldn't be verified. In either case, you need to be able to verify individual packages. (But the package catalog should also be signed, of course.)

It turns out there's not much of a performance hit. Downloads are a little slower, because there's an extra request to get the detached signature, but it's a tiny change overall.

With this in place, you can be sure that whatever you install on Tribblix is legitimate. But all you're doing is verifying the packages at download time. This leaves open the problem of being able to go to a system and ask whether the installed files are legitimate. Yes, there's pkgchk, but there's no validated source of information for it to use as a reference - the contents file is updated with every packaging operation, so it clearly can't be signed by me each time.

This is likely to require the additional creation of a signed manifest for each package. This partially exists already, as the pkgmap fragments for each package are saved (in the global zone, anyway), and those could be signed (as they don't change) and used as the input to pkgchk. However, the checksums in the pkgmap and contents files aren't particularly strong (to put it mildly), so that file will need to be replaced by something with much stronger checksums.
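One possible shape for such a manifest - purely a sketch, and TRIBsomething is a made-up package name - would be a per-package list of sha256 checksums, generated from the package's file list and signed alongside the package:

pkgchk -l TRIBsomething | awk '/^Pathname/ {print $2}' | while read f
do
  [ -f "$f" ] && digest -v -a sha256 "$f"
done > TRIBsomething.manifest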

Initial support for signed packages is available starting with the Tribblix Milestone 17 release. At this point, it will check the package signatures, but not act on them; enforcement will probably come in the next release, when I can be reasonably sure that everything is actually working correctly.

Saturday, April 02, 2016

Minimal illumos networking

Playing around with Minimal Viable Illumos, it becomes clear how complex a system illumos is. Normally, if you're running a full system, everything is taken care of for you. In particular, SMF starts all the right services for you. Looking through the method scripts, there's a lot of magic happening under the hood.

If you start with a bare install, booted to a shell, with no startup scripts having run, you can then bring up networking manually.

The simplest approach would be to use ifconfig to set up a network interface. For example

ifconfig e1000g0 plumb 10.0.0.1 up

This won't work. It will plumb the interface, but not assign the address. To get the address assigned, you need to have ipmgmtd running.

env SMF_FMRI=svc/net/ip:d /lib/inet/ipmgmtd

Note that SMF_FMRI is set. This is because ipmgmtd expects to be run under SMF, and checks for SMF_FMRI being set as the determinant. It doesn't matter what the value is, it just needs to exist.

OK, so if you know your network device and your address, that's pretty much it.
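Putting it together, a minimal static setup looks something like this - the addresses are purely for illustration, and you only need the route and resolver lines if you want to get off the local subnet (DNS lookups also assume /etc/nsswitch.conf points hosts at dns):

env SMF_FMRI=svc/net/ip:d /lib/inet/ipmgmtd
ifconfig e1000g0 plumb 10.0.0.1 netmask 255.255.255.0 up
route add default 10.0.0.254
echo 'nameserver 10.0.0.254' > /etc/resolv.conf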

If you use dladm to look for interfaces, then you'll see nothing. In order for dladm show-phys to enumerate the available interfaces, you need to start up dlmgmtd and give it a poke

env SMF_FMRI=svc/net/dl:d /sbin/dlmgmtd
dladm init-phys

Note that SMF_FMRI is set here, just like for ipmgmtd.

At this point, dladm show-phys will give you back the list of interfaces as normal.

You don't actually need to run dladm init-phys. If you know you have a given interface, plumbing it will poke enough of the machinery to make it show up in the list.

If you don't know your address, then you might think of using dhcp to assign it. This turns out to require a bit more work.

The first thing you need to do is bring up the loopback.

ifconfig lo0 plumb 127.0.0.1 up
ifconfig lo0 inet6 plumb ::1 up

This is required because dhcpagent wants to bind to localhost:4999, so you need the loopback set up for it to be able to do that. (You don't necessarily need the IPv6 version, that's for completeness.)

Then

ifconfig e1000g0 plumb
ifconfig e1000g0 auto-dhcp primary

That ought to work. Sometimes it does, sometimes it doesn't.
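When it doesn't, ifconfig's dhcp status subcommand at least shows you what dhcpagent thinks is going on:

ifconfig e1000g0 dhcp status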

I sometimes found that I needed to

dladm init-secobj

in order to quieten errors that occasionally appear, and sometimes need to run

ifconfig -adD4 auto-revarp netmask + broadcast + up

although I can't even imagine what that would do to help; I presume that it kicks enough of the machinery to make sure everything is properly initialized.

Within the context of mvi, setting a static IP address means you need to create a new image for each IP address, which isn't actually too bad (creating mvi images is really quick), but isn't sustainable at scale. DHCP would give it an address, but then you need to track what address a given instance has been allocated. Is there a way of doing static IP by poking the values in from outside?

A tantalizing hint is present at the end of /lib/svc/method/net-physical, where it talks about boot properties being passed in. The implementation there appears to be aimed at Xen; I'll have to investigate whether it's possible to script variables into the boot menu.
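As a very rough sketch of the idea - the mvi-ip property name is entirely made up, and I haven't verified that this survives the whole boot path - you could pass a property on the kernel line in the grub menu:

kernel$ /platform/i86pc/kernel/$ISADIR/unix -B mvi-ip=192.168.1.234

and then read it back inside the image with devprop and feed it to ifconfig:

IPADDR=`devprop mvi-ip`
ifconfig e1000g0 plumb $IPADDR netmask 255.255.255.0 up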

Monday, March 28, 2016

Running illumos in 48M of RAM

Whilst tweaking mvi recently, I went back and had another look at just how minimal an illumos install I could make.

And, given sufficiently aggressive use of the rm command, the answer appears to be that it's possible to boot illumos in 48 meg of RAM.

No, it's not April 1st. 48 meg of RAM is pretty low. It's been a long time since I've seen a system that small.

I've added some option scripts to the mvi repo. (The ones with -fix.sh as part of their names.) You don't have to run mvi to see what these do.

First, I start with mvix.sh, which is fairly minimal up front.

Then I go 32-bit, which halves the size.

Then I apply the extreme option, which removes zfs (it's the single biggest driver left), along with a bunch of crypto and other unnecessary files. And I clean up lots of bits of grub that aren't needed.

I then abstracted out a nonet and a nodisk option. The nonet script removes anything that looks like networking from the kernel, along with the bits of userland that I would otherwise add in order to be able to configure an interface. The nodisk script removes all the remaining storage device drivers (I only included the ones that you normally see when running under a hypervisor in the first place), along with the underlying scsi and related driver frameworks.
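To give a flavour of what these scripts do - this is illustrative only, the real lists in the -fix.sh scripts are much longer, and $DESTDIR stands for the image staging area:

# nonet: drop a network driver and the userland that configures it
rm -f $DESTDIR/kernel/drv/e1000g $DESTDIR/kernel/drv/amd64/e1000g
rm -f $DESTDIR/sbin/ifconfig $DESTDIR/sbin/dlmgmtd

# nodisk: drop a storage driver
rm -f $DESTDIR/kernel/drv/sd $DESTDIR/kernel/drv/amd64/sd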

What you end up with is a 17M root file system, which compresses down to a 6.8M root archive, which gets packaged up in an 8.7M iso.

For those interested, the iso is here. It should run in VirtualBox - in 32-bit mode - and you should be able to push the memory allocated by VirtualBox down to 48M. Scary.

Of course, it doesn't do much. It boots to a prompt, the only tools you have are ksh, ls, and du.

(Oh, and if you run du against the devices tree, it will panic.)

While doing this, I found that there are a lot of dependencies between modules in the illumos kernel. Not all of them are obvious, and trying to work out what's needed and what can be removed has involved large amounts of trial and error. That said, it takes less than 5 seconds to create an iso, and it takes longer for VirtualBox to start than for this iso to boot, so the cycle is pretty quick.

Sunday, March 27, 2016

Almost there with Solarus

Every so often I'll have a go at getting some games running on Tribblix. It's not a gaming platform, but having the odd distraction can't hurt.

I accidentally stumbled across Solarus, and was immediately intrigued. Back in the day, I remember playing Zelda on the SNES. A few years ago it was released for the Game Boy Advance, and I actually went out and bought the console and the game. I haven't added any more games - nothing seemed compelling - although we had a stock of old Game Boy games (all mono) and those still worked, which is great.

So I thought I would try and build Solarus for Tribblix. Having had a quick look at the prerequisites, most of them would be useful anyway, so the effort wouldn't be wasted in any event.

First up was SDL 2.0. I've had the older 1.2.15 version for a while, but hadn't had anything actually demand version 2 yet. That was easy enough, and because all the filenames are versioned, it can happily be installed alongside the older version.

While I was at it, I installed the extras SDL_image, SDL_net, SDL_ttf, smpeg, and SDL_mixer. The only build tweak needed was to supply LIBS="-lsocket -lnsl" to the configure script for smpeg and SDL_net. I needed to version the plaympeg binary installed by smpeg to avoid a conflict with the older version of smpeg, but that was about it.
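In other words, nothing more exotic than this (the prefix being whatever you normally use):

env LIBS="-lsocket -lnsl" ./configure --prefix=/usr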

Then came OpenAL, which turned out to be a bit tricky to find. Some of the obvious search results didn't seem appropriate. I have no idea whether the SourceForge site is the current one, but it appears to have the source code I needed.

Another thing that looks abandoned is modplug, where the Solarus folks have a github mirror of the copy that's right for them.

Next up, PhysicsFS. This isn't quite what you expect from the name, it's an abstraction layer that replaces traditional file system access.

On to Solarus itself. This uses cmake to build, and I ended up with the following incantation in a new empty build directory:

cmake ../solarus-1.4.5 \
 -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=NO \
 -DCMAKE_INSTALL_RPATH=/opt/solarus/lib \
 -DSOLARUS_USE_LUAJIT=OFF \
 -DCMAKE_INSTALL_PREFIX=/opt/solarus

Let's go through that. The CMAKE_INSTALL_PREFIX is fairly obvious - that's where I'm going to install it. And SOLARUS_USE_LUAJIT is set to OFF because I've got Lua, but have never had LuaJIT working successfully.

The two RPATH lines are necessary because of the strange way that cmake handles RPATH. Usually, it builds with RPATH set to the build location, then installs with RPATH stripped. This is plain stupid, but works when you simply dump everything into a single swamp. So you need to manually force it to put the correct RPATH into the binary (which is the sort of thing you would actually want a build system to get right on its own).
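You can check that the forced RPATH actually made it into the installed binary with elfdump (assuming the binary lands in /opt/solarus/bin):

elfdump -d /opt/solarus/bin/solarus | grep PATH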

Unfortunately, it doesn't actually work properly. There are two problems which I haven't really had a chance to look at - the first is that it fails fatally with an Xlib error if I move the mouse (which is a little odd as it doesn't actually use the mouse); the second is that it runs an order of magnitude or two slower than useful, so I suspect a timing error.

Still, the build is pretty simple and it's so close to working that it would be nice to finish the job.

Saturday, March 26, 2016

Tweaking MVI

A few months ago I first talked about minimal viable illumos, an attempt to construct a rather more minimalist bootable copy of illumos than the gigabyte-sized images that are becoming the norm.

I've made a couple of changes recently, which are present in the mvi repository.

The first is to make it easier for people who aren't me (and myself when I'm not on my primary build machine) to actually use mvi. The original version had the locations of the packages hardcoded to the values on my build machine. Now I've abstracted out package installation, which eliminates quite a lot of code duplication. And then I added an alternative package installation script which uses zap to retrieve and install packages from the Tribblix repo, just like the regular system does. So you can much more easily run mvi from a vanilla Tribblix system.

What I would like to add is a script that can run on pretty much any (illumos) system. This isn't too hard, but would involve copying most of the functionality of zap into the install script. I'm holding off for a short while, hoping that a better mechanism presents itself. (By better, what I mean is that I've actually got a number of image creation utilities, and it would be nice to rationalise them rather than keep creating new ones.)

The second tweak was to improve the way that the size of the root archive is calculated, to give better defaults and adapt to variations more intelligently.

There are two slightly different mechanisms used to create the image. In mvi.sh, I install packages, and then delete what I'm sure I don't need; with mvix.sh I install packages and then only take the files I do need. The difference in size is considerable - for the basic installation mvi.sh is 127M and mvix.sh is 57M.

Rather than a common base image size of 192M, I've set mvi.sh to 160M and mvix.sh to 96M. These sizes give a reasonable amount of free space - enough that adding the odd package doesn't require the sizes to be adjusted.

I then have standard scripts to construct 32-bit and 64-bit images. A little bit of experimentation indicates that the 32-bit image ends up being half the size of the base, whereas the 64-bit image comes in at two thirds. (The difference is that in the 32-bit image, you can simply remove all 64-bit files. For a 64-bit kernel, you still need both 32-bit and 64-bit userland.) So I've got those scripts to simply scale the image size, rather than try and pick a new number out of the air.
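The scaling itself is trivial - something along these lines, where MRSIZE is the base image size in megabytes and $ARCH stands for the target word size (a sketch of the idea rather than the actual mvi code):

MRSIZE=96
case $ARCH in
32)
    MRSIZE=$(($MRSIZE / 2))
    ;;
64)
    MRSIZE=$(($MRSIZE * 2 / 3))
    ;;
esac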

I also have a sample script to install Node.js. This again modifies the image size, just adding the extra space that Node needs. I've had to calculate this more accurately, as reducing the size of the base archive gave me less margin for error.

(As an aside, adding applications doesn't really work well in general with mvix.sh, as it doesn't know what dependencies applications might need - it only installs the bare minimum the OS needs to boot. Fortunately Node is fairly self-contained, but other applications are much less so.)