Tuesday, September 27, 2016

Tribblix - updates versus upgrades

Having released a new version of Tribblix, I thought it worth writing a little on how I see updates and upgrades in the Tribblix world, and how they differ.

After all, one thing I said about the Tribblix philosophy of keeping current is that Tribblix is essentially a rolling release, in that new versions of applications are continuously added. You can just update and you'll get the latest version of applications.

So, what defines an upgrade is that it's when the illumos components are updated. In fact, the only way to update any of the illumos packages is via an upgrade.

This is mostly for purely practical reasons. The way a package is updated is to remove it (using pkgrm underneath) and then install the new version (using pkgadd underneath). This is problematic in several ways: you don't want a system problem half way through to leave you with a critical package uninstalled; you can't operate at all with libc removed; and you want the system packages to be updated together as a coherent unit rather than individually. It might be possible to think of a horrendously complex system to solve these problems; it's much better just to do it another easier way.

As for implementation, the illumos packages live in their own software repo, and there's one illumos repo per release. No updates ever get applied to that repo, if there are updates a new repo gets created. The process of doing an upgrade is to clone the system to a new BE (boot environment), change that BE to point to the new repo, update all the packages in the new BE, then reboot into it.

In practice, the main Tribblix repo is also versioned per-release. Originally that was because it contains the zap package, which is where the repos are defined. However, it turns out that creating a new repo is an administrative convenience as well. The new repo at the point of a release contains the most up to date version of each package. (They're just hardlinks, so don't take any space.) This provides an easy way to claim back some space when I retire an old version of a repo, as you just delete the repo and any packages that aren't duplicated in other repos get deleted with it. It also means that an upgraded system cannot see old package versions, so you naturally prevent users getting out of date and incompatible versions.

Whether this approach is viable in the longer term is another matter. If there are stable releases that get "support" long term, then I'll have to keep old package versions and old repos for longer. But it's worked well so far.

By and large, once I've cut a new release, the older releases don't get updates. This isn't completely true, security updates (openssl, for example, and bind today) do get updated in the prior release, at least for a while. This means keeping an old machine around for the build (a simple VM is fine).

Saturday, September 17, 2016

Tribblix Milestone 18

Time for another Tribblix release, this one following the sequence and called Milestone 18.

The list of changes is pretty dry. Let me add a little colour to that.

On the desktop, MATE has been updated to the current 1.14 release. This provoked a little investigation into desktop caches, because adding MATE broke things. (I've just now added another little change to my MATE packaging which should catch another problem. Sigh.) I also added the EDE desktop as another fast and light option.

I finally got around to building my own copy of libtiff (rather than the old binary version I had inherited from OpenIndiana). This involved a major version bump, and then rebuilding anything that depended on the old version. I created a compatibility package containing the old shared libraries as a stopgap, while working my way through the list. One of the applications that needed updating was gdk-pixbuf, and then there are applications that link against both gdk-pixbuf and libtiff directly.

Very little of the software I ship needs or wants GTK3, so I'm happy with GTK2 (which I did a minor update of). But at some point I'm going to have to update to GTK3. So I tried to update to a later version than I had, in accordance with the Tribblix philosophy of keeping current. Because I don't actually use GTK3 much, it was well behind. Unfortunately, getting completely up to date involved updating Cairo, Pango, GLib, ATK, D-Bus, returning ETOOMUCHWORK. I went to an intermediate step of version 3.14.15, which involved updating ATK. As part of that, I had to update D-Bus, another component I had previously inherited from OpenIndiana. As it's pretty foundational, that required some care and attention to detail, but after working out the appropriate tweaks to match how it had been built before, that went very smoothly. The Linux community (rightly) gets a lot of stick for not caring about compatibility, but I have been very pleased at how good binary compatibility has been with the various desktop components.

As I was going through the various version bumps, I realized that almost everything using LCMS now used lcms2, so I made sure that the one holdout, gimp, was forced to use lcms2 rather than the lcms1 that it picked by default.

It's not only the desktop. Tribblix isn't just a desktop distro, that's just rather more visible (and sometimes more fun). Some of the work here tends to follow a theme - for example on load balancers. Reading between the lines you might be able to detect that I've been working on antivirus (clamav and c-icap), there are other cases where I've used Tribblix to build, package, and test components that might be useful elsewhere


There are some isolated new packages that don't obviously make sense. Sometimes, I have to build and package prerequisites as part of building something else. For example, I had a look at pitivi. While building pitivi itself wasn't successful, I needed to get tools like meson and ninja and nose built, and components like pycairo. As I've gone to the effort of packaging, I'll keep them - they'll be useful in the future when I return to pitivi, and may well be useful for other tools. The same is true for snort, which is why libdnet and daq have been added, even though snort itself isn't there yet.

There was a mailing list thread on shells, which mentioned Plan9. So I went and added Plan9 from User Space because, well, I could, and it was an interesting opportunity to play with something different. I've also removed csh, it's now a link to tcsh. That wasn't a result of the thread, it was something I had meant to do for the last 2 releases but had forgotten in the build.

User feedback is always good. It tends to catch the cases I've never encountered myself. I've added an editor to the live environment, there's nano there now, if ever you need to edit any files.

Friday, September 16, 2016

Tribblix philosophy - software fidelity and currency

One of the key things about Tribblix is that it is very light-touch.

Partly this is out of necessity - this is a part-time endeavour for one person, and I do what I can to minimize the amount of effort I have to put in.

So, as far as possible, I don't change what I get from upstream.

For illumos, I'm as vanilla illumos-gate as you can get (I make one change to the SVR4 packaging tools, important to me as I'm the only distro based on illumos-gate who uses SVR4 packaging).

For other packages, apart from setting the install prefix, I only make changes necessary for applications to build and run. I make no real attempts to tweak or flavour them for Tribblix, you get as much as possible what the original author intended.

This is deliberate, who am I to decide to change the behaviour of somebody else's software?

Also, it makes maintenance easier, as I don't have to try and port patches forward to new versions.

Talking about new versions, I try and keep current. Yes, this implies a rolling release model, of sorts. If there's a problem with a package, I'll roll it forward to a newer release. I won't backport fixes to an older version, I'll simply push out the newer version.

If I can, that is. Sometimes a newer version is broken and doesn't work or won't build. Sometimes a newer version requires an update somewhere else, so it gets stalled and dumped back on the TODO list until the other component gets updated. Sometimes, especially for libraries, the new version isn't API compatible, so anything using it will also need to be rebuilt. This tends to get blocked, although I occasionally go in and update a whole dependency tree at once (which is a pain to do).

Fortunately, most of the core packages have learnt the value of compatibility. Things such as X11, Glib, D-Bus, Cairo, Pango, Gtk, most of the core desktop stack are never much of an issue. (Although because of their position in the dependency tree, I tend to be fairly cautious when I do have to touch them.)

Monday, August 29, 2016

The Tribblix filesystem layout

On Solarish systems, the filesystem(5) manpage gives a good description of where in the directory tree you might find the various files associated with a piece of software.

The version in illumos is largely broken, in that many of the directories referenced make no sense at all for illumos itself, and are largely wrong for the various illumos distributions. In particular, some of the directories are very specific to the old Solaris Java Desktop System, or JDS, and relate to GNOME.

Now, how does Tribblix handle all this?

For anything inherited from illumos-gate, I simply put files wherever illumos-gate put them.

For anything I build and ship, I normally build with a --prefix of /usr. And, for most packages, that's the only thing I set. What this means is that for most packages, --sysconfdir is /usr/etc and --localstatedir would be /usr/var. I do not redirect --sysconfdir to /etc by default. In most cases I think I've done the right thing, to be honest, as often the files that would have been put into /etc aren't meaningfully editable in any case.

In those cases where the application does expect user-editable configuration, I will set --sysconfdir to /etc. This covers things like BIND, samba, cups, openssh, and the like.

Laying things out like this helps with things like sparse-root zones. I'm loopback-mounting /usr read-only, and that neatly catches everything (and ensures the parts of a package are consistent).

On the subject of zones, in a sparse-root zone /lib is inherited, which causes a problem. The SMF manifests and method scripts are now stored under /lib, and some are only relevant to the global zone. To handle this, I make a fixed copy of /lib for sparse-root zones to use, that doesn't have any errant SMF services present.

In order to be able to add my own services to zones, I make sure the manifests live under /var, which is unique to a zone.

I also handle /opt specially. According to filesystem(5), this is the "Root of a subtree for add-on application packages." The idea has always been that 3rd-parties pick a directory there and have that as their own dedicated prefix. (As an aside, I've always found the use of /etc/opt/foo and /var/opt/foo to be incredibly confusing, as it basically splatters the files associated with a given application all over the filesystem, making it very hard to keep track of things. Which is one of the reasons I just specify the prefix and put everything under the one root if I can.)

And what I do with /opt is mandate that it's not inherited by zones. Anything installed in /opt won't automatically be inherited by a zone. If you want it in a zone, you need to make sure it gets added there.

For my own applications designed for zones - particularly services, I put them under /opt/tribblix, so that an application foobar lives in /opt/tribblix/foobar, its configuration under /opt/tribblix/foobar/etc, and the like. Again, it's easier to see everything clearly if there's only one place to look. This layout makes it easy to run services in sparse-root zones, as the OS in /usr is read-only and the application never needs to touch that.

Modulo dependencies, anyway. That's a problem I haven't really solved, as some applications depend on packages that live in /usr, so I need some way to ensure that the right packages are installed in the global zone (or the zone template).

Solaris also had the notion of subsystems. For example, CDE (the dt subsystem) lived under /usr/dt, /var/dt, /etc/dt and the like. Again, I don't follow that. (Although there is the one exception which is that I install CDE under /usr/dt, because that's where it's always lived.) Most things are either generic (so live directly in /usr) or are  services that live under /opt/tribblix for zone support.

The exception to this are packages that live under /usr/versions in Tribblix. The main idea here is for things that might come in more than 1 version. For example, python 2 vs python 3. Or the various versions of Node.js or Java. Here the convention is that the application lives in a versioned directory under /usr/versions, allowing multiple versions of an application to coexist. (One or two things end up under /usr/versions even though there's no meaningful need to ever support multiple versions, when I need to put something in it's own directory hierarchy rather than directly in /usr, just to avoid having to create another standard location. Sort of like subsystems, but more tightly managed.) I'll generally put convenience links in the default path, although sometimes that involves picking a default version.

This all mirrors how I used to install software on Solaris 10 with zones many years ago. It's designed with zones in mind, and has been pretty sucessful.

Saturday, August 27, 2016

Updating desktop caches and a tale of woe

I recently updated some of the MATE components for Tribblix. On testing, various bits of MATE didn't work. Worse, various bits of Xfce didn't work.

The first issue that was fairly easy to solve was that MATE was looking for its menus under /etc/xdg/menus, whereas it had installed them under /usr/etc/xdg/menus. I had to set XDG_CONFIG_DIRS=/usr/etc/xdg in the MATE session startup scripts, and the menus reappeared.

Slightly trickier was that all the mime associations had stopped working. For everything - MATE, Xfce, and any other desktop that uses the shared desktop mime infrastructure.

There are various desktop caches that have to be kept up to date. Each cache is handled by an SMF service, so if you know a cache needs to be updated, then you just get a package to kick the service whenever it's installed or uninstalled. I had inherited these from OpenIndiana which got them from OpenSolaris, so they have gnome in the package name even though they're really more generic.

Each of these SMF services has a method script that follows the same pattern. First check to see if anything needs doing, then update the cache if necessary.

This logic is, in some cases, plain broken. There's a python script that does the check, and in most cases the check is much more expensive than actually updating the cache. For a couple of the services, I had already simply ignored the check and blindly done the update. In particular, for updating the icon cache, which is the most common case.

Another problem with this check is that if you add an old package or untar some old files, there's the possibility that the "new" files you've just added get ignored because they have older timestamps than the cache. This shouldn't be a problem because the search looks for ctime, but some things reset that as well.

This time, the mime caches got broken. This only happened because there's normally one package - shared-mime-info - providing the mime types. Nothing else updates it. Until MATE comes along.

I had a bit of a dig, and this python script turns out to have python2.4 in it's shebang. Oops! I've never shipped python older than 2.7, so this never worked. The method scripts redirect errors to /dev/null so they're never seen. The fact that this stuff worked at all was a complete accident, with the cache file being created the first time and never updated since.

There's a desktop mime cache, and that's pretty quick, so that was easy - just do the update without checking, and it's at least twice as quick.

The main mime cache is the one where the update itself is very expensive - it's almost 3s, which would be an eternity during every boot if it's done unconditionally. So for that I had to fix the python shebang and keep the logic intact.

As an aside, the package that delivers the SMF scripts and manifests hadn't been updated in ages, and I hadn't documented how it was built. Updating is easy - just take the existing package, modify, and repackage it. But I put together a desktop-cache repo on github to hold it so I don't have to go dumpster-diving next time.

While I was at it I discovered that the package dependencies weren't quite right (the SMF methods clearly depend on the packages supplying the binaries that update the caches), and the way I had put these into the Tribblix overlays wasn't quite right, so I sorted that out too.

All in all, a lot of effort to sort out something that shouldn't have been broken in the first place. Hence the tale of woe.

Oh, and there's a third bug I haven't yet tracked down. Sometimes the MATE file manager, caja, goes into a loop trying to open a png file under ${prefix}. Yes, the actual open() call includes a literal ${prefix}, so something hasn't been substituted in the code correctly.

Sunday, July 31, 2016

Minimizing apache and PHP

Recently I was looking at migrating a simple website in which every page but one was static.

The simplest thing here would be to use nginx. It's simple, fast, modern, and should make it dead easy to get an A+ on the Qualys SSL test.

But that non-static page? A trivial contact form. Fill in a box, the back-end sends the content of the box as an email message.

The simplest thing here in days gone by would have been to put together a trivial CGI script. Only nginx doesn't do CGI, at least not directly. Not only that, but writing the CGI script and doing it well is pretty hard.

So, what about PHP? Now, PHP has gotten itself a not entirely favourable reputation on the security front. Given the frequent security updates, not entirely undeserved. But could it be used for this?

For such a task, all you need is the mail() function. Plus maybe a quick regex and some trivial string manipulation. All that is in core, so you don't need very much of PHP at all. For example, you could use the follwoing flags to build:

--disable-all
--disable-cli
--disable-phpdbg

So, no modules. Far less to go wrong. On top  of that, you can disable a bunch of things in php.ini:

file_uploads = Off [change]
allow_url_fopen = Off [change]
allow_url_include = Off [default]
display_errors = Off [default]
expose_php=Off [change]

Furthermore, you could start disabling functions to your heart's content:

disable_functions = php_uname, getmyuid, getmypid, passthru, leak, listen, diskfreespace, tmpfile, link, ignore_user_abord, shell_exec, dl, set_time_limit, exec, system, highlight_file, source, show_source, fpaththru, virtual, posix_ctermid, posix_getcwd, posix_getegid, posix_geteuid, posix_getgid, posix_getgrgid, posix_getgrnam, posix_getgroups, posix_getlogin, posix_getpgid, posix_getpgrp, posix_getpid, posix, _getppid, posix_getpwnam, posix_getpwuid, posix_getrlimit, posix_getsid, posix_getuid, posix_isatty, posix_kill, posix_mkfifo, posix_setegid, posix_seteuid, posix_setgid, posix_setpgid, posix_setsid, posix_setuid, posix_times, posix_ttyname, posix_uname, proc_open, proc_close, proc_get_status, proc_nice, proc_terminate, phpinfo

Once you've done that, you end up with a pretty hardened PHP install. And if all it does is take in a request and issue a redirect to a static target page, it doesn't even need to create any html output.

Then, how to talk to PHP?  The standard way to integrate PHP with nginx is using FPM. Certainly, if this was a high or even moderate traffic site, then that would be fine. But that involves leaving FPM running permanently, and is a bit of a pain and a resource hog for one page that might get used once a week.

So how about forwarding to apache? Integration using mod_php is an absolute doddle. OK, it's still running permanently, but you can dial down the process count and it's pretty lightweight. But we have a similar issue to the one we faced with PHP - the default build enables a lot of things we don't need. I normally build apache with:

--enable-mods-shared=most
--enable-ssl

but in this case you can reduce that to:

--enable-modules=few
--disable-ssl

now, there is the option of --enable-modules=none, but I couldn't actually get apache to start at all with that - some modules appear to be essential (mod_authz_host, mod_dir, mod_mime, and mod_log_config at least), and going below the "few" setting is entering unsupported territory.

You can restrict apache even further with configuration, just enable PHP, return an error for any other page, listen only on localhost. (I like the concept of the currently experimental mod_allowmethods as we might only want POST for this case. Normally disabling methods with current apache version involves mod_rewrite, which is one of the more complex modules.)

In the end, we elected to solve the problem a different way, but it was still an instructive exercise.

The above would be suitable for one particular use case. For a general service, it would be completely useless. Most providers and distributions tend to build with the kitchen sink enabled, because you don't know what your users or customers might need at runtime. They might build everything as a shared module, and could package every module in a separate package (although this ends up being a pain to manage); or you rely on the user to explicitly enable and disable modules as necessary.

In Tribblix, I've tended to avoid breaking something like apache or PHP up into multiple packages.  There's one exception, which is that the PHP interface to postgresql is split out into a separate package. This is simply because it links against the postgresql shared library, so I ship that part separately to avoid forcing postgresql to be installed as a dependency.

Saturday, July 30, 2016

Building Tribblix packages

Software in Tribblix is delivered in packages, which come from one of three sources - an illumos build, a bootstrap distribution (OpenIndiana or OpenSXCE depending on hardware architecture), and native Tribblix packages.

The illumos packages are converted from the IPS repo created during a build of illumos-gate using the repo2svr4 script in the tribblix-build repo. There's also a script ips2svr4 in the same repo that's used to construct an SVR4 package from that installed on a system using IPS packaging, such as OpenIndiana.  The OpenSXCE packages are shipped as-is.

(The use of another distro to provide components was expedient during early bootstrapping. Over time, the fraction of the OS provided by that other distribution has shrunk dramatically. At the present time, it's mostly X11.)

What of the other packages, those natively built on Tribblix?

Those are described in the build repo.

In the build repo, there are a number of top-level scripts. Key of these is dobuild, which is the primary software builder. Basically, it unpacks a source archive, runs configure, make, and make install. It can apply patches, run scripting before and after the configure step, and knows how to handle most things that are driven by autoconf.

There are some other scripts of note. The genpkg and create_pkg scripts go from a build to a package. The pkg_tarball script is an easy way to do a straight conversion of an archive to a package. There are scripts to create the package catalogs.

For each package, there is a directory named after the package, contains files used in the build.

At the very minimum, you need a pkginfo file (this is a fragment, the rpocess creates the rest of the actual pkginfo file). There's the possibility of using fixit and fixinstall scripts to fix up any errant behaviour from the make install step before actually creating packages. There are depend files listing package dependencies, and alias files listing user-friendly aliases for packages.

However, how do you know how a package was actually built? Even for packages created with the dobuild script, there are a lot of flags that could have been provided. And a lot of software doesn't fit into the configure style of build in any case.

What I actually did was have a big text file containing the commands I used to build each package. Occasionally with some very unprintable comments about some of the steps I had to take to get things to build. (So simply adding that file to the repo was never going to be a sensible way forward.)

So what I've done is split those notes up and created a file build.sh for each package, which contains the instructions used to create that package. It assumes that the THOME environment variable points to the parent of the build repo, and that there's a parallel tarballs directory containing the archives. (Many of the scripts, unfortunately, assume a certain value for that location, which is the location on my own machine. Yes, that should be fixed.)

There are a number of caveats here.

The first is that some packages don't have a build.sh file. Yet. Some of these are my own existing packages, which were built outside Tribblix. Some go back to the very earliest days, and notes as to how they were built have been lost in the mists of time - these will be added whenever that package is next built.

The second is that the build recipe was valid at the time it was last used. If you were to run the recipe now, it might not work, due to changes in the underlying system - packages are not rebuilt unless they need to be, so the recipes can go all the way back to the very first release. It might not generate the same output. (This is really autoconf, which gropes around the system looking for things it can use, so running it again might pull in additional dependencies. Occasionally this causes problems and I need to explicitly enable or disable certain features. In some cases, you have to uninstall packages to make the build run in a sane manner.)

The third is that, while the build recipe looks like a shell script, and in many cases will actually function as such, it's really a recipe that you cut and paste into a terminal. At least, that's what I do. Sometimes it's necessary, because there was some manual hacky workaround I needed that's just in the build script as a comment.

This has been an outstanding TODO item for a while now, so I'm glad to have got it out of the way.