Thursday, November 26, 2015

Buggy basename

Every so often you marvel at the lengths people will go to in order to break things.

Take the basename command in illumos, for example. This comes in two incarnations - /usr/bin/basename, and /usr/xpg4/bin/basename.

Try this:

# /usr/xpg4/bin/basename -- /a/b/c.txt

Which is correct - it prints c.txt - and:

# /usr/bin/basename -- /a/b/c.txt

Which isn't - the -- gets treated as just another operand.

Wait, it gets worse:

# /usr/xpg4/bin/basename /a/b/c.txt .t.t

Correct - the suffix doesn't match, so c.txt is printed unchanged. But:

# /usr/bin/basename /a/b/c.txt .t.t

Err, what?

Perusal of the source code reveals the reason for the "--" handling - it's only caught in XPG4 mode. Which is plain stupid: there's no good reason to deliberately restrict correct behaviour to XPG4.

Then there's the somewhat bizarre handling of the ".t.t" suffix. It turns out that the default basename command is doing pattern matching rather than the expected string matching, so the "." will match any character rather than being interpreted literally. Given how commonly "." is used to separate a filename from its suffix, and how commonly basename is used to strip that suffix off, this is a guarantee of failure and confusion. For example:

# /usr/bin/basename /a/b/cdtxt .txt
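
To make that concrete, here's roughly what you should expect to see (reconstructed from the behaviour described, rather than a captured transcript):

# /usr/xpg4/bin/basename /a/b/cdtxt .txt
cdtxt
# /usr/bin/basename /a/b/cdtxt .txt
c

The ".txt" argument, treated as a pattern, matches the trailing "dtxt" - the "." matching the "d" - so the default basename strips it and leaves just "c".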

The fact that there's a difference here is actually documented in the man page, although not very well - it points you to expr(1), which doesn't tell you anything relevant.

So, does anybody rely on the buggy broken behaviour here?

It's worth noting that the ksh basename builtin and everybody else's basename implementation seem to do the right thing.

Fixing this would also get rid of a third of the lines of code, and we could ship one binary instead of two.

Tuesday, November 24, 2015

Replacing SunSSH with OpenSSH in Tribblix

I recently did some work to replace the SSH implementation used by Tribblix - the old SunSSH inherited from illumos - with OpenSSH.

This was always on the list - our SunSSH implementation was decrepit and unmaintained, and there seemed little point in maintaining our own version.

The need to replace it has become more urgent recently, as mainstream SSH implementations have drifted to the point that we're no longer compatible - our implementation will not interoperate at all with the one on modern Linux distributions with their default settings.

As I've been doing a bit of work with some of those modern Linux distributions, being unable to connect to them was a bit of a pain in the neck.

Other illumos distributions such as OmniOS and SmartOS have also recently been making the switch.

Then there was a proposal to work on the SunSSH implementation so that it was mediated - allowing you to install both SunSSH and OpenSSH and dynamically switch between them to ease the transition. Personally, I couldn't see the point - it seemed to me much easier to simply nuke SunSSH, especially as some distros had already made or were in the process of making the transition. But I digress.

If you look at OmniOS, SmartOS, or OpenIndiana, they carry a number of patches - in some cases a lot of patches - to bring OpenSSH more into line with the old SunSSH.

I studied these at some length and largely rejected them. There are a couple of reasons for this:

  • In Tribblix, I have a philosophy of making minimal modifications to upstream projects. I might apply patches to make software build, or when replacing older components so that I don't break binary compatibility, but in general what I ship is as close to what you would get if you did './configure --prefix=/usr; make ; make install' as I can make it.
  • Some of the fixes were for functionality that I don't use, probably won't use, and have no way of testing. So blindly applying patches and hoping that what I produce still works, and doesn't arbitrarily break something else, isn't appealing. Unfortunately all the gssapi stuff falls into this bracket.
One thing that might change this in the future, and something we've discussed a little, is to have something like Joyent's illumos-extra brought up to a state where it can be used as a common baseline across all illumos distributions. It's a bit too specific to SmartOS right now, so won't work for me out of the box, and it's a little unfortunate that I've just about reimplemented all the same things for Tribblix myself.

So what I ship is almost vanilla OpenSSH. The modifications I have made are fairly few:

It's split into the same packages (3 of them) along just about the same boundaries as before. This is so that you don't accidentally mix bits of SunSSH with the new OpenSSH build.

The server has
KexAlgorithms +diffie-hellman-group1-sha1
added to /etc/ssh/sshd_config to allow connections from older SunSSH clients.

The client has
PubkeyAcceptedKeyTypes +ssh-dss
added to /etc/ssh/ssh_config so that it will still offer DSA keys, for users who only have DSA keys.
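
If you want to check what a given OpenSSH build supports before fiddling with the config, you can ask it directly (the -Q flag needs a reasonably modern OpenSSH - it appeared in 6.3):

ssh -Q kex
ssh -Q key

The first lists the supported key exchange algorithms, the second the public key types, which makes it easy to see what has to be explicitly re-enabled for older peers.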

Now, I'm not 100% happy that I might have broken something SunSSH used to do, but having a working SSH that will interoperate with all the machines I need to talk to outweighs any minor disadvantages.

Sunday, November 22, 2015

On Keeping Your Stuff to Yourself

One of the fundamental principles of OmniOS - and indeed probably its defining characteristic - is KYSTY, or Keep Your Stuff* To Yourself.

(*um, whatever.)

This isn't anything new. I've expressed similar opinions in the past. To reiterate - any software that is critical for the success of your business/project/infrastructure/whatever should be directly under your control, rather than being completely at the whim of some external entity (in this case, your OS supplier).

We can flesh this out a bit. The software on a system will fall, generally, into 3 categories:

  1. The operating system, the stuff required for the system to boot and run reliably
  2. Your application, and its dependencies
  3. General utilities

As an aside, there are more modern takes on the above problem: with Docker, you bundle the operating system with your application; with unikernels you just link whatever you need from classes 1 and 2 into your application. Problem solved - or swept under the carpet, rather.

Looking at the above, OmniOS will only ship software in class 1, leaving the rest to the end user. SmartOS is a bit of a hybrid - it likes to hide everything in class 1 from you and relies on pkgsrc to supply classes 2 and 3, and the bits of class 1 that you might need.

Most (of the major) Linux distributions ship classes 1, 2, and 3, often in some crazily interdependent mess that you have to spend ages unpicking. The problem is that you need to work extra hard to ensure your own build doesn't accidentally acquire a dependency on some system component (or that your build somehow reads a system configuration file).

Generally missing from these discussions is class 3 - the general utilities. Stuff that you could really do with having an instance of to make your life easier, but whose specifics you don't really care about.

For example, it helps to have a copy of the GNU userland around. Way too much source out there needs GNU tar to unpack, or GNU make to build, or assumes various things about the userland that are only true of the GNU tools. (Sometimes the GNU tools aren't just a randomly incompatible implementation - occasionally they have capabilities that are missing from the standard tools, like in-place editing in gsed.)
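
The sort of thing I mean - editing a file in place, with a purely illustrative file name:

gsed -i 's/oldhost/newhost/g' /etc/hosts

With the standard sed you'd be writing to a temporary file and copying the result back over the original.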

Or a reasonably complete suite of compression utilities. More accurately, uncompression, so that you have a pretty good chance of being able to unpack some arbitrary format that people have decided to use.

Then there are generic runtimes. There's an awful lot of python or perl out there, and sometimes the most convenient way to get a job done is to put together a small script or even a one-liner. So while you don't really care about the precise details, having copies of the appropriate runtimes (and you might add java, erlang, ruby, node, or whatever to that list) really helps for the occasions when you just want to put together a quick throwaway component. Again, if your business-critical application stack requires that runtime, you maintain your own, with whatever modules you need.
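
The kind of throwaway one-liner I have in mind (purely illustrative, and certainly not something you'd maintain as real tooling):

ls -l | perl -lane '$t += $F[4]; END { print $t }'

which simply totals up the size column of a directory listing.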

There might also be a need for basic graphics. You might not want or need a desktop, but something is linked against X11 anyway. (For example, java was mistakenly linked against X11 for font handling, even in headless mode - a bug recently fixed.) Even if it's not X11, applications might use common code such as cairo or pango for drawing. Or they might need to read or write image formats for web display.

So the chances are that you might pull in a very large code surface, just for convenience. Certainly I've spent a lot of time building 3rd-party libraries and applications on OmniOS that were included as standard pretty much everywhere else.

In Tribblix, I've attempted to build and package software cognizant of the above limitations. So I supply as wide a range of software in class 3 as I can - this is driven by my own needs and interests, as a rule, but over time it's increasingly complete. I do supply application stacks, but these are built to live in a separate location, and are kept at arm's length from the rest of the system. This is then integrated with Zones in a standardized zone architecture, in a way that can be managed by zap. My intention here is not necessarily to supply the building blocks that can be used by users, but to provide the whole application, fully configured and ready to go.

Sunday, November 15, 2015

The Phallacy of Web Scale

A couple of times recently I've had interviewers ask me to quickly sketch out the design for a web scale architecture. Of course, being a scientist by training the first thing I did was to work out what sort of system requirements we're looking at, to see what sort of scale we might really need.

In both cases even my initial extreme estimate, just rounding everything up, didn't indicate a scaling problem. Most sites aren't Facebook or Google; they see limited use by a fraction of the population. The point here is that while web scale sites exist, they are the exception rather than the rule - so why does everyone think they have to go to the complexity and expense of architecting a "web scale" solution?

To set this in perspective, assume you want to track the viewing habits of everyone in the UK. If everyone watches 10 programmes per day and channel-hops 3 times for each programme, and there are 30 million viewers, then that's about 1 billion data points per day, or 10,000 per second. Each is small, so at 100 bytes each that's 100GB/day, or ~10 megabits/s. So you're still talking a single server. You can hold a week's data in RAM, a year on disk.
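
The arithmetic is easy enough to check, using bc as a calculator:

echo 1000000000/86400 | bc
echo 1000000000*100*8/86400 | bc

which gives roughly 11,500 data points per second, and just over 9 million bits per second. Round numbers, but they make the point: this is single-machine territory.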

And most businesses don't need anything like that level of traffic.

Part of the problem is that most implementations are horrifically inefficient. The site itself may be inefficient - you know the ones that have hundreds of assets on each page, multi-megabyte page weight, widgets all over, that take forever to load and look awful - if their customers bother waiting long enough. The software implementation behind the site is almost certainly inefficient (and is probably trying to do lots of stupid things it shouldn't as well).

Another trend fueling this is the "army of squirrels" approach to architecture. Rather than design an architecture that is correctly sized to do what you need, it seems all too common to simply splatter everything across a lot of little boxes. (Perhaps because the application is so badly designed it has cripplingly limited scalability so you need to run many instances.) Of course, all you've done here is simply multiplied your scaling problem, not solved it.

As an example, see the article "Scalability! But at what COST?". I especially like the quote that "Big data systems may scale well, but this can often be just because they introduce a lot of overhead".

Don't underestimate the psychological driver, either. A lot of people want to be seen as operating at "web scale" or with "Big Data", either to make themselves or their company look good, to pad their own CV, or to appeal to unsophisticated potential employees.

There are problems that truly require web scale, but for the rest it's an ego trip combined with inefficient applications on badly designed architectures.

Thursday, November 12, 2015

On the early web

I was browsing around, as one does, when I came across a list of early websites. Or should I say, a seriously incomplete list of web servers of the time.

This was November 1992, and I had been running a web server at the Institute of Astronomy in Cambridge for some time. That wasn't the only technology we were using at the time, of course - there was Gopher, the emergent Hyper-G, WAIS, ftp, fsp, USENET, and a few others that never made the cut.

Going back a bit further in time, about a year earlier, is an email regarding internet connectivity in Cambridge. I vaguely remember this - I had just arrived at the IoA at the time and was probably making rather a nuisance of myself, having come back from Canada where the internet was already a thing.

I can't remember exactly when we started playing with the web proper, but it would have been some time about Easter 1992. As the above email indicates, 1991 saw the department having the staggering bandwidth of 64k/s, and I think it would have taken the promised network upgrade for us to start advertising our site.

Graphical browsers came quite late - people might think of Mosaic (which you can still run if you like), but to start with we just had the CERN line mode browser, and things like Viola. Around this time there were other graphical browsers - there was one in the Andrew system, as I recall, and chimera was aimed at the lightweight end of the scale.

Initially we ran the CERN web server, but it was awful - it burnt seconds of cpu time to deliver every page, and as soon as the NCSA server came out we switched to that, and the old Sun 630MP that hosted all this was much the happier for it. (That was the machine called cast0 in the above email - the name there got burnt into the site URL, it took a while for people to get used to the idea of adding functional aliases to DNS.)

With the range of new tools becoming available, it wasn't entirely obvious which technologies would survive and prosper.

With my academic background I was initially very much against the completely unstructured web, preferring properly structured and indexed technologies. In fact, one of the things I remember saying at the time, as the number of sites started to grow, was "How on earth are people going to be able to find stuff?". Hm. Missed business opportunity there!

Although I have to say that even with search engines, actually finding stuff on the web now is a total lottery - Google have made a lot of money along the way, though. One thing I miss, again from the early days of the web (although we're talking later in the 90s now) is the presence of properly curated and well maintained indices of web content.

Another concern I had about the web was that, basically, any idiot could create a web page, leading to most of the content being complete and utter garbage (both in terms of what it contained and how it was coded). I think I got the results of that one dead right, but it failed to account for the huge growth that the democratization of the web allowed.

After a couple of years the web was starting to emerge as a clear front runner. OK, there were only a couple of thousand sites in total at this point (I think that up to the first thousand or so I had visited every single one), and the concept was only just starting to become known to the wider public.

One of the last things I did at the IoA, when I left in May 1994, was to set up all the computers in the building running Mosaic, with it looping through all the pages on a local website showcasing some glorious astronomical images, all for the departmental open day. This was probably the first time many of the visitors had come across the web, and the Wow factor was incredible.

Wednesday, October 21, 2015

Tribblix Turns Three

It's a little hard to put a fixed date on when I started work on Tribblix.

The idea - of building a custom distribution - had been floating around my mind in the latter days of OpenSolaris, and I registered the domain back in 2010.

While various bits of exploratory work had been going on in the meantime, it wasn't until the autumn of 2012 that serious development started. Eventually, after a significant number of attempts, I was able to produce a functional ISO image. That was:

-rw-r--r--   1 ptribble 493049856 Oct 21  2012 tribblix-0m0.iso

The first blog post was a few days later, but I'm going to put October 21st as the real date of birth.

Which means that Tribblix is 3 years old today!

In that time it's gone from a simple toy to a fully fledged distribution: most of the original targets I set myself have been met, it's been my primary computing environment for a while, and it's proving useful as a platform for interesting experiments. I'm looking forward to taking it even further in the next few years.

Tuesday, October 20, 2015

Minimal Viable Illumos

I've been playing with the creation of several minimal variants of illumos recently.

I looked at how little memory a minimal illumos distro could be installed and run in. Note that this was a properly built distribution - correctly packaged, most features present (if disabled), running the whole set of services using SMF.

In another dimension, I considered illumos pureboot, something that was illumos, the whole of illumos, and nothing but illumos.

Given that it was possible to boot illumos just to a shell, without all the normal SMF services running, how minimal can you make such a system?

At this point, if you're not thinking JEOS, Unikernels, or things like IncludeOS, then you should be.

So the first point is that you're always running this under a hypervisor of some sort. The range of possible hardware configurations you need to worry about is very limited - hypervisors emulate a small handful of common devices.

Secondly, the intention is never to install this - not directly, anyway. You create an image, and boot and run that. For this experiment, I'm simply running from a ramdisk. This is the way the live image boots, the way PXE booting works, and the way SmartOS boots.

First, the starting set of packages, both in Tribblix (SVR4) and IPS naming:

  • SUNWcsd=SUNWcsd
  • SUNWcs=SUNWcs
  • TRIBsys-library=system/library
  • TRIBsys-kernel=system/kernel
  • TRIBdrv-ser-usbser=driver/serial/usbser
  • TRIBsys-kernel-platform=system/kernel/platform
  • TRIBdrv-usb=driver/usb
  • TRIBsys-kernel-dtrace=system/kernel/dtrace/providers
  • TRIBsys-net=system/network
  • TRIBsys-lib-math=system/library/math
  • TRIBsys-libdiskmgt=system/library/libdiskmgt
  • TRIBsys-boot-grub=system/boot/grub
  • TRIBsys-zones=system/zones
  • TRIBdrv-storage-ata=driver/storage/ata
  • TRIBdrv-storage-ahci=driver/storage/ahci
  • TRIBdrv-i86pc-platform=driver/i86pc/platform
  • TRIBdrv-i86pc-ioat=driver/i86pc/ioat
  • TRIBdrv-i86pc-fipe=driver/i86pc/fipe
  • TRIBdrv-net-e1000g=driver/network/e1000g
  • TRIBsys-boot-real-mode=system/boot/real-mode
  • TRIBsys-file-system-zfs=system/file-system/zfs
You can probably go further, but I wanted to at least allow the possibility of talking to a storage device.

There are a few packages here that you might wonder about:

  • usbser is actually needed - it's a hard dependency of consconfig_dacf
  • many admin commands link against the zones libraries, so I add those even though they're not strictly necessary in most scenarios
  • the system will boot and run without zfs, but will panic if you run find down the /dev tree
  • the system will panic if the real-mode stuff is missing
  • grub is needed to make the iso and boot it
It's possible to construct a bootable iso from the above components, which can then be customized.

I took two approaches to this. The first is simply to start chopping out the files you don't want - man pages and includes, for example. The second is to drop all of userland and only put back the files you need, one by one. I tend not to tweak the kernel much; that's non-trivial, and you're only looking at marginal gains.

Working out which files are necessary is trial and error. Shared libraries especially, many of which are loaded lazily, so you can't just use what the executable tells you - some of the libraries it's linked against will never be pulled in.
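
On illumos you can at least see the gap directly: ldd lists what a binary is linked against, while pldd lists what a running process has actually mapped. Assuming your shell is ksh:

ldd /usr/bin/ksh
pldd $$

Anything dlopen()ed at runtime shows up in the second list but not the first; lazily loaded libraries that were never triggered show up in the first but not the second.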

I've put together some scripts that know how to create an image suitable for 32-bit or 64-bit hardware. We can be specific, because we know exactly the environment we're going to run in - and if things change you just build a new custom iso, rather than trying to build a generic image.

To be useful, the system needs to talk to something. You'll see that I've installed e1000g, which is what VirtualBox and qemu will give you by default. First, we have to get the root filesystem mounted read-write:

/etc/fs/ufs/mount -o remount,rw /devices/ramdisk:a /
Normally, there's a whole lot of network configuration handled by SMF, and it's all rather complicated. So we have to do it all by hand, which turns out to be relatively simple:

/sbin/ifconfig e1000g0 plumb
env SMF_FMRI=svc:/net/ip:d /lib/inet/ipmgmtd
/sbin/ifconfig e1000g0 inet up

You need ipmgmtd running, and it's expecting to be run under SMF, but the way it checks is to look for SMF_FMRI in the environment, so it's easy to fool.
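
Depending on your setup you'll also want an address on the interface, either from dhcp or set statically. A hypothetical static assignment for the usual VirtualBox host-only network might look like:

/sbin/ifconfig e1000g0 inet 192.168.56.101 netmask 255.255.255.0 up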

If you've got your VirtualBox VM set up with a Host-only Adapter, you should be able to communicate with the guest. Not that there's anything present to talk to yet.

So I set up a simple Node.js server. Now node itself doesn't have many external dependencies - just the gcc4 runtime - and for basic purposes you just need the node binary and a js file with a 'hello world' http server.
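
A sketch of the sort of thing that suffices - the file name and port here are my choice, nothing canonical:

cat > hello.js <<'EOF'
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('hello world\n');
}).listen(80);
EOF
node hello.js &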

With that, I have 64M of data in a 22M boot archive that is put on a 25M iso that boots up in a few seconds with an accessible web server. Pretty neat.

While it's pretty specific to Tribblix and my own build environment, there's an mvi repository on github containing all the scripts I used to build this, for those interested.