The Trouble with Tribbles...: December 2015

Thursday, December 24, 2015

The palatability of complexity

There seems to be a general trend to always add complexity to any system. Perhaps it's just the way most of our brains are wired, but we just can't help it.

Whether this be administrative tasks (filing your expenses), computer software (who hasn't suffered the dead hand of creeping featurism), systems administration, or even building a tax system, the trend seems always to be to keep adding additional layers of complexity.

Eventually, this stops when the complexity becomes unsustainable. People can rebel - they will go round the back of the system, taking short cuts to achieve their objectives without having to deal with the complexity imposed on them. Or they leave - for another company without the overblown processes, or another piece of software that is easier to use.

But there's another common way of dealing with the problem that is superficially attractive but with far worse consequences, which involves the addition of what I'll call a palatability layer. Rather than address the underlying problem, an additional layer is added on top to make it easier to deal with.

Which fails in two ways: you have failed to actually eliminate the underlying complexity, and the layer you've added will itself grow in complexity until it reaches the palatability threshold. (At which point, someone will add another layer, and the cycle repeats.)

Sometimes, existing bugs and accidental implementation artefacts become embedded as dogma in the new palatability layer. Worse, over time all expertise gravitates to the outermost layer, leaving you with nobody capable of understanding the innermost internals.

On occasion, the palatability layer becomes inflated to the position of a standard. (Which perhaps explains why standards are often so poor, and there are so many to choose from.)

For example, computer languages have grown bloated and complex. Features have been added, dependencies have grown. Every so often a new language emerges as an escape hatch.

Historically, I've often been opposed to the use of Configuration Management, because it would end up being used to support complexity rather than enforcing simplicity. This is not a fault of the tool, but of the humans who would abuse it.

As another example, I personally use an editor to write code rather than an IDE. That way, I can't write overly complex code, and it forces me to understand every line of code I write.

Every time you add a palatability layer, while you might think you're making things better, in reality you're helping build a house of cards on quicksand.

Monday, December 14, 2015

The cost of user-applied updates

Having updated a whole bunch of my (proprietary) devices with OS updates today, I was moved to tweet:

Imagining a world in which you could charge a supplier for the time it takes you to regularly update their software

On most of my Apple devices I'm applying updates to either iOS or MacOS on a regular basis. Very roughly, it's probably taking away an hour a month - I'm not including the elapsed time for the update (you just schedule this so you get yourself a cup of coffee or something), but there's a bit of planning involved, some level of interaction during the process, and then the need to fix up anything afterwards that got mangled by the update.

I don't currently run Windows, but I used to have to do that as well. And web browsers and applications. And that used to take forever, although current hardware helps (particularly the move away from spinning rust).

And then there's the constant stream of updates at the installed apps. Not all of which you can ignore - some games have regular mandatory updates and if you don't apply them the game won't even start.

If you charge for the time involved at commercial rates, you could easily justify $100 per month or $1000 per year. It's a significant drain on time and productivity, a burden being pushed from suppliers onto end users. Multiply that by the entire user base and you're looking at it having a significant impact on the economy of the planet.

And that's when things go smoothly. Sometimes systems go and apply updates at inconvenient times - I once had Windows update suddenly decide to update my work laptop just as I was shutting it down to go to the airport. Or just before an important meeting. If the update interferes with a critical business function, then costs can skyrocket very easily.

So you could avoid the manual interaction and associated costs, but then you end up giving users no way to prevent bad updates or to schedule them appropriately. Of course, if the things were adequately tested beforehand, or minimised, then there would be much less of a problem, but the update model seems to be to replace the whole shebang and not bother with testing. (Or worry about compatibility.)

It's not just time (or sanity), there's a very real cost in bandwidth. With phone or tablet images being measured in gigabytes, you can very easily blow your usage cap. (Even on broadband - if you're on metered domestic broadband then the usage cap might be 25GB/month, which is fine for email and general browsing, but OS and app updates for a family could easily hit that limit.)

The problem extends beyond computers (or things like phones that people do now think of as computers). My TV and BluRay player have a habit of updating themselves. (And one's significant other gets really annoyed if the thing decides to spend 10 minutes updating itself just as her favourite soap opera is about to start.)

As more and more devices are connected to the network, and update over the network, the problem's only going to get worse. While some updates are going to be necessary due to newly found bugs and security issues, there does seem to be a philosophy of not getting things right in the first place but shipping half-baked and buggy software, relying on being able to update it later.

Any realistic estimate of the actual cost involved in expecting all your end users to maintain the shoddy software that you ship is so high that the industry could never be expected to foot even a small fraction of the bill. Which is unfortunate, because a financial penalty would focus the mind and maybe lead to a much better update process.

Sunday, December 13, 2015

Zones beside Zones

Previously, I've described how to use the Crossbow networking stack in illumos to create a virtualized network topology with Zones behind Zones.

The result there was to create the ability to have zones on a private network segment, behind a proxy/router zone.

What, however, if you want the zones on one of those private segments to communicate with zones on a different private segment?

Consider the following two proxy zones:

A: address 192.168.10.1, subnet 10.1.0.0/16
B: address 192.168.10.2, subnet 10.2.0.0/16

And we want the zones in the 10.1.0.0 and 10.2.0.0 subnets to talk to each other. The first step is to add routes, so that packets from system A destined for the 10.2.0.0 subnet are sent to host B. (And vice versa.)

A: route add net 10.2.0.0/16 192.168.10.2
B: route add net 10.1.0.0/16 192.168.10.1

This doesn't quite work. The packets are sent, but recall that the proxy zone is doing NAT on behalf of the zones behind it. So packets leaving 10.1.0.0 get NATted on the way out, get delivered successfully to the 10.2.0.0 destination but then the reply packet gets NATted on its way back, so it doesn't really work.

So, all that's needed is to not NAT the packets that are going to the other private subnet. Remember the original NAT rule in ipnat.conf on host A would have been:

map pnic0 10.1.0.0/16 -> 0/32 portmap tcp/udp auto
map pnic0 10.1.0.0/16 -> 0/32

and we don't want to NAT anything that is going to 10.2.0.0, which would be:

map pnic0 from 10.1.0.0/16 ! to 10.2.0.0/16 -> 0/32 portmap tcp/udp auto
map pnic0 from 10.1.0.0/16 ! to 10.2.0.0/16 -> 0/32

And that's all there is to it. You now have a very simple private software-defined network with the 10.1 and 10.2 subnets joined together.

If you think this looks like the approach underlying Project Calico, you would be right. In Calico, you build up the network by managing routes (many more as it's per-host rather than the per-subnet I have here), although Calico has a lot more manageability and smarts built in to it rather than manually adding routes to each host.

While simple, there are obvious problems associated with scaling such a solution.

While adding and deleting routes isn't so bad, listing all the subnets in ipnat.conf would be tedious to say the least. The solution here would be to use the ippool facility to group the subnets.

How do we deal with a dynamic environment? While the back-end zones would come and go all the time, I expect the proxy/router zone topology to be fairly stable, so configuration churn would be fairly low.

The mechanism described here isn't limited to a single host, it easily spans multiple hosts. (With the simplistic routing as I've described it here, those hosts would have to be on the same network, but that's not a fundamental limitation.) My scripts in Tribblix just save details of how the proxy/router zones on a host are configured locally, so I need to extend the logic to a network-wide configuration store. That, at least, is well-known territory.

Thursday, December 10, 2015

Building an application in Docker

We have an application that we want to make easy for people to run. As in, really easy. And for people who aren't necessarily systems administrators or software developers.

The application ought to work on pretty much anything, but each OS platform has its quirks. So we support Ubuntu - it's pretty common, and it's available on most cloud providers too. And there's a simple install script that will set everything up for the user.

In the modern world, Docker is all the rage. And one advantage of Docker from the point of a systems administrator is that it decouples the application environment from the systems environment - if you're a Red Hat shop, you just run Red Hat on the systems, then add Docker and your developers can get a Ubuntu (or whatever) environment to run the application in. (The downside of this decoupling is that it gives people an excuse to write even less portable code than they do even now.)

So, one way for us to support more platforms is to support Docker. I already have the script that does everything, so isn't it going to be just a case of creating a Dockerfile like this and building it:

FROM ubuntu:14.04
RUN my_installer.sh

Actually, that turns out to be (surprisingly) close. It turns out to fail on just one line. The Docker build process runs as root, and when we try and initialise postgres with initdb, it errors out as it won't let you run postgres as root.

(As an aside, this notion of "root is unsafe" needs a bit of a rethink. In a containerized or unikernel world, there's nothing beside the one app, so there's no fundamental difference between root and the application user in many cases, and root in a containerized world is a bogus root anyway.)

OK, so we can run the installation as another user. We have to create the user first, of course, so something like:

FROM ubuntu:14.04
RUN useradd -m hbuild
USER hbuild
RUN my_installer.sh

Unfortunately, this turns out to fail all over the place. One thing my install script does is run apt-get via sudo to get all the packages that are necessary. We're user hbuild in the container and can't run sudo, and if we could we would get prompted, which is a bit tricky for the non-interactive build process. So we need to configure sudo so that this user won't get prompted for a password. Which is basically:

FROM ubuntu:14.04
RUN useradd -m -U -G sudo hbuild && \
echo "hbuild ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
USER hbuild
RUN my_installer.sh

Which solves all the sudo problems, but the script also references $USER (it creates some directories as root, then chowns them to the running user so the build can populate them), and the Docker build environment doesn't set USER (or LOGNAME, as far as I can tell). So we need to populate the environment the way the script expects:

FROM ubuntu:14.04
RUN useradd -m -U -G sudo hbuild && \
echo "hbuild ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
USER hbuild
ENV USER hbuild
RUN my_installer.sh

And off it goes, cheerfully downloading and building everything.

I've skipped over how the install script itself ends up on the image. I could use COPY, or even something very crude like:

FROM ubuntu:14.04
RUN apt-get install -y wget
RUN useradd -m -U -G sudo hbuild && \
    echo "hbuild ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
USER hbuild
ENV USER hbuild
RUN cd /home/hbuild && \
    wget http://my.server/my_installer.sh && \
    chmod a+x my_installer.sh && \
./my_installer.sh

This all works, but is decidedly sub-optimal. Leaving aside the fact that we're running both the application and the database inside a single container (changing that is a rather bigger architectural change than we're interested in right now), the Docker images end up being huge, and you're downloading half the universe each time. So to do this properly you would add an extra RUN step that did all the packaging and cleaned up after itself, so you have a base layer to build the application on.

What this does show, though, is that it's not that hard to take an existing deployment script and wrap it inside Docker - all it took here was a little fakery of the environment to more closely align with how the script was expecting to be run.