Tuesday, April 15, 2014

Partial root zones

In Tribblix, I support sparse-root and whole-root zones, which work largely the same way as in Solaris 10.

The implementation of zone creation is rather different. The original Solaris implementation extended packaging - so the packaging system, and every package, had to be zone-aware. This is clearly unsustainable. (Unfortunately, the same mistake was made when IPS was introduced.)

Apart from creating work, this approach limits flexibility - in order to innovate with zones, for example by adding new types, you have to extend the packaging system, and then modify every package in existence.

The approach taken by Tribblix is rather different. Instead of baking zone architecture into packaging, packaging is kept dumb and the zone creation scripts understand how packages are put together.

In particular, the decision as to whether a given file is present in a zone (and how it ends up there) is not based on package attributes, but is a simple pathname filter. For example, files under /kernel never end up in a zone. Files under /usr might be copied (for a whole-root zone) or loopback mounted (for a sparse-root zone). If it's under /var or /etc, you get a fresh copy. And so on. But the decision is based on pathname.
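To make the pathname rule concrete, here is a minimal sketch in Python. It is not the actual zone script: the `Action` names, the `classify` function, and the "sparse"/"whole" strings are hypothetical, and only the /kernel, /usr, /var and /etc rules come from the description above. Anything else is reported as undecided rather than guessed.

```python
from enum import Enum


class Action(Enum):
    SKIP = "never present in a zone"
    COPY = "copied from the global zone"
    LOFS = "loopback mounted from the global zone"
    FRESH = "fresh copy"
    UNDECIDED = "rule not covered by this sketch"


def under(path: str, prefix: str) -> bool:
    """True if path is prefix itself or lives below it (so /usrlocal is not under /usr)."""
    return path == prefix or path.startswith(prefix + "/")


def classify(path: str, zone_type: str) -> Action:
    """Decide what happens to a file purely from its pathname.

    zone_type is "sparse" or "whole" (hypothetical labels for the two zone types).
    """
    if under(path, "/kernel"):
        return Action.SKIP
    if under(path, "/usr"):
        return Action.LOFS if zone_type == "sparse" else Action.COPY
    if under(path, "/var") or under(path, "/etc"):
        return Action.FRESH
    # The "and so on" rules are not spelled out here, so this sketch does not guess.
    return Action.UNDECIDED


if __name__ == "__main__":
    for p in ("/kernel/drv/example", "/usr/bin/ls", "/etc/passwd"):
        print(p, "->", classify(p, "sparse").name, "/", classify(p, "whole").name)
```

The point is that no package attribute appears anywhere in the decision: the only inputs are the path and the type of zone being built.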

It's not just the files within packages that get copied. The package metadata is also copied; the contents file is simply filtered by pathname - and that's how the list of files to copy is generated. This filtering takes place during zone creation, and is all done by the zone scripts - the packaging tools aren't invoked (one reason why it's so quick). The scripts, if you want to look, are at /usr/lib/brand/*/pkgcreatezone.
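As a rough illustration of that filtering step, the sketch below is in Python rather than the shell the real scripts use. It assumes each contents entry starts with the pathname as its first whitespace-separated field (optionally followed by =target for links) and that lines starting with # are comments, which this sketch drops. The sample entries, the TRIBexample package name, and the function names are made up for the example.

```python
def entry_path(line: str):
    """Return the pathname of a contents entry, or None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    first_field = line.split()[0]
    # Links are assumed to be written as path=target; keep only the path.
    return first_field.split("=", 1)[0]


def filter_contents(lines, wanted):
    """Keep the entries whose pathname passes wanted(path).

    Returns (kept_lines, paths): the filtered metadata for the zone and
    the list of files to copy, both produced by the same single pass.
    """
    kept_lines, paths = [], []
    for line in lines:
        path = entry_path(line)
        if path is not None and wanted(path):
            kept_lines.append(line)
            paths.append(path)
    return kept_lines, paths


def not_kernel(path: str) -> bool:
    """Example filter: drop everything under /kernel."""
    return not (path == "/kernel" or path.startswith("/kernel/"))


if __name__ == "__main__":
    sample = [
        "# made-up entries for illustration only",
        "/kernel/drv/example f none 0755 root sys 1234 5678 1397000000 TRIBexample",
        "/usr/bin/example f none 0555 root bin 1234 5678 1397000000 TRIBexample",
        "/usr/bin/alias=example s none TRIBexample",
    ]
    kept, paths = filter_contents(sample, not_kernel)
    print(paths)
```

One pass over the file yields both the zone's package metadata and the copy list, which fits with not needing to invoke the packaging tools at all.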

In the traditional model, the list of installed packages in the zone is (initially) identical to that in the global zone. For a sparse-root zone, you're pretty much stuck with that. For a whole-root zone, you can add and remove packages later.

I've been working on some alternative models for zones in Tribblix that add more flexibility to zone creation. These will appear in upcoming releases, but I wanted to talk about the technology.

The first of these is what you might call a partial-root zone. This is similar to a whole-root zone in the sense that the zone gets its own independent copy of the files, rather than having them loopback mounted from the global zone. It also uses the same TRIBwhole brand. The difference is that you can specify a subset of the overlays present in the global zone to be installed in the zone. For example, you would use the following install invocation:

zoneadm -z myzone install -o developer

and only the developer overlay (and the overlays it depends on) will be installed in the zone.

This is still a copy - the installed files in the global zone are the source of the files that end up in the zone, so there's still no package installation, no need for repository access, and it's pretty quick.

This is still a filter, but you're now filtering on both pathname and package name.

As for package metadata, for partial-root zones, references to the packages that don't end up being used are removed.
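The partial-root selection could be sketched along these lines, again in Python and again not the real implementation. The overlay and package names, the two lookup tables, and the (path, package) tuples are all hypothetical. The sketch only shows the shape of the logic: expand the requested overlays to include the ones they depend on, collect their packages, then keep entries that pass both the package test and the pathname test.

```python
def overlay_closure(requested, requires):
    """Return the requested overlays plus every overlay they depend on."""
    seen = set()
    stack = list(requested)
    while stack:
        overlay = stack.pop()
        if overlay in seen:
            continue
        seen.add(overlay)
        stack.extend(requires.get(overlay, ()))
    return seen


def packages_for(overlays, members):
    """Union of the packages belonging to the given overlays."""
    pkgs = set()
    for overlay in overlays:
        pkgs.update(members.get(overlay, ()))
    return pkgs


def partial_root_entries(entries, pkgs, wanted_path):
    """Filter (path, package) pairs on package name and on pathname."""
    return [(path, pkg) for path, pkg in entries
            if pkg in pkgs and wanted_path(path)]


if __name__ == "__main__":
    # Hypothetical data: which overlays require which, and what they contain.
    requires = {"developer": ["base"], "desktop": ["base"]}
    members = {"base": ["TRIBcore"], "developer": ["TRIBcompiler"],
               "desktop": ["TRIBwindowing"]}
    entries = [("/usr/bin/cc", "TRIBcompiler"), ("/usr/bin/xterm", "TRIBwindowing"),
               ("/kernel/drv/example", "TRIBcore"), ("/usr/bin/sh", "TRIBcore")]

    overlays = overlay_closure(["developer"], requires)
    pkgs = packages_for(overlays, members)
    keep = partial_root_entries(entries, pkgs,
                                lambda p: not p.startswith("/kernel/"))
    print(sorted(overlays), keep)
```

Entries for packages outside the selected set simply never make it through the filter, which is the same thing as removing their references from the zone's metadata.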

That's the subset variant. The next obvious extension is to be able to specify additional packages (or, preferably, overlays) to be installed at zone creation time. That does require an additional source of packages - either a repository or a local cache - which is why I treat it as a logically distinct operation.

Time to get coding.

Sunday, April 13, 2014

Cloud analogies: Food As A Service

There's a recurring analogy of Cloud as utility, such as electrical power. I'm not convinced by this, and regard a comparison of the Cloud with the restaurant trade as more interesting. Read on...

Few IT departments build their own hardware, in the same way that few people grow their own food or keep their own livestock. Most buy from a supplier, in the same way that most buy food from a supermarket.

You could avoid cooking by eating out for every meal. Food as a Service, in current IT parlance.

The Cloud shares other properties with a restaurant. It operates on demand. It's self service, in the sense that anyone can walk in and order - you don't have to be a chef. There's a fixed menu of dishes, and portion sizes are fixed. It deals with wide fluctuations of usage throughout the day. For basic dishes, it can be more expensive than cooking at home. It's elastic, and scales, whereas most people would struggle if 100 visitors suddenly dropped by for dinner.

There's a wide choice of restaurants. And a wide variety of pricing models to match - Prix Fixe, a la carte, all you can eat.

Based on this analogy, the current infatuation with moving everything to the cloud would be the same as telling everybody that they shouldn't cook at home, but should always order in or eat out. You no longer need a kitchen, white goods, or utensils, nor do you need to retain any culinary skills.

Sure, some people do eat primarily at a basic burger bar. Some eat out all the time. Some have abandoned the kitchen. Is it appropriate for everyone?

Many people go out to eat not necessarily to avoid preparing their own food, but to eat dishes they cannot prepare at home, to try something new, or for special occasions.

In other words, while you can eat out for every meal, Food as a Service really comes into its own when it delivers capabilities beyond those of your own kitchen. Whether that be in the expertise of its staff, the tools in its kitchens, or the special ingredients that it can source, a restaurant can take your tastebuds places that your own kitchen can't.

As for the lunacy that is Private Cloud, that's really like setting up your own industrial kitchen and hiring your own chefs to run it.

Wednesday, April 02, 2014

Slimming down logstash

Following on from my previous post on logstash, it rapidly became clear that the elasticsearch indices grow rather large.

After a very quick look, it was obvious that some of the fields I was keeping were redundant or unnecessary.

For example, why keep the pathname of the log file itself? It doesn't change over time, and you can work out the name of the file easily (if you ever wanted it, and I can't see why you ever would - if you wanted to identify a source, that ought to be some other piece of data you create).

Also, why keep the full log message? You've parsed it, broken it up, and stored the individual fields you're interested in. So why keep the whole thing, a duplicate of the information you're already storing?

With that in mind, I used a mutate clause to remove the file name and the original log entry, like so:

  mutate {
     # drop the source file name and the raw, unparsed log line
     remove_field => ["path", "message"]
  }


After this simple change, the daily elasticsearch indices on the first system I tried this on shrank from 4.5GB to 1.6GB - almost a factor of 3. Definitely worthwhile, and there are benefits in terms of network traffic, search performance, elasticsearch memory utilization, and capacity for future growth as well.
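To see why dropping those two fields matters so much, here is a small Python sketch that mimics the effect on a single event. The event, its field names other than path and message, and the `slim` helper are made up for illustration. It only compares the size of the event as JSON, which is not the same as on-disk index size, so treat it as a rough indication rather than a measurement.

```python
import json


def slim(event, drop=("path", "message")):
    """Return a copy of the event without the dropped fields."""
    return {key: value for key, value in event.items() if key not in drop}


# A made-up web server event, after the message has been parsed into fields.
event = {
    "@timestamp": "2014-04-02T10:15:30.000Z",
    "host": "web1",
    "path": "/var/log/httpd/access_log",
    "message": '192.0.2.10 - - [02/Apr/2014:10:15:30 +0000] '
               '"GET /index.html HTTP/1.1" 200 5120',
    "clientip": "192.0.2.10",
    "verb": "GET",
    "request": "/index.html",
    "response": "200",
    "bytes": "5120",
}

before = len(json.dumps(event))
after = len(json.dumps(slim(event)))

if __name__ == "__main__":
    print("bytes before:", before, "after:", after)
```

Every piece of information in the dropped message is still present in the parsed fields, so nothing searchable is lost.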