Sunday, August 02, 2015

Blank Zones

I've been playing around with various zone configurations on Tribblix. This is going beyond the normal sparse-root, whole-root, partial-root, and various other installation types, into thinking about other ways you can actually use zones to run software.

One possibility is what I'm tentatively calling a Blank zone. That is, a zone that has nothing running. Or, more precisely, just has an init process but not the normal array of miscellaneous processes that get started up by SMF in a normal boot.

You might be tempted to use 'zoneadm ready' rather than 'zoneadm boot'. This doesn't work, as you can't get into the zone:

zlogin: login allowed only to running zones (test1 is 'ready').
So you do actually need to boot the zone.
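
To make that concrete, the sequence looks something like this (using the test1 zone from the error above):

zoneadm -z test1 ready
zlogin test1          # fails as above - the zone is only 'ready'
zoneadm -z test1 boot
zlogin test1          # works once the zone is running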

Why not simply disable the SMF services you don't need? This is fine if you still want SMF and most of the services, but SMF itself is quite a beast, and the minimal set of service dependencies is both large and extremely complex. In practice, you end up running most things just to keep the SMF dependencies happy.

Now, SMF is started by init using the following line from /etc/inittab (I've trimmed the redirections):

smf::sysinit:/lib/svc/bin/svc.startd

OK, so all we have to do is delete this entry, and we just get init. Right? Wrong! It's not quite that simple. If you try this then you get a boot failure:

INIT: Absent svc.startd entry or bad contract template.  Not starting svc.startd.
Requesting maintenance mode

In practice, this isn't fatal - the zone is still running - but quite apart from leaving you wondering why it's behaving like this, it would be nice to have the zone boot without errors.

Looking at the source for init, it soon becomes clear what's happening. The init process is now intimately aware of SMF, so essentially it knows that its only job is to get startd running, and startd will do all the work. However, it's clear from the code that it's only looking for the smf id in the first field. So my solution here is to replace startd with an infinite sleep.

smf::sysinit:/usr/bin/sleep Inf

(As an aside, this led to illumos bug 6019, as the manpage for sleep(1) isn't correct. Using 'sleep infinite' as the manpage suggests led to other failures.)
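
One way to apply the change is to edit the zone's copy of inittab directly from the global zone. This is just a sketch - it assumes a zonepath of /zones/test1, so adjust the path for your own configuration:

gsed -i 's|^smf::sysinit:.*|smf::sysinit:/usr/bin/sleep Inf|' /zones/test1/root/etc/inittab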

Then, the zone boots up, and the process tree looks like this:

# ptree -z test1
10210 zsched
  10338 /sbin/init
    10343 /usr/bin/sleep Inf

To get into the zone, you just need to use zlogin. With nothing running, none of the normal daemons (like sshd) are available for you to connect to. It's somewhat disconcerting to type 'netstat -a' and get nothing back.

For permanent services, you could run them from inittab (in the traditional way), or have an external system that creates the zones and uses zlogin to start the application. Of course, this means that you're responsible for any required system configuration and for getting any prerequisite services running.
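
An inittab entry for a permanent service might look something like the following - the application path is purely illustrative. With the respawn action, init will restart the process if it exits, which gives you a very crude supervisor for free.

myapp:3:respawn:/opt/myapp/bin/myapp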

In particular, this sort of trick works better with shared-IP zones, in which the network is configured from the global zone. With an exclusive-IP zone, all the networking would need to be set up inside the zone, and there's nothing running to do that for you.
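
With shared-IP, the address is defined in the zone configuration and plumbed by the global zone, so nothing needs to run inside the zone at all. A sketch, with the interface name and address purely as examples:

zonecfg -z test1
zonecfg:test1> add net
zonecfg:test1:net> set physical=e1000g0
zonecfg:test1:net> set address=192.168.1.101/24
zonecfg:test1:net> end
zonecfg:test1> commit
zonecfg:test1> exit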

Another thought I had was to use a replacement init. The downside to this is that the name of the init process is baked into the brand definition, so I would have to create a duplicate of each brand to run it like this. Just tweaking the inittab inside a zone is far more flexible.

It would be nice to have more flexibility. At the present time, I either have just init, or the whole of SMF. There's a whole range of potentially useful configurations between these extremes.

The other thing is to come up with a better name. Blank zone. Null zone. Something else?

Saturday, August 01, 2015

The lunacy of -Werror

First, a little history for those of you young enough not to have lived through perl. In the perl man page, there's a comment in the BUGS section that says:

The -w switch is not mandatory.

(The -w switch enables warnings about grotty code.) Unfortunately, many developers misunderstood this. They wrote their perl script, and then added the -w switch as though it were a magic bullet that fixed all the errors in their code, without bothering to look at the output it generated or - heaven forbid - actually fix the problems. The result was that, with a CGI script, your apache error log was full of output that nobody ever read.

The correct approach, of course, is to develop with the -w switch, fix all the warnings it reports as part of development, and then turn it off. (Genuine errors will still be reported anyway, and you won't have to sift through garbage to find them, or worry about your service going down because the disk filled up.)

Move on a decade or two, and I'm starting to see a disturbing number of software packages being shipped that have -Werror in the default compilation flags. This almost always results in the build failing.

If you think about this for a moment, it should be obvious that enabling -Werror by default is a really dumb idea. There are two basic reasons:

  1. Warnings are horribly context sensitive. It's difficult enough to remove all the warnings given a single fully constrained environment. As soon as you start to vary the compiler version, the platform you're building on, or the versions of the (potentially many) prerequisites you're building against, getting accidental warnings is almost inevitable. (And you can't test against all possibilities, because some of those variations might not even exist at the point of software release.)
  2. The warnings are only meaningful to the original developer. The person who has downloaded the code and is trying to build it has no reason to be burdened by all the warnings, let alone be inconvenienced by unnecessary build failures.
To be clear, I'm not saying - at all - that the original developer shouldn't be using -Werror and fixing all the warnings (and you might want to enable it for your CI builds to be sure you catch regressions), but distributing code with it enabled is simply being rude to your users.
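
If you're on the receiving end of such a package, it's often enough to downgrade errors back to warnings at configure time - something like the following, although not every build system honours flags passed this way:

./configure CFLAGS="-O2 -Wno-error"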

(Having a build target that generates a warning report that you can send back to the developer would be useful, though.)

Friday, July 24, 2015

boot2docker on Tribblix

Containers are the new hype, and Docker is the Poster Child. OK, I've been running containerized workloads on Solaris with zones for over a decade, so some of the ideas behind all this are good; I'm not so sure about the implementation.

The fact that there's a lot of buzz is unmistakeable, though. So being familiar with the technology can't be a bad idea.

I'm running Tribblix, so running Docker natively is just a little tricky. (Although if you actually wanted to do that, then Triton from Joyent is how to do it.)

But there's boot2docker, which allows you to run Docker on a machine that doesn't support it natively - by spinning up a copy of VirtualBox for you and getting that to actually do the work. The next thought is obvious - if you can make that work on Mac OS X or Windows, why not on any other OS that also supports VirtualBox?

So, off we go. First port of call is to get VirtualBox installed on Tribblix. It's an SVR4 package, so it should be easy enough. Ah, but it has special-case handling for various Solaris releases that causes it to derail quite badly on illumos.

Turns out that Jim Klimov has a patchset to fix this. It doesn't handle Tribblix (yet), but you can take the same idea - and the same instructions - to fix it here. Unpack the SUNWvbox package from datastream to filesystem format, edit the file SUNWvbox/root/opt/VirtualBox/vboxconfig.sh, replacing the lines

             # S11 without 'pkg'?? Something's wrong... bail.
             errorprint "Solaris $HOST_OS_MAJORVERSION detected without executable $BIN_PKG !? I are confused."
             exit 1

with

         # S11 without 'pkg'?? Likely an illumos variant
         HOST_OS_MINORVERSION="152"

and follow Jim's instructions for updating the pkgmap, then just pkgadd from the filesystem image.
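
For reference, the unpack and reinstall steps look something like this - the package file name is illustrative (use whatever version you downloaded), and any scratch directory will do:

mkdir /var/tmp/vbox
pkgtrans VirtualBox-*-SunOS-*.pkg /var/tmp/vbox SUNWvbox
# edit /var/tmp/vbox/SUNWvbox/root/opt/VirtualBox/vboxconfig.sh and the pkgmap as described above
pkgadd -d /var/tmp/vbox SUNWvbox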

Next, the boot2docker cli. I'm assuming you have go installed already - on Tribblix, "zap install go" will do the trick. Then, in a convenient new directory,

env GOPATH=`pwd` go get github.com/boot2docker/boot2docker-cli

That won't quite work as is; a couple of patches are needed. The first is to the file src/github.com/boot2docker/boot2docker-cli/virtualbox/hostonlynet.go. Look for the CreateHostonlyNet() function, and replace

    out, err := vbmOut("hostonlyif", "create")
    if err != nil {
        return nil, err
    }


with

    out, err := vbmOut("hostonlyif", "create")
    if err != nil {
        // default to vboxnet0
        return &HostonlyNet{Name: "vboxnet0"}, nil
    }


The point here is that, on a Solaris platform, you always get a hostonly network - that's what vboxnet0 is - so you don't need to create one, and in fact the create option doesn't even exist so it errors out.

The second little patch is that the arguments to SSH don't quite match the SunSSH that comes with illumos, so we need to remove one of the arguments. In the file src/github.com/boot2docker/boot2docker-cli/util.go, look for DefaultSSHArgs and delete the line containing IdentitiesOnly=yes (which is the option SunSSH doesn't recognize).
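
Assuming the option sits on a line of its own in that argument list, a one-liner along these lines does the job:

gsed -i '/IdentitiesOnly=yes/d' src/github.com/boot2docker/boot2docker-cli/util.go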

Then you need to rebuild the project.

env GOPATH=`pwd` go clean github.com/boot2docker/boot2docker-cli
env GOPATH=`pwd` go build github.com/boot2docker/boot2docker-cli

Then you should be able to play around. First, download the base VM image it'll run:

./boot2docker-cli download

Configure VirtualBox:

./boot2docker-cli init

Start the VM:

./boot2docker-cli up

Log into it:

./boot2docker-cli ssh

Once in the VM you can run docker commands (I'm doing it this way at the moment, rather than running a docker client on the host). For example:

docker run hello-world

Or,

docker run -d -P --name web nginx
 
Shut the VM down:

./boot2docker-cli down

While this is interesting, and reasonably functional - certainly to the level of being useful for testing - it's a sign of the churn in the current container world that the boot2docker cli is already deprecated in favour of Docker Machine. Building that looks to be rather more involved, though.

Wednesday, July 15, 2015

How to build a server

So, you have a project and you need a server. What to do?
  1. Submit a ticket requesting the server
  2. Have it bounced back saying your manager needs to fill in a server build request form
  3. Manager submits a server build request form
  4. Server build manager assigns the build request to a subordinate
  5. Server builder creates a server build workflow in the workflow tool
  6. A ticket is raised with the network team to assign an IP address
  7. A ticket is raised with the DNS team to enter the server into DNS
  8. A ticket is raised with the virtual team to assign resources on the VMware infrastructure
  9. Take part in a 1000-message, 100-participant email thread of doom arguing whether you really need 16G of memory in your server
  10. A ticket is raised with the storage team to allocate storage resources
  11. Server builder manually configures the Windows DHCP server to hand out the IP address
  12. Virtual Machine is built
  13. You're notified that the server is "ready"
  14. Take part in a 1000-message, 100-participant email thread of doom arguing that when you asked for Ubuntu that's what you actually wanted, rather than the corporate standard of RHEL5
  15. A ticket is raised with the Database team to install the Oracle client
  16. Database team raises a ticket with the unix team to do the step of the Oracle install that requires root privileges
  17. A ticket is raised with the ops team to add the server to monitoring
  18. A ticket is raised with your outsourced backup provider to enable backups on the server
  19. Take part in a 1000-message, 100-participant email thread of doom over whether the system has been placed on the correct VLAN
  20. Submit another ticket to get the packages you need installed
  21. Move server to another VLAN, redoing steps 6, 7, and 11
  22. Submit another ticket to the storage team because they set up the NFS exports on their filers for the old IP address
There are actually a few more steps in many cases, but I think you get the idea.

This is why DevOps is a thing, streamlining (eradicating) processes like the above.

And this is (one reason) why developers spin up machines in the cloud. It's not that the cloud is better or cheaper (because often it isn't); it's simply to avoid dealing with the dinosaurs of legacy corporate IT departments, which only exist to prevent their users getting work done.

My approach to this was rather different.

User: Can I have a server?

Me: Sure. What do you want to call it?

[User, stunned at not immediately being told to get lost, thinks for a moment.]

Me: That's fine. Here you go. [Types a command to create a Solaris Zone.]

Me: [Engages in a few pleasantries to delay the user for a minute or two, so that the new system will be ready and booted when they get back to their desk.]

Thursday, June 11, 2015

Badly targeted advertising

The web today is essentially one big advertising stream. Everywhere you go you're bombarded by adverts.

OK, I get that it's necessary. Sites do cost money to run, people who work on them have to get paid. It might be evil, but (in the absence of an alternative funding model) it's a necessary evil.

There's a range of implementations. Some subtle, others less so. Personally, I take note of the unsubtle and brash ones, the sort that actively interfere with what I'm trying to achieve, and mark them as companies I'm less likely to do business with. The more subtle ones I tolerate as the price for using the modern web.

What is abundantly clear, though, is how much tracking of your activities goes on. For example, I needed to do some research on email suppliers yesterday - and am being bombarded with adverts for email services today. If I go away, I get bombarded with adverts for hotels at my destination. Near Christmas I get all sorts of advertising popping up based on the presents I've just purchased.

The thing is, though, that most of these adverts are wrong and pointless. The idea that searching for something, or visiting a website on a certain subject, indicates that I'll be interested in the same things in future is simply plain wrong.

Essentially, if I'm doing something on the web, then I have either (a) succeeded in the task at hand (bought an item, booked a hotel), or (b) failed completely. In either case, basing subsequent advertising on past activities is counterproductive.

If I've booked a hotel, then the last thing I'm going to do next is book another hotel for the same dates at the same location. More sensible behaviour for advertisers would be to prime the system to stop advertising hotels, and then advertise activities and events (for which they even know the dates) at my destination. It's likely to be more useful for me, and more likely to get a successful response for the advertiser. Likewise, once I've bought an item, stop advertising that and instead move on to advertising accessories.

And if I've failed in my objectives, ramming more of the same down my throat is going to frustrate me and remind me of the failure.

In fact, I wonder if a better targeting strategy would be to turn things around completely, and advertise random items excluding the currently targeted items. That opens up the possibility of serendipity - triggering a response that I wasn't even aware of, rather than trying to persuade me to do something I already actively wanted to do.

Sunday, June 07, 2015

Building LibreOffice on Tribblix

Having decent tools is necessary for an operating system to be useful, and one of the basics for desktop use is an office suite - LibreOffice being the primary candidate.

Unfortunately, there aren't prebuilt binaries for any of the Solaris or illumos distros. So I've been trying to build LibreOffice from source for a while. Finally, I have a working build on Tribblix.

This is what I did. Hopefully it will be useful to other distros. This is just a straight dump of my notes.

First, you'll need java (the jdk), and the perl Archive::Zip module. You'll need boost, and harfbuzz with the icu extensions. Plus curl, hunspell, cairo, poppler, neon.

Then you'll need to build (look on this page for links to some of this stuff):

  • cppunit-1.13.2
  • librevenge-0.0.2
  • libwpd-0.10.0
  • libwpg-0.3.0
  • libmspub-0.1.2
  • libwps-0.3.1
  • mdds_0.11.2
  • libixion-0.7.0
  • liborcus-0.7.0
  • libvisio-0.1.1

If you don't tell it otherwise, LibreOffice will download these and try to build them itself. And generally these have problems building cleanly, which are fairly easy to fix while building them in isolation, but would be well nigh impossible to deal with when they're buried deep inside the LibreOffice build system.

For librevenge, pass --disable-werror to configure.

For libmspub, replace the call to pow() in src/lib/MSPUBMetaData.cpp with std::pow().

For libmspub, remove zlib from the installed pc file (Tribblix, and some of the other illumos distros, don't supply a pkgconfig file for zlib).
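
That's a one-line edit to the Requires line of the installed .pc file - something like the following, although the exact file name and location depend on where you installed it:

gsed -i 's/ zlib//' /usr/lib/pkgconfig/libmspub-0.1.pc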

For liborcus, run the following against all the Makefiles that the configure step generates:

gsed -i 's:-DMDDS_HASH_CONTAINER_BOOST:-pthreads -DMDDS_HASH_CONTAINER_BOOST:'
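
A convenient way to hit them all is with find, run from the top of the liborcus tree:

find . -name Makefile -exec gsed -i 's:-DMDDS_HASH_CONTAINER_BOOST:-pthreads -DMDDS_HASH_CONTAINER_BOOST:' {} +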

For mdds, make sure you have a PATH that has the gnu install ahead of the system install program when running make install.
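
On Tribblix that just means putting /usr/gnu/bin at the front of the PATH, for example:

env PATH=/usr/gnu/bin:$PATH gmake install
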
For ixion, it's a bit more involved. You need some way of getting -pthreads past configure *and* make. For configure, I used:

env boost_cv_pthread_flag=-pthreads CFLAGS="-O -pthreads" CPPFLAGS="-pthreads" CXXFLAGS="-pthreads" configure ...

and for make:

gmake MDDS_CFLAGS=-pthreads

For orcus, it looks to pkgconfig to find zlib, so you'll need to prevent that:

 env ZLIB_CFLAGS="-I/usr/include" ZLIB_LIBS="-lz" configure ...

For libvisio, replace the call to pow() in src/lib/VSDMetaData.cpp with std::pow().

For libvisio, remove zlib and libxml-2.0 from the installed pc file.

If you want to run a parallel make, don't use gmake 3.81. Version 4.1 is fine.

With all those installed you can move on to LibreOffice.

Unpack the main tarball.

chmod a+x bin/unpack-sources
mkdir -p external/tarballs


and then symlink or copy the other tarballs (help, translations, dictionaries) into external/tarballs (otherwise, it'll try downloading them again).
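
For example (the tarball names are illustrative - use whichever versions match your main source tarball):

ln -s /path/to/libreoffice-help-4.4.x.tar.xz external/tarballs/
ln -s /path/to/libreoffice-translations-4.4.x.tar.xz external/tarballs/
ln -s /path/to/libreoffice-dictionaries-4.4.x.tar.xz external/tarballs/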

Download and run this script to patch the built-in version of glew.

Edit the following files:

  • svx/Executable_gengal.mk
  • sw/Executable_tiledrendering.mk
  • vcl/Executable_ui-previewer.mk
  • desktop/Library_sofficeapp.mk
  • vcl/Library_vcl.mk

And replace "LINUX" with "SOLARIS". That part of the makefiles is needed on all unix-like systems, not just Linux.

In the file

sc/source/core/tool/interpr1.cxx

replace the call to pow() on line 3160 with std::pow().

In the file

sal/qa/inc/valueequal.hxx

replace the call to pow() on line 87 with std::pow().

In the file

include/vcl/window.hxx

You'll need to #undef TRANSPARENT before it's used (otherwise, it picks up a rogue definition from the system).

And you'll need to create a compilation symlink:

mkdir -p instdir/program
ln -s libGLEW.so.1.10 instdir/program/libGLEW.so

This is the configure command I used:

env PATH=/usr/gnu/bin:$PATH \
./configure --prefix=/usr/versions/libreoffice-44 \
--with-system-hunspell \
--with-system-curl \
--with-system-libpng \
--with-system-clucene=no \
--with-system-libxml \
--with-system-jpeg=no \
--with-system-cairo \
--with-system-harfbuzz \
--with-gnu-cp=/usr/gnu/bin/cp \
--with-gnu-patch=/usr/gnu/bin/patch \
--disable-gconf \
--without-doxygen \
--with-system-openssl \
--with-system-nss \
--disable-python \
--with-system-expat \
--with-system-zlib \
--with-system-poppler \
--disable-postgresql-sdbc \
--with-system-icu \
--with-system-neon \
--disable-odk \
--disable-firebird-sdbc \
--without-junit \
--disable-gio \
--with-jdk-home=/usr/jdk/latest \
--disable-gltf \
--with-system-libwps \
--with-system-libwpg \
--with-system-libwpd \
--with-system-libmspub \
--with-system-librevenge \
--with-system-orcus \
--with-system-mdds \
--with-system-libvisio \
--with-help \
--with-vendor="Tribblix" \
--enable-release-build=yes \
--with-parallelism=8

and then to make:

env LD_LIBRARY_PATH=/usr/lib/mps:`pwd`/instdir/ure/lib:`pwd`/instdir/sdk/lib:`pwd`/instdir/program \
PATH=/usr/gnu/bin:$PATH \
/usr/gnu/bin/make -k build

(Using 'make build' is supposed to avoid the checks, many of which fail. You'll definitely need to run 'make -k' with a parallel build, because otherwise some of the test failures will stop the build before all the other parallel parts of the build have finished.)

Then create symlinks for all the .so files in /usr/lib/mps in instdir/program, and instdir/program/soffice should start.

Sunday, May 31, 2015

What sort of DevOps are you?

What sort of DevOps are you? Can you even define DevOps?
 
Nobody really knows what DevOps is; there are almost as many definitions as practitioners. Part of the problem is that the name tends to get tacked onto anything to make it seem trendy. (The same way that "cloud" has been abused.)
 
Whilst it's something of a stereotype, I tend to separate the field into the puritans and the pragmatists.
 
The puritanical vision of DevOps is summarized by the mantra of "Infrastructure as Code". In this world, it's all about tooling (often, although not exclusively, based around configuration management).
 
From the pragmatist viewpoint, it's rather about driving organizational and cultural change to enable people to work together to benefit the business, instead of competing with each other to benefit their own department or themselves. This is largely a reaction to legacy departmental silos that simply toss tasks over the wall to each other.
 
I'm firmly in the pragmatist camp. Tooling helps, but you can use all the tools in the world badly if you don't have the correct philosophy and culture.
 
I see a lot of emphasis being placed on tooling. Partly this is because in the vendor space, tooling is all there is - vendors frame the discussion in terms of how tooling (in particular, their tool) can improve your business. I don't have a problem with vendors doing this, they have to sell stuff after all, but I regard conflating their offerings with DevOps in the large, or even defining DevOps as a discipline, as misleading at best.
 
Another worrying trend (I'm seeing an awful lot of this from recruiters, not necessarily practitioners) is the stereotypical notion that DevOps is still about getting rid of legacy operations and having developers carry the pager. This again starts out in terms of a conflict between Dev and Ops and, rather than resolving it by combining forces, simply throws one half of the team away.
 
Where I do see a real problem is that smaller organizations might start out with only developers, and then struggle to adopt operational practices. Those of us with a background in operations need to find a way to integrate with development-led teams and organizations. (The same problem arises when you have a subversive development team in a large business that's going round the back of traditional operations, and eventually find that they need operational support.)
 
I was encouraged that the recent DOXLON meetup had a couple of really interesting talks about culture. Practitioners know that this is important; we really need to get the word out.

Where have all the SSDs gone?

My current and previous laptops - that's a 3-year timespan - both had an internal SSD rather than rotating rust. The difference from earlier systems was like night and day - instant-on, rather than making a cup of coffee while the old HDD machine staggered into life.

My current primary desktop system is also SSD based. Power button to fully booted is a small number of seconds. Applications are essentially instant - certainly compared to startup times for things like firefox that used to be double-digit seconds before it was ready to go.

(This startup speed changes usage patterns. Who really needs suspend/resume when the system boots in the time it takes to settle comfortably in your chair?)

So I was a little surprised, while browsing in a major high street electronics retailer, to find hardly any evidence of SSDs. Every desktop system had an HDD. Almost all the laptops were HDD based. A couple of the all-in-ones had hybrid drives. SSDs were conspicuous by their absence.

I had actually noticed this trend while looking online. I've just checked the desktops on the Dell site, and there's no sign of a system with an SSD option.

Curious, I asked the shop assistant, who replied that SSDs were far too expensive.

I'm not sure I buy the cost argument. An SSD actually costs about the same as an HDD - at least, the range of prices is much the same - so the price points would stay unchanged; you'd just get less capacity for the money. And it looks like the sales pitch is all about capacity.

But even there, the capacity numbers are meaningless. It's purely bragging rights, disconnected from reality. With any of the HDD options, you're looking at hundreds of thousands of songs or pictures. Very few typical users will need anything like that much - and if you do, you're going to need to look at redundancy or backup. And with media streaming and cloud-based backup, local storage is more a liability than an asset.

So, why such limited penetration of SSDs into the home computing market?