Friday, December 31, 2021

The three strands of Information Technology

How are IT departments structured? I've seen a variety of approaches; it depends on the individual business, but over the years I've come up with a way to think about it.

When thinking about Information Technology (IT), it naturally splits into 3 separate strands:

IT for the business

This is the provision of facilities for HR, Finance, Sales, and the like; the basics the organisation needs to operate as a business.

IT for the employee

This is the provision of the systems and tools employees need to be able to work at all: laptops, desktops, and mobile devices; communications systems such as telephony and email; and a way for staff to store and collaborate on documents.

IT for the customer

This is the provision of services that your customers use, whether that's a product you sell in its own right or a mechanism for selling other products.

The relative importance of these 3 strands depends on the nature of the business, of course. And very small organisations might not even have all 3 strands in any meaningful sense.

Structurally, there are two senior roles that an organisation might have: the CIO and the CTO. The natural split is for the CIO to look after IT for the business and IT for the employee, while the CTO gets IT for the customer.

Splitting things this way works because the characteristics of the strands are quite different. The responsibilities of the CIO are inward-facing, those of the CTO are outward-facing. The work of the CIO is about managing standardised commodities, while the CTO's role is to provide differentiation. Polar opposites, in a way.

There's a third role, that of the CISO, responsible for information security. This is slightly different in that it cuts across all 3 strands. As such, if you have both a CIO and a CTO, it isn't entirely obvious which of the two, if either, should take on the CISO role.

Given the different nature of these 3 strands, where does the IT department (loosely defined as those people whose job is IT) fit? Should you even have one? The job requirements for the 3 strands are sufficiently different that having a separate IT team for each strand makes an awful lot of sense, rather than a central IT department, with each team reporting to the CIO or CTO as appropriate. In particular, having a product developed in the CTO part of the organisation and then thrown over the wall to be run by an operations team in the CIO organisation is one of the organisational antipatterns that never made any sense, and was a major driver for DevOps.

Thus, when structuring the delivery of IT in an organisation, the divergent needs of the 3 different IT strands ought to be taken into account. The worst case is a single department that standardises on the same solution to deliver all 3 strands - standardisation is a common refrain of management, but what it really means here is that at least 2 strands (if not all 3) are delivered in a sub-standard way, often in a way that's actually completely unsuitable.

There is one central IT function that does cut across all 3 strands, in the same way that a CISO does at the management level: a compliance function or security office. But for most other functions, you're really looking at providing distinct deliveries for each strand.

Wednesday, December 22, 2021

The cost of cloud

Putting your IT infrastructure into the cloud seems to be the "in" thing. It's been around for a while, of course. And, like most things related to IT, there are tradeoffs to be made.

My rough estimate is that the unit cost of provisioning a service on AWS is about 3 times that of a competent IT organization providing a similar service in house. Other people have come to the same number, and it hasn't really changed much over the last decade. (If you don't think 3x is right, consider what AWS' gross margin is.)

Some services offered by AWS deviate from that simple 3x formula. The two obvious ones are network costs, which as Cloudflare have argued are many times higher than you would expect, and S3, which you're going to struggle to beat. (Although if you're going to use S3 as a distribution site then the network costs will get you; consider Wasabi for that.)

And yet, many organizations move to the cloud to "save money". I'm going to ignore the capex versus opex part of that, and simply note that many IT organizations have in-house operations that are neither very efficient nor cost-effective. In particular, traditional legacy IT infrastructures are ridiculously overpriced. (If you're using commercial virtualization platforms and/or SAN storage, then you're overpaying by as much as a factor of 10, and getting an inferior product into the bargain - so that while many organizations could save a huge amount of money by moving to the cloud, they could save even more by running their internal operations better.)

Often the cost saving associated with a migration - not just cloud, this applies to other transitions too - comes about not because the new solution is cheaper, but because a migration gives a business leverage to introduce better practices. Practices that, if used for your on-premise deployments, would save far more than the cloud ever could. Sometimes, you need to do an end run round an entrenched legacy IT empire.

Another consideration is that the cloud has often been touted as something where you pay for what you use, which isn't always quite correct. For many services, you pay for what you configure. And some services are nowhere near as elastic as you might wish.

Capacity planning doesn't go away either; if anything it's more important to get the sizing right, and while you can easily buy more capacity, you have to ensure you have the financial capacity to pay the bills.

Note that I'm not saying you should always run your systems on-premise, nor that it will always be cheaper.

Below a certain scale, doing it yourself isn't financially beneficial. There's a minimum configuration of infrastructure you need in order to get something that works, and many small organizations have needs below that. But generally, the smaller providers are likely to be a better option in that case than full-on cloud offerings.

Having the operational capability to support your infrastructure is also crucial. If you're going to support your own hardware, you really need a team, which is going to set a minimum scale at which operations are worthwhile.

This becomes even more true if you need to deploy globally. It's difficult to do that in-house with a small team, and you have to be pretty large to be able to staff multiple teams in different geographies. A huge advantage of using the cloud for this is that you can deploy into pretty much any location without driving your costs insane. Who wants to hire a full team in every country you operate in? And operationally, it's the same wherever you go, which makes things a lot easier.

In recent times, the Coronavirus pandemic has also had an impact. End user access to colocation facilities has been restricted - we've been able to do repairs recently, but we've had to justify any datacenter visit as essential.

There are certain workloads that are well matched to the cloud, of course. Anything highly variable, with spikes above 3x the background, will be cheaper in the cloud, where you can deploy capacity just for the spike, than it would be in house, where you either overprovision for peak load or accept that there's a spike you can't handle.
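
To put some rough numbers on that (the figures are made up: a background of 10 units of capacity, a spike to 40 units for 5% of the time, and the 3x unit cost from above):

# in house: you provision for the peak, all of the time
echo "40" | bc
# cloud: 3x the unit cost, but only for capacity that's actually running
echo "3 * (10 * 0.95 + 40 * 0.05)" | bc

That comes out at 40 unit-costs in house against 34.5 in the cloud, so the spiky workload is cheaper in the cloud; flatten the peak to 20 units and it's 20 against 31.5, and in house wins.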

The cloud is also great for experimentation. You can try any number of memory and CPU configurations to see what works well. Much easier than trying to guess and buying equipment that isn't optimal. (This sort of sizing exercise is far less relevant if you have decent virtualization like zones.)

You can even spin up a range of entirely different systems. I do this when testing: spin up a whole range of Linux distros and run each one for an hour or so.
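
With the AWS CLI that's just a couple of commands (the AMI ID, key name, and instance ID below are placeholders, not real values):

aws ec2 run-instances --image-id ami-0123456789abcdef0 \
    --instance-type t3.micro --key-name mykey --count 1
# poke at it for an hour or so, then throw it away
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0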

What the above cases say is that even if the unit cost of cloud resources is high, the cloud gives you more of an opportunity to optimize the number of units. And, when it comes to scaling, this means the ability to scale down is far more important than the ability to scale up.

I use AWS for a lot of things, but I strongly regard the cloud as just another tool, to be used as occasion demands, rather than because the high priests say you should.

Monday, December 20, 2021

Keeping Java alive on illumos

Back in 2019, a new JEP (JDK Enhancement Proposal) appeared.

JEP 362: Deprecate the Solaris and SPARC Ports

Of course, for those of us running Solaris or illumos (which is the same platform as far as Java is concerned), this was a big deal. Losing support for a major language on the platform was potentially a problem.

The stated reason for removal was:

Dropping support for these ports will enable contributors in the OpenJDK Community to accelerate the development of new features that will move the platform forward.

Clearly, this reflected a belief that maintaining Solaris and/or SPARC was a millstone dragging Java down. Still, it's their project, and they can make whatever decisions they like, whatever those of us who thought it was a bad move might say.

Eventually, despite objections, the ports were removed, towards the end of the JDK15 cycle.

At which point I simply carried on building OpenJDK. All I did was take the patch from the commit that removed Solaris support, apply it backwards, and add on top the pkgsrc patches that Jonathan Perkin had originally developed to support a gcc port on Solaris and illumos - patches we had already been using extensively from JDK11 onwards.
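
Mechanically, that's something like the following, where REMOVAL_COMMIT stands for the hash of the commit that removed the port:

# apply the removal commit in reverse, restoring the Solaris/illumos code
git show REMOVAL_COMMIT | git apply -R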

At that point I wasn't quite sure how sustainable this was. My aim was to support it as long as it wasn't proving too onerous or difficult, and my most optimistic hope was that we might be able to get to Java 17 which was planned as the next LTS release.

The modus operandi is really very simple. Every week a new tag is created. Download the tag, apply the patches, fix any errors in the patch set, try a build, and hopefully fix any problems that break the build.

Rinse and repeat, every week. The idea is that by doing it every week, it's a relatively small and manageable set of changes each time. Some weeks, it's just line number noise in the patches. Other weeks, it could be a more significant change. By loitering on the mailing lists, you become aware of what changes are coming up, which gives you a good idea of where to look when the build breaks.
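
The weekly cycle itself looks roughly like this (the tag name, patch directory, and configure arguments are illustrative):

git fetch origin
git checkout jdk-19+2
# reapply the illumos patch set; any rejects show where work is needed
for p in ../jdk19-patches/*.patch; do patch -p1 < "$p"; done
# then see whether it still builds
bash configure && make images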

Along the way, I've been cleaning up the patches to eliminate the SPARC code (you could put it back, but it's not a focus of this project) and most of the code to support the Studio toolchain (the version of Studio to build current Java isn't compatible with illumos anyway). So what we're left with is a straightforward Solaris/illumos+gcc port.

Most of the code changes I've needed to make are fairly straightforward procedural changes. Some functions moved to different namespaces. Some function signatures have been changed. There's also been a lot of work to consolidate a number of functions into common posix code, rather than have each OS provide different implementations which might diverge and become hard to maintain.

Most of this was pretty simple. The only one that caused me a significant amount of work was the signal handling rewrite, which took several attempts to get to work at all.

And it's become fairly routine. Java 17 came along, eventually, and the builds were still succeeding and basic smoke-testing worked just fine. So, illumos has Java 17 available, just as I had hoped.

I originally packaged the builds on Tribblix, of course, which is where I'm doing the work. But I've also dropped tarballs of occasional builds so they can be downloaded and used on other illumos distributions.

Actually, the idea of those builds isn't so much that they're useful standalone, but that they provide a bootstrap JDK that you can use to build Java yourself. Which, given that bootstrap JDK and my java patches, ought to be fairly straightforward. (There's a separate patch directory for each jdk release - the directory name ought to be obvious.)
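
In other words, something like this (the tarball name and paths are illustrative):

# unpack the bootstrap JDK somewhere convenient
gzip -dc illumos-jdk17-bootstrap.tar.gz | tar xf -
# then point the new build at it
bash configure --with-boot-jdk=/path/to/that/bootstrap/jdk17
make images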

Which means that if you want Java 17 on OmniOS, you can have just that - it's built and packaged ready for you. Not only that, Dominik fixed some problems with my signal handling fix so it works properly and without errors, which benefits everyone.

It doesn't stop there. In addition to the stream of quarterly updates (JDK 17 being an LTS release will see these for some time yet) work is continuing on mainline. JDK 18 works just fine, and as it's ramping down for release shouldn't have any breaking changes, so that's another release supported. I'm building JDK 19, although that's only about 1 build in so hasn't really had any significant changes put into it yet.

The fact that a relatively unskilled developer such as myself can maintain an out of tree Java port for a couple of years, tracking all the upstream changes, does make you wonder if supporting Solaris was really that much of a blocker to progress. At the time my belief was that it wasn't Solaris support that was the problem, but the Studio toolchain, and I think that's been borne out by my experience. Not only that, but the consolidation and simplification of the various OS-specific code into common posix code shows that supporting a variety of modern operating systems really isn't that hard.

Sunday, April 04, 2021

Running Tribblix on Digital Ocean

A relatively recent feature offered by Digital Ocean is the ability to deploy your own custom image. So, can I deploy a Tribblix image to Digital Ocean?

Short answer: Yes!

For the process, read on.

I'm using Bhyve to create the image. This is a slight variation on the Installing Tribblix in Bhyve on Tribblix procedure.

The basic process looks like this:

  • Boot Tribblix in Bhyve
  • Install it
  • Tweak the installed image for Digital Ocean
  • Copy the ZFS volume to Digital Ocean

The first variation on the previous install is that I make the ZFS volume a bit smaller. We can resize it when it's deployed, so we don't need to make it too big. A 4G volume is fine; any smaller and there won't be any space for our swap partition. The size doesn't actually matter too much, as we compress the image anyway.

zap create-zone -t bhyve -z bhyve1 \
-x 192.168.0.236  \
-I /var/tmp/tribblix-0m24.1.iso \
-V 4G

Install as before,

./live_install.sh -G c1t0d0

remove the cdrom and reboot as before.

We now need to set up networking. We need to do this on a temporary basis, as we don't want any of these network configurations to carry over to the installed image. So I temporarily disable nwam, and manually bring up a working network. When the image boots on Digital Ocean, all this configuration will have been forgotten and it will bring up nwam as normal.

So run the following in the newly installed guest:

svcadm disable -t network/physical:nwam
ifconfig vioif0 plumb
ifconfig vioif0 up
ifconfig vioif0 inet 192.168.0.236/24
route add net default 192.168.0.1
echo "nameserver    8.8.8.8"  > /etc/resolv.conf

Now we need to tweak the image. At some later point this will all be integrated into the installer so it will just work. But for now, we'll start by applying any updates:

zap refresh
zap update-overlay -a

Now for the little tweak. I'm going to add a metadata service that will run at boot on Digital Ocean and do the sort of things that cloud-init would do. Fortunately, there's one for illumos, and it's packaged for Tribblix, so install it:

zap install TRIBmetadata-agent

If you look with svcs you'll see that it's offline. That's not a problem (it's because we've got a temporary manual network setup) - once we boot properly on Digital Ocean we'll have nwam running and the metadata service will run just fine.

We can tidy up and save a bit of space:

zap clean-cache -a

and shut down the zone (and the newly installed instance of Tribblix):

zoneadm -z bhyve1 halt

What we want is a raw image. So all we do is dd the ZFS volume to a file.

dd if=/dev/zvol/rdsk/rpool/bhyve1_bhvol0 \
of=/var/tmp/tribblix-do-m24.1.img bs=1048576

That's a 4G file, the size of the volume. As it's stored on ZFS with compression enabled, it will actually consume a lot less space, because the image is mostly empty. But what we don't want to do is upload 4G of empty space. So we can compress it:

gzip -9 /var/tmp/tribblix-do-m24.1.img

(It ends up at about 300M.) You could use bzip2 instead, I think.

There are two options when you upload the image to Digital Ocean - you can either do a direct upload through the browser, or you can give it a URL where the image can be found and get Digital Ocean to pull it from there. I found it much easier to scp the image up to an existing webserver and get Digital Ocean to grab it, as I don't trust browsers to behave well.
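
In my case that was just a one-liner, with the host and path being whatever webserver you happen to have:

scp /var/tmp/tribblix-do-m24.1.img.gz me@www.example.com:/var/www/html/images/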

Log in to Digital Ocean.

Select 'Images' from the left hand menu
Tab to 'Custom Images'
Import via URL
[Insert the URL where you've stashed the file]

You then get a dialog

Name - tribblix-do-m24.1.img.gz
Distribution - unknown (it's not on the list)
Region - London

Obviously, for me, London is conveniently local. You get warned there will be a charge, and it shows as pending for a few minutes.

Then it pops up 'your image is ready to use'.

To the right of the image in the list is a 'More' dropdown menu from which you can start a droplet. So off we go.

It selects a pretty hefty instance type by default. Reset that to the very cheapest. Choose the ssh key you're going to use, pick a useful (and shorter) hostname, and Create Droplet.

Don't bother with block storage. That's exposed by virtio-scsi, which illumos doesn't yet support.

It'll take a few moments to create the droplet, and once it's ready you'll see the IP address.

At this point, if everything has worked, you should be able to ssh in as root with the ssh key you chose. Chances are that doesn't quite work yet. If it doesn't, simply ssh in as jack, from where you can su to root.

(Remember, jack is on the live ISO, and we haven't deleted it. During experiments I tend not to, to give myself a way in if things don't work right. A proper production image would have the jack user removed and password login for root disabled.)
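
So, substituting the droplet's IP address (the one below is just an example):

ssh jack@203.0.113.10
su -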

If the metadata service hasn't run properly (it will resize the ZFS pool, change the hostname, and add the correct key to ssh in as root) then you can restart the metadata service:

svcadm restart metadata

Now ssh to root works, the hostname is set, and the zfs pool has been expanded to the full size.

None of the above is excessively specific to Tribblix, the same general process will work for any of the illumos distributions. (Although you may have to build and install the metadata service yourself.)

Installing Tribblix in Bhyve on Tribblix

One of the big new features recently added to illumos is the Bhyve hypervisor. Rather than the shared-kernel application-level virtualization offered by zones, think of something like VirtualBox, KVM, or Qemu.

One of the things that I am using Bhyve for is to test the Tribblix ISO images and the installer. This allowed me to shrink the installer footprint slightly in recent releases (and showed that one of the tricks I tried wasn't going to work).

Using Bhyve requires a fairly modern system (my own is a little old, with an Intel Core i7, and works fine). The instructions here also need you to be running current Tribblix, m24.1 or newer.

So, how do I install Tribblix in Bhyve on Tribblix? Most of the following needs to be done as root.

First, make sure that you have the software installed:

zap install-overlay bhyve

I'm doing this on my desktop, so I'm using VNC to connect.

zap install-overlay retro-desktop

You need the current ISO. Again, this has to be m24.1 or newer. Correct the path below to match wherever you've downloaded it to.

While you don't actually have to run bhyve in a zone, that's the standard and most convenient way. So, I use zap to create a zone:

zap create-zone -t bhyve -z bhyve1 \
-x 192.168.0.236  \
-I /var/tmp/tribblix-0m24.1.iso \
-V 8G

Let's run through these. The -t flag says to create a bhyve zone, -z gives it a name, -x gives it an (exclusive) IP address, -I tells it where the ISO image is, and -V tells it to create a ZFS volume of the given size. The other argument that may be of interest is -m, which allows you to set the amount of memory allocated; it defaults to 1024M which is fine for Tribblix.

Then you need to be able to connect to the instance. I'm going to use VNC to connect to the console. We use socat to wire up the bhyve socket in the zone to a network port in the global zone that we can connect to.

socat TCP-LISTEN:5905,reuseaddr,fork UNIX-CONNECT:/export/zones/bhyve1/root/tmp/vm.vnc

You may need to modify the zone name in the path, and you can choose the TCP port (for VNC, it's offset by 5900, so the 5905 is :5 in VNC-speak).

Then, as yourself, you can start a VNC client

vncviewer :5

If that can't connect, then one possibility is that bhyve can't start because it doesn't have enough memory. One little trick to make sure there's enough headroom is to create a file in /tmp and then delete it immediately.

mkfile 1200m /tmp/1200m ; rm /tmp/1200m

You can then log in to the live environment. And do an install like so:

./live_install.sh -G c1t0d0

and the install will carry on.

To run the newly installed system, you have to avoid booting off the CD. The way to do this is to remove the cdrom from the zone specification before you reboot the guest, like so:

zonecfg -z bhyve1 remove attr name=cdrom
zonecfg -z bhyve1 remove fs dir=/var/tmp/tribblix-0m24.1.iso special=/var/tmp/tribblix-0m24.1.iso
zoneadm -z bhyve1 reboot

Obviously, adjusted for the zone name and the ISO file name. I need to teach zap how to do this properly. And then start socat again (if you stopped it) and reconnect using VNC.

You may need to fiddle with the networking in the installed guest. The exclusive-ip setting here will force the guest to use the given address. That's unlikely to work terribly well, as the chances of its being the same address that DHCP would hand out are pretty remote. So to get it on the network you may have to fix things up in the guest by hand.

svcadm disable network/physical:nwam
echo "192.168.0.236/24" > /etc/hostname.vioif0
echo "192.168.0.1" > /etc/defaultrouter
svcadm enable network/physical:default

and populate /etc/resolv.conf with something relevant so that DNS works:

echo "nameserver 8.8.8.8" > /etc/resolv.conf

Once you're done, you can destroy the zone with

zap destroy-zone -z bhyve1

which will also destroy the ZFS volume it was using.

While the above refers to installing Tribblix inside Bhyve, the same general procedure ought to work to install other illumos distributions.

Sunday, February 21, 2021

Tweaking the Tribblix ISO

For the latest release, I've made a couple of tweaks to the Tribblix ISO image.

The first is that it's now a hybrid ISO. So you should be able to write it (simply dd it) to a USB stick and use that to boot. I've never managed to both create and test a USB image, so this should be a win.
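
For example, on an illumos system that's just a plain dd to the stick's whole-disk device (the device name here is an example - check very carefully which device your USB stick actually is before overwriting it):

dd if=tribblix-0m24.1.iso of=/dev/rdsk/c2t0d0p0 bs=1048576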

The hybrid ISO also boots under bhyve, which the old ISO never did. So that's a second win.

I didn't do any of the work here, simply taking the steps from the OmniOSce kayak build as suggested by tsoome. But the script I use is here.

The second tweak is that the man pages aren't present in their normal form on the live image. Instead, they're shipped as a compressed tarball. This saves a significant amount of space - the manuals are simple text and compress (with bzip2) really well. This doesn't make a great deal of difference to the size of the ISO, but it reduces the size of the ramdisk we boot from by 10%. And that's important because an overlarge ramdisk chews up too much memory if you're trying to install on a small (1GB or less) system.

Of course, the installer now knows to unpack the man pages during installation. This meant I had to add bzip2 to the live image, but that's tiny compared to the space saving you get in return.
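
The unpack itself is nothing fancy; conceptually it's just something like this (with purely illustrative paths - the installer knows where the tarball and the target image really live):

bzip2 -dc /path/to/man.tar.bz2 | ( cd /target/usr/share && tar xf - )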

I was hoping to use pbzip2, so that the decompression would be sped up on a multiprocessor machine. That turns out not to be a good idea. The problem is that pbzip2 is written in C++, so needs libstdc++. I don't have that, or any of the gcc runtime, on the live image. Adding it would undo all the space savings I was trying to get, but I've not noticed any slowness.

The downside is that you have no access to the manual during installation. This could be adjusted a bit: I could easily have a small subset of the man pages available, and compress away the majority.

The two features came together, thanks to bhyve. Having an image I can boot and install on bhyve is a huge advantage, as it allows me to test the ISO and the installer much more easily. It was that testing that demonstrated that pbzip2 wouldn't work.