Thursday, February 12, 2015

How illumos sets the keyboard type

It was recently pointed out that, while the Tribblix live image prompts you for the keyboard type, the installer doesn't carry that choice through to the installed system.

Which is right. I hadn't written any code to do that, and hadn't even thought of it. (And as I personally use a US unix keyboard then the default is just fine for me, so hadn't noticed the omission.)

So I set out to discover how to fix this. And it's a twisty little maze.

The prompt when the live image boots comes from the kbd command, called as 'kbd -s'. It does the prompt and sets the keyboard type - there's nothing external involved.

So to save that, we have to query the system. To do this, run kbd with the -t and -l arguments

# kbd -t
USB keyboard

# kbd -l
type=6
layout=33 (0x21)
delay(ms)=500
rate(ms)=40

OK, in the -l output type=6 means a USB keyboard, so that matches up. These are defined in <kbd.h>

#define KB_KLUNK        0x00            /* Micro Switch 103SD32-2 */
#define KB_VT100        0x01            /* Keytronics VT100 compatible */
#define KB_SUN2         0x02            /* Sun-2 custom keyboard */
#define KB_VT220        0x81            /* Emulation VT220 */
#define KB_VT220I       0x82            /* International VT220 Emulation */
#define KB_SUN3         3               /* Type 3 Sun keyboard */
#define KB_SUN4         4               /* Type 4 Sun keyboard */
#define KB_USB          6               /* USB keyboard */
#define KB_PC           101             /* Type 101 AT keyboard */
#define KB_ASCII        0x0F            /* Ascii terminal masquerading as kbd */

That handles the type, and basically everything today is a type 6.

Next, how is the keyboard layout matched. That's the 33 in the output. The layouts are listed in the file


Which are a key-value map of name to number. So what we have is:

US-English=33

And if you check the source for the kbd command, 33 is the default.

Note that the numbers that kbd -s generates to prompt the user with have absolutely nothing to do with the actual type - the prompt just makes up an incrementing sequence of numbers.

So, how is this then loaded into a new system? Well, that's the keymap service, which has a method script that then calls

/usr/lib/set_keyboard_layout

(yes, it's a twisty maze). That script gets the layout by calling eeprom like so:

/usr/sbin/eeprom keyboard-layout

Now, usually you'll see:

keyboard-layout=Unknown

which is fair enough, I haven't set it.

On x86, eeprom is emulated, using the file

/boot/solaris/bootenv.rc

So, to copy the keyboard layout from the live instance to the newly
installed system, I need to:

1. Get the layout from kbd -l

2. Parse /usr/share/lib/keytables/type_6/kbd_layouts to get the name that corresponds to that number.

3. Poke that back into eeprom by inserting an entry into bootenv.rc

Oof. This is arcane.

Sunday, February 08, 2015

Tribblix scorecard

I was just tidying up some of the documentation and scripts I used to create Tribblix, including pushing many of the components up to repositories on github.  One of the files I found was a quick sketch of my initial aims for the distro. I'm going to list those below, with a commentary as to how well I've done.
It must be possible to install a zone without requiring external resources
Success. On Tribblix, you don't need a repo to install a whole or sparse root zone. You can also install a partial-root zone that has a subset of the global zone's packages. (Clearly, adding software to a zone that isn't installed in the global zone will require an external source of packages, although it's possible to pre-cache them in the global zone.)
It must be possible to minimize a system, removing all extraneous software including perl, python, java.
Almost successful. There's no need in Tribblix for any of perl, python, or java. There are still pieces of illumos that can drag in perl in particular, but there is work to eliminate those as well. (One corollorary to this aim is that you can freely and arbitrarily replace any of perl, python, or java by completely different versions without constraint.)
It should be possible to upgrade a system from the live CD
In theory, this could be made to work trivially. The live CD contains both a minimalist installed image and additional packages. During installation, the minimalist image is simply copied to the destination, and additional packages added separately. As a space optimization, I don't put the packages in the minimalist image on the iso, as they would never be used during normal installation.
It should be possible to use any filesystem of the user's choice (although zfs is preferred)
Success. Although the default file system at install is zfs, the live CD comes with a ufs install script (which does work, although it doesn't get uch testing) which should be extensible to other file systems. In addition, I've built systems running with an nfs root file system.
It must be possible to select which versions of utilities are to be installed; and to have multiple versions installed simultaneously. It should be possible to define one of those installed versions as the default.
Partially successful. The way this is implemented is that certain utilities are installed under /usr/versions, and it's possible to have different versions co-exist. I've tried various levels of granularity, so it's a work in progress. For example, OpenJDK has a different package for each update (so you can have 7u71 and 7u75 installed together), whereas for python I just have 2.7 (for all values of 2.7.x) and 3.4. There are symlinks in the regular locations so they're in the regular search path, which can be modified to refer to a different version if the administrator so desires, but there isn't a built-in mechanism such as mediators - by default, the most recently installed version wins.
It must be possible to install by functionality rather than requiring users to understand packages. (Probably implemented as package groups or clusters.)
Success. I define overlays of useful functionality to hide packages, and the zap utility, the installer, and the zone tools are largely based around overlays as the fundamental unit of installation.
It should be possible to use small system configurations. Requiring over 1G of memory just to boot isn't acceptable.
Success. Tribblix will install and run in 512M (with limitations - making heavy use of java or firefox will be painful). I have booted the installer in slightly less, and run in 256M (it's pretty easy to try this in an emulator such as VirtualBox), but the way the installer works, by loading a full image archive into memory, will limit truly small configurations, as the root archive itself is almost 200M.
It should be possible to customize what's installed from the live CD (to a limited extent, not arbitrarily)
Success. You can define the installed software profile by choosing which overlays should be installed.

Overall, all of those initial aims have been met, or could easily be met by making trivial adjustments. I think that's a pretty good scorecard overall.

In addition to the general aims for Tribblix, I wrote down a list of milestones against which to measure progress. The milestones were more about implementation details rather than general aims (things like "migrate from gcc3 to gcc4", "build illumos from scratch", "become self-hosting", "create an upgrade mechanism", and "make a sparc version", or "have LibreOffice working"). That's where the "milestone" nomenclature in the Tribblix releases comes from, although I never specified in which order I would attack the milestones, it just makes for a convenient "yes, I got that working" point at which I might put out a new iso for download.

In terms of progress against those milestones, about the only one left to do that's structural is the upgrade capability. It's almost there, but needs more polish. Much of the rest is adding applications. So it's at this point that I can really start to think about producing something that I can call 1.0.

Tuesday, December 23, 2014

Setting up Logical Domains, part 2

In part 1, I talked about the server side of Logical Domains. This time, I'll cover how I set up a guest domain.

First, I have created a zfs pool called storage on the host, and I'm going to present a zvol (or maybe several) to the guests.

I'm going to create a little domain called ldom1.

ldm add-domain ldom1
ldm set-core 1 ldom1
ldm add-memory 4G ldom1
ldm add-vnet vnet1 primary-vsw0 ldom1


Now create and add a virtual disk. Create a dataset for the ldom and a 24GB volume inside it, add it to the storage service and set it as the boot device.

zfs create storage/ldom1
zfs create -V 24gb storage/ldom1/disk0
ldm add-vdsdev /dev/zvol/dsk/storage/ldom1/disk0 ldom1_disk0@primary-vds0
ldm add-vdisk disk0 ldom1_disk0@primary-vds0 ldom1
ldm set-var auto-boot\?=true ldom1
ldm set-var boot-device=disk0 ldom1


Then bind resources, list, and start:

ldm bind-domain ldom1
ldm list-domain ldom1
ldm start-domain ldom1


You can connect to the console by looking at the number under CONS in the list-domain output:

# ldm list-domain ldom1
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
ldom1            bound      ------  5000    8     4G
 

# telnet localhost 5000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Connecting to console "ldom1" in group "ldom1" ....
Press ~? for control options ..


T5140, No Keyboard
Copyright (c) 1998, 2014, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.33.6.e, 4096 MB memory available, Serial #83586105.
Ethernet address 0:14:4f:fb:6c:39, Host ID: 84fb6c39.



Boot device: disk0  File and args:
Bad magic number in disk label
ERROR: /virtual-devices@100/channel-devices@200/disk@0: Can't open disk label package

ERROR: boot-read fail

Evaluating:

Can't open boot device

{0} ok


Attempt to boot it off the net and it says:

Requesting Internet Address for 0:14:4f:f9:87:f9

which doesn't match the MAC address in the console output. So if you want to jumpstart the box, and that's easy enough, you need to do a 'boot net' first to get the actual MAC address of the LDOM to add it to your jumpstart server.

To add an iso image to boot from:

ldm add-vdsdev options=ro /var/tmp/img.iso iso@primary-vds0
ldm add-vdisk iso iso@primary-vds0 ldom1


At the ok prompt, you can issue the 'show-disks' command to see what disk devices are present. To boot off the iso:

boot /virtual-devices@100/channel-devices@200/disk@1:f

And it should work. This is how I've been testing the Tribblix images for SPARC, by the way.

Setting up Logical Domains, part 1

I've recently been playing with Logical Domains (aka LDOMs, aka Oracle VM Server for SPARC). For those unfamiliar with the technology, it's a virtualization framework built into the hardware of pretty well all current SPARC systems, more akin to VMware than Solaris zones.

For more information, see here, here, or here.

First, why use it? Especially when Solaris has zones. The answer is that it addresses a different set of problems. Individual LDOMs are more independent and much more isolated than zones. You can partition resources more cleanly, and different LDOMs don't have to be at the same patch level (to my mind, it's not that you can have a different level of patches in each LDOM that matters, but that you can do maintenance of each LDOM to different schedules that matters). One key advantage I find is that the virtual switch you set up with LDOMs is much better at dealing with complex network configuration (I have hosts scattered across maybe dozens of VLANs, trying to fake that up on Solaris 10 is a bit of a bind). And some applications don't really get on with zones - I would build new systems around zones, but ill-understood and poorly documented legacy systems might be easier to drop inside an LDOM.

That dealt with, here's how I setup up one of my machines (a T5140, as practice for live deployment on a bunch of replacement T4-1 systems) as an LDOM host. I'll cover setting up the guest side in a second post.

Make sure the firmware is current - these are the minimum revs:

T2 - 7.4.5
T3 - 8.3
T4 - 8.4.2c


Then install the LDOM software.

cd /var/tmp
unzip p17291713_31_SOLARIS64.zip
cd OVM_Server_SPARC-3_1/Install
./install-ldm


You'll get asked if you want to launch the configuration assistant after installation. I chose n, and you can run ldmconfig at any later time. (If you want - it's best not to.)

Now need to apply the LDOM patch

svcadm disable -s ldmd
patchadd 150817-02
svcadm enable ldmd


Verify things are working as expected:

# ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-c--  SP      128   32544M   0.1%  16m


You should see the primary domain.

The next step is to establish default services and configure the control domain.

The necessary services are:

Virtual console
Virtual disk server
Virtual switch service

ldm add-vds primary-vds0 primary
ldm add-vcc port-range=5000-5100 primary-vcc0 primary
ldm add-vsw net-dev=nxge0 primary-vsw0 primary


verify with

ldm list-services primary

And we need to limit the control domain to a limited set of resources. The way I do this (this is just a personal view) is, on a system with N cores, to define N units each with 1 core with 1/N of the total memory. Assign one of those units to the primary domain, and then build the guest domains with 1 or more of those units (sizing as necessary - note that you can resize them on the fly so you don't have to get it perfect first time around). You can get very detailed and start allocating down to individual threads and specific amounts of memory, but it's so much better to just keep it simple.

For a T2/T2+/T3 you need to futz with crypto MAUs. This is unnecessary on later systems.

To show:

ldm list -o crypto primary

To add

ldm set-mau 1 primary

To set the CPU resources of the control domain

ldm set-core 1 primary

(I want to just set 1 core. Allocation best done by cores, not threads.)

Start the reconfig

ldm start-reconf primary

Fix the memory

ldm set-memory 4G primary

Save the config to the SP

ldm add-config initial

verify the config has been saved

ldm list-config

Reboot to activate

shutdown -y -g0 -i6

OK, so that creates a 1-core (8-thread) 4G control domain. And that all seems to work.

Next steps are to configure networking and enable terminal services. From the console (as you're reconfiguring the primary network):

ifconfig vsw0 plumb
ifconfig nxge0 down unplumb
ifconfig nxge0 inet6 down unplumb
ifconfig vsw0 inet 172.18.1.128 netmask 255.255.255.0 broadcast 172.18.1.255 up
ifconfig vsw0 inet6 plumb up
mv /etc/hostname.nxge0 /etc/hostname.vsw0
mv /etc/hostname6.nxge0 /etc/hostname6.vsw0


For a T4, replace nxge with igb.

At this point, you have a machine with minimal resources assigned to the primary domain, which looks for all the world like a regular Solaris box, ready to create guest domains using the remaining resources.

Saturday, October 25, 2014

Tribblix progress

I recently put out a Milestone 12 image for Tribblix.

It updates illumos, built natively on Tribblix. There's been a bit of discussion recently about whether illumos needs actual releases, as opposed to being continuously updated. It doesn't have releases, so when I come to make a Tribblix release I simply check out the current gate, build, and package it. After all, it's supposed to be ready to ship at any time.

Note that I don't maintain a fork of illumos-gate, I build it essentially as-is. This is the same for all the components I build for Tribblix - I keep true to unmodified upstream as much as possible.

The one change I have made is to SVR4 packaging. I've removed the dependency on openssl and wanboot (bug #5188), which is a good thing. It means that you can't use signed SVR4 packages, but I've never encountered one. Nor can pkgadd now directly retrieve a package via http, but the implementation via wanboot was spectacularly dire, and you're much better off using curl or wget, which allows proper repository management (as zap does). Packaging is a little quicker now, but this change also makes it much easier to update openssl in future (it's difficult to update something your packaging system is linked against).

Tribblix is now firmly committed to gcc4 (as opposed to the old gcc3 in OpenSolaris). I've rebuilt gcc to fix visibility support. If you've ever seen 'warning: visibility attribute not supported in this configuration' then you'll have stumbled across this. Basically, you need to ensure objdump is found during the gcc build - either by making sure it's in the path or by setting OBJDUMP to point to it.

I've added a new style of zones - alternate root zones. These are sparse root zones, but instead of inheriting from the global zone you can use an alternate installed image. More on that later.

There's the usual slew of updates to various packages, including the obviously sensitive bash and openssl.

There's an interesting fix to python. I put software that might come in multiple versions underneath /usr/versions and use symlinks so that applications can be found in the normal locations. Originally, /usr/bin/python was a symlink that went to ../versions/python-x.y.x/bin/python. This works fine most of the time. However, if you called it as /bin/python it couldn't find its modules, so the symlink has to be ../../usr/versions/python-x.y.x/bin/python which makes things work as desired.

The package catalogs now contain package sizes and checksums, allowing verification of downloaded packages. I need to update zap to actually use this data, and to retry or resume failed or incomplete downloads. (It's a shame that curl doesn't automatically resume incomplete downloads the way that wget does.)

At a future milestone, upgrades will be supported (regular package updates have worked for a while, I'm talking about a whole distro upgrade here). It's possible to upgrade by hand already, but it requires a few extra workarounds (such as forcing postremove scripts to always exit 0) to make it work properly. I've got most of the preparatory work in place now. Upgrading zones looks a whole lot more complicated, though (and I haven't really seen it done well elsewhere).

Now, off to work on the next update.

Wednesday, May 21, 2014

Building illumos-gate on Tribblix

Until recently, I've used an OpenIndiana system to build the illumos packages that go into Tribblix. Clearly this is less than ideal - it would be nice to be able to build all of Tribblix on Tribblix.

This has always been a temporary expedient. So, here's how to build illumos-gate on Tribblix.

(Being able to do so is also good in that it increases the number of platforms on which a vanilla illumos-gate can be built.)

First, download and install Tribblix (version 0m10 or later). I recommend installing the kitchen-sink.

Then, if you're running 0m10, apply some necessary updates. As root:

    zap refresh-overlays
    zap refresh-catalog
    zap update-overlay develop
    zap uninstall TRIBdev-object-file
    zap install TRIBdev-object-file


This won't be necessary in future releases, but I found some packaging issues which interfered with the illumos build (although other software doesn't bother), including some symlinks so that various utilities are where illumos-gate expects.

I run the build in a zone. It requires a non-standard environment, and using a zone means that I don't have to corrupt the global zone, and I can repeatably guarantee that I get a correct build environment.

Then, install a build zone. This will be a whole-root zone in which we copy the develop overlay from the global zone, and add the illumos-build overlay into the zone. (It will download the packages for the illumos-build overlay the first time you do this, but will cache them so if you repeat this later - and I tend to create build zones more or less at will - it won't have to). You need to specify a zone name and give it an IP address.

    zap create-zone -t whole \
        -z il-build -i 172.18.1.206 \
        -o develop -O illumos-build

This will automatically boot the zone, you just have to wait until SMF has finished initialising.

Configure the zone so it can resolve names from DNS:

    cp /etc/resolv.conf /export/zones/il-build/root/etc/
    cp /etc/nsswitch.dns /export/zones/il-build/root/etc/nsswitch.conf


Go into the zone

  zlogin il-build

In the zone, create a user to do the build, and a couple of hacky fixes

    rm /usr/bin/cpp
    cd /usr/bin ; ln -s ../gnu/bin/xgettext gxgettext


(The first is a bug in my gcc, the latter is a Makefile bug.)

If you want to build with SMB printing

    zap install TRIBcups

Now, as the user, the build largely follows the normal instructions: you can use git to clone illumos-gate, unpack the closed bins, copy illumos.sh and nightly.sh, and edit illumos.sh to customize the build.

There are a few things you need to do to get a successful build. The first is to add the following to illumos.sh

export SUPPRESSPKGDEP=true

this is necessary as the IPS dependency step uses the installed image; as Tribblix uses SVR4 packaging, there isn't one. You can still create the IPS repo (and I do, as that's what I then turn into SVR4 packages), but the dependency step needs to be suppressed.

If you want to build with CUPS, then you'll need to have installed cups, and you'll need to patch smb. Alternatively, avoid pulling in CUPS by adding this to illumos.sh:

export ENABLE_SMB_PRINTING='#'

As Tribblix uses newer glib, the API has changed slightly and hal uses the old API. There is a proper fix, but you can simply:

gsed -i '/g_type_init/d' usr/src/cmd/hal/hald/hald.c

Note that this means that you won't be able to run the hal components on a system with a downrev glib.

Then you should be able to run a build:

time ./nightly.sh illumos.sh

The build should be clean (I see ELF runtime attribute warnings, all coming from glib and ffi, but those don't actually matter, and I'm not sure illumos should be complaining about errors in its external dependencies anyway).

Friday, May 16, 2014

Software verification of SVR4 packages with pkgchk

On Solaris (and Tribblix) you can use the pkgchk command to verify that the contents of a software package are correctly installed.

The simplest invocation is to give pkgchk the name of a package:

pkgchk SUNWcsl

I would expect SUNWcsl to normally validate cleanly. Whereas something like SUNWcsr will tend to produce lots of output as it contains lots of configuration files that get modified. (Use the -n flag to suppress most of the noise.

If you want to check individual files, then you can use

pkgchk -p /usr/bin/ls

or (and I implemented this as part of the OpenSolaris project) you can feed a list of files on stdin:

find /usr/bin -mtime -150 | pkgchk -i -

However, it turns out that there's a a snag with the basic usage of pkgchk to analyze a package, in that it will trust the contents file - both for the list of files in the package, and for their attributes.

Modifying the list of files can be a result of using installf and removef. For example, I delete some of the junk out of /usr/ucb (such as /usr/ucb/cc so as to be sure no poor unfortunate user can ever run it), and use removef to clean up the contents file. A side-effect of this is that pkgchk won't normally be able to detect that those files are missing.

Modifying file attributes can be the result of a second package installing the same pathname with different attributes. Having multiple packages deliver a directory is common, but you can also have multiple packages own a file. Whichever package was installed last gets to choose which attributes are correct, and the normal pkgchck is blind to any changes as a result.

There's a trick to get round this. From Solaris 10, the original package metadata (and unmodified copies of editable files) are kept. Each package has a directory in /var/sadm/pkg, and in each of those you'll find a save directory. This is used when installing zones, so you get a pristine copy. However, you can also use the pkgmap file to verify a package:

pkgchk -m /var/sadm/pkg/SUNWscpu/save/pspool/SUNWscpu/pkgmap

and this form of usage will detect files that have been removed or modified by tools that are smart enough to update the contents file.

(Because those save files are used by zones, you'll find they don't exist in a zone because they wouldn't be needed there. So this trick only works in a global zone, or you need to manually copy the pkgmap file.)

Tuesday, April 15, 2014

Partial root zones

In Tribblix, I support sparse-root and whole-root zones, which work largely the same way as in Solaris 10.

The implementation of zone creation is rather different. The original Solaris implementation extended packaging - so the packaging system, and every package, had to be zone-aware. This is clearly unsustainable. (Unfortunately, the same mistake was made when IPS was introduced.)

Apart from creating work, this approach limits flexibility - in order to innovate with zones, for example by adding new types, you have to extend the packaging system, and then modify every package in existence.

The approach taken by Tribblix is rather different. Instead of baking zone architecture into packaging, packaging is kept dumb and the zone creation scripts understand how packages are put together.

In particular, the decision as to whether a given file is present in a zone (and how it ends up there) is not based on package attributes, but is a simple pathname filter. For example, files under /kernel never end up in a zone. Files under /usr might be copied (for a whole-root zone) or loopback mounted (for a sparse-root zone). If it's under /var or /etc, you get a fresh copy. And so on. But the decision is based on pathname.

It's not just the files within packages that get copied. The package metadata is also copied; the contents file is simply filtered by pathname - and that's how the list of files to copy is generated. This filtering takes place during zone creation, and is all done by the zone scripts - the packaging tools aren't invoked (one reason why it's so quick). The scripts, if you want to look, are at /usr/lib/brand/*/pkgcreatezone.

In the traditional model, the list of installed packages in the zone is (initially) identical to that in the global zone. For a sparse-root zone, you're pretty much stuck with that. For a whole-root zone, you can add and remove packages later.

I've been working on some alternative models for zones in Tribblix that add more flexibility to zone creation. These will appear in upcoming releases, but I wanted to talk about the technology.

The first of these is what you might call a partial-root zone. This is similar to a whole-root zone in the sense that you get an independent copy, rather than being loopback mounted. And, it's using the same TRIBwhole brand. The difference is that you can specify a subset of the overlays present in the global zone to be installed in the zone. For example, you would use the following install invocation:

zoneadm -z myzone install -o developer

and only the developer overlay (and the overlays it depends on) will be installed in the zone.

This is still a copy - the installed files in the global zone are the source of the files that end up in the zone, so there's still no package installation, no need for repository access, and it's pretty quick.

This is still a filter, but you're now filtering both on pathname and package name.

As for package metadata, for partial-root zones, references to the packages that don't end up being used are removed.

That's the subset variant. The next obvious extension is to be able to specify additional packages (or, preferably, overlays) to be installed at zone creation time. That does require an additional source of packages - either a repository or a local cache - which is why I treat it as a logically distinct operation.

Time to get coding.