Sunday, February 10, 2019

Thoughts on SPARC support in illumos

One interesting property of illumos is that its legacy stretches back decades - there is truly ancient code rubbing shoulders with the very modern.

An area where we have really old code is on SPARC, where illumos has support in the codebase for a large variety of Sun desktops and servers.

There's a reasonable chance that quite a bit of this code is currently broken. Not because it's fundamentally poor code (although it's probably fair to say that the code quality is of its time, and a lot of it is really old), but because it lives within an evolving codebase and hasn't been touched in the lifetime of illumos, and likely for much longer. Not only that, but it probably assumes it's being built with the old Studio toolchain rather than with gcc.

What of this code is useful and worth keeping and fixing, and what should be dropped?

A first step in this was that I have recently removed support for starfire - the venerable Sun E10K. It seems extremely unlikely that anyone is running illumos on such a machine. Or indeed that anyone has them running at all - they're museum pieces at this point.

A similar, if rather newer, class of system is the starcat, the Sun F15K and variants. Again, it's big, expensive, requires dedicated controller hardware, and is unlikely to be the kind of thing anyone's going to have lying about. (And, if you're a business, there's no real point in trying to make such a system work - you would be much better off, both operationally and financially, in getting a current SPARC system.)

And if nobody has such a system, then not only is the code useless, it's also untestable.

The domained systems, like starfire and starcat, are also good candidates for removal because of the relative complexity and uniqueness of their code. And it's not as if the design specs for this hardware are out there to study.

What else might we consider removing (with starfire done and starcat a given)?

  1. The serengeti, Sun-Fire E2900-E6800. Another big blob of complex code.
  2. The lw8 (lightweight 8), aka the V-1280. This is basically some serengeti boards in a volume server chassis.
  3. Anything using Sbus. That would be the Ultra-2, and the E3000-E6000 (sunfire). There's also the socal, sf, and bpp drivers. One snag with removing the Ultra-2 is that it's used as the base platform for the newer US-II desktops, which link back to it.
  4. The olympus platform. That's anything from Fujitsu. One slight snag here is that the M3000 was quite a useful box and is readily available on eBay, and quite affordable too.
  5. Netra systems. Specifically NetraCT - there's a US-IIi NetraCT, and two US-IIe systems, the NetraCT-40 and the NetraCT-60. Code names montecarlo and makaha (something about Tonga too). Also CP2300 aka snowbird.
  6. Server blade. I'm talking the old B100s blade here.
  7. Binary compatibility with SunOS 4 - this is kernel support for a.out, and libbc.

I'm not saying at this point that all of this code and platform support will go, just that these are the potential candidates. For example, I regard support for M3000 as useful, and definitely worth thinking about keeping.

What does that leave as supported? Most of the US-II and US-III desktops, most of the V-series servers, and pretty much all the early sun4v (T1 through T3 chips) systems. In other words, the sort of thing that you can pick up second hand fairly easily at this point.

Getting rid of code that we can never use has a number of benefits:

  • We end up with a smaller body of code, that is thus easier to manage.
  • We end up with less code that needs to be updated, for example to make it gcc7 clean, or to fix problems found by smatch, or to enable illumos to adopt newer toolchains.
  • We can concentrate on the code that we have left, and improve its quality.

If we put those together into a single strategy, the overall aim is to take illumos for SPARC from a large body of unknown, untested, and unsupportable code to a smaller body of properly maintained, testable, and supportable code. Reduce quantity to improve quality, if you like.

As part of this project, I've looked through much of the SPARC codebase. And it's not particularly pretty. One reason for attacking starfire was that I was able to convince myself relatively quickly that I could come up with a removal plan that was well-bounded - it was possible to take all of it out without accidentally affecting anything else. Some of the other platforms need a bit more analysis to tease out all the dependencies and complexity - bits of code are shared between platforms in a variety of non-obvious ways.

The above represents my thoughts on what would be a reasonable strategy for supporting SPARC in illumos. I would naturally be interested in the views of others, and specifically if anyone is actually using illumos on any of the platforms potentially on the chopping block.

Friday, February 08, 2019

SPARC and tod modules on illumos

Following up from removing starfire support from illumos, I've been browsing through the codebase to identify more legacy code that shouldn't be there any more.

Along the way, I discovered a little tidbit about how the tod (time of day) modules - the interface to the hardware clock - work on SPARC.

If you look, there are a whole bunch of tod modules, and it's not at all obvious how they fit together - they all appear to be candidates for loading, with nothing to indicate how the correct one for a platform is chosen.

The mechanism is actually pretty simple, if a little odd.

There's a global variable in the kernel named:

tod_module_name

This can be set in several ways - for some platforms, it's hard-coded in that platform's platmod. Or it could be extracted from the firmware (OBP). That tells the system which tod module should be used.

And the way this works is that each tod module has _init code that looks like

if (tod_module_name is myself) {
   initialize the driver
} else {
   do nothing
}

so at boot all the tod modules get loaded, but only the one that matches the name set by the platform actually initializes itself.

Later in boot, there's an attempt to unload all modules. Similarly the _fini for each driver essentially does

if (tod_module_name is myself) {
   I'm busy and can't be unloaded
} else {
   yeah, unload me
}

So, when the system finishes booting, you end up with only one tod module loaded and functional, and it's the right one.
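
The load-everything-then-unload-everything dance can be modelled in a few lines of shell. This is only a toy userland model, not kernel code: the module names todalpha, todbeta and todgamma are made up for illustration, and tod_init and tod_fini simply stand in for each module's _init and _fini.

```shell
#!/bin/sh
# Toy model of tod module selection. Not kernel code; all module
# names here are hypothetical.

# In the real kernel this is set by the platmod or taken from OBP.
tod_module_name="todbeta"

# Stand-in for _init: every module gets loaded, but only the one
# named by the platform actually sets itself up.
tod_init() {
    if [ "$1" = "$tod_module_name" ]; then
        echo "$1: initializing" >&2
    fi
}

# Stand-in for _fini: the selected module reports busy (non-zero),
# every other module agrees to be unloaded (zero).
tod_fini() {
    [ "$1" != "$tod_module_name" ]
}

# Load every module, then attempt to unload every module.
# Prints the names of the modules still loaded afterwards.
boot() {
    remaining=""
    for m in "$@"; do
        tod_init "$m"
    done
    for m in "$@"; do
        if ! tod_fini "$m"; then
            remaining="$remaining $m"
        fi
    done
    echo $remaining
}

boot todalpha todbeta todgamma    # prints: todbeta
```

However many modules you hand to boot, only the one matching tod_module_name survives the unload pass.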

Returning to the original question, can any of the tod modules be safely removed because no platform uses them? To be honest, I don't know. Some have names that match the platform they're for. It's pretty obvious, for example, that todstarfire goes with the starfire platform, so it was safe to take that out. But I don't know the module names returned by every possible piece of SPARC hardware, so it isn't really safe to remove some of the others. (And, as a further problem, I know that at least one is only referenced in closed source, binary only, platform modules.)

Sunday, November 04, 2018

Tribblix and the python transition

It's been over a decade since python 3 came out, and a lot of the world is still using python 2.

But we're at the point now where the world has said enough is enough, and it's time to finally get the 2-to-3 transition over with.

And while Tribblix is all about retro styling, it's also all about keeping up. So I put together a plan for migrating Tribblix from python 2 to python 3.

  • Ship all the modules for python 3 as well as python 2, ready to switch
  • Move the python consumers (eg mercurial) across to python 3
  • Make python 3 the default
  • Deprecate and remove python 2

This is made a little easier by the fact that there's nothing in Tribblix itself that uses python directly - I haven't made the mistake of having my packaging system or anything like that written in python, for example.

There was a little wrinkle in all this. I had got this planned out, and then python 3.7 was just around the corner. So I ended up waiting a little, and put a python 3.6 to 3.7 transition at the beginning of the list.

So where am I right now? I've now got all the modules built and packaged for python 3.7, and python 3.6 has been removed from Tribblix. This was made somewhat easier by the fact that no packages in Tribblix yet depended on python 3 - the transition hadn't been started properly, so I could simply throw away all the python 3.6 stuff.

As an aside, this had the odd side effect that all the python 3.7 modules were packaged straight away for SPARC, whereas python 3.6 was never finished there - the 3.6 to 3.7 switch was all scripted, rather than manual, so was very little actual work. There were a couple of modules that needed to be updated anyway to work with 3.7 (pyyaml for example), and I took the opportunity to do a bunch of routine module updates at the same time.

So just having all the modules turned out to be nearly trivial. Now there's going to be a longer slog migrating all the python consumers across and making python 3 the default. (It might be easiest to make python 3 the default first, so that when building the consumers they automatically pick up the python I want.)

I was originally thinking of a fairly slow and structured approach where each step would be a point release of 3.7. But I'm well ahead of that already, and the remaining steps are likely to occur fairly promptly. (Or, as promptly as I have time to do the work.)

So it won't be long before we bid farewell to python 2 in Tribblix.

Tuesday, July 17, 2018

Tribblix - minimal plus pkgsrc

The latest update (0m20.5) of Tribblix is now available for download.

This one comes relatively quickly after its predecessor, but includes the eager fpu and ldt fixes.

The visible change this time is that there's a minimal ISO available. At just over 200M, it's about a fifth of the size of the regular ISO.

The only difference between the regular and minimal ISO is in what extra packages are dropped on the ISO. If you aren't going to add overlays at install, then the minimal ISO is much better.

The use cases for the minimal ISO are the obvious minimal server installs (including cloud-based), but also if you decide to use pkgsrc for your software rather than Tribblix native packages.

Tribblix is all about choice and flexibility, so if you want to use pkgsrc, go right ahead. You'll get a broader choice of software, certainly. In some areas you'll get newer versions than I provide; in others Tribblix will update more aggressively.

One thing I have noticed, though, is that if you want to use pkgsrc, do so exclusively. Mixing and matching native packages and pkgsrc really doesn't work.

With that in mind, the minimal ISO makes a great base for pkgsrc. To make it even easier, the pkgsrc bootstrap is available on the ISO itself. So if you want to install Tribblix with pkgsrc, just invoke the installer like so:

./live_install.sh -B c1t0d0 pkgsrc

Then, as root

export PATH=/opt/local/bin:$PATH

and you're all set to install packages from pkgsrc.

To get the X server and basics:

pkgin install modular-xorg

To install a graphical desktop, such as Xfce - there are many others, of course:

pkgin install xfce4

And then, as a regular user

export PATH=/opt/local/bin:$PATH
startxfce4

And, lo and behold, you should have a functional basic Xfce desktop running.

Friday, June 01, 2018

Tribblix - creating zones from images

One of the things I've done with Tribblix is try and hide some of the complexity around managing zones - rather than having to mess around with zonecfg and zoneadm and all that, just have one simple command that creates a zone correctly.

Tribblix also has the capability to drop an alternative illumos distribution into a zone - so-called alien zones.

The OmniTribblix variant has LX zones, so you can run Linux in a zone.

Up to now, you've had to manually download the appropriate image, save it somewhere, and then install the zone from that image.

Wouldn't it be much easier to have Tribblix do that for you? Well, as of the m20.4 release, it can!

So, for example, OmniOS have images as zfs send streams available for download - the .zfs.bz2 files. So you can now build an OmniOS zone on Tribblix like so:

zap create-zone -z omnios -t alien -i 10.0.2.26 -I omnios:r151026

That's it.

And, if you're running OmniTribblix you could create an Ubuntu zone like so:

zap create-zone -z ubuntu -t lx -x 10.0.2.27 -I ubuntu

or, if you want Alpine (and this is really quick)

zap create-zone -z alpine -t lx -x 10.0.2.28 -I proxmox:alpine

The images are downloaded and cached, so creating subsequent zones will be much quicker.

It's a proof of concept at this point, and needs fleshing out a bit more to make it even more friendly, but it shows that it's possible and useful.

Monday, November 06, 2017

Selecting relay smarthosts and using SMTP AUTH on illumos

A problem I looked at recently involved configuring a system to send (relay) email via a customer's own SMTP servers. There are 2 parts to this:

  • Select the relay host depending on some condition
  • Authenticate with the remote relay using SMTP AUTH

Search for SMTP AUTH with sendmail on illumos or Solaris, and you invariably end up with advice on how to build Cyrus SASL and sendmail from scratch.

For example, Andrew has some good instructions.

However, if you look at the sendmail we ship on illumos you'll find that it's already been built with SASLv2 support:

# /usr/lib/sendmail -bt -d0.1 < /dev/null
Version 8.14.4+Sun
 Compiled with: DNSMAP LDAPMAP LOG MAP_REGEX MATCHGECOS MILTER MIME7TO8
        MIME8TO7 NAMED_BIND NDBM NETINET NETINET6 NETUNIX NEWDB NIS
        PIPELINING SASLv2 SCANF STARTTLS TCPWRAPPERS USERDB
        USE_LDAP_INIT XDEBUG
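
If you want to make that check from a script rather than by eye, a small sketch along these lines should do. The has_feature helper is a made-up name, not something shipped with sendmail; it just looks for an exact word in whatever is piped into it, and the sample text is taken from the output above.

```shell
#!/bin/sh
# has_feature NAME: succeed if NAME appears as a whole word on stdin.
# Hypothetical helper for illustration; not part of sendmail.
has_feature() {
    while read -r line; do
        for word in $line; do
            if [ "$word" = "$1" ]; then
                return 0
            fi
        done
    done
    return 1
}

# Sample input copied from the output above. On a real system, pipe in
#   /usr/lib/sendmail -bt -d0.1 < /dev/null
sample=' Compiled with: DNSMAP LDAPMAP LOG MAP_REGEX MATCHGECOS MILTER MIME7TO8
        PIPELINING SASLv2 SCANF STARTTLS TCPWRAPPERS USERDB'

if printf '%s\n' "$sample" | has_feature SASLv2; then
    echo "SASLv2 support is compiled in"
fi
```

Matching whole words rather than substrings means a search for SASL on its own would not be satisfied by SASLv2.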

And, if you telnet to port 25 and look at the EHLO response it includes:

250-AUTH GSSAPI DIGEST-MD5 CRAM-MD5

However, that's not actually the part we want here (but I'll come back to that later). I don't want to authenticate against my own server, I need my system to authenticate against a remote server.

Back to the problem at hand.

The first part - selecting the right smarthost - can be achieved using smarttable. All you need is the smarttable.m4 file, and then build a configuration using it by enabling the smarttable feature.

The second part, SMTP AUTH, should also be very simple. Again, it's all documented, and just involves enabling the authinfo feature. But wait - illumos doesn't ship an authinfo.m4 file, so at first sight that won't work.

In fact, it does work - you just have to supply the file yourself. Download the sendmail source, unpack it, and there in the cf/feature directory you'll find the authinfo.m4 file.

OK, so copy both files - smarttable.m4 and authinfo.m4 - into the /etc/mail/cf/feature directory on a server. Copy and edit the sendmail.mc file (I'm going to copy it to /tmp and edit it there) to add the 2 feature lines, like this fragment of the file here:

...
define(`confFALLBACK_SMARTHOST', `mailhost$?m.$m$.')dnl
FEATURE(`authinfo')dnl
FEATURE(`smarttable')dnl
MAILER(`local')dnl
...

Basically, just add the features above the MAILER line. Then compile that:

cd /etc/mail/cf/cf
m4 ../m4/cf.m4 /tmp/sendmail.mc > /tmp/sendmail.cf

That's your new sendmail.cf ready. It uses 2 databases in /etc/mail; to create these (initially empty):

cd /etc/mail
touch smarttable
touch authinfo
makemap hash smarttable < smarttable
makemap hash authinfo < authinfo

then copy your new sendmail.cf into /etc/mail and restart sendmail

cp /tmp/sendmail.cf /etc/mail
svcadm restart sendmail

So far so good, but what should those files look like?

First the smarttable file, which is just a map of sender to relay host. For example, it might just have:

my.name@gmail.com smtp.gmail.com

Which means that if I want my home system to send out mail with my address on it, it should route it through gmail's servers rather than trying to deliver it direct (and likely getting marked as spam).

Then the authinfo file, which looks like

Authinfo:smtp.gmail.com "U:root" "I:my.name@gmail.com" "P:mypassword" "M:LOGIN PLAIN"
Authinfo:smtp.gmail.com:587 "U:root" "I:my.name@gmail.com" "P:mypassword" "M:LOGIN PLAIN"

(There are just 2 lines there, starting with Authinfo:, even if the blog shows it wrapped.)

Basically, for gmail, you need to supply your email address as the identifier and your password as, well, the password. (Note: if you've got two-factor authentication set up, you'll need to set up an app key.)

Of course, the authinfo files ought to be readable only by root, otherwise anyone on your system can read your password in the clear.
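
A minimal sketch of locking those files down, assuming the map built by makemap hash sits next to the text file as authinfo.db (that is the usual name for a hash map, but check what makemap actually produced on your system). The protect_authinfo name is made up for the example, and the demonstration runs against a scratch directory rather than the live /etc/mail.

```shell
#!/bin/sh
# protect_authinfo DIR: make the authinfo source file and its map
# readable and writable by their owner only.
# Hypothetical helper; the authinfo.db name is an assumption.
protect_authinfo() {
    for f in "$1/authinfo" "$1/authinfo.db"; do
        if [ -f "$f" ]; then
            chmod 600 "$f"
        fi
    done
}

# Try it on a scratch directory first.
scratch=$(mktemp -d)
touch "$scratch/authinfo" "$scratch/authinfo.db"
chmod 644 "$scratch/authinfo" "$scratch/authinfo.db"
protect_authinfo "$scratch"
ls -l "$scratch"
rm -rf "$scratch"
```

On the real system that would be protect_authinfo /etc/mail, run as root after each makemap, with both files owned by root.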

There are a couple of non-standard tweaks you'll need for gmail to work. First, you need to go to your gmail account settings and allow less secure apps. Second, you will need the "M:LOGIN PLAIN" entry in the authinfo file, else you'll get an "available mechanisms do not fulfill requirements" error back.

Redo the two makemap commands above and you're good to go.

That's SMTP AUTH in one direction. At which point you're probably thinking: can we also authenticate against an illumos sendmail using SMTP AUTH?

The answer, sadly, is no. At least as far as I can tell. While our sendmail is built correctly against SASLv2, illumos doesn't seem to ship enough supporting bits of the SASL infrastructure to make this work. You should be able to create the file /etc/sasl/Sendmail.conf to configure it. Unfortunately the only pwcheck_method available is auxprop (shadow, which would allow you to authenticate against local system accounts, isn't available; neither is the saslauthd method, since there's no saslauthd to talk to anyway). Worse, illumos has no auxprop plugins, so the whole thing is rather useless. Note that rebuilding sendmail alone won't fix this, as the problem is in the underlying sasl implementation.

The above notes were developed on Tribblix, but ought to apply to any illumos distribution using the vanilla illumos sendmail+sasl combination.

Wednesday, November 01, 2017

Building illumos-gate on AWS

Having talked about running Tribblix on AWS, one of the things that would be quite neat would be to be able to build illumos-gate.

This is interesting because it's a relatively involved process, and might require proper resources - it's not really possible to build illumos inside VirtualBox, for instance, and many laptops don't run illumos terribly well. So it's hard for the average user to put together a decent - most likely dedicated - rig capable of building or developing illumos, which is clearly a barrier to contribution.

Here's how anyone can build illumos, using Tribblix.

Build yourself an EC2 instance as documented here, with 2 changes:

  1. The instance type should be m4.large or bigger - m4.xlarge or c4.xlarge would be better. The bigger the instance, the quicker the build, but m4.large is pretty much the minimum size.
  2. Attach an EBS volume to the instance, at least 8G in size. If you want to do multiple builds, or do lint or debug builds, then it has to be larger. I attach the volume as /dev/sdf, which is assumed below. (You could keep the volume around to persist the data, of course.)

Once booted, log in as root. You then need to set up the zfs pool (the disk showing up as c2t5d0 below matches the /dev/sdf attachment point) and create a couple of file systems that can be used to host the build zone and store the build.

zpool create storage c2t5d0
zfs set compression=lz4 storage
zfs destroy rpool/export/home
zfs create -o mountpoint=/export/home storage/home
zfs create -o mountpoint=/export/zones storage/zones

You should then do an update to ensure packages are up to date, and install the develop overlay to get you some useful tools.

zap refresh
zap update-overlay -a
zap install-overlay develop

Then create a user, which you're going to use to do the build. For me, that is:

groupadd -g 10000 it
useradd -g it -u 11730 -c "Peter Tribble" -s /bin/tcsh \
  -d /export/home/ptribble ptribble
mkdir -p /export/home/ptribble
chown -hR ptribble:it /export/home/ptribble
passwd ptribble

Then create a build zone. It needs an IP address; just pick any unused private address (I simply use the address above that of the global zone, which you can get with ifconfig or from the AWS console - note that it's the private address, not the public IP that you ssh to).

zap create-zone -z illumos-build -t whole \
  -i 172.xxx.xxx.xxx -o develop \
  -O java -O illumos-build -U ptribble

What does this do? It creates a new zone, called illumos-build. It's a whole root zone, with its own exclusive set of file systems. The IP address is 172.xxx.xxx.xxx. The develop overlay is installed (in this case, copied from the global zone); the java and illumos-build overlays are added to this new zone (note the upper-case -O here). Finally, the user account ptribble is shared with the zone.

Give that a few seconds to boot, then log in to it and make a couple of tweaks that are necessary for illumos to build without errors.

zlogin illumos-build
rm /usr/bin/cpp
cd /usr/bin ; ln -s ../gnu/bin/xgettext gxgettext

Now log out and log back in to the instance as your new user. We're going to create somewhere to store the files, and check out the source code.

mkdir Illumos
cd Illumos
git clone git://github.com/illumos/illumos-gate.git
wget -c \
  https://download.joyent.com/pub/build/illumos/on-closed-bins.i386.tar.bz2 \
  https://download.joyent.com/pub/build/illumos/on-closed-bins-nd.i386.tar.bz2

Now we set up the build.

cd illumos-gate
bzcat ../on-closed-bins.i386.tar.bz2 | tar xf -
bzcat ../on-closed-bins-nd.i386.tar.bz2 | tar xf -
cp usr/src/tools/scripts/nightly.sh .
chmod +x nightly.sh

There are two more files we need. Go to the tribblix-build repo and look in the illumos directory there. Grab one of the illumos.sh files from there and put it into your illumos-gate directory with the name illumos.sh. If you need to change how the build is done, this is the file to edit (but start from one of those files so you get one appropriate for Tribblix as the host). Also, grab Makefile.auditrecord and use it to replace usr/src/cmd/auditrecord/Makefile.

Now log in to the zone and start the build.

pfexec zlogin -l ptribble illumos-build
cd Illumos/illumos-gate
time ./nightly.sh illumos.sh

On an m4.xlarge instance, this took me just under 75 minutes. Look in the log directory and check that the mail_msg looks clean without errors, and you'll have the built files in the proto directory and an IPS repo under packages.

For more behind the scenes details on the illumos build process itself, look at the how to build illumos page.