Sunday, February 10, 2019

Thoughts on SPARC support in illumos

One interesting property of illumos is that its legacy stretches back decades - there is truly ancient code rubbing shoulders with the very modern.

An area where we have really old code is on SPARC, where illumos has support in the codebase for a large variety of Sun desktops and servers.

There's a reasonable chance that quite a bit of this code is currently broken. Not because it's fundamentally poor code (although it's probably fair to say that the code quality is of its time, and a lot of it is really old), but it lives within an evolving codebase and hasn't been touched in the lifetime of illumos, and likely much longer. Not only that, but it's probably making more assumptions about being built with the old Studio toolchain rather than with gcc.

What of this code is useful and worth keeping and fixing, and what should be dropped?

A first step in this was that I have recently removed support for starfire - the venerable Sun E10K. It seems extremely unlikely that anyone is running illumos on such a machine. Or indeed that anyone has them running at all - they're museum pieces at this point.

A similar, if rather newer, class of system is the starcat, the Sun F15K and variants. Again, it's big, expensive, requires dedicated controller hardware, and is unlikely to be the kind of thing anyone's going to have lying about. (And, if you're a business, there's no real point in trying to make such a system work - you would be much better off, both operationally and financially, in getting a current SPARC system.)

And if nobody has such a system, then not only is the code useless, it's also untestable.

The domained systems, like starfire and starcat, are also good candidates for removal because of the relative complexity and uniqueness of their code. And it's not as if the design specs for this hardware are out there to study.

What else might we consider removing (with starfire done and starcat a given)?

  1. The serengeti, Sun-Fire E2900-E6800. Another big blob of complex code.
  2. The lw8 (lightweight 8), aka the V-1280. This is basically some serengeti boards in a volume server chassis.
  3. Anything using Sbus. That would be the Ultra-2, and the E3000-E6000 (sunfire). There's also the socal, sf, and bpp drivers. One snag  with removing the Ultra-2 is that it's used as the base platfrom for the newer US-II desktops, which link back to it.
  4. The olympus platform. That's anything from Fujitsu. One slight snag here is that the M3000 was quite a useful box and is readily available on eBay, and quite affordable too.
  5. Netra systems. (Specifically NetraCT - there's a US-IIi NetraCT, and two US-IIe systems, the NetraCT-40 and the NetraCT-60. Code names montecarlo and makaha (something about Tonga too). Also CP2300 aka snowbird.
  6. Server blade. I'm talking the old B100s blade here.
  7. Binary compatibility with SunOS 4 - this is kernel support for a.out, and libbc.
I'm not saying at this point that all of this code and platform support will go, just that it lists the potential candidates. For example, I regard support for M3000 as useful, and definitely worth thinking about keeping.

What does that leave as supported? Most of the US-II and US-III desktops, most of the V-series servers, and pretty much all the early sun4v (T1 through T3 chips) systems. In other words, the sort of thing that you can pick up second hand fairly easily at this point.

Getting rid of code that we can never use has a number of benefits:

  • We end up with a smaller body of code, that is thus easier to manage.
  • We end up with less code that needs to be updated, for example to make it gcc7 clean, or to fix problems found by smatch, or to enable illumos to adopt newer toolchains.
  • We can concentrate on the code that we have left, and improve its quality.
If we put those together into a single strategy, the overall aim is to take illumos for SPARC from a large body of unknown, untested, and unsupportable code to a smaller body of properly maintained, testable, and supportable code. Reduce quantity to improve quality, if you like.

As part of this project, I've looked through much of the SPARC codebase. And it's not particularly pretty. One reason for attacking starfire was that I was able to convince myself relatively quickly that I could come up with a removal plan that was well-bounded - it was possible to take all of it out without accidentally affecting anything else. Some of the other platforms need bit more analysis to tease out all the dependencies and complexity - bits of code are shared between platforms in a variety of non-obvious ways.

The above represents my thoughts on what would be a reasonable strategy for supporting SPARC in illumos. I would naturally be interested in the views of others, and specifically if anyone is actually using illumos on any of the platforms potentially on the chopping block.

3 comments:

Unknown said...

I would vote for keeping Fujitsu support if possible, I have an M3000. Of the platforms listed here they are probably the most useful/practical.

David said...

The v100’s & v120’s are inexpensive & power efficient.
The T1 & T2 CPU’s are also power efficient & open sourced, meaning others can construct new processors with impunity, so should be retained as reference platforms.

Peter Tribble said...

While I agree that there are systems that are relatively affordable, they're also cripplingly slow by modern standards. Likewise power efficiency - coolthreads were very good 15 years ago, but look power hungry today (looking at the consumption of my T5140 as an example).

Realistically, I'm not expecting people to go out and buy SPARC hardware to run illumos on; that would be quite keen. Rather, it's those who already have the gear and wish to keep it in a reasonably useful state that are a more interesting target.