The Trouble with Tribbles...: A brief history with Solaris

I first encountered Solaris (as in Solaris 2.x, as opposed to SunOS 4, which was retrospectively rebranded as Solaris 1.x) when we got a SPARCclassic workstation. Initially, that hardware didn't support SunOS 4. That made the shiny workstations useless doorstops, as nothing worked, and building stuff from source didn't work either.

Besides, Solaris 2.1 was utter garbage. It took decades to rid it of some of the more erratic design stupidities inherent in System V. (Cough. Printing. SAF.)

I just missed any serious association with Solaris 2.2, as the SS1000s I got to look after had been upgraded to 2.3 just before my arrival.

So, as a sysadmin, Solaris 2.3 was my first exposure to Solaris at scale. On the SS1000 you didn't have a choice: that was a completely new architecture that was never going to run SunOS 4, and we had several of them as the core of the service.

We built out NISplus. This had a bunch of, shall we call them quirks, and the early releases were pretty grim. But once the more irritating bugs got fixed, it served as a solid workhorse for years. As a network nameservice it was years ahead of its time - having proper administrative tooling and permissions, and a hierarchical structure. It was orders of magnitude better than the older NIS, and far better than anything available today. The SS5 running as our NISplus master did so running Solaris 2.3 for far longer than it probably should.

(We were also one of the few places to use X/Open federated naming, another game-changing, state-of-the-art technology that Sun introduced and that is now lost without trace.)

There's a common rule that odd releases are bad, even releases are good. That didn't work with early releases of Solaris; they were all bad. But Solaris 2.4 was getting to be better - more stable, more performant, generally a better feel.

As you might expect, there was a pattern, and we found Solaris 2.5 to be pretty dreadful. We installed it on a couple of systems, but it was so poor we gave up. And then Solaris 2.5.1 was pretty decent, so the alternation pattern was starting to become established.

For me, Solaris 2.6 was a watershed. It was atrocious. We weren't exclusively a Solaris shop: we had RS/6000s running AIX, a decent SGI presence, some Linux, odd bits of other Unices, and still had SunOS 4 (an old ELC salvaged from the skip running our multicast router to connect us to the mbone). But we were starting to like Solaris, as it was so much easier to manage than anything else out there, so I started to report the bugs I was hitting.

I was reporting bug after bug after bug. We had given feedback previously, of course, but at nothing like the scale we were doing here. And, unlike other vendors who slammed the door in our faces and told us to go away, the Sun engineers actually wanted the feedback and the bugs, and fixed things for us.

So when they were planning Solaris 2.7, they got me to test it before it was released, rather than letting all those bugs get out into the wild and have to deal with my irate bug reports afterwards.

This ended up with an odd anecdote. As a beta tester, I was sent the Solaris media just before the official release. And the CDs said "Solaris 7" on them. Sun didn't communicate the renumbering (dropping the leading 2.) very well internally, although clearly whoever pressed the CDs needed to know. So I was able to confirm to the rather sceptical Sun salesforce and reseller community in the UK that the renumbering wasn't a joke.

We tested Solaris 8, and all the updates, and then Solaris 9 and its updates. We found Solaris 8 to be a bit dull, to be honest, and shifted to Solaris 9. At this point we were tracking every release, and it was always better. It was rather annoying that industry seemed to settle on Solaris 8, as that meant that some new hardware was only supported at launch on the old Solaris 8 rather than the current Solaris 9.

With Solaris 10, we got invited onto the Platinum Beta program. This basically means that you run the latest build, in production. As Sun Service hadn't even seen the release, any bugs or problems we had went straight back to Solaris engineering, and every customer in the program had a dedicated engineer we would deal with.

I also got to go out to Menlo Park a couple of times, at the start and end of the program. We got the inside scoop on all the new features from the people who wrote them.

Also with the Platinum Beta, a select few of us got hold of ZFS. You know how you build a prototype and throw it away, and then do it properly? Well, the version we had was that prototype. And yes, it was thrown away and ZFS was rewritten pretty much from scratch. That was why ZFS wasn't in Solaris 10 at launch, by the way. And the version we tested was a bit different to the way that it ended up working - for example, initially the pool didn't have an associated top-level mountpoint, so that pools and datasets were quite distinct. But the attitude of that ZFS testing was quite simple - they just sent us the zfs and zpool binaries, the kernel driver, and a 3-line crib sheet, and everything was supposed to be intuitive and obvious. If you couldn't work out how to do something, that was considered to be a bug.

Immediately after Solaris 10 (in fact, starting just before the release) we kicked off OpenSolaris, initially as a closed pilot - nobody really knew how it was going to work, or indeed if some lawyer would find a speck of dust to jam up the works and prevent the whole thing going live. But OpenSolaris launched and its descendants, yes I'm talking illumos, are still making a difference.


Monday, October 28, 2019
