Saturday, November 02, 2019

Testing hardware

My relationship with Sun wasn't just about testing Solaris. One of the other things we were involved with was beta testing new hardware.

Our beta testing of software was, of course, part of the reason we were offered the chance to test new hardware. Sun could be reasonably sure they would get lots of honest feedback from us.

We had also given a huge amount of feedback via the sales organization, usually of the form "we don't want systems like that, why can't you make something like this instead?".

Some examples here: At one point we ended up buying Dell 6350s (running Solaris on Intel of course). They were a lot cheaper, for one thing. Paradoxically, we often had more memory in those 32-bit Intel machines than in the 64-bit SPARC servers, often down to cost but sometimes down to limitations in configuration. But a lot of it was that the 4U rack-mount Dell would fit in the datacenter, whereas we wouldn't have been able to fit a pile of E450s in, even if we could have afforded them.

Another example was that we bought a load of Ultra-5 workstations, some cheap shelving, and a KVM switch. Tied together with Grid Engine and its distributed make, they could get through some parallel tasks for a lot less than any server on the price list at the time. We tried (and failed) to persuade Sun to go even further with a cut-down system - we didn't need a CD drive or anything like that, it was just waste.

We also kept complaining about little things like chassis design, cabling, configuration, access and repair.

Eventually, we were asked to trial some new products.

One of the most interesting, and long running, was the B1600 blade system. As in blade servers, not Blade workstations. Aka Stiletto.

This had a variety of blades: simple, cheap single-processor SPARCs (B100s - like a V120 or flapjack, but smaller), a single-processor AMD (B100x), and a twin-processor Xeon (B200x). There was also a plan for a variety of appliance blades; I think the only one that came out was a load-balancing device, but we didn't test those. If you think about the design, the whole thing full of SPARC blades was not much bigger than an Ultra-5, but far more powerful and easier to wire.

I had an unfortunate accident while on holiday just before we started the testing, and broke my arm. I wasn't able to go to the "training" session and learn about the system, but then I'm a great believer that systems should actually be obvious to use. As we were the only UK customer, though, one of the engineering team came to make a video of us unpacking, racking, and configuring the system.

So there was me just supervising, Andy with the video camera, and Geoff and Terry doing the lifting. All the way through from opening the box to having things ready to roll. So the project did cover things like the way it was packed in the carton, how you were able to lift it out safely, and how useful the bits of paper that came with it were.

We vastly preferred Sun rack cabinets; they're just so much stronger and more stable. But we decided that we would try putting the B1600 chassis into a third-party rack, as an additional test. This was a nightmare! The B1600 chassis didn't use traditional rackmount rails. Instead, it had some very thin wheels that slotted into side-rails, and the tolerances were tiny: you had to line the thing up to within a millimetre. If you think about getting anything with cage nuts down to under a millimetre by eye, it was always going to be difficult. It took us the best part of an hour (part of that was us giving a running commentary, to be fair) and multiple attempts, including some where we thought it was nicely aligned but it actually wasn't, so it could have fallen.

The video was widely shared inside Sun, as I understand it, and the fix was to supply a simple metal measuring bar that you could offer up to the rack to ensure everything was square and at the correct spacing. We ended up trying several designs as they optimised it. If you've ever wondered how those spacer bars originated, now you know!

Of course, we tried sticking it in a proper Sun rack, and it was racked perfectly there in 5 seconds flat before Andy could even get over there with the camera.

Sun didn't really help themselves at times. When we first got the x86 blades, we couldn't run Solaris on them and had to run Linux for a bit. Things like drivers and management interfaces took a while to be completed.

Another project we did was the original V40z, the first generation of 64-bit Opterons (this was just the Newisys Opteron reference design with a different bit of plastic tacked on the front). This was less about the actual hardware than about bringing in a 64-bit operating system and the overall ecosystem associated with it. The first thing we noticed about it was that as soon as you applied power, the fans screamed at full tilt - it was incredibly loud. A nice feature of this generation of systems was that they had 2 management network interfaces, so you could daisy chain the ILOMs in a rack, saving a huge number of external switch ports.

We also tested the V250, a tower server. One of the things we had complained about was that you never got a lot of choice in disk configurations - you either had too few (most of the Sun rack-mount range), or you needed the compute power of an E450 and had to buy a big metal box with mostly unused space. There was never a way to size a Sun box properly. We liked the V250 because it took a sensible number of disks, so for standalone databases it was great.