Wednesday, December 27, 2006

Fun with NVIDIA driver

I have a Sun W2100z running Solaris. It's a fantastic machine, but the graphics has never been too hot. It's got the entry level graphics card - an NVS 280 if I remember correctly - and I've been using the bundled nv driver as I wasn't aware that the accelerated driver could do much better.

I was trying to get DVD playback to work, and gxine did work (after I had built and installed libdvdcss), but it was rather slow and I noticed that Xorg seemed to be pegging a whole cpu. In addition, I thought it might be neat to get Looking Glass to work.

So I ambled over to NVIDIA and their Solaris Driver downloads, and noticed that my graphics card was listed as one of those supported. I downloaded and installed the latest version (9764), and that didn't work. I got an error message as it booted that seemed to be from the nvidia driver, and then the system immediately rebooted. Not good. (I went into failsafe from the grub menu and used pkgrm to delete the two packages to recover.)
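
(The recovery itself is just a pkgrm away once you're in failsafe. The package names below are from memory - check with pkginfo first - but it was roughly:)

pkginfo | grep -i NVDA
pkgrm NVDAgraphics NVDAgraphicsr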

Then I had a look in the forums and noticed a message about a legacy release (9631). Now, it doesn't really explain what the legacy release is in this case (the information on the download page about legacy releases applies to Linux), and there isn't a link to the legacy release on the download page itself, but to cut a long story short - the 9631 legacy release works fine on my entry level box. DVD playback is a bit smoother, although still not brilliant, but the cpu load is pretty negligible compared to before.

If you, like me, have an entry-level W2100z with an NVS280 graphics card, then you may be advised to use the NVIDIA legacy driver rather than the latest and greatest.

Tuesday, December 05, 2006

Jumpstart Profile Builder

One of my little projects is to write an interactive jumpstart profile builder, so that you can play with Solaris software selections, and construct a working jumpstart profile.

The problem currently is that you can, with some effort, create a jumpstart profile. But the only real way to test it is to use it to install a system, and either see what works and what doesn't, or look at the installation log to see what package dependencies you messed up on. Eventually you can get quite good at this, but it shouldn't be this hard.

You can select packages in the interactive install, but the dependency resolution is awful. It tells you what the dependencies that failed are, but doesn't give you a way to feed that back into the gui to resolve them. Nor does it tell you what dependencies are required as you look at the individual packages. And once you've selected your software, it will install it but you can't go from that to create a jumpstart profile to automatically install the next set of systems.

I have most of the base code already, in the shape of Solview. That knows how to turn the .clustertoc file (which is where the clusters and metaclusters are defined) into a tree representation of the packages, that could be used as the basis of a selection tree. And it can evaluate the dependencies. I've also got code that can describe a jumpstart profile.
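
For the curious, the end product I'm aiming at is just a plain jumpstart profile along these lines (a minimal sketch - the disk slices and the optional packages here are only examples):

install_type    initial_install
system_type     standalone
partitioning    explicit
filesys         c0t0d0s0    free    /
filesys         c0t0d0s1    2048    swap
cluster         SUNWCreq
package         SUNWbash    add
package         SUNWless    add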

Where I'm currently struggling a little is the actual interface. Using a java swing JTree as the base is fairly straightforward - this is what Solview does already. And I need a custom cell renderer to add checkboxes to show whether the software package is selected or not (or, in the case of an installed system, whether it's been installed or not). But then I got to thinking, and realised that I can't use the regular JCheckBox, as there are more states that I wish to display than JCheckBox gives me out of the box.
  • Package or cluster selected
  • Package or cluster not selected
  • Cluster partially selected
  • Package or cluster selected and cannot be deselected
The last case is for required packages and clusters (which you can actually delete in a jumpstart profile but which it ignores), and I could probably handle that simply enough by having the checkbox selected but disabled. I might also want to have a separate state for packages and clusters which have forward or reverse dependencies, so it's clear visually whether it's safe to select or deselect them.

I've had a quick look around with the help of google, and I'm obviously not the first person to have requirements of this type. And there is a fairly large amount of sample code out there. Unfortunately it appears to be fairly complex, and I would be adding a significant amount of extra code, but it doesn't appear that I have much choice.

Wednesday, November 22, 2006

T2000 vs bison

This wasn't a benchmark, or even meant as a test, but it was an interesting number nonetheless.

I installed bison today. As I have 3 different families of systems (Sparc Solaris 8; Sparc Solaris 10; x86 Solaris 10) I have 3 copies of /usr/local, so I build on 3 different systems. Anyway, after building bison (and the T2000 allows a very parallel make that some builds can take advantage of) I ran a make check.
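
(The recipe on each box was nothing special - roughly the following, with only the T2000 really able to make use of the -j; the job count here is just what I happened to use:)

./configure --prefix=/usr/local
gmake -j 32
gmake check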

The results were quite interesting. My sub $1000 Opteron 146 system took 107s. An old V880 (750MHz US-III) took 529s. The T2000 took a whopping 794s. This really isn't looking good.

Friday, November 17, 2006

Cheap sparcs and constrained configurations

I've recently been looking through the Sun server range, both to get some systems for present needs and to try and map out what is likely to meet our needs in the next year or two.

In some areas there are good matches and obvious choices. For example, the X2200 makes a phenomenal compute node, and the X4500 (aka thumper) is a phenomenal data storage machine.

But there are other areas where the choice is less obvious, or where what I actually want is a small variation to the configuration that Sun don't do. Here are some examples of the sort of things I want to do but that aren't available:

  • A machine at the lower end of the range, such as the X2100, V210, or V125, with redundant power
  • Much larger internal disk drive capacities, such as 300G drives in the V240 and similar, and 500G SATA drives in the T1000; these SAS drives really aren't helping
  • Some new machines with full-height PCI slots so I can use my old PCI cards and connect old peripherals
  • Something like an X2100 with 4 500G drive bays on the front (would make a great fileserver appliance)
In other words, more configuration flexibility, and less of the one size fits all mentality.

Things get especially problematic when you start looking at entry level sparc systems. Yes, application availability often means that we have to keep looking at sparc servers. And the available configurations aren't encouraging. The biggest issue is actually around disk, where the newer systems (T1000, T2000, V245, even the V445) are poorly provided for in terms of internal capacity and performance. In many ways, the V245 is a hugely retrograde step down from the V240. I really don't want to have to buy a SAN and FC cards to attach the machines to it, just to have enough space to load an application on them.

Monday, October 23, 2006

Problematic Printing

I've just migrated an important service off an old Solaris 8 machine onto a new(ish) server running Solaris 10.

Everything went pretty smoothly, with the exception of a few little printing problems. (I'm not a printer person, really. Whatever happened to the paperless office?)

I had just used lpadmin to set up the printers. I've read that printing in Solaris 10 is much better than it used to be (and that may be true) so that it can drive printers properly out of the box without the need for extra products like the HP JetDirect (what used to be JetAdmin) printer driver. Indeed, to the point that you're not supposed to use JetDirect any more.

In our testing we had ended up setting the type to any (rather than postscript) and using the netstandard model script. Anything more sophisticated and it mangled some of the printouts (specifically, we had some plaintext reports with embedded escape characters to switch the printer to landscape, and those didn't come out right).
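
(For the record, the setup was a plain lpadmin invocation along these lines - the printer name and address are made up, but the netstandard model and the type of any are the bits that mattered:)

lpadmin -p reports1 -v /dev/null -m netstandard \
    -o protocol=tcp -o dest=printer.example.com:9100 -I any
accept reports1
enable reports1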

What we discovered today was that printer tray and media selection weren't working. I had been told that this was done in the application, but what that actually meant was that there was a wrapper shell script that called lp with some options that the jetdirect software can understand. And clearly the vanilla print system under Solaris doesn't grok those options.

So what I've had to do is to install the latest version of the JetDirect software that I can find (E10.34). And then add the specific model scripts for our model printers (and these are on the HP website, although they're a devil to find and the HP search is almost useless). And then using that to drive the printers made everything work flawlessly.

I'm sure there must be a way to pass the appropriate printer options to select the correct media type without running everything through JetDirect, but I've yet to find it. (Oh, and it would be nice if it used the same option names, to save having to track down every wrapper script that might be involved, although that's not absolutely necessary.)

Thursday, October 19, 2006

Network Corruption?

I've just started seeing corrupted files on one of my X2100 M2 boxes. The visible indication was that when unpacking gzipped tar files from an NFS mounted disk I was getting crc errors and invalid compressed data. Testing indicated that if I used rcp to transfer the files then I was also getting corruption. (I can check the md5 signatures, rather than just rely on gzcat to barf.) There wasn't any other evidence - I'm not seeing corruption of local files (basically, of Solaris itself), nor any network errors on the machine or the switch, and it's just the one machine that's affected.
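
(Checking is easy enough, as Solaris 10 ships a digest command; comparing something like the following on the server and on the suspect box shows up any mismatch - file names illustrative:)

digest -a md5 foo.tar.gz              # on the NFS server
digest -a md5 /tmp/foo.tar.gz         # after copying to the X2100 M2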

How to test? SunVTS runs on Solaris x86, so I tried that out. Which wasn't encouraging:

INFO: Host System identified is "X2100 M2"
This Platform is not supported by SunVTS.

However, if I edit the platform.conf file it does at least run. And, so far, no memory or network errors.

Mind you, the errors I was seeing earlier seem to have disappeared...

Thursday, October 12, 2006

Keeping the T2000 busy

Like many organizations, we use old computers for lightweight tasks - Sun systems just keep going. (Which can be a problem, as you can be tempted to keep them for too long when replacing them with a new cheap system would actually be a much better bet.)

So I'm putting together a couple of SunBlade 100s. They're more than powerful enough for the tasks in hand, and it saves cash. But building the software on them can take forever. Especially when you have to do it a few times before you get it right.

Enter the T2000. Type gmake -j 32 and boom! Build over.

If only ./configure could be parallelized...

Friday, October 06, 2006

T2000 - more performance

I've managed to do a couple more performance tests on my loan T2000.

The first was kicked off by the announcement that pbzip2 is available on sunfreeware. Now, bzip2 is pretty cpu intensive, so if there's a way of making it go faster then I'm all for it.

So here are some figures - in seconds - for pbzip2 as a function of the number of cpus used. I've used a small blocksize (100K) rather than the default, which helps as the test file isn't all that big.

# cpus     Time (s)
     1       16.444
     2        8.686
     4        4.287
     8        2.791
    16        1.821
    32        1.784

That's pretty good. I would expect to get approximately a factor 8 gain here: it's a cpu intensive application and there are 8 physical cores. In fact we see about a factor 9 (16.4s down to 1.8s), which is slightly better than that.
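
(Each run was essentially the following, varying the processor count; -b1 is the 100K block size and -p the number of cpus to use - at least, that's how I read the pbzip2 flags. The file name is illustrative.)

/usr/bin/time pbzip2 -b1 -p8 -k -c testfile > /dev/null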

My next test was to load up an apache/php/mysql combo and see how that goes. The comparison machine is a SunBlade 2000 with a pair of 1.015GHz processors. (Note: this is using the same binaries - not optimised code on the T2000. Yes, Sun have optimised binaries, but that isn't really helpful in comparison, as with most applications a recompile wouldn't be feasible.)

For a simple database query operation, the T2000 took 0.160s as opposed to 0.112s on the SunBlade. For generating a graph, the T2000 was again slightly slower - 9.538s against 7.167s.

Again you see that single-threaded performance is at about 60-70% of a regular UltraSparc cpu of the same clock speed. But this thing has 8 cores - how well does that work?

So I fired up the apache benchmark ab. And just for the database retrieval report I can get about 30 pages per second out of the SunBlade, but (with 32 concurrent requests) about 100 pages per second out of the T2000. And - according to top - the T2000 is only 50% busy. In fact, my server (apache/php/mysql) setup croaks if I push it much harder (but then it's a reporting thing designed for 1 page a minute).
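
(The benchmark invocation was nothing fancy - something along these lines, with the hostname and request counts illustrative:)

ab -n 1000 -c 32 http://t2000.example.com/report.php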

So, on this test the T2000 is at least as good as 3 SunBlade 2000s. Or, equivalently, it's better than an 8x750MHz V880. Which is pretty good, as it's a fair amount cheaper and smaller.

Wednesday, October 04, 2006

Server Wars: The M2 Strikes Back

And I thought I was getting somewhere!

Having actually installed my X2100 M2 servers, I want to mirror the boot disk.

As an aside, I found Sun Infodoc 83605. This says:

It is recommended to use Hardware RAID controllers for the Root disk mirror, for performance and reliability reasons.


and then a little later, being consistent:

Sun Fire x2100 uses an nVidia RAID controller. Solaris OS doesn't support the nVidia driver as of now, so Solaris Volume Manager is the only option for mirroring the root disk.


OK, so I can't use hardware raid at the present time, which means (I think) that I can't hotswap the drives.

But anyway, I do the regular thing to mirror the boot disk with SVM. Works fine. I run installgrub and use eeprom to define an alternate boot path. I reboot both servers. One comes back all fine and dandy.
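
(For the record, the "regular thing" is roughly the following - device names illustrative, so adjust for your own disks:)

# state database replicas on both disks
metadb -a -f -c 3 c1d0s7 c1d1s7
# submirrors, then a one-way mirror for root
metainit -f d10 1 1 c1d0s0
metainit d20 1 1 c1d1s0
metainit d0 -m d10
metaroot d0
# after the reboot, attach the second half and make the second disk bootable
metattach d0 d20
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1d1s0
# plus eeprom altbootpath=... pointing at the physical path of the second disk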

The other one is dead. Faulty boot archive. Can't start console services. Remember that I'm having a wee bit of trouble controlling these machines, so getting to a failsafe system is a bit tricky. I fiddle for a bit, but rebooting doesn't help.

I was going to reinstall anyway, so pxe boot away. This time, I can't even see the console output. Oh dear, it's getting worse....

Tuesday, October 03, 2006

The M2 submits

As I have previously documented, I've been setting up a Sun X2100 M2.

It's not been the smoothest ride. And I wouldn't say that everything is completely solved, but having got it installed I followed the instructions for serial redirection and I can now ssh to the SP and get console access. That's probably all I need, although I still have a slight lack of confidence in the system - I can imagine ways in which it could get itself messed up where I get no access, but for most of the administration tasks I'll ever need to do I'm sorted.

Still can't get Serial-over-LAN to work, though. Never mind, ssh is good enough for me.

Monday, October 02, 2006

The M2 comes alive

I'm getting somewhere in my ongoing battle with the Sun X2100 M2.

I still haven't got the dratted thing to give me console access. But I have managed to install the beast in such a way that I can finish the install and log in.

The first step is to ensure that you have ipmitool installed. If you have Solaris 10, you may have it, and if so you need to patch it (119764 or 119765 according to your system). Then I can tell it to do a pxe boot next time around:

ipmitool -U root -I lanplus -H 1.2.3.4 chassis bootdev pxe

And I can power cycle it and it will do a jumpstart install.
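
(The power cycle is just another ipmitool call, something like:)

ipmitool -U root -I lanplus -H 1.2.3.4 chassis power cycle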

Then in my finish script I have

sysidconfig -b /a -r /usr/openwin/bin/kdmconfig

which stops it asking me whether I want Xorg or Xsun at boot (questions which I can't answer because I have as yet found no way of typing back at it).

And I have a working system I can access using ssh. With luck I can get the installed system to play nice with the console, which is going to be essential when we move this box into production use.

Friday, September 29, 2006

Me versus the M2

I'm getting a very negative feeling about the SP on the X2100 M2 I'm trying to set up. This thing is just plain awful.

Yes, it's got some very fancy features. But basic functionality - like being easily able to access the system console, being able to operate for more than a few minutes without goofing up, or having documentation that is understandable - seems to be missing at this point.

The basic problem is that I can't get anything sensible out of the serial console. I can tip to the SP, and that works fine. But I just get junk output (if at all) on the console.

The KVM applet gadget is very nifty. And I can see output (once I've redirected to ttyb, anyway), but nothing I can do can persuade Solaris to accept keyboard input. It's just Solaris, as I can type into the GRUB screen OK. But if the install goes interactive I'm stuffed.

Anyway, I've got it installed by persuading it to jumpstart completely hands off. (Well, not completely - it complained about not being able to set the boot device and didn't reboot when it had finished installation, but that's OK, I can remote power-cycle it [and I've used that piece of functionality a few times today!].) It had insisted on asking for terminal type and locale before, although I'm not sure why.

So my attempts here were to add:

-b console=ttyb

to the add_install_client invocation, and make sure that terminal and locale were defined in the sysidcfg file, and then edit the menu.lst file to add install as an argument - after:

kernel/unix

I added

-v -m verbose install
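
(The sysidcfg entries in question were along these lines - values illustrative:)

system_locale=C
terminal=vt100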


Of course, I'm now stuck a little further on - looking at the kdmconfig screen where it's asking me to select an X server. Must find my notes about how to disable that prompt.

Thursday, September 28, 2006

On to the X2100 M2

I've just been trying to set up one of the new X2100 M2s. I did a plain X2100 a while ago, and that was pretty simple - all I had to find out was the appropriate ctrl-alt-meta-shift-escape-thingy to replace F12 for the initial netboot while on the serial port. (Which I've forgotten, so if anyone could enlighten me - thanks!)

The new M2 is a different beast. The first problem was getting to the serial port. I've got my loan T2000 in the rack next to it, so I thought I would simply establish a tip session from that. However, it appears that the normal 'tip hardwire' trick doesn't work with the T2000 - there is no serial port B at all.

So I tried a different machine. And our console server. And about a dozen different cables. No joy. Eventually I did find a cable that worked, but I wasted far too long. (Don't get me started on serial cables. Every time I use them I feel this inner urge to go and throttle somebody.)

So I got an SP prompt. And this thing was - ahem - strange. After 5 minutes or so of being unable to get anything sensible out of it I punted and set the IP address manually, hooked the net management port into the network, and pointed my browser at it.

(I was somewhat dismayed to find that Sun's instructions told me to open up Internet Explorer. Oh how low have we fallen...)

So I point firefox at the LOM address and - wonder of wonders - it worked!

(There was the bit about the certificate expiring in 1979, and the mismatch between the IP address and the name on the certificate, but nothing serious.)

And you get what is actually quite a neat interface at this point. I started up the remote console, and nothing happened. Bother, edit pop-up preferences and try again. And I have the remote console.

This is all actually rather clever. It certainly looks good, and actually works pretty well too.

The next battle is to install the machine. At the moment when it tries to net boot one of our Windows Domain controllers jumps in and answers the DHCP request so my install server doesn't get a look in.

T2000 - initial performance

OK, so now that I've got the machine running I thought I would try some simple performance tests.

I know that these aren't going to show the T2000 in a good light. These are simple CPU intensive single-threaded apps. (If you can call them applications.) The aim was to get a feel for just how well the machine would do.

So I have a twin 360MHz Ultra 60, a twin 1.5GHz V240, a quad 1.28GHz V440, an 8x1.0GHz T2000, a 500MHz SunBlade 100 and - for fun - a cheap X2100 with a 2.2GHz Opteron 148.

I copied the /var/sadm/install/contents file from my desktop into /tmp on each machine, and timed grep, wc, gzip, gunzip, bzip2, and bunzip2 on the file (it's about 12 Meg). The times, in seconds, are:


U60:
grep 0.540
wc 0.615
gzip 3.055
gunzip 0.619
bzip2 25.589
bunzip2 4.187

SB100:
grep 0.416
wc 0.517
gzip 2.408
gunzip 0.492
bzip2 29.303
bunzip2 4.314

V240:
grep 0.136
wc 0.210
gzip 0.776
gunzip 0.152
bzip2 8.028
bunzip2 1.054

V440:
grep 0.159
wc 0.247
gzip 0.911
gunzip 0.180
bzip2 9.034
bunzip2 1.241

T2000:
grep 0.402
wc 0.656
gzip 2.772
gunzip 0.495
bzip2 17.695
bunzip2 2.285

X2100:
grep 0.079
wc 0.077
gzip 0.445
gunzip 0.092
bzip2 4.053
bunzip2 0.555


What's clear from this is that the Opteron (not entirely unexpectedly) wins by a distance. And the T2000 is handily outpaced by the V240 and V440 - even accounting for clock speed. In fact, the T2000 seems to be - for the completely unfair single-tasking case - more comparable to the USIIe/USIIi in something like a V100 or Netra X1.
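
(For reference, each number above came from a bare timed run over the copy in /tmp, along these lines - the grep pattern is arbitrary:)

cd /tmp
/usr/bin/time grep SUNWcsr contents > /dev/null
/usr/bin/time wc contents
/usr/bin/time gzip contents ; /usr/bin/time gunzip contents.gz
/usr/bin/time bzip2 contents ; /usr/bin/time bunzip2 contents.bz2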

Of course, once you take into account the parallelism available, the T2000 might be comparable to a whole rack of the old 1U netra systems.

Now to see if some of our applications can be started up on this machine, and if we can test some applications that would suit the T2000 better.

T2000 - install performance

The first thing I did with my loan T2000 was to install Solaris on it. My first impression was that it seemed to be going fairly slowly, which was confirmed by some actual timing numbers.

I can time two parts of the install. The first is the actual Solaris installation, from the begin script to the finish script. This is only a part of the installation process, but is easy to measure. The second is my local installation, which takes place on the next boot, and installs some extra packages, runs some cleanup scripts, untars the whole of /opt/sfw, and applies current patches; the time also includes the reboot. I've done 4 different systems this week, and the times (in minutes) are shown below:

Type       cpu speed     install    local install
Ultra 60   2x360MHz         45           73
V240       2x1.5GHz         16           40
V440       4x1.28GHz        18           34
T2000      8x1.0GHz         36           60


OK, so the install time isn't necessarily a good metric, but it's probably a fair indication of how long it's going to take to do general system administration on such a system. It's also essentially serial, which isn't good for the T1 chip. Even so, the numbers here are slightly disappointing - it's doing slightly worse for its clock speed compared to the other sparc systems I've got available to play with today.

Wednesday, September 27, 2006

T2000 - past the first roadblock

Just got past the first roadblock with my loan T2000.

Plugged it in, powered it on, and got no network connectivity.

That's odd. All the lights look fine on both the T2000 and the switch. But I can't get the link to come up properly. I get

Timed out waiting for Autonegotation to complete
Check cable and try again
Link Down

So I pull it out of the foundry switch and into a cisco. And then I get

100 Mbps full duplex Link up

OK, it's a shame that I can't get it working at gigabit but at least I can jumpstart it now, and I'll worry about getting it running at full speed later.

Tuesday, September 26, 2006

New Toys

I've been racking some new kit today. A couple of simple boxes for mail relays (and spam filters), and a backup server.

We've gone for the new Sun X2100 M2 for the mail systems. Something lightweight, but fast to handle the load. And very affordable at that.

We got an X2100 a while back, and it's been a huge success. We run our trouble ticket system on it, and it's not just faster - it's revolutionized the way we work. Before, doing anything with the trouble ticket system was so painful that it discouraged anyone from logging or handling tickets. Now, it just flies. We're hoping for the same from the mail system, which (and I guess everyone is getting the same experience) is getting swamped by the spam deluge.

The backup system is nothing special, I've put in a V240 with a C4 library. The only thing to say about this is that the C4 library is a deep beast - it barely fits into the Sun 900 rack and that's without being cabled up yet!

And I was just sitting down after that when a try-and-buy T2000 showed up. That's in the rack too, and it all looks very impressive.

We're interested to see how well the coolthreads system works. In general, I see increasing use of opteron systems, but we still have a number of sparc based applications that we're going to have to run. It stands a good chance - apache, tomcat, with a database or proprietary search engine at the back - so we're optimistic it'll be a success.

Tuesday, September 05, 2006

Solview Updated

I've updated my solview utility to version 0.3.

What is solview, I hear you ask? It's a little java gui to show useful information about a Solaris system. It shows installed software packages (and installation software clusters); services and their status; and the output from various informational commands.

This new version enhances the package and cluster information. It's split up into several tabbed panels, and now shows cluster membership, the installed status of clusters, and can calculate full dependencies.

What happened to version 0.2? Well, that version included a rather spectacular failure to enhance the software display by parsing the contents file. I expect it is possible to get java to extract useful information from this file without running out of memory, but I didn't succeed in doing so. So that attempt got scrapped and I moved on to version 0.3.

Sunday, August 20, 2006

How hard can it be?

Yesterday I tried to write a CD on the family PC. I've been taking digital photos almost exclusively for the past couple of years, and we went through the accumulation and decided to get a few printed off properly. Simple - just burn them to a CD and take it to the nearest supermarket.

So the family PC has a writable CD drive, and a popular user-friendly operating system that's supposed to make this a breeze. So it should be a doddle, right? Wrong! I went to my photos, clicked where it says "Copy all items to CD", and failed completely to produce a CD with any pictures on it. Several coasters, for sure, but no viable CDs. (My understanding of the way this is supposed to work is that it just copies the files to a holding cell, and when you open a writable CD it's supposed to ask you if you want to burn the CD. Nope, just trashed the CD every time.)

The PC has a number of utilities - some preinstalled, some that came as part of another package - that allow you to write to a CD. None of those did the job either. (I know the drive works, because I've written backups and VideoCDs with it, but I wanted something terribly simple. No dice.)

Unwilling to waste any more time, I simply copied the files to a real computer running Solaris. A couple of simple commands later and I had burnt a CD, first time, no problem.
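
(For the record, the "couple of simple commands" were essentially mkisofs plus cdrw, something like this - paths made up:)

mkisofs -r -J -o /tmp/photos.iso /export/home/peter/photos
cdrw -i /tmp/photos.iso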

Oh, and the pictures came out great.

Friday, August 18, 2006

Back in Circulation

I haven't disappeared. I haven't given up. I'm still alive.

The last month has been pretty hectic for me. It being summer and the kids having time off school, we took some holiday.

Also, as I mentioned earlier, I've changed jobs. I'm now working for Proquest Information and Learning, looking after their Sun Solaris servers. Half the first month was spent on holiday (arranged long before, and Proquest were happy with that - they were keen for me to make a start as soon as possible rather than wait until after my vacation). So far, I've been enjoying myself enormously. It's a big challenge and an environment rather different to what I've been used to, but I'm starting to get my teeth into it. The lifestyle is much better, and I've been cycling to work quite a lot (although, if you read the holiday story, it's not as if I have much choice at the moment).

Sunday, July 30, 2006

Self Preservation

It's been unpleasantly warm here in the UK over the last month. I'm sure other places have it hotter, but we're not used to it and not geared up for it.

I have some old work colleagues who I set up some systems for, and I got an urgent phone call just before going on holiday: "the air conditioning failed yesterday and the power in the data centre was unceremoniously yanked - could you check the systems out?".

It turns out the servers - Sun machines - were fine. With a fine sense of self preservation they had shut themselves down gracefully due to the heat:

Jul 19 18:00:14 foobar rmclomv: [ID 690426 kern.crit]
SC initiating soft host system shutdown due to fault at MB.T_ENC.
Jul 19 18:00:15 foobar rmclomv: [ID 222544 kern.error]
SC Request to Power Off Host.
Jul 19 18:00:16 foobar rmclomv: [ID 428258 kern.crit]
TEMP_SENSOR @ MB.T_ENC has exceeded high soft shutdown threshold.

So far, all appears to be well, and I haven't seen any further errors.

Thursday, July 13, 2006

Enhancing the JDS desktop look

While the JDS desktop that currently comes with Solaris is essentially functional, I've always found it rather bland. In one sense it's a good thing - it has no major irritations (most window managers or desktop environments have something that drives me up the wall).

Still, I would like to get a bit more character out of the desktop. Unfortunately, the available themes are pretty thin on the ground. Unlike the huge array of themes that you can get for Window Maker, you're really very restricted.

I actually quite liked the new theme that the latest JDS builds come with, but it doesn't seem to be separately available.

I've found a couple of themes that I like. I've used eXPerience for most of the last year. I've got my Windows XP box set to the olive green theme, and find that color combination works quite well for me. And I've just been playing with Clear Blue, which works pretty well. It's a bit pale on my current monitor, and I might have to tone down some of the other applications I use to blend in a little better, but it's looking good.

There's still a lot of room for more gorgeous looking desktop themes, though.

Wednesday, July 05, 2006

S10U2 companion problems

Recently, update 2 of Solaris 10 was released.

Along with this comes the companion CD - a collection of unsupported freeware.

However, the S10U2 companion DVD has a number of problems. Apart from the old one of being distributed as a single ISO image with both sparc and x86 binaries, so that you have to download both architectures even if you only want one, some new problems have appeared.
  • There isn't an installer. There are over a hundred packages here and installing them individually by hand can get a bit tedious.
  • Worse, some packages depend on packages that are no longer shipped. I had to install SFWgcc2l, SFWgcc34l, SFWgfile, SFWgtext, SFWshutl, and probably SFWgm4 off the S10U1 companion to get the dependencies and missing components back (the pkgadd incantation is sketched after this list).
  • Some useful utilities (some of them listed above, but also things like bluefish) have just disappeared.
  • Coreutils has been added, without the g prefixes.
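
Pulling the missing pieces off the old companion CD is just a pkgadd, something like the following - the exact path to the package directory on the CD may differ:

pkgadd -d /cdrom/cdrom0/Solaris_Software_Companion/Solaris_sparc/Packages \
    SFWgcc2l SFWgcc34l SFWgfile SFWgtext SFWshutl SFWgm4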

For all that pain, is there any gain? I can't see any. I can't see any new packages that add any value, we've lost functionality, the installed packages don't work, and it's not even as if we have anything useful like a more up to date version of KDE.

I recommend going back to the S10U1 companion.

Thursday, June 22, 2006

#opensolaris

One part of the OpenSolaris first anniversary (or birthday) last week was an IRC chat party.

I'm not entirely sure I'm fluent with IRC yet. I used to chat interactively 15-20 years ago, but haven't done much since. One problem is that I get interrupted or distracted quite a lot. This isn't much of a problem for email discussions, but it's rather easy to lose the thread of a chat conversation.

The first couple of times I used the builtin mozilla client. I've just tried the ChatZilla extension for Firefox, which is basically the mozilla client, and it works fine.

Wednesday, June 21, 2006

Solaris Install

Mark Mayo lists his top 10 Solaris Installation Annoyances.

I'm not going to try and pretend that the Solaris installer is anything other than bad, but consider the following:

I've done thousands of Solaris installs over the last decade or so. I can probably count the number of times I've used the interactive installer on the fingers of one hand. No, I didn't like it much either.

The fact that most admins with a clue use jumpstart means that the interactive installer has received precious little attention. Even if the interactive installer was the best thing since sliced bread, I would never use it at work. That's not to say that the interactive installer can't be improved - it can, and must be.

I've not used a modern Linux installer (say, Ubuntu) but have used RedHat, Mandrake, Suse, Fedora. And I wouldn't necessarily say they're actually much better. Sure, they might have looked prettier, and felt slicker, but they were often fragile (the installer just bombed out) and unreliable (the system was all messed up afterwards). The Solaris installer might be butt-ugly and dog-slow, but it produced working systems if you were prepared to suffer to the end. And if people are using interactive installers professionally, they need to get with the program and adopt some automatic provisioning scheme.

1. As for why it's slow, some work has been done on this. The biggest problem is that the distribution is compressed with bzip2. It goes way quicker if you swap out bzip2 for gzip.

2,3,4. Never happen if you use jumpstart.

5. I would argue against any attempt to change the default shell to bash. (Now, if they suggested tcsh, that might be received more favourably.)

6. Err, who on earth logs in as root?

7,8. We ought to chuck vi and vim and replace 'em with emacs...

9. Urrgh. Sendmail.....

10. Secure by default has finally made it into Nevada. But automation under jumpstart would normally knock out services you don't need.

Opera 9, Solaris x86

Not only is there a new version of Opera, but it's available for Solaris x86 as well!

Hooray!

An unfair comparison:

PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
107 ptribble 4 49 0 1024M 971M sleep 219:15 1.75% firefox-bin
25551 ptribble 1 49 0 64M 52M sleep 0:07 0.02% opera

I still prefer firefox, but that might change. Don't know why, it just seems to work for me better than other browsers. One thing that opera does have that's useful is the ability to duplicate a tab. And I am getting rather fed up of having to restart firefox every day or two to keep memory utilization within bounds on my workstation (or cpu utilization down on the XP box at home).

Thursday, June 15, 2006

Moving On

Not as newsworthy as Scoble, but I'm changing jobs soon.

I've been in my current job for about 9 months, and it's been tough. I've never really settled properly into the organization, and there have been a lot of frustrations that I haven't been able to identify properly or address. Combine that with spending between 2 and 3 hours a day commuting, and it wasn't really a situation I wanted to continue in.

Last week got fairly frantic. I saw an advert for an attractive position with a company only a few minutes from home at the beginning of the week. Sent off my CV, got called by the agency, went for interview, went back for a second interview the next day. I liked what I saw, and they obviously liked me enough to offer me the post, so I start there in about a month's time.

I'm looking forward to it. And not just the job - 2 hours of my life back every weekday is going to make a huge difference.

Wednesday, June 14, 2006

Striding Out

A year ago today, OpenSolaris was launched and started taking its first baby steps.

Over the last year, a lot has happened. More code has been released. We've had lots of lively discussions. Many projects have started. OpenSolaris distributions have been created. There's been serious progress on building a Source Code Management system that fits the needs of Sun and the community. Members of the outside community have been filing bugs and contributing fixes.

Progress in some areas has been slow, but that's fine. We've been learning how to walk before we can run, and we love Solaris too much to risk running too fast and falling over and breaking anything. But the project is gaining strength and starting to stride out more confidently.

For my own part, I've been involved in the community since the early days of the pilot. Job issues meant I had to back off on the code contribution front for a while, but I've recently managed to get some fixes put back, with more on the way. This work seems to have been picked up by ZDnet Australia and even a Sun feature.

So we've had a good year, and I feel privileged to have been allowed to be a part of it.

Here's to many more successful years!

How fast is my SCSI?

Discovered a quick way of working out what speed SCSI devices are connected to a Sun server at.

This only works for some SCSI cards - it seems to be a bit hit and miss. I've generally had some success on my SPARC workgroup servers, but other machines give zilch.

And the actual data transfer rate is something else again. But at least you can check that things are being set up at the rated speeds.

The command is:

prtpicl -v | grep sync-speed

and you then have to work out what the devices are.

Some examples. Here's a V240 with an attached 3310 array:

:target0-sync-speed 160000
:target1-sync-speed 160000
:target2-sync-speed 160000
:target3-sync-speed 160000
:target0-sync-speed 320000
which shows the internal drives at Ultra160, and the external array at Ultra320.

Here's a V440 with a pair of 3120 arrays:

:target5-sync-speed 5000
:target8-sync-speed 320000
:target9-sync-speed 320000
:targeta-sync-speed 320000
:targetb-sync-speed 320000
:target5-sync-speed 5000
:target8-sync-speed 320000
:target9-sync-speed 320000
:targeta-sync-speed 320000
:targetb-sync-speed 320000
:target0-sync-speed 320000
:target1-sync-speed 320000
:target2-sync-speed 320000
:target3-sync-speed 320000
all the disks are at Ultra320, and you see an SES target on each array that's come in at 5M.

Here's an old E450:

:target0-sync-speed 40000
:target1-sync-speed 40000
:target2-sync-speed 40000
:target3-sync-speed 40000
:target6-sync-speed 20000
:target1-sync-speed 10000
which has the disks at 40M, the DVD at 20M, and the tape at 10M.

Friday, June 09, 2006

Ultra 25

So where did the Ultra 25 come from? I don't recall any product announcement or big splash.

Still, that's not much of a surprise really, as the machine doesn't look too impressive. Only a 1.34GHz chip, which is slower than the SunBlade 1500 it presumably replaces.

I also note that the operating systems it supports include

Solaris 9 (Available September 2006)

Does this mean that there's going to be a new hardware release of Solaris 9 in September?

Wednesday, June 07, 2006

That's Reliable?

I was just looking at a Yankee Group report on server reliability, as reported by Yahoo!.

Now it's nice to hear that they think Solaris is winning on reliability. I knew that :-)

However, going beyond the headline they find:
  • 3-5 failures per server per year
  • 10-19.5 hours of downtime per server per year
Ouch!

Of course, that's server uptime, not service uptime. With a decent architecture you would have some backup so that the service would be available even if a server failed. And servers do fail, no matter how good they are, or need maintenance work.

But whatever, I don't regard 99.8% availability as anything like good. In fact, it's terrible.

My own experience is that Solaris is pretty damn reliable. Much better than the figures quoted, at any rate. And Windows servers themselves don't seem to be too bad (although they do seem vulnerable to major corruption events which, while rare, involve significant outage); PC networks overall, though, seem very fragile. Linux I've found to be less robust, with older versions simply wedging and hanging regularly (something that I believe has been dramatically improved), but I suspect a lot of Linux problems are due to people believing the myth that it's free and will run on any old piece of junk hardware, and so they use junk hardware and don't manage it properly - with predictable consequences.

The other aspect of system reliability is applications and, quite frankly, application reliability is often simply not up to scratch.

Friday, May 26, 2006

Lapping it up

My intention to learn some new tricks has been going quite well.

I've been working with PHP on a little project of mine. It's a system monitoring application - data about the health of systems is inserted into a database and reports displayed on a web page.

It was written using JSP, and that works extremely well up to a point. But it doesn't really need any specific technology - there's nothing in JSP that is especially suited to the task, which is fairly simple. Rewriting it in PHP therefore looked fairly easy, and having a real project is a good way to learn something new.

And it worked very well. PHP is very easy to develop in, and I was able to implement some new features as well.

So how would I compare PHP and JSP? Each has advantages and disadvantages:

PHP is quicker to get going in, so putting together simple prototypes and building on them is slightly easier.

I found it easier to make silly mistakes in PHP - the language is a bit more forgiving, so that some typos that would have generated a failure in JSP get through and you end up with something that doesn't quite work right. You need to impose a bit more discipline with PHP.

PHP doesn't need a huge java process to run a servlet engine. This has always bothered me - the overhead of a servlet container, just to dynamically create a few web pages, is considerable.

The really big advantage of PHP, though, is that it's very easy to create images on the fly. The GD extension is pretty well standard, and works (although it's not capable of drawing a horizontal dashed line - doh!). I've tried various ways of creating graphs using JSP and, while it's possible, it's neither easy nor pleasant. There's clearly an opening here for a dead-easy JSP image generator to be created (or, if it exists, to come out from wherever it's been hiding).

The downside to PHP was the length of time it took to put the stack together. Tomcat is a zip file you unzip, and you're basically done. PHP makes the common mistake of using autotools and is a right pain in the neck to configure and build correctly as a result. (Note that it's not PHP's fault; it's autoconf.)

So that project is almost done, and while I wouldn't claim to be fluent in PHP, at least I'm capable of asking directions in the language. Now on to ruby on rails when I get some more free time...

OpenSolaris contributions

As Jim has just noted, some of my contributions to OpenSolaris recently got putback.

This is great. It's great that Sun - as an organization - gives external contributors like myself the opportunity to put changes into Solaris. It's great that individuals within Sun (thanks Dave!) take on the time and effort required to integrate the changes into the codebase.

It's taken a while to get going. I was starting to work on fixes during the pilot stage of the OpenSolaris project, over a year ago. The fixes just putback aren't those (although I am looking at getting that work integrated too). This set of fixes to the install consolidation was actually due to something I was trying to do at work, and pkgchk wasn't cooperating. In the old days, I could have spent days fighting with support trying to persuade them that (a) it was a problem, and (b) that it might be worth fixing. Instead of which I can now get in there and fix the thing properly.

I have some other fixes planned for the Solaris package tools as well, with several possible performance improvements having been identified.

Actually getting the fixes into OpenSolaris isn't hard (the Sun sponsor does most of the legwork). It would do no harm to start off simple, just to run through the process (I didn't, of course, and it was the first external contribution to the install consolidation as well, so it was a learning experience for us).

Monday, May 15, 2006

Old dog, new tricks

I've decided it's time to learn a few new tricks.

The first trick I'm trying to learn is PHP. I'm used to java and JSP pages (although these modern new-fangled frameworks do nothing for me), but there seem to be lots of web-based applications out there using PHP, and if I'm going to install and maintain some of them, I ought to understand how the language works.

The second trick is Ruby on Rails. The promise of simplicity is very appealing to me. That agrees with my systems philosophy - KISS.

(More generally, the more completely you can understand a system, the better the chance you have of making it work properly.)

After a few trials and tribulations, I've now got both software stacks up and running. Off to try a few examples...

Flimsy fans?

Anyone having trouble with the main fans ("rear fan 1") in a W2100z?

We've got 3 W2100z boxes at work, from two different batches. Even the oldest one is only a year old.

Now all 3 have had the main fan die, needing hardware replacement. The system is dead while waiting for a new fan to show up, which is somewhat annoying.

(No, it's not the old BIOS problem.)

Still, my old SunBlade 2000 had its disk die today, so sparc boxes aren't immune. This means I'm temporarily without a system running a nevada build - I shall have to go root around in the dumpster tomorrow to see if there are any spare FCAL drives left.

dumb

Every so often I have it reinforced just how stupid using autoconf can be.

I maintain my own software stack in /usr/local - applications that don't come with the operating system, or different version of ones that do.

I was just updating gzip. So the ./configure script, when invoked with --help, says:

By default, `make install' will install all the files in
`/usr/local/bin', `/usr/local/lib' etc. You can specify
an installation prefix other than `/usr/local' using `--prefix',
for instance `--prefix=$HOME'.


OK, so I do ./configure, make, make install.

What the heck?!?!?!

It's installed it in /usr, overwriting the system copies!

Looking at the configure output in more detail, it did say:

checking for prefix by checking for gzip... /usr/bin/gzip


I know I can't blame autoconf itself for this one, because the tool has been misused, but using a tool and then violating both its common conventions (default prefix of /usr/local) and its own self-documenting behaviour is really bad.

And one last thing. It installs itself as zcat, not gzcat. On Solaris, zcat isn't even the same beast.

Thursday, May 11, 2006

MyISAM vs InnoDB

I've been using MySQL for the database component of a number of projects over the years.

Usually, I've used the MyISAM storage engine. It's fast (that's the reason for using MySQL in the first place), and generally reliable.

MyISAM isn't robust against system failure though. System crashes, reboots, and power cuts tend not to be handled very well. (Yeah, I know, they shouldn't happen in the first place, but this is the real world.)

I don't need the transactional capabilities of InnoDB (from the functional point of view even MyISAM is overkill for most of what I do), but something more robust would help.

So I thought I would do a quick check of the impact of using InnoDB on the system. This isn't a benchmark, it's completely unscientific, and all that, but it told me what I needed to know.
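
(The snapshots below are just iostat output over a one-minute interval, roughly:)

iostat -xn 60 2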

Running InnoDB, one minute's worth of disk activity looks like:

r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.1 7.0 0.8 34.1 0.0 0.1 1.6 12.4 1 5 d0

whereas with MyISAM it looks like:

r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.1 2.3 0.8 3.0 0.0 0.0 3.5 14.5 1 1 d0


OK, so InnoDB generates 3 times as many writes, and 10 times as much data transfer, as MyISAM. And the mysqld process consumes correspondingly more cpu time (although it's still very small - much less than 1% of a processor), so the system load average is a bit lower with MyISAM too.

I don't think this rules out InnoDB, although it does indicate that there is a significant cost to changing, and to scale up by a factor 10 (which I'm going to need to do, and then some) would likely have problems if I was using InnoDB. If I go down that route, I need to do more optimisation of the system design and the database client interactions.

Wednesday, May 10, 2006

Sinking ship?

Is this the end for SGI?

It's been a long painful slide, but inexorably downwards, and I can't really see any way back (short of a government funded rescue package).

I remember sitting through a sales presentation from SGI some years ago. We were told of their grand plans to throw away their well-respected workstation technology and become a shifter of (not quite standard or compatible with anybody elses) PCs running Windows NT; to throw away IRIX and adopt Linux; and to throw away their own RISC chips and go down with Itanium. (Itanium wasn't even shipping at the time.)

We regarded this as doubly suicidal. Not only were the products entirely unattractive to us (as existing customers with investments in their hardware and software platforms, incompatibility is a big turn-off), but we could see the plan failing and were thus reluctant to invest in products from a company that - in a single presentation - had gone from a front-runner to certainly doomed.

Sunday, May 07, 2006

Outdated libraries

Every so often, the ugly interaction of Solaris and the undisciplined open software world completely breaks a project I'm working on.

Latest case in point - libxml2. Solaris ships a horrifically antiquated version (2.6.10, to be exact) with Solaris 10 (and nevada, currently). Snag is, PHP requires 2.6.11 or later and won't build.

There are a couple of bugs open: 6362386 and 6193876. Not to mention 6378879.

It's been this way for 16 months. Surely time enough for something to have happened?

I realise that, because certain core components of Solaris itself rely upon libxml2, it cannot be upgraded without due care. If that is the case, then those components should use a private, compatible, copy, and allow the publicly available version to be kept reasonably current.

Unfortunately the download site appears to be down at the moment, so I'll have to grab the source and build my own up-to-date copy tomorrow before I can make progress.

It's not as if libxml2 is the only external component that is somewhat antiquated. In fact, quite a lot of it is getting sufficiently old as to be useless, and if Solaris is going to be used as a platform for other open-source applications then some serious updating is going to have to be done.

Great Open Source

Looked fairly innocuous to start with, but this blog on ZDnet points to a list of the top 50 open source projects. Check it out - there are some real nuggets, and starting points for further exploration.

Friday, May 05, 2006

Solaris to/from Linux?

So Matty gave some reasons why people might consider switching from Solaris to Linux.

This is all a matter of opinion, of course. Despite computers being binary systems, everything in IT is shades of grey (often very dirty shades of grey). So here's my take on Matty's points, in the same order:

1. Integrated LAMP stack. Actually, all my Solaris boxes could have a full SAMP stack installed, but I would never use it. I wouldn't (and didn't) use it on Linux either. In both cases I would much rather install the application stack separately. It's very much safer that way - not only can I be sure that all the components are at specified known levels, and are the same across platforms, but I can be sure that my OS vendor isn't going to screw me around. I always had to wipe all evidence of apache, mysql, and friends from a RedHat system in order to get operational stability; unfortunately it appears that Sun are heading down the same misguided path by bundling more services in Solaris.

2. At my previous place of employment, you could always tell when the next version of Fedora had been released - the whole group of developers were surly and miserable for a week because their desktop had been randomly rearranged and they had to relearn it. I reckon each of them lost a week's productivity as a result. Sun aren't much better - open source software doesn't care for operational stability, and every time Sun update Gnome or JDS in Solaris nothing works. And I'm now convinced we're going backwards. (As an aside, I've had many more problems with desktop apps not working right on Fedora than Solaris, but that may have been because the Solaris versions didn't try to push the bleeding edge so hard.)

3. The JES stack deserves a good kicking. My experience of it has been woeful. So I agree with that part. The other side of the coin is regular updates. As I said in point 1, I don't want my OS provider to randomly modify the working of key applications. So I wouldn't use either of them.

4. My experience of ISVs is that they hate supporting Linux. And I can't blame them - the qualification effort is horrendous. I asked one the other week about Solaris x86 support, and their answer was "we keep getting asked about that, but it's not on our roadmap". Another one went "phew!" when we said we were going to deploy on Solaris - he was then confident it would work out of the box. But, yes, there does seem to be more visibility for Linux - I suspect vendors think that's what customers want, when most of the time any alignment between customer requirements and vendor offerings is random at best. At least with Solaris, if it works it works, rather than having to stick to one particular release of one particular Linux distribution with a specific set of patches.

5. Not so; managing applications and patches on Solaris is an absolute doddle. Provided you ignore completely Sun's perpetually failed attempts to improve the process - patchpro, smpatch, prodreg, update connection are all worthy only of the dustbin. Fortunately it's trivial to write your own tools, or use pkg-get or pca which are massively better than anything Sun have come up with so far.

6. Why not upgrade? Modulo bugs in occasional releases (and these are the sorts of bugs that mean that not every build qualifies as a regular Express release) regular upgrade or live upgrade work fine. There's no need to futz with bfu unless you really want to.

7. At least with Solaris you have the option of zones! Are zones a universal panacea? No. But they are enormously useful for a whole range of operations where you need to consolidate or isolate services. I can believe smpatch update could take days, but then I wouldn't do it like that - I wouldn't use smpatch, for starters, and I wouldn't create the zones and then apply the patches. For this sort of thing our mode of operation was to simply migrate services onto a zone on a different box, and then rebuild machines, if the patch overhead was too great. (Based on the philosophy of never patching or rebooting a live service.) There's no getting away from the fact that by creating 25 zones, you've increased the update cost 25 times. It is certainly true, though, that there's room for improvement in the patch tools. They are significantly slower than can be accounted for by the actual work they're doing.

8. It's a tricky balancing act. And I think Sun have done pretty well with OpenSolaris. There was massive concern that the open source process would destroy Solaris' core strengths, and it doesn't seem to have done so yet. My hope was that it would address some of Solaris' weaknesses, and I don't think it's actually done that yet, but the level of engagement is there, and the signs are promising. To be honest, I'm not sure I would want to use an OS that would allow outsiders to put code back to the kernel source tree - does Red Hat allow me to modify their kernel source?

One thing that worries me about end-user putbacks is that the whole thing has to be vetted and managed, and that has to be done by Sun (because their position is that if it goes into OpenSolaris then it goes into Solaris - there is only one codebase), so it costs Sun more to have a community member fix something than to have one of their own engineers do it. Given the financials of the past few years, this doesn't strike me as an optimal situation.

There are some good points here - it's not as if Solaris hasn't got any weaknesses. The real killer is the availability of commercial software. If you're only offered it under Linux then you're going to have to put up with increased costs in terms of licensing, support, hardware, and staffing (or just pretend they don't exist), or choose another product. But there are many places where one person's ideal is anathema to someone else, and we just have to accept that there is no one solution to every problem.

Thursday, May 04, 2006

Well, that didn't work...

I notice that Eric made a decent comment about my recent rant on the decline of the desktop.

Now, I've tried xfce a couple of times in the past, but it has similarities to CDE that have always put me off. But I've never tried fluxbox (although I did try blackbox back in the day).

So I thought it deserved a try. Downloaded and installed it (which was fine apart from the usual problems).

I started it up, and everything seemed unusually sluggish. That's odd - this is supposed to be light and fast. I then made the mistake of clicking on the background and selecting 'About'. This sent the whole thing into a tailspin: 4G of memory usage later, my machine was stuck in treacle.

Worth a try, but I think I have to write that one off.

I'm leaning back towards windowmaker. (Anyone know which black hole the windowmaker website has disappeared down at the moment?)

Wednesday, May 03, 2006

tuxpaint

One of my girls came home from school yesterday raving about Tux Paint.

It looks pretty good stuff to me. (The fact that one of my girls is mad about penguins might also make it popular.)

They're using it on Windows at school, but it's cross-platform, so I built it for Solaris too.

I had to go through the prerequisites:
In all cases I did a

(setenv CC cc ; setenv CXX CC ; setenv F77 f77 ; ./configure )
gmake
gmake install

to build, using the free Sun Compiler, as libtool couldn't cope with gcc for some reason.

Then to build tuxpaint

gmake CC=gcc


This wasn't quite enough. I needed to add an explicit

-lpng -lsocket

to the link list, and then it fails because it needs strcasestr().

OK, so I grabbed a shim copy of strcasestr from OpenSolaris (for example, this one), compiled it, and linked that in.

And, because it used a simple makefile, fixing up the problems was trivial.

Oh, and I've filed a couple of bugs. One against tuxpaint so that it handles a missing strcasestr (or doesn't require it at all); and an RFE that Solaris include this function (as opposed to the several private copies dotted about in the source), as this isn't the first time I've seen applications want it.

Saturday, April 29, 2006

Wither the desktop?

It's a personal thing. My desktop is a virtual space in which I work, play, and socialize. As such, I want it to work my way.

This covers several aspects: the way it looks (colour scheme, imagery); the way it behaves (in response to my requests); what it does (which programs are available).

In unix-like environments, the desktop was originally a number of independent windows looked after by a window manager. This has evolved into the (more or less) tightly integrated desktop environments such as GNOME and KDE that we have today.

In years gone by...

Some of the early window managers were fairly basic. I remember wm and uwm, but I really liked the old Ardent window manager. It was friendly and very customizable.

Time went on and the original generation of window managers bit the dust. It seemed that twm was the new standard. I never used twm itself much, as I was starting to run into problems caused by having too many windows open. So I ended up using tvtwm instead.

Customizing tvtwm was fairly easy (assuming you're happy to edit configuration files) and the degree of customization available was quite extensive. You had complete control over all the mouse button events, for example. I defined all my own menus. And you could define your colour scheme extremely precisely - I had the window decorations for each type of window in different colours, so I could more easily spot a particular application on screen.

Then along came desktop environments - a window manager with an associated set of libraries, and a (more or less) complete toolset. Open Look and CDE were the well known ones, although I'm sure there were others.

However, I found one thing in common with both Open Look and CDE. I hated them both. Utterly and completely. They both forced me into unnatural and counterintuitive ways of working, and I couldn't stand either of them for more than 5 minutes.

We're now into the brave new world of Gnome and KDE. Again, the aim is a complete and all encompassing desktop environment with a set of libraries and a complete set of applications.

Unlike Open Look and CDE, I've found I can tolerate both KDE and Gnome. Sure, they're slow, and both have irritating features, but both are good enough that I don't start swearing at the screen after a couple of minutes.

Have we really made progress in the last decade? I'm not sure we have. Some of the applications we have now certainly have more functionality than was available 10 years ago, but I don't see that they are necessarily better suited to today's problems than the applications of 10 years ago were to the problems of the time.

Put it another way. Ten years ago the applications and environments I had available met my needs of the time more than adequately. That's no longer true. And it's not as if my requirements have actually changed all that much - certainly less than the overall computing landscape has.

I'm not optimistic about the future. I was less than impressed with the Gnome-based JDS that will shortly go into OpenSolaris. I see very little of interest, and a lot of regressions.

What I also see is a lack of variety, a lack of excitement, no spark. Couple that with an increasing inability to do the basics, and the desktop is withering away.

When it comes down to it, I'm not actually asking for very much. I want to be able to set the focus policy, define the actions to be taken on mouse clicks, define what menus appear, and under what circumstances, and with what contents, and define the shape, colour, and imagery of decorations. And then use the applications that I want.

The desktop environments appear to have become less customizable, not more. Consider the available themes. There are thousands for WindowMaker (and some of them are quite decent). How many Gnome themes are there? Now, OK, a WindowMaker theme doesn't really do very much, but it does what you want.

It's not as if the desktop frameworks have provided a solid foundation on which to build better applications, either. Most of the standard applications that ship are pretty poor. I would much rather have a dedicated window manager and ally that with best of breed applications than have a bunch of applications that happen to be built using the same toolkit.

In summary, I feel that desktop development has headed off down a cul-de-sac, and we need to get back on the main road to make real progress.

Thursday, April 27, 2006

More distractions...

As if I didn't have enough distractions, I've been playing with Google SketchUp.

(And yes, this means on the Windows box...)

I've used various CAD and 3-D programs over the years (and the majority of consumer ones felt like they were just some third-rate CAD lookalike), and they were pretty hard work. SketchUp is brilliant - so easy to build a model.

Wednesday, April 26, 2006

Die, configure, die!!!

So I'm building a new version of Eterm and, as usual, the ./configure script is getting in the way.

First it couldn't find libast. That was legitimate - so I install libast and expect all to be well.

So now it constructs a compiler command line that gcc chokes on.

That's fine, I tell it to use cc instead and it can't find libast. The reason is simple - it's putting the -last ahead of the -L switch to tell it where the library is.

So I fix that and the compile just blows up. Looking more closely:

cc -DHAVE_CONFIG_H -I. -I. -I.. -I/usr/local/i clude
-I/usr/local/i clude -I/usr/ope wi /i clude -O -c
actions.c -KPIC -DPIC -o .libs/actions.lo

Look closely: it's whacked all the n's out of the include directories.

So I randomly shuffle my PATH (presumably it's found a sed it doesn't like, but isn't that the sort of thing that autoconf is supposed to solve?) and eventually get it to work.

Won't compile. The code simply isn't liked by the Sun compiler.

Ho hum. Back to gcc - let's try a different version. OK, so I get it to configure and make, but it still fails, this time with an Undefined symbol error.

The upshot of all this is that autoconf has just wasted 10 minutes of my time and completely failed to produce a successful result at the end of it all.

Sadly, this is becoming the norm.

Tuesday, April 25, 2006

Retro gaming

There aren't enough hours in the day.

I'm trying to wrap up a bunch of contributions to OpenSolaris, and have managed to get some of them submitted, but I find myself getting distracted.

I just got Pro Pinball: Timeshock for PSOne off eBay (and that has to be the best Pinball simulation ever) so I'm spending a little time on that.

I just got the PS2 Atari Anthology and it's a blast. I remember most of the games. The great thing about having it on the PS2 is that it doesn't cost a fortune to play! The downside is that the controls don't always translate to a console gamepad all that well so some of the games can be quite tricky to play.

And then I've been playing some of the old ZX Spectrum games on World of Spectrum, using the Java emulator.

Gotta go, there's a hi-score to beat...

Sunday, April 16, 2006

Graphical DTrace

So the Chime Visualization Tool for DTrace is now available.

I'm itching to try this out, and to see how you can build bridges with kstat (by means of jkstat or something different).

Unfortunately there appears to be a network problem with my test machine. Or more specifically there's a problem somewhere in the path between where I'm sat and where it's at. I've managed to determine that the machine itself is fine, but doing any work on it is impossible. It being a holiday I don't expect it to be fixed for a day or two.

Wednesday, April 12, 2006

Extra Software

No operating system - no matter how good - comes with a complete set of every piece of software you're likely to want. There are always cases where an end-user needs to add additional software to meet specific needs. Corporate servers need business software; developers may want to live on the bleeding edge.

I used to maintain a lot of stuff myself. But this has become harder and harder over time, as build systems become more complex and less intelligent, and dependencies become more entwined and harder to resolve.

One place I now use to keep stuff up to date - and to try software out easily without having to invest the effort in building the whole dependency tree from scratch myself - is Blastwave.

It's very easy to use. Install pkg-get, then just ask it to install things: it grabs the software you want and installs it and any dependencies you need. All simple and painless.
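To give a flavour, pulling something in is a one-liner once pkg-get itself is installed and pointed at a Blastwave mirror - taking dia (which I mention below) as the example, and assuming the package goes by that short name in the catalog:

# fetch and install the package, plus anything it depends on
pkg-get -i dia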

Of course, what it also shows you is how bad this dependency tracking gets. I wanted to try out a couple of pieces of software this afternoon - dia and enlightenment as it happens - and off it went, installing package after package. It's a lot easier than doing it myself, but complexity is on the rise and I'm not sure how close we are to total meltdown.

Install investigations

There's been a reasonable amount of discussion recently regarding Solaris installation speed. Indeed, there's even some idea of what the problem is.

However, closer investigation reveals that rewriting the contents file isn't the limiting factor in a Solaris install. It's not even in the top 3, which are:
  • The time taken to uncompress the data (they're bzip2 compressed)
  • The time taken to write Solaris to the disk
  • The SMF manifest import on first boot

Currently, the contents file rewrite is having a race with the pkg tools overhead for 4th place.

Why this disconnect between the obvious problem - and it is a problem - of rewriting this large file many times, and its relatively minor importance to Solaris install times? After all, for a slightly trimmed full install, Solaris itself accounts for 3.5G of writes, while rewriting the contents file accounts for twice that.

The point is, though, that rewriting the contents file involves a small number of large writes, which are quick. Solaris itself is many small files, so it generates something like 100 times the number of I/O operations.

Not only that, but it's possible to tweak the package installation order to minimize the amount of rewritten data. Simply install all the small packages early and leave the big packages that bloat the contents file until last. Doing so could - in principle - reduce the contents file rewrites by an order of magnitude.
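Spotting the packages that bloat the contents file is easy enough: each package's pkgmap is roughly one line per object, so a quick count over the install image's Product directory (or /var/sadm/pkg on a live system - the paths here are illustrative) shows the worst offenders:

# list packages by number of pkgmap entries, biggest last
cd /cdrom/cdrom0/Solaris_11/Product
wc -l */pkgmap | sort -n | tail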

This affects zone installs as well. For zone install, the uncompress cost doesn't exist at all. And for a sparse root zone, there's no Solaris to write to disk - it's loopback mounted from the global zone. So the contents file is much more important as a limiting factor for zone creation performance. However, I've managed to halve the contents file rewrites by tweaking the package installation order. I've not got that much control over the installation order, as it seems to depend on both dependency calculations and the order that opendir() goes through /var/sadm/pkg, but even then a gain of a factor 2 was fairly easy.

This isn't to say that the management of the contents file isn't an interesting and important subject that can lead to some benefits, but the relative importance of it in install performance can easily be substantially overstated. There's other low-hanging fruit to have a go at!

Monday, April 10, 2006

./configure is evil

For years I've used emacs as my editor. I don't want to get into religious wars here - if you want to use vi, then that's fine too. I happen to like emacs because I started out with EDT and TPU under VMS, and when I moved off the VAX I had to find an alternative - and I was able to make emacs play ball pretty easily.

I don't use an IDE, or indeed any custom authoring tools for that matter. I write web pages in emacs, as plain HTML.

So I was glad to find out that I'm not entirely alone. Bill Rushmore just wrote about Emacs as a Java IDE. Spurred on by this, I thought about upgrading the version of emacs that I use.

I've been stuck on GNU emacs, version 19, for a very long time. The reason I haven't upgraded to a later version is that they're too darn slow. (The same goes for XEmacs.) Which is one reason for not using an IDE - an editor has to start faster than I can start typing.

I now have a dual Opteron desktop, so it should be possible to get the latest version to start up fast enough, right?

I don't know, the darn thing won't build.

So I download emacs, unpack it on my Solaris box, see what configure options I have, and type:

./configure --prefix=/usr/local/versions/emacs-21.4 --without-gcc
--with-xpm --with-jpeg --with-tiff --with-gif --with-png

Why without-gcc? I need this to go real fast, so I want to use the latest studio compiler and crank it up.

It fails. No xpm, png, in fact no nothing.

configure:2694: /usr/ccs/lib/cpp conftest.c >/dev/null 2>conftest.out
"/usr/include/sys/isa_defs.h", line 500: undefined control

Well, there's a fundamental problem. The configure script is trying to be too clever by half, and is calling cpp directly. Don't do that. Run it the same way the compiler does, and things will work much better.
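configure will take a preprocessor override from the environment, so the workaround is along these lines (same arguments as before):

# make configure run the preprocessor through the compiler driver
CPP="cc -E" ./configure --prefix=/usr/local/versions/emacs-21.4 --without-gcc \
    --with-xpm --with-jpeg --with-tiff --with-gif --with-png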

So I tell it to use "cc -E". This gets it past some of the cpp stuff, but there are two new problems that crop up:

configure:5430: cc -o conftest -I/usr/openwin/include - g - O
-I/usr/openwin/include -L/usr/openwin/lib conftest.c -lpng -lz -lm
- lX11 - lkvm - lelf - lsocket - lnsl - lkstat 1>&5
ld: fatal: file g: open failed: No such file or directory
ld: fatal: file O: open failed: No such file or directory
ld: fatal: file lX11: open failed: No such file or directory
ld: fatal: file lkvm: open failed: No such file or directory
ld: fatal: file lelf: open failed: No such file or directory
ld: fatal: file lsocket: open failed: No such file or directory
ld: fatal: file lnsl: open failed: No such file or directory
ld: fatal: file lkstat: open failed: No such file or directory

There's a major thing wrong here. Why has it randomly decided to put spaces in? (And why does it think it needs libelf, libkvm, and libkstat just to see if it can find a png function? I'll let it off some of the other libraries, as although it doesn't need to specify them all there were times in the past when you had to.)

That's not all:

"junk.c", line 586: invalid input token: 8.elc
...

So it looks like it's trying to use the C preprocessor in different ways.

Foiled again.

Really, I can't see why anyone would think that getting a configure script to make a bunch of random guesses (and often get them wrong) is anything other than a stupid idea. There was a time when unix variants differed dramatically and you needed a way to tell them apart, but that time has long gone. As it is, we've now got to the point where the amount of software that compiles and works is heading towards zero, and the worst thing is that - unlike in the old days when it would have taken a couple of seconds to fix the Makefile to make it work - actually correcting the error when it goofs up like this is almost impossible.

Wednesday, March 22, 2006

poking around with dtrace

Having released jfsstat, I've been using it on my test system just to see what would come up.

I left it running looking at file accesses - lookup, access, pathconf, readlink - and started to notice some odd patterns.

The first was that /proc was getting a very steady 25 lookups and 12.5 accesses per second. OK, that's easy - I had top running (refreshing every 5 seconds) and this system has 60 processes on it, so every refresh top does 2 lookups and 1 access per process: 60 × 2 ÷ 5 gives about 24 lookups and 60 ÷ 5 gives about 12 accesses per second, which is near enough.

The next was that / (and this machine just has the single filesystem) was seeing about 600 lookup and 10 access requests every 10 seconds.

The third part of the pattern was that roughly every minute there was a veritable storm of operations on the / filesystem.

Now, one of the longer term aims of my writing tools like jfsstat and jkstat is that when you see some strange activity, you can click on it there and then and drill down deeper - probably with dtrace. I'm not there yet, but I thought I would practice my dtrace skills and do it by hand.

OK, so what I'm after is ufs lookups. What can dtrace give me?

% dtrace -l | grep ufs| grep lookup
536 vtrace ufs ufs_lookup TR_UFS_LOOKUP_END
537 vtrace ufs ufs_lookup TR_UFS_LOOKUP_START
17836 fbt ufs ufs_lookup entry
17837 fbt ufs ufs_lookup return

Right. Let's have a look at what's generating those lookup requests:

% dtrace -n 'fbt:ufs:ufs_lookup:entry{@[execname]=count();}'
dtrace: description 'fbt:ufs:ufs_lookup:entry' matched 1 probe
^C

java 612

So, that 10 second repeating burst of 600 lookups is java. In fact, I know this to be tomcat.

Now I run the script again to catch the massive burst of activity that happens every minute:

% dtrace -n 'fbt:ufs:ufs_lookup:entry{@[execname]=count();}'
dtrace: description 'fbt:ufs:ufs_lookup:entry' matched 1 probe
^C

sched 24
pwd 28
sh 75
fping 91
server.sh 112
uname 150
mysqld 162
ping.sh 238
nscd 491
cron 498
init 988
load.sh 1014
rup 1539
java 1836
perl 2120
awk 2394
mysql 63666

I've hit the tomcat activity burst 3 times, but the once-a-minute burst is coming from something launched by cron - a system monitoring script that runs fping and rup and pokes the results back into a mysql database. But what on earth is mysql doing making 63666 lookup requests?

(The first question I asked was - is this one instance of mysql or many? If I aggregate on pid as well as execname then I see that I'm running a lot of copies of mysql, each of which generates 786 lookup requests on the filesystem.)
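That check is just a small variation on the earlier one-liner, adding pid to the aggregation key:

% dtrace -n 'fbt:ufs:ufs_lookup:entry{@[pid,execname]=count();}'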

Next question: what are the pathnames that are being used in the lookup request? To get this, I need to understand the ufs_lookup call itself in a little more detail. So the source tells us that the 4th argument is a struct pathname, and the string I'm after is the pn_buf member. So let's see what pathnames are being looked up. First the little tomcat burst:

dtrace -n 'fbt:ufs:ufs_lookup:entry{@[stringof(args[3]->pn_buf)]=count();}'
dtrace: description 'fbt:ufs:ufs_lookup:entry' matched 1 probe
^C

/opt/XSload/tomcat/conf/Catalina/localhost 6
/opt/XSload/tomcat/webapps/ROOT/META-INF/context.xml 6
/opt/XSload/tomcat/webapps/jsp-examples/META-INF/context.xml 6
...
/opt/XSload/tomcat/webapps/xsload/WEB-INF 18
/opt/XSload/tomcat/webapps/xsload.war 20
/opt/XSload/tomcat/conf/Catalina/localhost/host-manager.xml 35
/opt/XSload/tomcat/conf/Catalina/localhost/manager.xml 35
/opt/XSload/tomcat/webapps/balancer/META-INF/context.xml 35
/opt/XSload/tomcat/conf/context.xml 90

Pretty clear, really - every 10 seconds tomcat goes round checking to see if you've gone and modified anything.

Now for the mysql burst. What are the pathnames here? There's quite a lot of output, so I've trimmed it a bit:

...
/usr/ccs/lib/libc.so.1 432
/usr/ccs/lib/libcrypt_i.so.1 432
/usr/ccs/lib/libcurses.so.1 432
/usr/ccs/lib/libgen.so.1 432
/usr/ccs/lib/libm.so.1 432
/usr/ccs/lib/libnsl.so.1 432
/usr/ccs/lib/librt.so.1 432
/usr/ccs/lib/libsocket.so.1 432
/usr/ccs/lib/libthread.so.1 432
/usr/ccs/lib/libw.so.1 432
/usr/ccs/lib/libz.so.1 432
/lib/ld.so.1 504
/opt/SUNWspro/lib/rw7/libCrun.so.1 540
/opt/SUNWspro/lib/rw7/libCstd.so.1 540
/opt/SUNWspro/lib/rw7/libc.so.1 540
/opt/SUNWspro/lib/rw7/libcrypt_i.so.1 540
/opt/SUNWspro/lib/rw7/libcurses.so.1 540
/opt/SUNWspro/lib/rw7/libgen.so.1 540
/opt/SUNWspro/lib/rw7/libm.so.1 540
/opt/SUNWspro/lib/rw7/libnsl.so.1 540
/opt/SUNWspro/lib/rw7/librt.so.1 540
/opt/SUNWspro/lib/rw7/libsocket.so.1 540
/opt/SUNWspro/lib/rw7/libthread.so.1 540
/opt/SUNWspro/lib/rw7/libw.so.1 540
/opt/SUNWspro/lib/rw7/libz.so.1 540
/opt/SUNWspro/lib/v8/libCrun.so.1 540
/opt/SUNWspro/lib/v8/libCstd.so.1 540
/usr/local/mysql/data/my.cnf 540
...
/opt/SUNWspro/prod/usr/lib/cpu/sparcv8plus+vis/libCstd_isa.so.1 756
/opt/SUNWspro/prod/usr/lib/cpu/sparcv8plus+vis2/libCstd_isa.so.1 756
/opt/SUNWspro/prod/usr/lib/cpu/sparcv9+vis/libCstd_isa.so.1 756
/opt/SUNWspro/prod/usr/lib/cpu/sparcv9+vis2/libCstd_isa.so.1 756
/opt/SUNWspro/prod/usr/lib/cpu/sparcv9/libCstd_isa.so.1 756
...
/opt/SUNWspro/lib/libthread.so.1 864
/opt/SUNWspro/lib/libw.so.1 864
/opt/SUNWspro/lib/libz.so.1 864
/lib/libm.so.2 888
/usr/lib/libc.so.1 972
/usr/lib/libcrypt_i.so.1 972
...
/opt/SUNWspro/lib/v8/libgen.so.1 1080
/opt/SUNWspro/lib/v8/libm.so.1 1080
/opt/SUNWspro/lib/v8/libnsl.so.1 1080
/opt/SUNWspro/lib/v8/librt.so.1 1080
/opt/SUNWspro/lib/v8/libsocket.so.1 1080
/opt/SUNWspro/lib/v8/libthread.so.1 1080
...
/opt/SUNWspro/prod/usr/lib/cpu/sparcv8plus/../../libCrun.so.1 2160
/opt/SUNWspro/prod/usr/lib/cpu/sparcv8plus/libCstd_isa.so.1 2592
/platform/SUNW,Sun-Blade-1000/lib/../../sun4u-us3/lib/libc_psr.so.1 3360
/platform/SUNW,Sun-Blade-1000/lib/libc_psr.so.1 4480

What on earth?

So, most of these lookup operations are the mysql binary looking for shared libraries when it starts. And in some less than obvious places too! So why is this? I go and have a look at the binary:

% dump -Lv mysql
[INDEX] Tag Value
[1] NEEDED libcurses.so.1
[2] NEEDED libz.so.1
[3] NEEDED librt.so.1
[4] NEEDED libcrypt_i.so.1
[5] NEEDED libgen.so.1
[6] NEEDED libsocket.so.1
[7] NEEDED libnsl.so.1
[8] NEEDED libm.so.1
[9] NEEDED libCstd.so.1
[10] NEEDED libCrun.so.1
[11] NEEDED libw.so.1
[12] NEEDED libthread.so.1
[13] NEEDED libc.so.1
[16] RUNPATH /opt/SUNWspro/lib/rw7:/opt/SUNWspro/lib/v8:/opt/SUNWspro/lib:/opt/SUNWspro/lib/v8:/opt/SUNWspro/lib:/usr/ccs/lib:/usr/lib
[17] RPATH /opt/SUNWspro/lib/rw7:/opt/SUNWspro/lib/v8:/opt/SUNWspro/lib:/opt/SUNWspro/lib/v8:/opt/SUNWspro/lib:/usr/ccs/lib:/usr/lib

The list of libraries matches what we see being looked for, so that makes sense. The problem is that the compiled in library search path contains places that it shouldn't (and some repeated), so it needlessly searches those locations (and multiple times at that) when mysql starts up.
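As an aside, you can watch the runtime linker doing this search without any dtrace at all - its own debugging output shows every path it tries. The location of the mysql binary below is a guess on my part, based on where my.cnf lives:

# ask ld.so.1 to report its library search while mysql starts up
LD_DEBUG=libs /usr/local/mysql/bin/mysql --version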

So, problem solved. And, as I said earlier, the idea is that you should be able to see this anomalous activity in one of the jkstat tools and click on it to drill down into the system to see what's going on, with all the dtrace done automatically on the fly for you.

It won't be that easy, of course. I'm relying on the fbt provider, so I need to pretty much write the dtrace script by hand for each function I wish to investigate. (There isn't even a consistent naming or calling scheme - you can't just replace ufs by procfs and expect it to work.) But fortunately we have the OpenSolaris source to look at to see what's actually going on underneath the covers.
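So, for example, before pointing the same one-liner at a different filesystem you first have to go and find out what its lookup routine is actually called:

% dtrace -l | grep procfs | grep lookup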

Tuesday, March 21, 2006

jfsstat

I've just updated jkstat.

The notable new feature in this version is the jfsstat utility.

This uses the new kstats introduced in Nevada recently - the kstats used by fsstat. So you need to be running Nevada build 35 or newer to see these. (ON downloads - you want SXCR build 35 or later.)
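And if you just want the raw numbers without a GUI, the fsstat command that consumes these same kstats will print them directly - for example, ufs activity every 5 seconds:

% fsstat ufs 5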

The other change that's part of this is that I've started to use JTable to display things. I don't know why I've avoided this in the past - I guess JTable seemed rather complex, but once I got the hang of it, it turned out to be very easy. I'm also using TableSorter from the Java Swing tutorial, which gives me the ability to sort the columns for free.

Enjoy!

Friday, March 17, 2006

Home Networked

I'm writing this blog entry using my W2100z workstation upstairs in the spare room.

We've spent the last couple of weekends decorating Amanda's room, and I took the opportunity to run some cat-5 cable in the upstairs rooms. The next step was to run cable down the stairs to connect my little ethernet switch to the broadband router, and enable dhcp. Hey presto! It all works.

The girls each have a SunBlade 150. Well, I've got them, and it means I don't have to worry about them getting virus infestations. So a surfing we can go! (And it means that the main computer is free instead of being taken up by someone trying to do their homework half the evening.)

I used an all-in-one kit I got from Maplins. 50m of cable, connectors, boxes, faceplates, tools, and 4 patch leads. A great buy and a real bargain.

Wednesday, March 08, 2006

T2000 problem

I just tried to buy a pair of Sun's T2000 machines, and then discovered (very late) that the system specification doesn't match my requirements.

It's not the system itself that's the problem, but the peripheral connectivity - or rather lack of such.

In particular, the plan was to hook up a couple of 3120 SCSI arrays. Not everything wants a fancy raid array, and FC arrays (and associated HBAs) are pretty pricy anyway. In this case, I'm looking at raw spindles for database access, but also if you look at ZFS it wants plain drives - raid hardware just gets in the way. So the plan was to have 2 SCSI channels and mirror them.

This won't work in a T2000. There's only 1 free PCI-X slot (the internal SAS controller takes the other one), and the only supported SCSI HBA is single channel anyway. So at the present time you simply can't have more than a single SCSI chain on a T2000.

To say that this is annoying is an understatement. It also limits the usefulness of the T2000 for several other projects I have in the pipeline.

Now, I could use FC storage, because the T2000 does have PCI-E slots, and there are PCI-E fibre HBAs, so it would work, but you're looking at a 50% or so increase in cost, which I regard as unacceptable. (The cost differential is particularly bad in small configurations - as you push up it becomes much less pronounced.)

Ho hum. Time to construct a plan B.