Tuesday, December 20, 2005

Updated NVIDIA drivers

For those not constantly checking, NVIDIA have released updated Solaris drivers.

(OK, so it was a few days ago, but I had missed it.)

LOSUG 2

Went along to the LOSUG meeting last night. Good to meet up with everyone again.

Mulled wine; nibbles; good talks; even the occasional mince pie finally made an appearance.

The lightning talks part worked pretty well.

Even managed a quick pint before having to leg it for the train home.

Monday, December 12, 2005

Domino Backup

One of the problems I'm working on at the moment is online backup of a Lotus Domino server running on Solaris.

Nothing too complicated, right? Just whip out your favourite backup solution, install the Domino module, and you're good to go. Right? Wrong!

I've tried Legato Networker, which I've used for regular backups without any problems for the best part of a decade. Works on a trivial test, fails completely on the real thing. I've tried Backup Express from Syncsort (used by our PC systems) and haven't yet managed to persuade it to recognise that I've got a Domino server installed.

I stumbled across BakBone, who make something called NetVault. I had never heard of it, but first impressions from the web site were good, and I was able to get an eval copy off their download site straight away. It installed pretty easily, and it wasn't too hard to work out how to drive it, so it's currently doing a test backup. (Performance isn't too bad, especially considering I've set it up to save to a disk-based virtual library on the same disk array that the Domino server lives on.)

The real test, of course, is to wipe the Domino server out completely and see what happens if you restore it. More on that phase as it happens.

Tuesday, December 06, 2005

[ID 335743 kern.notice] BAD TRAP:

Bother!

As you may recall, I've been playing with apache httpd 2.2.0.

I was also looking at Derek Crudgington's comparison of Apache and Sun Webserver. So I decided to test out Apache 1.3.34 against 2.2.0, and also test going through to Tomcat, which was generating dynamic pages from MySQL.

To cut a long story short, testing Apache 2.2.0 panicked my machine. I wasn't able to do too much damage with 1.3.34, but under stress 2.2.0 became rather sluggish, and then it and the whole machine became completely unresponsive.

Of course, it's not Apache's fault. It shouldn't be capable of taking the box out. This is definitely something in Solaris that's gone awry.

A quick search of SunSolve didn't show a match, but for the enthusiast here's the (trimmed) message:


Dec 6 11:32:28 ratbert genunix: fffffe8001773b80 unix:die+da (fffffe8001773c20, 1fb955d3a)
Dec 6 11:32:28 ratbert genunix: fffffe8001773c60 unix:trap+5ea ()
Dec 6 11:32:28 ratbert genunix: fffffe8001773c70 unix:cmntrap+11b ()
Dec 6 11:32:28 ratbert genunix: fffffe8001773d70 genunix:list_remove+b ()
Dec 6 11:32:28 ratbert genunix: fffffe8001773da0 genunix:port_remove_done_event+4b ()
Dec 6 11:32:28 ratbert genunix: fffffe8001773e10 portfs:port_associate_fd+2b8 ()
Dec 6 11:32:28 ratbert genunix: fffffe8001773ec0 portfs:portfs+303 ()
Dec 6 11:32:28 ratbert genunix: fffffe8001773ed0 portfs:portfs32+24 ()

Simple presentations

One of the things almost everyone has to do sooner or later is make presentations. And while some people can just stand up for a few minutes and hold an audience's attention, most of us need some sort of visual aids.

Personally, I find good old-fashioned overhead foils work fine. But we're in the 21st century, and it's almost impossible to find an overhead projector.

So the general solution appears to be PowerPoint or something similar. I've been using StarOffice Impress for a few years now. It sort of works, but it's a very clunky way of doing things.

Yesterday I came across Eric Meyer's S5, a simple but phenomenally powerful slide show system. Using a combination of XHTML and CSS you can easily and quickly put a simple presentation together. If you can edit basic HTML, you can put together a presentation. (And using superior content creation tools like emacs or vi, at that.)

As a very trivial example, I've put together a presentation on JKstat.

Monday, December 05, 2005

New Apache

Just recently, Apache httpd 2.2.0 was released.

Now, I've been a bit traditional here. I'm still using Apache 1.3.34, almost always with mod_jk to talk to Tomcat, and occasionally with mod_ssl for https support.

Why not upgrade? Well, there are two reasons really. One was that it wasn't at all obvious that 2.0.x was in fact an upgrade. It always felt like a retrograde step and if anything I would describe 1.3.x to 2.0.x as downgrading. The second is that actually getting 2.0.x installed was a right pain. They mistakenly switched to using autoconf, so it's much harder to get the installation and configuration right. (If it installs at all. Many a time I would find autoconf just goofing out on one of its random guesses and failing to do anything at all.)

So, is 2.2.0 any better? Well, it still uses autoconf - and it's still a very bad move - but it is possible, with enough effort, to circumvent most of its mistakes (although not all). But it includes AJP support to talk to Tomcat, including load balancing. Which, coupled with native SSL support, should reduce the complexity of installation - if it works.

So far, running under Solaris 10 works fine for http. I haven't exhaustively tested the AJP support to see how well it handles load balancing and failover, but basically it works and looks good.

I had fun and games with getting https to work though. Essentially, the combination of httpd 2.2.0, gcc and the Solaris 10 OpenSSL libraries didn't work. (Using curl I could get SSLv2 to work but not SSLv3, and neither Mozilla nor Firefox would have anything to do with my server.) Compiling up the latest OpenSSL myself (which is what I do for 1.3.x anyway) with gcc works just fine. So it's either a gcc vs. cc incompatibility, or a version problem (Sun are supplying quite an old version), or some other strange incompatibility. It would be nice if I could rely on the OpenSSL bits that come with Solaris, as OpenSSL itself is a reasonable size and takes quite a while to build, but it looks as if I still have to do it myself.

Wednesday, November 30, 2005

Awash with freebies...

Seems like Xmas has come early.

Of course, Sun go on at length about the Java Enterprise System. Now, this is interesting in parts, but JES is a complicated beast and likely to be of interest primarily to - well - Enterprises.

What I like, though, is the promise of free stuff a bit further down. There has been a good emphasis on developers recently - Studio 11 and Creator, for example. But what's also now promised is free versions of Tarantella and SunRay, which are likely to be of interest to a far wider range of customers.

And as I read it, the nebulous N1, including Sun Management Center, is included in the deal too.

Tuesday, November 29, 2005

suspend/resume at a crawl

I've got a Sun Blade 1500 at home (one of the old red ones). Works great.

Apart from suspend/resume, that is.

I have no idea why, but both suspend and resume take an absurd amount of time. The suspend isn't too bad (slower than it should be), but resume is in the 5-10 minute range. To use an Americanism, this sucks.

(It's doubly odd because I've tried this on a Blade 150, and that's much, much quicker.)

Get it right first time!

Many years ago I wrote a simple system and network monitoring tool. It's been developed on and off over the years, but has now reached a major impasse.

Basically, I designed it wrong 10 years ago. I started out with a two-state system. If the status is 0, then it's fine. If the status is 1, it's broken and needs fixing. Sounds reasonable, right?

Then I realized I needed to add another state, so I defined it so that if the status is 2, there's a warning condition. And all worked well for a few years.

The problem with this scheme is that the severity of the problem isn't a linear function of the status. So I end up playing all sorts of games, analyzing the status codes to work out just how bad the situation really is. It would be much easier if I could simply retrieve the maximum status out of the database - no fiddling required! And I could order problems simply by sorting on the status.

Thinking about this a bit more, this is the obvious thing to do. So obvious, in fact, that I was a dullard for not thinking about this at the start. (But, when I started writing this particular monitoring tool, I wasn't thinking about what version 3 would look like 10 years down the line. And I started out by using the return code from scripts as the status, which is where 0 and 1 came from.)

Of course, I now have to consider what the best scheme might be. Do I simply have 0 for good, 1 for warning, 2 for dead? I think the 0 for good is fine. But should I do something like 255 for dead, 128 for warning, leaving me some room to add finer levels of granularity in the future?
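
To make that concrete, here's a minimal sketch of the second scheme (hypothetical code, not the tool itself - this is exactly the decision I haven't made yet):

// Hypothetical status scheme: severity increases monotonically with
// the code, leaving gaps for finer levels of granularity later.
public enum Status {
    GOOD(0), WARNING(128), DEAD(255);

    private final int code;

    Status(int code) {
        this.code = code;
    }

    public int code() {
        return code;
    }

    // With severity monotonic in the code, the worst problem is
    // simply the one with the maximum status - no fiddling required.
    public static Status worse(Status a, Status b) {
        return a.code >= b.code ? a : b;
    }
}

The gaps between 0, 128, and 255 would let me slot in finer levels later without renumbering everything.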

Decisions, decisions...

Saturday, November 26, 2005

Another JKstat update

I've updated JKstat to version 0.08.

It's getting better. The accessory widgets have been cleaned up and a couple of new ones added (distribution of packet sizes on bge interfaces, and DMA transfer rate on ifb graphics cards). Rates are now accurately computed based on the actual snaptime, rather than approximately based on the intended refresh interval. A couple of internal changes streamline the whole system. And I've fixed it so that actually enumerating the kstats doesn't blindly read all the data, which improves performance.
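
For the curious, the snaptime-based rate calculation boils down to something like the following (a simplified sketch with field and method names of my own - kstat snaptimes are measured in nanoseconds):

// Compute per-second rates between successive kstat reads, dividing
// by the actual elapsed snaptime (converted from nanoseconds) rather
// than the nominal refresh interval. In practice the first call, with
// no previous sample to compare against, is discarded.
public class RateOf {
    private long lastValue;
    private long lastSnaptime;

    public double update(long value, long snaptime) {
        double seconds = (snaptime - lastSnaptime) / 1000000000.0;
        double rate = (value - lastValue) / seconds;
        lastValue = value;
        lastSnaptime = snaptime;
        return rate;
    }
}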

With these changes, I'm much happier that it's closing in on its design goals. I was tempted to bump the version up to 0.1, but that would probably be premature based on the number of bugs that I introduced and fixed recently.

My next idea is to build a graphical iostat. Why is this of value? Well, pictures tell you a lot - the eye is very good at interpreting graphical data. You can dynamically hide uninteresting data, or expand areas of interest for a finer view (for example, you could dynamically expand a disk's I/O to show partition data). You can show historical rates, and generally have multiple views of the same data. You can use the gui to show additional context-sensitive data beyond the basic I/O data. And you could, in the future, link to other areas of functionality - such as dtrace to show what was causing all that I/O in the first place.

Thursday, November 24, 2005

Bumps-a-daisy!

Had a bit of a problem yesterday. While driving to work I got bumped hard from behind, in stop-start traffic on the A1(M).

Nothing that serious - nobody was hurt, which is what really matters. The other car was a total wreck, and the rear-end of my Toyota is pretty well squashed. It's not so bad that it can't be repaired, so in a week or so it goes into the body shop and should be all fixed again ready for Christmas.

I have to say, though, that the insurance company aren't exactly covering themselves in glory here. I mean, they presumably deal with this sort of thing on a daily basis, but they do seem to be making heavy weather of it.

Saturday, November 19, 2005

Properly connected

For a long time now, we've had a broadband connection to the internet, but we only had the main home PC hooked up.

No longer! I'm typing this from one of my home Sparc machines running Solaris.

What took me so long I'm not sure, but I finally went and ordered a little cable router (a non-wireless model - these seem to have vanished completely from the shops in favour of wireless models which cost twice as much and whose wireless features I can't take advantage of). Put in the setup CD, follow the instructions, and it was working. Connect up my Sun, tell it to use DHCP, and I'm online.

I love it when things just work!

Now to find some really long cables to connect the machines upstairs...

Friday, November 18, 2005

Quest for small server continues...

I'm still working on my quest for a small reliable server.

I was just reading Richard Elling's blog entry on RAS and the X4100/X4200 servers. You should read this - blindingly obvious design features like not putting heat generators like disks in the airflow path for the CPUs. But he also says that most thin servers don't need more than 2 disks. Perhaps this is why I'm having so much trouble finding a server to fit my requirements!

Oh well, having exhausted Sun's catalog, I'm now looking at the likes of the HP DL385 or the Dell PE2850. Both of these are listed in the HCL, which is pretty much essential as I would naturally be running Solaris on the machine.

PostgreSQL, Sun, and Integration

Sun sure are busy with the announcements this week.

With the PostgreSQL announcement (and they don't seem to have mastered the spelling of PostgreSQL), Sun are offering to integrate PostgreSQL into Solaris and support it.

Now, this has to be a good thing for both Sun and PostgreSQL. But does it help me?

I'm not really sure that integration does help me. Note that it's the integration - or bundling - that I have a problem with, not Sun supporting it or optimizing it or just supplying it.

Sun already bundle the Apache web server and Tomcat. I spend quite a lot of time setting up web servers, usually using Apache and Tomcat, often with other components (including, as it happens, PostgreSQL on occasions). And I never use the bundled versions that Sun supply. And the reason is quite simple - Sun's versions aren't the right versions, aren't set up the way I need, and are installed in the wrong place. It's much easier and safer to install them yourself: then you know exactly how they're set up, that they're going to work exactly the way you want, and that you can upgrade to the latest version at any time of your choosing.

Integration really ties you up in knots. Solaris comes with ancient versions of Gnome, and because they're integrated we're stuck with them. Not only that, because it's integral to Solaris 10, we can't apply the same version to our Solaris 9 or 8 machines. Integration locks application updates to OS updates, and everybody loses.

What I want - and Sun need - is the ability to choose between sticking with a given version, or going to a new version. This requires that products such as Gnome/JDS (and the same argument applies to anything else, like Mozilla, OpenSSL, Apache, even Java) are unbundled and separated from the core OS. Then, I can select whether I want to stay with Gnome 2.6 (as in the version that comes with JDS on Solaris 10) or have a Gnome 2.12 desktop instead.

Likewise for PostgreSQL, which started this whole blog entry. For different applications, I'm going to have to support different versions - maybe on the same physical machine (using zones, for example). I need the ability to make that choice independent of the underlying OS version, otherwise I end up with an upgrade nightmare.

Thursday, November 17, 2005

Deluged by good stuff

Whole load of interesting stuff coming out of Sun at the moment.

It seems that free stuff isn't just for Fridays anymore. We now get free developer tools. This is something I ought to try. I have to confess to being one of the old-school who can't see what on earth is wrong with emacs, but I'm always keen to try new things, and maybe an IDE might help out.

Then we get snippets about the new Niagara chip. Marketing have clearly got in on this one ("CoolThreads", anyone?), and there does seem to be a certain greenness in the positioning, but this looks like some serious technology.

More free stuff - the Studio 11 compilers. Making these free was inevitable, really - Studio has been free to anyone in the OpenSolaris community for some time now, and I've long felt that the excessive pricing for Studio was crippling its uptake. Good one!

Is that enough? No way! Something really big happened this week - ZFS was unleashed on the world. Sure, it got hyped (overhyped) a year ago when Solaris 10 got announced, but ZFS is the real deal - I've been privileged to have been testing it for almost a year and a half, and it does what it says on the tin. So go check out all the blogs.

Monday, November 14, 2005

Turning Opteron Down

I was putting together a server spec recently. Nothing special, just a reliable box to store 100G or so of data safely and serve it up via the web.

Easy, right?

Well, that's what I thought, and I was wrong.

I have this thing about real servers. They have to have redundant PSUs and redundant, mirrored disk. This means more than 2 internal drives.

(Note that, according to this definition, Sun's SF280R, V210, X2100, E220R, E420R, V480, V490, V20z, and V1280 don't qualify. All are limited to 2 internal drives. They're fine for compute nodes and similar tasks, where the aim is simply to survive long enough to finish the job and decommission the node, but not for real servers. You have to have at least 3 disks to guarantee survival - and reboot - after a disk goes. OK, so you're supposed to add external arrays, but usually you can't do anything like place metadevice databases on the arrays. And having only 2 drives makes Live Upgrade harder than it need be. End of first rant.)

OK, so the next thing is that 100G of storage. It doesn't really justify getting an external array - that's fine for a terabyte, but would be a waste in this case. And, unlike something like 10G, you can't just lose it on the boot drives. So 100G is an interesting number.

Grabbing 100G off a SAN doesn't look promising either. Apart from not having one to hand right now, the cost of the HBAs makes a nonsense of it for this amount of data.

So, what else? iSCSI could be interesting, as it saves you the cost of the HBAs. But it's not really mature yet, and I don't happen to have a server handy. (I don't happen to have a convenient NFS server either, which is a shame.)

OK. So the next best thing is to get a box with 4 drives - 2 to house the OS and the application binaries, and a couple extra 146G drives for the data.

So, I start off by thinking - these Sun Opteron boxes look real nice. Particularly the X4100, which can take 4 drives without the DVD. (And you don't need a DVD - it's just something else to waste money and electricity.) However, this won't work. Sun only offer 36G or 73G drives. Not enough! And there isn't a slightly bigger variant that takes more drives. OK, so Sun don't make an Opteron box that will work. Bother.

So, go to Sparc. The V240 works a treat. I like the V240. A couple of boot drives and a couple extra 146G drives and I'm all set. It's interesting that an old Sparc box is better suited than a new Opteron box.

(Not that the V240 is perfect. In the same way that it's a major disappointment that the Opteron boxes don't take 146G drives, it's disappointing that the V240 doesn't support 300G drives. Why don't Sun realize that customers want choice?)

OK, so I'm a Solaris fan, and Solaris x86 runs on a wide range of systems. A quick browse through other manufacturers' websites (and some of them are nowhere near as easy to navigate as they ought to be) shows that this trend of useless system design is fairly widespread. Other manufacturers are more agile at supporting larger drive capacities, but the system designs are similar.

In the end I decided to simply park the problem in a zone on a bigger system. It's a good solution, and was what I wanted to do anyway.

What is intriguing is that Sun used to have ideal systems for this sort of task, and have now scrapped them. The V60x allowed you to have 3 drives, so you could avoid the twin-drive trap. The V65x was a wonderful compact server and let you put 6 drives in. The V250 let you put 8 drives in the chassis, but seemed to get canned pretty quickly. It's not entirely obvious to me that genuine progress is being made.

JKstat updated

After a long hiatus, I've released an updated version of JKstat.

For those who don't know what a kstat is, it's a Solaris kernel statistic. There are a lot of these, and they give you an awful lot of information about what your Solaris system is doing.

JKstat allows you to get at the kstats from a Java application. (Solaris already includes a fabulous perl implementation. One day, I hope JKstat will be as good. It isn't yet.)

The kstats naturally form a tree structure, and I've written a graphical browser that allows you to go through the kstat tree. Like so:



(Oh dear, what has blogger done to my beautiful image? Oh well - click on it and you'll see the real thing!)

There's still a lot to be done. For one thing, I want to actually create a decent API rather than the horrible kludge that I'm using at the moment (it's this, rather than any lack of maturity or functionality, that keeps the version number at a lowly 0.07). And there are a number of existing tools that could be enhanced by a decent graphical user interface (in particular, the ability to dynamically expand or compress certain features - imagine iostat with the ability to zoom in on specific disks and show or hide the partition data on the fly). Looking further ahead, one can imagine integration with dtrace to answer the question "what is causing this activity?".
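
To give a flavour of the direction I'd like the API to take - and I stress this is a hypothetical sketch, not what 0.07 actually provides - reading a named kstat might look like this:

// Hypothetical JKstat usage. A kstat is identified by module,
// instance, and name, and carries a set of named statistics;
// unix:0:system_misc is a real named kstat holding, among other
// things, the process count.
public class KstatExample {
    public static void main(String[] args) {
        JKstat jks = new JKstat();
        Kstat ks = jks.getKstat("unix", 0, "system_misc");
        System.out.println("processes: " + ks.longValue("nproc"));
    }
}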

Enjoy, and if you have any comments (and, in particular, you would like a graphical display of a particular kstat, or can think of novel and useful ways of displaying kstat data) I would love to hear them.

Sunday, November 06, 2005

Everyone an administrator?

James Dickens asks: Why not a server?

And it's an interesting question. Why, in a house with multiple computers, do you not have a dedicated machine somewhere and store all your files on it? It makes a lot more sense than having files spread at random amongst all those machines.

My own solution to this is a portable USB zip drive. I use this to carry stuff about between my machines, and between work and home if needed.

I'm not sure about the suggestion of using a real computer (and an Ultra 2 certainly qualified as a real computer) as the server, though. Yes, I know that Solaris is an absolute doddle to administer (yes, really - once you've got to know it). But it's bad enough that everyone owning a Windows PC has to be a systems administrator, without expanding that even further. Even though it pays my wages, I'm a firm believer that when it comes to systems administration, less is definitely better.

Using a general-purpose computer doesn't necessarily make sense to me. (The one situation where it really comes into its own is if you were to use it as something like a Sun Ray Server.) But generally, some sort of appliance seems to make more sense.

And an appliance running a cut down OpenSolaris with ZFS would be a stunner.

The only downside to a server is that part of the assumption is that it's always on. I'm not sure that we should be encouraging that and the accompanying waste of power when there's so much damage already being done to the environment.

Hosted services - grid, if you like - also have limitations that are painful. They neatly solve the administration, availability, and backup problems, though. The biggest problem I see is that upload speeds on my internet connection are absolutely pathetic. Most internet connectivity is highly asymmetric - fast download, with just enough bandwidth the other way to handle the administrative packets and not a lot more. If we are to see hosted storage really become useful, then upload speeds are going to have to be significantly increased. (And, frankly, network reliability could stand a little improvement.)

Monday, October 31, 2005

Invalid system disk

I was updating Solaris on my W2100z the other night. After applying the latest recommended patches, I rebooted and went off to get a drink.

On my return I was rather startled to find the old message "Invalid system disk. Press any key to continue." on the screen. Oh bother - what's gone wrong?

It would be a bit of a pain, because I do quite a bit of development on this machine and I really don't want to lose it. The usual suspect in these cases would be a floppy disk, but the W2100z doesn't have one, so I was worried for a second that the main disk had been trashed.

But then I worked out what had happened. I had taken the recommended patch bundle home on a zip disk, and left it connected. So the invalid disk it was complaining about was in fact the one I had left in the USB zip drive. (So clearly the W2100z is capable of booting off a USB device. Scary.)

Panic over...

Thursday, October 20, 2005

Putting Google to work

I don't know how many others use Google as a problem solving tool, but I use it all the time.

I had a problem yesterday, jumpstarting Solaris 10 onto a workstation. Couldn't find the jumpstart directory. So plan A is to drop the error into Google and see what we get.

Bingo! The right answer comes straight back. Yup, simple netmask mismatch.

It's not just that, though. I was intrigued to see that Sun use Google for making Solaris development decisions.

Heck, what would we do without Google?

Sunday, October 16, 2005

system() doesn't constitute an API

I like APIs. I like being able to call a function, give it some data, and tell it what to do with it, and have it do it without fuss or bother.

And I like interfaces that are stable and reliable.

However, on a typical unix-like system, certain common operations have no real API. For some operations, the normal way to do things is to run an external program.

Usually the system() call of the title is involved, but it could be popen() or some other variant. I'm using system() here as shorthand terminology for executing another program to do the work.

The two standard examples are mail and printing. You throw the data you want something done with at sendmail or lp.
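
Here's what that convention looks like in practice - a minimal Java sketch that "prints" a file by running lp and shoving the data at it (the printer name is invented; usage: java LpPrint file):

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

// Printing by convention rather than by API: exec lp and feed it the
// data on stdin. Note that about the only feedback available is a
// numeric exit status.
public class LpPrint {
    public static void main(String[] args) throws Exception {
        Process p = Runtime.getRuntime().exec(
            new String[] {"lp", "-d", "someprinter"});
        OutputStream out = p.getOutputStream();
        InputStream in = new FileInputStream(args[0]);
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        in.close();
        out.close();
        System.exit(p.waitFor());
    }
}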

This is atrocious. This isn't an API - it's merely a convention that often works. Frankly, it worries me that there aren't standard programming APIs for such common tasks as sending mail and printing a file.

While mail appears to be standardized sufficiently to be largely hidden by applications, printing is another matter. The variety (and general dysfunctionality) of print dialog boxes should be a clue here. Worst, of course, are those that simply have a text field that you type the print command into.

Apart from the inefficiency of launching external programs, the lack of genuine APIs limits the interactions a program can have, in particular the feedback it can get from the application it launched.

It's the 21st century. We deserve to have some decent 21st century APIs.

Friday, October 14, 2005

When did that get added?

I haven't used format to partition a disk by hand under Solaris for ages. Normally I set everything up at install time using jumpstart, and then prtvtoc and fmthard if I need to copy a partition table (if a disk fails and needs to be replaced, for example). So I happened to be working on a machine today, and was presented with:


partition> 4
Part      Tag    Flag     Cylinders        Size            Blocks
  4 unassigned    wm       0                0         (0/0/0)           0

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 14521
Enter partition size[0b, 0c, 14521e, 0.00mb, 0.00gb]:


Say what? What's this e thing? Well, it's what you expect - the end cylinder. Aha! My, this makes putting that last partition in so much easier. No more trying to calculate in your head how many cylinders are left...

(Looking at a couple of other machines, I think this got added in Solaris 9.)

Why don't people upgrade?

Gary mentions a topic I've always found interesting: why do people persist in running old versions of Solaris?

I know of many reasons; some of them are even valid. Most of these are in the general area of supported configurations. If a vendor (or even Sun) won't support a new version of Solaris, then you're pretty well out of luck. It's not just official support, either - sometimes the product plain won't work.

(This last point makes me wonder. Given the very strong binary compatibility guarantees in Solaris, what are some of these vendors doing to make their code stop working? Presumably they think they're being clever, but they must be putting in a lot of effort to break things.)

There are also a couple of head-in-the-sand excuses I hear:
  • nobody else has upgraded so there must be a problem

or
  • we'll wait for someone else to try it and find all the bugs

both of which are bogus. Let's nail the first one for starters. Actually, plenty of people do upgrade, and don't have problems (if they read the instructions, that is). Do you want to be the one left behind? This herd instinct does seem very strong, though.

The "waiting for someone else" mentality is wrong. For one thing, Solaris doesn't go round randomly breaking functionality - compatibility is very strong and the level of bugs is very low. Secondly, the Solaris development model fixes bugs in the new version first and then fixes the older versions later (if at all), so upgrading to a new version is the best way to reduce the number of bugs that you might be exposed to. Thirdly, it's already been tested (to destruction and beyond) very, very hard indeed. I know, I did it. Yes, we found bugs. But a whole lot of people inside and outside of Sun have beaten on Solaris for a long time, and most of the bugs are gone. Finally, because it's been well tested, the remaining bugs tend to be in strange distant areas that nobody has covered yet. In other words, the chances are pretty good that if you're going to be affected by a bug, it will be specific to you, your environment, and the way you use the product - such bugs aren't going to be picked up by other people, so it's daft to wait for others to do the testing, because their testing doesn't mean anything for you.

There's also the argument "it's not broken, so there's no point fixing it". Sure, it is reasonable not to immediately go out and heedlessly upgrade every time a new version comes out. And once something works, leaving it alone is generally a good policy. (But I think this really ought to be expressed as "don't fiddle" rather than "never touch".) But regular maintenance is an essential part of the process, and upgrades and replacements need to be planned for. The problem with just leaving things alone is that they start to rot, and if you leave them too long you have a disaster on your hands. For example, if you don't apply any patches for 4 years, and then hit a problem, you have to apply 4 years of patches - a massive change, and a massive risk. Or if you insist on running old hardware too long, you suddenly discover that when you need to replace it you have to change the OS and application by 3 or 4 major versions, and it becomes a nightmare. Essentially, planned steady change is a lot better than hanging on too long and hoping you can survive the inevitable wreck. Better many small steps than a desperate leap over a gaping chasm.

I haven't even considered all the cool new features.

So I have this message to all those still hesitating about upgrading to Solaris 10:

Come on in - the water's fine!

Friday, September 30, 2005

Yay! Gnome terminal fixed!

Every so often, there's a Solaris patch that fixes a problem that's bugged me for ages.

Today's example is a big win. After applying patches 120461 and 120289 (x86), or 120460 and 120288 (sparc), the JDS gnome terminal that comes with Solaris 10 no longer needs you to press the stupid shift key to get PageUp, PageDown, Home, and End working.

This is one of those little usability things that makes a huge difference.

To whoever fixed this - thank you!

Thursday, September 29, 2005

Being Enterprising

I've been playing around with various bits of the Java Enterprise System.

It comes on a neat DVD in the Solaris 10 media kit. So I had a play with that at home, only to discover that it's missing one thing. Instructions. There is no documentation included. Zilch. Nothing.

The basic instructions can be found on the web, but that's not the point. It's a fairly small PDF file - about 4 megabytes - and there's plenty of space on the DVD to fit that in all the supported languages. It would make it so much easier for people to test.

Another document I came across is a useful guide to evaluating JES on a single system. (This would also be absolutely easy to simply copy onto the DVD.) This covers the basic steps you need to get started.

Thursday, September 15, 2005

Google Earth

Surfing away last night, I came across Google Earth.

Wow!

This is a stunner. I haven't shown it to the kids yet, but one of them is doing geography homework at the moment and would find it amazing. (One reason for not showing it to them is so that I can get access to the machine.)

Tuesday, September 13, 2005

New Sun Boxes

So now we know what the new Sun servers are like.

And I'm pleasantly surprised. There are a couple of things that I feel are positive here:
  • The entry level box, the X2100. This makes a nice addition to the range. Yes, it's dumbed down, but you don't always need all the features - sometimes you just want a server.
  • I'm pleased to see the two configurations of the X4100. You can get this in either a 2-disk (plus DVD) or 4-disk variant. I like the 4-disk variant. Often, a local CD is just a waste of space and money, and I like having decent local storage (read: more than 2 disks, so you don't have to worry about quorum in a 2-disk configuration, and you have some space to put data for a database or some such locally). It's amazed me that the V210 (and the V1280, even) only allow 2 internal drives.

OK, so the range is still a little thin, but that's effectively twice as many variants as I was expecting, and I like the look of all of them.

Monday, September 05, 2005

All Change

Today's been a pretty big change for the Tribble household.

I started a New Job, while Amanda started a new school. Not just that, but this involves completely different schedules for getting up in the morning, getting ready, and travelling.

I can't tell you much about the new job, because I've largely been drowned in paperwork. I did manage to unpack and install my new workstation (wow!), but that's about all.

Friday, September 02, 2005

Java man pages

As the holidays draw to a close, I've been trying out a few more ideas.

Solaris comes with man pages in nroff or SGML format, which can be preformatted and indexed with catman. (Why the preformatted man pages and index aren't created by default is beyond me.) But there are still a couple of weaknesses with the man system:
  • There's no search system
  • There's no way to follow cross-references
The Documentation DVD includes HTML-formatted man pages, which add the cross-referencing links - an improvement. But this doesn't solve the search problem.

However, Solaris comes with JavaHelp. So I had a little play today with creating a JavaHelp helpset out of the HTML man pages.

This worked reasonably well, up to a point. It was pretty easy to knock up the helpset file, and the map, TOC, and index files, and to create a full-text database for searching.

The JavaHelp hsviewer did a reasonable job of providing a user interface. You can navigate, and the search isn't too bad.
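
Driving it from your own code is only a few lines, too - a sketch using the javax.help API (the helpset name here is just my example):

import java.net.URL;
import javax.help.HelpBroker;
import javax.help.HelpSet;

// Load a helpset and display it in the standard JavaHelp viewer.
public class ManHelp {
    public static void main(String[] args) throws Exception {
        ClassLoader cl = ManHelp.class.getClassLoader();
        URL url = HelpSet.findHelpSet(cl, "manpages.hs");
        HelpSet hs = new HelpSet(cl, url);
        HelpBroker hb = hs.createHelpBroker();
        hb.setDisplayed(true);
    }
}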

Where it all fell down was that the HTML display in the viewer uses the Swing JEditorPane. Which is - shall we say - a bit primitive. So while the overall solution functions reasonably well, the actual display looks pretty awful. Sufficiently bad, in fact, that it's not going to work in practice.
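
If you want to see the problem for yourself, the component is trivial to drive - point a JEditorPane at one of the HTML man pages (the file name below is mine) and watch it mangle the markup:

import javax.swing.JEditorPane;
import javax.swing.JFrame;
import javax.swing.JScrollPane;

// Display an HTML page in JEditorPane, the component JavaHelp uses
// for rendering. Its HTML support is distinctly limited.
public class HtmlView {
    public static void main(String[] args) throws Exception {
        JEditorPane pane = new JEditorPane("file:///tmp/ls.1.html");
        pane.setEditable(false);
        JFrame f = new JFrame("HtmlView");
        f.getContentPane().add(new JScrollPane(pane));
        f.setSize(600, 800);
        f.setVisible(true);
    }
}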

So the idea that I could produce something from a quick hack came to nothing.

I'm not done yet, though. As I see it, there are 3 ways to go forward:
  • Recreate the HTML versions of the man pages with an old HTML syntax that the JavaHelp display engine can cope with.
  • Use a different HTML renderer component.
  • Integrate this into a web server, doing the search and indexing in a servlet and just shoving the HTML pages out to a browser.

At the moment, the first looks the most attractive, as this also solves the problem of what to do with other man pages that you might have.

Galaxy - About Time

According to ZDNet: Sun's 'Galaxy' servers making September debut.

Well, it's about time. And not just for these new servers specifically - the flow of new product out of Sun has all but dried up this year, and Sun need to get some new products out into the marketplace.

I generally like the look of these boxes (although the internal storage is more limiting than I would like).

What does worry me, though, is the lack of choice and diversity. I don't see any tower systems or dirt-cheap single processor boxes (like a server version of the Ultra 20, for example).

Thursday, September 01, 2005

Those ole backup tapes

Some technologies are spectacularly primitive and have been largely replaced by modern equivalents. One such is backup - the idea that you use something like ufsdump to back up a Solaris filesystem has no place in the 21st century. Tools such as Legato Networker (or the Sun equivalent - whatever it may be called this week) or Veritas NetBackup should have relegated ufsdump to museums years ago.

(I have used ufsdump occasionally - usually piped direct into ufsrestore as a quick hack to copy a filesystem.)

But having closed down the systems at my former employer, deleted all the data and sent the machines off to oblivion, and burnt all the backup tapes, of course I needed to get some files back for one particular project.

So: no tapes, no tape libraries, no backup server, no legato license. But I did have a ufsdump tape made several years ago.

Every week we used to drop a copy of critical system files onto a bootstrap tape - jumpstart profiles, copies of system configurations, basic software. (And we knew the directory trees were stable so we were happy to do the ufsdump live.) The theory was that in the event of a total disaster, we could get any old machine and put enough on it to be able to work well enough to reconstruct everything else. (Having once needed to do exactly that, I can confirm it proved its worth.)

So the files were on this tape. I managed to find a DLT drive in the loft; managed to find a machine to hook it up to; managed to get the files off the tape.

(Finding a suitable machine was more tricky than I expected - I was a little surprised to go round the back of my W2100z - a SCSI based machine - and find no SCSI orifice. Fortunately I'm looking after a SunBlade 2000 for a colleague and that does. The other criterion was that I wanted to drop the files onto a USB connected zip disk, so having SCSI and USB in the same box was a help. I need to get that home network set up!)

So, problem solved. And it just goes to show that despite my reservations about tapes and old backup systems, they do actually come in handy once in a while.

Tuesday, August 30, 2005

The great thing about standards...

You know the old chestnut: The great thing about standards is that there are so many to choose from.

Now SCSI is a great standard. The truth is that SCSI devices do interoperate incredibly well. At that level, it's been a phenomenal success.

But SCSI really covers a lot of things. The thing that hit me recently was connectors and cables.

I remember old SCSI cables with DB50 and Centronics connectors on. I don't remember using them much - they were being phased out when I started playing this game. Then came the small 50-pin connector. Then the HD68, and more recently the VHDCI connector. Did I forget anything?

And then of course there was original SCSI, fast SCSI, fast wide SCSI, FWD, ultraSCSI, and ultra320 SCSI. And single ended plus two variants of differential.

So it's not actually that easy, given a system and a device, to say whether they will in fact interoperate. And then you need to find the right cable. (And then is the cable rated for ultraSCSI speeds?)

Which has bothered me for a while, but recently came to the fore after a bad incident caused a system to fail. So I decided to move the disks from a Netra T1 to a V240 and run the services in a zone. Easy, right? Well, not quite. The T1 has an HD68 orifice as I recall, and I think the V240 has a VHDCI outlet on the back!

So I go for a rummage in the loft and find the right cable. I'll send that across to the machines, and hopefully will be able to set the zone up and have the service back operational later this week.

Monday, August 29, 2005

The good, the bad, and the ugly

Let's start with the good:

We've now got SMBIOS integrated into OpenSolaris. I think it's the systems administrator in me showing, but I find this really exciting!

Now the bad:

I know I don't work for a living at the moment, but I still keep an eye on some systems for former colleagues. And one of them suffered a power cut recently. Not just any power cut either, by the sounds of it. I've already alluded to one of the problems, and I had to roll back the SMF repository on another machine, but worse was to come.

Basically, it fried a Netra T1. I managed to persuade it to power on from the lom prompt, but it immediately starts vomiting errors. No system, no ok prompt, just errors. Looks like it can't even run POST. This looks like a dead system to me, beyond hope of repair. That's bad.

And the ugly:

SJVN has been emitting more mindless drivel. I see two choices here - either he's genuinely incapable of understanding licensing, or he's got an axe to grind and is using his journalistic position as a veneer of respectability.

The article - at least the snide part about Sun - is simply plain wrong. Using CDDL as the license for OpenSolaris doesn't give Sun control. Exactly the opposite: with CDDL as the license Sun have less control of what people can do with OpenSolaris than they would have had if they had used the GPL.

And as for failing to build a significant programming community: for one thing, they have already got a major community; for another, getting a handful of part-timers to build Solaris on the cheap was never the driving force - Sun spent a huge amount of cash on building Solaris, and on open-sourcing it, and it was done for sound business reasons - such as opening up new markets - not to outsource development. (Or to kill the penguin, or introduce submarine patents into Linux, or any of the other stupid reasons naysayers put forward.) Of course, once the open source plan became known, a huge bunch of us started clamouring to get involved so we could modify and improve Solaris in the ways that we want.

Saturday, August 27, 2005

Should I program to Tiger?

Simple question - when writing Java code, should I use the new features in Java 5.0, and thus not allow my code to work with older versions of Java?

There are, after all, some compelling reasons to do so.

In particular, I love Generics - I use Collections a lot, and being able to tell Java that the collection contains objects of a given (often fixed) type makes code so much cleaner. And, again because I work with collections a lot, the enhanced for loop is a big win.
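
A trivial example shows why - the same loop, before and after:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LoopStyles {
    public static void main(String[] args) {
        // The old way: untyped collection, explicit Iterator, a cast.
        List oldNames = new ArrayList();
        oldNames.add("jkstat");
        for (Iterator it = oldNames.iterator(); it.hasNext();) {
            String s = (String) it.next();
            System.out.println(s);
        }

        // Java 5.0: the collection knows its type, no cast needed,
        // and the enhanced for loop removes the Iterator boilerplate.
        List<String> names = new ArrayList<String>();
        names.add("jkstat");
        for (String s : names) {
            System.out.println(s);
        }
    }
}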

The question still remains - is backwards compatibility more important?

(I never found anything compelling in Java 1.4 to make breaking compatibility with 1.3 worthwhile, but the changes in 5.0 are a different matter.)

Thursday, August 25, 2005

NISminus

I'm a great fan of NIS+. Not only is it very easy to set up and use, it scales well and it's dead easy to manage the data stored in it.

(Much easier than old NIS or LDAP. Sun could be suicidal and ditch it, as they've been threatening to, but replacing it with dramatically inferior solutions like NIS or LDAP is going to make a lot of people miserable.)

Of course, it doesn't always work perfectly. I just had this one case where it stopped working. The server partially booted - the NISplus server was running and serving clients. It's just that the NISplus client on the server wasn't working. So lots of other things wouldn't work. Amongst others, the NFS server. So while I could log in to a client, I couldn't access any files. Bad!

(Some testing indicates an RPC authentication problem. But basically everything goes into the RPC black box and never comes back.)

The only thing I had done was to change the IP address of the server. And yes, I had run nisupdkeys -a to update the server's addresses. So my theory is that somewhere buried deep in the bowels of NISplus is some memory of the machine's old IP address, and I can't spot it right now.

(I managed to get it working again, but I'm not sure which of the half a dozen random kludges was the one that mattered.)

Back to JNI

While still thinking about how to make progress with jkstat - basically, how closely to align it with the existing libjkstat implementation - I started another quick project to use JNI to get at Solaris system information from Java.

And, again, programming in JNI is hard going. The real problem is that it's astonishingly easy to make a mistake. And then it falls over with SIGSEGV and cores on me. Even the stack trace isn't too informative - it tells me that it's gone wrong in the native code (usually when calling back to java). As if I didn't know that! Even knowing where it blew up isn't that much help - it never tells you why. So you have to fix by inspection, and it's not the most obvious syntax.

(For example, it was blowing up at one point. Eventually I realized that I had a typo in the class name, but there was no indication in the failure that that might be the case.)

So what's the application this time? It's a Java interface to PICL (the Solaris Platform Information and Control Library). The first aim is to produce a graphical PICL browser (like my kstat browser). The prtpicl command is useful, and you get the information with -v, but it's not at all easy to browse through the many pages of output that can result. The PICL data is tree-structured, so the arrangement used in kstatbrowser works quite well - a Java JTree in the left panel to display the tree structure, and the actual data in a panel on the right.
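
For those who haven't had the pleasure, the Java half of a JNI binding is tiny - all the fragility lives on the native side, which has to match the mangled names exactly (the names here are hypothetical):

// The Java side of a hypothetical JNI binding to libpicl. The native
// library must export a C function named Java_PiclBrowser_getRootNode
// to match this declaration exactly; get one character wrong and you
// find out at runtime, not at compile time.
public class PiclBrowser {
    static {
        System.loadLibrary("jpicl");
    }

    // Returns an opaque handle to the root node of the PICL tree.
    public native long getRootNode();
}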


Give me a day or two to work through the JNI errors I keep making and some code might be ready to release.

Wednesday, August 03, 2005

Keeping Busy

Having finished the last job, I'm at home for a while.

Not entirely idle, though. We've been to a theme park, I've started to sign on, and I've been looking at an odd little NFS problem for some former colleagues.

I've also been going through my CD collection. No, not the musical one! The pile of Sun CDs for Solaris, Java, and all the other bits of software that Sun have made that I've got the media for. After all, how many copies of the StarOffice 5.2 CD do you need? I brought my whole stock home, and have managed to prune it quite considerably.

My next project is to skim through all the sysadmin books I've got, reminding myself of what it used to be like, and what obscure things I'm supposed to know all about but nobody in the 21st century actually uses any more.

(On the first page of the first book, it tells me the key responsibilities of a Systems Administrator. High up on the list is formatting floppy diskettes. Say what?)

Thursday, July 28, 2005

Submerged!

Been a bit busy lately. So much so that I've almost submerged out of sight.

With the closure of my place of work we've been working extremely hard to decommission a pretty significant computing service. It's been tough.

Our kit (aging but still serviceable) has been split up and sent off to half a dozen other departments - not to mention some kind people who actually paid us for some of our older kit.

While our service has been shut down, it lives on in spirit.

Actually shutting down a large system cleanly was quite interesting. And, by and large, it did go down cleanly in an organised fashion, while being essentially functional right up to the bitter end. We've been shutting down and decommissioning kit steadily for weeks, and consolidating services onto fewer servers. Shutting a system down is the easy part - we've spent the last two days untangling the spaghetti of ten years' worth of patch cables. One of my colleagues was still answering user support queries mid-afternoon yesterday - by early evening the system was down for good. This morning it got put into a lorry and the new owners should be receiving it soon. It just remains to send the last few bits of junk off to the local recycling company, hand in my keys, and put my feet up.

Tuesday, July 12, 2005

Missing the comforts of $HOME

I'm currently involved in a project with another organization to replicate large parts of our computing infrastructure on their systems.

It's been a while since I've had to use a system that I haven't personally specified and installed. All my systems are set up the way I want, matching the requirements of the applications, and work extremely well.

So it's a bit of a shock to be given an existing system and have to use that. At least it's running Solaris, so I don't have to port my code. But it feels weird to go back to something prehistoric like NIS - we've used NISplus for over a decade - and some of the other design decisions like not using the automounter aren't decisions I would make myself. So I'm working in slightly unfamiliar territory, and it makes me realize how spoilt I am on my own systems.

(Moving some of our own application code over revealed some rather - ahem - strange implementation decisions. For example, I was fixing up a whole bunch of scripts today that had to construct the name of a user's home directory. Now, most of our users get their home directories automounted under /people, so the scripts refer to the home directory as /people/$USER. Oh dear! Why not use $HOME?)

But back to the comforts of $HOME. We've installed a wide range of useful software over the years - most of it a very long time ago now. Some of these tools come in extremely handy for certain tasks, and I've become used to having everything available - to the point that I forget that it doesn't come on a system as standard. Some of these things aren't very big, and I'll give you one small example of the sort of thing I'm talking about: rgrep. Just a recursive grep, you say. That's true - I could use grep, maybe allied with find, but rgrep is one of those little finishing touches that turns a bland computer system into my $HOME.

Friday, June 24, 2005

London OpenSolaris User Group

So we had the first LOSUG meeting Monday night.

(See some other comments by: Gary, Sean, Peter Harvey, Chris Beal, and Chris Gerhard.)

Overall, I enjoyed the evening, although we overran rather badly - as a speaker I have to shoulder some of the blame for that, of course. And I'm looking forward to the next one!

Thursday, June 23, 2005

What's your disk MTBF?

A typical disk has a quoted MTBF of a million hours or so. (For SCSI; IDE may be somewhat less.) Let's call that 100 years - allowing for a mix of IDE and SCSI drives.

Now I have about 1,000 drives, of a wide range of types and ages - IDE, SCSI, FC-AL - some dating from the late 1990s. Based on the MTBF and the number of drives, I would expect to see 1,000 ÷ 100 = 10 disk failures a year - or almost one each month.

I'm not actually seeing anything like that failure rate. It's much lower. Had a disk fail earlier in the week, but that's rare. I'm guessing that I'm seeing only a third of the expected rate of failure.

That's on my main systems anyway. Those are in a controlled environment - stable power, A/C provides constant temperature (not as cool as I would like, but stable). They spin for years without being provoked. I have a 12-disk D1000 array that sat unused in a cupboard for the best part of a year before being rescued and thrown together for beta testing ZFS, and so far 4 of the 12 disks in that have failed in the last year. Which is a rate much larger than you would expect based on the rated MTBF.

My conclusion would be - and this should be well known to everyone anyway - that if you take care of your kit and give it a nice stable environment in which to operate, it will reward you with excellent reliability. Beat it up and treat it like rubbish, and it will refuse to play nice.

Yikes - had to log out!

Finally had to throw in the towel and log out of my workstation (a SunBlade 2000 with 2 gig of memory) this morning.

There's obviously a memory leak somewhere. I had Xsun up at 2G, Mozilla up at 2G, and several Gnome items (especially the panel) at half a gig. Restarting the clients didn't free any memory in the X server, and the whole box was starting to grind.

So I had to log out to get Xsun to restart. Mind you, the machine had been up (and still is) since Feb 2nd - and I had been logged in that long. That's when Solaris 10 came out, so it's clear that stability isn't an issue.

Thinking back over the last few years, that's probably the longest I've been logged in continuously. Simply because, as a beta tester, I would upgrade or completely rebuild the box every month or two.

And, unless the first update to Solaris 10 arrives real soon, the next time I log out is likely to be on my last day as I turn out the lights.

Wednesday, June 15, 2005

Some Modified OpenSolaris Utilities

One of the things that you can do now that OpenSolaris has been released is to modify the way the Solaris utilities work to suit your own preferences.

So I've done just that and created some modified versions of utilities like ptime, prtpicl, du, and the ucb ls and df.

These just scratch a few of my itches and aren't terribly earth shattering. But they show that it's amazingly easy to modify the OpenSolaris source and gain value from it.

Tuesday, June 14, 2005

Somebody just lit the blue touch paper

Stand well back! OpenSolaris just got let loose!

I've been involved in the pilot for months and it's been a long road, but it's so exciting to finally get here.

Now, off to that source code.

Monday, June 13, 2005

Moved my Solaris Zone

OK, so in preparation for the place I work being closed down, I've moved my Solaris Zone to a new home.

This includes our Postfix under SMF page, and my Java Kstat project (recently updated!).

Friday, June 10, 2005

London OpenSolaris User Group

As Simon Phipps and Stephen Harpster have already noted, we have the first ever OpenSolaris User Group meeting in the UK - organised by Ulf Andreasson - at Sun's London City office at 6pm on June 20.

This will be great. It's good to finally be able to join in the OpenSolaris meeting buzz, to meet our local CAB member, to catch up with some old friends, to get to know more people from the OpenSolaris pilot, and to meet more people interested in Solaris and OpenSolaris.

We haven't confirmed all the speakers yet, but I hope to be talking about my experience with the Solaris development process, from the Solaris 10 platinum beta, through the OpenSolaris pilot, and beyond.

I'm looking forward to meeting lots of you!

V250 EOL?

Looking at the Sun V250, I've just noticed that it says No Longer Orderable.

What's going on? The V250 was actually one of Sun's better machines. It was the only tower server in their lineup, and the only machine with a decent amount of internal disk capacity.

So basically Sun are only interested in customers with expensive datacenters with exclusively rack servers and separate storage arrays. And I haven't seen anything to indicate that they're going to introduce anything else into their lineup.

It's hardly surprising that Sun are struggling to grow sales if they keep reducing the market segments their hardware addresses.

Thursday, June 09, 2005

Solaris, OpenSolaris - which to run?

With the imminent release of OpenSolaris, we have the prospect of one or more OpenSolaris-based distributions such as SchilliX as an alternative to Solaris proper.

Would I use such a distribution? Almost certainly not. Not that I have anything against an OpenSolaris-based distribution, but my primary interest in OpenSolaris is that it's the base for Solaris proper. As such, the "distribution" I'm most likely to run is something along the lines of Solaris Express.

So if I'm happy with Solaris Express, what benefit do I get from OpenSolaris? Well, quite a lot actually.

Simply being able to look at the source is an invaluable aid to understanding the behaviour of the system and debugging problems as they arise. Looking at the source, it can be immediately obvious what the problem is. (Or, equally, that the problem isn't where you first thought, so you can eliminate that line of enquiry.) Something that has frustrated me in the past is that I've often been able to make a fairly confident guess as to the nature of a problem, but haven't been able to confirm it. With source code, I can trivially check; without, I have to wonder whether to fight through the process of a service call. (Note that Sun should also benefit from its customers being able to do problem diagnosis themselves.)

With a true open source project, you won't need a service contract or have to log a support call to report a bug. If something isn't right, you'll be able to report it without any hassle. This is something I've wanted - even as a contract customer - for years. The overhead of raising a support call is sufficient that many smaller problems I spot go unreported. And the overall quality of Solaris will only improve if bugs get logged, so making them easier to report should lead to a general increase in quality.

Beyond the ability to report a bug or ask for an enhancement is the opportunity to fix the problem myself. Either for my own personal use, or to be put back into the main product. I'm not going to be writing kernel modules or filesystems or device drivers, but there are plenty of irritations in the standard commands that could be addressed.

My primary interest in all this is Solaris. Sun have decided that OpenSolaris is the development mechanism for Solaris going forward. As such, getting involved in OpenSolaris is a natural thing for me to do. Other distributions look to be a lot of fun, but (by their very nature as distinct distributions) aren't going to do the same things that Solaris proper does. So while I'm going to watch them with interest, I'm going to be actually running Solaris or Solaris Express.

Monday, June 06, 2005

Step by tiny step

In TuxJournal, Theo de Raadt says:

"The list of new developments is impressive, but in my view not nearly as impressive as the small little details that continue to be fixed during each development cycle.

Development of OpenBSD is not a milestone-driven series of revolutions. It is a series of small evolutionary steps which continue to become cleaner, tiny step by tiny step."


This is interesting, because it highlights the weaknesses I've seen in Solaris development. You see, the problem I have with Solaris is that it's great at the big things, but leaves minor imperfections untouched. Zones and DTrace are stunning features, and the general solidity and architectural integrity of Solaris is awesome. But Sun have historically spent little time and effort on fixing the tiny little flaws or irritations that are too small (for them) to bother about.

There is hope. With OpenSolaris just around the corner, we have the opportunity to scratch those itches ourselves and fix all the little things that have bugged us. I have my own list of pet peeves ready and waiting to be fixed.

Saturday, June 04, 2005

StorageTek Buy - ???

So Sun buys StorageTek.

Huh? I just don't get this.

There's not an awful lot of overlap in product terms. Sun already resell StorageTek kit to fill in the gaps in their storage line. (And if you take the 3rd party stuff out, Sun's product line is full of gaps.) So I don't see much consolidation of product and hence cost reductions.

Also, I can't see how StorageTek kit is going to fare when sold into environments containing other vendors' kit. As I see it, an independent vendor is more credible than one owned by a competitor, so StorageTek/Sun is going to sell less kit into other environments; even if it sells more kit into Sun environments, the result is a net loss.

I don't see any organisational benefits from the merger, and I can't see it helping sales.

So why buy StorageTek at all? Why shell out the cash? (And it's a lot of cash!) The Fujitsu (SPARC chip) and Hitachi (high end storage) deals solved a number of problems without dipping deep into their pockets.

Personally, if I had all that cash burning a hole in my pocket I would have gone after some of the smaller high-tech companies developing emerging technologies - areas like InfiniBand or next-generation ethernet, for example. Or innovative system builders, as with the Kealia acquisition.

But having said all that, Sun's track record in making a success of the assets it's acquired isn't good. Time will tell if the Kealia bet works out (so far, it looks good but is taking its time to mature), but I can't think of any others that haven't sunk without trace.

Thursday, May 26, 2005

Solaris and GUI management

Following up on this comment, which gets its roots from some other thoughts.

I've heard this criticism of Solaris elsewhere - it's a fairly common comment.

Is it justified? Well, yes and no. Let me take this on a bit further.

It's true, without question, that Solaris is largely lacking in GUI management tools. And generally I would have to say that those that are present are pretty poor.

So the criticism is justified? Well, not quite.

One reason that Solaris doesn't have any of that fancy management crap is that quite simply it doesn't need it most of the time. The CLI interfaces are pretty straightforward, and if you want to ignore those then just editing a few files won't tax anybody - the files used are in well defined locations with well defined contents, and all this is pretty stable. I've managed Solaris systems for well over ten years and never in that time felt the need for anything more than the tools that are provided.
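To give a flavour of what I mean, here's a minimal sketch - the user name is made up, and the service FMRI is the stock Solaris 10 sendmail one:

# create a user, home directory and all (fred is hypothetical):
useradd -m -d /export/home/fred fred
# turn services on and off under SMF:
svcadm disable svc:/network/smtp:sendmail
svcadm enable svc:/network/smtp:sendmail
# and the flat files are where they've always been:
vi /etc/hosts

Nothing there that needs a wizard.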

In that sense, Sun's existing customer base - myself included - must take some of the blame for Sun's failure to provide fancy management tools. We (I know I have, and I know other customers with the same response) have repeatedly told Sun that we don't want dodgy GUI management tools. Given the negative response, is it any wonder that Sun haven't produced much in the way of groundbreaking works of art in this area?

(On the other hand, my experience of trying to do essentially trivial things with networking under RedHat was very frustrating. I can see why RedHat admins need tools to help.)

One other thing: if you're spending so much time doing trivial things to your systems that the GUI-or-not argument makes any difference, then there's something fundamentally wrong with your admin framework.

OK, so the criticism is completely unjustified? Well, not quite.

The reality is that there are some great graphical admin tools out there, and room for more. But there is precious little for the inexperienced or part-time admin. And if Solaris wishes to expand out of its datacenter roles, it must cater to wider markets.

Security Awareness for Ma, Pa and the Corporate Clueless

Check this out: Security Awareness for Ma, Pa and the Corporate Clueless.

One thing I've often wondered, though: is the Wintel platform really as bad as it's made out to be?

I mean, I'm a Solaris guy, and I know that Wintel is bad. It's obvious that the cost is way higher than that of our Sun/Solaris setup, and the reliability/availability way lower.

But the idea that home users have to spend hours per week maintaining their PCs just sounds crazy to me.

I'll admit here that I have a PC at home. It's a reasonably modern, broadband connected PC, running Windows XP. Nothing special about the setup - it gets the latest service packs and patches applied regularly and promptly. But it has no third-party security software of any kind on it. No anti-virus, no firewall other than the basic one that XP now provides.

Is my machine infested with trojans, spyware, and viruses? I don't think so. I occasionally check, but we don't see any problems, and everything runs pretty solidly.

Am I just lucky?

Probably not. You see, there are a couple of things that I (and perforce the rest of the family) do to keep it that way.

Rule 1. No Internet Explorer. Period. Total. It's not the default browser, and everything except windows update is forced to run at the highest security setting I can think of. No icon on the desktop. Depending on personal preference, we run either netscape or firefox.

Rule 2. No email. Period. Total. You want email, you ssh into a real computer and read it from there. I'm still a ucb mail person, but the wife uses pine. Whatever, there's no chance of any garbage getting through onto the machine.

Rule 3. (OK so I can't count.) No network clients other than decent browsers and ssh. No music, chat, online services, none of that rubbish.

Given that, I've found Windows XP to be an adequate, and largely trouble-free and useable, environment.

Of course I would rather run Solaris on it, but (at least last I heard) you couldn't get Flight Simulator, Zoo Tycoon, Age of Empires, or Rise of Nations for Solaris.

Friday, May 13, 2005

Welcome Dowstone

A big welcome to Dowstone Limited!

One of the directors is Gary Pennington.

(This blog is delighted to be listed among Gary's friends. That's a real honour.)

I worked with Gary for about a year - he was unfortunate enough to be my support engineer for the Solaris 10 platinum beta program. Which means he's used to a dumb systems administrator trying to do stupid things with Solaris!

I wish Gary and Alan well in their new venture.

Friday, May 06, 2005

Great Birthday Present

I know it's off-topic, as I have a separate personal blog, but...

For my recent birthday Mel bought me the Season 2 DVD boxed set of Star Trek, The Original Series. So I'm just about to pop one of the disks into the DVD player, and watch this blog's namesake episode.

(And then the other episodes, but I've got to watch them Tribbles first!)

Tuesday, April 26, 2005

Simple DTrace

One of the headline features in Solaris 10 is DTrace, allowing you to probe the inner workings of a system in more detail than ever before.

I'm no expert, but I like some of Brendan Gregg's DTrace Tools.

In fact, my favourite so far is execsnoop.

(I think this says rather more about the sort of activity on my systems than anything else. We don't run significant databases or servers; many systems run random junk. Desktop use; development use; loads of badly written shell scripts. And I don't need DTrace to tell me that most of the compute applications are awful.)

So execsnoop tells me how badly written some of this scripting is.

The worst I've found so far is mozilla. The mozilla command isn't a binary but a wrapper script - almost 60 shell commands run before the actual mozilla binary is reached. And essentially all this scripting is completely pointless - the parameters being set are fixed and don't need to be worked out afresh each time you launch it.

Another interesting thing I spotted was uname being run when I logged in. This turned out to be my tcsh startup working out what sort of machine I was using - yet tcsh already knows exactly what sort of system it's running on: the OSTYPE and MACHTYPE environment variables tell you all you need to know. I knew this already, of course - but DTrace revealed that there was one place I had missed. (And also - in tcsh you don't need to exec any commands to set a dynamic prompt: tcsh has builtin variables you can use.)
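For the curious, something along these lines in your .tcshrc does the job without a single fork (the exact values of the variables differ between platforms, so check yours first):

# tcsh sets these itself at startup - no need to run uname:
echo $OSTYPE $MACHTYPE
# and a dynamic prompt needs no backquoted commands either:
set prompt = "%n@%m:%~%# "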

I've also found unnecessary duplication of work in various system monitoring shell scripts, and lots of simple cases of inefficient coding. The most common things I see are excessive calls to uname (often generic scripts finding out that they're using Solaris, which they ought to have known already) and excessive use of expr (either learn to iterate over $# correctly, or rewrite in a more advanced shell like ksh that can do arithmetic).
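To illustrate the expr point, a minimal sketch - the first line pays for a fork and exec of expr on every increment, while the rest stay in-process:

# the slow way - one expr process per increment:
i=`expr $i + 1`
# ksh (or any POSIX shell) does it without forking:
i=$((i + 1))
# and walking the argument list needs no counter at all:
while [ $# -gt 0 ]; do
    echo "$1"
    shift
done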

In short: try leaving execsnoop running and see what stupidities show up!
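And if you don't fancy fetching the whole toolkit, a rough approximation to execsnoop is a one-liner using the standard proc provider (run it as root):

dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'

Leave that running, start mozilla, and watch the scrollback.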

Friday, April 22, 2005

Can Sybase get it?

At ZDNet: CEO John Chen sees open source in Sybase's future:

Sybase has a free version of its high-end database for Linux

However, as I commented on the BTL blog entry, this comes with limitations. In fact, the same limitations I've complained about recently.

Come on Sybase! If you really want to drum up business, have the same deal for all operating systems - Solaris in particular.

As it is, we're just moving to MySQL, which is free on all platforms.

Envy

There's a slightly imperfect press release: Sun Announces Dual-Core Technology Across Entire x64 Server Product Line.

For one thing, Sun's x64 server product line consists of two models in its entirety, and even then they've only actually announced dual-core on the V40z. Ho hum.

I still want one!

One of the annoying aspects of working at an institute facing imminent closure is that we simply can't get access to any of the exciting new technology that's emerging. Hence the title, which describes how I feel right now.

Monday, April 18, 2005

Is Linux becoming Windows?

From ZDNet: Is Linux becoming Windows?:

Some people are starting to think so. There is support for so many features in the Linux 2.6 kernel that it may be getting so fat as to be unstable.

The part of the open source community that's seeing a problem is CA. Not the most obvious open source company. Mind you, they seem to be getting their fingers into a lot of pies at the moment.

Now, sure, I see the logic that says you can build yourself a custom kernel. But you shouldn't have to mess about like this - I still can't understand why Linux can't adopt a stable driver ABI and make every kernel feature a loadable module (and then support loadable modules for 5 years or more). After all, Solaris can do it, why can't Linux?
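On Solaris the modular kernel isn't theory; you can watch it on any box (which modules you'll see will vary, and obviously don't go unloading anything you're using):

# list the kernel modules currently loaded:
modinfo | head
# say, the nfs modules - modload(1M) and modunload(1M)
# move modules in and out at runtime, no reboot needed:
modinfo | grep nfs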

Actually, I don't think it's restricted just to Linux, or to the kernel. I see other open-source projects getting to be more like Windows, in terms of bloat and complexity - modern desktop environments spring to mind.

Oh great...

Yikes: Mozilla flaws could allow attacks, data access | Tech News on ZDNet:

Multiple vulnerabilities that could allow an attacker to install malicious code or steal personal data have been discovered in the Mozilla Suite and the Firefox open-source browser.

Oh dear, let's get downloading right away!

Are upgrades viable?

In an article: Choosing an upgrade path from Windows 98 the author describes how to:

...give a new lease on life to aging laptops and PCs by replacing obsolete OSes such as Windows 98 with a combination of Linux, free open source applications, and inexpensive commercial software.


OK, good idea. But how viable is this in practice? The argument goes that they aren't good enough to run Windows XP, but make brilliant machines to run Linux.

Frankly, I don't see this working. My experience of all modern desktop environments and applications is that they're bloated with significant memory and CPU requirements. We're talking about using applications like openoffice, crossover office, KDE, Gnome - and I wouldn't want to run any of them on an ancient machine.

As I see it, new hardware (or even nearly new) is dirt cheap, and the cost of making the old stuff work, and supporting it, and the time lost due to it being slower than a new machine, makes the idea of trying to re-use old machines financially unattractive.

Bad Journalism - Janus non-article

In this article, it says (about Janus):

Customers who want the stability and security of the Solaris Operating System and the flexibility to also use Linux applications won't have to wait much longer.

So, where is it? Now, I wouldn't mind having Janus available - and soon - but at the present time I haven't actually seen any sign of Janus.

Where on earth did this article come from anyway? It looks like a straight ripoff of a 6-month old Sun feature story, without correct attribution, and makes it out as if it's current news.

(Mind you, over 6 months after that Sun article appeared, still no Janus.)

Thursday, April 14, 2005

Pssst... Free Linux! Only $799!

Paul Murphy has a story: Pssst... Free Linux! Only $799!. Note that the point he's making isn't against Linux, or even Red Hat - it's against suppliers of hardware and software that essentially force you into paying for an expensive Red Hat license that you don't want.

The nub of it is:

If your application vendor only supports one of the Red Hat enterprise editions and this obligates you to pay at least $799 for your first year, is it still free?

Of course not, and remember - it's not Red Hat's fault. It's the other vendor adding a bad dependency that is the problem.

We have a system bought to run a particular piece of commercial software. This software used to run on SGI boxes running IRIX. And any old SGI would do. But our O2s were getting a bit long in the tooth and not really up to it, so they offered to let us transfer the license to Linux. Which is where the trouble began. The vendor's spec for the machine didn't put it in the bargain basement category, and we had to get a Red Hat Workstation License - not cheap - and a fancy quadro card - not cheap either.

Even worse is that we had to use Workstation 2.1. The application just refused to work - point blank - under 3.0. (What is it with Linux compatibility between releases? Don't the distribution builders or application suppliers care? I have applications running flawlessly under Solaris that are 15, 20 years old. God I love Solaris.)

Which is another common point - many commercial applications seem to want a Linux version that is, to put it nicely, antique. Red Hat 7.2 is pretty common. I don't understand why this is. Is it that it doesn't work under newer releases (often, yes, unfortunately)? Can't they be bothered to get it working under something newer (who knows)?

(This phenomenon isn't restricted to Linux. We've had the same problem in the past with commercial applications under Solaris not supporting current versions. Or even Sun stupidly not supporting their own hardware under current Solaris [think back to the disaster of the Sun Blade 1500 and Solaris 9 - ours sat in their boxes for 6 months because Sun couldn't be bothered to get Solaris 9 running on them - and that after delivering them months late]; or Sun not supporting their own products on Solaris 10 yet [or Solaris x86] - think SunRay.)

Overall, this is the biggest beef I have with commercial application software. Not its quality, or price, or anything religious about licensing. Simply the fact that they force you into a straitjacket when it comes to configuring your system, and that hurts.

Tuesday, April 12, 2005

More dubious methodology

In this article: Linux servers praised for security - ZDNet UK News we discover how they worked out this fascinating conclusion:

Over 6000 software development managers were asked in a survey conducted by BZ Media to rate the security of server operating systems

Oh great. So, being that software development managers obviously know all there is to know about operating system security, we can all sleep soundly in our beds knowing that reports like this are based on expert fact.

Or was it clueless opinion?

(I have nothing against software development managers. If I needed someone to manage software development, that's precisely who I would likely turn to. But the more I discover about the methodology used in some of the reports currently appearing, the more I treat those reports as a joke.)

Industry to adopt open source constitution - vnunet.com

There's some coverage on vnunet.com of the CA plan to simplify the open source licensing nightmare.

It's not clear to me how the CA plan will necessarily work. Sure, it's nice that they're thinking about something like the CDDL as the foundation, but how will an infinite number of variations of the CDDL help?

The article also contains the following misleading statement:

To deal with those issues, Sun Microsystems has created the CDDL for the release of the Solaris 10 source code, and Computer Associates formed CA-TOSL when it released its Ingres database last year.

But this has led to a proliferation of open source licences and caused confusion with both end users and developers.


Let's be absolutely clear here: the CDDL and TOSL didn't create the license proliferation problem - it already existed long before they came along. The fact that new licenses needed to be created is symptomatic of the problem, and the CDDL is explicit in identifying the problem and taking steps to address it. Blaming it for being the cause simply won't wash (but then, when did shooting the messenger ever go out of fashion?).

IT Observer - Mozilla: The Honeymoon is over

According to IT Observer Mozilla: The Honeymoon is over. One snippet:

But then it may be asked is it really within the remit of a browser to guarantee Internet security. Are we asking too much? We don’t expect our browsers to block viruses, spyware or malicious scripts so why should we have such high expectations for their security capabilities.

It's not a case of guaranteeing security. I expect security by default. And I don't expect my browser to block viruses, spyware or malicious scripts - I expect that a web browser should be immune to them, so that blocking is irrelevant.

I don't often get decent security by default, mind you.

On my home PC, with Windows XP, I used IE for a couple of days when I first got it (just until I got round to installing Netscape and, later, Firefox). And just a couple of days using IE was enough to persuade me never to do so again. Something that will let a web site randomly install software on my machine without even telling me has no place there. I currently have IE set up so that everything except windows update runs at the highest possible security setting, and it isn't the default browser anyway. Since then, I've been trouble free. (And I don't read mail on my PC with anything - I ssh onto a Sun box and use good old ucb mail for that.)

Are the unix variants maintaining the high ground in terms of security? On the server side, I could honestly argue that they are. I'm not at all sure that this is true on the desktop, though. The problem I see here is the increasing complexity of desktop environments, with tight integration and extra services opening up new avenues of attack (or the same sorts of avenues that have been present on Windows for some time).

Monday, April 11, 2005

Dubious Testing

I was reading an article: Study Finds Windows More Reliable than Linux.

One thing that caught my eye was the testing methodology:

During the test, VeriTest also initiated a series of events that broke or disabled various system services in the administrators' test environments, which remained down until they were fixed by the administrators.

and then the conclusion is that it took longer to fix the Linux system than the Windows one.

The staggering thing - to me - is the idea that systems breaking down is normal. I'm sure we must have service failures, but they're incredibly rare on my Solaris machines. In fact, so rare that I'm really having trouble trying to think of one that wasn't a direct and obvious result of hardware failure. The Linux machines seem to need a kick once in a while, but the Windows machines generate a constant stream of calls along the lines of "help - my machine's stopped working! again!".

It's not just how quickly problems can be fixed, it's how often they crop up. (Both MTBF and MTTR enter into the equation here.)

The fact that Windows gets repaired quicker may simply be a reflection of the fact that Windows admins have more practice fixing problems...

One example from personal experience. We used to have a couple of RS/6000 machines running AIX. These were astonishingly reliable. (They were a pain in the neck because, while they were really fast, most applications we ran on them had to be ported, so we had to have a dedicated person to not only do user support for those applications, but also to port and test them. So when he left we had to move the applications onto Suns, where they compiled and ran without any effort. But I digress.) So reliable, in fact, that I had to log in to them maybe once a year to do some minor housekeeping. The fact is, I got so little practice in looking after them that I was starting afresh from my manual and course notes every time, and there was a slight delay before I found the right place in SMIT (and it's not as if the AIX commands are identical to Solaris).

And, of course, if the systems in the test were running Solaris 10, the chances are that SMF would have silently fixed the problems in no time at all.
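You can demonstrate this on any Solaris 10 box (assuming the default restarter policy, and that you can live without syslog for a second or two):

# kill a daemon that SMF manages:
pkill -x syslogd
# moments later the restarter has already brought it back:
svcs system-log
# and anything genuinely broken shows up with an explanation:
svcs -x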

&@#^ serial connections

If there's one thing I hate it's serial connections and the monstrosity known as RS-232.

Whoever invented this beast I don't know, but the whole thing has caused me no end of trouble over the years.

For starters, there's the question of finding something that's got the right connector on it. OK, so there are reasonably common connectors like 9-pin and 25-pin, but then you have male and female, and Sun have at times combined two serial ports into one so you need a splitter. Modern systems might have RJ-45 connections. And then is it straight through or crossover? Whatever, it's a matter of luck if it ever works. Or sacrificing an intern or two.

But then there's older stuff - this morning we (ourselves plus Sun engineer) were trying to connect to the serial port on an A3500 controller, which has a 15-pin gadget that looks suspiciously like an old AUI ethernet port. Apparently there's a special cable that you can get if you are ever unfortunate enough to need to connect to the serial port on an A3500, but we don't have one. (For all I know, Sun might just have the one cable - I can imagine it being passed from engineer to engineer like an ancient relic, and who knows whether it was blessed or cursed.) So the A3500 is still sick.

Still, it could be worse. I reckon I'm cursed when it comes to serial connections, printers, and scanners. Trust me on this: you really wouldn't want me going anywhere near a fax machine!

Friday, April 08, 2005

Welcome Stephen Harpster

As Ben Rockwood noted, Stephen Harpster has started blogging.

Stephen noted in his blog that there had been some concern over his taking charge of OpenSolaris. I spoke to Stephen in San Francisco a month ago and, like Ben, I came away confident that Stephen is one of the good guys and that OpenSolaris is in good hands.

(Mind you, I would have loved to have been able to get back to SF for the CAB launch and OSBC. I really envy those guys who are on the spot.)

Sun shooting themselves in the foot

So Sun recently announced that they're going to restrict access to certain sunsolve features and the system handbook to contract (paying) customers only.

Frankly, this is daft. Sunsolve is a valuable resource - not least to potential customers, and Sun is effectively telling them to take their custom elsewhere.

Not only that, I'm a paying contract customer and I'm locked out. What on earth are they trying to do here?

Thursday, April 07, 2005

Sun VP Tom Goguen Discusses Evolution of OpenSolaris

So Tom Goguen was interviewed about OpenSolaris.

OK, so I thought I would take some of the questions and give some brief answers of my own:

LinuxInsider: What expertise do the Advisory Board members bring to the OpenSolaris initiative?

It's sort of difficult to answer this. We're heading out into uncharted territory, but I think we've got a good mix here. A Solaris insider, a Sun open source advocate, and a couple of community members of wide experience and great enthusiasm. I think it's the variety of expertise that's important - we really don't know how OpenSolaris is going to develop, or exactly what the role of the CAB is, so we'll just have to get some smart guys and see how it plays out.

My own view here is that Roy Fielding is going to play a crucial role in defining the character of the CAB and, by extension, the operation of the OpenSolaris community.

LinuxInsider: Do you see strong community support behind OpenSolaris initiative today?

Oh yes. And I'm surprised how varied it is. We have the usual suspects from the Solaris community, but also significant and active involvement from outside the traditional Solaris base.

LinuxInsider: Analysts have said that one of the hurdles the Advisory Board will face is making it easier for developers with a computer science background and no prior Solaris coding experience to actually do a Solaris build. How will you get over that hurdle?

I would like to know which analysts said this sort of thing, because doing a Solaris build is easy. Plenty of pilot members successfully built and installed OpenSolaris as soon as we got our hands on the code.

I don't have any particular Solaris coding experience either. Having worked with bits of the code, I've found it very easy to understand what's going on and to make modifications. There's a lot of code here, though, and it takes a little while to work out how the whole fits together. But that's true of getting to grips with any piece of source you're not familiar with.

Wednesday, April 06, 2005

Job Hunting

Later this year (August 24th to be precise) I'm going to be out of a job.

This isn't coming as a surprise. The capricious decisions of funding bodies are well known, and it's been 18 months since it first became clear that the place I work wasn't going to get funded.

So what to do? The severance package isn't to be sneezed at, and I haven't yet seen anything sufficiently attractive to be worth turning redundancy down. So I'm going to hang on till the end - and expect to be busy keeping systems running while decommissioning as much as possible and setting up those researchers moving elsewhere.

Family commitments limit me to Cambridge (so we're not moving or emigrating), or to within an hour or so's travel - which could include some parts of London. Fortunately the area isn't a technological wasteland. And I don't mind travelling, using home as a base.

I really want to stick with Solaris, at some level. It's what I know, and what I enjoy. And I'm reasonably confident that some part of the University or some research group will need someone like me.

Should I stick to regular employment though? Or should I think about consultancy?

At least, with a redundancy package, I have the luxury of being able to wait a while rather than having to take the first thing that comes along just to pay the mortgage. Or I can try something different and be able to pull out if it turns out to be a mistake.

So, if anyone reading this has an interesting Solaris/OpenSolaris project that could use someone in the Cambridge (UK) area later in 2005, let me know!

Tuesday, April 05, 2005

OpenSolaris CAB

Sun have announced the members of the OpenSolaris Community Advisory Board.

It looks good. Solid. Professional. Nothing too flashy. Basically, I trust these people.

And I can claim to have met two of them, even. Casper briefly at a Solaris 10 meeting over a year ago, and Rich in a former life of his through SunService here in the UK.

Simon Phipps has talked about the process the CAB is going to go through. This is uncharted territory, and these guys are going to steer us through it.

I think Sun have done well with a difficult balancing act. Clearly, as the foundation for Solaris, there needs to be some measure of control by Sun on the way that OpenSolaris develops. And yet they are absolutely committed to opening up. In fact, many people I know are really concerned that Sun are giving up too much control and that the core values of Solaris will suffer. I don't think that will happen - most of the Solaris and OpenSolaris community share the same values, and so mainstream OpenSolaris will keep many of those values.

Meanwhile, there will undoubtedly be criticism from outside that Sun have chosen the majority of the CAB, and that the community members are pretty partisan too. And that's fair enough. But remember: OpenSolaris is open source, and anybody is free to take it and set up a project of their own outside of the CAB's governance if they so choose.