Monday, January 30, 2006

Why I use Windows

I have a confession to make.

My main home computer runs Windows XP. Now, I'm a Solaris guy, so you would expect me to have banished Windows, but no: most of the time when I use a computer at home, it's the Windows box that gets the nod.

There are reasons for this renegade behaviour. The two main ones are Windows' ability to do suspend-to-RAM and fast user switching. Essentially, if I just want to check my email quickly, then either the Windows machine is on already and I just have to switch user, or I hit the power button. In both cases I'm online and working in about the time it takes me to get the seat positioned comfortably.

Solaris can't match this. If I'm working at home then the Sun box comes on, as the superior behaviour and environment are worth waiting 10 minutes for. The boot time is awful; the JDS startup time is awful; suspend/resume is only available on SPARC and is very much hit and miss. If we're after wider adoption, then this is a huge area we have to address.

newboot 2 - nvidia failure

OK, so having had great success upgrading my home W2100z to Solaris 10 Update 1, I tried it on my work machine.

Now, I wasn't expecting this to be entirely trouble free. My work machine gets all sorts of abuse, with various test versions of anything that might be lying about - so before it started I cleaned up an old zfs beta release, deleted all the different sorts of backup software I had been trying, and tried to clean up and back out all the hacks and kludges.

In the end, none of the problems I was anticipating surfaced. But when it came back up - no graphics. Looking at the /dev/nvidia* and /dev/fb* entries, they looked very suspect.

Turns out this is a known problem with the NVIDIA drivers and the S10U1 upgrade. And it was essentially what I had surmised by looking at the device entries, so a quick deinstall, reinstall, clear up /etc/path_to_inst and a couple of reconfiguration reboots later and everything's back in business again.

Phew!

Saturday, January 28, 2006

newboot

Got my Solaris 10 1/06 DVD kit yesterday. So popped the x86 DVD into my W2100z (running Solaris 10 FCS) and let it get on with it.

Upgraded without a hitch. Everything looks good!

The boxed set includes quite a lot of stuff, so I also updated the freeware, and added the Studio 11 compilers. I'm leaving the Java Enterprise System for another time - I'm not going to risk playing with it on my main personal machine!

One interesting thing - I've got a decent monitor (a Sun 21 inch CRT) on this machine, and it had been running at 1280x1024. I hadn't really had time to investigate tweaking Xorg to get it to run at higher resolution. After upgrading, it came up on its own at 2048x1536. This was a little over the top, so I dropped it back to 1600x1200, which is the max recommended anyway, and runs at 85Hz rather than 75Hz, and makes the text a little more readable.

Everything still works; all I need now is a little more time to get back into development work.

Thursday, January 26, 2006

Buried in good stuff

The good stuff keeps on coming.

OpenSolaris now has even more communities - particularly for sysadmins and appliances - to follow. As if approachability, networking, observability, zfs, and zones weren't enough. And that's only half the communities I'm interested in!

What I really need is a time machine so I'm able to keep track of it all.

Tuesday, January 17, 2006

Enemy action?

According to Auric Goldfinger:

Once is happenstance. Twice is coincidence. Three times is enemy action.

One: panic[cpu1]/thread=fffffe80fd47aea0:
BAD TRAP: type=e (#pf Page fault) rp=fffffe8001773c70 addr=0 occurred in module "genunix" due to a NULL pointer dereference

Two: panic[cpu0]/thread=fffffe80f83a5de0:
BAD TRAP: type=e (#pf Page fault) rp=fffffe80010e8bd0 addr=0 occurred in module "unix" due to a NULL pointer dereference

Three: panic[cpu1]/thread=fffffe8000f56c80:
BAD TRAP: type=e (#pf Page fault) rp=fffffe8000f56450 addr=0 occurred in module "conskbd" due to a NULL pointer dereference

That's 3 failures on my desktop machine in just over a month. For those interested, the stack traces are:


stack pointer for thread fffffe80fd47aea0: fffffe8001773cb0
fffffe8001773d70 0xffffffff8a965940()
fffffe8001773da0 port_remove_done_event+0x4b()
fffffe8001773e10 port_associate_fd+0x2b8()
fffffe8001773ec0 portfs+0x303()
fffffe8001773ed0 portfs32+0x24()
fffffe8001773f20 sys_syscall32+0xd9()


stack pointer for thread fffffe80f83a5de0: fffffe80010e8c10
fffffe80010e8d10 tcp_close+0xff()
fffffe80010e8d50 qdetach+0x84()
fffffe80010e8dc0 strclose+0x3e4()
fffffe80010e8e00 socktpi_close+0x12b()
fffffe80010e8e30 fop_close+0x2a()
fffffe80010e8e60 closef+0x62()
fffffe80010e8ec0 closeandsetf+0x249()
fffffe80010e8ed0 close+0xb()
fffffe80010e8f20 sys_syscall32+0xd9()


stack pointer for thread fffffe8000f56c80: fffffe8000f56330
fffffe8000f56c80 5()


Huh?

Wednesday, January 11, 2006

This can't be serious...

I've often wondered why Solaris patches take so long to apply.

I'm still wondering, after updating the Java patch (118668, for the technically minded). OK, so it's a big patch, but I've got a high-spec dual Opteron W2100z so it should be done in the blink of an eye.

Or maybe not. I decided to back out the old revision first. (This is one of those foibles I have: I tend to apply patches regularly on test boxes, so they get every revision, and I have a habit of backing out old revisions to keep things clean.) The backout took 10 minutes; adding the new version took 5. That's 15 minutes on one of the fastest machines around. Ouch!

One thing I did notice is that the patch backout was writing at an average of 30Mbytes/s for most of that 10 minutes. Overall, I reckon that I had about 15Gbytes of disk writes. Why on earth?????

Clearly it's up to something very clever here.

(For comparison, simply installing that version of Java - which is about 135Mbytes - generated about 200Mbytes of disk writes and took about 20 seconds. And some of that is accepting the license, unpacking and verifying the archive, and building the jars. Clearly there's some scope for improvement.)
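A quick back-of-the-envelope check of those figures, as a sketch only. The numbers are the ones quoted above; the function names are made up for illustration, and 1 Gbyte is assumed to be 1000 Mbytes.

```python
def mbytes_written(rate_mbytes_per_s, seconds):
    """Total data written at a constant rate, in Mbytes."""
    return rate_mbytes_per_s * seconds


def seconds_at_rate(total_mbytes, rate_mbytes_per_s):
    """How long it takes to write total_mbytes at a constant rate."""
    return total_mbytes / rate_mbytes_per_s


# Upper bound: 30 Mbytes/s sustained for the entire 10-minute backout.
backout_upper_bound = mbytes_written(30, 10 * 60)

# The 15 Gbyte estimate (assuming 1 Gbyte = 1000 Mbytes) corresponds to
# this many seconds of writing at the full rate.
backout_estimate = 15 * 1000
busy_seconds = seconds_at_rate(backout_estimate, 30)

# The plain install: about 200 Mbytes written in about 20 seconds.
install_mbytes = 200
install_rate = install_mbytes / 20

# How much more the backout wrote than a fresh install did.
write_ratio = backout_estimate / install_mbytes

print(f"backout upper bound: {backout_upper_bound} Mbytes")
print(f"15 Gbytes at 30 Mbytes/s: {busy_seconds:.0f} s of writing")
print(f"install write rate: {install_rate:.0f} Mbytes/s")
print(f"backout wrote {write_ratio:.0f}x what the install wrote")
```

So the 15Gbyte figure is consistent with writing flat out for a little over 8 of the 10 minutes, and the backout wrote roughly 75 times as much data as installing the same software from scratch.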

Software Stacks

I liked the idea that Ben Rockwood came up with - SIDEkick. Essentially, it's a complete software stack, in this case for a PHP-powered, Postgres-backed web server, in a single file.

This is along the lines of my simplicity rant the other day. What I want is things set up ready to go.

I do this myself, for most of the projects I get involved in. For a project, I build up a software stack that contains all the components necessary, with an install script that does the work - and not only installs the stack but can also put project-specific customizations in place, and install data if required. It's largely self-documenting as well, as the install script contains all the tweaks and kludges I needed to get the thing to work.
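A minimal sketch of the shape such an install script might take, purely for illustration. Everything in it is hypothetical: the component names, the `Component` type, the `tweak` hooks and the `/opt/mystack` prefix are invented, and the real work of unpacking and configuring is replaced by a dry run that only lists the steps.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Component:
    """One piece of the stack. All names used here are hypothetical."""
    name: str
    required: bool = True
    # Optional project-specific tweak applied after the component is
    # installed. Keeping tweaks in the script is what makes the install
    # largely self-documenting.
    tweak: Optional[Callable[[str], str]] = None


def install_stack(components, prefix, include_optional=False, data=None):
    """Return the ordered list of steps a real installer would perform.

    This is a dry run: nothing is unpacked or written to disk.
    """
    steps = []
    for comp in components:
        if not comp.required and not include_optional:
            continue  # not every project needs the full set
        steps.append(f"install {comp.name} into {prefix}")
        if comp.tweak is not None:
            steps.append(comp.tweak(prefix))
    if data is not None:
        steps.append(f"load data from {data}")
    return steps


# A made-up stack definition; a real one would list actual packages.
STACK = [
    Component("webserver", tweak=lambda p: f"customize {p}/conf for this project"),
    Component("database"),
    Component("scripting-language", required=False),
]

if __name__ == "__main__":
    for step in install_stack(STACK, "/opt/mystack", data="project-data.tar"):
        print(step)
```

The point of the design is that the stack definition and its tweaks live in one place, so reading the script tells you what was installed and what had to be changed to make it work.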

The full stack actually has quite a bundle of components. The primary ones are:

I still use Apache 1.3.X, due to problems I've had with 2.0.X and 2.2.0 not working right. Of course, not all projects require the full set.

As an example, I've used this software stack to enable easy deployment of DSpace, in addition to basic web servers and complex SOAP application servers.

Which raises the valid question - why build the stack myself?

There are several reasons. Being self-contained is a pretty good reason, all on its own - having everything in a single bundle with all dependencies means you don't have to worry about how to integrate it with some other component that may or may not be installed, may or may not be the right version, and may or may not be configured compatibly. It just works. You make yourself independent of other suppliers, who may change things underneath you without your knowledge. And it fully documents and supplies your requirements, so that someone else can come along later and not only understand what you did but also have all the bits already at hand to reproduce it.

What I would like to see is more lightweight software stacks developed - in a modular fashion so that I can just click a link and get a single file that gives me a well-defined area of functionality. Big frameworks are all very well, but there's an awful lot of complexity due to the generality that comes with a full framework, and that gets in the way of actually doing work.

This also means that I don't really want - at all - bits of the stack bundled with the operating system. These are just a pain and get in the way. I've nothing against OS suppliers having the full software stack available, but please compartmentalize it so I can safely ignore it.

Monday, January 09, 2006

Licensing Complexity

Following on the heels of my rant about keeping software simple, here is another one:

Why are licensing schemes so horrendous?

A while ago, I was looking at backup solutions. I've just started to get costings together. What a nightmare!

Really, how hard can this be? But then there are different tiers of servers, prices are different for different platforms, some of it's done by volume (how many terabytes can you afford?), and tape libraries are priced by capacity in several dozen steps. There are different base products, with no explanation of the differences.

(This sort of thing screams out for an online order form where you just tick the boxes and it puts together an order for you. How do I know whether I've actually chosen the right options from 6 densely packed pages?)

Friday, January 06, 2006

Updated NVIDIA drivers [again]

Dang. I'd only just noticed the latest Solaris NVIDIA driver release, and they've gone and updated the drivers again.

Start out Simple

One of the failings of modern IT infrastructure is that it's far too complex. Individual components are complex, and have complex interdependencies with other complex components. And some of us have to make sense of this mess every day.

Sometimes, complexity is unavoidable. That's OK. If it's a complex problem, then I expect some level of complexity in the solution. It would be better if the solution were simple, but we can't always have what we want.

Sometimes, new technologies come along that radically simplify the way that things are done. ZFS is one recent example - it takes away whole layers of complexity. But, as a rule, things get more complex over time as layer upon layer of cruft is added.

While you may need something big and complicated to solve the big and complicated version of a problem, does that mean that you need to be equally big and complicated to solve the little version of that problem? It seems that, all too often, you do need the big complicated version - with all its attendant hassles - to solve the little problems. Or, at least, that's what we end up using.

I've always been opposed to this approach. I've always been in favour of starting off small and simple. I want to get something working, without delving deep into an impenetrable morass of configuration and tuning. Then, having got that to work (and, more to the point, having understood it), I can build on that foundation.

I've spent some time playing with the Java Enterprise System. And you know what? It's way too complex and hard to get into. I'm sure it can do wonderful things, but before it can do wonderful things I want it to do something. Anything, really - just to give me the sense of accomplishment that keeps me going to the next stage. I really can't see JES getting that much of the market, simply because most admins and organisations simply don't have the time and energy to invest in making it work at all.

Part of this is ease-of-use, but it's slightly different. It's really about the ease of getting started. And that's what complex technologies need to supply: an easy way in, to allow potential users to get started.