Monday, September 03, 2007

X2200M2 fault diagnosis

I'm just setting up some new servers: Sun X2200M2s (running Solaris, of course).

One wasn't happy. Solaris would start to boot via PXE, but then the system would reset, over and over, in a loop. A quick look showed the fault light was on. Not good.

These systems have remote management (ILOM), but after searching through the documentation I couldn't see how to persuade it to tell me what was wrong. It doesn't help that there are several ILOM variants in use, and the one on the X2200M2 is one of the more basic ones.

After running through all the options systematically, I stumbled across:

show /SP/AgentInfo/SEL

which tells me

Nonrecoverable ,2007/08/31 17:22:00 ,CPU1 DIMM 3 has multi-bit error
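If the SEL has more than a handful of entries, it can help to pull the failing DIMM out programmatically. Here's a minimal Python sketch. It assumes every line follows the comma-separated "severity, timestamp, message" layout seen in the single entry above, which may not hold for every event type or ILOM version. The names `SelEntry`, `parse_sel_line` and `DIMM_RE` are my own hypothetical ones, not anything ILOM provides.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical pattern for messages like "CPU1 DIMM 3 has multi-bit error".
DIMM_RE = re.compile(r"CPU(\d+)\s+DIMM\s+(\d+)")


@dataclass
class SelEntry:
    severity: str
    timestamp: datetime
    message: str
    cpu: Optional[int]   # None if the message doesn't name a CPU/DIMM
    dimm: Optional[int]


def parse_sel_line(line: str) -> SelEntry:
    # Assumed layout: "severity ,YYYY/MM/DD HH:MM:SS ,message".
    # maxsplit=2 keeps any commas inside the message text intact.
    severity, stamp, message = (part.strip() for part in line.split(",", 2))
    timestamp = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
    match = DIMM_RE.search(message)
    if match:
        cpu, dimm = int(match.group(1)), int(match.group(2))
    else:
        cpu, dimm = None, None
    return SelEntry(severity, timestamp, message, cpu, dimm)


if __name__ == "__main__":
    entry = parse_sel_line(
        "Nonrecoverable ,2007/08/31 17:22:00 ,CPU1 DIMM 3 has multi-bit error"
    )
    print(entry.severity, entry.cpu, entry.dimm)
```

Feed it the text of `show /SP/AgentInfo/SEL` one line at a time and filter on `severity` to find which DIMM slot to look at first.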

Aha! I've reseated the DIMM after swapping it with its partner, and the installation is proceeding apace.
