Monday, January 19, 2009

Hardware sucks

It sucks in many ways. The X4240 is sucking manpower like mad, sucking away our reputation with our customers, and sucking my patience to the limit.

The latest:

PCI kernel panic and reboot! While testing with the Myricom CX4 card (long story why this card), here is what happened:

fault.io.pciex.device-interr dev:////pci@0,0/pci10de,377@a/pci14c1,9@0 faulted but still in service

And this is what is at that PCI location:

/pci@0,0/pci10de,377@a/pci14c1,9@0

So: big problems. We can't bring this hardware combination into production, and we can't wait either, so we will have to roll back to our older PCI-X platform.

On a completely different server, also an X4240, the BIOS reports only 60GB of memory at POST with 64GB installed. After the SUN tech swapped the memory, the problem remained. He then attempted to swap the motherboard, but the "new" motherboard failed to boot. We are still waiting for a resolution.

So that's yet another server, for yet another customer, that can't go into production.

Oh my. I have often been accused of pulling the trigger too fast on switching vendors when things like this happen. I have been told that these things happen and we just have to work through them.

There is something to that. In the last five years I have tried the following vendors and moved away from each of them due to intractable problems:

HP, IBM, Dell, Supermicro, Tyan and now SUN.

Is this the way it is? Am I cursed? Either way, it sucks.

p.s. As if this were not enough: on our Thumper platform, the X4500, we have 6 in production as iSCSI servers. These have been in service for less than a year. That adds up to 288 1TB SATA drives. We just had our second drive failure! That's a 0.7% failure rate, and these haven't even completed a full year! It's a good thing we planned on multiple sources of redundancy here.
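For anyone checking the 0.7% figure, here is a minimal Python sketch of the arithmetic. It assumes 48 drives per X4500, which is what 6 servers and 288 drives imply. Note that the result is failures to date over total drives, not an annualized rate, since the servers have been running for less than a year.

```python
# Sketch of the failure-rate arithmetic from the p.s.
# Assumption: 48 drives per X4500 chassis (6 x 48 = 288, matching the post).
SERVERS = 6
DRIVES_PER_SERVER = 48
FAILURES = 2

total_drives = SERVERS * DRIVES_PER_SERVER
# Failures observed so far as a percentage of all drives (not annualized).
failure_pct = 100.0 * FAILURES / total_drives

print(f"{total_drives} drives, {FAILURES} failures -> {failure_pct:.2f}%")
```

Two failures out of 288 drives comes to about 0.69%, which rounds to the 0.7% quoted above.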
