2013-11-24

Fortunetelling Like An Oracle

It starts out with an email from a friend saying that they are looking into acquiring a storage server for their science lab.

------------------------------------------------------------
Date: Mon, 1 Mar 2010
From: A Friend In That Lab
Subject: Fwd: Fwd: server config pricing spreadsheet
To: Chris X Edwards

We're looking for recommendations for something you know a lot more about than I do, so if you have time, I thought I'd ask what you think...

---------- Forwarded message ----------
Subject: Re: Fwd: server config pricing spreadsheet
To: PI
From: Someone The PI Knows Who Knows-About-Computers
Cc: A Friend In That Lab, Person Responsible For System Administration

Hi folks

I like the x4440 the best, it seems to be the best value. Nothing wrong with the x4275, though the extra power would be good.  If you did purchase the x4275, I would recommend an even number of drives (4 or more) and configure with a RAID 10. [Blah blah blah]

[A bunch of talk about various server options which is uninteresting.]
------------------------------------------------------------

I offered to go to their lab meeting and tell them my perspective on storage servers. This offer was accepted and I met with the entire group (PI, PRFSA, et al.) on March 16, 2010. I explained to them the McNett Storage Server Architecture: build it yourself from components that are easily replaceable and, if in doubt, buy two and put one on the shelf. And, of course, use Linux.

In the previous couple of months I had just built such a storage server of similar capacity. I showed them my notes on that specific build. They were looking to buy a used Sun machine. With storage capacity and RAID configuration similar to my box, their prospect was, despite being used, about three times more expensive.

I clearly told them that the storage server they were looking at would be a proprietary machine, which would limit their ability to respond to problems. I enumerated all the ways that storage servers fail and addressed how the McNett design accommodates those failures. By keeping easily obtainable cold spares of anything that could fail on hand and ready to go, you greatly reduce your dependence on uncontrollable entities to keep the machine running.

They were very skeptical, and seemed to favor the apparent greater security of Sun's very reassuring warranty (which this machine would apparently be "covered" by). I explained my experience with warranties and how big companies are more motivated to make the service seem good before the sale than after.

In March of 2010, I basically described the exact course of events that ultimately took place three years later, after they bought the Sun...

------------------------------------------------------------
Date: Jan 2013
From: Person Responsible For System Administration
Subject: [admin-mailinglist]  Spare hard drive for a Sun Fire x4540?
To: admin-mailinglist

Hi everyone.

I realize this is a long shot, but does anyone have any spare hard drives for a Sun Fire X4540 ("thor") ?  We currently have 2 failed drives and are waiting on replacements from Oracle, but they are backordered with no ETA.  If one more drive fails, we could lose all of our data, so naturally I'm very concerned.

Thanks,
PRFSA
------------------------------------------------------------
Date: Jan 2013
From: Helpful Sys-Admin
Subject: [admin-mailinglist] Re: Spare hard drive for a Sun Fire x4540?
To: Person Responsible For System Administration

What size?

If you have a hardware support contract, Oracle is bound to specific response times (2 hours onsite for "Premier").

        http://www.oracle.com/us/support/premier/servers-storage/overview/index.html

Are they not honoring that?
------------------------------------------------------------
Date: Jan 2013
From: Person Responsible For System Administration
Subject: [admin-mailinglist] Re: Spare hard drive for a Sun Fire x4540?
To: Helpful Sys-Admin

It's a 1 TB drive.  Oracle has responded, but only to tell me that the drives are backordered.

Thanks,
PRFSA
------------------------------------------------------------
Date: Jan 2013
From: Helpful Sys-Admin
Subject: [admin-mailinglist] Re: Spare hard drive for a Sun Fire x4540?
To: Person Responsible For System Administration

Sorry, our thors have 500GB drives.

Oracle's response sounds unacceptable to me.  They should have stock on hand sufficient to cover their contracts. Obviously that's not entirely feasible.  What if an earthquake took out some major sites with lots of drives?  But barring any such disaster, they should have a replacement drive on-site in 2 hours, as their contract obliges them to.

Have you tried to escalate it?
------------------------------------------------------------



My storage servers have Linux software RAID1 OS drives, a minimal, lean, secure Gentoo install, custom kernels, and daily notifications that failure has not occurred. But I owe a big debt of gratitude to Dr. McNett for showing me the wisdom of doing it right by doing it yourself.
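
For the curious, the "daily notification that failure has not occurred" idea can be sketched roughly like this. This is not my actual script, just a minimal illustration in Python. It assumes Linux software RAID (md), which reports array state in /proc/mdstat, and it assumes something like cron runs it once a day and mails you whatever it prints. The function names parse_mdstat and daily_report are made up for this example.

```python
import re

# An array header line looks like: "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\S+) : ")
# The status line looks like: "  104320 blocks [2/2] [UU]"
# Each U is a healthy member, each _ is a missing or failed one.
STATUS_RE = re.compile(r"\[\d+/\d+\] \[([U_]+)\]")


def parse_mdstat(text):
    """Return {array_name: member_flags}, e.g. {'md0': 'UU'}.

    An array with no status line (e.g. inactive) maps to None.
    """
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            arrays[current] = None
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            arrays[current] = status.group(1)
            current = None
    return arrays


def daily_report(text):
    """Build the one-line message that gets sent every day, good or bad."""
    arrays = parse_mdstat(text)
    if not arrays:
        return "WARNING: no md arrays found"
    degraded = sorted(
        name for name, flags in arrays.items()
        if flags is None or "_" in flags
    )
    if degraded:
        return "WARNING: degraded arrays: " + ", ".join(degraded)
    return "OK: all %d arrays healthy" % len(arrays)


if __name__ == "__main__":
    # Assumption: run daily from cron, which mails stdout to the admin.
    with open("/proc/mdstat") as f:
        print(daily_report(f.read()))
```

The point of sending the message even when everything is fine is that silence then means something: if the daily "OK" stops arriving, the box (or its mail path) is in trouble.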

1 comment:

xed said...

Here's some more helpful feedback I received:

"Yes, that was similar to the experience we had with a storage server [someone] insisted we buy from a company called Procom. We ended up actually losing data due to a combination of a hang of their proprietary OS and a non-battery backed raid controller in their box.

Guess who later purchased Procom -- Sun."