Linux RAID solutions (Part I)

Tuesday, July 25th, 2006 | hardware, linux | No Comments

We have a customer that expects to generate a few hundred gigabytes of data a week. They have a small Linux cluster which we helped to build (I will blog about it soon, honest!) for oceanographic modelling – on a good week they expect to generate maybe 500GB of raw data. At the moment, they have 2 problems:

  1. How to store it reliably.
  2. How to back it up.
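To put rough numbers on both problems, here is a minimal sketch of the arithmetic. The 500GB/week rate is the customer's "good week" estimate from above; the 52-week year, the 1,000GB example store and the function names are my own assumptions for illustration.

```python
# Hypothetical planning helpers. Only the 500 GB/week rate comes from the
# customer; everything else here is an assumption for illustration.

def projected_volume_gb(weekly_gb: float, weeks: int) -> float:
    """Total raw data produced after `weeks` weeks at a steady weekly rate."""
    return weekly_gb * weeks


def weeks_until_full(capacity_gb: float, weekly_gb: float) -> float:
    """How long a store of `capacity_gb` lasts if nothing is ever deleted."""
    if weekly_gb <= 0:
        raise ValueError("weekly_gb must be positive")
    return capacity_gb / weekly_gb


if __name__ == "__main__":
    # A full year of "good weeks", in GB.
    print(projected_volume_gb(500, 52))
    # Weeks before a hypothetical 1,000 GB store fills up.
    print(weeks_until_full(1000, 500))
```

In practice not every week will be a good week and old runs will presumably be pruned or archived, so treat these as upper bounds rather than a forecast.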

The cluster itself consists of a bunch of diskless dual-core Opteron compute nodes and a head node with a few hundred GB of local storage. We don’t want to turn that into a general storage server because it might impact the performance of a modelling run on the cluster, and besides, it’s a 1U box so it doesn’t have room for many disks.

It’s been a while since I looked closely at storage. When I worked at HP, I got the chance to play around with some of their SAN technology including the EVA storage arrays which can store up to 84TB in a single cabinet. That kinda hardware, hooked up to each of your systems with fibre, is nice – you can scale your storage up as your needs increase, I/O speed is the same as or close to local storage and you can add a tape autoloader and the SAN practically manages its own backups.

Of course convenience comes at a price – as does scalability. In the case of our customer, I’m not sure it makes sense at this stage of their project to burn their budget on an enterprise-class storage solution like this. Sure, their data is important (maybe not quite as critical as payroll but even if the model can be rerun to regenerate the data, it could still represent a week’s work), but maybe not important enough to justify those prices.

I’ve been investigating an alternative solution for them using off-the-shelf hardware plugged into a dedicated Linux storage server. It’s been a while since I looked at RAID h/w on Linux and SCSI RAID was the only option back then.

The RAID Controller

These days SATA RAID is looking pretty attractive from a price point of view. My research indicates that SATA RAID controllers are generally pretty well supported on Linux, although you have to watch out for host-based or software RAID (also sometimes known as fake RAID) where the RAID controller driver actually does a lot of the work in software, rather than letting the RAID controller take care of all the work (dubbed hardware RAID). The 2 main issues with fake RAID from a Linux perspective are:

  1. Linux kernel drivers may not be sophisticated enough to implement all the functionality that the RAID controller vendor delivers in their Windows driver (in other words your RAID controller may not actually work as a RAID controller under Linux).
  2. Even if the driver is capable, this means that the processor on your system, not the dedicated processor on the RAID controller, will have to do a lot of the work. This may not be an issue if, like the average desktop user, your processor is actually idle most of the time, but if you have a busy storage server handling lots of network requests, it may lead to problems.
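To give a feel for the kind of work that lands on the host CPU in the second case, here is a minimal sketch of the XOR parity calculation used by parity-based RAID levels such as RAID 5. This is an illustration only, not how any particular driver or controller implements it; the function names and the sample blocks are made up.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must be the same size")
    # zip(*blocks) walks the blocks column by column; each column is XORed down.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing data block from the survivors plus parity."""
    return xor_parity(surviving + [parity])


if __name__ == "__main__":
    stripe = [b"ocean", b"model", b"data!"]  # made-up 5-byte blocks
    parity = xor_parity(stripe)
    # Pretend the disk holding stripe[1] has died.
    recovered = rebuild_missing([stripe[0], stripe[2]], parity)
    print(recovered == stripe[1])
```

With software or fake RAID this loop (in a much more optimised form) runs on the system processor for every write to a parity array; with a true hardware controller it runs on the card.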

I guess I like my RAID storage to be a black box that’s not subject to the system running out of memory, system crashes or other operating system level failures.

Various research indicates that both the Adaptec 2820-SA and 3Ware 9550SX-8LP are true hardware RAID controllers that are well supported on Linux. I’m opting for 8-port cards to give us plenty of room for expansion although initially I don’t expect to use more than 4 ports. I’m also opting for SATA-II cards to avail of the increased performance (we’re expecting this storage system to be in use for quite a while so it would be nice to be reasonably future-proof). I would normally lean towards Adaptec gear because I’ve had good experiences with them in the past but it’s not obvious from their support site how well Linux distributions other than Red Hat and SuSE are supported. There are no good technical reasons not to make an effort to support your hardware on all Linux distributions and our customer is using Debian on all of their other systems so I’m inclined to favour a hardware vendor that supports Linux in a general way.

3Ware give the impression of being interested in supporting you regardless of what distribution you are using. This makes sense to me for a hardware vendor and gives me hope that if I do experience a problem down the road I’m less likely to get a response from their support people telling me that the distribution I’m using isn’t supported.

The Enclosure

I’ve been burned by cheap cases and enclosures in the past so I’m inclined to go with reliable vendors for cases. Problems with cheap, unreliable power supplies can cause endless headaches and can be very hard to track down. Both Supermicro and Chenbro seem to have a good reputation with at least some of the people on the Beowulf mailing list. A final decision hasn’t been made on this yet but both the Supermicro SC833 and the Chenbro RM215 look good. The main thing is to ensure that the motherboard includes 133MHz PCI-X support.
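The reason the PCI-X slot matters is bus bandwidth. Here is a rough sketch of the arithmetic: the bus figures are the nominal clock and width for plain PCI and 64-bit PCI-X, while the 70MB/s per-drive streaming rate is purely an assumption for illustration, not a measurement.

```python
# Nominal bus parameters only; the per-drive throughput below is an assumed
# figure for illustration, not a benchmark result.

def bus_peak_mb_per_s(clock_mhz: float, width_bits: int) -> float:
    """Theoretical peak of a parallel bus: one `width_bits` transfer per clock."""
    return clock_mhz * width_bits / 8


def aggregate_drive_mb_per_s(drives: int, per_drive_mb_per_s: float) -> float:
    """Best-case combined streaming rate if every drive is busy at once."""
    return drives * per_drive_mb_per_s


if __name__ == "__main__":
    disks = aggregate_drive_mb_per_s(drives=8, per_drive_mb_per_s=70.0)  # assumed rate
    for name, clock, width in [("PCI 33MHz/32-bit", 33.0, 32),
                               ("PCI-X 133MHz/64-bit", 133.0, 64)]:
        bus = bus_peak_mb_per_s(clock, width)
        verdict = "bus is the bottleneck" if disks > bus else "bus has headroom"
        print(f"{name}: ~{bus:.0f} MB/s peak vs ~{disks:.0f} MB/s of disk: {verdict}")
```

Real-world throughput will be lower than these peaks on both sides, but the comparison shows why an 8-port card is wasted in an ordinary PCI slot.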

The drives

There is considerable debate about the reliability of SATA drives versus SCSI. Certainly, SCSI drives have traditionally been intended for enterprise use so manufacturers have worked on producing more reliable drives (at a higher price tag) than traditional consumer IDE drives. SATA drives are being used in both enterprise and consumer markets. The bottom line is that there is nothing inherently unreliable about SATA technology, but cheaper consumer drives certainly do have lower MTBF figures than enterprise SCSI drives. When it comes to purchasing SATA drives for our array, we’ll be looking to the more expensive SATA drives that come with a 3 or 5 year manufacturer’s warranty and the higher MTBF.
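MTBF figures are easier to compare once they are turned into a yearly failure probability. The sketch below uses the usual constant-failure-rate (exponential) model, which is a simplification, and the two MTBF values in the example are made-up placeholders rather than figures from any datasheet.

```python
import math

HOURS_PER_YEAR = 8760  # 24 * 365


def annual_failure_rate(mtbf_hours: float) -> float:
    """Chance that one drive fails within a year, assuming a constant failure rate."""
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def any_failure_in_array(mtbf_hours: float, drives: int) -> float:
    """Chance that at least one of `drives` independent drives fails in a year."""
    return 1.0 - (1.0 - annual_failure_rate(mtbf_hours)) ** drives


if __name__ == "__main__":
    # Hypothetical MTBF values, chosen only to show the effect of doubling MTBF.
    for label, mtbf in [("cheaper drive", 600_000), ("pricier drive", 1_200_000)]:
        single = annual_failure_rate(mtbf)
        array = any_failure_in_array(mtbf, drives=4)
        print(f"{label}: {single:.2%} per drive per year, {array:.2%} for a 4-drive array")
```

Vendor MTBF numbers tend to be optimistic, so I would read the output as a relative comparison between drive classes, not as a prediction.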

The configuration

We’re initially looking at logical storage of about 1TB and reliability is key so I’m opting for a RAID 1 configuration. Later on, we may investigate RAID 0+1 for increased performance. Lots of people seem to like RAID 5 – it’s certainly cheap in terms of the number of disks available for storage, but its write performance is poor. We’re going to start with 4 x 500GB mirrored drives, which gives us 1TB of logical storage and room to expand.
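For comparison, here is a small sketch of how usable capacity works out for the same four drives under different RAID levels. It assumes the RAID 1 setup means two mirrored pairs (which is what the 1TB figure implies) and ignores formatting overhead and hot spares; the level names are just labels I picked for this example.

```python
def usable_capacity_gb(level: str, drives: int, drive_gb: float) -> float:
    """Usable space for a few common RAID layouts, ignoring filesystem overhead."""
    if level == "raid0":
        # Striping only: all the space, no redundancy.
        return drives * drive_gb
    if level in ("raid1_pairs", "raid0+1"):
        # Every block is stored twice, so half the raw space is usable.
        if drives < 2 or drives % 2:
            raise ValueError("mirrored layouts need an even number of drives")
        return drives // 2 * drive_gb
    if level == "raid5":
        # One drive's worth of space goes to parity.
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_gb
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    for level in ("raid0", "raid1_pairs", "raid0+1", "raid5"):
        print(level, usable_capacity_gb(level, drives=4, drive_gb=500))
```

So RAID 5 would give 1.5TB from the same four disks against 1TB for the mirrored options, which is the "cheap in terms of disks" argument; the trade-off is the extra parity work on every write.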

In the second part of this article I’ll look at our options for backups and maybe discuss the implementation if we get it rolled out by then.