Linux RAID solutions (Part II)

Monday, September 18th, 2006 | hardware, linux

I finished my last article on Linux RAID solutions with a discussion of the hard drive technology we’d be investigating. Once we have the system up and running and reliably storing data, we’re going to have to look into backing it all up. The traditional approach to backups has been to regularly dump your data to tape. This worked pretty well until hard drive capacities started significantly overtaking tape drive capacities.

Linear Tape-Open (LTO) is one of the most common high-capacity tape formats available these days. Its raw capacity runs from 100GB (LTO-1) up to 400GB (LTO-3). Tape drive manufacturers like to prominently quote the compressed data capacity rather than the raw data capacity. This is slightly misleading since the compressed figure depends on the compressibility of your data. Text files compress very well: you can normally expect them to halve in size as the LTO drive compresses them. Binary data files, on the other hand, may not compress at all. Binary data is what our customer will be generating, and the applications generating it may very well be compressing it on the fly already, in which case we’ll see something closer to the raw storage capacity than the compressed capacity. Something to be aware of.
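To make the raw-versus-compressed point concrete, here is a minimal Python sketch. The function name and the compression ratios are my own assumptions for illustration; the only hard numbers are the raw capacities (LTO-2's 200GB is the published native figure, sitting between the two quoted above).

```python
# Raw (native) capacity per LTO generation, in GB.
RAW_CAPACITY_GB = {"LTO-1": 100, "LTO-2": 200, "LTO-3": 400}


def effective_capacity_gb(generation: str, compression_ratio: float) -> float:
    """Amount of source data that fits on one tape.

    compression_ratio is how much the drive shrinks the data:
    2.0 means it halves in size, 1.0 means it does not compress at all.
    """
    if compression_ratio < 1.0:
        raise ValueError("compression_ratio below 1.0 would mean the data grows")
    return RAW_CAPACITY_GB[generation] * compression_ratio


if __name__ == "__main__":
    # Hypothetical ratios: text-like data vs. already-compressed binary data.
    for label, ratio in (("text-like data", 2.0), ("pre-compressed binary", 1.0)):
        print(f"LTO-3, {label}: {effective_capacity_gb('LTO-3', ratio):.0f} GB")
```

With a ratio of 1.0 the "800GB" on the box turns back into 400GB, which is the figure we should be sizing against for pre-compressed binary data.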

Nowadays, if you’re using SAN technology, you can normally take a snapshot of your data and copy it onto another part of your SAN array. Assuming you have enough storage, this is a good, quick way of taking a reliable copy of your data. Of course, you may want to copy it onto another SAN at a remote location if you want a disaster-tolerant solution. A lot of commercial providers are starting to show up in this space. In Ireland we have companies like Central DataBank and Hosting365 (who wonder why anyone would use tape backup anymore). I guess there are pros and cons, and I’d like to see a detailed cost-benefit analysis before I could definitively say that tape backup never makes sense. At least in countries like Ireland, where broadband is still relatively expensive and the providers like to keep the Asymmetric in ADSL, it may not be economically viable to upload large amounts of data to a 3rd party backup provider.
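A quick back-of-the-envelope calculation shows why the uplink matters. This is only a sketch: the function name is mine, the uplink speeds are hypothetical rather than quotes from any provider, and it ignores protocol overhead and contention, so real transfers would be slower.

```python
def upload_hours(data_gb: float, uplink_mbps: float) -> float:
    """Hours needed to push data_gb gigabytes through an uplink of uplink_mbps megabits/s."""
    if uplink_mbps <= 0:
        raise ValueError("uplink_mbps must be positive")
    megabits = data_gb * 8000  # 1 GB = 1000 MB = 8000 megabits (decimal units)
    return megabits / uplink_mbps / 3600  # seconds -> hours


if __name__ == "__main__":
    # Hypothetical ADSL uplink speeds in Mbit/s.
    for uplink in (0.25, 0.5, 1.0):
        print(f"100 GB at {uplink} Mbit/s: {upload_hours(100, uplink):.0f} hours")
```

At an assumed 0.5 Mbit/s uplink, 100GB works out at roughly 444 hours, or about eighteen and a half days of continuous uploading.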

If you happen to have an organisation with multiple offices and the bandwidth between them, backing up to servers at another office makes absolute sense, maybe instead of tape backup. If you don’t have that option, and especially if you are generating a large amount of data on a regular basis, you may need to look at a technology such as LTO-3. We’re still trying to decide whether LTO-2 will meet our needs or whether we’ll need to look at LTO-3. We have to do another round of sizing estimates with our customer and decide on a backup schedule: if we’re doing weekly backups we may be OK with LTO-2; if it’s monthly, LTO-3 may be necessary. For business-critical data, nightly or weekly backups are vital, but in our case monthly backups may be sufficient, especially if we have good availability and reliability from the underlying storage array.
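The sizing question boils down to some simple arithmetic, sketched below. The weekly data volume is a made-up placeholder (we don't have the customer's real figures yet), a month is rounded to four weeks, and I'm assuming each backup only has to hold the data generated since the previous one.

```python
import math


def tapes_needed(data_gb: float, raw_capacity_gb: float,
                 compression_ratio: float = 1.0) -> int:
    """Number of tapes required to hold data_gb of source data.

    compression_ratio defaults to 1.0, i.e. data that does not compress.
    """
    if data_gb < 0 or raw_capacity_gb <= 0 or compression_ratio < 1.0:
        raise ValueError("invalid sizing inputs")
    return math.ceil(data_gb / (raw_capacity_gb * compression_ratio))


if __name__ == "__main__":
    weekly_gb = 150  # hypothetical amount of new data per week
    for schedule, weeks in (("weekly", 1), ("monthly", 4)):
        for generation, raw_gb in (("LTO-2", 200), ("LTO-3", 400)):
            count = tapes_needed(weekly_gb * weeks, raw_gb)
            print(f"{schedule} backup on {generation}: {count} tape(s)")
```

The number of tapes per backup matters because anything above one means either a person swapping tapes or an autoloader, which changes the cost picture.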

Once we’ve selected a drive from one of the main vendors (all the big names supply LTO hardware, so we’ll probably go with the vendor the customer already has a relationship with; the technology is largely similar anyway), we’ll need to look at which backup software to use. For small setups, it’s still common enough to see the tar command (or cpio) being used. For larger, more complex configurations, the two most common open source packages are Bacula and Amanda. Both packages have a good reputation, though both have some limitations. Our current plan is to evaluate both during initial installation of the storage system. If neither solution meets our customer’s needs, we may also investigate Arkeia’s SmartBackup, which looks like it will also meet our requirements.
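For the small-setup case, the tar approach can be sketched with Python's standard tarfile module. This is an illustration only, not what we'll be deploying: the function name and the paths in the usage comment are hypothetical, and the device path will differ from system to system. The "w|" mode writes the archive as a plain stream without seeking, which is the mode the tarfile documentation suggests for tape devices.

```python
import tarfile
from pathlib import Path


def archive_directory(source_dir: str, target: str) -> None:
    """Write source_dir to target as an uncompressed tar stream.

    target can be a regular file or, on a machine with a tape drive,
    a tape device node. No compression is applied here, leaving that
    to the drive's own hardware compression.
    """
    source = Path(source_dir)
    with tarfile.open(target, mode="w|") as archive:
        # arcname keeps paths inside the archive relative to the directory name
        archive.add(source, arcname=source.name)


# Hypothetical usage; /dev/st0 is the usual name of the first SCSI tape
# drive on Linux, but check your own system before running this:
# archive_directory("/srv/data", "/dev/st0")
```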

We’re planning on deploying this system in the next few months. I’ll report back on how that goes some time after we’ve finished testing.
