useful tools

Parallel ssh

Tuesday, August 18th, 2009 | linux, useful tools | No Comments

I’m increasingly working on clusters of systems – be they traditional HPC clusters running some MPI based software or less traditional clusters running software such as Hadoop‘s HDFS and MapReduce.

In both cases, the underlying operating systems are largely the same – pretty standard Linux systems running one of the main Linux distributions (Debian, Ubuntu, Red Hat Enterprise Linux, CentOS, SuSE Linux Enterprise Server or OpenSUSE).

There are various tools for creating standard system images and pushing those to each of the cluster nodes – and I use those (more in a future post), but often, you need to perform the same task on a bunch of the cluster nodes or maybe all of them. This task is best achieved by simply ssh’ing into each of the nodes and running some command (be it a status command such as uptime, or ps or a command to install a new piece of software).

Normally, one or more users on the cluster will have been configured to use password-less logins with ssh so a first-pass at running ssh commands on multiple systems would be to script the ssh calls from a management cluster node.  The following is an example script for checking the uptime on each node of our example cluster (which has nodes from cluster02 to cluster20, I’m assuming we’re running on cluster01).

#!/bin/bash
#
for addr in {2..20}
do
 num=`printf "%02d" $addr`
 echo -n "cluster${num}:" && ssh cluster${num} uptime
done

The script works, the downside is you have to create a new script each time you have a new command to run, or a slightly different sequence of actions you want to perform (you could improve the above by passing the command to be run as an argument to the script but even then the approach is limited).

What you really need at this stage is a parallel ssh, an ssh command which can be instructed to run the same command against multiple nodes. Ideally, the ssh command can merge the output from multiple systems if the output is the same – making it easier for the person running the parallel ssh command to understand which cluster nodes share the same status.

A quick search through Debian’s packages and a Google for parallel ssh turns up a few candidates,

This linux.com article reviews a number of these shells.

After looking at a few of these, I’ve settled on using pdsh. Each of the tools listed above use slightly different approaches – some provide multiple xterms in which to run commands – some provide a lot of flexibility in how the output is combined. What I like about pdsh is that it provides a pretty straightforward syntax to invoke commands and, most importantly for me, it cleanly merges the output from multiple hosts – allowing me to very quickly see the differences in a command’s output from different hosts.

You will need to configure your password-less ssh operation as normal. Once you have done that, on Debian or Ubuntu,  edit /etc/pdsh/rcmd_default and change the contents of this file to a single line containing the following,

ssh

(create the file it it doesn’t exist).

Now you can run a command, such as date (to verify if NTP is working correctly) on multiple hosts with the following,

pdsh -w cluster[01-05] date

This runs date on cluster01,cluster02,cluster03,cluster04 and cluster05 and returns the output. To consolidate the output from multiple nodes into a compact display format, pdsh comes with a second tool called dshbak, used as follows,

pdsh -w cluster[01-05] date | dshbak -c

Personally, I find this output most readable. On Debian and Ubuntu systems, to invoke dshbak by default, edit /usr/bin/pdsh (which is a shell script wrapper) and change the invocation line from

exec -a pdsh /usr/bin/pdsh.bin "$@"

to

exec -a pdsh /usr/bin/pdsh.bin "$@" | /usr/bin/dshbak -c

Now when you invoke pdsh by default, it’s output will be piped through dshbak.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Tags: , ,

sudo via ssh

Wednesday, August 12th, 2009 | linux, useful tools | No Comments

By default, if you attempt to run sudo through ssh, when you respond to the password prompt from sudo – it will echo the password back on your console. To avoid this, provide the -t option to ssh which forces the remote session to behave as if run through a normal tty (and thus masks the password), for example,

ssh -t foo.example.com sudo shutdown -r now
Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Tags: , ,

Repartitioning modern Linux systems without reboot

Friday, April 17th, 2009 | galway, linux, useful tools, web | No Comments

This one is for my own future reference as much as anything. Ever since the move to udev in Linux 2.6, I’ve found it neccesary to do the very un-Linux like thing of rebooting before the appropriate device appeared under /dev. This was only an occasional hassle but still, you shouldn’t need to reboot Linux for such a thing.

Thanks to Robert for his Google magic in turning up partprobe, part of the GNU Parted package. As the Debian man page for partprobe says

partprobe is a program that informs the operating system kernel of
partition table changes, by requesting that the operating  system
re-read the partition table.

Excellent! Parted is normally installed on Debian and Ubuntu by default anyways, if not, simply, aptitude install parted and you’ll have access to the excellent partprobe.

We were trying to add some additional swap to a running system, the full series of commands needed as follows (I could have used parted to create the partition  but the cfdisk tool has a nice interface),

  1. sudo cfdisk /dev/sda (and create new partition of type FD, Linux RAID)
  2. sudo cfdisk /dev/sdb (and create new partition of type FD, Linux RAID)
  3. sudo partprobe
  4. sudo mdadm –create /dev/md3 -n 2 -x 0 -l 1 /dev/sda4 /dev/sdb4 (our swap devices are software RAID1 devices)
  5. sudo /etc/init.d/udev restart (this updates /dev/disk/by-uuid/ with the new RAID device)
  6. sudo mkswap /dev/md3
  7. sudo vi /etc/fstab (and add a new entry for /dev/md3 as a swap device)
  8. sudo swapon -a (to activate the swap device)
  9. sudo swapon -s (to verify it is working)
Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Tags: , , ,

Subversion sparse checkouts

Tuesday, April 7th, 2009 | galway, software engineering, useful tools | 3 Comments

I’ve been using Subversion for a few years now but as with lots of technology I work with, I’ve learned enough about it to do the job I need to do but I’ve never dug into it exhaustively. It turns out a nice feature called sparse checkouts was introduced into Subversion 1.5. With Subversion,  you can either create one repository for each project or use a single repository for multiple projects. I like using a single repository for multiple projects but there are advantages and disadvantages to both approaches and it’s yet another source of religious debate and flamage so I won’t suggest which would suit your needs best.

One of the disadvantages of using a single repository for multiple projects is that any time you want to check out part of your repository, you either had to do something like this,

svn checkout http://www.example.com/svn/myrepo

to check out the whole repository (and if it’s a big repository, and you’re on a slow connection, you get to watch the world wide wait in action) or something like this

svn checkout http://www.example.com/svn/myrepo/oneofmyprojects

to just check out a teensie part of your repository which should happen faster than the former approach. The disadvantage to the second approach is that you end up with only part of the repository checked out and if you want another part in the future, you’ll have to check that out separately like

svn checkout http://www.example.com/svn/myrepo/anotheroneofmyprojects

Pretty soon, you’ll have a directory full of separately checked out projects, each of which you have to individually svn update, svn commit and so on. Hey, it starts looking like you have one repository for each project. Ideally, what you want to be able to do is to check out your entire repository but only the bits your are interested in, while keeping the option open of checking out other parts in the future and managing them all as the one repository that they are. Sparse checkouts introduced this functionality.

With svn’s sparse directory support, you can do the following,

svn checkout --depth=immediates http://www.example.com/svn/myrepo

This checks out the myrepo repository, but only to a depth of 1, that is, all files and directories immediately under myrepo but not any further subdirectories and files. So a directory listing of your checked out repository might look like,

oneofmyprojects/
anotheroneofmyprojects/
README.txt

This gives you an overview of the myrepo hierarchy without pulling all the files. Furthermore, it is sticky – any subsequent svn update commands you run will honour the scope you set in the first checkout.

If you now want to flesh out parts of the tree, you can do the following

svn update --set-depth=infinity myrepo/oneofmyprojects

This updates the contents of myrepo/oneofmyprojects with all children (files and subdirectories) ensuring you have a full copy of that part of the repository. If you subsequent run an svn update in myrepo – the behaviour for oneofmyprojects continues to be sticky and will result in an update of all files and subdirectories (while not checking out the children of any of  the other myrepo top-level directories).

Unfortunately, you cannot checkout a directory with depth=infinity and then update it to a reduced a depth (the behaviour only works in the direction of increasing depth for now).

More detail is available at http://svnbook.red-bean.com/en/1.5/svn.advanced.sparsedirs.html

I took a quick look at TortoiseSVN (a very nice graphical Subversion client for Windows) and if you do an SVN checkout it has an option  for Checkout Depth which I’m guessing provides the same functionality (but I haven’t tested it).

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Tags: ,

Converting Openoffice Calc spreadsheet to image

Thursday, April 2nd, 2009 | linux, useful tools | 3 Comments

Just a quick tip, if you have a spreadsheet or part of a spreadsheet in Openoffice Calc which you want to convert to an image (for use in a web page or something).

I was initially dismayed to find that Export and Save As in Openoffice calc only support other spreadsheet formats, XHTML and PDF (to be fair, the XHTML is pretty clean compared to that output by Microsoft Office but it wasn’t what I wanted in this case).

After some playing around, I figured out a pretty easy way of converting my spreadsheet to an image. I highlighted the table in Openoffice calc and clicked on Edit / Copy. Then I started the GNU Image Manipulation Program (GIMP) – I guess other graphics programs should work equally well – and clicked on File / Acquire / Paste as New and voilà – the spreadsheet appeared in a new GIMP window ready to be saved in whatever graphics format you wish.

(I did this on a Linux system, I’d be curious to know if the same works on Windows).

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Tags: , , , , ,

What a difference a Gig makes

Tuesday, October 14th, 2008 | hardware, linux, useful tools | No Comments

We’re working on a project at the moment that involves deploying various Linux services for visualising Oceanographic modelling data using tools such as Unidata’s THREDDS Data Server (TDS) and NOAA/PMELS’s Live Access Server (LAS). TDS is a web server for making scientific datasets available via various protocols including plain old HTTP, OPeNDAP which allows a subset of the original datasets to be accessed and WCS. LAS is a web server which, using sources such as an OPeNDAP service from TDS, allows you to visualise scientific datasets, rendering the data overlaid onto world maps and allowing you to select particular variables from the data which you are interested in. In our case, the datasets are generated by the Regional Ocean Modeling System (ROMS) and include variables such as sea temperature and salinity at various depths.

The data generated by the ROMS models we are looking at uses a curvilinear coordinate system – to the best of my understanding (and I’m a Linux guy, not an Oceanographer, so my apologies if this is a poor explanation) since the data is modelling behaviour on a spherical surface (the Earth) it makes more sense to use the curvilinear coordinate system. Unfortunately, some of the visualisation tools, in particular LAS prefers to work with data using a regular or rectilinear grid. Part of our workflow involves remapping the data from curvilinear to rectilinear using a tool called Ferret (also from NOAA). Ferret does a whole lot more than regridding (and is, in fact, used under the hood by LAS to generate a lot of the graphical output of LAS) but in our case, we’re interested mainly in its ability to regrid the data from one gridding system to another. Ferret is an interesting tool/language – an example of the kind of script required for regridding is this one from the Ferret examples and tutorials page. Did I mention we’re not Oceanographers? Thankfully, someone else prepared the regridding script, our job was to get it up and running as part of our work flow.

We’re nearly back to the origins of the title of this piece now, bear with me!

We’re using a VMware virtual server as a test system. Our initial deployment was a single processor system with 1 GB of memory. It seemed to run reasonably well with TDS and LAS – it was responsive and completed requests in a reasonable amount of time (purely subjective but probably under 10 seconds if Jakob Nielsen’s paper is anything to go by). We then looked at regridding some of the customer’s own data using Ferret and were disappointed to find that an individual file took about 1 hour to regrid – we had about 20 files for testing purposes and in practice would need to regrid 50-100 files per day. I took a quick look at the performance of our system using the htop tool (like the traditional top tool found on all *ix systems but with various enhancements and very clear colour output). There are more detailed performance analysis tools (include Dag Wieers excellent dstat) but sometimes I find a good high-level summary more useful than a sea of numbers and performance statistics. Here’s a shot of the htop output during a Ferret regrid,

High kernel load in htop

What is interesting in this shot is that

  • All of the memory is used (and in fact, a lot of swap is also in use).
  • While running the Ferret regridding, a lot of the processor is being spent in kernel activity (red) instead of normal (green) activity.

High kernel (or system) usage of the processor is often indicative of a system that is tied up doing lots of I/O. If your system is supposed to be doing I/O (a fileserver or network server of some sort) then this is good. If your system is supposed to be performing an intensive numerical computation, such as here, we’d hope to see most of the processor being used for that compute intensive task, and a resulting high percentage of normal (green) processor usage. Given the above it seemed likely that the Ferret regridding process needed more memory in order to efficiently regrid the given files and that it was spending lots of time thrashing (moving data between swap and main memory due to a shortage of main memory).

Since we’re working on a VMware server, we can easily tweak the settings of the virtual server and add some more processor and memory. We did just that after shutting down the Linux server. We restarted the server and Linux immediately recognised the additional memory and processor and started using that. We retried our Ferret regridding script and noticed something interesting. But first, here’s another shot of the htop output during a Ferret regrid with an additional gig of memory,

Htop with high use processor time

What is immediately obvious here is that the vast majority of the processor is busy with user activity – rather than kernel activity. This suggests that the processor is now being used for the Ferret regridding, rather than for I/O. This is only a snapshot and we do observe bursts of kernel processor activity still, but these mainly coincide with points in time when Ferret is writing output or reading input, which makes sense. We’re still using a lot of swap, which suggests there’s scope for further tweaking, but overall, this picture suggests we should be seeing an improvement in the Ferret script runtime.

Did we? That would be an affirmative. We saw the time to regrid one file drop from about 60 minutes to about 2 minutes. Yes, that’s not a typo, 2 minutes. By adding 1 GB of memory to our server, we reduced the overall runtime of the operation by 97%. That is a phenomenal achievement for such a small, cheap change to the system configuration (1GB of typical system memory costs about €50 these days).

What’s the moral of the story?

  1. Understand your application before you attempt tuning it.
  2. Never, ever tune your system or your application before you understand where the bottlenecks are.
  3. Hardware is cheap, consider throwing more hardware at a problem before attempting expensive performance tuning exercises.

(With apologies to María Méndez Grever and Stanley Adams for the title!)

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Tags: , , , , , ,

Stress testing a PC revisited

Thursday, September 25th, 2008 | hardware, linux, useful tools | No Comments

I’m still using mostly the same tools for stress testing PCs as when I last wrote about this topic. memtest86+ in particular continues to be very useful. In practice, the instrumentation in most PCs still isn’t good enough to identify which DIMM is failing most of the time (mcelog sometimes makes a suggestion about which DIMM has failed and EDAC can also be helpful, but in my experience there is lots of hardware out there which doesn’t support these tools well). The easiest approach I’ve found to date is to take out one DIMM at a time and re-run memtest86+ … when the errors go away you’ve found your problematic DIMM – put it back in again and re-run to make sure you’ve identified the problem. If you keep getting the errors regardless of which DIMMs are installed, you may be looking at a problem with the memory controller (either on the processor or the motherboard depending on which type of processor you are using) – if you have identical hardware, you should look at swapping the components into that for further testing.

Breakin is a tool recently announced on the beowulf mailing list which looks like it has a lot of potential also and I plan on adding it to my stress testing toolkit the next time I encounter a problem which looks like a possible hardware problem. What looks nice about Breakin is that it tests all of the usual suspects including processor, memory, hard drives and it includes support for temperature sensors, MCE logging and EDAC. This is attractive from the perspective of being able to fire it up, walk away and come back to check on progress 24 hours later.

Finally, we’ve found the Intel MPI Benchmarks (IMB, previously known as the Pallas MPI benchmark) to be pretty good at stress testing systems. Anyone conducting any kind of qualification or UAT on PC hardware, particularly hardware intended to be used in HPC applications should definitely be including
an IMB run as part of their tests.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Tags: , , , , , ,

Viruses and Malware on Windows

Tuesday, September 9th, 2008 | useful tools, windows | 2 Comments

Here I am writing about Windows – If I’m not careful, I’ll have to rename this blog to Thoughts on Windows. What’s the Linux angle here? I guess I’m the smug Linux user poking fun at Windows or something along those lines (but don’t leave just yet if you’re one of those smug Windows users, I’d be interested in your thoughts on the following).

Two unrelated events inspired this piece. I came across an interesting blog recently comparing the performance of various anti-virus products on a number of items of malware. I haven’t come across the guys behind this before, InfraGard but given their links to the FBI they seem to have some credibility so I’m assuming their testing methodologies are reasonably reliable.

Three things struck me about that blog,

  • AVG does a pretty good job of protecting Windows systems from malware and viruses (I know I’m starting to sound like an AVG fan-boy between this and my previous references to it).
  • Some of the “leading” anti-virus programs / suites are pretty poor at protecting Windows systems (not to mention the fact that they interfere with the operation of your computer).
  • You can’t rely on any anti-virus software to fully protect your Windows system.

That’s about the point where I become the smug Linux user, up until the point where I remembered that I have to look after my share of Windows systems both in our offices and for friends and family. This brings me on to the second recent event which inspired this piece.  A friend running Windows Vista had recently started getting worrying messages about things called Trojan-Spy.Win32.KeyLogger.aa trying to send traffic from his PC and wanted to know if he should be worried. “Probably”, I said and took a look at his system.

In the past, my toolbox for a healthy Windows PC would include the aforementioned AVG and, if I had concerns about spyware, Spybot – Search & Destroy – another great Windows tool that is free for non-commercial use. Between those two tools, I could be pretty confident that a Windows machine was running clean of any malicious software. So I installed and ran both on my friends PC – multiple times! Spybot even suggested running immediately after start-up as Administrator so that it could ferret out as much dodgy malware as possible. A few hours later, we were still being entertained by messages from Windows about our good friend Trojan-Spy.Win32.KeyLogger.aa (and maybe some others) which hadn’t even been detected by AVG or Spybot, never mind removed by them.

Some research on the interweb turned up posts and comments from various people who had encountered this particular trojan and by all accounts it’s a tough one to remove. I was on the verge of suggesting an OS re-install (taking inspiration from Aliens,  sometimes nuking the system from orbit is the only way to be sure) possibly in tandem with a Linux re-install to forever banish such nasties when I came across some references to another tool called Superantispyware which some recommended as the antidote to Trojan-Spy.Win32.KeyLogger.aa. With a name like that, it had to be good at dealing with spyware right? I figured it was worth a shot before we tried something more drastic, particularly since there is a free for non-commercial use version available. One download and install later, it kicks off and immediately warns us about some spyware it has found (either our friend the KeyLogger or another, as yet unknown, piece of spyware). After a half hour or so, it had finished a scan and proceeded to remove or quarantine all of the various pieces of spyware it had turned up. We booted the system once more, re-ran AVG and Spybot S&D and didn’t get any more warnings about Trojan-Spy.Win32.KeyLogger.aa. trying to send data off of the system. My friend was happy enough that the system was clean. Me? I’d probably still go and re-install the OS before putting my credit card details near the computer again (to be sure, to be sure) but the odds are it is clean – for which we probably have Superantispyware to thank.

So, what are our conclusions?

  • (With my smug Linux hat on once more) – consider installing and running Linux for your home desktop – a distribution such as the latest Ubuntu will provide all the software you need for typical day to day surfing, emailing and word-processing and won’t leave you open to half of this stuff (you’ll still be susceptible to phishing attacks and cross-site scripting attacks but you’ll be automatically eliminating a whole world of viruses, keyloggers and trojans which won’t ever run on a Linux system).
  • If you must run Windows, make sure you install some decent software to protect you – start with AVG, Spybot S&D (and maybe Superantispyware) – or let a comment to tell us about other useful ones.
  • If you’re running Windows, do not use the Administrator account for your activities, and don’t set up an alternative account with administrator privileges either – that kinda defeats the purpose. I know it’s a pain in the ass when you want to install some new software, but trust me, it’ll be a bigger pain in the ass when someone starts buying things from Itunes with your credit card.
  • Don’t click on things that you don’t understand and don’t install stuff from random web-pages, even if they do tell you it’s for your security (cmon, if some random stranger came to your door and told you he needed to “install something” in your bedroom “for your security” you’d slam the door in their face, before calling the police, why would you react differently to a stranger on the internet?).
  • Finally, the bad news is that email you just received claiming to be a red hot picture of Britney or Christina in a compromising position … well it probably isn’t (I know, if some international criminal ring is going to take over your computer for nefarious purposes you’d think they’d at least give you a naughty picture to take your mind off things, but I’m afraid they generally don’t play fair) so don’t click on the attached zip-file.
Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Tags: , , , , ,

Google Chrome – first impressions

Wednesday, September 3rd, 2008 | useful tools, web, windows | No Comments

I guess most of you have heard about Google Chrome by now, courtesy of the interesting comic book marketing device (allegedly accidentally published before it was ready, hhhmmm). Some of the features and design decisions mentioned in the comic made me curious enough to keep an eye out for its release this evening.  Ok, it doesn’t run on Linux (yet) but it is open source (Google seem to be using the BSD license for their code in Chrome) and contains some interesting features.

The intention with Google Chrome seems to be to keep the UI clean – first impressions are that they’ve succeeed in doing that. It seems much cleaner than either IE (which I find to be irritatingly non-intuitive) or Firefox (which, while it has a lot going on, since 3.0, manages to display things pretty cleanly).

Interestingly during initial start-up, it offered to import my Firefox settings, but I didn’t see any sign of an offer to import my Internet Explorer settings – not that I would have needed it but there seems to be a statement of intent here.

A quick tour of a few of the sites that I usually visit didn’t reveal any major problems. Chrome also enforces the same kind of warning about self-signed SSL certs that Firefox 3.0 introduced but doesn’t present quite as intimidating a warning. Performance seems pretty good but I couldn’t think of any particularly tortuous sites that I regularly visit so I don’t know how well it will handle heavier sites. I do miss my Adblock Plus Firefox extension though – I didn’t have time to see whether there is anything equivalent in Chrome yet or whether you can somehow get it to use Firefox extensions (mind you, considering Google’s core business, it probably won’t be going out of it’s way to help us filter ads). The new tab page / home page is interesting but I’m not sure how useful it will be in the long-term. I may revisit the same old pages every day more than I realise, in which case it may turn out to be a handy launch-pad.

An hour of use isn’t going to show a great deal. I’ll probably give this a test drive for a week or so before I come to any solid conclusions. Unfortunately (or maybe fortunately) most of my day-to-day activities are carried out on Linux desktops / notebooks so I won’t get to fully battle test Chrome until they release the Linux port.

First impressions though, are that Google have an interesting new browser with some nice features and that both Microsoft and Mozilla have some interesting times ahead.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Ghost for Linux

Tuesday, March 11th, 2008 | linux, useful tools, windows | 1 Comment

We have a number of laptops in the office for pool use – when someone is travelling to a customer site or a conference they can take one of the pool laptops for development, email and so on. Since these are occasionally used and tend to get knocked around a bit, when we purchased them we went for sturdy middle of the road laptops (the HP Compaq nx6310 in case you’re interested – love those memorable URLs HP) . While this made sense when we purchased them, one of the laptops is being used pretty heavily for Windows development at the minute and is showing some signs of stress. The laptops only have 512MB of memory and 5400rpm hard-drives so I figured some upgrades were worth trying before we move to purchasing a faster laptop.

Memory and drive upgrades for laptops are surprisingly cheap these days – 1GB of DDR2 for the nx6310 cost just €20.50. A 7200rpm notebook drive cost a little more but I figured it was worth upgrading both as we were doing any upgrades. Upgrading memory in the nx6310s is very straightforward, there is a memory expansion port on the underside of the laptop accessed through a panel with a single screw – it took all of 30 seconds.

Upgrading the hard drive is physically very straightforward but of course there is one catch – ideally I’d prefer not to spend a half a day to a day reinstalling Windows XP on the new drive including all the post-SP2 updates and hot-fixes and all of the applications installed (unfortunately we’re not big enough yet for me to justify the time it would take to develop a proper customised install image although I have been looking at tools like nlite to see what’s possible). So I need some way of copying or ghosting the contents of the existing hard-drive and restoring them to the new drive when I swap them. The traditional solution to this was to the use the aforementioned Ghost software – but since we use Linux for a lot of our infrastructure I was more interested in seeing if there were viable alternatives on Linux for doing the same thing.

Some research reveals that the wikipedia page for disk cloning summarises the current Linux-based options pretty well. After looking at the various tools and their functionality, I opted to run with partimage which seemed to be lightweight and capable of doing what I required (dumping the Windows partition from the notebook onto a Linux server and restoring this partition onto a new notebook – all over the network). I had briefly considered just using dd after booting the notebook up with a rescue disk – it would work fine (I’ve used this approach in the past to recover a badly corrupted LVM volume to a new disk) but it is a little less user-friendly than a cloning tool like partimage. One of the benefits of using partimage is that it understands a number of filesystems including NTFS and it’s smart enough to only back up the parts of the filesystem that have data on them, rather than copying the whole partition as dd would. It’s also capable of backing up the Master Boot Record and the partition data, and allowing you to restore them independently of restoring the whole drive.

So partimage it was – I needed client software to run on the notebook and server software to run on a Linux box and receive the partition data read from the client. The partimage guys recommend the SystemRescueCd which is a Live Linux CD which you boot off of and which provides a whole bunch of tools including partimage. I’ve used SystemRescueCd before and it’s well put together and does exactly what it says on the tin. So I downloaded the latest version of that which includes partimage 0.6.6. Note that you seem to need the same version of partimage on the client and the server. I’m using Debian 4.0 on our Linux server which includes version 0.6.4 of the partimage server software. To get around the version incompatibility, I had to go with building the partimage server from a source package downloaded from the partimage site. It sounds worse than it turned out in practice! It’s a pretty painless configure, make, make install after you install a few dependencies.

I compiled my partimage server with ssl and login disabled because it was only running on our local network for a short while under my supervision. If you’re running this permanently, you should probably opt for a more secure configuration. After pointing the partimage server at a writable area on the Linux server (you’ll need a good amount of disk space, partimage can compress backed up images, but you should probably still allow close to the raw size of the partition you are backing up to have some headroom), the laptop was rebooted with the SystemRescueCd.

After booting, the partimage command was started and a basic curses dialog was displayed. I selected the partition we wanted to back up (/dev/sda1) and gave it a name of hostname.partition and pointed it at the server with partimage running. This brought me to a second screen where I specified to use a gzip compressed image and put in a description of “sda1″. After this the backup started and partimage told me it was backing up 17.5GB out of the 37GB NTFS partition (the rest was unused).

The backup took about an hour all told (this over a gigabit LAN – I’d imagine the laptop drive was the bottleneck) after which I installed the new drive in the laptop and again booted with the SystemRescueCd.

Before starting partimage to restore the image, I had to create a partition on the new drive. Partimage doesn’t seem to like running against a drive with no partition (even though I planned to restore the partition and mbr from the partimage backup anyway). So I created a throwaway partition of 10MB and then started partimage. First, I selected the option to restore just the MBR and pointed it at the server. I then selected the image I wanted to restore from the server and proceed with a restore of the MBR and the partition table. When this had finished (it took seconds to do the MBR restore), I exited partimage and verified that the throwaway partition table I had created had been replaced with the partition table from the partimage backup (I used cfdisk, but the SystemRescueCd includes a bunch of different partition tools if you prefer something a little more powerful).

The partition table looked exactly as it had on the original drive, so I restarted partimage pointing it at the server again and went for a full restore of the sda1 image to the sda1 partition this time. This took about 40 minutes, which was faster than the original backup. Since writes are normally a bit slower than reads I was surprised – I’m guessing the speed difference is down to the faster laptop drive but it might be something else. Either way, after 40 minutes partimage told me the image had been restored. So the moment of truth had arrived, I rebooted the laptop and waited to see if it gave me the old “Operating System Not Found …” message or whether it booted back to Windows as it had with the original drive. Success! After a few tense moments, the laptop booted to Windows on the new drive and allowed me to login with the same credentials as I’d used on the old drive. A quick inspection of the environment indicated that it all looked as per the original – and there weren’t any wierd errors in the Windows event logs. As a quick smoke test, I ran a defrag of the Windows drive – I figured if there were any problems with the installation, it was a good way of stress testing the filesystem. There were no problems with the defrag, so unless the main user of the laptop notices any problems when I return it to him, I’m pronouncing this a success.

For users of Ghost, I suspect the interface on Partimage may be a bit rough around the edges, but for anyone that is comfortable with command-line Linux and has done some system administration – Partimage is definitely a very useful tool for disk cloning. I can see myself using this regularly both for migrating systems across hard drives and for backing up critical systems at the partition level.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Tags: , ,

Search