Parallel ssh
I’m increasingly working on clusters of systems – be they traditional HPC clusters running some MPI-based software, or less traditional clusters running software such as Hadoop’s HDFS and MapReduce.
In both cases, the underlying operating systems are largely the same – pretty standard Linux systems running one of the main Linux distributions (Debian, Ubuntu, Red Hat Enterprise Linux, CentOS, SuSE Linux Enterprise Server or OpenSUSE).
There are various tools for creating standard system images and pushing those out to each of the cluster nodes – and I use them (more in a future post) – but you often need to perform the same task on a bunch of the cluster nodes, or maybe all of them. This task is best achieved by simply ssh’ing into each of the nodes and running some command (be it a status command such as uptime or ps, or a command to install a new piece of software).
Normally, one or more users on the cluster will have been configured for password-less logins with ssh, so a first pass at running ssh commands on multiple systems is to script the ssh calls from a management node. The following is an example script for checking the uptime on each node of our example cluster (which has nodes cluster02 through cluster20; I’m assuming we’re running the script on cluster01).
#!/bin/bash
#
for addr in {2..20}
do
    num=`printf "%02d" $addr`
    echo -n "cluster${num}:" && ssh cluster${num} uptime
done
The script works; the downside is that you have to create a new script each time you have a new command to run, or a slightly different sequence of actions to perform. You could improve it by passing the command to run as an argument to the script, as sketched below, but even then the approach is limited.
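A minimal sketch of that variant, assuming the same cluster02 to cluster20 naming as above, might look like this,

#!/bin/bash
# Run whatever command is passed on the command line against each node.
for addr in {2..20}
do
    num=$(printf "%02d" $addr)
    echo -n "cluster${num}: " && ssh cluster${num} "$@"
done

You would invoke it with the command to run as its arguments, for example ./cluster-run.sh uptime (the script name here is arbitrary).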
What you really need at this stage is a parallel ssh, an ssh command which can be instructed to run the same command against multiple nodes. Ideally, the ssh command can merge the output from multiple systems if the output is the same – making it easier for the person running the parallel ssh command to understand which cluster nodes share the same status.
A quick search through Debian’s packages and a Google search for parallel ssh turn up a few candidates.
This linux.com article reviews a number of these shells.
After looking at a few of these, I’ve settled on using pdsh. Each of these tools takes a slightly different approach – some provide multiple xterms in which to run commands, some provide a lot of flexibility in how the output is combined. What I like about pdsh is that it provides a pretty straightforward syntax for invoking commands and, most importantly for me, it cleanly merges the output from multiple hosts – allowing me to very quickly see the differences in a command’s output across hosts.
You will need to configure password-less ssh as normal. Once you have done that, on Debian or Ubuntu, edit /etc/pdsh/rcmd_default and change the contents of this file to a single line containing the following,
ssh
(create the file if it doesn’t exist).
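As an aside, if you haven’t already set up password-less logins, one way to do it (just a sketch, using OpenSSH’s ssh-keygen and ssh-copy-id; the key type is an arbitrary choice) is,

# Generate a key pair on the management node (accept the defaults),
# then copy the public key into each node's authorized_keys.
ssh-keygen -t rsa
for addr in {2..20}
do
    num=$(printf "%02d" $addr)
    ssh-copy-id cluster${num}
done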
Now you can run a command, such as date (to verify that NTP is working correctly), on multiple hosts with the following,
pdsh -w cluster[01-05] date
This runs date on cluster01, cluster02, cluster03, cluster04 and cluster05 and returns the output. To consolidate the output from multiple nodes into a compact display format, pdsh comes with a second tool called dshbak, used as follows,
pdsh -w cluster[01-05] date | dshbak -c
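For illustration only (the timestamp below is made up and assumes all five nodes report the same time), the consolidated output looks roughly like this,

----------------
cluster[01-05]
----------------
Mon Sep  6 12:00:01 UTC 2010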
Personally, I find this output most readable. On Debian and Ubuntu systems, to invoke dshbak by default, edit /usr/bin/pdsh (which is a shell script wrapper) and change the invocation line from
exec -a pdsh /usr/bin/pdsh.bin "$@"
to
exec -a pdsh /usr/bin/pdsh.bin "$@" | /usr/bin/dshbak -c
Now whenever you invoke pdsh, its output will be piped through dshbak by default.
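Note that the package manager may overwrite /usr/bin/pdsh on the next upgrade; if you would rather not edit it, a small shell function in your ~/.bashrc achieves the same effect (just a sketch; the name pdc is an arbitrary choice),

# Run pdsh and consolidate identical output across hosts.
pdc () {
    pdsh "$@" | dshbak -c
}

which you would then invoke as, for example, pdc -w cluster[01-05] date.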
sudo via ssh
By default, if you attempt to run sudo through ssh, the password you type at sudo’s prompt is echoed back on your console. To avoid this, pass the -t option to ssh, which forces a pseudo-tty to be allocated for the remote session (and thus masks the password), for example,
ssh -t foo.example.com sudo shutdown -r now
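To do the same across the example cluster from earlier, a rough sketch (you will be prompted for the sudo password on each node in turn, unless sudo has been configured not to ask for one) is,

#!/bin/bash
# Reboot cluster02..cluster20 one node at a time; ssh -t allocates a
# tty on each node so the sudo password prompt is masked.
for addr in {2..20}
do
    num=$(printf "%02d" $addr)
    ssh -t cluster${num} sudo shutdown -r now
done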
The top 10 supercomputers in the world run Linux
Pingdom (a Swedish uptime monitoring company) took the results of the latest Top500.org list (Top500 publishes a list, usually twice a year, of the top-performing supercomputers in the world, ranked by the LINPACK benchmark) and published a study of the operating systems the top 20 supercomputers in the world are running. It turns out that 19 of the top 20 are running some form of Linux, with the top 10 all running Linux.
While most of you reading this probably don’t plan on purchasing a supercomputer on the scale of Roadrunner or Jaguar in the near future (if you do, we’d love to assist you in the process), our own experience of deploying smaller HPC clusters for organisations is that Linux is equally suitable at that scale. The reasons include the cost of deployment, the flexibility of Linux as a server OS, the wide range of HPC software that is developed on or targeted at Linux, and the excellent support provided by the Linux community for new hardware (see the recent announcement of Linux being the first operating system to support USB 3.0 devices as a good example).