mmrun: Simple job distribution using MPI

by Kasper Peeters, kasper.peeters (at) aei.mpg.de

Introduction

The mmrun tool is a very simple command-line tool to run a series of (non-MPI) programs on machines connected through an MPI layer. You give it a list of programs, and mmrun will spawn each of these in turn over as many nodes as you have requested from the MPI layer. It is useful, for instance, if you want to run one program many times, but with different values of the input parameters.
You do not need to be root in order to use mmrun. You only need SSH access to the machines in your cluster.
The programs that you want to run with mmrun do not have to be compiled with MPI. An explicit design goal was to require no changes whatsoever to existing programs, but still be able to distribute jobs easily over many processors. MPI is only used by the master to spawn processes on the clients. A typical situation in which mmrun is thus useful is when you have a long list of independent programs to be executed, and have access to a set of machines which can be reached through SSH.
No other networking tools are necessary except for SSH access (key-based) to all machines and a working LAM installation (other MPI implementations can probably also be made to work, but LAM is very simple to install). All jobs which you want to run have to be executable on all machines, so this tool is only useful if you have access to a homogeneous cluster of machines.
Progress can be monitored by pointing a web browser at the simple web server built into mmrun.
The mmrun program is licensed under the GNU General Public License.

Downloading and installing

In order to compile mmrun you need an MPI implementation (such as LAM) and the ehs library. Then do
tar zxf mmrun-1.0.tar.gz
cd mmrun-1.0/src
make
in order to compile mmrun (you may need to change a couple of paths in the Makefile so the compiler finds your mpic++ and the ehs and mpi libraries). Install the binary somewhere convenient (autoconf stuff will be added sometime soon).
Make sure that you have a machine file for LAM, something like
# This is an example of a LAM machine
# file, listing all machines in the cluster.
#
machine1.domain.org
machine2.domain.org
machine3.domain.org
machine4.domain.org
Start LAM by doing
lamboot [machinefile]
and make sure that this works fine (i.e. that LAM manages to start its daemons on all machines in the machine file without asking for an SSH password).
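If lamboot stalls, it can help to test the key-based SSH logins one machine at a time. The following is only a sketch and not part of mmrun: the helper names list_hosts and check_ssh are made up for this example, and it assumes OpenSSH, whose BatchMode option makes ssh fail rather than prompt for a password.

```shell
# list_hosts FILE: print the host name (first field) of every entry in a
# LAM machine file, ignoring '#' comments and blank lines.
list_hosts() {
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }'
}

# check_ssh FILE: try a password-less login on every host in the file.
# Assumes OpenSSH: BatchMode=yes makes ssh fail instead of asking for a
# password, ConnectTimeout=5 gives up on unreachable hosts after 5 seconds.
check_ssh() {
    for host in $(list_hosts "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}

# Usage (the file name "machinefile" is only an example):
#   check_ssh machinefile
```

Any host reported as FAILED would most likely also make lamboot ask for a password or hang.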

The input file

Each line of the input file is interpreted as one 'command', and executed by the user's default shell on the computing node. It can contain multiple commands separated by semicolons. A typical example, which creates a 7-frame movie with POV-Ray:
# This is an example input file for mmrun.
# Each line contains one command (or sequence
# of commands separated by semicolons) which
# will be executed by mmrun.
#
povray +SF1 +EF1 +Imovie.pov +Omovie_frames
povray +SF2 +EF2 +Imovie.pov +Omovie_frames
povray +SF3 +EF3 +Imovie.pov +Omovie_frames
povray +SF4 +EF4 +Imovie.pov +Omovie_frames
povray +SF5 +EF5 +Imovie.pov +Omovie_frames
povray +SF6 +EF6 +Imovie.pov +Omovie_frames
povray +SF7 +EF7 +Imovie.pov +Omovie_frames
Comments can be included by prefixing them with a '#' sign. Make sure that all commands listed in the input file can run on any of the nodes in the machine file.
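For longer parameter scans the input file does not have to be typed by hand. As a minimal sketch (the function name gen_jobs is invented for this example; the povray command line is the one used above), a small shell function can print one line per frame:

```shell
# gen_jobs N: print one povray command per frame, for frames 1..N.
gen_jobs() {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "povray +SF$i +EF$i +Imovie.pov +Omovie_frames"
        i=$((i + 1))
    done
}

# Print the seven job lines of the example above.
gen_jobs 7
```

Redirecting the output to a file, for instance with "gen_jobs 7 > jobs.mmrun", gives an input file equivalent to the one shown above (without the comment header).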

Running

Make sure that LAM is running (see "Downloading and installing" above). Also make sure that all commands listed in the input file can be run on any of the nodes; mmrun will only run these command lines, it does not take care of installing or copying software.
To set your cluster of machines to work on the tasks described in the mmrun input file, do
mpirun -np [number of working nodes + 1] -f mmrun [inputfile]
where "number of working nodes" is the number of nodes on which LAM is running (or fewer, of course, if you do not want to use all nodes). The "+1" is for the master node, which consumes very few resources and can easily be run together with a compute job on a single CPU.
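The value for -np can be derived from the machine file. The sketch below is an illustration only: the function names are invented, and it assumes one working node per machine listed, which gives the largest sensible value. It prints the mpirun line instead of running it.

```shell
# count_nodes FILE: count the host lines in a LAM machine file,
# skipping '#' comment lines and blank lines.
count_nodes() {
    grep -v '^[[:space:]]*#' "$1" | grep -c '[^[:space:]]'
}

# print_mpirun MACHINEFILE INPUTFILE: print (do not run) the mpirun
# command, using every listed machine as a working node plus one
# extra process for the master.
print_mpirun() {
    workers=$(count_nodes "$1")
    echo "mpirun -np $((workers + 1)) -f mmrun $2"
}

# Usage (both file names are only examples):
#   print_mpirun machinefile jobs.mmrun
```

For the four-machine file shown earlier this prints a line with "-np 5". Any smaller value works as well; the monitoring example on this page uses "-np 4", that is, three working nodes plus the master.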
Currently, all output from all nodes goes (without separators) to stdout on the master node. All jobs run as if you had manually ssh'ed to a client and started the program there; if you want these programs to create files which are to be visible on the master node, you have to write to a shared filesystem.
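Because every input line is handed to a shell, one possible workaround for the mixed output is to redirect each job to its own log file. This is a sketch under assumptions, not a feature of mmrun: the function name add_logs and the log names job1.log, job2.log, ... are invented, and the redirection syntax assumes that your default shell on the nodes is Bourne-like (sh, bash, ksh).

```shell
# add_logs: read an mmrun input file on stdin and write a copy to stdout
# in which job number N sends its stdout and stderr to jobN.log.
# Comment lines and blank lines are passed through unchanged.
# Assumes a Bourne-like shell on the nodes ("> file 2>&1" syntax).
add_logs() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '' | '#'*)
                printf '%s\n' "$line"
                ;;
            *)
                n=$((n + 1))
                # The parentheses make the redirection apply to every
                # command on a line that uses semicolons.
                printf '( %s ) > job%s.log 2>&1\n' "$line" "$n"
                ;;
        esac
    done
}

# Usage (file names are only examples):
#   add_logs < jobs.mmrun > jobs-logged.mmrun
```

The log files are created wherever the job runs, so they are only visible on the master node if that directory is on a shared filesystem.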

Monitoring

Once mmrun is running, you can monitor it by pointing a web browser at port 12345 on the machine on which mpirun was called (you can change this port number by giving mmrun the option "--port [number]"). This interface is view-only; the mmrun program cannot be influenced through this web interface. For the machine and input files shown above, the command
mpirun -np 4 -f mmrun jobs.mmrun
yields, after the first four jobs have completed, a monitor screen as shown below,
jobs

command                                      rank  status       started on  ended on
povray +SF1 +EF1 +Imovie.pov +Omovie_frames  1     completed    15:52:05    15:52:08
povray +SF2 +EF2 +Imovie.pov +Omovie_frames  2     completed    15:52:05    15:52:08
povray +SF3 +EF3 +Imovie.pov +Omovie_frames  1     completed    15:52:08    15:52:11
povray +SF4 +EF4 +Imovie.pov +Omovie_frames  2     completed    15:52:08    15:52:11
povray +SF5 +EF5 +Imovie.pov +Omovie_frames  1     running      15:52:11
povray +SF6 +EF6 +Imovie.pov +Omovie_frames  2     running      15:52:11
povray +SF7 +EF7 +Imovie.pov +Omovie_frames        not running

machines

machine              rank  load avg  status
machine1.domain.org  0     0.93      master
machine2.domain.org  1     0.87      running
machine3.domain.org  2     0.99      running
machine4.domain.org  3     2.30      overloaded

Note that machine4 is recognised as overloaded and is waiting for the load level to drop. The screen updates automatically every second to reflect the current status (note, however, that the machine status summary is not updated while a command runs, so the load average reflects the status just before the command was started).

Startup options

There are a few startup options:
--port=
Determines the port on which the internal web server is run.
--maxload=
Determines the maximum load average that a slave node is allowed to have; if the load average goes higher than this, the slave will be instructed to wait and will not receive any additional jobs until the load level drops again.
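The test behind such a limit is a plain numeric comparison. The sketch below only illustrates the idea and is not taken from the mmrun source: the function names are invented, the limit 2.0 is an arbitrary example value, and current_load reads /proc/loadavg, which exists on Linux only.

```shell
# over_limit LOAD MAX: succeed (exit status 0) when LOAD is greater than MAX.
# awk is used because the shell itself cannot compare decimal numbers.
over_limit() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 > max + 0) }'
}

# current_load: print the 1-minute load average (Linux only).
current_load() {
    cut -d ' ' -f 1 /proc/loadavg
}

# Usage, with an example limit of 2.0:
#   if over_limit "$(current_load)" 2.0; then
#       echo "overloaded, waiting"
#   fi
```

With a limit of 2.0, the load of 2.30 shown for machine4 in the monitor example would count as overloaded, while 0.99 would not.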

Coming up / todo

There are certainly some improvements left to incorporate. One is the ability to start processes only when the load level drops below a given number, or when there are no interactive users logged in to a machine. A related useful option would be to temporarily sleep a process on a client when the interactive load goes up (so that you can start a process at night on your office mate's computer but have it automatically sleep in the morning, when the machine is needed for interactive use). Finally, a clean way to separate the stdout streams of all processes would probably be useful.
If you have any other suggestions, feel free to contact me by email (kasper.peeters (at) aei.mpg.de).

$Id: index.html,v 1.6 2005/11/04 16:04:11 kp229 Exp $