Setting up MPI in compute clusters

A quick-start guide to running MPI programmes on the Institute's Linux compute clusters.

  1. Basic usage in C or Fortran
  2. Basic usage in Python
  3. Running across several nodes

This guide assumes some familiarity with the module system at ITA. The Intel compilers and MPI library are recommended for running parallel jobs; this guide only describes the procedure for the Intel compilers. Several of the options described will not work with the GNU compilers and/or OpenMPI.

The target systems of this guide are the Linux compute clusters (beehive, owl, euclid, hercules, pleiades).

Basic usage in C or Fortran

Start by loading the Intel parallel studio module:

module load Intel_parallel_studio

To test MPI, you can use the following test programme written in C:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  int rank;
  char hostname[256];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
  gethostname(hostname, sizeof(hostname));
  hostname[sizeof(hostname) - 1] = '\0';  /* guarantee termination if truncated */
  printf("Hello from process %3d on host %s\n", rank, hostname);
  MPI_Finalize();
  return 0;
}

Download mpitest.c

or the equivalent programme in Fortran:

program main
   use mpi
   implicit none
   integer :: rank, ierror, namelen
   character(len=MPI_MAX_PROCESSOR_NAME) :: name

   call MPI_INIT(ierror)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
   call MPI_GET_PROCESSOR_NAME(name, namelen, ierror)
   print "('Hello from process ',I3,' on host ',A)", rank, trim(name)
   call MPI_FINALIZE(ierror)
end program main

Download mpitest.f90

To use the MPI-linked Intel compilers, you should use mpiicc, mpiifort, and mpiicpc for C, Fortran, and C++ respectively. Do NOT use mpif90, mpicc, etc. These may default to the GNU compilers.

Compile and link the test programme:

mpiicc -o mpitest mpitest.c

or

mpiifort -o mpitest mpitest.f90

Run the MPI programme with mpiexec, e.g.:

mpiexec -n 10 ./mpitest

You should get output like the following (the order of the lines can vary between runs):

Hello from process   1 on host beehive.uio.no
Hello from process   2 on host beehive.uio.no
Hello from process   3 on host beehive.uio.no
Hello from process   4 on host beehive.uio.no
Hello from process   5 on host beehive.uio.no
Hello from process   6 on host beehive.uio.no
Hello from process   7 on host beehive.uio.no
Hello from process   8 on host beehive.uio.no
Hello from process   9 on host beehive.uio.no
Hello from process   0 on host beehive.uio.no

Basic usage in Python

Support for running MPI with Python is provided via the mpi4py module. It is available through the python modules, which load different Anaconda distributions (for Python 2.7.x and 3.6.x). Make sure you load a python module:

module load python

You can then run mpi4py's built-in test to make sure everything is working:

mpiexec -n 10 python -m mpi4py helloworld

Hello, World! I am process 0 of 10 on beehive2.uio.no.
Hello, World! I am process 1 of 10 on beehive2.uio.no.
Hello, World! I am process 2 of 10 on beehive2.uio.no.
Hello, World! I am process 3 of 10 on beehive2.uio.no.
Hello, World! I am process 4 of 10 on beehive2.uio.no.
Hello, World! I am process 5 of 10 on beehive2.uio.no.
Hello, World! I am process 6 of 10 on beehive2.uio.no.
Hello, World! I am process 7 of 10 on beehive2.uio.no.
Hello, World! I am process 8 of 10 on beehive2.uio.no.
Hello, World! I am process 9 of 10 on beehive2.uio.no.

You can also run Python programmes across multiple nodes (detailed below), just as you would any C or Fortran programme.

Running across several nodes

For larger jobs you can run MPI across several machines. The easiest way to configure this is to create a hosts file, with one hostname or IP address per line (unlike OpenMPI, don't add anything else in the file). For example, create the following hosts.txt file:

beehive2
beehive3
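
As a rough illustration of the one-entry-per-line rule, the Python sketch below writes a hosts file and flags lines that carry anything besides a single hostname or IP address (for example an OpenMPI-style "slots=4" suffix). The helper names write_hostfile and check_hostfile are hypothetical and not part of any MPI tool, and the check only encodes the rule stated in this guide, not the full syntax that mpiexec may accept.

```python
import os
import tempfile


def write_hostfile(path, hosts):
    """Write one hostname (or IP address) per line, with no trailing spaces."""
    with open(path, "w") as handle:
        handle.write("\n".join(host.strip() for host in hosts) + "\n")


def check_hostfile(text):
    """Return (line_number, line) pairs that hold more than one token.

    Assumption: following this guide, a valid line is a single hostname or
    IP address; blank lines are simply skipped in this sketch.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue
        if len(entry.split()) != 1:
            problems.append((number, line))
    return problems


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "hosts.txt")
    write_hostfile(path, ["beehive2", "beehive3"])
    with open(path) as handle:
        print(check_hostfile(handle.read()))  # no offending lines expected
    print(check_hostfile("beehive2 slots=4\nbeehive3\n"))
```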

You can now run mpiexec with the option -hostfile:

mpiexec -n 10 -hostfile hosts.txt ./mpitest

By default this will use all the CPUs on the first host, then the CPUs on the next host in the file, and so on until all processes are created. If the first host has 10 or more CPUs, all 10 processes run there. You can also specify the maximum number of processes to run on each machine with the -ppn option:

mpiexec -n 10 -hostfile hosts.txt -ppn 5 ./mpitest

Now it will use a maximum of 5 processes on each host (unless more processes are requested than are available on the nodes). Another load-balancing strategy is to share the load equally among the hosts. To do this, you can use either -ppn 1 or -rr:

mpiexec -n 10 -hostfile hosts.txt -rr ./mpitest
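
To make the placement rules easier to picture, here is a small Python sketch of a simplified model: ranks are handed out in consecutive groups of ppn per host, cycling through the hosts file in order. This is an assumption made for illustration, not Intel MPI's actual scheduling code, and place_ranks and summarise are hypothetical helper names. In this model -rr behaves like ppn = 1, and the default behaviour corresponds to ppn equal to the number of CPUs on each host (assumed to be the same on every host).

```python
from collections import Counter


def place_ranks(n_procs, hosts, ppn):
    """Assign each rank to a host: ppn consecutive ranks per host, cycling.

    Simplified model only; the real launcher may place ranks differently.
    """
    if ppn < 1:
        raise ValueError("ppn must be at least 1")
    if not hosts:
        raise ValueError("at least one host is required")
    return [hosts[(rank // ppn) % len(hosts)] for rank in range(n_procs)]


def summarise(placement):
    """Count how many ranks ended up on each host."""
    return dict(Counter(placement))


if __name__ == "__main__":
    hosts = ["beehive2", "beehive3"]
    for label, ppn in [("-ppn 5", 5), ("-rr / -ppn 1", 1), ("-ppn 3", 3)]:
        placement = place_ranks(10, hosts, ppn)
        print(label, summarise(placement), placement)
```

With 10 processes on two hosts, both ppn = 5 and ppn = 1 give five ranks per host in this model; they differ only in which ranks land where (blocks of five versus alternating). With ppn = 3 the assignment wraps around, so the first host receives six ranks and the second four.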

Communication fabric

By default, the Intel MPI library will try to use the fastest communication fabric available between the nodes. It will make use of the InfiniBand network through the Direct Access Programming Library (DAPL) or InfiniBand verbs (OFA) on the nodes that support it, and otherwise fall back to other protocols. For nodes without InfiniBand interfaces, these fallback attempts can be slow and can even fail. In such cases (no InfiniBand) it is best to force TCP as the communication fabric by setting the environment variable I_MPI_FABRICS_LIST to tcp:

export I_MPI_FABRICS_LIST=tcp    # in bash
setenv I_MPI_FABRICS_LIST tcp    # in tcsh/csh

Conversely, to force running through InfiniBand, you can pass the option -DAPL to mpiexec, or set I_MPI_FABRICS_LIST to dapl or ofa:

export I_MPI_FABRICS_LIST=dapl    # in bash, for DAPL
export I_MPI_FABRICS_LIST=ofa     # in bash, for OFA
setenv I_MPI_FABRICS_LIST dapl    # in tcsh/csh, for DAPL
setenv I_MPI_FABRICS_LIST ofa     # in tcsh/csh, for OFA

NOTE: owl25-owl28 only support ofa. You can also pass the -IB option to mpiexec to select ofa.
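
If you launch MPI jobs from a Python script rather than from the shell, the fabric can be chosen per job by passing a modified environment to the launcher. The sketch below is one possible way to do that with the standard subprocess module; mpiexec_env, mpiexec_command and run_with_fabric are hypothetical helper names, and the only settings used are the I_MPI_FABRICS_LIST variable and the -n and -hostfile options described in this guide.

```python
import os
import subprocess


def mpiexec_env(fabric, base_env=None):
    """Return a copy of the environment with I_MPI_FABRICS_LIST set.

    fabric is one of the values discussed above: "tcp", "dapl" or "ofa".
    """
    env = dict(os.environ if base_env is None else base_env)
    env["I_MPI_FABRICS_LIST"] = fabric
    return env


def mpiexec_command(n_procs, program, hostfile=None):
    """Build the mpiexec argument list used elsewhere in this guide."""
    command = ["mpiexec", "-n", str(n_procs)]
    if hostfile is not None:
        command += ["-hostfile", hostfile]
    command.append(program)
    return command


def run_with_fabric(fabric, n_procs, program, hostfile=None):
    """Launch the job; raises CalledProcessError if mpiexec fails."""
    return subprocess.run(
        mpiexec_command(n_procs, program, hostfile),
        env=mpiexec_env(fabric),
        check=True,
    )


if __name__ == "__main__":
    # Only print the command here; call run_with_fabric() on a cluster node.
    print(" ".join(mpiexec_command(10, "./mpitest", hostfile="hosts.txt")))
```

Setting the variable in a copy of the environment leaves the calling shell untouched, so different jobs started from the same script can use different fabrics.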

By Tiago Pereira
Published July 25, 2017 5:55 PM - Last modified June 15, 2018 1:33 PM