Check out these two virtual workshops:
- Parallel Processing with MPI
- Parallel Processing with OpenMP
Message Passing Interface (MPI) is a communication protocol for parallel programming. It allows an application to run in parallel across a number of separate computers connected by a network.
Basic Features of MPI:
Message passing programs generally run the same code on multiple processors, which then communicate with one another via library calls that fall into a few general categories (a short example follows this list):
- Calls to initialize, manage, and terminate communications.
- Calls to communicate between two individual processes (point-to-point).
- Calls to communicate among a group of processes (collective).
- Calls to create custom datatypes.
- Calls providing richer, extended functionality; see the extended training materials here
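To make these categories concrete, here is a minimal C example (an illustrative sketch, not one of the CCR test codes): it initializes MPI, queries each process's rank and the total process count, passes one integer from process 0 to process 1 with a point-to-point send/receive, and then shuts MPI down. It assumes at least two processes and can be compiled with any of the MPI compiler wrappers described in the next section.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, token;
    MPI_Status status;

    MPI_Init(&argc, &argv);               /* initialize communications */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

    if (rank == 0) {
        token = 42;
        /* point-to-point: send one integer to process 1 */
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* point-to-point: receive the integer from process 0 */
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Process %d of %d received %d from process 0\n", rank, size, token);
    }

    MPI_Finalize();                       /* terminate communications */
    return 0;
}

Collective calls (e.g. MPI_Bcast, MPI_Reduce) and custom datatypes follow the same pattern between MPI_Init and MPI_Finalize.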
Implementations of MPI:
There are several different implementations of MPI available on the UB CCR clusters.
- Intel MPI (Recommended)
- This implementation has multi-network support (TCP/IP, InfiniBand, Myrinet, etc.); by default, the best available network is tried first.
- Compiler "wrappers" around both Intel's compiler suite (mpiifort, mpiicc, mpiicpc) and the GNU compilers (mpif90, mpicc, mpicxx)
- Show all current versions of Intel-MPI: module avail intel-mpi
- Intel MPI website
- MVAPICH2
- This implementation runs over InfiniBand.
- Show all versions of MVAPICH2: module avail mvapich2
- MVAPICH2 website
- MPICH2 - A portable implementation of the Message Passing Interface (MPI) standard created by Argonne National Laboratory.
- MPICH is built specifically for a given combination of network interface and compiler. The UB CCR clusters have three internal networks (Gigabit Ethernet, QLogic InfiniBand, and Mellanox InfiniBand), and the available compilers are GNU, Intel, and PGI.
- NOTE: The MPICH 1 (MPICH) implementation is now deprecated on the CCR clusters. Please use Intel MPI instead.
- MPICH2 website
- OPENMPI - An open-source implementation of MPI developed and maintained by a consortium of researchers from academia and industry.
- This implementation is network-aware, so it automatically selects the network interface.
- OPENMPI is built specifically for a particular compiler.
- Show all the current versions of OPENMPI: module avail openmpi
- OPENMPI website
Using Intel MPI:
- Load the Intel compiler and Intel-MPI modules to set the paths to the compiler wrappers.
- See Compiling Code for examples of compiling C and Fortran codes without MPI.
[user@vortex mpi-stuff]$ module load intel/14.0
[user@vortex mpi-stuff]$ module load intel-mpi/4.1.3
- Create a nodefile:
- In these examples, the nodefile lists the hosts to run on, one hostname per line; here it contains four entries for the front-end machine (vortex).
[user@vortex mpi-stuff]$ cat nodefile
vortex
vortex
vortex
vortex
[user@vortex mpi-stuff]$
Compiling with MPI, C:
- Code: cpi from the MPI test suite (a rough sketch of a similar program follows the run output below).
- view cpi code
- Compilation:
[user@vortex mpi-stuff]$ mpiicc -o cpi cpi.c
- Running the code:
[user@vortex mpi-stuff]$ mpiexec.hydra -n 2 ./cpi
Process 0 of 2 on k07n14
Process 1 of 2 on k07n14
pi is approximately 3.1415926544231247, Error is 0.0000000008333316
wall clock time = 0.000111
[user@vortex mpi-stuff]$
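For orientation, the following is a rough sketch of a pi-integration program of the kind cpi implements (the linked cpi code above is the definitive version; details such as variable names and output differ). It uses the collective calls MPI_Bcast and MPI_Reduce: each process integrates 4/(1+x^2) over its own share of the intervals, and the partial sums are combined on process 0.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, namelen, n = 10000, i;
    double h, x, sum, mypi, pi, t0, t1;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);
    printf("Process %d of %d on %s\n", rank, size, name);

    t0 = MPI_Wtime();
    /* collective: every process receives the interval count from process 0 */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* each process integrates 4/(1+x^2) over its own subset of intervals */
    h = 1.0 / (double) n;
    sum = 0.0;
    for (i = rank + 1; i <= n; i += size) {
        x = h * ((double) i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    /* collective: sum the partial results onto process 0 */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("pi is approximately %.16f, wall clock time = %f\n", pi, t1 - t0);

    MPI_Finalize();
    return 0;
}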
Compiling with MPI, Fortran:
- Code: fpi from the MPI test suite, modified so that it does not run interactively.
- view fpi code
- Compilation:
[user@vortex mpi-stuff]$ mpif77 -o fpi fpi.f
- Running the code:
[user@vortex mpi-stuff]$ mpiexec.hydra -n 2 ./fpi
Process 1 of 2 is alive
Process 0 of 2 is alive
pi is approximately: 3.1415926569231196 Error is: 0.0000000033333265
[user@vortex mpi-stuff]$
Tutorials on MPI and Related Topics
- See also /util/academic/slurm-scripts on the UB CCR front-end
Parallel Computing
Cluster Resources:
The UB CCR clusters provide extensive resources for parallel computing.
Learn more about the UB-HPC cluster
Learn more about the faculty cluster
Running Interactively:
Using the front-end login machines (vortex.ccr.buffalo.edu):
- If you wish to run an MPI code interactively on the front-end machines (only a few processes for a short duration, please; otherwise, use the batch system), you can simply launch it with mpirun:
[user@vortex mpi-stuff]$ module load desired_version_of_mpi
[user@vortex mpi-stuff]$ mpirun -np 2 ./a.out
(the "-np 2" argument requests 2 processors)
Using a compute node:
- Machines within the cluster are available for interactive use through the batch scheduler. Note: Depending on the selected MPI module, mpirun may not be the appropriate task launcher when in an interactive SLURM environment. See the individual MPI pages for details.
[user@vortex mpi-stuff]$ salloc --partition=debug --qos=debug --ntasks=2 --time=01:00:00
(wait for a node to be allocated)
[user@compute-node mpi-stuff]$ module load desired_version_of_mpi
[user@compute-node mpi-stuff]$ mpirun -np 2 ./a.out