Phase 7: MPI
In this assignment you’ll parallelize your solver with MPI. You can use MPL as in the example code, or the C bindings if you’re more familiar with them.
Performance requirements are less stringent than in previous phases to account for inter-node communication: the program must run on `2d-medium-in.wo` in 30 seconds given 4 processes on one full `m9` node and 4 processes on another. In a job with two full `m9` nodes, you can launch 4 processes per node with:

```shell
mpirun --npernode 4 wavesolve_mpi ...
```
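If your cluster uses Slurm (an assumption; adapt this to your scheduler), a minimal job script for this launch might look like the following. The time limit and the input/output arguments to `wavesolve_mpi` are placeholders:

```shell
#!/bin/bash
# Sketch of a 2-node job, assuming Slurm; any partition/account flags your
# site requires, and the wavesolve_mpi arguments, are placeholders.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:05:00

module load gcc/14.1 openmpi/5.0
mpirun --npernode 4 wavesolve_mpi 2d-medium-in.wo 2d-medium-out.wo
```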
This phase is significantly more challenging than previous phases for most students. I strongly recommend that you break it into two steps:
- Get your I/O working: make an MPI program that reads in a wave orthotope file with MPI I/O, prints the header and any other debug information you need, and writes back out an identical file with MPI I/O
- Once the I/O is working, figure out the exchange of halos and your `solve` function
When you run with multiple processes, put your output files in `~/nobackup/archive`: NFS (which backs the rest of your home directory) is astonishingly slow for MPI I/O writes.
Load `gcc/14.1` and `openmpi/5.0` to get access to a recent MPI compiler, and `mpl` to use MPL.
Division of Labor
Much as in the previous phase, you’ll split work roughly evenly among processes. I recommend an approach similar to the example code’s, with each process being in charge of a roughly equal contiguous block of rows. Since updating a cell of `u` requires data from the rows above and below it, the processes will need to store ghost rows and exchange them on each iteration (see `exchange_halos` in the example code).
Recommended Development Process
This guide is one possible way to approach the implementation of this project phase. It is not required.
This guide emphasizes the principle of incremental development by making small changes and verifying the results at each step.
- Disable (comment out) wave solving.
- For each of the following steps:
  - Verify correctness on a single node
  - Verify correctness on multiple nodes with multiple processes (2 nodes × 4 processes each)
- Verify that the header data can be read in properly using MPI
  - Read in the header data
  - Print the results to the screen (e.g. with `std::cout`)
  - Verify correctness
- Check that all data can be read and written properly using MPI (without halos)
  - Read the header data and vector data with MPI
  - Write the results
  - Manually inspect the result with `waveshow` or `wavediff`
- Check that all data can be read and written properly using MPI (with halos)
  - Read the header data and vector data with MPI
  - Fill the appropriate `u`/`v` rows, including halos, with the rank of the current process
  - Write the results
  - Manually inspect the result with `waveshow` or `hexdump`
  - By visualizing the rank, you can verify that:
    - Each process is writing only to the rows it should
    - Halo rows are not appearing in the result
- Check your code for actual functionality
- Remove the code that populates junk values
- Verify that the result is saved properly
- Check your wave energy
- Find the energy of the initial wave
- Print out the result
- Verify correctness
- Check your wave solve
- Reenable wave solving
- Hardcode the program to solve only a single iteration
  - Run another version of your program (`wavesolve_threaded` or `wavesolve_original`) hardcoded to solve a single iteration
  - Manually inspect the result with `waveshow` or `wavediff`
  - Verify correctness
- Try solving the entire wave
  - Compare results with the answer files (the `wavefiles/*D/*-out.wo` files) using `wavediff`
- If some end result is not correct, consider the following:
- Try increasing the solve limit to solve 2 iterations, then 4…
- Watch this video on Debugging MPI
Submission
Update your `CMakeLists.txt` to create `wavesolve_mpi` (making sure to compile with MPI), develop in a branch named `phase7` or tag the commit you’d like me to grade as `phase7`, and push it.
Grading
This phase is worth 30 points. The following deductions, up to 30 points total, will apply for a program that doesn’t work as laid out by the spec:
| Defect | Deduction |
|---|---|
| Failure to compile `wavesolve_mpi` | 5 points |
| Failure of `wavesolve_mpi` to work on each of 3 test files | 5 points each |
| Failure of `wavesolve_mpi` to checkpoint correctly | 5 points |
| Failure of `wavesolve_mpi` to run on `2d-medium-in.wo` on two `m9` nodes with 4 processes each in 30 seconds | 5 points |
| …in 60 seconds | 5 points |
| `wavesolve_mpi` isn’t an MPI program, or doesn’t distribute work evenly among processes | 1-30 points |