    added script to help manage native and symmetric MPI runs within SLURM · fdf56162
    Olli-Pekka Lehto authored
    Dear all,
    
    As a quick fix, I have put together this script to help manage native and symmetric MPI runs within SLURM. It's a bit bare-bones currently, but I needed to get it working quickly :)
    
    It does not provide tight integration between the scheduler and the MPI daemons, and it requires a slot on the host even when running fully on the MIC. It's far from an optimal solution, but it could serve as a stopgap.
    
    It's inspired by the TACC Stampede documentation. They seem to have a similar script in place.
    
    It's fairly simple: you provide the names of the MIC binary (with -m) and the host binary (with -c). The host MPI/OpenMP parameters are given as usual, and the Xeon Phi side parameters are given as environment variables (MIC_PPN, MIC_OMP_NUM_THREADS). Currently it supports only one card per host, but extending it should be simple enough.
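To illustrate the idea, here is a minimal sketch of how a wrapper of this kind might combine the host and MIC binaries into one MPMD launch line. It is not the script from this commit. The function name `build_symmetric_command`, the `mpiexec.hydra` launcher, the `<host>-mic0` card hostname convention, and the default of 1 for unset variables are all assumptions made for illustration.

```python
import shlex


def build_symmetric_command(host_binary, mic_binary, hosts, host_ppn, env):
    """Build an MPMD-style argv with one host section and one MIC section per node.

    Hypothetical helper: the launcher name and the '<host>-mic0' naming
    are assumptions, not details taken from the committed script.
    """
    # MIC-side settings come from environment variables, as described above.
    # Falling back to 1 when they are unset is an assumption of this sketch.
    mic_ppn = str(env.get("MIC_PPN", "1"))
    mic_threads = str(env.get("MIC_OMP_NUM_THREADS", "1"))

    sections = []
    for host in hosts:
        if host_binary:
            # Host ranks: process count comes from the usual host-side settings.
            sections.append(["-host", host, "-n", str(host_ppn), host_binary])
        if mic_binary:
            # One card per host, assumed to be reachable as '<host>-mic0'.
            sections.append([
                "-host", host + "-mic0",
                "-n", mic_ppn,
                "-env", "OMP_NUM_THREADS", mic_threads,
                mic_binary,
            ])

    argv = ["mpiexec.hydra"]  # assumed launcher
    for index, section in enumerate(sections):
        if index > 0:
            argv.append(":")  # MPMD sections are separated by a colon
        argv.extend(section)
    return argv


if __name__ == "__main__":
    example_env = {"MIC_PPN": "4", "MIC_OMP_NUM_THREADS": "60"}
    argv = build_symmetric_command("./app.host", "./app.mic", ["node1"], 2, example_env)
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Passing `None` as the host binary gives a native (MIC-only) launch line in this sketch; a real wrapper would still need the host slot mentioned above.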
    
    Here are a couple of links to documentation:
    
    Our prototype cluster documentation:
    https://confluence.csc.fi/display/HPCproto/HPC+Prototypes#HPCPrototypes-XeonPhiDevelopment
    Presentation at the PRACE Spring School in Umeå earlier this week:
    https://www.hpc2n.umu.se/sites/default/files/1.03%20CSC%20Cluster%20Introduction.pdf
    
    Feel free to include this in the contribs directory. It might need a bit of cleanup, though, and I don't know when I will have time to do that.
    
    I have also added support for the TotalView debugger (provided it's installed and configured properly for Xeon Phi usage).
    
    Future ideas:
    
    For the native MIC client, I've been testing it out a bit and looking at ways to minimize the changes needed for support. The two major challenges seem to be in scheduling and affinity:
    
    I think it might be necessary to put it into a specific topology plugin, like the one for BG/Q, but it looks like a lot of work to do that.
    
    Best regards,
    Olli-Pekka