added script to help manage native and symmetric MPI runs within SLURM
Dear all,

As a quick fix, I have put together this script to help manage native and symmetric MPI runs within SLURM. It's a bit bare-bones currently, but I needed to get it working quickly :) It does not provide tight integration between the scheduler and the MPI daemons, and it requires a slot on the host even when running fully on the MIC, so it's far from an optimal solution, but it could serve as a stopgap. It's inspired by the TACC Stampede documentation; they seem to have a similar script in place.

Usage is fairly simple: you provide the name of the MIC binary (with -m) and the host binary (with -c). The host MPI/OpenMP parameters are given as usual, and the Xeon Phi side parameters as environment variables (MIC_PPN, MIC_OMP_NUM_THREADS). Currently it supports only one card per host, but extending it should be simple enough.

Here are a couple of links to documentation:

Our prototype cluster documentation:
https://confluence.csc.fi/display/HPCproto/HPC+Prototypes#HPCPrototypes-XeonPhiDevelopment

Presentation at the PRACE Spring School in Umeå earlier this week:
https://www.hpc2n.umu.se/sites/default/files/1.03%20CSC%20Cluster%20Introduction.pdf

Feel free to include this in the contribs directory. It might need a bit of cleanup, though, and I don't know when I will have the time to do this. I have also added support for the TotalView debugger (provided it's installed and configured properly for Xeon Phi usage).

Future ideas: for the native MIC client, I've been testing it out a bit and looking at ways to minimize the changes needed for support. The two major challenges seem to be scheduling and affinity: I think it might be necessary to put it into a specific topology plugin, like the one for BG/Q, but that looks like a lot of work.

Best regards,
Olli-Pekka
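For readers unfamiliar with symmetric mode, here is a minimal sketch of how such a wrapper might assemble the launch command from the -m/-c options and the MIC_* environment variables. The function name (symrun_cmd), the use of Intel MPI's mpiexec.hydra "colon" syntax, the "<host>-mic0" naming, and the default values are assumptions for illustration; they are not taken from the actual contrib script.

```shell
#!/bin/sh
# Sketch of a symmetric-run wrapper (hypothetical; not the contrib script).
# Builds one mpiexec.hydra command line that launches a host binary and a
# MIC binary in the same MPI_COMM_WORLD, using Intel MPI's ':' syntax.

symrun_cmd() {
    OPTIND=1
    mic_bin="" host_bin=""
    while getopts "m:c:" opt; do
        case $opt in
            m) mic_bin=$OPTARG ;;   # binary cross-compiled for the Xeon Phi
            c) host_bin=$OPTARG ;;  # binary compiled for the host
        esac
    done

    host=$(hostname -s)
    # One card per host is assumed, hence the single "<host>-mic0" entry.
    # MIC_PPN and MIC_OMP_NUM_THREADS come from the environment, as in the
    # script's interface; the fallback defaults here are made up.
    printf '%s\n' "mpiexec.hydra -host $host -n ${SLURM_NTASKS_PER_NODE:-1} $host_bin \
: -host ${host}-mic0 -n ${MIC_PPN:-4} -env OMP_NUM_THREADS ${MIC_OMP_NUM_THREADS:-60} $mic_bin"
}

# Dry run: print the command that would be executed.
symrun_cmd -m a.out.mic -c a.out
```

The real script would exec the assembled command instead of printing it, and would also have to pick the host list out of SLURM's allocation rather than a single hostname.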