Morris Jette authored
Ancient versions of OpenMPI and their derivatives (e.g. Cray MPI) depend on Slurm assigning communication ports to them. Such MPI jobs will experience step launch failure if any component of a heterogeneous job step is unable to acquire its allocated ports. Non-heterogeneous job steps will retry step launch using a new set of communication ports (no change in Slurm behavior). NOTE: Correcting this would require assigning the same set of ports to all components of the heterogeneous job (not possible today), plus changes to srun to better synchronize step startup and error handling.
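The difference between the two cases can be illustrated with a toy model. This is only a sketch of the behavior described above, not Slurm or srun code: the function name `launch_step`, its parameters, and the port numbers are all hypothetical.

```python
def launch_step(components, port_sets, busy, heterogeneous):
    """Toy model: return the port set the step uses, or None on launch failure.

    components    -- names of the step's components (hypothetical)
    port_sets     -- candidate port sets, tried in order (hypothetical)
    busy          -- dict: component name -> set of ports already in use there
    heterogeneous -- True for a heterogeneous job step
    """
    for ports in port_sets:
        # The step can start only if every component can acquire every port.
        acquired = all(
            not (set(ports) & busy.get(name, set())) for name in components
        )
        if acquired:
            return ports
        if heterogeneous:
            # As described above: no retry for heterogeneous steps, so one
            # component failing to acquire its ports fails the whole launch.
            return None
        # Non-heterogeneous step: fall through and retry with the next set.
    return None


# Port 12001 is already taken on component "b" (made-up values).
busy = {"b": {12001}}
candidates = [[12000, 12001], [12002, 12003]]

print(launch_step(["a", "b"], candidates, busy, heterogeneous=False))
print(launch_step(["a", "b"], candidates, busy, heterogeneous=True))
```

In this model the non-heterogeneous step recovers by moving to the second port set, while the heterogeneous step fails on the first conflict.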
d64a5f67