    Retry MPI reserved port logic only for non-pack job steps · d64a5f67
    Morris Jette authored
    Ancient versions of OpenMPI and their derivatives (e.g. Cray MPI) are
    dependent upon communication ports being assigned to them by Slurm. Such MPI
    jobs will experience step launch failure if any component of a
    heterogeneous job step is unable to acquire the allocated ports.
    Non-heterogeneous job steps will retry step launch using a new set of
    communication ports (no change in Slurm behavior).
    
    NOTE: Letting heterogeneous job steps retry would require assigning the
    same set of ports to all components of the heterogeneous job (not
    possible today), plus changes to srun to better synchronize step startup
    and error handling.
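
The behavior described in this message can be illustrated with a minimal sketch. It is not Slurm's implementation: `launch_step`, `try_launch`, and `MAX_ATTEMPTS` are hypothetical names, and the retry limit is an assumed value. It only shows the decision that a heterogeneous step gets one launch attempt while a regular step may retry.

```python
MAX_ATTEMPTS = 3  # hypothetical retry limit, not taken from Slurm


def launch_step(is_heterogeneous, try_launch, max_attempts=MAX_ATTEMPTS):
    """Return the 1-based attempt number that succeeded, or None on failure.

    try_launch(attempt) is a hypothetical callback that returns True when
    every task managed to acquire its reserved ports on that attempt.
    """
    # A heterogeneous step gets a single attempt; a regular step may retry.
    attempts_allowed = 1 if is_heterogeneous else max_attempts
    for attempt in range(1, attempts_allowed + 1):
        if try_launch(attempt):
            return attempt
        # A regular step would be given a fresh set of ports before retrying.
    return None
```

With a callback that fails on the first attempt and succeeds on the second, the regular step recovers on its retry, while the heterogeneous step reports failure after its single attempt.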