Add jobid/stepid to MESSAGE_TASK_EXIT to address a race condition that
occurs when a job step is cancelled, another is started immediately (before the first one has completely terminated), and ports are reused. NOTE: This change requires that SLURM be updated on all nodes of the cluster at the same time. Currently running jobs are not affected: they will ignore the jobid/stepid appended to the end of the message.
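
The idea can be sketched as follows. This is a minimal illustration only: the commit text does not show the message layout or the receiving code, so the struct names, field names, and the function task_exit_is_for_step() are all hypothetical and are not SLURM's actual definitions. The assumption is that the receiver compares the jobid/stepid carried in the message against the step that currently owns the port, and discards a late exit message left over from the cancelled step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified task-exit message; NOT SLURM's real struct. */
typedef struct {
    uint32_t num_tasks;    /* number of tasks reported as exited */
    uint32_t return_code;  /* exit status shared by those tasks */
    /* New fields, placed at the end so that older receivers,
     * which stop reading earlier, can simply ignore them. */
    uint32_t job_id;
    uint32_t step_id;
} task_exit_msg_t;

/* Hypothetical identity of the step that currently owns the port. */
typedef struct {
    uint32_t job_id;
    uint32_t step_id;
} step_ctx_t;

/* Return true only if the message was sent on behalf of the step
 * described by ctx. A message from a cancelled step that arrives on
 * a reused port fails this check and can be dropped by the caller. */
bool task_exit_is_for_step(const step_ctx_t *ctx, const task_exit_msg_t *msg)
{
    assert(ctx != NULL && msg != NULL);
    return msg->job_id == ctx->job_id && msg->step_id == ctx->step_id;
}
```

With a check of this kind, an exit message tagged with step 0 of a job would be rejected by a listener that now serves step 1 of the same job, even though both steps used the same port.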