Added configuration parameter SrunIOTimeout to optionally ping srun's tasks
for better fault tolerance (e.g. SLURM daemons killed and restarted on a compute node).