Add timeout on srun's I/O connect message to better handle some failure modes
If slurmstepd connects task I/O but aborts after srun accepts the connection and before slurmstepd writes any data, srun could hang indefinitely. This probably does not explain the failures seen at CEA, but it can't hurt.
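
For illustration only, the kind of timeout described above could be sketched with poll() as follows. This is not the code from this commit: read_with_timeout() and its timeout_ms parameter are hypothetical names, and the sketch only assumes a POSIX system.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Slurm): wait up to timeout_ms
 * milliseconds for fd to become readable, then read from it.
 *
 * Returns the number of bytes read, 0 if the peer closed the
 * connection, or -1 on error. On timeout, returns -1 with errno
 * set to ETIMEDOUT.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc;

	/* Retry if interrupted by a signal. Note this restarts the
	 * full timeout rather than tracking the time remaining. */
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc < 0)
		return -1;		/* poll() itself failed */
	if (rc == 0) {
		errno = ETIMEDOUT;	/* peer connected but never wrote */
		return -1;
	}

	/* Data, EOF, or a socket error is pending; read() reports which. */
	return read(fd, buf, len);
}
```

With a helper like this, a caller that has just accepted a connection can give up on a peer that never sends its initial message instead of blocking in read() forever.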