Do not abort a job if ALL of its nodes are unavailable and not responding
The job will still be aborted if any node is set DOWN while responding, when "scontrol reconfig" is executed, or when slurmctld restarts, but it should respond better to global failures, such as the network going down.
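
The decision described above can be sketched roughly as follows. This is an illustrative model only, not the slurmctld code: the `Node` class, the `job_action` function, and the `reconfig_or_restart` flag are hypothetical names. It assumes the reconfig/restart rule only matters for a job that already has DOWN nodes, and it returns "unspecified" for the mixed case (some nodes down and silent, others healthy), which this message does not describe.

```python
from dataclasses import dataclass


@dataclass
class Node:
    down: bool        # node is unavailable (set DOWN)
    responding: bool  # node still answers the controller


def job_action(nodes, reconfig_or_restart=False):
    """Return "abort", "keep", or "unspecified" for a job's node list."""
    # A job with no DOWN nodes is not affected by this change.
    if not any(n.down for n in nodes):
        return "keep"
    # Assumption: "scontrol reconfig" or a slurmctld restart aborts a job
    # that has DOWN nodes, even if the failure was global.
    if reconfig_or_restart:
        return "abort"
    # A node set DOWN while it is still responding aborts the job.
    if any(n.down and n.responding for n in nodes):
        return "abort"
    # Every node is DOWN and silent: treated as a global failure
    # (for example a network outage), so the job is kept.
    if all(n.down and not n.responding for n in nodes):
        return "keep"
    # Only some nodes are DOWN and silent: not covered by this message.
    return "unspecified"
```

For example, `job_action([Node(True, False), Node(True, False)])` models a network outage that hides every node of the job, and the job is kept rather than aborted.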