Commit e80b2442 authored by Moe Jette

Slurmctld now pings srun periodically. If srun fails to respond, the job
and/or job step(s) will have their resources de-allocated and be killed.
A resource allocation is released only after no job steps have been active
for at least InactiveLimit seconds. DPCS jobs will be subject to this
forced de-allocation if they remain inactive for an extended period of
time, which can get SLURM and DPCS back in sync if DPCS does a cold-start.
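
The release rule described above can be sketched roughly as follows. This is an illustrative model only, not the code from this commit: the `Allocation` type, its field names, and `should_release` are hypothetical. How the ping result and the InactiveLimit check are combined is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical view of the state slurmctld might track per allocation."""
    srun_responding: bool  # did srun answer the most recent ping?
    active_steps: int      # number of job steps currently running
    last_active: float     # time (seconds) a job step was last active


def should_release(alloc: Allocation, now: float, inactive_limit: float) -> bool:
    """Return True if the allocation would be forcibly de-allocated.

    Assumed rule: release only when srun is unresponsive, no job steps
    are active, and the idle period has reached inactive_limit seconds.
    """
    if alloc.srun_responding:
        return False
    if alloc.active_steps > 0:
        return False
    return (now - alloc.last_active) >= inactive_limit
```

Under this model, an allocation that went idle at t=100 with an InactiveLimit of 60 would be released once the controller checks it at t=160 or later, provided srun is still not answering pings.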
parent 680a2faf