Modify the power_save logic to attempt to resume nodes that have been allocated
to a job but are not responding when slurmctld restarts. This should help avoid race conditions when slurmctld crashes and we try to recover a consistent state (e.g. whether ResumeProgram completed or not).
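
The selection rule described above can be illustrated with a minimal sketch. This is not the slurmctld code: the `Node` class, its fields, and `nodes_to_resume` are hypothetical names used only to show which nodes would be candidates for another resume attempt after a restart.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for a node record; not Slurm's actual structure.
    name: str
    allocated: bool   # node is assigned to a job
    responding: bool  # node has answered the controller since the restart


def nodes_to_resume(nodes):
    """Return names of nodes that are allocated to a job but not responding."""
    return [n.name for n in nodes if n.allocated and not n.responding]


nodes = [
    Node("n1", allocated=True, responding=True),    # healthy, leave alone
    Node("n2", allocated=True, responding=False),   # resume may not have completed
    Node("n3", allocated=False, responding=False),  # idle and powered down
]
to_resume = nodes_to_resume(nodes)
```

In this sketch only `n2` qualifies: it belongs to a job, yet the controller cannot tell after the restart whether its resume ever finished, so attempting the resume again is the safer choice.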