Commit 3c2b46af authored by Didier GAZEN, committed by Morris Jette

Fix for node reboot/down state

In your node_mgr fix to keep rebooted nodes down (commit 9cd15dfe), you
forgot to consider the case of nodes that are powered up but only respond after
ResumeTimeout seconds (the maximum time permitted). Such nodes are
marked DOWN (because they didn't respond within ResumeTimeout seconds) and
should then silently become available when ReturnToService=1 (as stated in the slurm.conf manual).

With your modification, when such nodes finally respond, they are seen as
rebooted nodes and remain in the DOWN state (with the new reason: Node
unexpectedly rebooted) even when ReturnToService=1!
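
To make the expected behaviour concrete, here is a minimal sketch of the decision described above. It is not the patch and not Slurm source code: the enum, the struct, its fields and the function name are all hypothetical, and only the ReturnToService=1 case from this message is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model. None of these names come from Slurm. */
enum node_state { NODE_DOWN, NODE_IDLE };

struct node_info {
    bool boot_detected;     /* node reports a boot since its last response */
    bool resume_timed_out;  /* node was powered up and marked DOWN after
                             * not responding within ResumeTimeout */
    int  return_to_service; /* value of ReturnToService in slurm.conf */
};

/*
 * State a DOWN node should take when it finally registers.
 * Only ReturnToService=1 is modelled; other values simply keep the
 * node DOWN in this sketch.
 */
static enum node_state state_on_registration(const struct node_info *n)
{
    assert(n != NULL);

    /*
     * A late power-up must be checked before the reboot test: the boot
     * was expected, so the node is DOWN only for being non-responsive.
     */
    if (n->resume_timed_out)
        return (n->return_to_service == 1) ? NODE_IDLE : NODE_DOWN;

    /* Any other boot is unexpected: keep the node DOWN
     * (reason: Node unexpectedly rebooted). */
    if (n->boot_detected)
        return NODE_DOWN;

    return NODE_DOWN;
}
```

The point of the sketch is the ordering: if the reboot test ran first, a node that answered late after power-up would be classified as unexpectedly rebooted and would never reach the ReturnToService check.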

My patch to obtain the correct behaviour:
parent f2a08ce7