Do not defer slurmd node registration if HealthCheckProgram fails (b31fa177) · Commits · Manuel G. Marciani / ces_slurm_simulator

Commit b31fa177 authored Jul 07, 2017 by Alejandro Sanchez, committed by Morris Jette Jul 07, 2017

Do not defer slurmd node registration if HealthCheckProgram fails

This behavior was introduced in bug 2504 (commit 7fb0c981) and
bug 2643 (commit 988edf12).

The reasoning is that sysadmins end up confused when they see nodes with
Reason "Not Responding" that they can still manually ping or access. That
reason should only be set if the node is truly not responding, not if the
HealthCheckProgram execution failed or returned a non-zero exit code. In
that case, the program itself is expected to take the appropriate actions,
such as draining the node and setting an appropriate Reason.
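
As an illustration of a health check program that takes those actions
itself, here is a minimal sketch in Python. It is not part of this commit.
The free-space check, the threshold, and the function names are hypothetical.
The only Slurm interface assumed is the `scontrol update NodeName=...
State=DRAIN Reason=...` command.

```python
import shutil
import socket
import subprocess
from typing import List, Optional

# Hypothetical threshold: drain the node if /tmp has less than 1 GiB free.
MIN_FREE_TMP_BYTES = 1024 ** 3


def drain_command(node: str, reason: str) -> List[str]:
    """Build the scontrol invocation that drains `node` with `reason`."""
    return ["scontrol", "update", f"NodeName={node}",
            "State=DRAIN", f"Reason={reason}"]


def check_tmp_space(path: str = "/tmp",
                    min_free: float = MIN_FREE_TMP_BYTES) -> Optional[str]:
    """Return a failure reason, or None if `path` has enough free space."""
    free = shutil.disk_usage(path).free
    if free < min_free:
        return f"health check: low free space on {path}"
    return None


def main() -> int:
    """Run the check and drain this node with a specific Reason on failure."""
    reason = check_tmp_space()
    if reason is not None:
        # Assumption: the Slurm node name equals the short hostname.
        node = socket.gethostname().split(".")[0]
        subprocess.run(drain_command(node, reason), check=False)
    return 0
```

A script like this would call `main()` as its entry point and be referenced
by HealthCheckProgram in slurm.conf. The node then shows the specific Reason
set by the script rather than "Not Responding".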

Bug 3931

parent 3c161d32
