Add a separate heartbeat thread for HA.
The new thread writes a timestamp to a 'heartbeat' file in StateSaveLocation every (SlurmctldTimeout / 4) seconds. This shows that the primary controller still has access to the directory, so the backup controller should not take control. Bug 4142.
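As an illustration only, here is a minimal C sketch of this kind of heartbeat thread using POSIX threads. It is not Slurm's actual code: the `heartbeat_cfg` struct, the function names, the `.new` temporary file, the file format (decimal epoch seconds) and the `heartbeat_age()` staleness check are all hypothetical. The fixed parts taken from the description above are the timestamp file in StateSaveLocation and the SlurmctldTimeout / 4 interval.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical config; in Slurm these would come from slurm.conf. */
struct heartbeat_cfg {
    char path[4096];             /* e.g. "<StateSaveLocation>/heartbeat" */
    unsigned slurmctld_timeout;  /* SlurmctldTimeout, in seconds */
};

static pthread_mutex_t hb_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t hb_cond = PTHREAD_COND_INITIALIZER;
static int hb_stop = 0;

/* Heartbeat period: SlurmctldTimeout / 4, but never less than 1 second. */
static unsigned heartbeat_interval(unsigned timeout)
{
    unsigned interval = timeout / 4;
    return interval ? interval : 1;
}

/* Write the current time to a temp file, fsync it, then rename() it over
 * the heartbeat file so a reader never sees a partially written file. */
static int write_heartbeat(const char *path)
{
    char tmp[4200];
    FILE *fp;
    int rc = 0;

    snprintf(tmp, sizeof(tmp), "%s.new", path);
    if (!(fp = fopen(tmp, "w")))
        return -1;
    if (fprintf(fp, "%lld\n", (long long) time(NULL)) < 0 ||
        fflush(fp) != 0 || fsync(fileno(fp)) != 0)
        rc = -1;
    if (fclose(fp) != 0)
        rc = -1;
    if (rc != 0) {
        unlink(tmp);
        return -1;
    }
    return rename(tmp, path);
}

/* Hypothetical backup-side check: seconds since the last heartbeat,
 * or -1 if the file is missing or unreadable. */
static long long heartbeat_age(const char *path)
{
    long long ts;
    FILE *fp = fopen(path, "r");

    if (!fp)
        return -1;
    if (fscanf(fp, "%lld", &ts) != 1) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return (long long) time(NULL) - ts;
}

/* Thread body: write a heartbeat, then sleep one interval (waking early
 * if heartbeat_stop() is called), until asked to stop. */
static void *heartbeat_thread(void *arg)
{
    const struct heartbeat_cfg *cfg = arg;
    unsigned interval = heartbeat_interval(cfg->slurmctld_timeout);

    pthread_mutex_lock(&hb_lock);
    while (!hb_stop) {
        struct timespec deadline;

        /* Do the file I/O without holding the lock. */
        pthread_mutex_unlock(&hb_lock);
        if (write_heartbeat(cfg->path) != 0)
            perror("heartbeat: write failed");
        pthread_mutex_lock(&hb_lock);

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += interval;
        while (!hb_stop &&
               pthread_cond_timedwait(&hb_cond, &hb_lock, &deadline) != ETIMEDOUT)
            ; /* woken without a stop request: keep waiting */
    }
    pthread_mutex_unlock(&hb_lock);
    return NULL;
}

/* Ask the heartbeat thread to exit and wait for it. */
static int heartbeat_stop(pthread_t tid)
{
    pthread_mutex_lock(&hb_lock);
    hb_stop = 1;
    pthread_cond_broadcast(&hb_cond);
    pthread_mutex_unlock(&hb_lock);
    return pthread_join(tid, NULL);
}
```

Under these assumptions, the primary would start `heartbeat_thread` with `pthread_create()` and call `heartbeat_stop()` on shutdown. A backup could treat `heartbeat_age(path) > SlurmctldTimeout` as a sign that the primary has lost access to StateSaveLocation. Writing at one quarter of the timeout means a healthy primary refreshes the file several times within each timeout window, so one delayed write does not trigger a takeover.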