- 24 Mar, 2017 11 commits
-
-
Morris Jette authored
-
Morris Jette authored
The wrong function was being called to release memory. Bug introduced yesterday in commit c8bf6b5d.
-
Morris Jette authored
The xmalloc was based upon the wrong data structure
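
One common way to avoid sizing an allocation from the wrong structure is to take the size from the destination pointer itself rather than naming a type. The sketch below is an illustration only: it uses the standard calloc() instead of Slurm's xmalloc, and the structures and the helper alloc_large_recs() are hypothetical names, not taken from the commit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical structures, for illustration only. */
struct small_rec {
	int id;
};

struct large_rec {
	int id;
	char name[64];
};

/*
 * Allocate a zeroed array of `count` large_rec entries.
 * A buggy variant would pass sizeof(struct small_rec) here and
 * under-allocate; sizeof(*recs) always follows the pointer's type.
 */
static struct large_rec *alloc_large_recs(size_t count)
{
	struct large_rec *recs;

	assert(count > 0);
	recs = calloc(count, sizeof(*recs));
	return recs;	/* NULL on allocation failure; caller must free() */
}
```

If the type of recs changes later, the allocation size follows it without any further edit.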
-
Morris Jette authored
No changes to logic, just added some parentheses and brackets
-
Brian Christiansen authored
Reroute federated scancel <jobid> requests to the origin cluster.
-
Brian Christiansen authored
The local cluster will cancel the job if the federated job is running on that cluster; otherwise it will route the request, back through the client, to the origin cluster.
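
The decision described above can be sketched as a small function. This is a minimal illustration under stated assumptions: the enum, the struct, and choose_cancel_action() are hypothetical names and do not reflect Slurm's actual data structures or RPC handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of handling a cancel request on the local cluster. */
enum cancel_action {
	CANCEL_LOCALLY,		/* job runs here, cancel it here */
	REROUTE_TO_ORIGIN	/* tell the client to retry at the origin cluster */
};

/* Hypothetical, simplified view of a federated job. */
struct fed_job_view {
	bool running_on_local_cluster;
};

/* Choose how the local cluster should respond to a cancel request. */
static enum cancel_action choose_cancel_action(const struct fed_job_view *job)
{
	assert(job != NULL);
	return job->running_on_local_cluster ? CANCEL_LOCALLY
					     : REROUTE_TO_ORIGIN;
}
```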
-
Brian Christiansen authored
Set the cluster lock even when the cluster is the only viable cluster. The cluster_lock is used to determine if the cluster is running the job or not.
-
Brian Christiansen authored
when seeing if the cluster is the only cluster in the viable list.
-
Brian Christiansen authored
Adding to be able to route cancel (and other future) requests to the origin cluster in a federation.
-
Brian Christiansen authored
Federation Reconciliation
-
Brian Christiansen authored
When a sibling establishes a connection to another sibling, that sibling will then reconcile jobs with the other sibling.
-
- 23 Mar, 2017 20 commits
-
-
Brian Christiansen authored
when the job is being purged due to the origin job being "cleaned" (e.g. slurmctld -c).
-
Brian Christiansen authored
-
Brian Christiansen authored
instead of just all of the viables. I need to be able to send the viable list to only a specific sibling, for example when reconciliation finds that a sibling doesn't have a job that the origin thinks it could.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
The connection's memory wasn't getting freed, which caused issues when the connection needed to be reestablished.
-
Brian Christiansen authored
-
Brian Christiansen authored
data_size is used when sending buffer messages (e.g. RESPONSE_JOB_INFO).
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
This protects the fed_mgr_fed_rec. Reverted 315cff15. This is a cleaner patch.
-
Brian Christiansen authored
This reverts commit 315cff15.
-
Brian Christiansen authored
This reverts commit e5f2c720.
-
Brian Christiansen authored
Continuation of 315cff15
-
Morris Jette authored
-
Morris Jette authored
-
Brian Christiansen authored
-
Morris Jette authored
Explicitly test WIFEXITED() rather than assume an exit code if WIFSIGNALED is false. bug 3562
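
The point of testing WIFEXITED() explicitly is that a wait status can also describe a signaled, stopped, or continued child, so "not signaled" does not imply that an exit code exists. A minimal sketch using the POSIX wait macros follows; the helpers exit_code_or_neg1() and run_child() are hypothetical names for illustration and are not the code changed by this commit.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Return the child's exit code only if it exited normally.
 * Any other status (signaled, stopped, continued) yields -1,
 * since WEXITSTATUS() is meaningful only when WIFEXITED() is true.
 */
static int exit_code_or_neg1(int status)
{
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	return -1;
}

/*
 * Fork a child that optionally raises `sig` and otherwise calls
 * _exit(exit_code). Returns the decoded status, or -2 on fork/wait error.
 */
static int run_child(int exit_code, int sig)
{
	int status = 0;
	pid_t pid;

	assert(exit_code >= 0 && exit_code <= 255);
	pid = fork();
	if (pid < 0)
		return -2;
	if (pid == 0) {
		if (sig > 0)
			raise(sig);
		_exit(exit_code);
	}
	if (waitpid(pid, &status, 0) != pid)
		return -2;
	return exit_code_or_neg1(status);
}
```

For instance, run_child(3, 0) reports the exit code of a normally exiting child, while run_child(3, SIGKILL) reports -1 because the child was terminated by a signal and never produced an exit code.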
-
- 22 Mar, 2017 9 commits
-
-
Morris Jette authored
This will avoid filling the slurmctld logs with communication error messages when a cluster in the federation is down.
-
Morris Jette authored
-
Morris Jette authored
bug 3610
-
Morris Jette authored
No change to logic
-
Morris Jette authored
-
Morris Jette authored
Fix some recently introduced memory leaks related to lists of RPCs.
-
Morris Jette authored
Correct spelling of persistent in various comments and log messages. No change to logic.
-
Morris Jette authored
Log the RPCs queued for processing by the federation agent which remain unprocessed at slurmctld shutdown
-
Morris Jette authored
The broadcast was temporarily removed for testing purposes
-