- 22 Sep, 2016 5 commits
-
Morris Jette authored
-
Brian Christiansen authored
-
Brian Christiansen authored
This reverts commit 54a270f7.
-
Danny Auble authored
the fed_mgr.
-
Brian Christiansen authored
-
- 21 Sep, 2016 6 commits
-
Morris Jette authored
capmc_suspend/resume - If a request to modify NUMA or MCDRAM state on a set of nodes, or to reboot a set of nodes, fails, then just requeue the job and abort the entire operation rather than trying to operate on individual nodes. bug 3100
-
Morris Jette authored
Allow a node's PowerUp state flag to be cleared using update_node RPC. bug 3100
-
Morris Jette authored
When powering up a node to change its state (e.g. KNL NUMA or MCDRAM mode), pass the job ID assigned to the nodes to the ResumeProgram in the SLURM_JOB_ID environment variable. bug 3100
-
Morris Jette authored
Don't log error for job end_time being zero if node health check is still running. bug 3053
-
Brian Christiansen authored
The previous logic duplicated the checking of error_codes returned from job_allocate(). job_allocate() will set the job state to FAILED if there was an actual issue.
-
Brian Christiansen authored
Was only checking for ESLURM_REQUESTED_PART_CONFIG_UNAVAILABLE and ENFORCE_ALL; however, _slurm_rpc_allocate_resources() and _slurm_rpc_submit_batch_job() both check for ANY and ALL.
-
- 20 Sep, 2016 12 commits
-
Danny Auble authored
to siblings (if not already connected). This will happen when the next message is sent to them.
-
Tim Wickberg authored
-
Danny Auble authored
-
Danny Auble authored
sibling clusters.
-
Danny Auble authored
back a message to the caller.
-
Danny Auble authored
a federation connection, if someone is adding and removing the cluster from the federation many times at the same time, the cluster might not be found.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Tim Wickberg authored
Fixes build issue caused by 844830d4.
-
Ben Matthews authored
-
- 19 Sep, 2016 13 commits
-
Danny Auble authored
connection
-
Danny Auble authored
-
Danny Auble authored
error.
-
Danny Auble authored
-
Danny Auble authored
at startup. Starting it up when you get a connection from another cluster could cause delays in processing the request.
-
Danny Auble authored
want to wait only for message_timeout instead of forever. Otherwise we could hit a deadlock if the other side is trying to do the same thing.
-
Danny Auble authored
-
Danny Auble authored
processed at a time. Otherwise you could get issues if you are rapidly adding and removing a cluster from a federation. Probably not likely in real life, but testing is a different story.
-
Danny Auble authored
slurmctld.
-
Danny Auble authored
scenario when first added to a federation.
-
Danny Auble authored
-
Morris Jette authored
-
Damien François authored
-
- 17 Sep, 2016 4 commits
-
Danny Auble authored
the same logic that was found in the slurmdbd. Now both functionalities share the same code. This was done with the merge right before this commit.
-
Danny Auble authored
-
Danny Auble authored
update is sent to a slurmctld.
-
Danny Auble authored
with real persistent connections.
-