- 27 Oct, 2016 31 commits
-
-
Brian Christiansen authored
with the sbatch -M<clusters> option. A fed will-run will return the fastest time of all siblings in a federation.
-
Brian Christiansen authored
-
Brian Christiansen authored
Was moved in 5cfc577a
-
Brian Christiansen authored
-
Brian Christiansen authored
More to come. This sets up the controller side.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
It could have state-saved a null federation.
-
Brian Christiansen authored
instead of one at a time. Saves ~20 seconds.
-
Brian Christiansen authored
If all of the clusters get updated at the same time, they could get into a state where they are waiting on each other to respond, eventually time out, and then reconnect. Spacing out the clusters being added helps prevent them from talking to everyone at the same time.
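The spacing idea in this commit message can be illustrated with a minimal sketch. It is not the Slurm code from the commit: the function name `staggered_delays` and the `spacing_s` parameter are hypothetical, and the interval the commit actually uses is not stated here.

```python
def staggered_delays(clusters, spacing_s=1.0):
    """Give each cluster a distinct start delay (hypothetical helper).

    The first cluster starts immediately and each later one waits an
    extra `spacing_s` seconds, so the updates do not all fire at once.
    """
    return {name: index * spacing_s for index, name in enumerate(clusters)}


if __name__ == "__main__":
    # Cluster names are made up for illustration.
    for name, delay in staggered_delays(["alpha", "beta", "gamma"], 2.0).items():
        print(f"{name}: add after {delay:.1f}s")
```

A fixed offset per cluster is only one way to spread the load; a randomized delay would serve the same purpose.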
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
get_next_job_id() didn't take into consideration job_ids that may already be taken by other jobs, as set_job_id() did.
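The kind of check this fix describes, skipping IDs that are already in use, might look like the sketch below. It is written in Python for brevity and is not the Slurm implementation: `next_free_job_id`, `taken`, `first_id`, and `max_id` are hypothetical names, and the wrap-around behavior is an assumption.

```python
def next_free_job_id(last_id, taken, first_id, max_id):
    """Return the next ID after `last_id` that is not in `taken`.

    Wraps from `max_id` back to `first_id` (assumed behavior) and
    returns None when every ID in the range is already in use.
    """
    candidate = last_id
    for _ in range(max_id - first_id + 1):
        candidate += 1
        if candidate > max_id:
            candidate = first_id  # wrap around to the start of the range
        if candidate not in taken:
            return candidate
    return None  # the whole range is occupied


if __name__ == "__main__":
    print(next_free_job_id(5, {6, 7}, first_id=1, max_id=10))
```

Here IDs 6 and 7 are skipped because other jobs already hold them.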
-
Brian Christiansen authored
Send the existing packed buffer that has the job_desc to the sibling. The sibling will unpack it on the other side. This prevents having to pack the job_desc for each willrun/allocation to each sibling.
-
Brian Christiansen authored
squeue --fedtrack
-
Brian Christiansen authored
squeue long options: fedorigin, fedoriginraw, fedsiblings and fedsiblingsraw.
-
Brian Christiansen authored
Also make strings of siblings for passing back to the api.
-
Brian Christiansen authored
-
Brian Christiansen authored
For the fed_mgr, instead of packing up the job_desc again for will_runs and allocations to each sibling, just send the buffer that came in. Since job_allocate will modify the received job_desc_msg_t, this also makes it easy to make a copy of the received job_desc_msg_t to do a willrun on the receiving cluster before allocating resources.
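The buffer-reuse pattern described here can be sketched as follows. This is an illustration only: JSON stands in for Slurm's pack/unpack routines, and `fan_out`, `handle_on_sibling`, `send`, `will_run`, and `allocate` are hypothetical names rather than Slurm functions.

```python
import json


def fan_out(packed, siblings, send):
    """Forward the already-packed buffer to each sibling without repacking."""
    for sibling in siblings:
        send(sibling, packed)


def handle_on_sibling(packed, will_run, allocate):
    """Unpack the same buffer twice to get two independent descriptions.

    The first copy is used for the will-run estimate; the second is
    handed to `allocate`, which is free to modify it.
    """
    probe = json.loads(packed)
    estimate = will_run(probe)
    job_desc = json.loads(packed)
    return estimate, allocate(job_desc)


if __name__ == "__main__":
    buf = json.dumps({"name": "demo", "nodes": 2}).encode()
    fan_out(buf, ["c1", "c2"], lambda s, b: print(s, len(b), "bytes"))
```

Because each unpack yields a fresh object, changes made during allocation never leak back into the buffer or into the copy used for the will-run.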
-
Dominik Bartkiewicz authored
-
Dominik Bartkiewicz authored
-
Morris Jette authored
-
Tim Wickberg authored
-
Morris Jette authored
This option specifies minimum characteristics of the compute nodes which should be considered for use, not the resource allocation size. Bug 3118.
-
Alejandro Sanchez authored
Create separate check_hosts_contiguous procedure in globals and use it for both test1.83 and test15.21. Bug 3006.
-
Morris Jette authored
-
- 26 Oct, 2016 9 commits
-
-
Morris Jette authored
-
Morris Jette authored
Fix bug that was clearing MAINT mode on nodes scheduled for reboot (bug introduced in version 16.05.5 to address bug in overlapping reservations, commit 5eee1d28). Note that a node's MAINT flag is used for both a requested reboot and a maintenance reservation. What I'd like to do is add a new node state flag to differentiate between these two cases, but that involves some significant changes that could introduce instability, so it will be deferred to version 17.02. Bug 3210.
-
Morris Jette authored
Correct/expand description of NODE_STATE_FLAG
-
Danny Auble authored
-
Danny Auble authored
-
Alejandro Sanchez authored
salloc are requested with -n tasks < hosts from -w hostlist or from -N.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
requested with -n tasks < hosts from -w hostlist.
-