- 20 Jan, 2017 34 commits
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
If a job was requeued while in the completing state, the database wasn't being updated with the requeue state.
-
Brian Christiansen authored
When a fed job is requeued, it needs to be requeued to the clusters that it was submitted to.
-
Brian Christiansen authored
When a fed job is requeued and new sibling jobs are submitted to the other sibling clusters, the restart_cnt needs to be sent to the siblings in case the job runs on a remote sibling.
-
Brian Christiansen authored
The federation needs to make a job_desc when requeueing jobs to siblings.
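The three requeue entries above (requeue to the clusters the job was submitted to, carry restart_cnt along, build a job_desc per sibling) can be pictured with a small sketch. Everything here is hypothetical: `fed_job_t`, `sibling_desc_t`, `requeue_fed_job()` and the bitmask of submitted siblings are illustrative stand-ins, not Slurm's actual structures or functions.

```c
#include <stdint.h>

#define MAX_SIBLINGS 64

/* Hypothetical, simplified federation job record (not Slurm's struct). */
typedef struct {
    uint32_t job_id;
    uint64_t siblings_submitted; /* bit i set => job was submitted to cluster i */
    uint16_t restart_cnt;        /* number of times the job has been requeued */
} fed_job_t;

/* Hypothetical per-sibling descriptor standing in for a job_desc. */
typedef struct {
    int      cluster_id;
    uint32_t job_id;
    uint16_t restart_cnt;        /* copied so a remote sibling sees the count */
} sibling_desc_t;

/* Requeue a fed job: bump restart_cnt, then build one descriptor for each
 * cluster the job was originally submitted to. Returns the count built. */
static int requeue_fed_job(fed_job_t *job, sibling_desc_t out[MAX_SIBLINGS])
{
    int n = 0;

    job->restart_cnt++;
    for (int i = 0; i < MAX_SIBLINGS; i++) {
        if (job->siblings_submitted & ((uint64_t)1 << i)) {
            out[n].cluster_id = i;
            out[n].job_id = job->job_id;
            out[n].restart_cnt = job->restart_cnt;
            n++;
        }
    }
    return n;
}
```

Sending restart_cnt inside each descriptor, rather than keeping it only on the origin, is what lets a remote sibling that ends up running the job report the correct count.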
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
Since a persistent connection can only be established by SlurmUser, this prevents non-SlurmUser users from calling the RPCs. It also requires that all slurmctlds in the federation have the same SlurmUser.
-
Brian Christiansen authored
-
Brian Christiansen authored
If the job can't start now, just submit the job to all siblings.
-
Brian Christiansen authored
_update_sibling_job_siblings()
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
like it does in slurm_send_recv_msg. The resp needs to be initialized before _check_send is called.
-
Brian Christiansen authored
Sibling jobs have to get the lock from the origin cluster in order to attempt to allocate nodes. If a sibling gets the allocation, it lets the origin cluster know, and the origin cluster sets the other sibling jobs, if any, to a REVOKED state and purges them. If the sibling job is the only sibling, it assumes the lock and attempts to start the job to avoid extra communication. If nodes can't be allocated, the job releases the lock so another cluster can try.
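The lock protocol described above can be modelled in a single process as a sketch. All names (`origin_job_t`, `origin_get_lock()`, `sibling_try_start()`, the `can_allocate` flag standing in for node selection) are hypothetical; the real implementation exchanges RPCs between slurmctlds and also purges the revoked jobs, which is reduced here to marking them.

```c
#include <stdbool.h>

#define NO_LOCK_HOLDER (-1)
#define MAX_CLUSTERS 8

typedef enum { SIB_PENDING, SIB_RUNNING, SIB_REVOKED } sib_state_t;

/* Hypothetical origin-side view of one federated job. */
typedef struct {
    int lock_holder;                 /* cluster holding the start lock, or NO_LOCK_HOLDER */
    int n_siblings;
    sib_state_t state[MAX_CLUSTERS]; /* per-sibling state tracked by the origin */
} origin_job_t;

/* Origin: grant the lock only if no other sibling holds it. */
static bool origin_get_lock(origin_job_t *job, int cluster)
{
    if (job->lock_holder != NO_LOCK_HOLDER)
        return false;
    job->lock_holder = cluster;
    return true;
}

/* Origin: a sibling gave up, so free the lock for the next cluster. */
static void origin_release_lock(origin_job_t *job, int cluster)
{
    if (job->lock_holder == cluster)
        job->lock_holder = NO_LOCK_HOLDER;
}

/* Origin: a sibling started the job; revoke every other sibling. */
static void origin_job_started(origin_job_t *job, int cluster)
{
    for (int i = 0; i < job->n_siblings; i++)
        job->state[i] = (i == cluster) ? SIB_RUNNING : SIB_REVOKED;
}

/* Sibling: try to start the job on this cluster. */
static bool sibling_try_start(origin_job_t *job, int cluster, bool can_allocate)
{
    /* A lone sibling assumes the lock and skips the round trip. */
    bool have_lock = (job->n_siblings == 1) || origin_get_lock(job, cluster);

    if (!have_lock)
        return false;
    if (!can_allocate) {
        origin_release_lock(job, cluster);
        return false;
    }
    origin_job_started(job, cluster);
    return true;
}
```

Keeping the lock on the origin serializes allocation attempts across clusters, so at most one sibling can start the job at a time.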
-
Brian Christiansen authored
-
Brian Christiansen authored
for fed sibling jobs that don't start.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
To handle JOB_REVOKED
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Tim Wickberg authored
Overwriting _size leads to the bitmap being the wrong length.
-
Tim Wickberg authored
CID 160063.
-
Tim Wickberg authored
The safe_unpackstr_xmalloc() call could jump to unpack_error, which would leak memory allocated for bitmap. Allocate only after the unpackstr has succeeded. Coverity 160092 (+ more due to macro expansion leading to repeats).
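The ordering fix above ("allocate only after the unpackstr has succeeded") is a general pattern for goto-based unpack error paths. The sketch below assumes a hypothetical `buf_t` and a bitmap encoded as a string of '0'/'1' characters; Slurm's real buffer type, safe_unpack* macros and bitmap encoding differ.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal buffer; not Slurm's buffer type. */
typedef struct {
    const char *data;
    size_t len;
    size_t pos;
} buf_t;

/* Copy a NUL-terminated string out of the buffer into heap memory.
 * Returns -1 if no terminator is found or allocation fails. */
static int unpack_str(buf_t *b, char **out)
{
    const char *start = b->data + b->pos;
    const char *nul = memchr(start, '\0', b->len - b->pos);

    if (!nul)
        return -1;
    size_t n = (size_t)(nul - start) + 1;
    if (!(*out = malloc(n)))
        return -1;
    memcpy(*out, start, n);
    b->pos += n;
    return 0;
}

/* Unpack an nbits-long bitmap encoded as '0'/'1' characters. The string
 * is unpacked first and the bitmap allocated only afterwards, so an early
 * jump to unpack_error has no bitmap to leak. */
static uint8_t *unpack_bitmap(buf_t *b, uint32_t nbits)
{
    char *str = NULL;
    uint8_t *bitmap = NULL;

    if (unpack_str(b, &str) != 0)
        goto unpack_error;
    if (!(bitmap = calloc((nbits + 7) / 8, 1)))
        goto unpack_error;
    for (uint32_t i = 0; i < nbits && str[i]; i++)
        if (str[i] == '1')
            bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
    free(str);
    return bitmap;

unpack_error:
    free(str);      /* NULL-safe: set only if the string was unpacked */
    return NULL;    /* bitmap is never live on this path */
}
```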
-
Tim Wickberg authored
The code would return NO_VAL if the requested frequency was greater than the highest available frequency. While here, improve the errors printed to the slurmstepd log location, and change the initial check against nfreq to test for zero rather than (uint8_t) NO_VAL, which it could never be set to. Bug 3335.
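A sketch of frequency selection that clamps out-of-range requests instead of returning NO_VAL, and tests nfreq against zero. The clamping, the round-up-to-next-step rule for in-range requests, and the name `pick_freq()` are assumptions for illustration; the commit does not spell out exactly what the fixed code returns. `NO_VAL` mirrors Slurm's 32-bit sentinel value.

```c
#include <stdint.h>

#define NO_VAL 0xfffffffe   /* Slurm's uint32_t "no value" sentinel */

/* avail[] holds nfreq frequencies sorted ascending. Out-of-range requests
 * are clamped to the lowest or highest entry; in-range requests round up
 * to the next available step (an assumption of this sketch). */
static uint32_t pick_freq(uint32_t req, const uint32_t *avail, uint8_t nfreq)
{
    if (nfreq == 0)
        return NO_VAL;              /* no frequency table at all */
    if (req <= avail[0])
        return avail[0];
    if (req >= avail[nfreq - 1])
        return avail[nfreq - 1];    /* previously this case returned NO_VAL */
    for (uint8_t i = 0; i < nfreq; i++)
        if (avail[i] >= req)
            return avail[i];
    return avail[nfreq - 1];        /* unreachable given the checks above */
}
```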
-
Danny Auble authored
Bug 2508
-
Tim Wickberg authored
-
- 19 Jan, 2017 6 commits
Morris Jette authored
If a job is allocated nodes which are powered down, reset the job start time when the nodes are ready, and do not charge the job for power-up time. Bug 3411.
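A minimal sketch of not charging for power-up time: when the nodes become ready, the start time moves to "now". The `sketch_job_t` type, its fields and both functions are hypothetical, and shifting the end time so the job keeps its full time limit is an assumption, not something the commit message states.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical job record; not Slurm's job_record. */
typedef struct {
    time_t start_time;
    time_t end_time;
    bool   nodes_powering_up;
} sketch_job_t;

/* Called once the allocated nodes finish powering up. Resets the start
 * time and (by assumption) shifts the end time to preserve the limit. */
static void job_nodes_ready(sketch_job_t *job, time_t now)
{
    if (!job->nodes_powering_up)
        return;                     /* nothing to adjust */
    time_t limit = job->end_time - job->start_time;
    job->start_time = now;
    job->end_time = now + limit;
    job->nodes_powering_up = false;
}

/* Elapsed time charged to the job, counted from the reset start time. */
static long charged_seconds(const sketch_job_t *job, time_t now)
{
    return (long)(now - job->start_time);
}
```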
-
Morris Jette authored
No changes in logic
-
Isaac Hartung authored
Modify get_my_user_name to return a failure if the user name cannot be determined; this case was previously handled inconsistently.
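A sketch of a user-name lookup that reports failure explicitly instead of leaving the caller with an unset name. It uses the POSIX `getpwuid_r()` call; the function name `lookup_user_name()`, its signature and the 0/-1 return convention are hypothetical and not the actual get_my_user_name interface.

```c
#include <pwd.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the user name for uid into name[len]. Returns 0 on success and -1
 * if the name cannot be determined or would not fit, so the caller never
 * sees an unset or truncated name. */
static int lookup_user_name(uid_t uid, char *name, size_t len)
{
    struct passwd pw, *result = NULL;
    char buf[16384];    /* fixed scratch space for getpwuid_r() */

    if (getpwuid_r(uid, &pw, buf, sizeof(buf), &result) != 0 || result == NULL)
        return -1;      /* lookup error, or no passwd entry for uid */
    if (strlen(pw.pw_name) >= len)
        return -1;      /* name plus NUL would not fit */
    strcpy(name, pw.pw_name);
    return 0;
}
```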
-
Danny Auble authored
the function.
-
Danny Auble authored
-
Danny Auble authored
-