- 16 Nov, 2017 10 commits
-
-
Morris Jette authored
-
Morris Jette authored
Coverity CID 179255
-
Morris Jette authored
Coverity CID 179200
-
Morris Jette authored
Fix logic so that we can find all components of a pack job if the submit fails. Populate the "pack_job_list" field on a pack job leader if some later component can't be submitted.
-
Morris Jette authored
-
Morris Jette authored
Correct printing error type based upon errno rather than returned rc.
-
Dominik Bartkiewicz authored
If PrologSlurmctld fails for a pack job leader, kill all components of the job. bug 4379
-
Dominik Bartkiewicz authored
Add SLURM_PACK_JOB_NODELIST to PrologSlurmctld and EpilogSlurmctld environment. bug 4379
-
Morris Jette authored
-
Morris Jette authored
bug 4370
-
- 15 Nov, 2017 10 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Prevent scheduling deadlock with multiple components of heterogeneous job in different partitions (i.e. one heterogeneous job component is higher priority in one partition and another component is lower priority in a different partition). bug 4370
-
Alejandro Sanchez authored
The issue could be reproduced by restarting slurmctld after a heterogeneous job finished but before the MinJobAge time had passed. Because the pack_job_list member of job_record is not saved to or loaded from the job_state, the function _validate_pack_jobs() is responsible for rebuilding the pack_job_list. That function was skipping the rebuild for finished jobs, so other code, such as the thread responsible for purging old jobs, failed when it tried to iterate over a NULL pack_job_list that had never been rebuilt. Bug 4383.
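The failure mode described above can be illustrated with a minimal sketch. It is not the slurmctld code: `struct job_rec`, `rebuild_pack_job_list()` and `count_purgeable()` are hypothetical, simplified stand-ins, and only the field name `pack_job_list` and the idea of rebuilding it for finished jobs come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a slurmctld job record. */
struct job_rec {
    unsigned job_id;
    unsigned pack_job_id;           /* job_id of the pack leader, 0 if not a pack job */
    int finished;                   /* non-zero once the job has completed */
    struct job_rec **pack_job_list; /* leader only; not saved in the state file */
    size_t pack_job_cnt;
};

/*
 * Rebuild the leader's component list after a restart.
 * Finished jobs are deliberately NOT skipped, so code that later walks
 * the list (for example a purge pass) does not meet a NULL pointer.
 * Returns 0 on success, -1 on allocation failure.
 */
static int rebuild_pack_job_list(struct job_rec *jobs, size_t njobs,
                                 struct job_rec *leader)
{
    size_t i, cnt = 0;

    assert(leader != NULL);
    for (i = 0; i < njobs; i++)
        if (jobs[i].pack_job_id == leader->job_id)
            cnt++;

    free(leader->pack_job_list);
    /* Allocate at least one slot so the pointer is never NULL on success. */
    leader->pack_job_list = calloc(cnt ? cnt : 1,
                                   sizeof(*leader->pack_job_list));
    if (!leader->pack_job_list)
        return -1;

    leader->pack_job_cnt = 0;
    for (i = 0; i < njobs; i++)
        if (jobs[i].pack_job_id == leader->job_id)
            leader->pack_job_list[leader->pack_job_cnt++] = &jobs[i];
    return 0;
}

/* Count finished components; tolerate a list that was never rebuilt. */
static size_t count_purgeable(const struct job_rec *leader)
{
    size_t i, n = 0;

    if (!leader->pack_job_list)
        return 0;
    for (i = 0; i < leader->pack_job_cnt; i++)
        if (leader->pack_job_list[i]->finished)
            n++;
    return n;
}
```

The sketch combines two defences: the rebuild covers finished jobs, and the consumer checks for NULL before iterating. The commit message only describes the first; the NULL check is an added assumption.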
-
Felip Moll authored
If a job is run from srun and the lua job submit plugin sets the environment, slurmctld crashes. Bug 4247
-
Alejandro Sanchez authored
From within slurm_job_submit():
  job_desc.pack_job_offset
From within slurm_job_modify():
  job_rec.pack_job_id
  job_rec.pack_job_id_set
  job_rec.pack_job_offset
Bug 4372.
-
Felip Moll authored
bug 4339
-
Felip Moll authored
Added some additional checks to prevent segfaults in some basic situations. Bug 4247
-
Felip Moll authored
bug 4368
-
Dominik Bartkiewicz authored
Add SLURM_PACK_JOB_ID and SLURM_PACK_JOB_OFFSET to PrologSlurmctld and EpilogSlurmctld environment. bug 4379
-
- 14 Nov, 2017 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
Avoid srun abort trying to run on heterogeneous job component that has ended. bug 4366
-
- 13 Nov, 2017 9 commits
-
-
Morris Jette authored
Coverity CID 179200
-
Morris Jette authored
Coverity CID 179201
-
Morris Jette authored
-
Morris Jette authored
bug 4374
-
Morris Jette authored
Do so even if pack-group 0 is completed, so long as not all components are completed. bug 4374
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Specify "BackupController#=<hostname>" and "BackupAddr#=<address>" to identify up to 9 backup servers in slurm.conf. The output format of "scontrol ping" and the daemon status at the end of "scontrol status" are modified to report the up status of the primary and all backup servers. The "scontrol takeover [#]" command can now identify the BackupController index number. The default value is "1" (the configured "BackupController" or "BackupController1" node).
-
Tim Wickberg authored
In a prior incarnation of the patch that introduced it, it was MaxQueryTimeLimit, and that was not updated with the code base when changed. Bug 4365.
-
- 10 Nov, 2017 8 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Update slurm.spec and slurm.spec-legacy as well
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Felip Moll authored
Bug 4323.
-
Isaac Hartung authored
This now matches the sinfo documentation. Bug 4306.
-
Tim Wickberg authored
The race condition this is avoiding has been fixed elsewhere. This reverts commit 6c21c8bd.
-
Morris Jette authored
-
- 09 Nov, 2017 1 commit
-
-
Danny Auble authored
Coverity 178912
-