- 14 Dec, 2015 2 commits
-
-
Morris Jette authored
Decrease parallelism in job cancel request to prevent denial of service when cancelling huge numbers of jobs. bug 2256
-
Morris Jette authored
Prevent triggering gang scheduling within a partition if configured with PreemptType=partition_prio and PreemptMode=suspend,gang. The essence of this fix is to change a "<=" to "<" in cons_res/job_test.c:
    - if ((p_ptr->part_ptr->priority <= jp_ptr->part_ptr->priority) &&
    + if ((p_ptr->part_ptr->priority < jp_ptr->part_ptr->priority) &&
but logic was also added to ensure that a partition configuration with PreemptMode did not override PreemptType != partition_prio. bug 2232
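The operator change can be looked at in isolation. The sketch below is not the code from cons_res/job_test.c: `part_t`, `prio_le` and `prio_lt` are hypothetical names, and what the surrounding branch does is not reproduced here. It only shows that the two comparisons disagree exactly when the priorities are equal, which includes a partition compared against itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a partition record; only the priority matters here. */
typedef struct {
	uint16_t priority;
} part_t;

/* Old comparison: also true when both partitions have the same priority. */
bool prio_le(const part_t *p, const part_t *jp)
{
	return p->priority <= jp->priority;
}

/* New comparison: true only when p has strictly lower priority than jp. */
bool prio_lt(const part_t *p, const part_t *jp)
{
	return p->priority < jp->priority;
}
```

For two partitions of equal priority, `prio_le` returns true while `prio_lt` returns false; for any other pair the two agree.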
-
- 11 Dec, 2015 4 commits
-
-
Tim Wickberg authored
Previously an error() would be logged when the attempt to open the job script using the new directory format failed but the subsequent fallback to the old directory structure was successful, leading to confusion when troubleshooting. Move emitted warnings to debug(), and only error() after failing to open in both directory structures. Add a note about backwards compatibility to both functions: we cannot remove these fallbacks, as the directory structure for pending jobs does not change on Slurm version update, and people may need to chain multiple version updates together to get to a current Slurm version, which would correctly update slurmctld state files but leave pending jobs in the old directory structure. Bug #2244.
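The logging pattern described above can be sketched as follows. This is a minimal illustration, not the code from the commit: `open_with_fallback`, `log_debug` and `log_error` are hypothetical names standing in for the real functions and for Slurm's debug() and error(), and the paths are supplied by the caller.

```c
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical local stand-ins for a logger's debug and error levels. */
void log_debug(const char *msg, const char *path)
{
	fprintf(stderr, "debug: %s: %s\n", msg, path);
}

void log_error(const char *msg, const char *path)
{
	fprintf(stderr, "error: %s: %s\n", msg, path);
}

/*
 * Try the new-format path first, then fall back to the old-format path.
 * Returns an open file descriptor, or -1 if both attempts fail.
 * Individual failures are reported at debug level; an error is reported
 * only when neither location could be opened.
 */
int open_with_fallback(const char *new_path, const char *old_path)
{
	int fd = open(new_path, O_RDONLY);
	if (fd >= 0)
		return fd;
	log_debug("could not open new-format path", new_path);

	fd = open(old_path, O_RDONLY);
	if (fd >= 0)
		return fd;
	log_debug("could not open old-format path", old_path);

	log_error("could not open file in either directory structure", new_path);
	return -1;
}
```

With this shape, a job script that still lives in the old directory structure produces only a debug message, so the log no longer shows an error for an open that ultimately succeeded.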
-
Morris Jette authored
If a job is requeued while in the process of being launched, remove its job ID from slurmd's record of active jobs in order to avoid generating a duplicate job ID error when launched for the second time (which would drain the node). bug 2240
-
Morris Jette authored
In slurmctld log file, log duplicate job ID found by slurmd. Previously this was being logged as a prolog/epilog failure. bug 2240
-
Morris Jette authored
-
- 10 Dec, 2015 2 commits
-
-
Danny Auble authored
so we can set it correctly before tasks are set.
-
David Bigagli authored
-
- 09 Dec, 2015 4 commits
-
-
Morris Jette authored
news
-
Alejandro Sanchez authored
through sacctmgr.
-
Morris Jette authored
select/cray: Prevent NHC from running more than once per job or step. bug 2192
-
Morris Jette authored
In both sched/basic and backfill: If a job cannot be started due to some account/qos limit, then don't start other jobs which could delay it. The old logic would skip the job and start other jobs, which could delay the higher priority job. bug 2129
-
- 08 Dec, 2015 2 commits
-
-
Brian Christiansen authored
-
Danny Auble authored
requests no time limit. http://bugs.schedmd.com/show_bug.cgi?id=2177
-
- 07 Dec, 2015 1 commit
-
-
Tim Wickberg authored
Usernames are comma separated, not colon delimited. Bug #2222. While here fix a few spelling mistakes.
-
- 05 Dec, 2015 2 commits
-
-
Brian Christiansen authored
Bug 2130
-
Brian Christiansen authored
Adopted processes didn't have access to the job's devices. Bug 2130
-
- 04 Dec, 2015 2 commits
-
-
Danny Auble authored
Full revert of c2fbf88f. 13b64c35 had caught part of this, but this will revert it completely. The code just wasn't needed in modern Slurm. It appears the patch came from an older version of Slurm that didn't handle this correctly.
-
David Bigagli authored
This reverts commit 29f25688.
Conflicts: NEWS
Looks like this isn't needed; commit c2fbf88f doesn't appear to be needed and is what is causing this issue. c2fbf88f was added from an older version of Slurm where this was already handled correctly in commit 815e5a44.
-
- 03 Dec, 2015 5 commits
-
-
Morris Jette authored
Cray job NHC delayed until after burst buffer released and epilog completes on all allocated nodes. bugs 2099 and 2192
-
Morris Jette authored
Release a job's allocated licenses only after epilog runs on all nodes rather than at start of termination process. bug 2192
-
Morris Jette authored
sched/backfill - Delay backfill scheduler for completing jobs only if CompleteWait configuration parameter is set (make code match documentation).
-
David Bigagli authored
-
Tim Wickberg authored
-
- 02 Dec, 2015 1 commit
-
-
Josko Plazonic authored
Bug 2030
-
- 01 Dec, 2015 4 commits
-
-
Danny Auble authored
correctly when PriorityFlags=CALCULATE_RUNNING is set. Previously the slurmctld could seg fault if the tres_alloc_str is NULL.
-
Morris Jette authored
Prevent slurmdbd divide by zero if no associations defined at rollup time.
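A guard of this kind can be sketched as below. Which quantity slurmdbd divides at rollup time is not stated in the commit message, so `usage_per_assoc` and its parameters are hypothetical and only illustrate checking the divisor before dividing.

```c
#include <stdint.h>

/* Hypothetical helper: average usage per association, 0 when there are none. */
uint64_t usage_per_assoc(uint64_t total_usage, uint32_t assoc_cnt)
{
	if (assoc_cnt == 0)
		return 0;	/* nothing to divide by; skip the division entirely */
	return total_usage / assoc_cnt;
}
```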
-
Danny Auble authored
else behind.
-
David Bigagli authored
-
- 30 Nov, 2015 5 commits
-
-
Danny Auble authored
uint64_t.
-
Danny Auble authored
-
Danny Auble authored
as all the associations from the database will be lower case.
-
Thomas Cadeau authored
Correct job task count calculation if only node count and ntasks-per-node options supplied. bug 2196
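The commit message does not spell out the formula. A plausible reading, assumed here, is that the task count is the node count multiplied by ntasks-per-node when no explicit task count was given. The helper name `derive_task_count` and the use of 0 to mean "option not supplied" are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: 0 means "option not supplied" in this sketch.
 * Returns the task count to use, or 0 if it cannot be derived.
 */
uint32_t derive_task_count(uint32_t ntasks, uint32_t node_cnt,
			   uint32_t ntasks_per_node)
{
	if (ntasks != 0)
		return ntasks;                     /* explicit task count wins */
	if (node_cnt != 0 && ntasks_per_node != 0)
		return node_cnt * ntasks_per_node; /* derived from the two options */
	return 0;                                  /* not enough information */
}
```

Under that assumption, a request for 4 nodes with 8 tasks per node and no explicit task count yields 32 tasks.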
-
David Bigagli authored
-
- 26 Nov, 2015 1 commit
-
-
jette authored
sched/backfill: If max_rpc_cnt is configured and the backlog of RPCs has not cleared after yielding locks, then continue to sleep.
-
- 25 Nov, 2015 1 commit
-
-
Danny Auble authored
requesting any specific association.
-
- 23 Nov, 2015 1 commit
-
-
Danny Auble authored
-
- 19 Nov, 2015 3 commits
-
-
Morris Jette authored
BurstBuffer/cray: Fix job record purging if cancelled from pending state. The problem can occur when a burst buffer record was created for the job in the plugin data structure, but no burst buffers were actually allocated for it. bug 2165
-
David Bigagli authored
-
Morris Jette authored
BurstBuffer/cray: Enable clearing of burst buffer string on completed job as a means of recovering from a failure mode. Format is "scontrol update jobid=### burstbuffer=". partial resolution of bug 2165
-