- 06 Apr, 2016 6 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Prevent use of NULL pointer and SEGV when changing a job's QOS when the slurmdbd is not configured.
-
Morris Jette authored
bug 2609
-
Morris Jette authored
These tests failed with MinJobAge=3, so when the tests looked for completed jobs, the job records had already been purged. Log this configuration as a possible reason for failure.
-
Tim Wickberg authored
-
- 05 Apr, 2016 8 commits
-
-
Janne Blomqvist authored
-
Morris Jette authored
Conflicts: src/plugins/sched/backfill/backfill.c
-
Morris Jette authored
Fix backfill scheduler race condition that could cause an invalid pointer in the select/cons_res plugin. Bug introduced in 15.08.9, commit efd9d35e. The scenario is as follows:
1. Backfill scheduler is running, then releases locks.
2. Main scheduling loop starts a job "A".
3. Backfill scheduler resumes, finds job "A" in its queue and resets its partition pointer.
4. Job "A" completes and tries to remove its resource allocation record from the select/cons_res data structure, but fails to find it because it is looking in the table for the wrong partition.
5. Job "A" record gets purged from slurmctld.
6. Select/cons_res plugin attempts to operate on the resource allocation data structure, finds a pointer into the now purged data structure of job "A" and aborts or gets SEGV.
Bug 2603
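The six steps above can be replayed in a small toy model. This is only an illustrative sketch, not Slurm code: `Job`, `ToySelectPlugin`, `simulate_race`, and the partition names "debug" and "batch" are all hypothetical, and a dictionary per partition stands in for the select/cons_res allocation tables.

```python
class Job:
    """Hypothetical job record; `partition` stands in for the partition pointer."""

    def __init__(self, job_id, partition):
        self.job_id = job_id
        self.partition = partition


class ToySelectPlugin:
    """Toy stand-in for per-partition allocation tables (not the real plugin)."""

    def __init__(self, partitions):
        self.tables = {name: {} for name in partitions}

    def add_job(self, job):
        self.tables[job.partition][job.job_id] = job

    def remove_job(self, job):
        # Only the table of the job's *current* partition is searched.
        return self.tables[job.partition].pop(job.job_id, None) is not None

    def dangling_ids(self, live_jobs):
        # Allocation records whose job record has already been purged.
        return sorted(
            job_id
            for table in self.tables.values()
            for job_id in table
            if job_id not in live_jobs
        )


def simulate_race():
    plugin = ToySelectPlugin(["debug", "batch"])
    job_a = Job(1, "debug")
    live_jobs = {job_a.job_id: job_a}

    # Step 1: backfill has queued job A for another partition, then released locks.
    backfill_queue = [(job_a, "batch")]
    # Step 2: the main scheduling loop starts job A in "debug".
    plugin.add_job(job_a)
    # Step 3: backfill resumes and resets the partition of the running job.
    for job, partition in backfill_queue:
        job.partition = partition
    # Step 4: job A completes; removal searches the wrong partition table.
    removed = plugin.remove_job(job_a)
    # Step 5: the job record is purged.
    del live_jobs[job_a.job_id]
    # Step 6: the allocation table still refers to the purged job.
    return removed, plugin.dangling_ids(live_jobs)
```

In this model `simulate_race()` returns `(False, [1])`: the removal fails and job 1 is left behind as a dangling record, which corresponds to the stale pointer in step 6.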
-
Danny Auble authored
misleading.
-
Danny Auble authored
-
Danny Auble authored
instead of ID to make things easier to read.
-
Danny Auble authored
-
Danny Auble authored
Conflicts: src/common/gres.c
-
- 04 Apr, 2016 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
canceled while launching.
-
Morris Jette authored
-
- 02 Apr, 2016 3 commits
-
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
- 01 Apr, 2016 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
Rather than making sure that a running job's socket count on a node remains constant, just make sure the total core count remains constant.
-
Morris Jette authored
-
Morris Jette authored
Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue, and sview.
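The backward-compatible rename described above follows a common alias pattern, sketched here with Python's `argparse`. This is an assumption-laden illustration and not how Slurm parses its options: the `toy-sbatch` program name and `build_parser` helper are hypothetical, and only the two long option names come from the commit message.

```python
import argparse


def build_parser():
    """Hypothetical parser: the new name and the old name set the same flag."""
    parser = argparse.ArgumentParser(prog="toy-sbatch")
    parser.add_argument(
        "--oversubscribe",
        "--shared",  # old spelling kept as an alias so existing scripts still work
        dest="oversubscribe",
        action="store_true",
        help="allow the allocation to share resources with other jobs",
    )
    return parser
```

Both `build_parser().parse_args(["--shared"])` and `build_parser().parse_args(["--oversubscribe"])` yield a namespace with `oversubscribe` set to `True`.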
-
- 31 Mar, 2016 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
Power/cray: Don't specify NID list to Cray APIs. If any of those nodes are not in a ready state, the API returned an error for ALL nodes rather than valid data for nodes in ready state. bug 2332
-
Matthieu Hautreux authored
and retries are done making the error message a little misleading.
-
- 30 Mar, 2016 10 commits
-
-
Morris Jette authored
Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm).
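The invariant in this commit (total core count fixed, socket layout free to change) can be sketched as a small check. The dictionary-based node record and the `update_socket_layout` name are hypothetical and chosen only for illustration; the real logic lives in slurmctld's C code.

```python
def update_socket_layout(node, new_sockets, new_cores_per_socket):
    """Apply a new socket layout only if the total core count is unchanged.

    `node` is a hypothetical dict with "sockets" and "cores_per_socket" keys.
    Returns True if the layout was updated, False if it was rejected.
    """
    total_cores = node["sockets"] * node["cores_per_socket"]
    if new_sockets * new_cores_per_socket != total_cores:
        return False  # total core count would change: reject
    node["sockets"] = new_sockets
    node["cores_per_socket"] = new_cores_per_socket
    return True
```

For example, a node recorded as 1 socket with 68 cores can be updated to 4 sockets with 17 cores each, while a change to 4 sockets with 16 cores each is rejected because the total would drop to 64.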
-
Morris Jette authored
Log if the number of cores is not evenly divisible by the socket count (which will be the case on some KNL) or the number of threads is not evenly divisible by the core count.
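A minimal sketch of the two divisibility checks, assuming `cores` and `threads` are node-wide totals. The function name `layout_warnings` and the message wording are hypothetical; the function returns the messages so the caller can decide how to log them.

```python
def layout_warnings(sockets, cores, threads):
    """Return warning strings for uneven hardware layouts.

    Assumes `cores` and `threads` are totals for the whole node
    and that `sockets` and `cores` are positive.
    """
    warnings = []
    if cores % sockets != 0:
        warnings.append(
            f"{cores} cores are not evenly divisible by {sockets} sockets"
        )
    if threads % cores != 0:
        warnings.append(
            f"{threads} threads are not evenly divisible by {cores} cores"
        )
    return warnings
```

With these inputs, `layout_warnings(4, 66, 264)` reports only the socket mismatch, and `layout_warnings(4, 68, 272)` returns an empty list.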
-
Danny Auble authored
rollup would effectively never run again. bug 2575 and sort of bug 2596
-
Morris Jette authored
Remove the SchedulerParameters option of "assoc_limit_continue", making it the default behavior. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utilization, but avoids potentially starving larger jobs by preventing them from launching indefinitely.
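The difference between the default behavior and "assoc_limit_stop" can be sketched as a loop over one partition's jobs in priority order. This is a simplified model, not the scheduler's implementation: `schedule_partition`, the job dictionaries, and the `assoc_limit_ok` callback are hypothetical names introduced for illustration.

```python
def schedule_partition(jobs, assoc_limit_ok, assoc_limit_stop=False):
    """Return the names of jobs started in one partition.

    `jobs` is a list of dicts with "name" and "priority" keys.
    `assoc_limit_ok(job)` returns True if association limits allow the job.
    """
    started = []
    for job in sorted(jobs, key=lambda j: j["priority"], reverse=True):
        if assoc_limit_ok(job):
            started.append(job["name"])
        elif assoc_limit_stop:
            # A higher priority job is blocked by association limits:
            # do not try any lower priority jobs in this partition.
            break
        # Default (former "assoc_limit_continue"): skip the job and move on.
    return started
```

Given a blocked high priority job "big" and an eligible low priority job "small", the default call starts `["small"]`, while `assoc_limit_stop=True` starts nothing, which protects "big" from being overtaken at the cost of idle resources.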
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Conflicts: META NEWS
-
Morris Jette authored
-
Morris Jette authored
-
- 29 Mar, 2016 2 commits
-
-
Danny Auble authored
-
Morris Jette authored
This adds a FAQ to go with commit 8ee976b4
-