- 19 Nov, 2015 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
BurstBuffer/cray: Fix job record purging if cancelled from pending state. The problem can occur when a burst buffer record was created for the job in the plugin data structure, but no burst buffers were actually allocated for it. bug 2165
-
David Bigagli authored
-
Morris Jette authored
BurstBuffer/cray: Enable clearing of burst buffer string on completed job as a means of recovering from a failure mode. Format is "scontrol update jobid=### burstbuffer=". Partial resolution of bug 2165
-
- 18 Nov, 2015 4 commits
-
-
Morris Jette authored
bug 2028
-
Morris Jette authored
BurstBuffer/cray: Add logic to terminate dw_wlm_cli child processes at shutdown. bug 2166
-
Morris Jette authored
Previous logic required the buffer name to work bug 2167
-
Morris Jette authored
Added srun option of --bcast to move executable file to compute nodes
-
- 17 Nov, 2015 2 commits
-
-
Morris Jette authored
burst_buffer/cray: Support file staging when job lacks job-specific buffer (i.e. only persistent burst buffers). bug 2113
-
David Bigagli authored
-
- 16 Nov, 2015 2 commits
-
-
Morris Jette authored
bug 2143
-
Morris Jette authored
Backfill scheduler: Test association and QOS node limits before reserving resources for pending job. bug 2129
-
- 13 Nov, 2015 9 commits
-
-
Brian Christiansen authored
-
Brian Christiansen authored
Prevents the following sequence from causing a segfault:
$ scontrol create partitionname=stuff nodes=ALL
$ sbatch --wrap="hostname" -o/dev/null -p stuff
Submitted batch job 1047468
$ scontrol delete partitionname=stuff
$ scontrol update jobid=1047468 partition=stuff
-
Danny Auble authored
-
Danny Auble authored
tree.
-
Danny Auble authored
-
Danny Auble authored
-
Brian Christiansen authored
Bug 2006
-
Danny Auble authored
step.
-
Morris Jette authored
-
- 12 Nov, 2015 5 commits
-
-
Morris Jette authored
Stop searching sbatch scripts for #PBS directives after 100 lines of non-comments. Stop parsing #PBS or #SLURM directives after 1024 characters into a line. Required for decent performance with huge scripts.
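The two limits described above can be illustrated with a minimal Python sketch. This is not Slurm's implementation (which is written in C); the function name `find_directives` and the constant names are hypothetical, and for brevity the sketch applies both limits to both directive prefixes, whereas the commit message ties the 100-line limit to #PBS only.

```python
MAX_NON_COMMENT_LINES = 100   # limit taken from the commit message
MAX_DIRECTIVE_CHARS = 1024    # limit taken from the commit message


def find_directives(script_text, prefixes=("#PBS", "#SLURM")):
    """Collect directive lines, giving up early on huge scripts.

    Hypothetical helper: a simplified model of the bounded search,
    not the logic used by sbatch itself.
    """
    directives = []
    non_comment_lines = 0
    for line in script_text.splitlines():
        if line.startswith(prefixes):
            # Only the first 1024 characters of a directive line are kept.
            directives.append(line[:MAX_DIRECTIVE_CHARS].rstrip())
        elif line.strip() and not line.startswith("#"):
            # A non-blank line that is not a comment counts toward the limit.
            non_comment_lines += 1
            if non_comment_lines >= MAX_NON_COMMENT_LINES:
                break  # stop searching the rest of the script
    return directives
```

With these bounds the cost of scanning stays roughly constant once the script body begins, regardless of how large the script is.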
-
Mark Roberts authored
-
David Bigagli authored
-
Morris Jette authored
-
Morris Jette authored
Previously only supported by SlurmUser and root.
-
- 11 Nov, 2015 5 commits
-
-
Morris Jette authored
Previously only reserved space for one task of pending job array.
-
Morris Jette authored
Support taking node out of FUTURE state with "scontrol reconfig" command. Previous logic would keep node in FUTURE state if that was the original configuration when slurmctld started. If job was running on the node, it will stay running, but the node may not be visible.
-
David Bigagli authored
-
Morris Jette authored
Previous logic would create an environment and script file for each task of a job array (actually a hard link to the original). Due to file system limitations and clutter, this was less than ideal. This patch eliminates the redundant files, using only the original file created for the job array. This should also make support for burst buffers easier in the future for job arrays.
-
Morris Jette authored
Make SLURM_ARRAY_TASK_MIN, SLURM_ARRAY_TASK_MAX, and SLURM_ARRAY_TASK_STEP environment variables available to PrologSlurmctld and EpilogSlurmctld.
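As an illustration of how a PrologSlurmctld or EpilogSlurmctld script might consume these variables, here is a minimal Python sketch. The helper name `array_task_ids` is hypothetical, and the sketch assumes the job array was submitted as a simple range, so that the minimum, maximum, and step fully describe the task IDs; it also assumes the variables are absent for non-array jobs.

```python
import os


def array_task_ids(env=os.environ):
    """Return the task IDs implied by MIN/MAX/STEP, or [] if they are unset.

    Hypothetical helper: assumes the array is a plain range, which is
    not guaranteed for every job array specification.
    """
    try:
        first = int(env["SLURM_ARRAY_TASK_MIN"])
        last = int(env["SLURM_ARRAY_TASK_MAX"])
    except KeyError:
        return []  # variables not set, so treat as a non-array job
    step = int(env.get("SLURM_ARRAY_TASK_STEP", "1"))
    return list(range(first, last + 1, step))
```

A prolog could use the resulting list to prepare per-task directories once for the whole array rather than once per task.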
-
- 10 Nov, 2015 5 commits
-
-
Hongjia Cao authored
-
Danny Auble authored
We needed to send a finish from each node in the step whether it had any activity or not. This way the controller knew things were done on the node and the data was sent to the database. Bug 2097
-
Danny Auble authored
-
Morris Jette authored
Burst_buffer/cray: Don't stall scheduling of other jobs while a stage-in is in progress. bug 2114
-
Morris Jette authored
Fix to purge terminated jobs with burst buffer errors. bug 2123
-
- 09 Nov, 2015 2 commits
-
-
Morris Jette authored
The prolog_running counter can now exceed 1. New logic raises limit from 1 to 4 before preventing job recovery on restart.
-
David Bigagli authored
-
- 07 Nov, 2015 1 commit
-
-
Morris Jette authored
Added burst_buffer.conf flag parameter of "TeardownFailure" which will tear down and remove a burst buffer after failed stage-in or stage-out. By default, the buffer will be preserved for analysis and manual teardown. bug 2116
-
- 06 Nov, 2015 1 commit
-
-
David Bigagli authored
-