- 13 Feb, 2017 4 commits
-
-
Morris Jette authored
burst_buffer/cray - Do not execute "pre_run" operation until after all nodes are booted and ready for use. bug 3461
-
Danny Auble authored
partitions.
-
Danny Auble authored
-
Morris Jette authored
Ensure job does not start running before node is booted and PrologSlurmctld is complete. bug 3446
-
- 10 Feb, 2017 2 commits
-
-
Artem Polyakov authored
Ported from 7a4aa7f2.
-
Danny Auble authored
-
- 09 Feb, 2017 6 commits
-
-
Morris Jette authored
burst_buffer/cray - Support a default pool which is not the first pool reported by DataWarp, and log in Slurm when pools are added to or removed from DataWarp. bug 3453
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
This reverts commit fd690a9c.
-
Danny Auble authored
-
Artem Polyakov authored
-
- 08 Feb, 2017 4 commits
-
-
Danny Auble authored
was there for jobs and steps, but for no real reason. I couldn't see a reason anyway.
-
Alejandro Sanchez authored
Jobs preempted with PreemptMode=REQUEUE were incorrectly recorded as REQUEUED in the accounting. Bug 3444
-
Morris Jette authored
bug 3448
-
Danny Auble authored
i.e. AcctGatherInterconnectType = ofed,none Bug 3412
-
- 07 Feb, 2017 3 commits
-
-
Danny Auble authored
Actual code change to make it happen. Just renaming things, nothing really changed.
-
Dominik Bartkiewicz authored
Bug 3447
-
Brian Christiansen authored
Makes it so that an srun/salloc initiated from a cluster with one switch type will work on a different cluster with a different switch type.
-
- 06 Feb, 2017 1 commit
-
-
Danny Auble authored
dynamically or statically to libslurm. This can result in much smaller binaries, but it isn't the easiest setup to develop against: if the files in src/common or src/*api change and the binaries linking against them are compiled directly, libslurm.so doesn't get recompiled automatically, so you have to rebuild it manually. In production, though, this could give large benefits, as Slurm's footprint is now considerably smaller. This used to not work in environments like AIX, where plugins couldn't resolve variables used in the parent program, but this appears to no longer be the case now that we link to the "full" .so, which exports everything the .o does.
-
- 03 Feb, 2017 1 commit
-
-
Alejandro Sanchez authored
Bug 3444
-
- 01 Feb, 2017 2 commits
-
-
Morris Jette authored
Fix srun I/O race condition to eliminate an error message that might be generated if the application exits with outstanding stdin. bug 3166
-
Chansup Byun authored
-
- 31 Jan, 2017 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Alejandro Sanchez authored
-
- 30 Jan, 2017 3 commits
-
-
Morris Jette authored
Properly set SLURM_JOB_GPUS environment variable for Prolog. bug 3437
-
Morris Jette authored
Clear a job's reason of "BeginTime" in a more timely fashion and/or prevent jobs from being stuck in a PENDING state. There are multiple ways of clearing the reason, especially on a lightly loaded system, but the state can persist indefinitely on a heavily loaded system. bug 3368
-
Morris Jette authored
Fix logic for getting the expected start time of an existing job ID with an explicit begin time that is in the past. The previous logic compared that (past) begin time, rather than the current time, against the advanced reservations that would compete with the job.
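The idea behind this fix can be illustrated with a minimal sketch. It is not Slurm's code: the function names, the integer timestamps, and the simplified model of a reservation as a blocking (start, end) window are all assumptions made for illustration.

```python
def effective_begin_time(begin_time: int, now: int) -> int:
    """Hypothetical helper: a begin time in the past is treated as 'now'."""
    return max(begin_time, now)


def earliest_start(begin_time: int, now: int, reservations: list) -> int:
    """Hypothetical sketch: earliest time the job could start.

    `reservations` is a list of (start, end) windows assumed to block the
    job. The comparison uses the effective begin time, not the raw
    (possibly past) begin time.
    """
    t = effective_begin_time(begin_time, now)
    # Walk the windows in start order, pushing the start time past any
    # window that currently covers it.
    for start, end in sorted(reservations):
        if start <= t < end:
            t = end
    return t
```

With a begin time of 50, a current time of 150, and a blocking window of (100, 200), comparing the raw begin time would miss the window entirely, while the sketch above moves the start to the end of the window.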
-
- 29 Jan, 2017 1 commit
-
-
Morris Jette authored
CRAY systems only: TaskPlugins must list task/cgroup before task/cray in order for the cgroup files to be created before task/cray runs. Without this change, the task/cray plugin frequently produces errors about the "mems" file being missing. The errors don't seem consistent, so this probably involves a race condition. Note that NERSC uses this order today and I changed read_config.c to produce a fatal error if the order is reversed.
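The ordering rule described above can be sketched as a small check. This is an illustration only, not the logic in read_config.c: the function name is hypothetical, and treating a list that lacks either plugin as acceptable is an assumption.

```python
def task_plugin_order_ok(task_plugin: str) -> bool:
    """Hypothetical check: task/cgroup must appear before task/cray.

    `task_plugin` is a comma-separated plugin list such as
    "task/cgroup,task/cray". Lists missing either plugin are accepted
    here (an assumption; the commit only describes the reversed order).
    """
    plugins = [p.strip() for p in task_plugin.split(",") if p.strip()]
    if "task/cgroup" not in plugins or "task/cray" not in plugins:
        return True
    return plugins.index("task/cgroup") < plugins.index("task/cray")
```

Under these assumptions, "task/cgroup,task/cray" passes the check and "task/cray,task/cgroup" fails it, which corresponds to the case the commit turns into a fatal error.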
-
- 27 Jan, 2017 3 commits
-
-
Danny Auble authored
Turns out this never worked, ever. What used to happen is that if the protocol_version that was read in didn't match, the rpc_version given to unpack things was just 0. What this does now is set the rpc_version to what was stored, making it all good.
-
Morris Jette authored
Revert logic originally added for bug 3166. Revisit as time permits. bug 3166
-
Morris Jette authored
Interpret all format options in output/error file to log prolog errors. Prior logic only supported "%j" (job ID) option. bug 3354
-
- 26 Jan, 2017 1 commit
-
-
Alejandro Sanchez authored
Bug 3431
-
- 25 Jan, 2017 4 commits
-
-
Morris Jette authored
burst_buffer/cray - Fix race condition that could cause multiple batch job launch requests resulting in downed nodes. bug 3366
-
Dominik Bartkiewicz authored
-
Danny Auble authored
This reverts commit b9bff82f.
-
Danny Auble authored
-
- 24 Jan, 2017 1 commit
-
-
Danny Auble authored
-