- 01 Feb, 2017 3 commits
-
-
pamxl authored
-
Morris Jette authored
-
Chansup Byun authored
-
- 31 Jan, 2017 12 commits
-
-
Danny Auble authored
-
Danny Auble authored
# Conflicts: # META # NEWS
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Dominik Bartkiewicz authored
Use instead of the repeated construction of: bit_not(b); bit_and(a, b); bit_not(b); to avoid two additional iterations.
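The equivalence behind this change can be sketched in plain C. The word-array bitmap and the helper names below (`words_not`, `words_and`, `words_and_not`, `words_and_not_three_pass`) are hypothetical stand-ins, not Slurm's bitstring implementation. The sketch only shows that the three-pass sequence and a single `a &= ~b` pass leave `a` with the same bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitmap stored as an array of 64-bit words
 * (an illustration only, not Slurm's bitstr_t). */

/* Invert every bit of b in place: one pass over the words. */
void words_not(uint64_t *b, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		b[i] = ~b[i];
}

/* a &= b: one pass over the words. */
void words_and(uint64_t *a, const uint64_t *b, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		a[i] &= b[i];
}

/* a &= ~b in a single pass; b is never modified. */
void words_and_not(uint64_t *a, const uint64_t *b, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		a[i] &= ~b[i];
}

/* The older pattern: three passes, with b inverted and then restored. */
void words_and_not_three_pass(uint64_t *a, uint64_t *b, size_t nwords)
{
	words_not(b, nwords);
	words_and(a, b, nwords);
	words_not(b, nwords);
}
```

Both variants clear from `a` every bit that is set in `b`. The single-pass form walks the words once instead of three times and never writes to `b`.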
-
Alejandro Sanchez authored
-
Brian Christiansen authored
Can't set the job_id of a will_run job prior to creating the job. Invalidates the jobs for non-slurmuser jobs. Also not burning a jobid could make the will_run job not be distinguishable in the logs. Will handle the federation aspect of this in 17.11. Bug 3436
-
Dominik Bartkiewicz authored
job_write_lock and sleep a second before grabbing it again. This should probably be made configurable later on.
-
Danny Auble authored
-
- 30 Jan, 2017 9 commits
-
-
Danny Auble authored
# Conflicts: # src/slurmctld/proc_req.c
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
e3a7bdcc f9804256 d72b13f2 Reference bug 3366. If you are running on a Bluegene system we rely on the prolog to take us out of configuring state. These commits work well for systems rebooting the nodes where the prolog is running, but in the case of Bluegene this is the opposite of what is desired :). On a Bluegene these commits pretty much make it so a batch job never gets launched.
-
Morris Jette authored
Properly set SLURM_JOB_GPUS environment variable for Prolog. bug 3437
-
Morris Jette authored
-
Morris Jette authored
Clear a job's reason of "BeginTime" in a more timely fashion and/or prevent jobs from being stuck in a PENDING state. There are multiple ways of clearing the reason, especially on a lightly loaded system, but the state can persist indefinitely on a heavily loaded system. bug 3368
-
Danny Auble authored
-
Morris Jette authored
Fix to logic for getting expected start time of existing job ID with explicit begin time that is in the past. Previous logic would compare that (past) begin time, rather than the current time, with advanced reservations that would compete with it.
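The corrected comparison can be illustrated with a small C sketch. The function name and signature are hypothetical, not code from slurmctld. The point is only that a begin time in the past is replaced by the current time before it is checked against competing reservations.

```c
#include <time.h>

/* Hypothetical helper (not slurmctld code): the earliest time a job
 * could start, given its explicit begin time and the current time.
 * A begin time in the past is clamped to "now". */
time_t earliest_start(time_t begin_time, time_t now)
{
	return (begin_time > now) ? begin_time : now;
}
```

Under this assumption, reservations that ended before the current time no longer affect the expected start time of a job whose begin time has already passed.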
-
- 29 Jan, 2017 9 commits
-
-
Morris Jette authored
-
Morris Jette authored
On Cray systems with step NHC, the step launches are delayed and produce a pair of messages (below) that caused the test to fail:
srun: Job step creation temporarily disabled, retrying
srun: Job step created
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
The v17.02 updates appear to have only gone into master rather than the slurm-17.02 branch
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
CRAY systems only: TaskPlugins must list task/cgroup before task/cray in order for the cgroup files to be created before task/cray runs. Without this change, the task/cray plugin frequently produces errors about the "mems" file being missing. The errors don't seem consistent, so this probably involves a race condition. Note that NERSC uses this order today and I changed read_config.c to produce a fatal error if the order is reversed.
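An ordering check of this kind can be sketched in C as follows. This is a hypothetical illustration, not the code added to read_config.c: the function name is invented, and treating a list that lacks either plugin as valid is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical check (not the code in read_config.c): returns false
 * when both plugins are listed and task/cray comes before task/cgroup.
 * Assumption: a list that lacks either plugin has nothing to order. */
bool task_plugin_order_ok(const char *task_plugins)
{
	const char *cgroup = strstr(task_plugins, "task/cgroup");
	const char *cray = strstr(task_plugins, "task/cray");

	if (!cgroup || !cray)
		return true;
	return cgroup < cray;
}
```

With this sketch, the string `task/cgroup,task/cray` passes and `task/cray,task/cgroup` is rejected, which is the case the commit turns into a fatal error.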
-
Morris Jette authored
Failure to do so results in a bunch of task/cray errors about not finding the cgroup set up.
-
- 28 Jan, 2017 7 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Avoid a test failure if all nodes in a partition are not usable (down, drained, reserved, or otherwise unusable).
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Disable test if underlying select/linear is in use
-
Morris Jette authored
Modify qsub test to explicitly create and destroy the error files to avoid leaving around a bunch of error files (even if they are normally empty).
-