- 21 Nov, 2014 2 commits
-
-
Danny Auble authored
-
Dominik Bartkiewicz authored
This can happen if the specified job ID is not found.
-
- 13 Nov, 2014 2 commits
-
-
Brian Christiansen authored
Bug 1253
-
Brian Christiansen authored
Bug 1255
-
- 12 Nov, 2014 2 commits
-
-
Danny Auble authored
-
Morris Jette authored
Do not requeue a batch job from slurmd daemon if it is killed while in the process of being launched (a race condition introduced in v14.03.9). This partially reverts commit 2bc9bc29.
-
- 10 Nov, 2014 1 commit
-
-
Danny Auble authored
with CR_PACK_NODES. Really do commit d388dd67 a different way to get the same info and be able to lay out tasks correctly when --hint=nomultithread.

Tests on a 4 core 8 thread system are

    srun -n6 --hint=nomultithread --exclusive whereami | sort -h
    srun: cpu count 6
    0 snowflake0 - MASK:0x1
    1 snowflake0 - MASK:0x2
    2 snowflake0 - MASK:0x4
    3 snowflake0 - MASK:0x8
    4 snowflake1 - MASK:0x1
    5 snowflake1 - MASK:0x2

and

    srun -n10 -N5 --hint=nomultithread --exclusive whereami | sort -h
    srun: cpu count 10
    0 snowflake0 - MASK:0x1
    1 snowflake0 - MASK:0x2
    2 snowflake0 - MASK:0x4
    3 snowflake0 - MASK:0x8
    4 snowflake1 - MASK:0x1
    5 snowflake1 - MASK:0x2
    6 snowflake1 - MASK:0x4
    7 snowflake2 - MASK:0x1
    8 snowflake3 - MASK:0x1
    9 snowflake4 - MASK:0x1
-
- 07 Nov, 2014 2 commits
-
-
David Bigagli authored
a maintenance reservation that is not active yet.
-
Danny Auble authored
work "partition". reference bug 1246
-
- 06 Nov, 2014 4 commits
-
-
Danny Auble authored
is requested. This is a re-factor of commit e5635a76 related to bug 1148 to handle the cases where a job could run, but an error was given when selecting the nodes.
-
Danny Auble authored
-
Danny Auble authored
lock was locked outside of the function or not. This also fixes a race condition when adding a QOS and planning on using it right away when the controller is busy with previous requests.
-
Danny Auble authored
PerCPU. Previously it did not take into account whether the user was requesting per-node memory or the job was told it needed to use less than the node allowed.
-
- 05 Nov, 2014 1 commit
-
-
Danny Auble authored
-
- 04 Nov, 2014 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
This was an unrealized regression from commit 0da01963. The problem is we were clearing the job_ptr->job_resrcs too early. This patch fixes it to wait until the job is actually being requeued so it does the right thing.
-
- 31 Oct, 2014 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
This isn't that big of an issue for 14.03, but 14.11 added more to this string, which could overflow the buffer since sprintf is used instead of snprintf. Using xstrfmtcat fixes the issue and makes the code easier to read.
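To illustrate the overflow concern above, here is a minimal sketch of appending formatted text to a heap-allocated string that grows as needed, so no fixed-size buffer can be overrun. The helper name append_fmt is hypothetical and this is not Slurm's xstrfmtcat implementation; it only shows the general approach using standard C library calls.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not Slurm's xstrfmtcat): append formatted text to a
 * heap-allocated string, growing it as needed. *str may be NULL initially.
 * Returns 0 on success, -1 on failure (in which case *str is unchanged).
 */
static int append_fmt(char **str, const char *fmt, ...)
{
	va_list ap;
	int need;
	size_t old_len;
	char *grown;

	assert(str != NULL && fmt != NULL);

	/* First pass: ask vsnprintf how many bytes the new text needs. */
	va_start(ap, fmt);
	need = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (need < 0)
		return -1;

	old_len = *str ? strlen(*str) : 0;
	grown = realloc(*str, old_len + (size_t)need + 1);
	if (grown == NULL)
		return -1;

	/* Second pass: write into space known to be large enough. */
	va_start(ap, fmt);
	vsnprintf(grown + old_len, (size_t)need + 1, fmt, ap);
	va_end(ap);

	*str = grown;
	return 0;
}
```

A caller would start with a NULL pointer, call append_fmt(&s, "job %d", id) as often as needed, and free(s) at the end. Unlike sprintf into a fixed array, the required length is measured before anything is written.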
-
Danny Auble authored
-
Danny Auble authored
number of tasks / number of nodes.
-
- 30 Oct, 2014 1 commit
-
-
David Bigagli authored
-
- 27 Oct, 2014 1 commit
-
-
Danny Auble authored
are specified. This is a fix to commit b9cc5b31 which just didn't know mc_ptr->ntasks_per_core is initialized to INFINITE. Without it the node_cnt packed would be set to 1 on the user tools. This fixes bug 1148.
-
- 24 Oct, 2014 1 commit
-
-
David Singleton authored
We've seen slurmctld crashes due to negative job array indices.
-
- 23 Oct, 2014 1 commit
-
-
Morris Jette authored
BGQ: Fix race condition when job fails due to hardware failure and is requeued. Previous code could result in slurmctld abort with NULL pointer. bug 1096
-
- 21 Oct, 2014 1 commit
-
-
Morris Jette authored
Fix bug that prevented preservation of a job's GRES bitmap on slurmctld restart or reconfigure (bug was introduced in 14.03.5 "Clear record of a job's gres when requeued" and only applies when GRES mapped to specific files). bug 1192
-
- 20 Oct, 2014 4 commits
-
-
Danny Auble authored
-
David Bigagli authored
permission for the batch step.
-
David Bigagli authored
-
jette authored
Otherwise there will be no log file to write to, resulting in an abort. bug 1185
-
- 18 Oct, 2014 1 commit
-
-
Nicolas Joly authored
-
- 17 Oct, 2014 3 commits
-
-
Morris Jette authored
-
David Bigagli authored
-
Morris Jette authored
Correct tracking of licenses for suspended jobs on slurmctld reconfigure or restart. Previously licenses for suspended jobs were not counted, so the license count could be exceeded when those jobs were resumed.
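The counting rule described above can be sketched as follows. The types and names here (enum job_state, struct job, licenses_in_use) are simplified, hypothetical stand-ins and not Slurm's actual data structures; the point is only that suspended jobs still hold their licenses and must be included when the in-use count is rebuilt.

```c
#include <stddef.h>

/* Hypothetical, simplified job states and record; not Slurm's real types. */
enum job_state { JOB_PENDING, JOB_RUNNING, JOB_SUSPENDED, JOB_COMPLETE };

struct job {
	enum job_state state;
	unsigned licenses;	/* licenses held by this job */
};

/*
 * Recount licenses in use, e.g. after a restart or reconfigure. Suspended
 * jobs still hold their licenses, so they are counted with running jobs.
 */
static unsigned licenses_in_use(const struct job *jobs, size_t njobs)
{
	unsigned used = 0;

	for (size_t i = 0; i < njobs; i++) {
		if (jobs[i].state == JOB_RUNNING ||
		    jobs[i].state == JOB_SUSPENDED)
			used += jobs[i].licenses;
	}
	return used;
}
```

If suspended jobs were skipped in this loop, the count would be too low and new jobs could be granted licenses that are needed again as soon as the suspended jobs resume.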
-
- 16 Oct, 2014 2 commits
-
-
Brian Christiansen authored
-
Morris Jette authored
Treat Cray MPI job calling exit() without mpi_fini() as fatal error for that specific task and let srun handle all timeout logic. Previous logic would cancel the entire job step and srun options for wait time and kill on exit were ignored. The new logic provides users with the following type of response:

    $ srun -n3 -K0 -N3 --wait=60 ./tmp
    Task:0 Cycle:1
    Task:2 Cycle:1
    Task:1 Cycle:1
    Task:0 Cycle:2
    Task:2 Cycle:2
    slurmstepd: step 14927.0 task 1 exited without calling mpi_fini()
    srun: error: tux2: task 1: Killed
    Task:0 Cycle:3
    Task:2 Cycle:3
    Task:0 Cycle:4
    ...

bug 1171
-
- 15 Oct, 2014 4 commits
-
-
Nicolas Joly authored
This reverts commit 4d03d0b4. Make sure the correct Author is attributed here.
-
Danny Auble authored
This reverts commit 1891936e.
-
Danny Auble authored
This has apparently been broken from the get go. This fixes bug 1172. test21.22 should be updated to test the dump and load of a file that is generated.
-
Danny Auble authored
using --ntasks-per-node. This is related to bug 1145. What was happening is all the cpus were allocated on one socket instead of a cyclic method. While this is allowed it is strange and resulted in this bug. There appears to be a different bug as to why the tasks were laid out in a block fashion in the first place.
-
- 14 Oct, 2014 1 commit
-
-
Danny Auble authored
with no way to get them out. This fixes bug 1134. It is advised that the pro/epilog call xtprocadmin in the script instead of returning a non-zero exit code.
-