- 11 Mar, 2014 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
Rather than continuously retrying a step create for suspended jobs, add a sleep with exponential backoff
-
Morris Jette authored
If a job is suspended, log the step create failure using debug rather than info in slurmctld
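The step-create retry change above describes sleeping with exponential backoff instead of retrying continuously. A minimal sketch of that pattern follows; it is not the code from the commit. The names `backoff_delay_usec`, `retry_with_backoff`, `try_fn`, and both `BACKOFF_*` bounds are hypothetical and chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical bounds; the values used by the actual commit are not shown. */
#define BACKOFF_START_USEC 100000U   /* 0.1 s */
#define BACKOFF_MAX_USEC   8000000U  /* 8 s */

/* Delay to use after `attempt` failed tries (0-based):
 * BACKOFF_START_USEC doubled once per attempt, capped at BACKOFF_MAX_USEC. */
uint32_t backoff_delay_usec(unsigned int attempt)
{
    uint32_t delay = BACKOFF_START_USEC;

    while (attempt-- > 0 && delay < BACKOFF_MAX_USEC)
        delay *= 2;
    return (delay > BACKOFF_MAX_USEC) ? BACKOFF_MAX_USEC : delay;
}

/* Callback that attempts the operation once; returns 0 on success. */
typedef int (*try_fn)(void *arg);

/* Call fn up to max_tries times, sleeping longer after each failure.
 * Returns 0 on the first success, -1 if every attempt failed. */
int retry_with_backoff(try_fn fn, void *arg, unsigned int max_tries)
{
    assert(fn != NULL);

    for (unsigned int i = 0; i < max_tries; i++) {
        if (fn(arg) == 0)
            return 0;
        if (i + 1 < max_tries) {
            uint32_t usec = backoff_delay_usec(i);
            struct timespec ts = {
                .tv_sec  = usec / 1000000U,
                .tv_nsec = (long)(usec % 1000000U) * 1000L
            };
            nanosleep(&ts, NULL);
        }
    }
    return -1;
}
```

Capping the delay keeps a long-suspended job from pushing the retry interval out indefinitely, while the doubling avoids generating a steady stream of failed requests.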
-
- 10 Mar, 2014 5 commits
-
-
Morris Jette authored
Conflicts: src/plugins/switch/nrt/nrt.c
-
Morris Jette authored
Cache results for major performance improvement. bug 636
-
David Bigagli authored
-
Rod Schultz authored
-
Morris Jette authored
The test for NRT_NULL_MAGIC failed to capture some problems if the pointer to the structure was NULL. This is an amendment to commit 2a55aa0b
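The NRT_NULL_MAGIC fix above concerns a magic-value test that did not account for a NULL structure pointer. The general shape of such a guard can be sketched as follows, under stated assumptions: the struct layout, the function `example_info_is_usable`, and both `EXAMPLE_*` constants are hypothetical and do not reproduce the nrt.c code or its actual magic values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic values, for illustration only. */
#define EXAMPLE_MAGIC      0xC00CC00AU  /* structure is valid */
#define EXAMPLE_NULL_MAGIC 0xDEAFDEAFU  /* structure is a placeholder */

struct example_info {
    uint32_t magic;
    int payload;
};

/* Return 1 if info points at a valid, non-placeholder structure, else 0.
 * The NULL test must come first: reading info->magic through a NULL
 * pointer is undefined behavior, so a magic-only test cannot catch it. */
int example_info_is_usable(const struct example_info *info)
{
    if (info == NULL)
        return 0;
    if (info->magic == EXAMPLE_NULL_MAGIC)
        return 0;
    return info->magic == EXAMPLE_MAGIC;
}
```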
-
- 08 Mar, 2014 7 commits
-
-
Morris Jette authored
-
Morris Jette authored
If a job request explicitly requests a GRES count of zero and that is not the last GRES in the slurm internal data structures, the job request will be rejected. bug 633
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
Conflicts: src/plugins/switch/nrt/nrt.c
-
Danny Auble authored
Perhaps should also look into doing this for nodeinfo and libstate
-
- 07 Mar, 2014 8 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
This would cause pmds to hang.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 06 Mar, 2014 10 commits
-
-
David Bigagli authored
-
Morris Jette authored
This prevented jobs from sharing nodes
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
code.
-
David Bigagli authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 05 Mar, 2014 7 commits
-
-
David Bigagli authored
remove typo
-
John Morrissey authored
-
Morris Jette authored
-
Morris Jette authored
-
Nathan Yee authored
-
Morris Jette authored
-
Morris Jette authored
In test15.17, an salloc spawns an sbatch within that same job allocation then exits. This results in a race condition which can cause the terminate job RPC from slurmctld to slurmd to briefly hang and mark the node non-responsive, causing later tests that may need the node to fail (timeout waiting for job allocation). A sleep is added to the salloc here to give the batch job a chance to begin and clean up the job allocation quickly.
-