- 05 Oct, 2012 2 commits
-
-
Morris Jette authored
Preemptor was not being scheduled. Fix for bugzilla #3.
-
Morris Jette authored
While this change lets gang scheduling happen, it overallocates resources from different priority partitions when gang scheduling is not running.
-
- 04 Oct, 2012 1 commit
-
-
Morris Jette authored
Preemptor was not being scheduled. See bugzilla #3 for details.
-
- 02 Oct, 2012 2 commits
-
-
Morris Jette authored
See bugzilla bug 132.

When using select/cons_res and CR_Core_Memory, hyperthreaded nodes may be overcommitted on memory when CPU counts are scaled. I've tested 2.4.2 and HEAD (2.5.0-pre3).

Conditions:
-----------
* SelectType=select/cons_res
* SelectTypeParameters=CR_Core_Memory
* Using threads
  - Ex. "NodeName=linux0 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=400"

Description:
------------
In the cons_res plugin, _verify_node_state() in job_test.c checks if a node has sufficient memory for a job. However, the per-CPU memory limits appear to be scaled by the number of threads. This new value may exceed the available memory on the node. And, once a node is overcommitted on memory, future memory checks in _verify_node_state() will always succeed.

Scenario to reproduce:
----------------------
With the example node linux0, we run a single-core job with 250MB/core:
    srun --mem-per-cpu=250 sleep 60

cons_res checks that it will fit:
    ((real - alloc) >= job mem)
    ((400 - 0) >= 250)
and the job starts.

Then, the memory requirement is doubled:
    "slurmctld: error: cons_res: node linux0 memory is overallocated (500) for job X"
    "slurmd: scaling CPU count by factor of 2"
This job should not have started.

While the first job is still running, we submit a second, identical job:
    srun --mem-per-cpu=250 sleep 60

cons_res checks that it will fit:
    ((400 - 500) >= 250)
The unsigned int wraps, the test passes, and the job starts.
This second job also should not have started.
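
The arithmetic in the scenario above can be reproduced with a small standalone sketch. This is not the cons_res code: the function names `scaled_job_mem`, `fits_unsigned`, and `fits_guarded` are hypothetical, the use of `uint32_t` for the memory counters is an assumption based on the report's mention of an unsigned int wrapping, and the guarded variant is only one possible way to avoid the wrap, not necessarily the fix that was committed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: memory charged to a job once the per-CPU limit is
 * scaled by the thread count, as the report describes (250 * 1 * 2). */
static uint32_t scaled_job_mem(uint32_t mem_per_cpu, uint32_t cores,
                               uint32_t threads_per_core)
{
    return mem_per_cpu * cores * threads_per_core;
}

/* Hypothetical: the check as quoted in the report,
 * ((real - alloc) >= job mem), done in unsigned 32-bit arithmetic.
 * If alloc_mem > real_mem the subtraction wraps to a huge value. */
static bool fits_unsigned(uint32_t real_mem, uint32_t alloc_mem,
                          uint32_t job_mem)
{
    return (uint32_t)(real_mem - alloc_mem) >= job_mem;
}

/* Hypothetical guard: reject an already overcommitted node before
 * subtracting, so the wrap cannot occur. */
static bool fits_guarded(uint32_t real_mem, uint32_t alloc_mem,
                         uint32_t job_mem)
{
    if (alloc_mem > real_mem)
        return false;
    return (real_mem - alloc_mem) >= job_mem;
}
```

With the numbers from the report (RealMemory=400, 250MB per CPU, ThreadsPerCore=2), `scaled_job_mem(250, 1, 2)` gives 500, `fits_unsigned(400, 500, 250)` returns true because 400 - 500 wraps around, and `fits_guarded(400, 500, 250)` returns false.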
-
Morris Jette authored
-
- 27 Sep, 2012 3 commits
-
-
Danny Auble authored
purged from the system if its front-end node goes down.
-
Danny Auble authored
database, and the job is running on a small block make sure we free up the correct node count.
-
Bill Brophy authored
-
- 25 Sep, 2012 1 commit
-
-
Morris Jette authored
Based upon work by Jason Sollom, Cray Inc. and used by permission
-
- 24 Sep, 2012 1 commit
-
-
Morris Jette authored
This addresses bug 130
-
- 21 Sep, 2012 1 commit
-
-
Danny Auble authored
with a job running or trying to run on it.
-
- 20 Sep, 2012 1 commit
-
-
Danny Auble authored
are planning on using the block. Previously it would fail those jobs erroneously.
-
- 19 Sep, 2012 1 commit
-
-
Danny Auble authored
-
- 18 Sep, 2012 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
- 17 Sep, 2012 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
or previous piecemeal method.
-
- 15 Sep, 2012 1 commit
-
-
Danny Auble authored
Adapted from a patch from Stephen Trofinoff <trofinoff@cscs.ch>
-
- 13 Sep, 2012 3 commits
-
-
Morris Jette authored
-
Danny Auble authored
having the realtime server go down over and over again while waiting for the poll to finish.
-
Morris Jette authored
-
- 12 Sep, 2012 3 commits
-
-
Hongjia Cao authored
srun: do not allocate resources cn9 cn9 cn8 cn8
-
Danny Auble authored
host names, as is done elsewhere in SLURM.
-
Don Lipari authored
users/accounts/flags
-
- 11 Sep, 2012 2 commits
-
-
Danny Auble authored
correctly for current job.
-
Danny Auble authored
-
- 08 Sep, 2012 1 commit
-
-
Danny Auble authored
realtime server.
-
- 07 Sep, 2012 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
start in the future but is able to run now.
-
- 06 Sep, 2012 1 commit
-
-
Danny Auble authored
-
- 05 Sep, 2012 3 commits
-
-
Danny Auble authored
-
Don Albert authored
accounting as well.
-
Danny Auble authored
set errno == ESLURM_INVALID_TIME_VALUE on error instead.
-
- 04 Sep, 2012 1 commit
-
-
Hongjia Cao authored
-
- 30 Aug, 2012 1 commit
-
-
HAUTREUX Matthieu authored
We've discovered that our implementation of proctrack/cgroup includes internal threads as well as the process pids, so "scontrol listpids" shows them as well. The attached patch (courtesy of Matthieu Hautreux) to 2.4.2 fixes this problem.
-
- 29 Aug, 2012 1 commit
-
-
Danny Auble authored
running on it are preemptable by scheduling job.
-
- 28 Aug, 2012 3 commits
-
-
Kacper Kowalik authored
-
Don Lipari authored
reported some suspended time.
-
Danny Auble authored
static blocks.
-