- 15 Jun, 2015 6 commits
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
The logic assumed the reservation had a node bitmap, which was used to check for overlapping jobs. If there is no node bitmap (e.g. a licenses-only reservation), an abort would result.
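A minimal sketch of the kind of guard involved, using made-up structure and function names rather than the actual Slurm reservation code:

    /* Sketch only: treat a missing node bitmap (e.g. a licenses-only
     * reservation) as "no overlap" instead of dereferencing NULL. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NODE_CNT 8

    struct node_bitmap { bool bits[NODE_CNT]; };
    struct resv_rec { struct node_bitmap *node_bitmap; };  /* NULL => licenses-only */
    struct job_rec  { struct node_bitmap *node_bitmap; };

    /* True only if both records have bitmaps and they share at least one node. */
    static bool resv_overlaps_job(const struct resv_rec *resv,
                                  const struct job_rec *job)
    {
        if (!resv->node_bitmap || !job->node_bitmap)
            return false;               /* nothing to compare, no abort */
        for (int i = 0; i < NODE_CNT; i++) {
            if (resv->node_bitmap->bits[i] && job->node_bitmap->bits[i])
                return true;
        }
        return false;
    }

    int main(void)
    {
        struct node_bitmap job_nodes = { .bits = { true } };
        struct resv_rec licenses_only = { .node_bitmap = NULL };
        struct job_rec job = { .node_bitmap = &job_nodes };

        printf("overlap: %d\n", resv_overlaps_job(&licenses_only, &job)); /* overlap: 0 */
        return 0;
    }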
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 12 Jun, 2015 4 commits
-
Brian Christiansen authored
Bug 1739
-
Brian Christiansen authored
Bug 1743
-
Brian Christiansen authored
-
Brian Christiansen authored
Bug 1743
-
- 11 Jun, 2015 9 commits
-
Brian Christiansen authored
Prevent double free.
-
Brian Christiansen authored
cpufreq variables weren't being initialized to NO_VAL when using the task/none plugin. As a result, the conditions in cpu_freq_reset did not stop test_cpu_owner_lock from being called.
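A self-contained sketch of the initialization pattern being described; the names and the NO_VAL value here are illustrative, not the actual Slurm code:

    /* Sketch: frequency fields start at a sentinel (NO_VAL) so the reset
     * routine can tell "never set" from "set", and skip its locking path
     * entirely when nothing was ever set. */
    #include <stdint.h>
    #include <stdio.h>

    #define NO_VAL 0xfffffffeU          /* "unset" sentinel (illustrative) */

    struct cpu_freq_data {
        uint32_t cpu_freq_min;
        uint32_t cpu_freq_max;
        uint32_t cpu_freq_gov;
    };

    /* A task plugin that never touches CPU frequency must still leave the
     * fields at NO_VAL; otherwise the reset path thinks there is state to
     * restore and takes the lock. */
    static void cpu_freq_data_init(struct cpu_freq_data *d)
    {
        d->cpu_freq_min = NO_VAL;
        d->cpu_freq_max = NO_VAL;
        d->cpu_freq_gov = NO_VAL;
    }

    static void cpu_freq_data_reset(const struct cpu_freq_data *d)
    {
        if (d->cpu_freq_min == NO_VAL &&
            d->cpu_freq_max == NO_VAL &&
            d->cpu_freq_gov == NO_VAL)
            return;                     /* nothing was set: skip lock/reset */
        printf("restoring CPU frequency settings\n");
    }

    int main(void)
    {
        struct cpu_freq_data d;
        cpu_freq_data_init(&d);
        cpu_freq_data_reset(&d);        /* no-op: all fields still NO_VAL */
        return 0;
    }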
-
Brian Christiansen authored
Conflicts: src/common/cpu_frequency.c
-
Brian Christiansen authored
Conflicts: src/common/cpu_frequency.c
-
Brian Christiansen authored
-
Brian Christiansen authored
Bug 1733
-
jette authored
-
Didier GAZEN authored
In your node_mgr fix to keep rebooted nodes down (commit 9cd15dfe), you forgot to consider the case of nodes that are powered up but respond only after ResumeTimeout seconds (the maximum time permitted). Such nodes are marked DOWN (because they did not respond within ResumeTimeout seconds) but should become silently available when ReturnToService=1, as stated in the slurm.conf manual. With your modification, when such nodes finally respond they are treated as rebooted nodes and remain in the DOWN state (with the new reason "Node unexpectedly rebooted") even when ReturnToService=1. Correction of commit 3c2b46af.
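A simplified sketch of the behaviour being described, with made-up field names rather than the real Slurm node_mgr structures: a newer boot time is only "unexpected" when the controller was not powering the node up itself, and a node that was DOWN merely for missing ResumeTimeout returns to service once it registers and ReturnToService=1.

    /* Sketch only: not the actual Slurm node_mgr code. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    struct node_rec {
        bool   down;                /* node currently marked DOWN */
        bool   was_powering_up;     /* controller was resuming/powering it up */
        bool   resume_timeout_hit;  /* DOWN only for missing ResumeTimeout */
        time_t boot_time;           /* boot time reported at this registration */
        time_t last_boot_time;      /* boot time seen at the previous one */
    };

    /* Called when a registration finally arrives from the node. */
    static void node_did_respond(struct node_rec *node, int return_to_service)
    {
        bool boot_time_changed = node->last_boot_time &&
                                 node->boot_time > node->last_boot_time;
        /* A newer boot time is only "unexpected" if we were not powering
         * the node up ourselves. */
        bool unexpected_reboot = boot_time_changed && !node->was_powering_up;

        if (unexpected_reboot) {
            node->down = true;      /* keep DOWN: "Node unexpectedly rebooted" */
            printf("node kept DOWN: unexpectedly rebooted\n");
        } else if (node->down && node->resume_timeout_hit && return_to_service) {
            node->down = false;     /* silently return to service */
            printf("node returned to service\n");
        }
        node->last_boot_time = node->boot_time;
        node->was_powering_up = false;
    }

    int main(void)
    {
        /* Node that booted during resume but responded after ResumeTimeout. */
        struct node_rec slow = { .down = true, .was_powering_up = true,
                                 .resume_timeout_hit = true,
                                 .boot_time = 200, .last_boot_time = 100 };
        node_did_respond(&slow, 1); /* ReturnToService=1 => back in service */
        return 0;
    }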
-
Didier GAZEN authored
-
- 10 Jun, 2015 9 commits
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
It was always failing when a node list was supplied on job submission.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Didier GAZEN authored
In your node_mgr fix to keep rebooted nodes down (commit 9cd15dfe), you forgot to consider the case of nodes that are powered up but respond only after ResumeTimeout seconds (the maximum time permitted). Such nodes are marked DOWN (because they did not respond within ResumeTimeout seconds) but should become silently available when ReturnToService=1, as stated in the slurm.conf manual. With your modification, when such nodes finally respond they are treated as rebooted nodes and remain in the DOWN state (with the new reason "Node unexpectedly rebooted") even when ReturnToService=1. My patch obtains the correct behaviour.
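For reference, an illustrative slurm.conf excerpt showing the two parameters involved (example values only; this is not the patch itself):

    # slurm.conf excerpt (example values)
    ReturnToService=1    # a DOWN node becomes available again once it registers with a valid configuration
    ResumeTimeout=600    # maximum seconds allowed between a resume request and the node becoming usable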
-
Morris Jette authored
Conflicts: doc/man/man5/slurm.conf.5 src/plugins/select/cons_res/job_test.c
-
Morris Jette authored
Equivalent fix as e1a00772 for select/serial rather than select/cons_res
-
- 09 Jun, 2015 12 commits
-
David Bigagli authored
-
Morris Jette authored
1. I submit a first job that uses 1 GPU:
   $ srun --gres gpu:1 --pty bash
   $ echo $CUDA_VISIBLE_DEVICES
   0
2. While the first one is still running, a 2-GPU job asking for 1 task per node waits (and I don't really understand why):
   $ srun --ntasks-per-node=1 --gres=gpu:2 --pty bash
   srun: job 2390816 queued and waiting for resources
3. Whereas a 2-GPU job requesting 1 core per socket (so just 1 socket) actually gets GPUs allocated from two different sockets!
   $ srun -n 1 --cores-per-socket=1 --gres=gpu:2 -p testk --pty bash
   $ echo $CUDA_VISIBLE_DEVICES
   1,2
With this change, #2 works the same way as #3. Bug 1725
-
Morris Jette authored
-
Brian Christiansen authored
Bug 1572
-
Brian Christiansen authored
Bug 1572
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
David Bigagli authored
option.
-
Brian Christiansen authored
-
Morris Jette authored
-
Morris Jette authored
Modify test to work if "." is not in search path.
Fix error message: change "sbatch" to "salloc".
-