- 22 Sep, 2015 10 commits
-
-
Danny Auble authored
-
Morris Jette authored
If GRES are associated with specific CPUs and a job allocation includes GRES that are not associated with the specific CPUs allocated to the job, then an underflow error results when the job is deallocated.
To reproduce, use this gres.conf:
    Name=gpu File=/dev/tty0 CPUs=0-5
    Name=gpu File=/dev/tty1 CPUs=6-11
    Name=gpu File=/dev/tty2 CPUs=12-17
    Name=gpu File=/dev/tty3 CPUs=18-23
Then run:
    $ srun --gres=gpu:2 -N1 --ntasks-per-node=2 hostname
The slurmctld log file shows:
    error: gres/gpu: job 695 dealloc node smd1 topo gres count underflow
The logic was modified to increment the count based upon the specific GRES actually allocated, ignoring the associated CPUs (it is too late to consider them after the GRES are picked).
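The counting idea in this fix can be illustrated with a small, simplified model. This is only a sketch under stated assumptions: `struct topo_entry`, `topo_alloc`, `topo_dealloc` and the bit-mask representation are hypothetical and are not the actual slurmctld data structures. The point it shows is that allocation and deallocation both count by the devices actually granted, so the two stay symmetric regardless of which CPUs the job received.

```c
#include <stdint.h>

#define TOPO_CNT 4	/* hypothetical: one entry per gres.conf line */

/* Hypothetical, simplified topology record (not the slurmctld structure). */
struct topo_entry {
	uint32_t dev_mask;	/* devices that belong to this entry */
	uint32_t alloc_cnt;	/* devices of this entry currently allocated */
};

/* Count the bits set in v. */
static uint32_t popcount32(uint32_t v)
{
	uint32_t n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Increment each entry by the devices actually granted to the job. */
static void topo_alloc(struct topo_entry *topo, uint32_t job_dev_mask)
{
	for (int i = 0; i < TOPO_CNT; i++)
		topo[i].alloc_cnt += popcount32(topo[i].dev_mask & job_dev_mask);
}

/* Decrement using the same rule; return -1 if a count would underflow. */
static int topo_dealloc(struct topo_entry *topo, uint32_t job_dev_mask)
{
	int rc = 0;

	for (int i = 0; i < TOPO_CNT; i++) {
		uint32_t n = popcount32(topo[i].dev_mask & job_dev_mask);

		if (n > topo[i].alloc_cnt) {
			topo[i].alloc_cnt = 0;	/* clamp instead of wrapping */
			rc = -1;
		} else {
			topo[i].alloc_cnt -= n;
		}
	}
	return rc;
}
```

Because both functions derive their per-entry counts from the same device mask, a matching alloc/dealloc pair always returns every counter to its starting value.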
-
Danny Auble authored
Conflicts: NEWS src/slurmctld/acct_policy.c
-
Danny Auble authored
-
Danny Auble authored
Also a very minor sanity check in job_mgr.c to make sure we at least have a task count. This shouldn't matter, but just to be as robust as possible.
-
Nathan Yee authored
Previously, only 1 job was counted against MaxSubmitJob when an array was submitted.
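A minimal sketch of counting every task of an array against a submit limit follows. The names `struct submit_usage` and `try_submit` are hypothetical and do not come from acct_policy.c; `max_submit` merely stands in for MaxSubmitJob.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical usage record (not the slurmctld association structure). */
struct submit_usage {
	uint32_t submitted;	/* jobs currently counted against the limit */
	uint32_t max_submit;	/* stand-in for MaxSubmitJob */
};

/*
 * array_task_cnt is 1 for a regular job and N for an array of N tasks.
 * The whole array is charged against the limit, not just one job record.
 */
static bool try_submit(struct submit_usage *u, uint32_t array_task_cnt)
{
	if (u->submitted > u->max_submit ||
	    array_task_cnt > u->max_submit - u->submitted)
		return false;	/* would exceed the limit */

	u->submitted += array_task_cnt;
	return true;
}
```

The comparison is written as a subtraction on the limit side so that adding the task count can never wrap the unsigned counter.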
-
David Bigagli authored
-
Tommi Tervo authored
-
Morris Jette authored
-
Danny Auble authored
Correct counting for job array limits; a job count limit underflow was possible upon cancellation of the master job record. bug 1952
-
- 21 Sep, 2015 10 commits
-
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Morris Jette authored
-
Morris Jette authored
-
Axel Huebl authored
Implement an option NONE for not sending mails at all. Closes http://bugs.schedmd.com/show_bug.cgi?id=1962
-
Morris Jette authored
-
Danny Auble authored
Also a very minor sanity check in job_mgr.c to make sure we at least have a task count. This shouldn't matter, but just to be as robust as possible.
-
Tim Wickberg authored
-
Nathan Yee authored
Previously, only 1 job was counted against MaxSubmitJob when an array was submitted.
-
Manuel Rodriguez-Pascual authored
I've noticed that the JobCheckpointDir parameter behaves inconsistently (from my point of view):
* in sbatch executions, it is exported as the CWD;
* in srun, it is also exported as the CWD,
* except when it is set manually with "--checkpoint-dir=dir", in which case that value is exported;
* the value defined in slurm.conf is, as far as I know, never read.
I have created this small patch to correct that behaviour. It is now exported with the value configured in slurm.conf. If nothing is set, the returned value is the one defined in common/read_config.h,
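The fallback order described above might be sketched as follows. This is an illustration only: `resolve_ckpt_dir` and `FALLBACK_CKPT_DIR` are hypothetical names, the default path is a placeholder and not the value in common/read_config.h, and giving the command-line value precedence over slurm.conf is an assumption.

```c
#include <stddef.h>

/* Hypothetical placeholder, NOT the default from common/read_config.h. */
#define FALLBACK_CKPT_DIR "/tmp/checkpoint"

/*
 * Pick the checkpoint directory to export.
 * Assumed order: explicit command-line value, then the configured value,
 * then the compiled-in default.
 */
static const char *resolve_ckpt_dir(const char *cli_dir, const char *conf_dir)
{
	if (cli_dir && cli_dir[0])
		return cli_dir;
	if (conf_dir && conf_dir[0])
		return conf_dir;
	return FALLBACK_CKPT_DIR;
}
```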
-
- 18 Sep, 2015 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
The "scontrol hold/release" commands accept either "name=" or "jobname=". I've modified the documentation to show only "jobname" for consistency with the "scontrol update" command. I have also modified the "scancel" command to accept "--jobname=" in addition to the existing "--name=" and "-n".
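One common way to accept two long spellings for the same option is to map both to one short option in a getopt_long() table. The sketch below assumes that approach; `parse_job_name` is a hypothetical helper and this is not the actual scancel option table.

```c
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical parser: return the job name given with -n, --name or
 * --jobname, or NULL if none was supplied.
 */
static const char *parse_job_name(int argc, char **argv)
{
	static const struct option long_opts[] = {
		{ "name",    required_argument, NULL, 'n' },
		{ "jobname", required_argument, NULL, 'n' },	/* alias */
		{ NULL, 0, NULL, 0 }
	};
	const char *name = NULL;
	int c;

	optind = 1;	/* restart scanning so the helper can be called again */
	while ((c = getopt_long(argc, argv, "n:", long_opts, NULL)) != -1) {
		if (c == 'n')
			name = optarg;
	}
	return name;
}
```

Since both long names return the same value, the rest of the program needs no knowledge of which spelling the user typed.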
-
Morris Jette authored
If a sleep was interrupted or ran a bit long, the backfill scheduler run times could be significantly wrong as the sleep time was based upon the calculation of sleep_count x desired_sleep_time. This new logic captures and uses the actual sleep time for good accuracy. bug 1939
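The difference between the two ways of accounting for sleep time can be sketched as below. This assumes a POSIX system; `timed_sleep_usec` and `total_sleep_usec` are hypothetical helpers, not the backfill scheduler's functions. Each sleep is bracketed by monotonic clock reads, and the measured durations are summed instead of multiplying a cycle count by the nominal interval.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/*
 * Sleep for roughly `usec` microseconds and return the time that actually
 * elapsed, which may be shorter (signal) or longer (scheduling delay).
 */
static int64_t timed_sleep_usec(int64_t usec)
{
	struct timespec start, end, req;

	req.tv_sec  = (time_t)(usec / 1000000);
	req.tv_nsec = (long)((usec % 1000000) * 1000);

	clock_gettime(CLOCK_MONOTONIC, &start);
	nanosleep(&req, NULL);	/* may return early if interrupted */
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (int64_t)(end.tv_sec - start.tv_sec) * 1000000 +
	       (end.tv_nsec - start.tv_nsec) / 1000;
}

/* Sum the measured sleep times rather than using cycles * nominal_usec. */
static int64_t total_sleep_usec(int cycles, int64_t nominal_usec)
{
	int64_t total = 0;

	for (int i = 0; i < cycles; i++)
		total += timed_sleep_usec(nominal_usec);
	return total;
}
```

Subtracting the summed value from the wall-clock duration of a scheduling pass then gives the time spent working, even when individual sleeps were cut short or overran.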
-
- 17 Sep, 2015 2 commits
-
-
David Bigagli authored
-
Tommi Tervo authored
-
- 16 Sep, 2015 3 commits
-
-
-
Morris Jette authored
bug 1947
-
Morris Jette authored
Fix teardown race condition that can result in infinite loop. bug 1947
-
- 15 Sep, 2015 3 commits
-
-
Danny Auble authored
-
David Bigagli authored
-
Yiannis Georgiou authored
-
- 14 Sep, 2015 1 commit
-
-
David Bigagli authored
-
- 13 Sep, 2015 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
- 11 Sep, 2015 5 commits
-
-
Morris Jette authored
This prevents a step from being launched if the job is killed while the prolog is running. Reproducing the original failure requires use of srun to trigger the prolog and using scancel while that prolog is running. bug 1755
-
Danny Auble authored
-
Danny Auble authored
anomaly when only asking for 1 (task_id was never set to INFINITE).
-
Danny Auble authored
-
Danny Auble authored
-