- 12 May, 2014 5 commits
- Morris Jette authored
- Morris Jette authored: If a job has a non-responding node, retry job step creation rather than returning a DOWN node error. Bug 734.
- Morris Jette authored
- Puenlap Lee authored: Also correct related documentation.
- Hongjia Cao authored: Completing nodes are removed when calling _try_sched() for a job, as is the case in select_nodes(). If _try_sched() thinks the job can run now but select_nodes() returns ESLURM_NODES_BUSY, the backfill loop will end.
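The backfill interaction in the last commit above can be sketched roughly as follows. All names and the error value here are illustrative stand-ins, not SLURM's actual internals:

```c
#include <stdbool.h>

/* Illustrative stand-in only; not SLURM's real error code. */
#define ESLURM_NODES_BUSY 2022

/* _try_sched() masks out completing nodes, so it can report a job
 * runnable even though those nodes are still busy completing. */
static bool try_sched_stub(bool nodes_completing)
{
	(void) nodes_completing;	/* completing nodes ignored here */
	return true;			/* job appears runnable */
}

/* select_nodes() does see the completing nodes and reports them busy. */
static int select_nodes_stub(bool nodes_completing)
{
	return nodes_completing ? ESLURM_NODES_BUSY : 0;
}

/* Returns true if this mismatch ends the backfill loop early. */
static bool backfill_ends_early(bool nodes_completing)
{
	if (!try_sched_stub(nodes_completing))
		return false;	/* job skipped; loop continues */
	return select_nodes_stub(nodes_completing) == ESLURM_NODES_BUSY;
}
```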
- 09 May, 2014 3 commits
- Danny Auble authored
- Danny Auble authored
- Morris Jette authored: Do not resume a job with specialized cores on a node running another job with specialized cores (only one can run at a time). Bug 792.
- 08 May, 2014 2 commits
- Morris Jette authored: Fix sinfo -R to print each down/drained node once, rather than once per partition. This was broken by the sinfo change that processes each partition's information in a separate pthread.
- Morris Jette authored: Correct sinfo --sort fields to match the documentation: E => Reason, H => Reason Time (new), R => Partition Name, u/U => Reason user (new).
- 07 May, 2014 5 commits
- Morris Jette authored: Without this patch, jobs with an infinite time limit would have their preemption GraceTime ignored.
- Morris Jette authored: Related to bug 789.
- Danny Auble authored
- Danny Auble authored: them.
- Morris Jette authored: Added the ChosLoc configuration parameter in slurm.conf (Chroot OS tool location). Bug 685.
- 06 May, 2014 6 commits
- Morris Jette authored
- Danny Auble authored
- Danny Auble authored
- Morris Jette authored
- Morris Jette authored: In the slurm.spec file, replace "Requires cray-MySQL-devel-enterprise" with "Requires mysql-devel" per David Gloe.
- Morris Jette authored: Permit job steps full control over cpu_bind options if specialized cores are included in the job allocation. Bug 782.
- 05 May, 2014 5 commits
- Danny Auble authored
- Danny Auble authored
- Danny Auble authored: Related to bug 771.
- Morris Jette authored: Version 14.03.2 was using "slurm_<jobid>_4294967294.out" due to an error in the job array logic.
- Danny Auble authored: cnode counts.
- 02 May, 2014 4 commits
- Danny Auble authored
- Danny Auble authored: This is for bug 775.
- Danny Auble authored
- Danny Auble authored
- 01 May, 2014 6 commits
- Danny Auble authored: Regression from 2a674aee.
- Danny Auble authored: is running.
- Danny Auble authored
- Danny Auble authored
- Danny Auble authored: is running.
- Danny Auble authored
- 30 Apr, 2014 3 commits
- David Bigagli authored: together.
- Morris Jette authored: Switch/nrt: properly track usage of CAU and RDMA resources with multiple tasks per compute node. The previous logic would allocate resources once per task and then deallocate once per node, leaking CAU and RDMA resources and preventing their use by future jobs.
- Morris Jette authored: If a job is held, then only release it with the "scontrol release <jobid>" command rather than a simple reset of the job's priority. This is needed to better support job arrays. Otherwise a priority reset of a job array would free all requeued/held jobs from that job array rather than leaving them held.
- 29 Apr, 2014 1 commit
- Morris Jette authored: Modify slurmd to keep track of which jobs have already been launched. If the launch is complete, then process suspend requests immediately. Previously the suspend request was always delayed by one second, which adversely impacts gang scheduling performance. If the job can't be found (say, after a slurmd restart), then delay the suspend by up to three seconds, but only once.
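The suspend-delay policy in that commit can be sketched as below; the struct and function names are hypothetical, not slurmd's actual code:

```c
#include <stdbool.h>

/* Hypothetical sketch of the suspend-delay policy; not slurmd code. */
struct suspend_state {
	bool launch_complete;	/* this job's launch already finished */
	bool delayed_once;	/* we already waited for this job once */
};

/* Returns how many seconds to wait before honoring a suspend request:
 * 0 if the launch is known complete (act immediately), otherwise up
 * to 3 seconds, but only for the first request (e.g. when the job
 * record is missing after a slurmd restart). */
static int suspend_delay_secs(struct suspend_state *s)
{
	if (s->launch_complete)
		return 0;	/* launch done: suspend immediately */
	if (s->delayed_once)
		return 0;	/* never delay the same job twice */
	s->delayed_once = true;
	return 3;
}
```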