- 05 Oct, 2015 3 commits
- 03 Oct, 2015 2 commits
-
-
Morris Jette authored
Conflicts: NEWS
-
Morris Jette authored
Don't requeue RPCs going out from slurmctld to DOWN nodes (this can generate repeating communication errors). bug 2002
-
- 02 Oct, 2015 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
This will only happen if a PING RPC for the node is already queued when the decision is made to power it down, and the ping then fails to get a response (since the node is already down). bug 1995
-
Morris Jette authored
If a job's CPUs/task ratio is increased due to configured MaxMemPerCPU, then increase its allocated CPU count in order to enforce CPU limits. Previous logic would increase/set the cpus_per_task as needed if a job's --mem-per-cpu was above the configured MaxMemPerCPU, but NOT increase the min_cpus or max_cpus variables. This resulted in allocating the wrong CPU count.
-
Morris Jette authored
This will only happen if a PING RPC for the node is already queued when the decision is made to power it down, and the ping then fails to get a response (since the node is already down). bug 1995
-
- 01 Oct, 2015 2 commits
-
-
Danny Auble authored
values.
-
Morris Jette authored
This required a fairly major re-write of the select plugin logic. bug 1975
-
- 30 Sep, 2015 6 commits
-
-
Morris Jette authored
Correct some cgroup paths ("step_batch" vs. "step_4294967294", "step_exter" vs. "step_extern", and "step_extern" vs. "step_4294967295").
-
Morris Jette authored
Document that if a job's memory per CPU limit exceeds the system limit, the job's memory limit is decreased and its CPU count increased automatically.
-
Morris Jette authored
If a job's CPUs/task ratio is increased due to configured MaxMemPerCPU, then increase its allocated CPU count in order to enforce CPU limits. Previous logic would increase/set the cpus_per_task as needed if a job's --mem-per-cpu was above the configured MaxMemPerCPU, but NOT increase the min_cpus or max_cpus variables. This resulted in allocating the wrong CPU count.
-
Brian Christiansen authored
Conflicts: NEWS src/slurmctld/job_mgr.c src/srun/libsrun/launch.c
-
Brian Christiansen authored
Continuation of 1252d1a1 Bug 1938
-
Morris Jette authored
Requeue/hold batch job launch request if the job is already running. This is possible if a node went to DOWN state, but its jobs remained active. In addition, if a prolog/epilog fails, DRAIN the node rather than setting it DOWN, which could kill jobs that could otherwise continue to run. bug 1985
-
- 29 Sep, 2015 4 commits
-
-
Morris Jette authored
This makes srun more consistent with salloc and sbatch
-
Morris Jette authored
Previous logic would not report the termination signal, only the exit code, which could be meaningless.
-
Brian Christiansen authored
Bug 1938
-
Brian Christiansen authored
Bug 1984
-
- 28 Sep, 2015 4 commits
-
-
Morris Jette authored
When nodes have been allocated to a job and then released by the job while resizing, this patch prevents the nodes from continuing to appear allocated and unavailable to other jobs. Requires exclusive node allocation to trigger. This prevents the previously reported failure, but a proper fix will be quite complex and delayed to the next major release of Slurm (v 16.05). bug 1851
-
Morris Jette authored
When nodes have been allocated to a job and then released by the job while resizing, this patch prevents the nodes from continuing to appear allocated and unavailable to other jobs. Requires exclusive node allocation to trigger. This prevents the previously reported failure, but a proper fix will be quite complex and delayed to the next major release of Slurm (v 16.05). bug 1851
-
Gennaro Oliva authored
-
Morris Jette authored
Topology optimization takes place first, then the lowest-weight nodes are picked within the switches offering the best fit. bug 1979
-
- 25 Sep, 2015 10 commits
-
-
Koji Tanaka authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Add ability to change a job array's maximum running task count: "scontrol update jobid=# arraytaskthrottle=#" bug 1863
-
Morris Jette authored
-
Morris Jette authored
Added as part of requeue/hold update
-
Morris Jette authored
-
- 24 Sep, 2015 5 commits
-
-
Morris Jette authored
Was printing "Name=#" rather than "JobId=#"
-
Danny Auble authored
-
Nathan Yee authored
Validate that sbatch, srun, salloc return partition error message on invalid partition name. bug 1223
-
Danny Auble authored
-
Morris Jette authored
-