- 07 Oct, 2015 6 commits
-
-
Morris Jette authored
-
Morris Jette authored
bug 2013
-
David Bigagli authored
-
Hongjia Cao authored
-
David Bigagli authored
-
Hongjia Cao authored
-
- 06 Oct, 2015 13 commits
-
-
Morris Jette authored
Create a "task" cgroup at job allocation time via the prolog container. A dummy "sleep" process will occupy the cgroup for as long as the job exists. bug 1994
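The mechanism described above can be sketched in shell (a hedged illustration, not Slurm's actual prolog code; `CGROUP_BASE` here points at a scratch directory so the sketch runs without root, whereas a real node would use a path under /sys/fs/cgroup):

```shell
# Keep a job's "task" cgroup alive by parking a dummy sleep process in it.
CGROUP_BASE="${CGROUP_BASE:-$(mktemp -d)}"
JOB_ID=1234                               # hypothetical job id
TASK_CG="$CGROUP_BASE/job_$JOB_ID/task_special"
mkdir -p "$TASK_CG"

# Launch a long-running sleep and record its PID in the cgroup's procs
# file; while this process exists, the cgroup cannot be removed.
sleep 1000 &
DUMMY_PID=$!
echo "$DUMMY_PID" > "$TASK_CG/cgroup.procs"

kill "$DUMMY_PID"   # cleanup for this sketch only; the real dummy
                    # process would persist for the job's lifetime
```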
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
requirements.
-
Axel Auweter authored
Add acct_gather_energy/ibmaem plugin for systems with IBM Systems Director Active Energy Manager.
-
Morris Jette authored
Conflicts: src/slurmctld/job_mgr.c
-
Thomas Cadeau authored
bug 2011
-
jette authored
It would not cause any problem other than excess memory being allocated, but it was found by Clang.
-
Danny Auble authored
','.
-
Morris Jette authored
Conflicts: src/common/proc_args.c
-
Morris Jette authored
bug 1999
-
- 05 Oct, 2015 4 commits
-
-
Morris Jette authored
A configuration of "DefMemPerNode=UNLIMITED" prevented more than one job from running at a time on a given node, which broke some tests. These changes prevent the tests from breaking.
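For context, a minimal slurm.conf fragment with the setting in question (illustrative only; the surrounding configuration is omitted):

```
# Illustrative slurm.conf excerpt: this default previously limited each
# node to one running job at a time due to the bug described above.
DefMemPerNode=UNLIMITED
```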
-
david authored
-
jette authored
-
jette authored
-
- 03 Oct, 2015 2 commits
-
-
Morris Jette authored
Conflicts: NEWS
-
Morris Jette authored
Don't requeue RPCs going out from slurmctld to DOWN nodes (doing so can generate repeating communication errors). bug 2002
-
- 02 Oct, 2015 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
This will only happen if a PING RPC for the node is already queued when the decision is made to power it down, then fails to get a response for the ping (since the node is already down). bug 1995
-
Morris Jette authored
If a job's CPUs/task ratio is increased due to the configured MaxMemPerCPU, then increase its allocated CPU count in order to enforce CPU limits. Previous logic would increase/set cpus_per_task as needed if a job's --mem-per-cpu was above the configured MaxMemPerCPU, but NOT increase the min_cpus or max_cpus variables. This resulted in allocating the wrong CPU count.
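The adjustment described above reduces to simple arithmetic; a hedged sketch follows (a simplification for illustration, not Slurm's actual code; the function and parameter names are hypothetical):

```python
import math

def adjust_for_max_mem_per_cpu(mem_per_cpu, max_mem_per_cpu,
                               cpus_per_task, min_cpus):
    """If the requested --mem-per-cpu exceeds MaxMemPerCPU, spread the
    memory over more CPUs: raise cpus_per_task AND scale min_cpus,
    which is the step the previous logic missed."""
    if mem_per_cpu <= max_mem_per_cpu:
        return mem_per_cpu, cpus_per_task, min_cpus
    ratio = math.ceil(mem_per_cpu / max_mem_per_cpu)
    return (math.ceil(mem_per_cpu / ratio),   # per-CPU memory now within limit
            cpus_per_task * ratio,            # old logic stopped here
            min_cpus * ratio)                 # the fix: scale min_cpus too

# e.g. --mem-per-cpu=8000 with MaxMemPerCPU=4000 doubles the CPU counts
print(adjust_for_max_mem_per_cpu(8000, 4000, 1, 2))  # (4000, 2, 4)
```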
-
Morris Jette authored
This will only happen if a PING RPC for the node is already queued when the decision is made to power it down, then fails to get a response for the ping (since the node is already down). bug 1995
-
- 01 Oct, 2015 2 commits
-
-
Danny Auble authored
values.
-
Morris Jette authored
This required a fairly major re-write of the select plugin logic. bug 1975
-
- 30 Sep, 2015 6 commits
-
-
Morris Jette authored
Correct some cgroup paths ("step_batch" vs. "step_4294967294", "step_exter" vs. "step_extern", and "step_extern" vs. "step_4294967295").
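The path fix above hinges on mapping Slurm's sentinel step IDs to readable names; a small sketch of the intended mapping (the numeric values come from the message itself, but the constant and function names here are illustrative assumptions):

```python
# Sentinel step IDs implied by the message above: 4294967294 for the
# batch step, 4294967295 for the extern step.
BATCH_STEP_ID = 4294967294
EXTERN_STEP_ID = 4294967295

def step_cgroup_name(step_id):
    """Return the human-readable cgroup directory name for a step."""
    if step_id == BATCH_STEP_ID:
        return "step_batch"
    if step_id == EXTERN_STEP_ID:
        return "step_extern"
    return "step_%u" % step_id

print(step_cgroup_name(4294967294))  # step_batch
```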
-
Morris Jette authored
Document that if a job's memory per CPU limit exceeds the system limit, the job's memory limit is decreased and its CPU count is increased automatically.
-
Morris Jette authored
If a job's CPUs/task ratio is increased due to the configured MaxMemPerCPU, then increase its allocated CPU count in order to enforce CPU limits. Previous logic would increase/set cpus_per_task as needed if a job's --mem-per-cpu was above the configured MaxMemPerCPU, but NOT increase the min_cpus or max_cpus variables. This resulted in allocating the wrong CPU count.
-
Brian Christiansen authored
Conflicts: NEWS src/slurmctld/job_mgr.c src/srun/libsrun/launch.c
-
Brian Christiansen authored
Continuation of 1252d1a1 Bug 1938
-
Morris Jette authored
Requeue/hold the batch job launch request if the job is already running. This is possible if a node went to the DOWN state, but its jobs remained active. In addition, if a prolog/epilog failed, DRAIN the node rather than setting it DOWN, which could kill jobs that could otherwise continue to run. bug 1985
-
- 29 Sep, 2015 3 commits
-
-
Morris Jette authored
This makes srun more consistent with salloc and sbatch.
-
Morris Jette authored
Previous logic would not report the termination signal, only the exit code, which could be meaningless.
-
Brian Christiansen authored
Bug 1938
-