- 05 Oct, 2015 3 commits
- 03 Oct, 2015 2 commits
-
-
Morris Jette authored
Conflicts: NEWS
-
Morris Jette authored
Don't requeue RPCs going out from slurmctld to DOWN nodes (this can generate repeating communication errors). bug 2002
-
- 02 Oct, 2015 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
This will only happen if a PING RPC for the node is already queued when the decision is made to power it down, and the ping then fails to get a response (since the node is already down). bug 1995
-
Morris Jette authored
If a job's CPUs/task ratio is increased due to configured MaxMemPerCPU, then increase its allocated CPU count in order to enforce CPU limits. Previous logic would increase/set the cpus_per_task as needed if a job's --mem-per-cpu was above the configured MaxMemPerCPU, but NOT increase the min_cpus or max_cpus variables. This resulted in allocating the wrong CPU count.
-
Morris Jette authored
This will only happen if a PING RPC for the node is already queued when the decision is made to power it down, and the ping then fails to get a response (since the node is already down). bug 1995
-
- 01 Oct, 2015 2 commits
-
-
Danny Auble authored
values.
-
Morris Jette authored
This required a fairly major re-write of the select plugin logic. bug 1975
-
- 30 Sep, 2015 6 commits
-
-
Morris Jette authored
Correct some cgroup paths ("step_batch" vs. "step_4294967294", "step_exter" vs. "step_extern", and "step_extern" vs. "step_4294967295").
-
Morris Jette authored
Document that if a job's memory per CPU limit exceeds the system limit, the job's memory limit is decreased and its CPU count increased automatically.
-
Morris Jette authored
If a job's CPUs/task ratio is increased due to configured MaxMemPerCPU, then increase its allocated CPU count in order to enforce CPU limits. Previous logic would increase/set the cpus_per_task as needed if a job's --mem-per-cpu was above the configured MaxMemPerCPU, but NOT increase the min_cpus or max_cpus variables. This resulted in allocating the wrong CPU count.
-
Brian Christiansen authored
Conflicts: NEWS src/slurmctld/job_mgr.c src/srun/libsrun/launch.c
-
Brian Christiansen authored
Continuation of 1252d1a1 Bug 1938
-
Morris Jette authored
Requeue/hold batch job launch request if the job is already running. This is possible if a node went to DOWN state, but its jobs remained active. In addition, if a prolog/epilog fails, DRAIN the node rather than setting it DOWN, which could kill jobs that could otherwise continue to run. bug 1985
-
- 29 Sep, 2015 4 commits
-
-
Morris Jette authored
This makes srun more consistent with salloc and sbatch
-
Morris Jette authored
Previous logic would not report the termination signal, only the exit code, which could be meaningless.
-
Brian Christiansen authored
Bug 1938
-
Brian Christiansen authored
Bug 1984
-
- 28 Sep, 2015 4 commits
-
-
Morris Jette authored
When nodes have been allocated to a job and then released by the job while resizing, this patch prevents the nodes from continuing to appear allocated and unavailable to other jobs. Requires exclusive node allocation to trigger. This prevents the previously reported failure, but a proper fix will be quite complex and delayed to the next major release of Slurm (v 16.05). bug 1851
-
Morris Jette authored
When nodes have been allocated to a job and then released by the job while resizing, this patch prevents the nodes from continuing to appear allocated and unavailable to other jobs. Requires exclusive node allocation to trigger. This prevents the previously reported failure, but a proper fix will be quite complex and delayed to the next major release of Slurm (v 16.05). bug 1851
-
Gennaro Oliva authored
-
Morris Jette authored
Topology optimization takes place first, then the lowest-weight nodes are picked within the switches offering the best fit. bug 1979
-
- 25 Sep, 2015 10 commits
-
-
Koji Tanaka authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Add ability to change a job array's maximum running task count: "scontrol update jobid=# arraytaskthrottle=#" bug 1863
-
Morris Jette authored
-
Morris Jette authored
Added as part of requeue/hold update
-
Morris Jette authored
-
- 24 Sep, 2015 5 commits
-
-
Morris Jette authored
Was printing "Name=#" rather than "JobId=#"
-
Danny Auble authored
-
Nathan Yee authored
Validate that sbatch, srun, salloc return partition error message on invalid partition name. bug 1223
-
Danny Auble authored
-
Morris Jette authored
-