Commits · ff24578a967b0d5978be75414375b104e58f9c9c · Manuel G. Marciani / ces_slurm_simulator

02 Oct, 2015 3 commits

Update v15.08.2 NEWS with v14.11.10 work · ff24578a
Morris Jette authored Oct 01, 2015

ff24578a

Don't mark powered down node as not responding · c0bb562a

Morris Jette authored Oct 01, 2015

This will only happen if a PING RPC for the node is already queued
  when the decision is made to power it down, then fails to get
  a response for the ping (since the node is already down).
bug 1995

c0bb562a

Reset job CPU count if CPUs/task ratio increased for mem limit · 29fe3eae

Morris Jette authored Sep 30, 2015

If a job's CPUs/task ratio is increased due to the configured MaxMemPerCPU,
then increase its allocated CPU count in order to enforce CPU limits.
Previous logic would increase/set the cpus_per_task as needed if a
job's --mem-per-cpu was above the configured MaxMemPerCPU, but NOT
increase the min_cpus or max_cpus variable. This resulted in allocating
the wrong CPU count.

29fe3eae
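
The adjustment described in commit 29fe3eae can be illustrated with a minimal sketch. This is not the Slurm implementation: the function name, the rounding rule, and the choice to scale `min_cpus`/`max_cpus` by the same factor as `cpus_per_task` are assumptions made for illustration. Only the field names `cpus_per_task`, `min_cpus`, `max_cpus` and the `MaxMemPerCPU` setting come from the commit message.

```python
def apply_max_mem_per_cpu(mem_per_cpu, cpus_per_task, min_cpus, max_cpus,
                          max_mem_per_cpu):
    """Hypothetical helper: return adjusted
    (mem_per_cpu, cpus_per_task, min_cpus, max_cpus)."""
    # Nothing to do when no cap is configured or the request is within it.
    if max_mem_per_cpu <= 0 or mem_per_cpu <= max_mem_per_cpu:
        return mem_per_cpu, cpus_per_task, min_cpus, max_cpus

    # Integer ceiling: how many CPUs are needed per originally requested CPU
    # so that each CPU's memory share fits under the cap (assumed rule).
    factor = -(-mem_per_cpu // max_mem_per_cpu)

    new_cpus_per_task = cpus_per_task * factor
    new_mem_per_cpu = -(-mem_per_cpu // factor)

    # The step the commit message says was previously missing:
    # the job's CPU counts must grow along with cpus_per_task.
    new_min_cpus = min_cpus * factor
    new_max_cpus = max_cpus * factor

    return new_mem_per_cpu, new_cpus_per_task, new_min_cpus, new_max_cpus
```

Under these assumptions, a 4-task job asking for 4096 MB per CPU against a 2048 MB cap ends up with 2 CPUs per task and 8 CPUs in total, rather than 2 CPUs per task and an unchanged total of 4.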

01 Oct, 2015 2 commits
- MYSQL - Remove restriction to have to be at least an operator to query TRES values. · 2bfbcbd8
  Danny Auble authored Oct 01, 2015
  2bfbcbd8
- Fix advanced reservation core selection logic with network topology · 9e4a695d
  Morris Jette authored Oct 01, 2015
```
This required a fairly major re-write of the select plugin logic
bug 1975
```
  9e4a695d
30 Sep, 2015 2 commits

Make cgroup paths consistent · c5c566ff

Morris Jette authored Sep 30, 2015

Correct some cgroup paths ("step_batch" vs. "step_4294967294", "step_exter"
    vs. "step_extern", and "step_extern" vs. "step_4294967295").

c5c566ff
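
The path pairs named in commit c5c566ff suggest that two reserved step IDs have both a numeric and a symbolic directory name. The sketch below shows one way to keep such names consistent by routing every caller through a single function. The constant names, the function name, and the choice of the symbolic form as canonical are assumptions; the commit message does not say which form was adopted.

```python
# Hypothetical names for the two reserved step IDs that appear
# in the commit message as "step_4294967294" and "step_4294967295".
BATCH_STEP_ID = 0xFFFFFFFE   # 4294967294
EXTERN_STEP_ID = 0xFFFFFFFF  # 4294967295


def step_cgroup_name(step_id):
    """Return one directory name per step so all callers agree.

    Assumption: the symbolic names are used for the reserved IDs.
    """
    if step_id == BATCH_STEP_ID:
        return "step_batch"
    if step_id == EXTERN_STEP_ID:
        return "step_extern"
    return "step_%d" % step_id
```

Building the name in one place avoids the kind of mismatch the commit lists, where one code path writes "step_exter" or "step_4294967295" and another looks for "step_extern".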

Don't start duplicate batch job · c1513956

Morris Jette authored Sep 29, 2015

Requeue/hold batch job launch request if the job is already running. This is
  possible if the node went to DOWN state, but jobs remained active.
In addition, if a prolog/epilog failed, DRAIN the node rather than
  setting it down, which could kill jobs that could continue to
  run.
bug 1985

c1513956

29 Sep, 2015 2 commits
- Fix srun -I<timeout> flooding the controller with step create requests. · 1252d1a1
  Brian Christiansen authored Sep 29, 2015
```
Bug 1938
```
  1252d1a1
- Fix updating job in db after extending job's timelimit past partition's timelimit. · 7a0836fc
  Brian Christiansen authored Sep 29, 2015
```
Bug 1984
```
  7a0836fc
28 Sep, 2015 2 commits

Fix for node state when shrinking jobs · 16f4b6a9

Morris Jette authored Sep 28, 2015

When nodes have been allocated to a job and then released by the
  job while resizing, this patch prevents the nodes from continuing
  to appear allocated and unavailable to other jobs. Requires
  exclusive node allocation to trigger. This prevents the previously
  reported failure, but a proper fix will be quite complex and
  delayed to the next major release of Slurm (v 16.05).
bug 1851

16f4b6a9

Fix for node state when shrinking jobs · 6c9d4540

Morris Jette authored Sep 28, 2015

When nodes have been allocated to a job and then released by the
  job while resizing, this patch prevents the nodes from continuing
  to appear allocated and unavailable to other jobs. Requires
  exclusive node allocation to trigger. This prevents the previously
  reported failure, but a proper fix will be quite complex and
  delayed to the next major release of Slurm (v 16.05).
bug 1851

6c9d4540

25 Sep, 2015 2 commits
- Start NEWS for v15.08.2 · b202f5e7
  Morris Jette authored Sep 25, 2015
  
  b202f5e7
- Allow changing job array max task count · 56b0ff1c
  Morris Jette authored Sep 25, 2015
```
Add ability to change a job array's maximum running task count:
    "scontrol update jobid=# arraytaskthrottle=#"
bug 1863
```
  56b0ff1c
24 Sep, 2015 2 commits
- Fix TRES counts on GRES on a clean start of the slurmctld. · 8274ea54
  Danny Auble authored Sep 24, 2015
  
  8274ea54
- Fix issue with wrong protocol version when using the srun --no-allocate option. · b8b7f2d6
  Danny Auble authored Sep 24, 2015
  b8b7f2d6
23 Sep, 2015 8 commits
- Put node count in TRES string for steps. · e73ed4f3
  Danny Auble authored Sep 23, 2015
  
  e73ed4f3
- Fix sacct documentation about [Alloc|Req]TRES · 443833e2
  Danny Auble authored Sep 23, 2015
  
  443833e2
- Add [Alloc|Req]Nodes to sacct to be more like cpus. · 9c1c9c62
  Danny Auble authored Sep 23, 2015
  
  9c1c9c62
- For pending jobs have sacct print 0 for nnodes instead of the bogus 2. · 71287134
  Danny Auble authored Sep 23, 2015
```
The 2 came from the nodelist being "None assigned", which would be treated
as 2 hosts when sent into hostlist.
```
  71287134
- Make it so "scontrol update job 1234 qos=''" will set the qos back to the default qos for the association. · 8942aa1e
  Danny Auble authored Sep 23, 2015
  8942aa1e
- Fix sacct --format=nnodes to print out correct information for pending jobs. · 6d82e5bf
  Danny Auble authored Sep 23, 2015
```
Bug 1969
```
  6d82e5bf
- squeue re-combined pending job array records · f797fd1d
  Morris Jette authored Sep 23, 2015
```
Pending job array records will be combined into a single line by default,
    even if started and requeued or modified.
bug 1759
```
  f797fd1d
- Combine 2 _valid_uid_gid functions into a single function to avoid divergence. · 1e854f69
  Danny Auble authored Sep 23, 2015
  1e854f69
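
The "bogus 2" described in commit 71287134 can be reproduced with a small sketch. It assumes a simplified host counter that splits a node list on commas and whitespace; the real hostlist code also handles bracketed ranges, which are not modelled here. Both function names are hypothetical.

```python
import re


def naive_host_count(nodelist):
    """Count hosts by splitting on commas and whitespace (simplified model)."""
    return len([tok for tok in re.split(r"[,\s]+", nodelist) if tok])


def nnodes_for_display(nodelist, is_pending):
    """Assumed fix: a pending job has no nodes yet, so report 0
    instead of parsing the placeholder text."""
    if is_pending:
        return 0
    return naive_host_count(nodelist)
```

With this model, the placeholder "None assigned" splits into two tokens and is counted as two hosts, which matches the behavior the commit message reports. Checking the job state before parsing sidesteps the problem.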
22 Sep, 2015 4 commits
- Add fixes from 14.11.10 to 15.08 change log · d56c72a3
  Danny Auble authored Sep 21, 2015
  
  d56c72a3
- Fix to handle arrays with respect to number of jobs submitted. · 6c5c2026
  Nathan Yee authored Sep 21, 2015
```
Previously only 1 job was accounted for (against MaxSubmitJob) when an
array was submitted.
```
  6c5c2026
- Fix typo. · 939dfc66
  David Bigagli authored Sep 17, 2015
  
  939dfc66
- Correct job count limit logic for job arrays · add3d8cd
  Danny Auble authored Sep 21, 2015
```
Correct counting for job array limits. A job count limit underflow was
    possible when the master job record was cancelled.
bug 1952
```
  add3d8cd
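
The two job array accounting fixes in this group (6c5c2026 and add3d8cd) can be pictured with a minimal sketch. It assumes a single per-user counter checked against MaxSubmitJob; all function names are hypothetical and the real bookkeeping in Slurm is more involved.

```python
def jobs_to_charge(array_task_count=None):
    """An array is charged one job per task; a plain job is charged 1."""
    if array_task_count:
        return array_task_count
    return 1


def can_submit(submitted, max_submit_jobs, array_task_count=None):
    """Check the submission against the limit using the full task count."""
    return submitted + jobs_to_charge(array_task_count) <= max_submit_jobs


def release(submitted, count):
    """Decrement the counter without letting it drop below zero,
    guarding against the kind of underflow the second commit mentions."""
    return max(0, submitted - count)
```

Charging only 1 for a whole array, as the earlier logic is described as doing, would let a 5-task array through even when just 2 slots remain. The clamped decrement is one simple way to keep an unsigned-style counter from wrapping when a cancellation releases more than was charged.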
21 Sep, 2015 4 commits
- Fix memory leak when using PrologFlags=Alloc. · 93fe985d
  Brian Christiansen authored Sep 21, 2015
  
  93fe985d
- job reboot flag implicitly sets exclusive flag · 42d958db
  Morris Jette authored Sep 21, 2015
  
  42d958db
- SBATCH --mail-type=NONE · 4794d326
  Axel Huebl authored Sep 21, 2015
```
Implement an option NONE for not sending mails
at all.
Closes http://bugs.schedmd.com/show_bug.cgi?id=1962
```
  4794d326
- Fix to handle arrays with respect to number of jobs submitted. · b404c3af
  Nathan Yee authored Sep 21, 2015
```
Previously only 1 job was accounted for (against MaxSubmitJob) when an
array was submitted.
```
  b404c3af
17 Sep, 2015 1 commit
- Fix typo. · 48b7c2f0
  David Bigagli authored Sep 17, 2015
  
  48b7c2f0
16 Sep, 2015 1 commit
- burst_buffer/cray infinite loop fix · 49e4ac00
  Morris Jette authored Sep 16, 2015
```
Fix teardown race condition that can result in infinite loop.
bug 1947
```
  49e4ac00
15 Sep, 2015 1 commit
- Update NEWS · 522565d5
  David Bigagli authored Sep 15, 2015
  
  522565d5
13 Sep, 2015 1 commit
- Fix issue when tres cnt for energy is 0 for total reported · 08b85a56
  Danny Auble authored Sep 13, 2015
  
  08b85a56
11 Sep, 2015 5 commits
- handle job kill while step prolog running · 5342941c
  Morris Jette authored Sep 11, 2015
```
This prevents a step from being launched if the job is killed
while the prolog is running. Reproducing the original failure
requires use of srun to trigger the prolog and using scancel
while that prolog is running.
bug 1755
```
  5342941c
- MYSQL - If user is requesting various task_ids only return requested steps. · 310f1407
  Danny Auble authored Sep 11, 2015
  
  310f1407
- Simplify code when user is selecting a job/step/array id and remove anomaly when only asking for 1 (task_id was never set to INFINITE). · 323e197c
  Danny Auble authored Sep 11, 2015
  323e197c
- MYSQL - Change debug to print out with DebugFlags=DB_Step instead of debug4 · 78ae8647
  Danny Auble authored Sep 11, 2015
  
  78ae8647
- handle job kill while step prolog running · bda0a436
  Morris Jette authored Sep 11, 2015
```
This prevents a step from being launched if the job is killed
while the prolog is running. Reproducing the original failure
requires use of srun to trigger the prolog and using scancel
while that prolog is running.
bug 1755
```
  bda0a436