Commits · bd7504fbe0986f87b3c63539f79a5d81cc122f56 · Manuel G. Marciani / ces_slurm_simulator

13 Feb, 2017 3 commits
- Fix burst_buffer/cray race condition · bd7504fb
  Morris Jette authored Feb 13, 2017
```
burst_buffer/cray - Do not execute "pre_run" operation until after all nodes
    are booted and ready for use.
bug 3461
```
  bd7504fb
- Fix minor memory leak in the slurmctld when removing a QOS. · 9984400a
  Danny Auble authored Feb 13, 2017
  
  9984400a
- Don't start job early · f6d42fdb
  Morris Jette authored Feb 13, 2017
```
Ensure job does not start running before node is booted and PrologSlurmctld
    is complete.
bug 3446
```
  f6d42fdb
10 Feb, 2017 2 commits
- Protecting `volatile` variables to make Valgrind's drd happy. · 595b24f1
  Artem Polyakov authored Feb 10, 2017
```
Ported from 7a4aa7f2.
```
  595b24f1
- PMI2 - Make it possible to use %n or %h in a spool dir. · ef03f126
  Danny Auble authored Feb 10, 2017
  
  ef03f126
09 Feb, 2017 5 commits
- burst_buffer/cray - DataWarp parsing fix · b9d05100
  Morris Jette authored Feb 09, 2017
```
burst_buffer/cray - Support default pool which is not the first pool
    reported by DataWarp and log in Slurm when pools are added to or removed
    from DataWarp.
bug 3453
```
  b9d05100
- Commit for last patch · 018314a6
  Danny Auble authored Feb 08, 2017
  
  018314a6
- Revert "PMIX - Make it possible to use %n or %h in a spool dir." · 22ba56b9
  Danny Auble authored Feb 08, 2017
```
This reverts commit fd690a9c.
```
  22ba56b9
- PMIX - Make it possible to use %n or %h in a spool dir. · fd690a9c
  Danny Auble authored Feb 08, 2017
  
  fd690a9c
- Use volatile specifier to avoid flag caching. · 84e765d9
  Artem Polyakov authored Feb 08, 2017
  
  84e765d9
08 Feb, 2017 2 commits
- Fix recording of job state as PREEMPTED instead of REQUEUED. · 72c0b659
  Alejandro Sanchez authored Feb 07, 2017
```
Jobs preempted with PreemptMode=REQUEUE were incorrectly recorded as
REQUEUED in the accounting.

Bug 3444
```
  72c0b659
- power/cray - Disable power cap get and set operations on DOWN nodes · 08edc116
  Morris Jette authored Feb 07, 2017
```
bug 3448
```
  08edc116
07 Feb, 2017 1 commit
- Print message back to stderr when disabling affinity with --cpu_bind=verbose. · d4a286a4
  Dominik Bartkiewicz authored Feb 03, 2017
```
Bug 3447
```
  d4a286a4
03 Feb, 2017 1 commit
- Record job state as PREEMPTED instead of TIMEOUT when GraceTime is reached. · ba2c5584
  Alejandro Sanchez authored Feb 03, 2017
```
Bug 3444
```
  ba2c5584
31 Jan, 2017 2 commits
- NEWS update for next 16.05 version · b8aecfd0
  Danny Auble authored Jan 31, 2017
  
  b8aecfd0
- Make sure acct policy limits imposed on a job are correct after requeue. · 5d941217
  Alejandro Sanchez authored Jan 31, 2017
  
  5d941217
30 Jan, 2017 2 commits

Morris Jette authored Jan 30, 2017

Clear a job's "BeginTime" reason in a more timely fashion and prevent jobs
    from being stuck in a PENDING state. There are multiple ways of
    clearing the reason, especially on a lightly loaded system, but the
    state can persist indefinitely on a heavily loaded system.
bug 3368

0abbf727

will_run fix for job with begin time in past · f75abc9c

Morris Jette authored Jan 30, 2017

Fix the logic for getting the expected start time of an existing job ID
whose explicit begin time is in the past. The previous logic compared
that (past) begin time, rather than the current time, against the
advanced reservations that would compete with the job.

f75abc9c
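
The behavior described in f75abc9c amounts to clamping a past begin time to the current time before testing it against reservations. The following is a minimal sketch of that idea in Python, not Slurm's C implementation; `effective_start`, `earliest_start`, and the `(start, end)` reservation tuples are hypothetical names chosen for illustration.

```python
import time


def effective_start(begin_time, now=None):
    """Earliest time a job may start: its begin time, but never earlier than now."""
    if now is None:
        now = time.time()
    return max(begin_time, now)


def earliest_start(begin_time, duration, reservations, now=None):
    """Earliest start that avoids every (start, end) reservation window.

    Assumption for this sketch: any overlap with a reservation blocks the job.
    """
    start = effective_start(begin_time, now)
    for res_start, res_end in sorted(reservations):
        # The job window [start, start + duration) overlaps this reservation,
        # so push the candidate start past the end of the reservation.
        if start < res_end and start + duration > res_start:
            start = res_end
    return start
```

With a begin time of 500 and a current time of 1000, a reservation covering 600 to 700 no longer matters, because the comparison starts from 1000 rather than 500.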

27 Jan, 2017 1 commit

Fix DBD cache restore from previous versions. · f31751fe

Danny Auble authored Jan 27, 2017

This never worked. Previously, if the protocol_version that was read in didn't match,
the rpc_version given to the unpack functions was just 0. The rpc_version is now set
to the version that was stored, which fixes the restore.

f31751fe

26 Jan, 2017 1 commit
- Fix case where vestigial reservations were not purged. · 469e2de8
  Alejandro Sanchez authored Jan 26, 2017
```
Bug 3431
```
  469e2de8
25 Jan, 2017 4 commits
- burst_buffer/cray race condition fix · 60d682ff
  Morris Jette authored Jan 25, 2017
```
burst_buffer/cray - Fix race condition that could cause multiple batch job
    launch requests resulting in downed nodes.
bug 3366
```
  60d682ff
- Fix a few other minor memory leaks when uncommon failures occur. · 0085483a
  Dominik Bartkiewicz authored Jan 25, 2017
  
  0085483a
- Revert "MYSQL - Fix a few other minor memory leaks when uncommon failures occur." · b95e0323
  Danny Auble authored Jan 25, 2017
```
This reverts commit b9bff82f.
```
  b95e0323
- MYSQL - Fix a few other minor memory leaks when uncommon failures occur. · b9bff82f
  Danny Auble authored Jan 25, 2017
  
  b9bff82f
23 Jan, 2017 1 commit

fix slurmctld/agent race condition · 53784477

Morris Jette authored Jan 23, 2017

slurmctld/agent race condition fix: Prevent job launch while the PrologSlurmctld
    daemon is running or a node boot is in progress.
bug 3366

53784477

20 Jan, 2017 1 commit

Fix multicluster options to work with newer ctlds · 8b430b6a

Brian Christiansen authored Jan 20, 2017

If a lower-version client tried to communicate with a higher-version
controller, the dbd would return the controller's version and the client
would use that version to talk to the controller. When the controller
responded, the client would not know how to unpack the higher-version
message.

8b430b6a
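
The commit message describes the failure but not the exact fix. One common way to avoid this kind of mismatch is for the client to cap the version it uses at its own version. The Python sketch below illustrates that assumption only; `negotiate_version` is a hypothetical helper and the version numbers in the usage note are arbitrary.

```python
def negotiate_version(client_version, controller_version):
    """Pick the protocol version a client should speak to a controller.

    Assumption for this sketch: a newer controller can still speak the
    client's older version, so the lower of the two is always safe.
    """
    return min(client_version, controller_version)
```

For example, a client at version 30 talking to a controller at version 31 would use 30, so every reply stays in a format the client can unpack.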

19 Jan, 2017 2 commits
- News for last commit · b6c1e4e4
  Danny Auble authored Jan 19, 2017
  
  b6c1e4e4
- Missed NEWS on commit de3ee50a81d1d · 0a7d222f
  Danny Auble authored Jan 19, 2017
  
  0a7d222f
18 Jan, 2017 3 commits
- Make it so sacctmgr accepts column headers like MaxTRESPU and not MaxTRESP. · c675e0bc
  Danny Auble authored Jan 18, 2017
```
Bug 3398
```
  c675e0bc
- MYSQL - Fix minor memory leak when querying steps and the sql fails. · 18dec618
  Danny Auble authored Jan 18, 2017
  
  18dec618
- Prevent job timeout on node power up · 4114e6ce
  Morris Jette authored Jan 17, 2017
```
bug 3099
```
  4114e6ce
17 Jan, 2017 4 commits
- Revert "Require the normal scheduler to set/clear an Assoc/QOS limit on a job" · ef9546d0
  Danny Auble authored Jan 17, 2017
```
This reverts commit e92b49d3.
```
  ef9546d0
- Require the normal scheduler to set/clear an Assoc/QOS limit on a job · e92b49d3
  Dominik Bartkiewicz authored Jan 17, 2017
```
instead of also in the backfill scheduler.
```
  e92b49d3
- Fix debug2 message using wrong array index in _qos_job_runnable_post_select(). · 369bfd69
  Josh Samuelson authored Jan 17, 2017
```
Bug 3405.
```
  369bfd69
- Fix missing TRES read lock in acct_policy_job_runnable_pre_select() code. · 726c7cea
  Josh Samuelson authored Jan 17, 2017
```
acct_policy_job_runnable_pre_select() calls assoc_mgr_set_qos_tres_cnt()
without tres READ_LOCK.

Note that existing code does not modify the tres structures, so this
cannot currently lead to a race condition.

Bug 3406.
```
  726c7cea
15 Jan, 2017 1 commit
- Fix slurm.spec file for BlueGene builds. · 1d6addaf
  Michael Robbert authored Jan 14, 2017
```
job_submit/cnode was previously removed by commit 63bc71ed.

Bug 3403.
```
  1d6addaf
12 Jan, 2017 2 commits

burst_buffer/cray - Avoid pre_run operation if not required · 33fed094

Morris Jette authored Jan 11, 2017

burst_buffer/cray - Avoid "pre_run" operation if not using buffer (i.e.
    just creating or deleting a persistent burst buffer).
bug 3391

33fed094

Correct accounting info for jobs requeued due to burst buffer errors · 68b594fc

Morris Jette authored Jan 11, 2017

Previous job state information was "PENDING" rather than "REQUEUED"
  for each job requeued due to a burst buffer error.
bug 3388

68b594fc

11 Jan, 2017 2 commits

CRAY - Fix deadlock issue when updating accounting in the slurmctld and · 69567910

Danny Auble authored Jan 11, 2017

scheduling a Datawarp job.

The assoc_mgr lock needs to be taken before the bb_state.bb_mutex.  One place
this could cause deadlock is src/slurmctld/controller.c
_accounting_cluster_ready(), which calls clusteracct_storage_g_cluster_tres,
which in turn calls bb_g_job_set_tres_cnt, which calls bb_p_job_set_tres_cnt,
which will lock the bb_mutex after the assoc_mgr is already locked.

Bug 3389

69567910
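
The rule stated in 69567910 is a fixed lock order: every code path takes the assoc_mgr lock first and the burst buffer mutex second. The Python sketch below shows that ordering with two stand-in locks. It is an illustration of the general technique, not Slurm's C code, and the function names are hypothetical.

```python
import threading

assoc_mgr_lock = threading.Lock()  # stand-in for the assoc_mgr lock
bb_mutex = threading.Lock()        # stand-in for bb_state.bb_mutex

events = []  # shared state, only touched while both locks are held


def update_accounting(job_id):
    # Fixed order: assoc_mgr first, then bb_mutex.
    with assoc_mgr_lock:
        with bb_mutex:
            events.append(("accounting", job_id))


def schedule_bb_job(job_id):
    # Same order on this path too. Taking bb_mutex first here is the
    # inversion that would allow the two paths to deadlock each other.
    with assoc_mgr_lock:
        with bb_mutex:
            events.append(("schedule", job_id))


def run_both(n=100):
    """Run both paths concurrently n times each and return the event count."""
    events.clear()
    threads = [
        threading.Thread(target=func, args=(i,))
        for i in range(n)
        for func in (update_accounting, schedule_bb_job)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(events)
```

Because both paths acquire the locks in the same order, no thread can hold one lock while waiting on a thread that holds the other.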

Improve performance of cr_sort_part_rows. · dc6a5220
Dominik Bartkiewicz authored Jan 11, 2017
```
Cache results of bit_set_count() calls.

Bug 3393.
```
dc6a5220
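
The optimization in dc6a5220 is to count the set bits of each row once and reuse the result during sorting, instead of recounting inside every comparison. A minimal Python sketch of that caching idea follows. The integer bitmaps, the descending sort order, and the name `sort_rows_cached` are assumptions for illustration; only `bit_set_count` is named in the commit.

```python
def bit_set_count(bitmap):
    """Count the set bits in an integer bitmap."""
    return bin(bitmap).count("1")


def sort_rows_cached(rows):
    """Sort rows by set-bit count, highest first, counting each row once."""
    counts = [bit_set_count(row) for row in rows]  # one count per row, cached
    order = sorted(range(len(rows)), key=lambda i: counts[i], reverse=True)
    return [rows[i] for i in order]
```

With n rows this performs n counts in total, whereas a comparison sort that recounts on every comparison performs on the order of n log n of them.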