Commits · 71bbaf6d5bb1301e7ce3e04499237ab311d0c85a · Manuel G. Marciani / ces_slurm_simulator

01 Mar, 2017 2 commits
- Print formatted tres string when creating/updating a reservation. · 8dfffa28
  Danny Auble authored Feb 28, 2017
  
  8dfffa28
- Fix print of consumed energy in sstat when no energy is being collected. · 9a168d20
  Danny Auble authored Feb 28, 2017
  
  9a168d20
28 Feb, 2017 4 commits

If gres is NULL on a job don't try to process it when returning detailed information about a job to scontrol. · f7a24285
Dominik Bartkiewicz authored Feb 28, 2017

f7a24285
Fix missing locks in gres logic to avoid potential memory race. · 58a2f450
Dominik Bartkiewicz authored Feb 28, 2017

58a2f450

Remove unneeded job lock when running assoc_mgr cache. · b17e2aee

Danny Auble authored Feb 28, 2017

This lock could cause a potential deadlock when/if TRES changed in the
database and the slurmctld wasn't made aware of the change.  This would be
very rare.

The lock was originally there to keep new jobs from grabbing the assoc
information.  If the lock is taken afterwards, the worst case is that we get
the new information.

b17e2aee

Fix deadlock scenario when dumping configuration in the slurmctld. · 0c5e3508

Danny Auble authored Feb 28, 2017

It was determined we didn't need the write locks on the job and no locks were
needed on the node either.

Taking these different locks beforehand would create a window where you could
get a config write lock

0c5e3508

27 Feb, 2017 3 commits

Update slurm.spec file to note obsolete RPMs. · 95cf960a
Daniel Letai authored Feb 27, 2017

95cf960a

Reset job update time when job is held after begin failure · e62d288d

Morris Jette authored Feb 27, 2017

This will be triggered after either a burst buffer job_begin function
  or select plugin job_begin function fails. Without this change, the
  "squeue -i" and "scontrol show job" commands can report old job
  state information.
bug 3504

e62d288d

Burst_buffer/cray - Prevent slurmctld abort · 733b57dc

Tim Wickberg authored Feb 27, 2017

Burst_buffer/cray - Prevent slurmctld daemon abort if "paths" operation
    fails. Now job will be held.
bug 3504

733b57dc

24 Feb, 2017 6 commits
- job_submit/lua - Add job "bitflags" field · f9fcae35
  Josko Plazonic authored Feb 24, 2017
```
bug 3182
```
  f9fcae35
- Add %x to sbatch/srun filename pattern to represent the job name. · 1a4237a3
  Tim Shaw authored Feb 24, 2017
  
  1a4237a3
- Update sbatch/srun man pages to explain the "filename pattern" more clearly · 047b991d
  Tim Shaw authored Feb 24, 2017
  
  047b991d
- Modify pam module to work when configured NodeName and NodeHostname differ · 1ff7252b
  Don Lipari authored Feb 24, 2017
```
bug 3473
```
  1ff7252b
- Add 17.02.1 to NEWS · d8376bec
  Danny Auble authored Feb 23, 2017
  
  d8376bec
- Update META for 17.02.0 tag · 2c5d4afc
  Danny Auble authored Feb 23, 2017
  
  2c5d4afc
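
The `%x` filename pattern added in 1a4237a3 can be pictured with a small expansion routine. This is a minimal sketch, not Slurm's code: `expand_pattern` is a hypothetical helper that handles only `%x` (job name), `%j` (job ID), and `%%` (a literal percent sign), and copies every other character unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Slurm: expand %x, %j and %% from
 * pattern into out, truncating safely if out is too small. */
static void expand_pattern(const char *pattern, const char *job_name,
                           unsigned job_id, char *out, size_t out_len)
{
    size_t used = 0;

    assert(pattern != NULL && job_name != NULL);
    assert(out != NULL && out_len > 0);
    out[0] = '\0';

    for (const char *p = pattern; *p && used + 1 < out_len; p++) {
        int n;

        if (p[0] == '%' && p[1] == 'x') {          /* job name */
            n = snprintf(out + used, out_len - used, "%s", job_name);
            p++;
        } else if (p[0] == '%' && p[1] == 'j') {   /* job ID */
            n = snprintf(out + used, out_len - used, "%u", job_id);
            p++;
        } else if (p[0] == '%' && p[1] == '%') {   /* literal percent */
            n = snprintf(out + used, out_len - used, "%%");
            p++;
        } else {                                   /* ordinary character */
            n = snprintf(out + used, out_len - used, "%c", *p);
        }
        if (n < 0)
            break;
        used += (size_t)n;
        if (used >= out_len)       /* output was truncated: stop here */
            used = out_len - 1;
    }
}
```

With a job named `sim` and job ID 42, the pattern `%x-%j.out` would expand to `sim-42.out` under these assumptions.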
23 Feb, 2017 6 commits
- Fix packing of NULL slurmdb_reservation_cond_t · df133644
  Brian Christiansen authored Feb 23, 2017
  
  df133644
- Fix packing of NULL slurmdb_clus_res_rec_t · 2260e158
  Brian Christiansen authored Feb 23, 2017
  
  2260e158
- Fix squeue to not limit the size of partition, burst_buffer, exec_host, or reason to 32 chars. · 5a4a6044
  Danny Auble authored Feb 23, 2017
  
  5a4a6044
- Propagate NEWS from v15.08 to v16.05 · f49dba56
  Morris Jette authored Feb 23, 2017
  
  f49dba56
- Correct job resize script · f42f6943
  Morris Jette authored Feb 23, 2017
```
For job resize, correct logic to build "resize" script with new values.
    Previously the scripts were based upon the original job size.
bug 3498
```
  f42f6943
- slurm.spec - only install init scripts if service scripts aren't. · faf9b413
  Tim Wickberg authored Feb 22, 2017
```
Do not enable init scripts if not present.

Please note that, unlike the init scripts, service files are not
automatically enabled at this time.

Bug 3371.
```
  faf9b413
22 Feb, 2017 3 commits

Fix node reboot timing bug · 8431929d

Morris Jette authored Feb 22, 2017

If node boot in progress when slurmctld daemon is restarted, then allow
    sufficient time for reboot to complete and not prematurely DOWN the node as
    "Not responding".
bug 3494

8431929d

Fix for possible squeue parsing failure · 7b226965
Morris Jette authored Feb 21, 2017
```
Could result in squeue abort
Coverity error CID 44969
```
7b226965

squeue to load new data if job_id or user_id specified · dbf9a211

Morris Jette authored Feb 21, 2017

Reduces possibility of old data if job_id or user_id option specified
  with iterate option
Coverity error CID 44783

dbf9a211

21 Feb, 2017 1 commit

Increased maximum file size supported by sbcast · ee5fea6d

Morris Jette authored Feb 21, 2017

Increased maximum file size supported by sbcast from 2 GB (the limit of a
    32-bit integer) to the limit of a 64-bit integer. This required changing
    the file broadcast RPC and several internal variables.
bug 3485

ee5fea6d
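
The 2 GB ceiling described in ee5fea6d follows from storing a size in a signed 32-bit integer. The sketch below illustrates that limit and a 64-bit round trip through a byte buffer. The helper names (`fits_in_int32`, `put_u64_be`, `get_u64_be`) are hypothetical and are not the functions Slurm uses in its RPC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a file size can be stored in a signed 32-bit integer. */
static int fits_in_int32(uint64_t size)
{
    return size <= (uint64_t)INT32_MAX;
}

/* Hypothetical helper: write a 64-bit size in big-endian byte order. */
static void put_u64_be(uint64_t value, unsigned char buf[8])
{
    assert(buf != NULL);
    for (int i = 7; i >= 0; i--) {
        buf[i] = (unsigned char)(value & 0xff);
        value >>= 8;
    }
}

/* Hypothetical helper: read back a 64-bit big-endian size. */
static uint64_t get_u64_be(const unsigned char buf[8])
{
    uint64_t value = 0;

    assert(buf != NULL);
    for (int i = 0; i < 8; i++)
        value = (value << 8) | buf[i];
    return value;
}
```

A 3 GiB size fails `fits_in_int32` but survives the `put_u64_be`/`get_u64_be` round trip unchanged, which is the property a wider wire field provides.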

18 Feb, 2017 2 commits

Added ability to override the invoking uid for "scontrol update job" by specifying "--uid=<uid>|-u <uid>". · 1e42df07
Tim Shaw authored Feb 17, 2017
```
# Conflicts:
#	NEWS
```
1e42df07

Fix controller/cmds talking to a pre-released DBD · ec350f17

Brian Christiansen authored Feb 17, 2017

A 17.02 controller or sacctmgr couldn't talk to a "master/17.11" DBD
because the 17.02 client was attempting to talk to the DBD with
17.02's MIN_PROTOCOL_VERSION -- which was 15.08 and is more than 2
versions behind the master. The master's MIN_PROTOCOL_VERSION is 16.05,
so it couldn't unpack the messages.

The controller should always communicate with the DBD at its current
protocol version.

For federations, it's possible that a higher version controller could
talk to a lower version controller. So the cluster needs to talk to the
remote cluster using the remote cluster's protocol version -- which is
given back from the DBD.

ec350f17
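
The rule described in ec350f17 can be summarized as a small selection function. This is only a sketch of the stated policy: the enum, the function name, and the numeric version values used in the usage note are made up for illustration and do not correspond to Slurm's real protocol constants.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical peer types for this illustration. */
enum peer_kind { PEER_DBD, PEER_REMOTE_CLUSTER };

/* Pick the protocol version to speak:
 *  - to the DBD, always use our own current version;
 *  - to a remote (federated) cluster, use the version the DBD
 *    reported for that cluster. */
static uint16_t pick_protocol_version(enum peer_kind peer,
                                      uint16_t own_version,
                                      uint16_t remote_version)
{
    assert(own_version != 0);
    if (peer == PEER_DBD)
        return own_version;
    return remote_version;
}
```

For example, with an own version of 1702 and a remote cluster reported at 1605 (both invented numbers), the function returns 1702 for the DBD and 1605 for the remote cluster.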

17 Feb, 2017 3 commits

Add 'preempt_youngest_order' option to preempt/partition_prio plugin. · 4e045105

Dominik Bartkiewicz authored Feb 17, 2017

Enable through SchedulerParameters. Will sort by youngest jobs first,
rather than based on priority. Use alongside 'preempt_strict_order' if
you don't want the plugin to try to further optimize the preemption
list.

Bug 3457.

4e045105
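
The ordering described in 4e045105 can be sketched with a `qsort` comparator. The sketch assumes that "youngest" means the most recent start time, and the `struct job` layout and function names are hypothetical rather than taken from the plugin.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical, simplified job record. */
struct job {
    unsigned id;
    unsigned priority;
    time_t start_time;
};

/* Order jobs so the most recently started (youngest) comes first.
 * Priority is deliberately ignored. */
static int cmp_youngest_first(const void *a, const void *b)
{
    const struct job *ja = a;
    const struct job *jb = b;

    if (ja->start_time > jb->start_time)
        return -1;
    if (ja->start_time < jb->start_time)
        return 1;
    return 0;
}

static void sort_youngest_first(struct job *jobs, size_t count)
{
    assert(jobs != NULL || count == 0);
    if (count > 0)
        qsort(jobs, count, sizeof(*jobs), cmp_youngest_first);
}
```

Sorting by start time rather than priority means a high-priority job that started long ago ends up later in the candidate list than a low-priority job that just started.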

Fix potential race condition in job_time_limit. · cc82087a

Dominik Bartkiewicz authored Feb 16, 2017

Introduced by commit 059275f6 when the timer is triggered.
Releasing the locks means that job_ptr may point to an element that was
deleted by a different thread in the meantime. Restructuring the code
to advance the iterator prevents this - the iterator itself does not have
this issue as the List structure will manage the position during the
sleep().

While here, move the reservation update handling outside of this
loop to simplify operation. This does not need to piggy-back on the
scan of the job_list - switching to using list_for_each should
mitigate some of the performance loss by needing a second full pass.

Bug 3414.

cc82087a

job_submit/lua - remove access to reservation job_run_cnt/job_pend_cnt fields. · 7489e3fe
Tim Wickberg authored Feb 16, 2017
```
These were mis-calculated previously, and are internal implementation details
that weren't meant to be exposed.
```
7489e3fe

16 Feb, 2017 4 commits
- Fix correct state reason when job can't run 'safely' because of an association GrpWall limit. · fad27852
  Josh Samuelson authored Feb 16, 2017
  
  fad27852
- Better debug output when a job is being held because of GrpTRES[Run]Min limits. · 325d674a
  Danny Auble authored Feb 16, 2017
  
  325d674a
- Fix correct variables when validating GrpTresMins on a QOS. · 92d2c645
  Josh Samuelson authored Feb 16, 2017
```
Bug 3476
```
  92d2c645
- Fix comments in acct_policy.c to reflect actual variables instead of old ones. · 4cfe6bde
  Danny Auble authored Feb 16, 2017
```
This is cosmetic only, no code change.

Bug 3476
```
  4cfe6bde
15 Feb, 2017 6 commits

Fix squeue when SLURM_BITSTR_LEN=0 is set in the user environment. · 0ea581a7
Danny Auble authored Feb 15, 2017
```
Bug 3472
```
0ea581a7

Prevent deadlocked slurmstepd processes due to unsafe use of regcomp. · fe193906

Tim Wickberg authored Feb 15, 2017

regcomp() is not safe to use across a fork in older glibc versions.
Reinitialize the keyvalue_re structure after the fork through an atfork()
handler.

Bug 3276.

fe193906
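
The approach described in fe193906 can be sketched with `pthread_atfork()`: compile the regular expression once, and register a child-side handler that compiles it again after a fork. The pattern string and all function names here are assumptions for illustration. Only the `keyvalue_re` name comes from the commit message. The sketch recompiles without freeing the old state, accepting a small leak in the child.

```c
#include <assert.h>
#include <pthread.h>
#include <regex.h>
#include <stddef.h>

static regex_t keyvalue_re;

/* Hypothetical pattern: "key=value" with an identifier-like key. */
static const char *keyvalue_pattern = "^[A-Za-z_][A-Za-z0-9_]*=.*$";

static int keyvalue_compile(void)
{
    return regcomp(&keyvalue_re, keyvalue_pattern,
                   REG_EXTENDED | REG_NOSUB);
}

/* Runs in the child after fork(): rebuild the regex so the child
 * does not rely on state copied from the parent. */
static void keyvalue_reinit_child(void)
{
    (void)keyvalue_compile();
}

/* Compile the regex and register the child-side atfork handler.
 * Returns 0 on success. */
static int keyvalue_init(void)
{
    int rc = keyvalue_compile();

    if (rc != 0)
        return rc;
    return pthread_atfork(NULL, NULL, keyvalue_reinit_child);
}

/* Returns 1 if line looks like key=value, 0 otherwise. */
static int keyvalue_match(const char *line)
{
    assert(line != NULL);
    return regexec(&keyvalue_re, line, 0, NULL, 0) == 0;
}
```

Call `keyvalue_init()` once at startup. After that, any process created with `fork()` runs the child handler before it continues.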

Make it so configure uses C as the compiler instead of C++. · 17476194

Danny Auble authored Feb 15, 2017

This is a regression from commit b818dd9d

Basically, whichever compiler the first AC_LINK_IFELSE uses becomes the
compiler for the rest of configure.  Since the above commit removed the
BGL/P code that was linked using C, C++ became the compiler, since the
next thing to test against in configure.ac was BGQ.

I just grabbed the DATABASES call, but any other one could have
worked.

17476194

Fix ordering of step task allocation to fill in a socket before going into another one. · e7347799

Danny Auble authored Feb 15, 2017

Bug 3465

This is the way it is done with the task plugins.  It appears this only
really matters when requesting 1 task with a full socket with exclusive
access.  This code would cyclically allocate sockets to the step instead
of filling up one socket then going to the next.

e7347799
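
The difference between the two allocation orders in e7347799 can be shown with a pair of index functions. This is a simplified sketch, assuming cores are handed out one at a time and sockets all have the same number of cores. The function names are hypothetical and do not come from the Slurm source.

```c
#include <assert.h>

/* Block order: fill every core of one socket before moving on.
 * Returns the socket that receives the core_index-th allocated core. */
static int socket_block(int core_index, int cores_per_socket)
{
    assert(core_index >= 0 && cores_per_socket > 0);
    return core_index / cores_per_socket;
}

/* Cyclic order: rotate across sockets, one core at a time. */
static int socket_cyclic(int core_index, int socket_count)
{
    assert(core_index >= 0 && socket_count > 0);
    return core_index % socket_count;
}
```

On a node with 2 sockets of 4 cores each, a step needing 4 cores gets all of them on socket 0 in block order, while cyclic order spreads them over both sockets, so the step no longer holds a full socket.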

Fix for job allocation with certain options · d2123a52

Morris Jette authored Feb 15, 2017

Fix for job constraint specification with counts, --ntasks-per-node value,
    and no node count.
bug 3470

d2123a52

task/cgroup logging change · b57a34a6

Morris Jette authored Feb 14, 2017

Task/cray: Treat missing "mems" cgroup with "debug" messages rather than
"error" messages. The file may be missing at step termination due to a
change in how cgroups are released at job/step end.

b57a34a6