Commits · 4e8b77fa1ec12b7b7f5ea94a65b5b5ac5959b606 · Manuel G. Marciani / ces_slurm_simulator

27 Jul, 2018 2 commits
- Remove erroneous unlock in acct_gather_energy/ipmi. · 4e8b77fa
  Danny Auble authored Jul 27, 2018
```
Bug 5468

This is a backport of commit cefc9ec1.
```
  4e8b77fa
- Fix segfault in slurmctld when a job's node bitmap is NULL during a scheduling cycle. · fef07a40
  Dominik Bartkiewicz authored Jul 27, 2018
```
Primarily caused by EnforcePartLimits=ALL.

Bug 5452
```
  fef07a40
24 Jul, 2018 1 commit
- Fix spelling in man page · 074b0ea0
  Brian Christiansen authored Jul 24, 2018
  
  074b0ea0
19 Jul, 2018 7 commits

Start NEWS for v17.11.9 · 8b27b9c9
Tim Wickberg authored Jul 19, 2018

8b27b9c9
Update META for v17.11.8. · 07ad0727
Tim Wickberg authored Jul 19, 2018
```
Update slurm.spec and slurm.spec-legacy as well.
```
07ad0727
Add NEWS entry missed on prior commit. · 380abb0b
Tim Wickberg authored Jul 19, 2018

380abb0b

Use one macro for all listen() backlog arguments. · b039ba24

Tim Wickberg authored Jul 19, 2018

The lower limit of 1024 may be too low for srun with large-scale
jobs, and can lead to problems processing task completion messages in a
timely fashion.

Rather than adjust that, unify the two separate macros into
SLURM_DEFAULT_LISTEN_BACKLOG with the higher 4096 value.

Bug 5164.

b039ba24
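
As a rough illustration of the idea in b039ba24 (one shared constant for every listen() backlog), here is a minimal Python sketch. Only the name SLURM_DEFAULT_LISTEN_BACKLOG and the value 4096 come from the commit message; the helper `open_listener` and its defaults are hypothetical, and the real change is a C macro inside Slurm.

```python
import socket

# Single shared backlog value, mirroring the macro named in the commit message.
SLURM_DEFAULT_LISTEN_BACKLOG = 4096


def open_listener(host="127.0.0.1", port=0):
    """Hypothetical helper: every listener uses the same backlog constant."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))  # port 0 lets the OS pick a free port
    # The operating system may cap the effective backlog below this value.
    sock.listen(SLURM_DEFAULT_LISTEN_BACKLOG)
    return sock
```

Routing every call through one constant means a future adjustment only has to be made in one place.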

Add Delegate=yes to slurmd.service file to prevent systemd from interfering. · cecb39ff

Tim Wickberg authored Jul 19, 2018

Without Delegate=yes, systemd will "fix" the cgroup hierarchies whenever
'systemctl daemon-reload' is called, which will then remove any
restrictions placed on memory or device access for a given job.

This is a problem especially since 'systemctl daemon-reload' may be called
automatically by rpm/yum or a variety of config file managers, leading to
jobs escaping from slurmd/slurmstepd's control.

This setting should work for systemd versions >= 205.
https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/

Bug 5292.

cecb39ff

Merge branch 'slurm-17.02' into slurm-17.11 · 954830f5
Tim Wickberg authored Jul 19, 2018

954830f5

Fix segfault in hourly rollup · 346ce48b

Felip Moll authored Jul 19, 2018

When a job with time_end=0 and TRES null exists from an association that is
currently inside a reservation, the hourly rollup segfaults.

Bug 5143

346ce48b

18 Jul, 2018 5 commits
- Prevent possible divide by zero in _validate_time_limit(). · 993ce884
  Dominik Bartkiewicz authored Jul 18, 2018
```
As reported by Avalon Johnson on slurm-users
https://groups.google.com/forum/#!topic/slurm-users/BsMQ7Uk1PLw
Bug 5287.
```
  993ce884
- Fix grammar in RebootProgram docs · 72b4f3c4
  Brian Christiansen authored Jul 17, 2018
  
  72b4f3c4
- Fix printing of --hint options for sbatch, salloc · 17e6e23b
  Brian Christiansen authored Jul 16, 2018
```
srun was already fixed in b7053bda (Bug 3294).

Bug 5126
```
  17e6e23b
- Add xstrstr() · 40abb764
  Brian Christiansen authored Jul 16, 2018
  
  40abb764
- Docs - Change to using 'show engines' for verifying InnoDB availability. · 79fd5e83
  Broderick Gardner authored Jul 17, 2018
```
'have_innodb' is deprecated.

Bug 5317.
```
  79fd5e83
17 Jul, 2018 5 commits

Fix for formatting when printing arrays in squeue · f1991701

Felip Moll authored Jul 17, 2018

When printing arrays in squeue and setting the SLURM_BITSTR_LEN variable to 0
or to NULL, the length of the output defaulted to 64, when the documentation
says it will default to "unlimited". This patch fixes this situation.

Bug 5440

f1991701
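
The behavior described in f1991701 can be sketched as follows. This is not squeue's code: the function names are hypothetical, "NULL" is read here as an empty value, and the fallback of 64 for an unset or unparsable variable is an assumption.

```python
def bitstr_len(value, default=64):
    """Return the maximum width for an array task ID string, or None for unlimited.

    `value` stands in for the SLURM_BITSTR_LEN environment variable.
    The `default` used when it is unset or invalid is an assumption.
    """
    if value is None:
        return default
    text = value.strip()
    if text == "":
        return None  # empty value: unlimited, per the commit message
    try:
        length = int(text)
    except ValueError:
        return default
    if length == 0:
        return None  # 0: unlimited, per the commit message
    return length if length > 0 else default


def format_task_ids(expr, limit):
    """Truncate a task ID expression to `limit` characters, marking the cut."""
    if limit is None or len(expr) <= limit:
        return expr
    return expr[: max(limit - 3, 0)] + "..."
```

A caller would pass `os.environ.get("SLURM_BITSTR_LEN")` as `value`.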

Docs - fix reference to enable_user_top option. · 29cc55b7
Marshall Garey authored Jul 16, 2018
```
Logic was switched around in 17.11, enable_user_top is now the
correct option.

Bug 5165.
```
29cc55b7

Docs - Clarify MPI apps don't work with hetjobs in 17.11. · 3060b62e

Alejandro Sanchez authored Jul 16, 2018

This is not working reliably even when setting
SchedulerParameters=enable_hetero_steps and/or using OpenMPI with Slurm's
mpi/pmi2, as it was previously documented.

Bug 5309.

3060b62e

Fix incorrect locking in _init_power_save. · 1f8ede44

Marshall Garey authored Jul 16, 2018

Both the documentation and the code indicate that the node lock is
needed, but the job locks were incorrectly taken instead.

Bug 5394.

1f8ede44

Fix incorrect locking in _slurm_rpc_resv_delete(). · 45e029c5

Dominik Bartkiewicz authored Jul 16, 2018

Needs the job write lock, as it may change job status, not just node
status, especially after commit 33e352a6.

Bug 5406.

45e029c5

16 Jul, 2018 2 commits

Clarify RawUsage is TRES-seconds. · d971dd99

Marshall Garey authored Jul 16, 2018

Previously, usage was just CPU-seconds. However, since
TresBillingWeights has been added, usage gets calculated from that, or
it's just CPU-seconds if TresBillingWeights isn't defined.

Bug 5434.

d971dd99
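
To make the distinction in d971dd99 concrete, here is a simplified sketch of the calculation. It assumes a plain weighted sum over allocated TRES and ignores any decay or other adjustments Slurm applies; the function names and the dictionary layout are hypothetical.

```python
def billable_tres(alloc, weights=None):
    """Billable amount for one job.

    alloc:   allocated TRES counts, e.g. {"cpu": 4, "gpu": 2}
    weights: stand-in for TresBillingWeights; when not defined,
             only the CPU count is billed.
    """
    if not weights:
        return float(alloc.get("cpu", 0))
    return sum(weights.get(name, 0.0) * count for name, count in alloc.items())


def raw_usage(alloc, elapsed_seconds, weights=None):
    """Usage in TRES-seconds (plain CPU-seconds when no weights are defined)."""
    return billable_tres(alloc, weights) * elapsed_seconds
```

For example, 4 CPUs and 2 GPUs for 100 seconds with weights of 1.0 per CPU and 2.0 per GPU gives (4 + 4) × 100 = 800 TRES-seconds, while the same job without weights gives 400 CPU-seconds.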

Fix typo in faq · 0c1d3400
Felip Moll authored Jul 16, 2018

0c1d3400

13 Jul, 2018 1 commit

SlurmDBD - improve error message on archive load failure. · 1c27a2e6

Isaac Hartung authored Jul 12, 2018

Add errno to info message in the SlurmDBD log, and pass the actual
errno back to the sacctmgr process so the user can see it.

Bug 5152.

1c27a2e6

12 Jul, 2018 4 commits
- mpi/pmix: fixed the collectives canceling · f15c8183
  Boris Karasev authored Jun 16, 2018
```
- avoid `abort()` when a collective fails
- add logging of collective details for failure cases

Bug 5067
```
  f15c8183
- Make code compile with hdf5 1.10.2+ · 90c4e7e7
  Danny Auble authored Jul 12, 2018
```
Note, this is setting it up so we can use defunct functions.  It will
probably need to be properly fixed in a future version so we don't
do this.
```
  90c4e7e7
- Fix for potential deadlock in the assoc_mgr_get_user_assocs() · 80d38355
  Dominik Bartkiewicz authored Jul 12, 2018
```
Bug 5098.
```
  80d38355
- Fix issues with --exclusive=[user|mcs] to work correctly with preemption or when job requests a specific list of hosts. · 72736af2
  Dominik Bartkiewicz authored Jul 12, 2018
```

Bug 5293.
```
  72736af2
09 Jul, 2018 1 commit
- Add news for 4daeedd8 · d10854d9
  Danny Auble authored Jul 09, 2018
  
  d10854d9
06 Jul, 2018 6 commits

Add workaround for importing newly installed namespace packages · da2ecda8
Thea Flowers authored Jun 22, 2018
```
Bug 5395
```
da2ecda8
Fix potential segfault when closing the mpi/pmi2 plugin. · 4daeedd8
Danny Auble authored Jul 06, 2018
```
Bug 5390
```
4daeedd8

Fix leaking freezer cgroups. · 7f9c4f73

Marshall Garey authored Jul 06, 2018

Continuation of 923c9b37.

There is a delay in the cgroup system when moving a PID from one cgroup
to another. It is usually short, but if we don't wait for the PID to
move before removing cgroup directories the PID previously belonged to,
we could leak cgroups. This was previously fixed in the cpuset and
devices subsystems. This uses the same logic to fix the freezer
subsystem.

Bug 5082.

7f9c4f73
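
The wait-before-remove pattern described in 7f9c4f73 and 923c9b37 can be sketched roughly as below. This is not the slurmd code: all names, the timeout, and the polling interval are hypothetical, and `read_pids` stands in for reading the PID list of the cgroup that is about to be removed.

```python
import time


def wait_for_pid_to_leave(read_pids, pid, timeout=1.0, interval=0.01,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll until `pid` is no longer listed in the cgroup, or give up.

    read_pids: callable returning the set of PIDs currently in the cgroup
    Returns True if the PID left before the timeout, False otherwise.
    """
    deadline = clock() + timeout
    while pid in read_pids():
        if clock() >= deadline:
            return False
        sleep(interval)
    return True


def remove_cgroup_when_empty(read_pids, pid, remove):
    """Remove the cgroup directory only after the PID has actually moved."""
    if wait_for_pid_to_leave(read_pids, pid):
        remove()
        return True
    return False
```

Sharing one helper like this across subsystems is the kind of deduplication the earlier commit describes, which is what lets the freezer subsystem reuse the logic.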

Combine duplicate code in cgroup fini functions. · 923c9b37

Marshall Garey authored Jul 06, 2018

cpuset and devices subsystems have duplicate code to cleanup the cgroup
and prevent leaking cgroups by moving the process to the root cgroup and
waiting for it to be moved.

Move this duplicate code to a common function so it can be used later by
the freezer subsystem.

Bug 5082.

923c9b37

Clarify Depth Mean Try Sched in sdiag man page · dd6ca4b0
Marshall Garey authored Jul 06, 2018
```
Bug 5227
```
dd6ca4b0
Fix test to make sure something happens to deem success. · 2f9a326e
Danny Auble authored Jul 05, 2018

2f9a326e

04 Jul, 2018 2 commits
- Add some corrections to FAQ and remove Slurm 1.3 string · 0985c8b1
  Felip Moll authored Jul 04, 2018
```
Bug 4451
```
  0985c8b1
- Combine the active and available node feature change logs · 3818159e
  Morris Jette authored Jul 04, 2018
```
So that changes to multiple nodes will be reported on one line rather than one
line per node. Otherwise this could lead to performance issues when reloading
slurmctld in big systems.

Bug 4980
```
  3818159e
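
A minimal sketch of the log-combining idea in 3818159e: group nodes by their new feature string and emit one line per group. The function name and the log wording are hypothetical, and a real implementation would likely compress the node list into a host range expression rather than join names with commas.

```python
from collections import defaultdict


def feature_change_lines(changes):
    """Build one log line per distinct feature string.

    changes: iterable of (node_name, features) pairs
    """
    by_features = defaultdict(list)
    for node, features in changes:
        by_features[features].append(node)
    return [
        f"nodes {','.join(nodes)} features set to {features}"
        for features, nodes in by_features.items()
    ]
```

With thousands of nodes sharing a handful of feature sets, this writes a few lines instead of one per node.
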
03 Jul, 2018 2 commits

Clarify gres.conf Cores documentation · 3ee3795f

Felip Moll authored Jul 03, 2018

Slurm numbers the cores using an abstract index, starting from CPU 0
on the first socket, core, thread, and continuing until N on the last socket,
last core, last thread. Explain that in the documentation.

Bug 5189

3ee3795f
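
The numbering described in 3ee3795f can be sketched as a simple formula. This assumes a uniform node layout ordered by socket first, then core, then thread, as the message describes; the function and parameter names are hypothetical.

```python
def abstract_index(socket, core, thread, cores_per_socket, threads_per_core):
    """Abstract CPU index: 0 on the first socket/core/thread, N on the last."""
    return (socket * cores_per_socket + core) * threads_per_core + thread
```

On a node with 2 sockets, 4 cores per socket, and 2 threads per core, the first CPU is index 0 and the last one, `abstract_index(1, 3, 1, 4, 2)`, is index 15.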

Fix _step_signal() from always returning success · 2ab24e04
Brian Christiansen authored Jul 02, 2018
```
Currently, no caller checks the return code.

Bug 5164
```
2ab24e04

02 Jul, 2018 1 commit
- Update StoragePass docs password restrictions · 0c606741
  Marshall Garey authored Jul 02, 2018
```
The password can't contain a # character, since # starts a comment.

Bug 5294
```
  0c606741
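
To show why a # in StoragePass is a problem, here is a toy configuration-line parser. It is not Slurm's parser; it only assumes that everything from the first # onward is dropped as a comment, as the commit message says. The function name is hypothetical.

```python
def parse_conf_line(line):
    """Toy parser: strip a trailing comment, then split Key=Value."""
    content = line.split("#", 1)[0].strip()  # '#' starts a comment
    if not content or "=" not in content:
        return None
    key, value = content.split("=", 1)
    return key.strip(), value.strip()
```

With this parser, `StoragePass=abc#123` yields the password `abc`, silently dropping `#123`.
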
27 Jun, 2018 1 commit

Fix incorrect quoting in x_ac_debug.m4. · 2bde148f

Pär Lindfors authored Jun 27, 2018

Only produces a whitespace difference in configure. Inadvertently
introduced by commit 103ebaac.

Bug 5335.

2bde148f