Commits · 33941f84f5a68825df6bdc70ab147bb36734587b · Manuel G. Marciani / ces_slurm_simulator

31 Jul, 2018 1 commit
- Enable support for hwloc version 2.0.1 · cbe7015e
  Morris Jette authored Jul 31, 2018
```
Bug 5070
```
27 Jul, 2018 3 commits
- Remove erroneous unlock in acct_gather_energy/ipmi. · 4e8b77fa
  Danny Auble authored Jul 27, 2018
```
Bug 5468

This is a backport of commit cefc9ec1.
```
- Now pmi library resides in contribs just as pmi2 one. · 2d735d94
  Felip Moll authored May 08, 2018
```
Bug 4918
```
- Fix segfault in slurmctld when a job's node bitmap is NULL during a · fef07a40
  Dominik Bartkiewicz authored Jul 27, 2018
```
scheduling cycle.  Primarily caused by EnforcePartLimits=ALL.

Bug 5452
```
24 Jul, 2018 1 commit
- Added database InnoDB settings verification to accounting storage plugin init · 0368fb33
  Broderick Gardner authored Jun 20, 2018
```
Bug 5248.
```
19 Jul, 2018 4 commits

Start NEWS for v17.11.9 · 8b27b9c9
Tim Wickberg authored Jul 19, 2018

Add NEWS entry missed on prior commit. · 380abb0b
Tim Wickberg authored Jul 19, 2018

Add Delegate=yes to slurmd.service file to prevent systemd from interfering. · cecb39ff

Tim Wickberg authored Jul 19, 2018

Without Delegate=yes, systemd will "fix" the cgroup hierarchies whenever
'systemctl daemon-reload' is called, which will then remove any
restrictions placed on memory or device access for a given job.

This is a problem especially since 'systemctl daemon-reload' may be called
automatically by rpm/yum or a variety of config file managers, leading to
jobs escaping from slurmd/slurmstepd's control.

This setting should work for systemd versions >= 205.
https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/

Bug 5292.

Fix segfault in hourly rollup · 346ce48b

Felip Moll authored Jul 19, 2018

When a job with time_end=0 and TRES null exists from an association that is
currently inside a reservation, the hourly rollup segfaults.

Bug 5143

18 Jul, 2018 4 commits

Prevent possible divide by zero in _validate_time_limit(). · 993ce884

Dominik Bartkiewicz authored Jul 18, 2018

As reported by Avalon Johnson on slurm-users
https://groups.google.com/forum/#!topic/slurm-users/BsMQ7Uk1PLw
Bug 5287.

Fix printing of --hint options for sbatch, salloc · 17e6e23b
Brian Christiansen authored Jul 16, 2018
```
srun was already fixed in b7053bda (Bug 3294).

Bug 5126
```
Add xstrstr() · 40abb764
Brian Christiansen authored Jul 16, 2018

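The commit adds xstrstr() without further description. Slurm's other x* string helpers tolerate NULL arguments; assuming xstrstr() follows the same convention, a minimal sketch of such a wrapper could look like this. The name `null_safe_strstr` is hypothetical, and Slurm's actual signature may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical NULL-tolerant strstr(): returns NULL instead of
 * dereferencing a NULL haystack or needle. */
char *null_safe_strstr(const char *haystack, const char *needle)
{
	if (!haystack || !needle)
		return NULL;
	return strstr(haystack, needle);
}
```

Callers can then test the result directly, without guarding every call site against an unset string field.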

add job --gres-flags=disable-binding · aa61233b

Morris Jette authored Jul 17, 2018

Add salloc/sbatch/srun option --gres-flags=disable-binding to disable
filtering of CPUs with respect to generic resource locality. This option is
currently required to use more CPUs than are bound to a GRES (e.g. if a GPU
is bound to the CPUs on one socket, but resources on more than one socket
are required to run the job). This option may permit a job to be allocated
resources sooner than otherwise possible, but may result in lower job
performance.

Bug 5189

17 Jul, 2018 5 commits

Fix formatting when printing arrays in squeue · f1991701

Felip Moll authored Jul 17, 2018

When printing arrays in squeue with the SLURM_BITSTR_LEN variable set to 0
or to NULL, the output length defaulted to 64, although the documentation
says it defaults to "unlimited". This patch fixes that.

Bug 5440

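As a rough illustration of the intended behavior, the sketch below renders sorted array task IDs as a range expression and truncates it only when a positive limit is given, so a limit of 0 means unlimited. `format_task_ids` is hypothetical and is not squeue's code, which works on Slurm's bitstr_t bitmaps rather than an int array.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: render sorted array task IDs as a range
 * expression such as "1-3,5,7-9". A max_len of 0 means unlimited;
 * a positive max_len truncates the returned string to that length. */
char *format_task_ids(const int *ids, size_t n, size_t max_len)
{
	size_t cap = 16, len = 0;
	char *out = malloc(cap);
	char tmp[32];

	if (!out)
		return NULL;
	out[0] = '\0';
	for (size_t i = 0; i < n;) {
		size_t j = i;
		/* Extend the run while the IDs are consecutive. */
		while (j + 1 < n && ids[j + 1] == ids[j] + 1)
			j++;
		const char *sep = len ? "," : "";
		int w = (j > i) ?
			snprintf(tmp, sizeof(tmp), "%s%d-%d", sep, ids[i], ids[j]) :
			snprintf(tmp, sizeof(tmp), "%s%d", sep, ids[i]);
		/* Grow the buffer geometrically if this piece won't fit. */
		if (len + (size_t) w + 1 > cap) {
			while (len + (size_t) w + 1 > cap)
				cap *= 2;
			char *p = realloc(out, cap);
			if (!p) {
				free(out);
				return NULL;
			}
			out = p;
		}
		memcpy(out + len, tmp, (size_t) w + 1);
		len += (size_t) w;
		i = j + 1;
	}
	/* Only a positive limit truncates; 0 keeps the full expression. */
	if (max_len > 0 && len > max_len)
		out[max_len] = '\0';
	return out;
}
```

With IDs {1, 2, 3, 5, 7, 8, 9}, a limit of 0 yields "1-3,5,7-9", while a limit of 4 yields "1-3,".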

Fix incorrect locking in _init_power_save. · 1f8ede44

Marshall Garey authored Jul 16, 2018

Both the documentation and the code call for the node lock, but these
were incorrectly set as job locks.

Bug 5394.

Fix incorrect locking in _slurm_rpc_resv_delete(). · 45e029c5

Dominik Bartkiewicz authored Jul 16, 2018

Needs the job write lock, as it may change job status, not just node
status, especially after commit 33e352a6.

Bug 5406.

Fix handling of directory names containing '\' · a9a4a7da

Felip Moll authored Jul 17, 2018

Previously, backslashes '\' in job->cwd were always expanded, regardless of
whether they were part of a directory name or not.

Bug 4859

Improve escaping paths on user commands · 9c58150f

Felip Moll authored Jul 17, 2018

When dealing with special characters like %A, %u, %s and so on and escaping them
on the command line, problems arise when directories have multiple
backslashes in their names. This patch fixes this by removing only one
backslash from each pair of backslashes, just as normal escaping works, e.g. in bash.

Bug 4859

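The pairwise rule can be sketched as a single left-to-right pass in which each backslash escapes the character after it. `unescape_path` is a hypothetical helper, not Slurm's implementation, and it ignores the %A/%u pattern expansion that the real code performs alongside unescaping.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of shell-like unescaping: each backslash escapes
 * the character after it, so "\\" collapses to a single '\' and "\%"
 * yields a literal '%'. A lone trailing backslash is kept as-is. */
char *unescape_path(const char *in)
{
	size_t n = strlen(in);
	char *out = malloc(n + 1);
	size_t o = 0;

	if (!out)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		if (in[i] == '\\' && i + 1 < n)
			i++;	/* drop the escaping backslash, keep the next char */
		out[o++] = in[i];
	}
	out[o] = '\0';
	return out;
}
```

Because the escaped character is copied without being re-examined, a directory named `a\\b` on disk survives a double-escaped `a\\\\b` on the command line intact.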

13 Jul, 2018 1 commit

SlurmDBD - improve error message on archive load failure. · 1c27a2e6

Isaac Hartung authored Jul 12, 2018

Add errno to info message in the SlurmDBD log, and pass the actual
errno back to the sacctmgr process so the user can see it.

Bug 5152.

12 Jul, 2018 3 commits

mpi/pmix: fixed the collectives canceling · f15c8183

Boris Karasev authored Jun 16, 2018

- avoid `abort()` when a collective fails
- add logging of collective details for failure cases

Bug 5067

Make code compile with hdf5 1.10.2+ · 90c4e7e7

Danny Auble authored Jul 12, 2018

Note, this is setting it up so we can use defunct functions.  It will
probably need to be properly fixed in a future version so we don't
do this.

Fix issues with --exclusive=[user|mcs] to work correctly · 72736af2
Dominik Bartkiewicz authored Jul 12, 2018
```
with preemption or when job requests a specific list of hosts.

Bug 5293.
```
09 Jul, 2018 1 commit
- Add news for 4daeedd8 · d10854d9
  Danny Auble authored Jul 09, 2018
06 Jul, 2018 1 commit

Fix leaking freezer cgroups. · 7f9c4f73

Marshall Garey authored Jul 06, 2018

Continuation of 923c9b37.

There is a delay in the cgroup system when moving a PID from one cgroup
to another. It is usually short, but if we don't wait for the PID to
move before removing cgroup directories the PID previously belonged to,
we could leak cgroups. This was previously fixed in the cpuset and
devices subsystems. This uses the same logic to fix the freezer
subsystem.

Bug 5082.

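The wait-before-remove idea can be sketched as polling the old cgroup's `cgroup.procs` file until the PID is no longer listed, and only then removing the directory. The helpers below are hypothetical; they assume `cgroup.procs` holds one decimal PID per line, and they do not reproduce the cpuset/devices logic the commit reuses.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/* Return true if pid is still listed in the cgroup.procs file at path.
 * A missing or unreadable file is treated as "not listed". */
bool pid_in_procs_file(const char *path, pid_t pid)
{
	FILE *f = fopen(path, "r");
	long v;
	bool found = false;

	if (!f)
		return false;
	while (fscanf(f, "%ld", &v) == 1) {
		if (v == (long) pid) {
			found = true;
			break;
		}
	}
	fclose(f);
	return found;
}

/* Poll until the PID has left the old cgroup: up to max_tries checks,
 * delay_ms apart. Only when this returns true is it safe to rmdir()
 * the old cgroup directory without leaking it. */
bool wait_for_pid_move(const char *procs_path, pid_t pid,
		       int max_tries, long delay_ms)
{
	struct timespec ts = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};

	for (int i = 0; i < max_tries; i++) {
		if (!pid_in_procs_file(procs_path, pid))
			return true;
		nanosleep(&ts, NULL);
	}
	return false;
}
```

Bounding the number of retries keeps a stuck PID from hanging the caller; in that case the sketch reports failure instead of removing a directory that is still in use.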

05 Jul, 2018 1 commit
- Make it so the slurmdbd's pid file gets created before initing · 7e47579f
  Danny Auble authored Jul 05, 2018
```
the database.

Bug 5247
```
04 Jul, 2018 2 commits

Combine the active and available node feature change logs · 3818159e

Morris Jette authored Jul 04, 2018

So that changes to multiple nodes will be reported on one line rather than one
line per node. Otherwise this could lead to performance issues when reloading
slurmctld on big systems.

Bug 4980

Fix read slurm.conf performance issues · 23e815c6

Felip Moll authored Jul 04, 2018

Cleaned up code that could have caused performance issues when reading the config
and there were nodes with features defined.

Bug 4980

03 Jul, 2018 2 commits
- Added pending RPC stats to sdiag output · 6033f246
  Broderick Gardner authored Jul 02, 2018
```
bug 5337
```
- Prevent _step_signal() from always returning success · 2ab24e04
  Brian Christiansen authored Jul 02, 2018
```
Currently, no caller checks the return code.

Bug 5164
```
26 Jun, 2018 4 commits

Fix problem when validating job memory on multi-partition requests. · f07f53fc

Dominik Bartkiewicz authored Jun 08, 2018

Some job fields can change in the course of scheduling. This patch
reinitializes previously adjusted job fields to their original value
when validating the job memory in multi-partition requests.

Bug 4895.

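A simplified sketch of the idea: save the job fields before validation, and restore them before each partition is tried so that adjustments made for one partition do not carry over to the next. The record layouts and the rule that spreads memory over more CPUs are hypothetical stand-ins for slurmctld's job_record and its memory checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified job and partition records. */
struct job {
	uint64_t mem_per_cpu;	/* may be adjusted during validation */
	uint32_t cpus_per_task;	/* may be adjusted during validation */
};

struct part {
	uint64_t max_mem_per_cpu;	/* 0 = no per-CPU memory limit */
	uint32_t max_cpus_per_task;
};

/* Hypothetical per-partition check that, like real scheduling code,
 * rewrites job fields in place: an oversized per-CPU memory request
 * is spread over more CPUs. */
static bool validate_mem(struct job *job, const struct part *part)
{
	uint64_t lim = part->max_mem_per_cpu;

	if (lim && job->mem_per_cpu > lim) {
		uint64_t f = (job->mem_per_cpu + lim - 1) / lim;
		job->cpus_per_task *= (uint32_t) f;
		job->mem_per_cpu = (job->mem_per_cpu + f - 1) / f;
	}
	return job->cpus_per_task <= part->max_cpus_per_task;
}

/* Try each partition, resetting the job to its original values first.
 * Returns the index of the first acceptable partition, or -1 with the
 * job left unchanged. */
int validate_multi_part(struct job *job, const struct part *parts, size_t n)
{
	const struct job orig = *job;

	for (size_t i = 0; i < n; i++) {
		*job = orig;
		if (validate_mem(job, &parts[i]))
			return (int) i;
	}
	*job = orig;
	return -1;
}
```

Without the reset, a job rejected by a memory-limited partition would reach the next partition with its CPU count already inflated.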

Revert "Fix different issues when requesting memory per cpu/node." · d52d8f4f
Alejandro Sanchez authored Jun 08, 2018
```
This reverts commit bf4cb0b1.

Bug 5240, Bug 4895 and Bug 4976.
```
Prevent reboot of busy KNL node when asking for inactive features. · d8c5379b

Felip Moll authored Jun 26, 2018

When one asks for an inactive feature and also specifies the node with the -w flag,
the node will be rebooted even though it may contain running jobs.

Bug 4821

Reorder proctrack/task plugin load in the slurmstepd to match that of slurmd · 164da888
Tim Wickberg authored Jun 25, 2018
```
and avoid the race condition that calling task before proctrack can introduce.

Bug 5319
```
25 Jun, 2018 1 commit
- Add new job dependency type of "afterburstbuffer". The pending job will be · 3d4baee9
  Morris Jette authored Jun 25, 2018
```
delayed until the first job completes execution and its burst buffer
stage-out is completed.

Bug 4675
```
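Compared with an ordinary "afterany" dependency, the new type adds one condition, sketched below with hypothetical types (`struct dep_job`, `dependency_satisfied`). On the command line the dependency is requested as `--dependency=afterburstbuffer:<job_id>`.

```c
#include <stdbool.h>

/* Hypothetical view of the job that a dependency refers to. */
struct dep_job {
	bool finished;		/* job has completed execution */
	bool bb_stage_out_done;	/* burst buffer stage-out has completed */
};

enum dep_type { DEP_AFTER_ANY, DEP_AFTER_BURST_BUFFER };

/* Return true when the pending job may start. "afterburstbuffer"
 * additionally waits for the first job's stage-out to finish. */
bool dependency_satisfied(enum dep_type type, const struct dep_job *dep)
{
	switch (type) {
	case DEP_AFTER_ANY:
		return dep->finished;
	case DEP_AFTER_BURST_BUFFER:
		return dep->finished && dep->bb_stage_out_done;
	}
	return false;
}
```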
22 Jun, 2018 2 commits
- Define alternate MailProg configuration parameter · 914ef205
  Morris Jette authored Jun 22, 2018
```
If MailProg is not configured and "/bin/mail" (the default) does
not exist, but "/usr/bin/mail" does exist then use "/usr/bin/mail"
as a default value.
```
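The fallback rule reads naturally as a short decision function. This sketch (`choose_mail_prog` is hypothetical) checks existence with POSIX access() and F_OK to match the commit's wording; slurmctld's actual configuration code may perform the check differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Pick the mail program: an explicitly configured MailProg wins;
 * otherwise use the /bin/mail default, falling back to /usr/bin/mail
 * only when /bin/mail is missing and /usr/bin/mail exists. */
const char *choose_mail_prog(const char *configured)
{
	if (configured && *configured)
		return configured;
	if (access("/bin/mail", F_OK) != 0 &&
	    access("/usr/bin/mail", F_OK) == 0)
		return "/usr/bin/mail";
	return "/bin/mail";
}
```

Keeping /bin/mail as the final default preserves the old behavior on systems where neither path exists.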
- Prevent slurmctld from aborting when attempting to set a non-existent qos as def_qos_id · c9682e1a
  Dominik Bartkiewicz authored Jun 22, 2018
```
Bug 5159.
```
20 Jun, 2018 2 commits

Add partition name to will_run_response_msg · b577ab71
Morris Jette authored Jun 20, 2018

Make job_start_data() multi partition aware on REQUEST_JOB_WILL_RUN. · 35a13703

Alejandro Sanchez authored Jun 20, 2018

Previously, the function only tested against the first partition in
the job_record. Now it detects whether the job request is multi-partition
and, if so, loops through the partitions until the job will run in one of
them or the end of the list is reached, returning the error code from the
last one if the job won't run in any partition.

Bug 5185

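The loop described above can be sketched with a caller-supplied per-partition test that returns 0 on success, as SLURM_SUCCESS does in Slurm. The callback type, `will_run_any_partition`, and `demo_will_run` are hypothetical; in the demo only "debug" can start the job, and the other partitions return made-up error codes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-partition test: 0 if the job could start there,
 * otherwise an error code. */
typedef int (*will_run_fn)(const char *part_name);

/* Try each requested partition in order and stop at the first one in
 * which the job will run; if none works, return the error code from
 * the last partition tried. */
int will_run_any_partition(const char *const *parts, size_t n,
			   will_run_fn will_run, const char **chosen)
{
	int rc = -1;	/* returned when no partitions were given */

	for (size_t i = 0; i < n; i++) {
		rc = will_run(parts[i]);
		if (rc == 0) {
			if (chosen)
				*chosen = parts[i];
			break;
		}
	}
	return rc;
}

/* Demo predicate for illustration only. */
static int demo_will_run(const char *part_name)
{
	if (strcmp(part_name, "debug") == 0)
		return 0;
	return strcmp(part_name, "gpu") == 0 ? 11 : 22;
}
```

For the request "batch,gpu" the sketch returns 11 (the error from "gpu", the last partition tried); for "batch,debug,gpu" it returns 0 and reports "debug" as the chosen partition.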

19 Jun, 2018 2 commits

Don't enforce MaxQueryTimeRange with specific jobs · d41cb31a

Isaac Hartung authored Jun 19, 2018

When requesting specific jobids with sacct, the starttime of the request
is 0, which will cause the time range to be outside of the
MaxQueryTimeRange range -- if specified. When requesting specific
jobids, sacct should be able to find the job whenever it started --
unless confined to a smaller range with -S and/or -E.

Bug 5009

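A minimal sketch of the exemption, assuming a simplified query record in which a start time of 0 means "not set" and a max_range of 0 disables the limit. Both conventions, and all names below, are assumptions for illustration.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified sacct query. */
struct job_query {
	time_t start;		/* 0 when no start time was given */
	time_t end;
	bool has_job_ids;	/* specific job IDs were requested */
};

/* Return true if the query is allowed under MaxQueryTimeRange
 * (max_range in seconds; 0 disables the check in this sketch).
 * Queries for specific job IDs are exempt, because their start
 * time of 0 would otherwise always exceed the limit. */
bool query_range_ok(const struct job_query *q, time_t max_range)
{
	if (max_range == 0 || q->has_job_ids)
		return true;
	return q->end - q->start <= max_range;
}
```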

heterogeneous job scheduling fix · 118a73b6

Morris Jette authored Jun 19, 2018

For a heterogeneous job component with required nodes, explicitly exclude
those nodes from all other job components.

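Using plain 64-bit masks in place of Slurm's node bitmaps, the exclusion step can be sketched as follows; `struct het_comp` and `exclude_required_nodes` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heterogeneous job component: node sets are bitmasks
 * (bit i = node i), standing in for Slurm's bitstr_t node bitmaps. */
struct het_comp {
	uint64_t req_nodes;	/* nodes this component explicitly requires */
	uint64_t exc_nodes;	/* nodes this component must not use */
};

/* For each component with required nodes, add those nodes to every
 * other component's exclusion set so no two components compete. */
void exclude_required_nodes(struct het_comp *comps, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!comps[i].req_nodes)
			continue;
		for (size_t j = 0; j < n; j++) {
			if (j != i)
				comps[j].exc_nodes |= comps[i].req_nodes;
		}
	}
}
```

For components requiring nodes {0,1} and {2}, plus one with no requirement that already excludes node 4, the third ends up excluding nodes {0,1,2,4}.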