Commits · 1e529c7174734c89f28221d27558b6fd1798be7f · Manuel G. Marciani / ces_slurm_simulator

26 Dec, 2018 4 commits
- Add ave watts to api and sview. · 1e529c71
  Felip Moll authored Dec 26, 2018
```
Bug 6283
```
  1e529c71
- Change sreport job reports · c3dfb3fd
  Felip Moll authored Dec 03, 2018
```
SizesByAccount and SizesByAccountAndWckey now default to displaying the
root account totals, or those of the exact account requested in Account=.
To query with the old behavior, a new option 'AcctAsParent' has been added,
which will still show the subaccounts of the accounts requested in Account=.

Bug 5793
```
  c3dfb3fd
- Fix debug2 prefix for sched log. · 83564971
  Marshall Garey authored Dec 26, 2018
```
Bug 6289
```
  83564971
- Fix minor memory leak when scheduling rebootable nodes. · 7743c07b
  Morris Jette authored Dec 26, 2018
```
Introduced in d9b9eb23

Bug 6292
```
  7743c07b
21 Dec, 2018 4 commits
- Add XCC plugin for reading Lenovo Power. · e79d9469
  Danny Auble authored Dec 13, 2018
```
Bug 5441

Co-authored-by: Felip Moll <felip.moll@schedmd.com>
Co-authored-by: Danny Auble <da@schedmd.com>
```
  e79d9469
- Store ave watts in energy plugins. · a530e1ee
  Felip Moll authored Dec 21, 2018
```
Bug 5441
```
  a530e1ee
- Better debugging when a job doesn't have a job_resrcs ptr. · dc583bd1
  Dominik Bartkiewicz authored Dec 21, 2018
```
Bug 5971
```
  dc583bd1
- Don't assume the first node of a job is the batch host when purging jobs · 7da439b4
  Dominik Bartkiewicz authored Dec 21, 2018
```
from a node.

Bug 5971
```
  7da439b4
20 Dec, 2018 3 commits
- Read gres.conf for cloud nodes on slurmctld. · 200dbf17
  Brian Christiansen authored Dec 19, 2018
```
Bug 6229
```
  200dbf17
- Fix double accounting of energy at end of job · 196ada03
  Danny Auble authored Dec 19, 2018
```
Bug 6239
```
  196ada03
- Read gres.conf for cloud nodes on slurmctld. · 1bceb1d9
  Brian Christiansen authored Dec 19, 2018
```
Bug 6229
```
  1bceb1d9
19 Dec, 2018 2 commits
- Honor --quiet flag in sbatch for printing job id · 4b765fde
  Nate Rini authored Dec 11, 2018
```
Bug 6197
```
  4b765fde
- Fix double accounting of energy at end of job · 66e000a4
  Danny Auble authored Dec 19, 2018
```
Bug 6239
```
  66e000a4
18 Dec, 2018 5 commits
- Better debug for bad values in gres.conf · a1b0b2d9
  Moe Jette authored Dec 18, 2018
```
backport of 03ada72e

Bug 5682
```
  a1b0b2d9
- cons_res: Prevent overflow on multiply · df57ef17
  Dominik Bartkiewicz authored Feb 13, 2018
```
Bug 5682 and 4584

Backport of ba07a6e09b6071
```
  df57ef17
- Fix NodeFeaturesPlugins=node_features/knl_generic to allow other gres · 4d433d38
  Dominik Bartkiewicz authored Dec 18, 2018
```
other than knl.

Bug 5603
```
  4d433d38
- Ensure that "hbm" is a configured GresType on knl systems. · c7d66c8d
  Danny Auble authored Dec 18, 2018
```
Bug 5603
```
  c7d66c8d
- Avoid bit offset of -1 in call to bit_nclear(). · 0e802d92
  Moe Jette authored Dec 18, 2018
```
This patch also simplifies the code using bit_cnt instead of
_bitstr_bits().  This isn't really necessary, but does look cleaner.

Bug 6216
```
  0e802d92
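The -1 offset mentioned in the last commit above is the classic result of computing `count - 1` when `count` is zero. The following is a minimal sketch of that guard, not the actual patch: `toy_bitmap_t`, `toy_nclear` and `toy_clear_low` are hypothetical stand-ins for Slurm's bitstring code, reduced to a single 64-bit word.

```c
#include <stdint.h>

/* Toy 64-bit bitmap standing in for Slurm's bitstr_t; illustrative only. */
typedef uint64_t toy_bitmap_t;

/* Clear bits first..last inclusive.
 * The caller must pass 0 <= first <= last < 64. */
static void toy_nclear(toy_bitmap_t *b, int first, int last)
{
	for (int i = first; i <= last; i++)
		*b &= ~(UINT64_C(1) << i);
}

/* Clear the lowest `cnt` bits. Without the guard, cnt == 0 would lead
 * to a call with last == -1, an offset a real bitmap API may reject. */
static void toy_clear_low(toy_bitmap_t *b, int cnt)
{
	if (cnt <= 0)
		return;		/* nothing to clear, never compute cnt - 1 */
	if (cnt > 64)
		cnt = 64;	/* clamp to the size of the toy bitmap */
	toy_nclear(b, 0, cnt - 1);
}
```

Checking the count before deriving the last offset keeps the range call well-formed for every input, which is the property the commit title describes.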
17 Dec, 2018 1 commit
- Fix sacctmgr show runawayjobs from sibling cluster · f9adfede
  Broderick Gardner authored Dec 14, 2018
```
It was using the local cluster, sending the query to the wrong table

Bug 6159
```
  f9adfede
14 Dec, 2018 2 commits
- Fix sacctmgr show events event=cluster · 1b0e01ba
  Brian Christiansen authored Dec 14, 2018
```
Bug 6237
```
  1b0e01ba
- Backfill - If a job has a time_limit guess the end time of a job better · 8c919742
  Dominik Bartkiewicz authored Dec 14, 2018
```
if OverTimeLimit is Unlimited.

Bug 6093
```
  8c919742
11 Dec, 2018 1 commit
- Start NEWS for v18.08.5 · 17e96ba6
  Tim Wickberg authored Dec 11, 2018
  
  17e96ba6
09 Dec, 2018 1 commit

Tim Wickberg authored Dec 08, 2018

New X11 forwarding code will only support forwarding back to
salloc or an allocating srun command.

Using this option within sbatch was always hit-or-miss. If the
submitting user was disconnected from the alloc host for any
reason, their xauth credentials would likely fail even if they
managed to get assigned the same local TCP port for forwarding.

Bug 3647.

c9728469

07 Dec, 2018 4 commits

pam_slurm_adopt: Use uid to determine whether root is logging. · 17c63947

Matthias Gerstner authored Dec 07, 2018

In some systems there can be multiple user accounts for uid 0, therefore
the check for literal user name "root" might be insufficient.

Bug 6184

17c63947
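The difference between the two checks described in this commit can be sketched as follows. The helper names are hypothetical and not taken from pam_slurm_adopt; in a real module the uid would presumably come from a passwd lookup on the PAM user name.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Name-based check: misses any uid-0 account not literally called "root". */
static bool is_root_by_name(const char *user_name)
{
	return user_name && strcmp(user_name, "root") == 0;
}

/* uid-based check: true for every account that maps to uid 0. */
static bool is_root_by_uid(uid_t uid)
{
	return uid == 0;
}
```

An account such as "toor" with uid 0 fails the first check but passes the second, which is the case the commit message is concerned with.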

pam_slurm_adopt: avoid running outside of the sshd PAM service context · 4f954bd8

Matthias Gerstner authored Dec 05, 2018

This PAM module is tailored towards running in the context of remote ssh
logins. When running in a different context, such as a local sudo call,
the module could be influenced by, e.g., passing environment variables
like SLURM_CONF.

By default, the module now performs its actions only when running in the
sshd context, which avoids this situation. An additional PAM module
argument, service=<service>, allows an administrator to change this
behavior if different behavior is explicitly desired.

Bug 6184

4f954bd8
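A rough sketch of the service gate described in this commit, under stated assumptions: `wanted_service` and `service_allowed` are hypothetical helpers, the argument parsing is simplified, and the current service name is passed in as a plain string (a real PAM module would typically obtain it with `pam_get_item()` and `PAM_SERVICE`).

```c
#include <stdbool.h>
#include <string.h>

#define DEFAULT_SERVICE "sshd"

/* Return the value of a "service=<name>" module argument, or the
 * default when the argument is absent. Hypothetical helper. */
static const char *wanted_service(int argc, const char **argv)
{
	const char *prefix = "service=";
	size_t len = strlen(prefix);

	for (int i = 0; i < argc; i++) {
		if (strncmp(argv[i], prefix, len) == 0)
			return argv[i] + len;
	}
	return DEFAULT_SERVICE;
}

/* Decide whether the module should act for the calling PAM service. */
static bool service_allowed(const char *current, int argc, const char **argv)
{
	return current && strcmp(current, wanted_service(argc, argv)) == 0;
}
```

With no arguments the module acts only for "sshd"; a call made through sudo is skipped unless the administrator explicitly configures `service=sudo`.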

salloc/sbatch/srun - print warning if both --mem and --mem-per-cpu are set. · 13a606a4

Nate Rini authored Dec 07, 2018

Only print a warning for 18.08. If a user has SLURM_MEM_PER_CPU or
SLURM_MEM_PER_NODE environment variables set for some reason this
situation could be happening by accident, and we don't want to prevent
the srun command from launching steps at this point.

Bug 6058.

13a606a4
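The warn-but-continue behavior described in this commit might look roughly like the sketch below. The `mem_opts` struct, the function name and the message text are all illustrative assumptions, not Slurm's actual option handling.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical option holder; fields are illustrative, not Slurm's. */
struct mem_opts {
	bool mem_per_node_set;	/* --mem or SLURM_MEM_PER_NODE */
	bool mem_per_cpu_set;	/* --mem-per-cpu or SLURM_MEM_PER_CPU */
};

/* Warn, but do not fail, when both memory options are present.
 * Returns true if a warning was printed. */
static bool warn_if_mem_conflict(const struct mem_opts *o, FILE *out)
{
	if (o->mem_per_node_set && o->mem_per_cpu_set) {
		fprintf(out, "warning: --mem and --mem-per-cpu are both set\n");
		return true;
	}
	return false;
}
```

Returning a flag instead of exiting keeps step launches working when the conflict comes from inherited environment variables, which is the reason the commit gives for only warning in 18.08.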

Expand %x in 'scontrol show job' and related API calls. · 0a125e08
Broderick Gardner authored Dec 07, 2018
```
Bug 5648.
```
0a125e08

06 Dec, 2018 5 commits

Bump RLIMIT_NOFILE for daemons in systemd services · 7f2e6a7e

Janne Blomqvist authored Dec 05, 2018

The Linux kernel default hard limit of 4096 for the number of file
descriptors is quite small. Debian/Ubuntu have for a long time
overridden this, increasing it to 1M. Recently systemd also bumped the
default to 512k.

https://github.com/systemd/systemd/blob/master/NEWS

https://github.com/systemd/systemd/pull/10244

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/ZN5TK3D6L7SE46KGXICUKLKPX2LQISVX/

https://github.com/systemd/systemd/commit/09dad04c49cae3ad2b319c9b4e7773fedd34309a

Here the limits are increased as follows:

- slurmd: 128k; some workloads like Hadoop/Spark need a lot of fd's,
  and recommend that the limit is increased to at least 64k.

- slurmctld: 64k; per the Slurm high throughput and big system guides
  which recommend a file-max of at least 32k.

- slurmdbd: 64k, matching slurmctld; although slurmdbd shouldn't need
  that many fd's, bumping the limit shouldn't hurt either.

Bug 6171

7f2e6a7e
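The commit itself changes systemd service files (systemd's directive for this limit is `LimitNOFILE=`), so the snippet below is only a companion sketch for checking the result from inside a process. It assumes "k" means 1024; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <sys/resource.h>

/* Limits from the commit message, reading "k" as 1024. */
#define SLURMD_NOFILE    (128 * 1024)	/* 131072 */
#define SLURMCTLD_NOFILE (64 * 1024)	/* 65536 */
#define SLURMDBD_NOFILE  (64 * 1024)	/* 65536 */

/* Report whether the current hard limit on open files is at least `want`. */
static bool nofile_hard_limit_at_least(rlim_t want)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return false;	/* could not query the limit */
	return rl.rlim_max == RLIM_INFINITY || rl.rlim_max >= want;
}
```

A daemon could call `nofile_hard_limit_at_least(SLURMD_NOFILE)` at startup to confirm that the raised limit from the unit file actually applies to it.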

Fix formatting issues when printing uint64_t. · a04ae6d7
Tim Wickberg authored Dec 06, 2018
```
Bug 5248
```
a04ae6d7
job_submit/lua: Add user/group info to jobs. · 6bf90e85
Mike Nolta authored Dec 06, 2018
```
Bug 6055
```
6bf90e85

job_submit/lua: add several slurmctld return codes · c538e144

Mike Nolta authored Nov 16, 2018

Add the following slurmctld return codes to the lua plugin:

  ESLURM_ACCESS_DENIED
  ESLURM_ACCOUNTING_POLICY
  ESLURM_INVALID_NODE_COUNT
  ESLURM_JOB_MISSING_SIZE_SPECIFICATION
  ESLURM_MISSING_TIME_LIMIT

Bug 6055

c538e144

Rename "no_send_gids" to "disable_send_gids". · f05be686
Tim Wickberg authored Dec 06, 2018
```
Rework one timer error message while here.

Bug 5861.
```
f05be686

05 Dec, 2018 8 commits

Run SlurmctldPrimaryOffProg when the primary shuts down. · ee29bba8
Felip Moll authored Dec 05, 2018
```
Backups already run it when dropping to backup.

Bug 6098.
```
ee29bba8
Run SlurmctldPrimaryOffProg when the primary shuts down. · ba491557
Felip Moll authored Dec 05, 2018
```
Backups already run it when dropping to backup.

Bug 6098.
```
ba491557

pam_slurm_adopt - send an error message to the user if no Jobs found. · 9fb15b4a

Marshall Garey authored Dec 05, 2018

Also throw an error message within stepd_available() if the nodename
is not set or cannot be inferred correctly.

Bug 5399.

9fb15b4a

Add NEWS entry for 893bb1de. · 06dde2f8
Tim Wickberg authored Dec 05, 2018

06dde2f8
Fix missing suffixes in squeue. · 9b0399b8
Trey Dockendorf authored Dec 05, 2018
```
Bug 6120
```
9b0399b8
Decrease an error message to be debug. · 639b3e87
Tim Wickberg authored Dec 05, 2018
```
Bug 6155
```
639b3e87
Decrement message_connections in stepd code on error path correctly. · 57daec20
Tim Wickberg authored Dec 05, 2018
```
Bug 6155
```
57daec20

Add bf_ignore_newly_avail_nodes option to SchedulerParameters. · 5ad1447e

Felip Moll authored Oct 19, 2018

When bf_continue is set, and locks are released during a backfill cycle,
other operations can make new resources available while part way through
the queue. When backfill continues the cycle and evaluates new jobs, it
may allocate some of these newly available resources to lower priority jobs,
rather than to higher priority jobs that were already considered in this
backfill cycle.

This patch introduces bf_ignore_newly_avail_nodes to SchedulerParameters
to solve this issue. When the backfill cycle resumes, this option ignores
nodes that became available while the backfill scheduler had yielded.

Bug 5279.

5ad1447e
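The effect of bf_ignore_newly_avail_nodes can be pictured as a set intersection. The sketch below is a toy model under that assumption, not the scheduler's implementation: `node_set_t` and `usable_nodes` are hypothetical, and each bit represents one node.

```c
#include <stdint.h>

/* Toy node set: bit i set means node i is available. Illustrative only. */
typedef uint64_t node_set_t;

/* Nodes usable when the backfill cycle resumes after yielding locks.
 * With the option on, nodes that became available during the yield
 * (set in `now` but not in `at_cycle_start`) are masked out. */
static node_set_t usable_nodes(node_set_t at_cycle_start, node_set_t now,
			       int ignore_newly_avail)
{
	return ignore_newly_avail ? (now & at_cycle_start) : now;
}
```

If nodes 0 and 1 were available at the start of the cycle and node 2 is freed during the yield while node 0 is taken, the option leaves only node 1 for the lower priority jobs still to be evaluated. Node 2 stays untouched until the next cycle, when higher priority jobs get the first chance at it.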