Commits · b26925a6f6c94cc7c3d42c9f60ff8c7fbfd1218d · Manuel G. Marciani / ces_slurm_simulator

29 Aug, 2019 12 commits
- Merge branch 'slurm-18.08' into slurm-19.05 · b26925a6
  Alejandro Sanchez authored Aug 29, 2019
  
  b26925a6
- Testsuite - add time limit to test12.10 job submissions. · ca123ee5
  Albert Gil authored Jun 07, 2019
```
Bug 7149.
```
  ca123ee5
- Fix undefined variable in test · 1fc2c26f
  Brian Christiansen authored Aug 29, 2019
  
  1fc2c26f
- Merge branch 'bug7445' into slurm-19.05 · daf57df7
  Brian Christiansen authored Aug 29, 2019
  
  daf57df7
- Fix batch_step being created with correct batch_host when using --batch · 55a96bcf
  Brian Christiansen authored Aug 22, 2019
```
Continuation of a2f2894f

Bug 7445

Signed-off-by: Marshall Garey <marshall@schedmd.com>
```
  55a96bcf
- Fix getting batch_host after it's been set when requesting --batch · fe2bf9bf
  Brian Christiansen authored Aug 21, 2019
```
Continuation of 30bbc11d

Bug 7445

Signed-off-by: Dominik Bartkiewicz <bart@schedmd.com>
```
  fe2bf9bf
- Add comment in pick_batch_host() · 4b4a11cc
  Brian Christiansen authored Aug 20, 2019
```
Bug 7445

Signed-off-by: Dominik Bartkiewicz <bart@schedmd.com>
```
  4b4a11cc
- Make --batch requests wait for all nodes to boot before launching · 76956c87
  Brian Christiansen authored Aug 20, 2019
```
When --batch=<feature> is used, the batch_host isn't chosen until the
job is being launched -- because the features could be different on boot
(e.g. KNL nodes). Thus if the job is allocated nodes that need to be
booted, it needs to wait till they are all booted so it can make a
decision at launch time.

Bug 7445

Signed-off-by: Dominik Bartkiewicz <bart@schedmd.com>
```
  76956c87
- Don't assume the first node of a job is the batch host · 875cbf9c
  Dominik Bartkiewicz authored Jun 18, 2019
```
This is a continuation of 7da439b4

Bug 7445
```
  875cbf9c
- Merge branch 'slurm-18.08' into slurm-19.05 · 4ff334a0
  Alejandro Sanchez authored Aug 29, 2019
  
  4ff334a0
- Testsuite - fix for test17.25 long default account names. · d3c3906c
  Albert Gil authored Jul 08, 2019
```
The get_default_acct was truncating the account name for long account
names. This commit uses -P/--parsable2 to avoid it.

Bug 7369.
```
  d3c3906c
- pam_slurm_adopt - print errno message when using stat() on cgroup path. · 73ff0e81
  Marcin Stolarek authored Aug 08, 2019
```
Bug 7467.
```
  73ff0e81
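The launch-time behaviour described in 76956c87 and 875cbf9c above can be pictured with a minimal sketch. This is not Slurm code: `Node` and `try_pick_batch_host` are hypothetical names, and falling back to the first node when no node carries the requested feature is an assumption made only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for an allocated node."""
    name: str
    booted: bool
    features: Set[str] = field(default_factory=set)


def try_pick_batch_host(nodes: List[Node], feature: str) -> Optional[str]:
    """Return a batch host name, or None while the job must keep waiting."""
    # Features may change on boot, so no decision is made until every
    # allocated node has finished booting.
    if not all(n.booted for n in nodes):
        return None
    # The batch host is not assumed to be the first node of the job.
    for n in nodes:
        if feature in n.features:
            return n.name
    # Assumption for this sketch only: fall back to the first node.
    return nodes[0].name if nodes else None


nodes = [Node("n1", True, {"westmere"}), Node("n2", False)]
waiting = try_pick_batch_host(nodes, "knl")   # n2 is still booting

nodes[1].booted = True
nodes[1].features = {"knl"}                   # feature known only after boot
chosen = try_pick_batch_host(nodes, "knl")
print(waiting, chosen)
```

Deferring the choice until all nodes are up is what lets the second call select `n2`, even though it is not the first node in the allocation.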
28 Aug, 2019 1 commit

Don't update [min|max]_exit_code on job array task requeue. · 0e42eb87

Alejandro Sanchez authored Aug 28, 2019

Only do so once the task actually finishes. Otherwise, a requeued task
could set an incorrect max_exit_code even if it completed with exit code 0
after re-running, leading to problems with, e.g., other jobs that have an
afterok type of dependency on such an array and rely on the incorrectly
set max_exit_code.

Bug 7552.

0e42eb87
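The effect of 0e42eb87 can be illustrated with a small sketch. It is not the Slurm implementation: `ArrayExitCodes`, `task_requeued`, and `task_finished` are hypothetical names, and the afterok check is reduced to "max_exit_code is 0" as a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArrayExitCodes:
    """Hypothetical per-array min/max exit code bookkeeping."""
    min_exit_code: Optional[int] = None
    max_exit_code: Optional[int] = None

    def task_requeued(self, exit_code: int) -> None:
        # A requeued task will run again, so its intermediate exit code
        # is deliberately not recorded.
        pass

    def task_finished(self, exit_code: int) -> None:
        # Only a task that has actually finished contributes to min/max.
        if self.min_exit_code is None or exit_code < self.min_exit_code:
            self.min_exit_code = exit_code
        if self.max_exit_code is None or exit_code > self.max_exit_code:
            self.max_exit_code = exit_code

    def afterok_satisfied(self) -> bool:
        # Simplified assumption: afterok is met when every finished task
        # exited with code 0.
        return self.max_exit_code == 0


codes = ArrayExitCodes()
codes.task_requeued(1)   # first attempt failed and was requeued
codes.task_finished(0)   # the re-run completed successfully
print(codes.max_exit_code, codes.afterok_satisfied())
```

Had the requeue path recorded the exit code 1, `max_exit_code` would have stayed at 1 and the dependent job would never have been released.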

26 Aug, 2019 3 commits

Continuation of commit . · e57b297e

Danny Auble authored Aug 26, 2019



We only remove from registered_clusters if conn->rem_port != 0,
so only add to it if the same.

Bug 5213

Signed-off-by: Alejandro Sanchez <alex@schedmd.com>

e57b297e

Remove stray zero-width space characters from x_ac_nvml.m4 comment. · 7a29a3cc
Tim Wickberg authored Aug 26, 2019

7a29a3cc

Soften language in log message about topology. · cad50250

Marshall Garey authored Jul 24, 2019

The previous log message implied that you should never use the topology
plugin where no switch could reach all nodes through its descendants.
However, this is a valid configuration where sites may not want jobs
spanning across certain switches, so we've softened the language in the
log message.

Bug 7466.

cad50250

23 Aug, 2019 3 commits

valid_feature_counts should not handle XOR/XAND features · 1c051c61

Marcin Stolarek authored Aug 01, 2019

For feature expressions like cpu&fastio&[knl|westmere], the additional
bit_or resulted in returning something like (cpu&fastio)|knl|westmere,
which is wrong. XOR/XAND features are handled properly in
_get_req_features.

Bug 7378

1c051c61
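A toy model of the problem fixed in 1c051c61, using Python sets in place of node bitmaps. The cluster layout and the helper `nodes_with` are hypothetical; the sketch only shows why an extra OR over the bracketed features widens the result, and does not model what _get_req_features does afterwards.

```python
# Hypothetical cluster: node name -> features available on that node.
node_features = {
    "n1": {"cpu", "fastio", "knl"},
    "n2": {"cpu", "fastio", "westmere"},
    "n3": {"knl"},                 # lacks cpu and fastio
    "n4": {"cpu", "fastio"},
}


def nodes_with(feature):
    """Set of nodes offering a feature (stands in for a node bitmap)."""
    return {name for name, feats in node_features.items() if feature in feats}


# Request: cpu&fastio&[knl|westmere]
and_part = nodes_with("cpu") & nodes_with("fastio")

# Old behaviour: the bracketed features were OR-ed into the result,
# which amounts to (cpu&fastio)|knl|westmere.
buggy = and_part | nodes_with("knl") | nodes_with("westmere")

# New behaviour in this sketch: the bracketed group is left out of the
# count and handled separately.
fixed = and_part

print(sorted(buggy))
print(sorted(fixed))
```

Node `n3` appears in the buggy result even though it has neither cpu nor fastio, which is the over-inclusion the commit message describes.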

improve debug logging in valid_feature_counts · 04756b78
Marcin Stolarek authored Aug 01, 2019
```
Display nodenames instead of bitmap ranges
```
04756b78

HTML docs faq, example for slurmd restart, should use systemctl restart · 26a1cd8a

Marcin Stolarek authored Aug 23, 2019

We changed the FAQ in 4cea931c, replacing the stop/start of slurmd with
just a restart, but the example now suggests using systemctl start, which
does nothing if slurmd is already running.

26a1cd8a

20 Aug, 2019 2 commits

Handle situation where a slurmctld tries to communicate with slurmdbd more than once at the same time. · af7b4531

Danny Auble authored Aug 12, 2019

What can happen here is that the slurmdbd/slurmctld connection gets hung
up somehow. If the slurmctld is restarted, a new connection is made
alongside the old connection. When the old connection gets unwedged, it
clears out the registration of the slurmctld, so no updates are sent to
that slurmctld.

This commit checks for old connections when a registration message
comes in. If one is found, we print an error, set rem_port = 0, and
remove it from the list. That way, when the old connection gets unwedged,
we just close the socket instead of removing the registration.

Bug 5213

af7b4531
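The bookkeeping described in af7b4531 can be sketched roughly as follows. This is an illustration under assumptions, not slurmdbd code: `Conn`, `register`, and `close_conn` are hypothetical names, and a plain list stands in for registered_clusters. The `rem_port != 0` guard on adding mirrors the follow-up commit e57b297e listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(eq=False)  # compare connections by identity, not by field values
class Conn:
    cluster: str
    rem_port: int


registered: List[Conn] = []


def register(new_conn: Conn) -> None:
    """Handle a registration message from a slurmctld (hypothetical)."""
    stale = [c for c in registered if c.cluster == new_conn.cluster]
    for old in stale:
        print(f"error: dropping stale connection for {old.cluster}")
        old.rem_port = 0          # mark it so a later close is harmless
        registered.remove(old)
    # Only add what close_conn() would later remove.
    if new_conn.rem_port != 0:
        registered.append(new_conn)


def close_conn(conn: Conn) -> None:
    """Called when a connection finally goes away (hypothetical)."""
    # A stale connection has rem_port == 0, so only its socket would be
    # closed; the current registration is left untouched.
    if conn.rem_port != 0 and conn in registered:
        registered.remove(conn)


old = Conn("cluster1", 6817)
register(old)
new = Conn("cluster1", 6817)      # slurmctld restarted and reconnected
register(new)
close_conn(old)                   # old connection gets unwedged later
print([c.rem_port for c in registered])
```

After the old connection is closed, the registration made by the restarted slurmctld is still present, so updates would continue to reach it.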

Fix NEWS entry for the previous commit a04eea2e. · d0729247
Alejandro Sanchez authored Aug 20, 2019
```
Bug 7360.
```
d0729247

19 Aug, 2019 6 commits
- Detach threads once they are done to avoid having to join them · a04eea2e
  Danny Auble authored Aug 19, 2019
```
in track scripts code.

Bug 7360


Signed-off-by: Alejandro Sanchez <alex@schedmd.com>
```
  a04eea2e
- Fix unaccounted TRESRunMins usage from HetJobs · 1da9c5d0
  Broderick Gardner authored Jun 26, 2019
```
The implementation of priority_p_job_end in priority/multifactor
expects the job state to be set to complete or completing in order to
properly remove some job usage from the assoc and qos. This must be
simulated by the pack job run check code, or the check-time usage is not
removed.

Bug 7284
```
  1da9c5d0
- Merge branch 'bug7428' into slurm-19.05 · cfe4697b
  Brian Christiansen authored Aug 19, 2019
  
  cfe4697b
- Update power_save.html with current logic and recommendations. · a7ff04ae
  Brian Christiansen authored Aug 16, 2019
```
Bug 7428
```
  a7ff04ae
- Improve power_save.html - don't suggest wait_job in PrologSlurmctld · c693867d
  Marcin Stolarek authored Jul 23, 2019
```
Use of scontrol wait_job in slurmctld will result in prolog
hanging since the command will complete only when PrologSlurmctld
is completed. It's a deadlock.

Bug 7428.
```
  c693867d
- Add missing documentation for DebugFlag Route · ccb1cb3c
  Marcin Stolarek authored Aug 12, 2019
  
  ccb1cb3c
16 Aug, 2019 2 commits

job_submit/lua - fix problem where nil was expected for min_mem_per_cpu. · 2d017875
Chad Vizino authored Aug 12, 2019
```
It wasn't properly set under certain conditions.

Bug 7276
```
2d017875

Docs - remove EnforcePartLimits=YES from slurm.conf man page. · d48eba21

Marcin Stolarek authored Aug 09, 2019

"ANY" is the canonical and most accurate value identifier for
PARTITION_ENFORCE_ANY although "Yes", "Up", "True" and "1" continue
being parsed and accepted as equivalent values for retro-compatibility
purposes with the initial commit edf3880c.

Bug 7248.

d48eba21
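The alias handling mentioned in d48eba21 could be pictured like this. It is a sketch, not the slurm.conf parser: `parses_as_enforce_any` is a hypothetical helper, the constant is a symbolic placeholder, case-insensitive matching is an assumption, and any other EnforcePartLimits values are not modeled.

```python
# Symbolic placeholder for the real enum value.
PARTITION_ENFORCE_ANY = "PARTITION_ENFORCE_ANY"

# "ANY" is the canonical spelling; the rest are the legacy equivalents
# named in the commit message.
_ANY_ALIASES = {"any", "yes", "up", "true", "1"}


def parses_as_enforce_any(value: str) -> bool:
    """True if the value would map to PARTITION_ENFORCE_ANY in this sketch."""
    # Assumption: matching ignores case and surrounding whitespace.
    return value.strip().lower() in _ANY_ALIASES


for candidate in ("ANY", "Yes", "Up", "True", "1"):
    print(candidate, parses_as_enforce_any(candidate))
```

Documenting only ANY while still accepting the older spellings keeps existing configuration files working.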

15 Aug, 2019 2 commits
- Cray - fix contribs slurm.conf.j2 with updated cray_aries plugin names. · e945917d
  Marcin Stolarek authored Aug 15, 2019
```
Bug 7410.
```
  e945917d
- Fix Coverity CID 203367. · 9dcaccf8
  Dominik Bartkiewicz authored Aug 15, 2019
```
Continuation of 884c0191.

Bug 7362.
```
  9dcaccf8
14 Aug, 2019 9 commits
- uncomment links about SLUG registration. · d7ff6f4b
  Danny Auble authored Aug 14, 2019
  
  d7ff6f4b
- Improve description of test_only variable in comments · 61269349
  Morris Jette authored Jun 13, 2019
```
Bug 6769
```
  61269349
- COMPLETING nodes available immediately for job will-run test · 0666db61
  Morris Jette authored Jun 12, 2019
```
Consider jobs in COMPLETING state as being available immediately for
a job will-run evaluation. This assumes the completion will happen
very soon after the test is run.

Bug 6769
```
  0666db61
- Avoid select plugin resource usage underflow from duplicate job free · 2dd1f448
  Morris Jette authored Jul 29, 2019
```
All of the select plugins were performing a duplicate resource free
for jobs in completing state when performing a will-run test for
new jobs. This would frequently result in underflow messages.

Bug 6769
```
  2dd1f448
- Fix typos for 'termination'. · c775166f
  Ben Roberts authored Aug 14, 2019
  
  c775166f
- Fix typos for 'communications'. · b68be80b
  Ben Roberts authored Aug 14, 2019
  
  b68be80b
- Fix typos for 'approximately'. · 7f70b882
  Ben Roberts authored Aug 14, 2019
  
  7f70b882
- Fix typos for 'beginning'. · 0efe2a70
  Ben Roberts authored Aug 14, 2019
  
  0efe2a70
- Fix typo for 'physical'. · 6ae20e61
  Ben Roberts authored Aug 14, 2019
  
  6ae20e61