Commits · 162f6a05183c7390d73d39e576da9c1d49f278ce · Manuel G. Marciani / ces_slurm_simulator

25 May, 2017 6 commits

Prevent a job tested on multiple partitions from being marked WHOLE_NODE_USER. · 162f6a05

Dominik Bartkiewicz authored May 25, 2017

If a job is considered on a partition with ExclusiveUser=YES
then it would be marked as if it was submitted with the
--exclusive flag, which would lead to delays launching it
on ExclusiveUser=NO partitions, and cause lower-than-expected
cluster usage.

As a side effect, the job_ptr->part_ptr->flags need to be
tested wherever WHOLE_NODE_USER is considered, instead of
just job_ptr->details->whole_node.

Bug 3771.

162f6a05
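
The side effect described above can be sketched as follows. This is a minimal illustration, not the actual Slurm code: the struct layouts are cut-down stand-ins, the constant values are illustrative, and `job_is_user_exclusive()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the real constants live in Slurm's headers. */
#define WHOLE_NODE_USER          2
#define PART_FLAG_EXCLUSIVE_USER 0x0080

/* Cut-down stand-ins for the real records. */
struct part_record { uint16_t flags; };
struct job_details { uint8_t whole_node; };
struct job_record {
	struct job_details *details;
	struct part_record *part_ptr;
};

/*
 * Hypothetical helper: a job counts as user-exclusive if it asked for it
 * itself, or if the partition it is currently being tested on has
 * ExclusiveUser=YES. Nothing is written back into job_ptr->details.
 */
bool job_is_user_exclusive(const struct job_record *job_ptr)
{
	assert(job_ptr != NULL);

	if (job_ptr->details &&
	    job_ptr->details->whole_node == WHOLE_NODE_USER)
		return true;
	if (job_ptr->part_ptr &&
	    (job_ptr->part_ptr->flags & PART_FLAG_EXCLUSIVE_USER))
		return true;
	return false;
}
```

Because the partition flag is read at check time rather than copied into the job, testing the same job against an ExclusiveUser=NO partition afterwards yields false again.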

Revert "Prevent a job tested on multiple partitions from being marked" · f1a45962
Tim Wickberg authored May 25, 2017
```
Wrong author attributed by mistake.

This reverts commit 9128476a.
```
f1a45962
Revert "Prevent a race between completing jobs on a user-exclusive node from" · 82b0f802
Tim Wickberg authored May 25, 2017
```
Wrong author attributed by mistake.

This reverts commit a02d04f1.
```
82b0f802

Prevent a race between completing jobs on a user-exclusive node from · a02d04f1

Tim Wickberg authored May 25, 2017

leaving the node owned.

Two jobs completing simultaneously leads to make_node_idle()
returning before it has a chance to decrement node_ptr->owner_job_cnt,
which can result in the node being "owned" by that user even
though no jobs are running on it.

Move the decrement block to the end at a fini label, and make sure
all return paths pass through it. While moving it, add a guard
against node_ptr->owner_job_cnt underflowing.

Bug 3771.

a02d04f1
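
The control flow described in a02d04f1 (a single exit label plus an underflow guard) might look roughly like this. It is a single-threaded sketch of the pattern only: `release_node_for_job()`, `NO_OWNER`, the `owner` field, and the reduced `node_record` are assumptions made for illustration, not the real `make_node_idle()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_OWNER UINT32_MAX	/* hypothetical "nobody owns this node" marker */

/* Reduced stand-in for the real node record. */
struct node_record {
	uint16_t owner_job_cnt;	/* jobs of the owning user on this node */
	uint32_t owner;		/* uid of the owning user, or NO_OWNER */
};

/* Hypothetical helper modelling the "every path goes through fini" shape. */
void release_node_for_job(struct node_record *node_ptr, bool job_already_cleared)
{
	assert(node_ptr != NULL);

	if (job_already_cleared)
		goto fini;	/* an early return here would skip the decrement */

	/* ... other per-job cleanup would go here ... */

fini:
	if (node_ptr->owner_job_cnt == 0) {
		/* Guard: decrementing now would wrap the unsigned counter. */
	} else if (--node_ptr->owner_job_cnt == 0) {
		node_ptr->owner = NO_OWNER;	/* last job gone, release ownership */
	}
}
```

With a count of 2, two calls bring the node back to unowned regardless of which branch each call takes, and a third call leaves the counter at zero instead of wrapping around.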

Prevent a job tested on multiple partitions from being marked · 9128476a

Tim Wickberg authored May 25, 2017

WHOLE_NODE_USER.

If a job is considered on a partition with ExclusiveUser=YES
then it would be marked as if it was submitted with the
--exclusive flag, which would lead to delays launching it
on ExclusiveUser=NO partitions, and cause lower-than-expected
cluster usage.

As a side effect, the job_ptr->part_ptr->flags need to be
tested wherever WHOLE_NODE_USER is considered, instead of
just job_ptr->details->whole_node.

Bug 3771.

9128476a

Fix WithSubAccounts option to not include WithDeleted unless requested. · 29ebc4b2

Alejandro Sanchez authored May 25, 2017

_setup_assoc_cond_limits was using the table 'prefix' passed as an argument
to build the prefix.deleted=something condition of the where clause.

It turns out that _setup_assoc_cond_limits is called by these functions:
as_mysql_modify_assocs
as_mysql_remove_assocs
as_mysql_get_assocs
as_mysql_acct_no_users

which set the prefix to 't2' before the call if a QOS is provided or if
WithSubAccounts is provided. The 't2' prefix is fine for the other where
conditions in that case, but the deleted condition needs 't1', which is
the table the records are selected from.

Bug 3835

29ebc4b2
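
The idea behind 29ebc4b2 can be illustrated with a small string-building sketch. Everything here is assumed for illustration: `build_assoc_where()` is a hypothetical function, the `acct` column and the exact SQL text are invented, and the real code builds its query differently. The only point shown is that the deleted condition is pinned to `t1` while other conditions follow the caller's prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: 'prefix' may be "t1" or "t2" depending on the
 * caller, but the deleted filter always refers to t1, the table the
 * records are selected from. Returns what snprintf() returns.
 */
int build_assoc_where(char *buf, size_t len, const char *prefix,
		      const char *acct, bool with_deleted)
{
	assert(buf != NULL && prefix != NULL && acct != NULL);
	assert(strlen(prefix) > 0);

	if (with_deleted)
		return snprintf(buf, len,
				"where (t1.deleted=0 || t1.deleted=1) && %s.acct='%s'",
				prefix, acct);

	return snprintf(buf, len, "where t1.deleted=0 && %s.acct='%s'",
			prefix, acct);
}
```

Called with prefix "t2" and with_deleted false, this yields `where t1.deleted=0 && t2.acct='physics'` for the account "physics", so deleted rows are excluded unless explicitly requested.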

24 May, 2017 4 commits
- Check if variable given to scontrol show job is a valid jobid. · ea906a24
  Tim Shaw authored May 24, 2017
```
Bug 3821
```
  ea906a24
- Handle a reservation update to UNLIMITED correctly. · 6180ff64
  Tim Wickberg authored May 23, 2017
```
'scontrol update reservationname=foo duration=unlimited' sets INFINITE
as the duration, which needs to be translated to a year as is done
elsewhere. Otherwise it'll convert to 49710 days, which is definitely
wrong.

Bug 3836.
```
  6180ff64
- Fix unsafe MAX() macro use that can lead to repeated cancellation attempts in scancel. · 5bc278f7
  Alejandro Sanchez authored May 23, 2017
```
Bug 3443.
```
  5bc278f7
- Fix unsafe use of MAX macro that could lead to problems with acct_gather plugins. · 03a374d3
  Alejandro Sanchez authored May 23, 2017
```
MAX() will re-evaluate the higher value argument; if this is a function
call, it may be evaluated twice, leading to unintended side effects or a
crash.

Bug 3443.
```
  03a374d3
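
The MAX() hazard mentioned in 03a374d3 can be reproduced with the conventional macro definition. This sketch assumes that definition; `read_sample()`, `unsafe_max()`, `safe_max()` and the call counter are hypothetical names used only to make the double evaluation visible.

```c
#include <assert.h>

/* Conventional definition: the winning argument is evaluated twice. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

int sample_calls = 0;	/* counts how often read_sample() runs */

/* Hypothetical function with a side effect: each call returns a new value. */
int read_sample(void)
{
	sample_calls++;
	return sample_calls * 10;
}

/* Unsafe: if read_sample() wins the comparison, it is called again. */
int unsafe_max(int floor)
{
	return MAX(read_sample(), floor);
}

/* Safe: evaluate once into a local, then compare plain values. */
int safe_max(int floor)
{
	int sample = read_sample();

	return MAX(sample, floor);
}

/* Self-check showing the difference between the two variants. */
void check_max_evaluation(void)
{
	sample_calls = 0;
	assert(unsafe_max(5) == 20 && sample_calls == 2);

	sample_calls = 0;
	assert(safe_max(5) == 10 && sample_calls == 1);
}
```

In the unsafe variant the first call returns 10, wins against 5, and is then evaluated a second time, so the caller receives 20 and the function has run twice.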
23 May, 2017 2 commits

Fix it so the backup slurmdbd will take control correctly. · 4f87dc53

Danny Auble authored May 23, 2017

This also fixes the fed_mgr on the backup slurmctld to start backup
correctly if the backup takes control more than once.

Bug 3827

4f87dc53

Fix Partition line in 'scontrol show node'. · e089a84f

Tim Shaw authored May 22, 2017

Previously, incorrect partitions and duplicated partition names
could be shown.

The array needs to be incremented by two, not one, as each
element is a start-end pair.

Bug 3793.

e089a84f
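
The start-end pair layout mentioned in e089a84f can be walked as in the sketch below. It assumes the array is terminated by -1 and that both bounds are inclusive; `index_in_ranges()` is a hypothetical helper, not a function from the source tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: 'inx' holds inclusive (start, end) index pairs
 * and is assumed to end with -1. Returns true if 'idx' falls inside
 * any pair.
 */
bool index_in_ranges(const int *inx, int idx)
{
	assert(inx != NULL);

	/* Step by two: inx[i] is a start, inx[i + 1] is the matching end. */
	for (size_t i = 0; inx[i] != -1; i += 2) {
		if (idx >= inx[i] && idx <= inx[i + 1])
			return true;
	}
	return false;
}
```

For the array `{0, 3, 7, 7, -1}` this matches indices 0 to 3 and 7. Stepping by one instead would also read (3, 7) as a pair and wrongly match index 5, which is the kind of incorrect or duplicated result the commit describes.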

22 May, 2017 1 commit
- Fix null dereference in sreport cluster utilization · c30629bc
  Brian Christiansen authored May 22, 2017
```
when configured with memory-leak-debug
```
  c30629bc
19 May, 2017 6 commits
- When doing a dlopen on liblua only attempt the version compiled against. · e75f6118
  Danny Auble authored May 19, 2017
```
Bug 2131
```
  e75f6118
- Add missing QOS read lock to backfill scheduler. · 5d948801
  Danny Auble authored May 19, 2017
```
Bug 3776
```
  5d948801
- node_features/knl_generic: Do not repeatedly log errors when trying to read · ea2a0d25
  Morris Jette authored May 19, 2017
```
KNL modes if not KNL system.

Bug 3825
```
  ea2a0d25
- Revert "node_features/knl_generic: Do not repeatedly log errors when trying to read" · 4e7794e7
  Danny Auble authored May 19, 2017
```
This reverts commit c2380520.
```
  4e7794e7
- node_features/knl_generic: Do not repeatedly log errors when trying to read · c2380520
  Danny Auble authored May 19, 2017
```
KNL modes if not KNL system.

Bug 3825
```
  c2380520
- node_features/knl_cray: Preserve non-KNL active features if slurmctld · bc484054
  Morris Jette authored May 19, 2017
```
reconfigured while node boot in progress.

Bug 3679
```
  bc484054
18 May, 2017 1 commit
- Fix minor typos in the documentation · 0bc04046
  Damien François authored May 18, 2017
```
bug 3822
```
  0bc04046
17 May, 2017 3 commits
- Calculate priority correctly when 'nice' is given. · a1168840
  Dominik Bartkiewicz authored May 17, 2017
```
Bug 3708
```
  a1168840
- NEWS for commit 79ff60f4 · 3618e592
  Danny Auble authored May 17, 2017
  
  3618e592
- Add support for lua5.3. · 7cc4d0d8
  Danny Auble authored May 17, 2017
```
In 17.11 (or another future version) we should move a lot of this common
code into a new lib.  The reason I didn't put these common changes
into common/xlua.c was because then I would have to link common to
liblua which I really didn't want to do.
```
  7cc4d0d8
16 May, 2017 4 commits
- Add missing locks to job_submit/pbs plugin when updating a job's · 5674dd74
  Dominik Bartkiewicz authored May 16, 2017
```
dependencies.

Bug 3708
```
  5674dd74
- Fix incorrect lock levels when testing when job will run or updating a job. · 1120d85a
  Tim Wickberg authored May 16, 2017
```
Bug 3772
```
  1120d85a
- Test if the node_bitmap on a job is NULL when testing if the job's nodes · e9ab5517
  Morris Jette authored May 15, 2017
```
are ready.  This will be NULL if a job was revoked while beginning.
```
  e9ab5517
- Add new burst_buffer function bb_g_job_revoke_alloc() to be executed · e6fa25fa
  Morris Jette authored May 15, 2017
```
if there was a failure after the initial resource allocation. Does not
release previously allocated resources.

Bug 3783

This is the initial patch that adds the stubs for the logic.  Outside of
that this patch really does nothing.
```
  e6fa25fa
15 May, 2017 3 commits
- node_features/knl_generic disable mode change unless RebootProgram · 60a2bd6f
  Morris Jette authored May 15, 2017
```
configured.
```
  60a2bd6f
- node_features/knl_generic - If a node is rebooted for a pending job, but · 8befe639
  Morris Jette authored May 15, 2017
```
fails to enter the desired NUMA and/or MCDRAM mode then drain the node and
requeue the job.

Bug 3785
```
  8befe639
- When rebooting a node and using the PrologFlags=alloc make sure the · 7f8ac296
  Tim Shaw authored May 15, 2017
```
prolog is run after the reboot.

Bug 3618
```
  7f8ac296
13 May, 2017 2 commits
- Remove log files from test20.12 · 7bb4d9a1
  Isaac Hartung authored May 12, 2017
```
Bug 3695
```
  7bb4d9a1
- knl_cray plugin: Change capmc parsing of mcdram_pct from string to number · 7bd276b1
  Morris Jette authored May 12, 2017
```
bug 3779
```
  7bd276b1
12 May, 2017 4 commits

knl_cray plugin: Log incomplete capmc output for a node · 80b27490

Morris Jette authored May 12, 2017

If capmc reports a node name, but not mcdram_cfg for the node, then
log the missing data rather than assume the value is zero and
report a value mismatch with cnselect.

80b27490

Prevent scontrol crash when operating on array and no-array jobs at once. · 006f7eeb

Alejandro Sanchez authored May 12, 2017

When requesting an operation on jobs, where the operation permits specifying
more than one job in the same request, and a job array appears before a
regular job (no-array job) in the list of jobs to operate with, the
job_array_resp_msg_t pointer was not properly NULL'ed and thus incorrectly
accessed when processing the no-array job. This fix prevents the crash from
happening in the following scontrol operations:

uhold, hold, suspend, requeue, requeuehold, update, release

when the same request has <array_jobid>,<non-array_jobid> in this order in
the job list to process.

Bug 3759

006f7eeb
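
The pattern behind 006f7eeb, resetting the response pointer for every job in the list, can be sketched like this. All types and functions are simplified, hypothetical stand-ins: `array_resp_t` plays the role of job_array_resp_msg_t, and `do_operation()` only fills the response for array jobs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request: one entry per job in the comma-separated list. */
typedef struct {
	int job_id;
	int is_array;
} job_req_t;

/* Simplified stand-in for job_array_resp_msg_t. */
typedef struct {
	int task_count;
} array_resp_t;

/* Hypothetical operation: allocates *resp only for array jobs. */
int do_operation(const job_req_t *req, array_resp_t **resp)
{
	if (req->is_array) {
		*resp = malloc(sizeof(**resp));
		if (*resp == NULL)
			return -1;
		(*resp)->task_count = 1;
	}
	return 0;
}

/* Returns how many array task results were reported. */
int process_jobs(const job_req_t *reqs, size_t n)
{
	int reported = 0;

	assert(reqs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* Reset per job so a stale pointer from the previous
		 * (array) job is never read for a regular job. */
		array_resp_t *resp = NULL;

		if (do_operation(&reqs[i], &resp) != 0)
			continue;
		if (resp != NULL) {
			reported += resp->task_count;
			free(resp);
		}
	}
	return reported;
}
```

With an array job followed by a regular job, only the first iteration sees a non-NULL response; the second starts from NULL and skips the array-specific handling.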

Enhance job expansion example · 02b790bc

Morris Jette authored May 12, 2017

Job expansion example in FAQ enhanced to demonstrate operation in
heterogeneous environments.
bug 2979

02b790bc

avoid starting scheduler on busy system after power cap change · e29e8511
Alejandro Sanchez authored May 12, 2017
```
Do not attempt to schedule jobs after changing the power cap if there are
    already many active threads.
```
e29e8511

11 May, 2017 1 commit
- Update NEWS for next release. · d65ed698
  Danny Auble authored May 10, 2017
  
  d65ed698
10 May, 2017 1 commit
- Return error when bad separator is given for scontrol update job licenses. · 521a574c
  Dominik Bartkiewicz authored May 10, 2017
```
Bug 3760
```
  521a574c
09 May, 2017 2 commits
- Revert "Return error when bad separator is given for scontrol update job licenses." · 36718220
  Danny Auble authored May 09, 2017
```
This reverts commit ecfd007f.
```
  36718220
- Return error when bad separator is given for scontrol update job licenses. · ecfd007f
  Dominik Bartkiewicz authored May 09, 2017
  
  ecfd007f