- 27 Mar, 2019 10 commits
-
-
Morris Jette authored
Remove reference to REQUEST_SIGNAL_PROCESS_GROUP in slurmstepd. It has been defunct since July 2013
-
Morris Jette authored
Sort both the expected and actual output for the GRES APIs so that record order is irrelevant. Depending upon the GRES plugins loaded (specifically gres/gpu plus gres/mps), the record order can vary, so the GRES records are sorted by File name to ensure they line up (the same position in both lists should refer to the same device file).
-
Alejandro Sanchez authored
-
Dominik Bartkiewicz authored
Bug 6750.
-
Danny Auble authored
-
Morris Jette authored
This logic could allocate to a job a GRES device with an availability count of zero.
-
Morris Jette authored
-
Morris Jette authored
This should only happen if there is flawed logic somewhere, but avoiding an abort is preferable.
-
Morris Jette authored
If the count of GPUs configured in slurm.conf and gres.conf differ and FastSchedule>=1, then the bitmap identifying the GPU allocation sent from slurmctld to slurmd will differ. Previously this resulted in CUDA_VISIBLE_DEVICES being set to NULL. Now it will be set correctly. Bug 6725.
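A minimal sketch of the mismatch described above, using hypothetical node and device names:
    # slurm.conf: the node advertises four GPUs
    FastSchedule=1
    GresTypes=gpu
    NodeName=node01 Gres=gpu:4 State=UNKNOWN
    # gres.conf on node01: only two device files are defined
    Name=gpu File=/dev/nvidia[0-1]
Previously a count mismatch like this left CUDA_VISIBLE_DEVICES set to NULL for the job; with this fix it is set correctly for the devices that were allocated.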
-
Morris Jette authored
If slurmd finds GRES with files and slurmctld can't use them (i.e. slurm.conf has a GRES count of 0), then avoid trying to create zero-length bitmaps in the GRES data structure. Bug 6725.
-
- 26 Mar, 2019 15 commits
-
-
Morris Jette authored
This makes the GRES bitmap size equal to the number of records for shared GRES (i.e. gres/mps); otherwise it is the GRES count (i.e. gres/gpu). Bug 6733.
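As a concrete illustration (counts are hypothetical): on a node with 2 GPUs and a gres/mps count of 100 per GPU, the gres/gpu bitmap has 2 bits (the GRES count), and the gres/mps bitmap now also has 2 bits (one per MPS record, i.e. one per underlying GPU) rather than 200.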
-
Morris Jette authored
If the device files for gres/gpu are out of order or grouped in an unordered fashion (e.g. "Name=gpu Files=/dev/nvidia[2,8,10]"), then split the gres/gpu records into one record per file and make sure the gres/mps records are in an identical order. This is required for matching gres/gpu and gres/mps records (one GPU can be allocated either as gres/gpu or as gres/mps, but not both, so we need to be able to match records in slurmctld).
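As a sketch of the case quoted above (device paths hypothetical), a gres.conf entry such as:
    Name=gpu Files=/dev/nvidia[2,8,10]
is now expanded into one gres/gpu record per device file (nvidia2, nvidia8 and nvidia10), with the corresponding gres/mps records kept in the same order so the two lists can be matched position by position in slurmctld.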
-
Morris Jette authored
Coverity CID 197447
-
Alejandro Sanchez authored
Bug 6710.
-
Marshall Garey authored
Bug 6590.
-
Morris Jette authored
Make some tests better able to work with CR_ONE_TASK_PER_CORE
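The configuration in question is the CR_ONE_TASK_PER_CORE flag of SelectTypeParameters in slurm.conf, e.g.:
    SelectTypeParameters=CR_Core,CR_ONE_TASK_PER_CORE
which allocates one task per core by default, a layout some tests did not previously account for.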
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
More testing required. This configuration is still disabled in select_cons_tres.c
-
Morris Jette authored
Add --ntasks-per-core option to execute line as needed
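For instance, a test's execute line might be extended along these lines (program name is a placeholder):
    srun --ntasks-per-core=1 -N1 -n2 ./prog
making the intended task-per-core placement explicit instead of relying on the site's default.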
-
Morris Jette authored
Without this change, test7.17 was failing
-
Morris Jette authored
Cosmetic change. No change in logic.
-
Morris Jette authored
Cosmetic change only.
-
Morris Jette authored
This can happen on node failure
-
Morris Jette authored
Use stored pointer rather than pointer to pointer for cleaner code. No change in logic.
-
- 25 Mar, 2019 9 commits
-
-
Broderick Gardner authored
Workaround involves specifying the index name when modifying an existing index. Bug 6303
-
Broderick Gardner authored
The correct use of correct_query and query is clarified here. This also saves the index name during parsing so correct_query can be set to the correct "drop <index name>" for a future query. Bug 6303
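A rough sketch of the kind of statement this enables, with hypothetical table and index names:
    ALTER TABLE some_table DROP INDEX old_index_name;
    ALTER TABLE some_table ADD INDEX old_index_name (col_a, col_b);
Saving the existing index name lets the drop half of the query be generated correctly before the index is recreated.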
-
Morris Jette authored
This relocates a variable in order to move some common code into one place rather than repeat it in several locations. No changes to functionality, but simpler/less code.
-
Albert Gil authored
Bug 6680
-
Morris Jette authored
The bitmap size should equal the GPU count, not the MPS count. Bug 6733.
-
Morris Jette authored
-
Morris Jette authored
No change in logic other than additional logs
-
Morris Jette authored
-
Felip Moll authored
Note that using this feature in non-flat networks is not supported since the sender address is set depending on the hostname resolution on each node. Bug 6007
-
- 22 Mar, 2019 6 commits
-
-
Brian Christiansen authored
topo_cnt == the number of GRES types or GRES with different topology. Bug 6725
-
Brian Christiansen authored
Bug 6725
-
Morris Jette authored
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
-
Marshall Garey authored
With SchedulerParameters=defer and Prolog scripts and/or SPANK plugins that take some time, jobs weren't starting within the 2 seconds that tests 2.18 and 2.19 expected, causing these tests to fail. Bug 6670.
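The setting involved is the defer flag in slurm.conf:
    SchedulerParameters=defer
With defer, jobs are not considered for scheduling immediately at submit time, so extra Prolog or SPANK latency can push the actual start beyond a fixed two-second wait in a test.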
-