Commits · 341d0d6e69e1e793dbb29284353998cdcafa5d13 · Manuel G. Marciani / ces_slurm_simulator

21 May, 2019 22 commits
- Avoid slurmctld abort on zero size jobs · 341d0d6e
  Morris Jette authored May 16, 2019
```
Jobs with zero nodes/CPUs are permitted to create and destroy
persistent burst buffers.

Bug 7034.
```
  341d0d6e
- Avoid accounting error on zero size jobs · 66dd627e
  Morris Jette authored May 16, 2019
```
Bug 7034.
```
  66dd627e
- Add 18.08.8 NEWS to 19.05.0rc2 NEWS · 09ec07ef
  Brian Christiansen authored May 21, 2019
  
  09ec07ef
- Merge remote-tracking branch 'origin/slurm-18.08' into slurm-19.05 · f69f1a82
  Brian Christiansen authored May 21, 2019
  
  f69f1a82
- Correctly set unlimited sched_job_limit · 69621444
  Dominik Bartkiewicz authored May 06, 2019
```
unlimited could get overwritten with default queue depth preventing the
whole queue from being looked at -- especially in a high-throughput
environment.

Bug 6822

Co-authored-by: Morris Jette <jette@schedmd.com>
```
  69621444
- Minor formatting issues. · f8ba5e5d
  Danny Auble authored May 21, 2019
```
Bug 5562
```
  f8ba5e5d
- Change code to match cons_res in commit b4d7de48. · 6a166c50
  Danny Auble authored May 21, 2019
```
Bug 5562
```
  6a166c50
- Move code into if statement since it is only used there. · 36b59335
  Danny Auble authored May 21, 2019
```
Bug 5562
```
  36b59335
- Merge remote-tracking branch 'origin/slurm-18.08' into slurm-19.05 · 6256a10a
  Danny Auble authored May 21, 2019
  
  6256a10a
- cons_res/job_test - fix to consider a node's current allocated memory. · b4d7de48
  Alejandro Sanchez authored Apr 11, 2019
```
Node memory overallocation wouldn't be properly detected since we would
just be interpreting the available memory as RealMemory - MemSpecLimit,
ignoring other jobs' memory usage.

Bug 5562.
```
  b4d7de48
- cons_res/job_test - prevent a job from overallocating a node memory. · 6871185a
  Alejandro Sanchez authored Apr 11, 2019
```
This compares a job's memory request against each selected node's available
memory, interpreting the latter for now as RealMemory - MemSpecLimit.

Bug 5562.
```
  6871185a
- cons_res/job_test - non-functional code restructuring. · 406f343a
  Alejandro Sanchez authored Apr 11, 2019
```
Place all three memory cases (per cpu, per node and all node memory) in
a single loop, since all three cases need to traverse all job_resources
selected nodes. Preparation for a follow-up commit that contains the
real fix.

Bug 5562.
```
  406f343a
- Remove some duplicate code in test suite · 38daa7a9
  Morris Jette authored May 21, 2019
```
Move common (or similar) logic to globals and remove it from
the individual tests.
```
  38daa7a9
- Merge branch 'slurm-18.08' into slurm-19.05 · 39990536
  Tim Wickberg authored May 21, 2019
  
  39990536
- slurm.spec-legacy - package two additional plugins. · 496358f9
  Tim Wickberg authored Apr 29, 2019
```
Add handling for acct_gather_energy/xcc and acct_gather_profile/influxdb.

Bug 6829.
```
  496358f9
- Docs - add documentation for nss_slurm. · a8b9204b
  Tim Wickberg authored May 21, 2019
```
Bug 5773.
```
  a8b9204b
- Simplify the code · 90200513
  Danny Auble authored May 21, 2019
```
No functional change.

Bug 6508
```
  90200513
- Fix wrongly setting start_time to 0 for multi-part jobs. · 457e7517
  Dominik Bartkiewicz authored May 21, 2019
```
Bug 6508
```
  457e7517
- Fix DefMemPer[CPU|Node] assignment on multi-partition job requests. · 8a1e5a52
  Alejandro Sanchez authored May 09, 2019
```
Previously when no memory was explicitly requested the job was assigned
the DefMemPer[CPU|Node] from the first partition in the list (or the
cluster-wide value if the partition wasn't configured with it), even
when evaluating against a different partition.

Bug 6950.
```
  8a1e5a52
- Remove incorrect comment. · e080c8d8
  Dominik Bartkiewicz authored May 17, 2019
```
No functional change

Bug 5303
```
  e080c8d8
- Fix segfault if enable_nss_slurm is not set but REQUEST_GETGR is called. · cdcd50b1
  Tim Wickberg authored May 20, 2019
```
Bug 7072
```
  cdcd50b1
- Continuation of 7dcde848 · 94d0a365
  Dominik Bartkiewicz authored May 16, 2019
```
Bug 6845
```
  94d0a365
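
The memory check described in 6871185a and b4d7de48 above can be sketched roughly as follows. This is a minimal illustration, assuming a node's available memory is RealMemory minus MemSpecLimit minus whatever other jobs already hold; the function and parameter names are hypothetical and are not taken from the Slurm source, which is written in C.

```python
def available_memory(real_memory, mem_spec_limit, allocated_memory=0):
    """Memory (in MB) a new job could still claim on a node.

    Hypothetical helper: real_memory and mem_spec_limit mirror the
    RealMemory and MemSpecLimit node settings; allocated_memory is the
    memory already assigned to other jobs on the node.
    """
    return real_memory - mem_spec_limit - allocated_memory


def overallocates(job_request, real_memory, mem_spec_limit, allocated_memory=0):
    """True if the job's per-node request exceeds what the node has left."""
    return job_request > available_memory(
        real_memory, mem_spec_limit, allocated_memory
    )
```

Leaving `allocated_memory` at 0 corresponds to the interim check in 6871185a; passing the memory held by other jobs corresponds to the follow-up in b4d7de48. For example, a 30000 MB request fits an idle node with 64000 MB RealMemory and 4000 MB MemSpecLimit, but not once another job holds 40000 MB there.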
20 May, 2019 5 commits

Fix test to work in front-end mode · 72b1774b
Morris Jette authored May 20, 2019

72b1774b
Modify test to work in front-end mode · 4eb243ee
Morris Jette authored May 20, 2019

4eb243ee

Log ID for individual sub-tests · 4983907a

Morris Jette authored May 20, 2019

Log the ID of individual sub-tests in order to more easily identify
which sub-test failed rather than having to scan and compare the
various execute lines in the tests.

4983907a

Disable portion of test in front-end mode · 04b4dde0
Morris Jette authored May 20, 2019
```
A batch job will run on the front-end node, not an assigned compute node
```
04b4dde0

Change test to better match sacct behavior · ca7d1d0a

Morris Jette authored May 20, 2019

The sacct command in version 19.05, when a job ID is specified, will
find all instances of that job ID run at any time. That means if
the job ID numbers wrap around, this test will always fail. This
adds a start time of 00:00 (midnight of the current day) to the sacct
command to avoid problems with wrapping job IDs and make this test
work more like it did in version 18.08. Note this test does have
a very tiny window for failure if the test program runs just before
midnight and the sacct command to view its state runs just after
midnight. Given that the entire test only runs for a minute, that
is unlikely in practice.

ca7d1d0a
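
The change in ca7d1d0a amounts to restricting the sacct query by start time. A minimal sketch of building such a command line is shown below; the helper name is hypothetical, the actual test is not written in Python, and the choice of output columns is an assumption made only for illustration.

```python
def sacct_job_state_args(job_id, start_time="00:00"):
    """Build an sacct argument list limited to records since start_time.

    Hypothetical helper. Uses the sacct options --jobs, --starttime,
    --noheader and --format; "00:00" means midnight of the current day.
    """
    return [
        "sacct",
        f"--jobs={job_id}",
        f"--starttime={start_time}",
        "--noheader",
        "--format=State",  # assumed column choice, for illustration only
    ]
```

The resulting list could be passed to `subprocess.run` on a system with Slurm installed. Without the start time, a lookup by job ID may also return older jobs that reused the same ID after a wrap-around.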

18 May, 2019 1 commit
- Fix bad wording in log message · 8fdcb6cd
  Morris Jette authored May 17, 2019
```
Change "Could not for..." to "Could not find ..."
```
  8fdcb6cd
17 May, 2019 8 commits

Make test job_submit.lua a NO-OP except for selected jobs · 22d3bfd0

Morris Jette authored May 17, 2019

Do not affect non-test jobs with the test Lua script to avoid
impacting jobs outside of this specific test.

Bug 7050

22d3bfd0

Always revert configuration at test end for test7.20. · c0d25394
Nate Rini authored May 17, 2019
```
Bug 7050.
```
c0d25394

Improve logic to detect if gres/gpu spans sockets · a2a9c7de

Morris Jette authored May 17, 2019

Previous logic only checked the first gpu record found, which
is not going to reliably work if the first gpu type is on one
socket and the next gpu type is on a different socket or itself
spans sockets.

a2a9c7de
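
The difference between the previous and the improved check in a2a9c7de can be sketched as follows. This is only an illustration under assumptions: each gpu record is modelled as a set of socket indices, and "spans sockets" is taken to mean that the gpu records together touch more than one socket. Neither function mirrors the actual Slurm C code.

```python
def gpus_span_sockets_first_only(gpu_records):
    """Previous behaviour (as described): look at the first record only."""
    first = next(iter(gpu_records), set())
    return len(set(first)) > 1


def gpus_span_sockets(gpu_records):
    """Improved behaviour (sketch): consider every gpu record."""
    sockets = set()
    for record_sockets in gpu_records:
        sockets |= set(record_sockets)  # accumulate sockets across records
    return len(sockets) > 1
```

With one gpu type on socket 0 and another on socket 1, i.e. `[{0}, {1}]`, the first-record check reports no spanning while the all-records check does.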

Correct index used in building GRES socket string · 4f92fb3e

Morris Jette authored May 17, 2019

The wrong variable was clearly being used, resulting in a node's
"gres" string not containing the proper socket identification for
GRES bound to sockets.

4f92fb3e

Enhance test with job name and mpi option · 4b1f0d6e

Morris Jette authored May 17, 2019

This change adds a job name to all tests spawned by the test.
It also explicitly sets the MPI type to none. This is required
by some of the tests if using OpenMPI in multi-slurmd mode.
See note in test1.88 for full description of OpenMPI limitations
in this Slurm mode.

4b1f0d6e

Merge branch 'slurm-18.08' into slurm-19.05 · e35e93e5
Tim Wickberg authored May 16, 2019

e35e93e5
Fix NEWS from previous commit. · 438ffc1c
Tim Wickberg authored May 16, 2019
```
This is select/cons_res, not select/cons_tres.
```
438ffc1c
Only allocate 1 CPU per node with the --overcommit and --nodelist options · 46197135
Morris Jette authored May 10, 2019
```
Previous select/cons_res logic would allocate one CPU per task on the node

Bug 6981
```
46197135

16 May, 2019 4 commits

Cosmetic change only · e78b6cfa
Dominik Bartkiewicz authored May 16, 2019
```
Bug 6221
```
e78b6cfa
Only allocate 1 CPU per node with the --overcommit option · dd7775ef
Morris Jette authored May 10, 2019
```
Previous select/cons_tres logic would allocate one CPU per task on the node

Bug 6981
```
dd7775ef

Modify task layout with --overcommit · 42d7e312

Morris Jette authored May 10, 2019

Modify task layout with --overcommit option plus a heterogeneous job
allocation so that a cyclic task distribution can start happening before
all CPUs on all nodes are fully allocated. The number of tasks per node
will be unchanged from the previous algorithm, but tasks will be distributed
in a cyclic fashion first and then extra tasks placed on nodes with more
CPUs. Previously all CPUs would be fully allocated in a cyclic fashion,
then excess tasks distributed evenly across all allocated nodes.

Bug 6981

42d7e312

Fix hetjob MPI test for multi-slurmd and OpenMPI · dd4ac0ae

Morris Jette authored May 16, 2019

OpenMPI can only run in multi-slurmd mode if no more than one node has
more than one task. Individual nodes with more than one task use shared
memory for communications and if more than one node is doing that, their
shared memory use collides. That means these MPI tests will work if five
nodes or more are available, otherwise some tests will fail. See test1.117
for a variation of this test that will work with OpenMPI and multi-slurmd
mode.

dd4ac0ae
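
The OpenMPI constraint stated in dd4ac0ae can be expressed as a simple predicate. The sketch below is hypothetical and not part of the test suite; it assumes the task layout is given as a list of task counts, one per node.

```python
def openmpi_multi_slurmd_ok(tasks_per_node):
    """True if at most one node runs more than one task.

    Hypothetical helper encoding the stated limitation: nodes with more
    than one task use shared memory, and in multi-slurmd mode two such
    nodes would collide.
    """
    multi_task_nodes = sum(1 for tasks in tasks_per_node if tasks > 1)
    return multi_task_nodes <= 1
```

For example, a layout of `[2, 1, 1]` satisfies the constraint, while `[2, 2, 1]` does not.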