Commits · 9b4d654125e2339c0e2954132d2fc26212be5ca5 · Manuel G. Marciani / ces_slurm_simulator

23 May, 2019 3 commits

Morris Jette authored May 23, 2019

An MPI problem caused a test failure for me. Since the test does not run
an MPI job, just disable the MPI plugin.

9b4d6541

Fix coverity warning · 289b2c66
Brian Christiansen authored May 23, 2019
```
CID 198639

Continuation of 8a1e5a52

Bug 6950
```
Replace "SLURM" with "Slurm" · 6d5fdc78
Morris Jette authored May 22, 2019
```
Mostly in copyright notices, but also in some comments and documents.
Bug 7090
```

22 May, 2019 16 commits
- Merge branch 'slurm-18.08' into slurm-19.05 · 46cdefae
  Tim Wickberg authored May 22, 2019
- Clarify how SLURM_SUBMIT_DIR is set in salloc/sbatch/srun man pages. · 69d5d94b
  Ben Roberts authored May 22, 2019
```
Bug 7092.
```
- Update error message to be more descriptive if port selection fails. · 13230170
  Tim Wickberg authored May 22, 2019
```
This can happen if SrunPortRange is set too small, especially on shared
login nodes launching multiple large-scale srun processes.
```
- Update sacct man page · 1a563823
  Ben Roberts authored Apr 30, 2019
```
Bug 6916
```
- Update Elastic Computing docs with TCPTimeout info · c06b1c27
  Ben Roberts authored May 10, 2019
```
Bug 6995
```
- Add 19.05 NEWS line for e7d4d593 from 18.08 · 5bbc2543
  Brian Christiansen authored May 22, 2019
```
Bug 6467
```
- Merge remote-tracking branch 'origin/slurm-18.08' into slurm-19.05 · 55a7dd97
  Brian Christiansen authored May 22, 2019
- Use correct rank for cloud stepd's. · e7d4d593
  Marshall Garey authored Apr 18, 2019
```
Job steps that run on cloud nodes and use the alias_list (in other
words, when SlurmctldParameters=cloud_dns is not in slurm.conf) all talk
directly back to the slurmctld. To make that happen, we set the parent
rank of each stepd to -1. However, we also set the rank of each stepd to
0. This meant that when each stepd sent a REQUEST_STEP_COMPLETE RPC to
the slurmctld, every stepd told slurmctld to clean up node 0 in the step
allocation. As a result, multi-node step allocations weren't cleaned up
after the steps completed, which caused subsequent job steps to hang.
The step allocations would only be cleaned up properly at the end of the
job.

Ensure that each stepd uses the correct rank so that job steps are
properly cleaned up after each step completes.

Bug 6467.
```
- Copy two 18.08 NEWS entries to 19.05. · a7084228
  Alejandro Sanchez authored May 22, 2019
```
They were associated with these two commits:

b4d7de48
6871185a

Bug 5562.
```
- Merge branch 'slurm-18.08' into slurm-19.05 · d85f2d39
  Alejandro Sanchez authored May 22, 2019
- Move two NEWS entries to appropriate maintenance release. · 09a7da34
  Alejandro Sanchez authored May 22, 2019
```
They were associated with these two commits:

b4d7de48
6871185a

Bug 5562.
```
- cons_tres/dist_tasks - fix variable usage in cyclic distribution. · abb732c8
  Morris Jette authored May 22, 2019
```
Bug 6998.
```
- Update dwstat test · c5482f48
  Morris Jette authored May 21, 2019
```
Modify "scontrol show dwstat" test to work when there are no active
sessions, configurations, etc.
```
- Correction to test suite · 8dde917c
  Morris Jette authored May 21, 2019
```
Correct some logic in commit 38daa7a9. The variable "file_in"
did not exist in some of the places the function was called from.
```
- Make test work on fat node · 0831b2a2
  Morris Jette authored May 21, 2019
```
Modify test to work on a cray/kachina system when allocated a KNL node.
```
- Make test work with fat (KNL) nodes · 7c0b965a
  Morris Jette authored May 21, 2019
```
Test broke on cray/kachina when allocated a KNL node.
```
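A minimal sketch of the bug fixed in e7d4d593, assuming a simplified model in which slurmctld keeps one "still pending" flag per node of a step allocation and clears the flag at the index given by the rank in each REQUEST_STEP_COMPLETE. The names `step_alloc_t`, `step_complete()`, `simulate_step()` and the `MAX_STEP_NODES` cap are hypothetical and are not Slurm's actual data structures. When every stepd reports rank 0, only node 0 is ever cleared, so a multi-node step never looks finished, while a single-node step is unaffected.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STEP_NODES 64 /* hypothetical cap for this sketch only */

/* Hypothetical controller-side view of a step allocation:
 * one flag per node, true while that node has not reported completion. */
typedef struct {
	int node_cnt;
	bool pending[MAX_STEP_NODES];
} step_alloc_t;

static void step_alloc_init(step_alloc_t *step, int node_cnt)
{
	assert(node_cnt > 0 && node_cnt <= MAX_STEP_NODES);
	step->node_cnt = node_cnt;
	for (int i = 0; i < node_cnt; i++)
		step->pending[i] = true;
}

/* Models handling of one completion message: clean up the node
 * whose index matches the rank the stepd reported. */
static void step_complete(step_alloc_t *step, int rank)
{
	assert(rank >= 0 && rank < step->node_cnt);
	step->pending[rank] = false;
}

/* True once every node in the step allocation has been cleaned up. */
static bool step_done(const step_alloc_t *step)
{
	for (int i = 0; i < step->node_cnt; i++)
		if (step->pending[i])
			return false;
	return true;
}

/* Every stepd of a node_cnt-node step reports completion.
 * buggy == true: every stepd reports rank 0 (the old behavior).
 * buggy == false: each stepd reports its own node index. */
static bool simulate_step(int node_cnt, bool buggy)
{
	step_alloc_t step;

	step_alloc_init(&step, node_cnt);
	for (int node = 0; node < node_cnt; node++)
		step_complete(&step, buggy ? 0 : node);
	return step_done(&step);
}
```

For example, `simulate_step(4, true)` returns false because nodes 1 to 3 stay pending, while `simulate_step(4, false)` and `simulate_step(1, true)` both return true.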
21 May, 2019 21 commits
- job_preempt_check() - consider only jobs in an overlapping partition · 2d309f2e
  Dominik Bartkiewicz authored May 06, 2019
```
Bug 6822
```
- Change index use for lower overhead and better clarity · 23c827d9
  Moe Jette authored May 21, 2019
```
Bug 7061
```
- Make functions static instead of extern. No logic changes. · 3c208630
  Danny Auble authored May 21, 2019
```
Bug 7061
```
- Move common code into central location. No logic change. · b9de39f8
  Danny Auble authored May 21, 2019
```
Bug 7061

Co-authored-by: Morris Jette <jette@schedmd.com>
```
- Merge branch 'slurm-18.08' into slurm-19.05 · c9434dd8
  Tim Wickberg authored May 21, 2019
- Prevent use of uninitialized variable · 1244dc98
  Morris Jette authored Apr 25, 2019
```
Error reported by Clang

Cherry pick to 18.08.

Bug 6996.
```
- Avoid slurmctld abort on zero size jobs · 341d0d6e
  Morris Jette authored May 16, 2019
```
Jobs with zero nodes/CPUs are permitted to create and destroy
persistent burst buffers.

Bug 7034.
```
- Avoid accounting error on zero size jobs · 66dd627e
  Morris Jette authored May 16, 2019
```
Bug 7034.
```
- Add 18.08.8 NEWS to 19.05.9rc2 NEWS · 09ec07ef
  Brian Christiansen authored May 21, 2019
- Merge remote-tracking branch 'origin/slurm-18.08' into slurm-19.05 · f69f1a82
  Brian Christiansen authored May 21, 2019
- Correctly set unlimited sched_job_limit · 69621444
  Dominik Bartkiewicz authored May 06, 2019
```
An unlimited value could get overwritten with the default queue depth,
preventing the whole queue from being examined, especially in a
high-throughput environment.

Bug 6822

Co-authored-by: Morris Jette <jette@schedmd.com>
```
- Minor formatting issues. · f8ba5e5d
  Danny Auble authored May 21, 2019
```
Bug 5562
```
- Change code to match cons_res in commit b4d7de48. · 6a166c50
  Danny Auble authored May 21, 2019
```
Bug 5562
```
- Move code into if statement since it is only used there. · 36b59335
  Danny Auble authored May 21, 2019
```
Bug 5562
```
- Merge remote-tracking branch 'origin/slurm-18.08' into slurm-19.05 · 6256a10a
  Danny Auble authored May 21, 2019
- cons_res/job_test - fix to consider a node's current allocated memory. · b4d7de48
  Alejandro Sanchez authored Apr 11, 2019
```
Node memory overallocation wasn't properly detected, since we were
just interpreting the available memory as RealMemory - MemSpecLimit,
ignoring other jobs' memory usage.

Bug 5562.
```
- cons_res/job_test - prevent a job from overallocating a node's memory. · 6871185a
  Alejandro Sanchez authored Apr 11, 2019
```
This compares a job's memory request against each selected node's
available memory, for now interpreting the latter as RealMemory - MemSpecLimit.

Bug 5562.
```
- cons_res/job_test - non-functional code restructuring. · 406f343a
  Alejandro Sanchez authored Apr 11, 2019
```
Place all three memory cases (per-CPU, per-node and all-node memory) in
a single loop, since all three cases need to traverse all of the
job_resources selected nodes. This is preparation for a follow-up commit
that contains the real fix.

Bug 5562.
```
- Remove some duplicate code in test suite · 38daa7a9
  Morris Jette authored May 21, 2019
```
Move common (or similar) logic to globals and remove it from
the individual tests.
```
- Merge branch 'slurm-18.08' into slurm-19.05 · 39990536
  Tim Wickberg authored May 21, 2019
- slurm.spec-legacy - package two additional plugins. · 496358f9
  Tim Wickberg authored Apr 29, 2019
```
Add handling for acct_gather_energy/xcc and acct_gather_profile/influxdb.

Bug 6829.
```