Commits · 27bf7328a79d264369b5a774680b68504285fe39 · Manuel G. Marciani / ces_slurm_simulator

20 Aug, 2018 40 commits
- cons_tres: resolve questionable variable value · 27bf7328
  Morris Jette authored Aug 15, 2018
```
The NULL value logged was not unexpected for an
unused partition (initial state).
```
- cons_tres: improve thread specialization logic · 0cc27b06
  Morris Jette authored Aug 15, 2018
```
if each core has one thread, then remove whole cores from the
job allocation.
bug 5567
```
- cons_tres: get core specialization logic working · 78c72689
  Morris Jette authored Aug 15, 2018
  
- cons_tres: updates to comments only · b1a6f9b5
  Morris Jette authored Aug 15, 2018
  
- Fix new test · 5711dd84
  Morris Jette authored Aug 14, 2018
```
original was using env vars from batch host rather than using newly
set env vars from tasks spawned by srun
```
- Add GPU allocation stress test · 140f4413
  Morris Jette authored Aug 14, 2018
  
- Fix some clang-reported problems · 0e4a9a33
  Morris Jette authored Aug 14, 2018
```
no real change in functionality
```
- Add new cons_tres regression test · 9ce54b36
  Morris Jette authored Aug 14, 2018
```
This schedules GPUs and craynetwork GRES for a single job
```
- cons_tres logic fixes · 2e6ed363
  Morris Jette authored Aug 14, 2018
```
Fix some anomalies when scheduling multiple GRES for a single job
(e.g. GPUs plus craynetwork).
```
- cons_tres fix · 6d98c2cb
  Morris Jette authored Aug 13, 2018
```
partially revert commit c6888db6d, which caused test39.5 to fail
with some configurations
```
- add tests for GRES lacking node topology · 244d497a
  Morris Jette authored Aug 10, 2018
  
- add job ID with test number to identify possible failures · c6a63b5f
  Morris Jette authored Aug 10, 2018
  
- Fix some cons_tres GRES bugs · a2c9f0e5
  Morris Jette authored Aug 10, 2018
```
Previous logic had an invalid pointer that could result in a segv.
Previous logic failed to properly allocate tres-per-node or tres-per-job
without defining gres topology information
```
- modify scontrol show job gres info · 38ddfc1e
  Morris Jette authored Aug 10, 2018
```
rather than just report GRES_IDX (index) info, report GRES count
info as well, since it can vary from node-to-node with cons_tres
```
- cons_tres work for gres without topology · 8c235912
  Morris Jette authored Aug 10, 2018
```
This fixes a couple of bugs related to allocating GRES when
there is no associated topology, including adding support for
the --tres-per-job option
```
- add srun --tres-per-job option · eb5ef3e4
  Morris Jette authored Aug 10, 2018
```
for current cons_tres testing and future use
```
- cons_gres: fix GRES string build for new options · 4d2ea15f
  Morris Jette authored Aug 10, 2018
  
- cons_gres: enforce configured gres defaults · 1bdec16c
  Morris Jette authored Aug 09, 2018
```
Enforce configured default values for DefMemPerGPU
and DefCPUPerGPU
```
- cons_tres - enhance mem-per-gpu logic · c332b480
  Morris Jette authored Aug 09, 2018
```
spread a job over multiple nodes if needed to
satisfy mem-per-gpu specification
```
- cons_tres: first pass on mem-per-gpu logic · 796d8d2d
  Morris Jette authored Aug 08, 2018
  
- cons_tres: --cpus-per-gpu enhancements · b717dc96
  Morris Jette authored Aug 07, 2018
```
Force job to span nodes when appropriate
```
- Add some --gpu-freq logic including test · 7f86e971
  Morris Jette authored Aug 07, 2018
```
The last bit of logic to talk with the GPUs is still needed
```
- cons_tres: pass info for cpus per core calculation · 61f8c922
  Morris Jette authored Aug 06, 2018
  
- cons_tres: harden logic for missing bitmap · 6994b9cb
  Morris Jette authored Aug 06, 2018
  
- cons_tres: add --gpu-bind infrastructure · 68b47156
  Morris Jette authored Aug 06, 2018
```
this includes a new regression test
```
- cons_tres: flesh out cpus-per-gpu logic more · 8230d674
  Morris Jette authored Aug 02, 2018
  
- cons_tres: first cut at cpus-per-gres support · 4728168c
  Morris Jette authored Aug 02, 2018
  
- cons_tres: fix some logic to spread job across sockets · b2379f70
  Morris Jette authored Aug 02, 2018
```
This fixes a couple of bugs in commit 0e4874e19490a24
1. round up core count for job as needed (i.e. if job needs 3
   CPUs per task and there are 2 CPUs per core, the job needs
   2 cores rather than 1)
2. fix some bad logic when the count of available cores on socket 0 is 0
3. set exit_code to 1 on an expect test failure (previously not set)
```
- cons_tres correction · 155afb83
  Morris Jette authored Aug 01, 2018
```
correction to logic for explicit hostname specification on job submit
bug introduced in commit 0e4874e19490a24fb54961ef89176a3e8f55952b
```
- try to allocate cpus on all sockets as gpus · 5433e2a6
  Morris Jette authored Aug 01, 2018
```
also add a regression test for this scheduling logic
bug 4584
```
- Add new job gpu test · 2096f797
  Morris Jette authored Aug 01, 2018
```
Test that the desired GPU count is actually allocated to a job based upon
--gpus, --gpus-per-node, --gpus-per-socket, and --gpus-per-task
options
```
- cons_tres: fix logic determining available CPUs · 29b54c67
  Morris Jette authored Aug 01, 2018
  
- cons_tres: tres-per-task dev work · 3de7199c
  Morris Jette authored Jul 31, 2018
  
- modify test to avoid vestigial output file · 90d065b7
  Morris Jette authored Jul 31, 2018
  
- cons_tres: first cut at gpus-per-task logic · 48278b9c
  Morris Jette authored Jul 30, 2018
  
- gres underflow error fix · eb1ff338
  Morris Jette authored Jul 30, 2018
```
this bug exists with all select plugins. if a job has been allocated
gres and the gres have either topology or type information and the
slurmctld daemon restarts (while the job is running), then when the
job ends gres underflow errors will be generated. the problem is
due to the slurmctld not having gres topology or type information
available at restart time, so it cannot update counters. the
overhead of updating those counters at node registration time is
high, so we just avoid generating the errors in this case.
note: this bug is not specific to cons_tres and exists in
earlier versions of slurm.
```
- cons_tres: fix gres-per-job logic bug · 4aed2aa9
  Morris Jette authored Jul 27, 2018
  
- cons_tres: heterogeneous GRES steps starting to function · 4ef99d9f
  Morris Jette authored Jul 27, 2018
```
if the step does not explicitly specify a gres-per-node value,
then the step will be allocated gres identical to that allocated
to the job
```
- cons_tres: refactor some step logic for new tres job options · d7d61399
  Morris Jette authored Jul 27, 2018
  
- fix test: using wrong variable name · 60fbe9e3
  Morris Jette authored Jul 27, 2018
  
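
The core-count rounding described in commit b2379f70 (3 CPUs per task with 2 CPUs per core requires 2 cores, not 1) can be illustrated with a minimal sketch. The function name `cores_needed` and its parameters are hypothetical and are not taken from the Slurm source; the sketch only shows the round-up arithmetic, under the assumption that every core provides the same number of CPUs.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the Slurm source): number of whole cores
 * required to supply `cpus` CPUs when each core provides `cpus_per_core`
 * CPUs. Plain integer division would truncate (3 / 2 == 1), so any
 * remainder adds one more core.
 */
uint32_t cores_needed(uint32_t cpus, uint32_t cpus_per_core)
{
	if (cpus_per_core == 0)
		return 0;	/* assumption: treat an unconfigured core as providing nothing */

	/* Quotient plus one if there is a remainder; avoids overflow of cpus + n - 1 */
	return cpus / cpus_per_core + (cpus % cpus_per_core != 0 ? 1 : 0);
}
```

With the values from the commit message, `cores_needed(3, 2)` evaluates to 2, while a request that divides evenly, such as `cores_needed(4, 2)`, also evaluates to 2.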