20 Aug, 2018 (40 commits)
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Improve some log messages; correct comment formatting to follow the Linux kernel standard.
-
Morris Jette authored
This can happen if a job releases its resources to expand another job, triggering a second epilog and slurmctld errors about CPU count underflows.
-
Morris Jette authored
-
Morris Jette authored
State recovery for jobs with GRES and topology information was previously incorrect.
-
Morris Jette authored
The NULL value logged was not unexpected for an unused partition (initial state).
-
Morris Jette authored
If each core has one thread, then remove whole cores from the job allocation. Bug 5567.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
The original logic used environment variables from the batch host rather than the newly set environment variables from tasks spawned by srun.
-
Morris Jette authored
-
Morris Jette authored
No real change in functionality.
-
Morris Jette authored
This schedules GPUs and craynetwork GRES for a single job
-
Morris Jette authored
Fix some anomalies when scheduling multiple GRES for a single job (e.g. GPUs plus craynetwork).
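For context, a job combining GPU and craynetwork GRES in one request looks roughly like the sketch below; the counts and application name are illustrative assumptions, not taken from these commits.

    # Request two GRES types in a single job step (illustrative counts)
    srun -N1 --gres=gpu:2,craynetwork:1 ./my_app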
-
Morris Jette authored
Partially revert commit c6888db6d, which caused test39.5 to fail with some configurations.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
The previous logic had an invalid pointer that could result in a segfault. It also failed to properly allocate tres-per-node or tres-per-job when no GRES topology information was defined.
-
Morris Jette authored
Rather than just reporting GRES_IDX (index) information, report GRES count information as well, since it can vary from node to node with cons_tres.
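The per-node variation mentioned here arises when nodes carry different GRES counts. A minimal slurm.conf sketch of such a heterogeneous layout under cons_tres; node names, CPU counts, and GPU counts are assumptions.

    # Heterogeneous GPU counts across nodes under select/cons_tres
    SelectType=select/cons_tres
    GresTypes=gpu
    NodeName=tux[0-1] Gres=gpu:4 CPUs=32 State=UNKNOWN
    NodeName=tux[2-3] Gres=gpu:2 CPUs=32 State=UNKNOWN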
-
Morris Jette authored
This fixes a couple of bugs related to allocating GRES when there is no associated topology, including adding support for the --tres-per-job option
-
Morris Jette authored
For current cons_tres testing and future use.
-
Morris Jette authored
-
Morris Jette authored
Enforce configured default values for DefMemPerGPU and DefCPUPerGPU
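One way these defaults can be expressed in slurm.conf is sketched below; the partition name, node list, and values are assumptions (slurm.conf keywords are case-insensitive, so DefCpuPerGPU and DefCPUPerGPU name the same parameter).

    # Defaults applied when a GPU job omits explicit per-GPU memory/CPU requests
    # (partition name, nodes, and values are assumptions)
    PartitionName=gpu Nodes=tux[0-3] DefMemPerGPU=8192 DefCpuPerGPU=4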
-
Morris Jette authored
Spread a job over multiple nodes if needed to satisfy the --mem-per-gpu specification.
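A sketch of the kind of request that can force this spreading; the GPU count and memory size are illustrative assumptions, and job.sh is a placeholder script.

    # 4 GPUs at 32 GB each may not fit on one node, so the allocation
    # can span nodes to satisfy the per-GPU memory requirement
    sbatch --gpus=4 --mem-per-gpu=32G job.sh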
-
Morris Jette authored
-
Morris Jette authored
Force job to span nodes when appropriate
-
Morris Jette authored
The last bit of logic to communicate with the GPUs is still needed.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This includes a new regression test.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This fixes a few bugs in commit 0e4874e19490a24:
1. Round up the core count for a job as needed (i.e. if a job needs 3 CPUs per task and there are 2 CPUs per core, the job needs 2 cores rather than 1).
2. Fix some bad logic when the count of available cores on socket 0 is 0.
3. Set exit_code to 1 on an expect test failure (previously it was not set).
-
Morris Jette authored
Correct the logic for explicit hostname specification at job submission; the bug was introduced in commit 0e4874e19490a24fb54961ef89176a3e8f55952b.
-
Morris Jette authored
Also add a regression test for this scheduling logic. Bug 4584.
-
Morris Jette authored
Verify that the desired GPU count is actually allocated to a job based upon the --gpus, --gpus-per-node, --gpus-per-socket, and --gpus-per-task options.
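A few hedged examples of how those options express a GPU count; node, task, and socket counts are illustrative assumptions, and job.sh is a placeholder script.

    # Several ways to reach 8 GPUs total for a job (illustrative values)
    sbatch --gpus=8 -N2 job.sh            # 8 GPUs for the whole job
    sbatch --gpus-per-node=4 -N2 job.sh   # 4 GPUs on each of 2 nodes
    sbatch --gpus-per-task=1 -n8 job.sh   # 1 GPU for each of 8 tasks
    sbatch --gpus-per-socket=2 --sockets-per-node=2 -N2 job.sh  # 2 GPUs per socket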
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-