Commits · 76494b5869bb245994a82f1ebc689cbc84e4ee8a · Manuel G. Marciani / ces_slurm_simulator

27 Oct, 2016 31 commits
- Only do one willrun per fed when picking a cluster with the sbatch -M<clusters> option · 76494b58
  Brian Christiansen authored Oct 05, 2016
```
A fed willrun will return the fastest time of all siblings in a federation.
```
- Fix indentation · 27a2bfc8
  Brian Christiansen authored Oct 05, 2016
  
- Alphabetize -M<clusters> option in sbatch help · 06311002
  Brian Christiansen authored Oct 05, 2016
```
Was moved in 5cfc577a
```
- Handle error case where no will_runs respond · 23ea78d7
  Brian Christiansen authored Oct 04, 2016
  
- Only submit federated jobs to -M<cluster_list> · 4b5f1036
  Brian Christiansen authored Oct 04, 2016
```
More to come. This sets up the controller side.
```
- Update error message · 9dd708f3
  Brian Christiansen authored Oct 03, 2016
  
- Get fed jobid before doing willruns to self · 20fa714b
  Brian Christiansen authored Oct 03, 2016
  
- Handle clang error - prevent null deref · 56eadd0e
  Brian Christiansen authored Sep 28, 2016
  
- Don't show error when fed state was empty · 110fd4b0
  Brian Christiansen authored Sep 28, 2016
```
The saved state could contain a null federation.
```
- Update test37.1 to add 63 clusters at once instead of one at a time · 4b9cf938
  Brian Christiansen authored Sep 28, 2016
```
Saves ~20 seconds.
```
- Update fed tests to space out adding clusters · dba94cb7
  Brian Christiansen authored Sep 28, 2016
```
If all of the clusters get updated at the same time, they could get into a
state where they are waiting on each other to respond; they will
eventually time out and then reconnect. Spacing out the
clusters being added helps prevent them from all talking to each other at
the same time.
```
- Update test error message · 48ac0ffa
  Brian Christiansen authored Sep 28, 2016
  
- Update test37.3 for updated output · 274fd1d7
  Brian Christiansen authored Sep 28, 2016
  
- Handle error case in test · cbf3d4b8
  Brian Christiansen authored Sep 27, 2016
  
- Don't schedule tracker only fed jobs · 68da6b91
  Brian Christiansen authored Sep 27, 2016
  
- Don't use tracker only fed jobs in start time estimates · 27741784
  Brian Christiansen authored Sep 27, 2016
  
- Submit federated jobs to siblings · ec3d4891
  Brian Christiansen authored Sep 27, 2016
  
- Refactor get_next_job_id() to give valid job_ids · 030d5233
  Brian Christiansen authored Sep 27, 2016
```
get_next_job_id() didn't take into consideration job_ids that may already
be taken by other jobs, as set_job_id() did.
```
- Add new protocols to send msgs to a fed sibling · 7e9a140c
  Brian Christiansen authored Sep 26, 2016
```
Send the existing packed buffer that has the job_desc to the sibling.
The sibling will unpack it on the other side. This prevents having to
pack the job_desc for each willrun/allocation to each sibling.
```
- Add squeue option to show tracker only fed jobs · aea05a4b
  Brian Christiansen authored Sep 26, 2016
```
squeue --fedtrack
```
- Display job fed status in squeue and show jobs · eb634c74
  Brian Christiansen authored Sep 26, 2016
```
squeue long options: fedorigin, fedoriginraw, fedsiblings and
fedsiblingsraw.
```
- Store on job where siblings are · c97c0866
  Brian Christiansen authored Sep 26, 2016
```
Also make strings of siblings for passing back to the API.
```
- Add fed_siblings to job_desc to track sibling jobs · 7c30df95
  Brian Christiansen authored Sep 26, 2016
  
- Keep msg buffer around to send to siblings · e2cb4c55
  Brian Christiansen authored Sep 26, 2016
```
For the fed_mgr, instead of packing up the job_desc again for will_runs
and allocations to each sibling, just send the buffer that came in.

Since job_allocate will modify the received job_desc_msg_t, this also
makes it easy to make a copy of the received job_desc_msg_t to do a
willrun on the receiving cluster before allocating resources.
```
- Fix NRT build - remove type casting from switch_p_unpack_node_info · f38c8db4
  Dominik Bartkiewicz authored Oct 27, 2016
  
- Fix build with NRT - use correct format to print memory value · d2b12310
  Dominik Bartkiewicz authored Oct 27, 2016
  
- Merge branch 'slurm-16.05' · dd2c8a70
  Morris Jette authored Oct 27, 2016
  
- Merge branch 'slurm-16.05' · 4785f164
  Tim Wickberg authored Oct 27, 2016
  
- Clarify job submit -B option use · bdf5a4f1
  Morris Jette authored Oct 27, 2016
```
This option specifies minimum characteristics of the compute nodes
which should be considered for use, not the resource allocation
size.
bug 3118
```
- Testsuite - fix test15.21 to handle non-sequential but contiguous nodes correctly. · 642a6f8c
  Alejandro Sanchez authored Oct 27, 2016
```
Create separate check_hosts_contiguous procedure in globals and use it
for both test1.83 and test15.21.

Bug 3006.
```
- Add logging of node reboot requests · 60211fa0
  Morris Jette authored Oct 27, 2016
  
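The federation commits 76494b58 and 23ea78d7 describe picking, for a job submitted with -M<clusters>, the sibling whose willrun reports the fastest time, and handling the case where no will_runs respond. A minimal C sketch of that selection step, assuming a hypothetical `sibling_willrun_t` summary record and `pick_fastest_sibling()` helper rather than Slurm's actual fed_mgr structures:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical summary of one sibling's will_run reply (not a Slurm type). */
typedef struct {
    const char *cluster;  /* sibling cluster name */
    int responded;        /* 0 if the sibling never answered the will_run */
    time_t start_time;    /* earliest start time the sibling reported */
} sibling_willrun_t;

/*
 * Return the index of the responding sibling with the earliest start time,
 * or -1 when no sibling responded. Ties keep the earlier array entry.
 */
int pick_fastest_sibling(const sibling_willrun_t *resp, size_t n)
{
    assert(resp != NULL || n == 0);
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (!resp[i].responded)
            continue;  /* ignore siblings that timed out or failed */
        if (best < 0 || resp[i].start_time < resp[best].start_time)
            best = (int)i;
    }
    return best;
}
```

A caller would then route the job to `resp[best].cluster`, or report an error when the function returns -1 because no will_runs responded.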
26 Oct, 2016 9 commits
- Merge branch 'slurm-16.05' · 2903a97d
  Morris Jette authored Oct 26, 2016
  
- Fix for clearing node MAINT node flag · f3930fbe
  Morris Jette authored Oct 26, 2016
```
Fix bug that was clearing MAINT mode on nodes scheduled for reboot (bug
introduced in version 16.05.5 to address bug in overlapping reservations,
commit 5eee1d28). Note that a node's
MAINT flag is used for both a requested reboot and maintenance reservation.
What I'd like to do is add a new node state flag to differentiate
between these two cases, but that involves some significant changes
that could introduce instability, so it will be deferred to version
17.02.
bug 3210
```
- Fix node reboot flag preservation · 4e179d3b
  Morris Jette authored Oct 26, 2016
```
Correct/expand description of NODE_STATE_FLAG
```
- Minor format fix · f059a200
  Danny Auble authored Oct 26, 2016
  
- Make slightly more efficient code. Add on to the last commit. · a4c9b8be
  Danny Auble authored Oct 26, 2016
  
- Fix issue where number of nodes is not properly allocated when sbatch and salloc are requested with -n tasks < hosts from -w hostlist or from -N · 1fbd95f7
  Alejandro Sanchez authored Oct 26, 2016
- Merge remote-tracking branch 'origin/slurm-16.05' · d60aa2be
  Danny Auble authored Oct 26, 2016
  
- Update srun documentation for -N, -w and -m arbitrary. · 3a5fc0ec
  Danny Auble authored Oct 26, 2016
  
- Fix issue where number of nodes is not properly allocated when srun is requested with -n tasks < hosts from -w hostlist · d80bd01e
  Danny Auble authored Oct 26, 2016
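Commit f3930fbe explains that a node's MAINT flag serves both a requested reboot and a maintenance reservation, so clearing it for one purpose can drop the other, and proposes a separate node state flag for 17.02. The toy C sketch below illustrates that reasoning only; the flag values and helper names are made up and are not Slurm's NODE_STATE_* definitions. With one shared bit, ending the reservation also erases the pending reboot, while a dedicated bit keeps it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values only; these are not Slurm's NODE_STATE_* bits. */
#define NODE_FLAG_MAINT  0x1u  /* maintenance reservation (and, if overloaded, reboot) */
#define NODE_FLAG_REBOOT 0x2u  /* hypothetical dedicated reboot-request flag */

static_assert((NODE_FLAG_MAINT & NODE_FLAG_REBOOT) == 0, "flags must not overlap");

/* Overloaded scheme: a reboot request is recorded with the MAINT bit. */
uint32_t request_reboot_overloaded(uint32_t state)
{
    return state | NODE_FLAG_MAINT;
}

/* Split scheme: a reboot request gets its own bit. */
uint32_t request_reboot_split(uint32_t state)
{
    return state | NODE_FLAG_REBOOT;
}

/* Ending a maintenance reservation clears MAINT in either scheme. */
uint32_t end_maint_reservation(uint32_t state)
{
    return state & ~NODE_FLAG_MAINT;
}

/* In the overloaded scheme, a pending reboot is indistinguishable from MAINT. */
bool reboot_pending_overloaded(uint32_t state)
{
    return (state & NODE_FLAG_MAINT) != 0;
}

/* In the split scheme, a pending reboot is tracked independently. */
bool reboot_pending_split(uint32_t state)
{
    return (state & NODE_FLAG_REBOOT) != 0;
}
```

For a node that already sits in a maintenance reservation, requesting a reboot and then ending the reservation leaves no trace of the reboot under the overloaded scheme, but the split scheme still reports it as pending.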