Commits · 7ca3cfdacf0fc0728649c0be7320f19fec42f8f5 · Manuel G. Marciani / ces_slurm_simulator

27 Oct, 2016 40 commits
- Fix local fed_mgr functions to be declared static · 7ca3cfda
  Brian Christiansen authored Oct 06, 2016
  
- Remove unused fed_mgr function. · 004141d1
  Brian Christiansen authored Oct 06, 2016
  
- Fix sending one willrun to only one fed. · 30299105
  Brian Christiansen authored Oct 06, 2016
```
cluster_rec->fed.name will be non-null and empty when the cluster is not
part of a federation, so check fed.id instead. A fed.id of 0 means the
cluster is not part of a federation.
```
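The check described in this commit can be sketched as follows. This is a minimal illustration, not the Slurm source: the `fed_info` and `cluster_rec` structs and both helper functions are simplified, hypothetical stand-ins that only mirror the `fed.name` and `fed.id` fields named in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real records. */
struct fed_info {
    char *name;   /* may be non-NULL but empty ("") outside a federation */
    uint32_t id;  /* 0 means "not part of a federation" */
};

struct cluster_rec {
    char *name;
    struct fed_info fed;
};

/* Unreliable test: an empty, non-NULL name still passes. */
static bool in_fed_by_name(const struct cluster_rec *c)
{
    assert(c != NULL);
    return c->fed.name != NULL;
}

/* Test suggested by the commit message: a non-zero fed.id marks membership. */
static bool in_fed_by_id(const struct cluster_rec *c)
{
    assert(c != NULL);
    return c->fed.id != 0;
}
```

With a standalone cluster whose `fed.name` is `""` and whose `fed.id` is 0, the name-based test reports membership while the id-based test does not.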
- Note that dbd must be up for -M<clusters> option · 305fadef
  Brian Christiansen authored Oct 06, 2016
  
- Add comment why -M<clusters> is not threaded. · 61bbeae6
  Brian Christiansen authored Oct 06, 2016
```
See previous unreverted commit.
```
- Revert "Thread -M<clusters> will_run calls" · e18fda18
  Brian Christiansen authored Oct 06, 2016
```
This reverts commit 2ec92d36a8ad7184897c9a322ba2d9978d2ccdbd.
```
- Thread -M<clusters> will_run calls · 116ff7de
  Brian Christiansen authored Oct 06, 2016
```
This is an example of how to do it. The problem is that select_jobinfo
on the job_desc is packed using working_cluster's->plugin_id.
job_desc's->select_jobinfo is only used by bluegene and alps code which
will eventually go away.
```
- Don't burn job_ids on will_runs · 63698ae5
  Brian Christiansen authored Oct 05, 2016
  
- Enable sbatch -M<clusters> for federations · 734d6f63
  Brian Christiansen authored Oct 05, 2016
```
sbatch will choose the federation or individual cluster with the
earliest start time. A willrun to a federation will return the earliest
start time of all clusters in that federation.
```
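The selection rule in this commit amounts to taking a minimum over the will_run responses. The sketch below illustrates that idea only; `will_run_resp`, `pick_earliest`, and the convention that a `start_time` of 0 means "no response" are assumptions made for the example, not Slurm's actual types or API.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical summary of one will_run response. */
struct will_run_resp {
    const char *target;  /* cluster or federation name */
    time_t start_time;   /* estimated start; 0 = no response (assumed) */
};

/*
 * Return the index of the response with the earliest start time,
 * or -1 when no target responded.
 */
static int pick_earliest(const struct will_run_resp *resp, size_t count)
{
    int best = -1;

    assert(resp != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (resp[i].start_time == 0)
            continue;  /* skip targets that did not respond */
        if (best < 0 || resp[i].start_time < resp[best].start_time)
            best = (int)i;
    }
    return best;
}
```

Returning -1 gives the caller an explicit signal for the case where none of the targets answered, which it has to handle as an error rather than submit blindly.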
- Only do one willrun per fed when picking a cluster · 76494b58
  Brian Christiansen authored Oct 05, 2016
```
with the sbatch -M<clusters> option. A fed willrun will return the
earliest start time of all siblings in a federation.
```
- Fix indentation · 27a2bfc8
  Brian Christiansen authored Oct 05, 2016
  
- Alphabetize -M<clusters> option in sbatch help · 06311002
  Brian Christiansen authored Oct 05, 2016
```
Was moved in 5cfc577a
```
- Handle error case where no will_runs respond · 23ea78d7
  Brian Christiansen authored Oct 04, 2016
  
- Only submit federated jobs to -M<cluster_list> · 4b5f1036
  Brian Christiansen authored Oct 04, 2016
```
More to come. This sets up the controller side.
```
- Update error message · 9dd708f3
  Brian Christiansen authored Oct 03, 2016
  
- Get fed jobid before doing willruns to self · 20fa714b
  Brian Christiansen authored Oct 03, 2016
  
- Handle clang error - prevent null deref · 56eadd0e
  Brian Christiansen authored Sep 28, 2016
  
- Don't show error when fed state was empty · 110fd4b0
  Brian Christiansen authored Sep 28, 2016
```
The saved state could legitimately contain a null federation.
```
- Update test37.1 to add 63 clusters at once · 4b9cf938
  Brian Christiansen authored Sep 28, 2016
```
instead of one at a time. Saves ~20 seconds.
```
- Update fed tests to space out adding clusters · dba94cb7
  Brian Christiansen authored Sep 28, 2016
```
If all of the clusters get updated at the same time, they could get into
a state where they are waiting on each other to respond; they will
eventually time out and then reconnect. Spacing out the clusters being
added helps prevent them from talking to everyone at the same time.
```
- Update test error message · 48ac0ffa
  Brian Christiansen authored Sep 28, 2016
  
- Update test37.3 for updated output · 274fd1d7
  Brian Christiansen authored Sep 28, 2016
  
- Handle error case in test · cbf3d4b8
  Brian Christiansen authored Sep 27, 2016
  
- Don't schedule tracker only fed jobs · 68da6b91
  Brian Christiansen authored Sep 27, 2016
  
- Don't use tracker only fed jobs in start time estimates · 27741784
  Brian Christiansen authored Sep 27, 2016
  
- Submit federated jobs to siblings · ec3d4891
  Brian Christiansen authored Sep 27, 2016
  
- Refactor get_next_job_id() to give valid job_ids · 030d5233
  Brian Christiansen authored Sep 27, 2016
```
get_next_job_id() didn't take into consideration job_ids that may already
be taken by other jobs, as set_job_id() did.
```
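The behaviour this refactor aims for, handing out only ids that no existing job holds, might look roughly like the sketch below. Everything here is hypothetical: `id_in_use`, `next_free_job_id`, the linear scan over a plain array, and the `[first_id, max_id)` range with wrap-around are assumptions for illustration, not the real `get_next_job_id()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup: is this id already held by an existing job? */
static bool id_in_use(const uint32_t *used, size_t n_used, uint32_t id)
{
    for (size_t i = 0; i < n_used; i++) {
        if (used[i] == id)
            return true;
    }
    return false;
}

/*
 * Return the next free id in [first_id, max_id), starting at *next and
 * wrapping around. *next is advanced past every candidate examined.
 * Returns 0 when every id in the range is taken.
 */
static uint32_t next_free_job_id(uint32_t *next, uint32_t first_id,
                                 uint32_t max_id, const uint32_t *used,
                                 size_t n_used)
{
    assert(next != NULL && first_id > 0 && first_id < max_id);

    uint32_t range = max_id - first_id;
    if (*next < first_id || *next >= max_id)
        *next = first_id;

    for (uint32_t tries = 0; tries < range; tries++) {
        uint32_t candidate = *next;
        *next = (candidate + 1 >= max_id) ? first_id : candidate + 1;
        if (!id_in_use(used, n_used, candidate))
            return candidate;  /* free id found */
    }
    return 0;  /* range exhausted */
}
```

Bounding the loop by the size of the range guarantees termination even when every id is taken.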
- Add new protocols to send msgs to a fed sibling · 7e9a140c
  Brian Christiansen authored Sep 26, 2016
```
Send the existing packed buffer that has the job_desc to the sibling.
The sibling will unpack it on the other side. This prevents having to
pack the job_desc for each willrun/allocation to each sibling.
```
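The idea of forwarding the already-packed buffer, rather than re-packing the job_desc per sibling, can be sketched as below. The names `packed_buf`, `send_fn`, `forward_to_siblings`, and `tally_sender` are hypothetical and exist only for this example; the real protocol messages and transport are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a buffer that was packed once on receipt. */
struct packed_buf {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical transport callback; returns 0 on success. */
typedef int (*send_fn)(const char *sibling, const unsigned char *data,
                       size_t len, void *ctx);

/*
 * Send the same packed bytes to every sibling without re-packing.
 * Returns the number of siblings the send succeeded for.
 */
static size_t forward_to_siblings(const struct packed_buf *buf,
                                  const char *const *siblings, size_t n,
                                  send_fn send, void *ctx)
{
    size_t ok = 0;

    assert(buf != NULL && send != NULL);
    for (size_t i = 0; i < n; i++) {
        if (send(siblings[i], buf->data, buf->len, ctx) == 0)
            ok++;
    }
    return ok;
}

/* Example stub transport: only tallies the bytes it was asked to send. */
static int tally_sender(const char *sibling, const unsigned char *data,
                        size_t len, void *ctx)
{
    (void)sibling;
    (void)data;
    *(size_t *)ctx += len;
    return 0;
}
```

The packing cost is paid once, and each sibling unpacks the same bytes on its own side.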
- Add squeue option to show tracker only fed jobs · aea05a4b
  Brian Christiansen authored Sep 26, 2016
```
squeue --fedtrack
```
- Display job fed status in squeue and show jobs · eb634c74
  Brian Christiansen authored Sep 26, 2016
```
squeue long options: fedorigin, fedoriginraw, fedsiblings and
fedsiblingsraw.
```
- Store on job where siblings are · c97c0866
  Brian Christiansen authored Sep 26, 2016
```
Also make strings of siblings for passing back to the api.
```
- Add fed_siblings to job_desc to track sibling jobs · 7c30df95
  Brian Christiansen authored Sep 26, 2016
  
- Keep msg buffer around to send to siblings · e2cb4c55
  Brian Christiansen authored Sep 26, 2016
```
For the fed_mgr, instead of packing up the job_desc again for will_runs
and allocations to each sibling just send the buffer that came in.

Since job_allocate will modify the received job_desc_msg_t, this also
makes it easy to make a copy of the received job_desc_msg_t to do a
willrun on the receiving cluster before allocating resources.
```
- Fix NRT build - remove type casting from switch_p_unpack_node_info · f38c8db4
  Dominik Bartkiewicz authored Oct 27, 2016
  
- Fix build with NRT - use correct format to print memory value · d2b12310
  Dominik Bartkiewicz authored Oct 27, 2016
  
- Merge branch 'slurm-16.05' · dd2c8a70
  Morris Jette authored Oct 27, 2016
  
- Merge branch 'slurm-16.05' · 4785f164
  Tim Wickberg authored Oct 27, 2016
  
- Clarify job submit -B option use · bdf5a4f1
  Morris Jette authored Oct 27, 2016
```
This option specifies minimum characteristics of the compute nodes
  which should be considered for use, not the resource allocation
  size.
bug 3118
```
- Testsuite - fix test15.21 to handle non-sequential but contiguous nodes correctly. · 642a6f8c
  Alejandro Sanchez authored Oct 27, 2016
```
Create separate check_hosts_contiguous procedure in globals and use it
for both test1.83 and test15.21.

Bug 3006.
```
- Add logging of node reboot requests · 60211fa0
  Morris Jette authored Oct 27, 2016
  