Commits · 118ffaf99b5594d2f341c17752dbad68aa9ade25 · Manuel G. Marciani / ces_slurm_simulator

27 Oct, 2016 40 commits
- Update NEWS · 118ffaf9
  Brian Christiansen authored Oct 27, 2016
```
Federated submissions
```
  118ffaf9
- Fix test to not find expected 'error' · 38f72141
  Brian Christiansen authored Oct 26, 2016
```
e.g.
allocation failure: Unspecified error
```
  38f72141
- Fix spelling in test · c26a295f
  Brian Christiansen authored Oct 26, 2016
  
  c26a295f
- Fix get_next_job_id to return fed job id · 288cd42b
  Brian Christiansen authored Oct 26, 2016
```
get_next_job_id() was returning a local id and then the fed_mgr was
turning that into a fed job id. This was a problem because
get_next_job_id() couldn't check to see if an existing job already had
the fed job id. It was only checking for the local job id. This was
exposed in tests that did a reconfigure and the reconfigure loaded in an
old job_id_sequence so that the next job got an id that was already
being used.
```
  288cd42b
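
The fix described in this commit can be sketched roughly as follows. This is a minimal illustration, not the Slurm implementation: the names `make_fed_job_id`, `next_fed_job_id`, and `in_use`, and the wraparound policy, are hypothetical. The 26-bit split between cluster id and local id is an assumption, chosen because it is consistent with the job id 67119254 that appears in a later commit message in this log (1 << 26 plus 10390).

```python
from typing import Set, Tuple

LOCAL_ID_BITS = 26                       # assumed width of the local job id
MAX_LOCAL_ID = (1 << LOCAL_ID_BITS) - 1  # largest local id under that assumption


def make_fed_job_id(cluster_id: int, local_id: int) -> int:
    """Combine a cluster id and a local job id into one federated job id."""
    return (cluster_id << LOCAL_ID_BITS) | local_id


def next_fed_job_id(cluster_id: int, sequence: int,
                    in_use: Set[int]) -> Tuple[int, int]:
    """Return (fed_job_id, new_sequence), skipping ids that are already taken.

    The collision check is made on the federated id rather than the local id,
    so a stale sequence value cannot hand out an id that a job still holds.
    """
    for _ in range(MAX_LOCAL_ID):
        local_id = sequence
        # Advance the sequence, wrapping back to 1 after the largest local id.
        sequence = sequence + 1 if sequence < MAX_LOCAL_ID else 1
        fed_id = make_fed_job_id(cluster_id, local_id)
        if fed_id not in in_use:
            return fed_id, sequence
    raise RuntimeError("no free job ids")
```

With a sequence reloaded to a value that is already in use, the sketch skips ahead to the next free id instead of returning a duplicate.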
- Add -M<clusters> option to salloc,srun · a542eb33
  Brian Christiansen authored Oct 25, 2016
```
The logic to talk to the correct compute nodes still needs to be
implemented. It will come at a later date.
```
  a542eb33
- Remove extra line · fddabe36
  Brian Christiansen authored Oct 24, 2016
  
  fddabe36
- Enable srun/salloc job allocations to federation · 5510e7d3
  Brian Christiansen authored Oct 18, 2016
```
Will submit using federation submission logic. Scheduling logic to come.
```
  5510e7d3
- Refactor _slurm_rpc_submit_batch_job() · 566af952
  Brian Christiansen authored Oct 18, 2016
```
to make sure job ptr is accessed within locks.
```
  566af952
- Refactor fed_mgr_job_allocate() · 9bcad8ad
  Brian Christiansen authored Oct 18, 2016
```
In prep for refactoring _slurm_rpc_submit_batch_job to make sure the
job_ptr is accessed within locks.
```
  9bcad8ad
- Only do func if loglevel is at least debug3 · 553c5422
  Brian Christiansen authored Oct 18, 2016
  
  553c5422
- Correctly reset cluster weight in test · 35dc99e9
  Brian Christiansen authored Oct 18, 2016
  
  35dc99e9
- Prevent invalid read when fed_mgr is finishing · 9e4f1488
  Brian Christiansen authored Oct 17, 2016
  
  9e4f1488
- Disable job arrays when in a federation · 4c31e15a
  Brian Christiansen authored Oct 13, 2016
  
  4c31e15a
- Add test37.4 to test federated job submissions · 168395fd
  Brian Christiansen authored Oct 13, 2016
  
  168395fd
- Update info message with timestamp · adec2842
  Brian Christiansen authored Oct 13, 2016
  
  adec2842
- Fix finding sibling that can start fed job now · d3c70366
  Brian Christiansen authored Oct 13, 2016
```
It was picking a higher weighted federation over lower weighted
federations because it had an earlier start time. This shouldn't happen
because that's what the weights are for.

e.g.
will_run_resp for fed1: start:2016-10-13T15:19:47 sys_usage:0.00   weight:2
will_run_resp for fed2: start:2016-10-13T15:19:48 sys_usage:0.00   weight:1
will_run_resp for fed3: start:2016-10-13T15:19:48 sys_usage:0.00   weight:1
Earliest cluster:fed1 time:1476393587 now:1476393588
Submitted federated job 67119254 to fed1(self)
```
  d3c70366
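
One way to read the intended behaviour: among siblings that can start the job now, the weight decides, and the start time only matters when nothing can start yet. The sketch below assumes exactly that rule and that a lower weight is preferred. `WillRunResp` and `pick_sibling` are hypothetical names, and the epoch values for fed2 and fed3 are derived from the one-second offset shown in the log output above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WillRunResp:
    cluster: str
    start: int    # earliest start time, seconds since the epoch
    weight: int   # assumed: lower weight is preferred


def pick_sibling(responses: List[WillRunResp], now: int) -> WillRunResp:
    """Prefer the lowest-weight sibling that can start now, else the earliest start."""
    can_start_now = [r for r in responses if r.start <= now]
    if can_start_now:
        # Weight is the primary key; start time only breaks ties.
        return min(can_start_now, key=lambda r: (r.weight, r.start))
    # Nothing can start yet: fall back to the earliest start time.
    return min(responses, key=lambda r: (r.start, r.weight))


# Values taken from the log output in the commit message.
responses = [
    WillRunResp("fed1", 1476393587, 2),
    WillRunResp("fed2", 1476393588, 1),
    WillRunResp("fed3", 1476393588, 1),
]
chosen = pick_sibling(responses, now=1476393588)
print(chosen.cluster)  # prints "fed2"
```

All three siblings can start by `now`, so the sketch picks a weight-1 cluster rather than fed1, which is the outcome the commit message says the weights are for.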
- Be able to cancel fed tracker only jobs · aff5c3de
  Brian Christiansen authored Oct 13, 2016
  
  aff5c3de
- Add -O, --Format option to squeue --help · 29bb5011
  Brian Christiansen authored Oct 12, 2016
  
  29bb5011
- Document squeue federated job long output options · 1cde3f71
  Brian Christiansen authored Oct 12, 2016
```
fedorigin
fedoriginraw
fedsiblings
fedsiblingsraw
```
  1cde3f71
- Throttle fed_mgr_job_alloc for batch job rpcs · 19cc14f2
  Brian Christiansen authored Oct 12, 2016
  
  19cc14f2
- Fix comment · e765d567
  Brian Christiansen authored Oct 12, 2016
  
  e765d567
- Have resource alloc rpc respond on persist conn · c8407f77
  Brian Christiansen authored Oct 12, 2016
```
If it exists.
```
  c8407f77
- Add comment to get_next_job_id · 977e576c
  Brian Christiansen authored Oct 12, 2016
  
  977e576c
- Add comment for job_desc_msg_t structure · d4d6760b
  Brian Christiansen authored Oct 12, 2016
  
  d4d6760b
- Sort and prettify fed_mgr.h · cd9143c1
  Brian Christiansen authored Oct 06, 2016
  
  cd9143c1
- Fix local fed_mgr functions to be declared static · 7ca3cfda
  Brian Christiansen authored Oct 06, 2016
  
  7ca3cfda
- Remove unused fed_mgr function. · 004141d1
  Brian Christiansen authored Oct 06, 2016
  
  004141d1
- Fix sending one willrun to only one fed. · 30299105
  Brian Christiansen authored Oct 06, 2016
```
cluster_rec->fed.name will be non-null and empty when the cluster is not
part of a federation. Need to check fed.id instead. A fed.id of 0 means
the cluster is not part of a federation.
```
  30299105
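
The check described in this commit might look like the sketch below. It assumes that membership is tested with `fed.id` and that clusters of the same federation are deduplicated by federation name. `FedInfo`, `ClusterRec`, and `will_run_targets` are hypothetical stand-ins, not the Slurm structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FedInfo:
    name: str = ""   # may be empty but non-null for a standalone cluster
    id: int = 0      # 0 means the cluster is not part of a federation


@dataclass
class ClusterRec:
    name: str
    fed: FedInfo = field(default_factory=FedInfo)


def will_run_targets(clusters: List[ClusterRec]) -> List[ClusterRec]:
    """Return one cluster per federation plus every standalone cluster."""
    targets = []
    seen_feds = set()
    for cluster in clusters:
        if cluster.fed.id == 0:
            # Standalone cluster: always gets its own will_run.
            targets.append(cluster)
        elif cluster.fed.name not in seen_feds:
            # First cluster seen for this federation represents it.
            seen_feds.add(cluster.fed.name)
            targets.append(cluster)
    return targets
```

Testing `fed.name` for emptiness or null instead would be fragile, since the commit notes that the name is non-null and empty for standalone clusters.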
- Note that dbd must be up for -M<clusters> option · 305fadef
  Brian Christiansen authored Oct 06, 2016
  
  305fadef
- Add comment why -M<clusters> is not threaded. · 61bbeae6
  Brian Christiansen authored Oct 06, 2016
```
See previous unreverted commit.
```
  61bbeae6
- Revert "Thread -M<clusters> will_run calls" · e18fda18
  Brian Christiansen authored Oct 06, 2016
```
This reverts commit 2ec92d36a8ad7184897c9a322ba2d9978d2ccdbd.
```
  e18fda18
- Thread -M<clusters> will_run calls · 116ff7de
  Brian Christiansen authored Oct 06, 2016
```
This is an example of how to do it. The problem is that select_jobinfo
on the job_desc is packed using working_cluster's->plugin_id.
job_desc's->select_jobinfo is only used by bluegene and alps code which
will eventually go away.
```
  116ff7de
- Don't burn job_ids on will_runs · 63698ae5
  Brian Christiansen authored Oct 05, 2016
  
  63698ae5
- Enable sbatch -M<clusters> for federations · 734d6f63
  Brian Christiansen authored Oct 05, 2016
```
sbatch will choose the federation or individual cluster with the fastest
start time. A willrun to a federation will return the fastest start time
of all clusters in a federation.
```
  734d6f63
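
The selection rule in this commit amounts to a minimum over minimums. A minimal sketch, with hypothetical names (`fed_will_run`, `pick_target`) and made-up start times, where a standalone cluster is modelled as a candidate with a single start time:

```python
from typing import Dict, List


def fed_will_run(sibling_starts: List[int]) -> int:
    """A federation answers with the earliest start time among its clusters."""
    return min(sibling_starts)


def pick_target(candidates: Dict[str, List[int]]) -> str:
    """Pick the candidate (federation or single cluster) with the earliest start."""
    return min(candidates, key=lambda name: fed_will_run(candidates[name]))


candidates = {
    "fedA": [300, 120, 500],   # hypothetical federation with three clusters
    "cluster_x": [200],        # hypothetical standalone cluster
}
print(pick_target(candidates))  # prints "fedA"
```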
- Only do one willrun per fed when picking a cluster · 76494b58
  Brian Christiansen authored Oct 05, 2016
```
with the sbatch -M<clusters> option. A fed will_run will return the fastest
start time of all siblings in a federation.
```
  76494b58
- Fix indentation · 27a2bfc8
  Brian Christiansen authored Oct 05, 2016
  
  27a2bfc8
- Alphabetize -M<clusters> option in sbatch help · 06311002
  Brian Christiansen authored Oct 05, 2016
```
Was moved in 5cfc577a
```
  06311002
- Handle error case where no will_runs respond · 23ea78d7
  Brian Christiansen authored Oct 04, 2016
  
  23ea78d7
- Only submit federated jobs to -M<cluster_list> · 4b5f1036
  Brian Christiansen authored Oct 04, 2016
```
More to come. This sets up the controller side.
```
  4b5f1036
- Update error message · 9dd708f3
  Brian Christiansen authored Oct 03, 2016
  
  9dd708f3