Commits · 85e8fe939a6328f34facef166dee29fb0cd5a781 · Manuel G. Marciani / ces_slurm_simulator

26 Jun, 2017 2 commits
- Initial work for heterogeneous step launch · 85e8fe93
  Morris Jette authored Jun 26, 2017
  
- Backfill scheduling improvements · ef1f3e73
  Dominik Bartkiewicz authored Jun 26, 2017
```
Improve backfill scheduling algorithm with respect to starting jobs as soon
    as possible while avoiding advanced reservations.
bug 3757
```
24 Jun, 2017 4 commits
- Fix ALPS and BGQ build errors for pack jobs · 016b004d
  Morris Jette authored Jun 23, 2017
  
- Merge branch 'hetero_jobs' · 0bc4ac49
  Morris Jette authored Jun 23, 2017
  
- Correct typos in NEWS · 30c013a4
  Morris Jette authored Jun 23, 2017
  
- Describe contents of hetero_jobs branch merge · 82a7d1d3
  Morris Jette authored Jun 23, 2017
  
23 Jun, 2017 7 commits
- Fix for bad commit 5947670e · bcbb4447
  Morris Jette authored Jun 23, 2017
```
bug 3886
```
- Merge branch 'slurm-17.02' · edd00d54
  Morris Jette authored Jun 23, 2017
  
- Print warning in slurmctld if the stack size is not unlimited. · 5947670e
  Tim Shaw authored Jun 23, 2017
```
Not necessarily fatal(), but of potential interest when debugging
odd slurmctld crashes. Cannot go where the limit is originally set,
as the logging infrastructure is not available at that point.

Bug 3886.
```
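A minimal sketch of the kind of check commit 5947670e describes, assuming the POSIX `getrlimit()` interface. The function names `limit_is_unlimited()` and `warn_if_stack_limited()` are hypothetical and are not taken from the slurmctld source; the real code logs through Slurm's own logging layer rather than a `FILE *`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: true when a soft limit value means "unlimited". */
bool limit_is_unlimited(rlim_t soft_limit)
{
	return soft_limit == RLIM_INFINITY;
}

/*
 * Hypothetical check, meant to run only once logging is usable.
 * Returns 1 if a warning was printed, 0 if the stack is unlimited,
 * and -1 if getrlimit() itself failed.
 */
int warn_if_stack_limited(FILE *log)
{
	struct rlimit rlim;

	if (getrlimit(RLIMIT_STACK, &rlim) != 0)
		return -1;
	if (limit_is_unlimited(rlim.rlim_cur))
		return 0;
	/* A warning only: a limited stack is not necessarily fatal. */
	fprintf(log, "warning: stack size limit is %llu bytes, not unlimited\n",
		(unsigned long long) rlim.rlim_cur);
	return 1;
}
```

Calling `warn_if_stack_limited(stderr)` after start-up would emit at most one line, which matches the commit's intent of leaving a hint for debugging rather than aborting.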
- Disable task binding test if TaskPluginParams=boards/sockets/cores · 5ec9d584
  Morris Jette authored Jun 23, 2017
```
test1.91 fails with non-default binding
```
- Merge branch 'slurm-17.02' · 9bbb0322
  Tim Wickberg authored Jun 23, 2017
  
- Fix configurator.easy.html to output SelectTypeParameters line. · 9c8f0426
  Tim Shaw authored Jun 23, 2017
```
Bug 3581.
```
- Define variables needed in plugin in case of use from API · 237ab634
  Morris Jette authored Jun 23, 2017
```
Fix for commit 250378c2
  test7.3 was failing without this patch
  bug 3502
```
22 Jun, 2017 27 commits
- Merge branch 'slurm-17.02' · 3e763ab7
  Morris Jette authored Jun 22, 2017
  
- Improve test clean-up · 9b631f2a
  Morris Jette authored Jun 22, 2017
```
test 17.12 was leaving slurm-#.out files around. Explicitly set
  output file to /dev/null and set time limit to 1 minute to avoid
  vestigial jobs.
```
- Start NEWS for v17.11.0pre2 · 276d2f24
  Morris Jette authored Jun 22, 2017
  
- Update META for v17.11.0-pre1 tag · 46d8862c
  Morris Jette authored Jun 22, 2017
  
- Merge branch 'slurm-17.02' · 91f4ae59
  Morris Jette authored Jun 22, 2017
  
- Start NEWS for v17.02.6 · 27df99a2
  Morris Jette authored Jun 22, 2017
  
- Update META for v17.02.5 tag · b1d644d2
  Morris Jette authored Jun 22, 2017
  
- Update NEWS · ca5d7574
  Brian Christiansen authored Jun 22, 2017
  
- Merge branch 'federation' · 5d2bc28d
  Brian Christiansen authored Jun 22, 2017
  
- Update federation syncing. · c78ef5b7
  Brian Christiansen authored Jun 22, 2017
```
Handle syncing between siblings (not the origin). Job could have been
cancelled or started while the origin was down. The sibling should see
this and remove its copy if the job is running or cancelled on the
remote cluster.

Handle case where job was cancelled or finished while the origin was
down and the siblings were up.
```
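The sibling-side rule in commit c78ef5b7 can be sketched as a small predicate. This is only an illustration under stated assumptions: the `remote_state` enum and `sibling_should_drop_copy()` are hypothetical simplifications and do not reflect Slurm's actual job-state flags or federation manager code.

```c
#include <stdbool.h>

/* Hypothetical, simplified job states as reported by the remote cluster. */
enum remote_state {
	REMOTE_PENDING,
	REMOTE_RUNNING,
	REMOTE_CANCELLED
};

/*
 * Should this sibling remove its local copy of the job?
 * Per the commit message: yes if the job is already running or has been
 * cancelled on the remote cluster; otherwise the copy stays.
 */
bool sibling_should_drop_copy(enum remote_state state)
{
	return state == REMOTE_RUNNING || state == REMOTE_CANCELLED;
}
```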
- Don't queue up job complete msg to origin if down · c4810d57
  Brian Christiansen authored Jun 22, 2017
```
The controller will keep the job in the job_list until the origin comes
back up and will find out about it then.
```
- Keep federated jobs until origin is up and synced · 08d534c5
  Brian Christiansen authored Jun 22, 2017
```
This allows the origin to be able to sync up jobs after it has been
down.
```
- Prevent scheduling until all active siblings sync · 5982e2ad
  Brian Christiansen authored Jun 22, 2017
  
- Send cancel to all viable sibs when origin is down · 49c27c23
  Brian Christiansen authored Jun 22, 2017
  
- Fix passing of int ptr · 30eda101
  Brian Christiansen authored Jun 22, 2017
```
(void *)(intptr_t)0 is treated as a NULL
```
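The pitfall named in commit 30eda101 can be reproduced in a few lines. The sketch below is not the code that was changed; both function names are hypothetical. It shows why an integer smuggled through a `void *` cannot carry the value 0 when the callee uses NULL to mean "no argument", and one common way around it (passing the integer's address).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callee that treats a NULL argument as "no value supplied". */
int value_or_default_by_cast(void *arg, int fallback)
{
	if (!arg)	/* also true for (void *)(intptr_t)0 */
		return fallback;
	return (int)(intptr_t) arg;
}

/* Same idea, but the caller passes the address of the integer instead. */
int value_or_default_by_addr(const int *arg, int fallback)
{
	if (!arg)	/* only true when no value was supplied */
		return fallback;
	return *arg;
}
```

With the cast variant, a legitimate value of 0 silently turns into the fallback; with the address variant, 0 and "missing" stay distinguishable.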
- Adjust test timing · a38bb2ff
  Brian Christiansen authored Jun 21, 2017
  
- Add sanity checks · ac81244b
  Brian Christiansen authored Jun 21, 2017
  
- Update regexes in test37.16 · 4e928ad3
  Brian Christiansen authored Jun 21, 2017
  
- Add error checking · 0ec2b13b
  Brian Christiansen authored Jun 20, 2017
```
_cleanup_removed_origin_jobs() could have been called without ever being
part of a federation.
```
- Don't send fed job_complete if job was requeued · a7347067
  Brian Christiansen authored Jun 20, 2017
```
Job could have been requeued if the nodes failed.
```
- Fix clearing of cluster-constraints & clusters · e3a84ed0
  Isaac Hartung authored Jun 20, 2017
```
bef69448 was fixed/changed so that slurm_addto_char_list() would now
add an empty string to the list if no constraints or clusters were
given. The code was expecting an empty List previously.
```
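One way to make a caller robust to the behavior change described in commit e3a84ed0 is to treat a list holding only empty strings the same as an empty list. The sketch below assumes a plain array of C strings as a stand-in for Slurm's `List` type; `name_list_is_blank()` is a hypothetical helper, not the fix that was committed.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a parsed name list: `names` holds `count` strings.
 * Returns true when the list carries no real names, i.e. it is empty or
 * contains only NULL or empty-string entries.
 */
bool name_list_is_blank(const char *const *names, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (names[i] && names[i][0] != '\0')
			return false;
	}
	return true;
}
```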
- Fix scontrol completing to show correct fed nodes · faf26c4b
  Brian Christiansen authored Jun 20, 2017
```
Like sview, it wasn't mapping the job's node indexes to the correct
nodes since federated nodes are merged into one array.
```
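The index problem in commit faf26c4b can be illustrated with a simple offset translation. This is a sketch under an assumption the commit message does not spell out: that each cluster's nodes occupy one contiguous range of the merged array. `struct cluster_span` and `merged_node_index()` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical record of where one cluster's nodes sit in the merged array. */
struct cluster_span {
	size_t first;	/* index of the cluster's first node in the merged array */
	size_t count;	/* number of nodes the cluster contributed */
};

/*
 * Translate a cluster-local node index into an index in the merged array.
 * Returns (size_t)-1 when the span is missing or the index is out of range.
 */
size_t merged_node_index(const struct cluster_span *span, size_t local_inx)
{
	if (!span || local_inx >= span->count)
		return (size_t) -1;
	return span->first + local_inx;
}
```

Using the job's local index directly against the merged array would point at another cluster's nodes whenever `first` is nonzero, which is the symptom the commit describes.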
- Schedule fed jobs if origin cluster is down · 4625a79c
  Brian Christiansen authored Jun 19, 2017
  
- Fix memory leaks · 4644e307
  Brian Christiansen authored Jun 16, 2017
  
- Fix memory leak · 7fd31569
  Brian Christiansen authored Jun 16, 2017
  
- Handle fed jobs when cluster is removed from fed · f12a382d
  Brian Christiansen authored Jun 15, 2017
```
while the cluster is down. The cluster will figure out what changed
after starting up or after resuming from using the cache.
```
- Update fed_mgr after resuming from using cache · f116be96
  Brian Christiansen authored Jun 15, 2017
```
When the controller starts up and the dbd is not up it waits until the
dbd comes up. At this point, the controller needs to find out if
anything has changed in the federation (e.g. other clusters or self
removed from cluster).
```