- 13 Jul, 2017 8 commits
-
-
Morris Jette authored
-
Tim Shaw authored
Bug 3979
-
Tim Wickberg authored
-
Danny Auble authored
Bug 3967
-
Danny Auble authored
Bug 3979 and 3989
-
Danny Auble authored
This reverts commit d49081df.
-
Danny Auble authored
Bug 3979 and 3989
-
Dominik Bartkiewicz authored
-
- 11 Jul, 2017 1 commit
-
-
Danny Auble authored
This isn't a memory leak, but it does show up as memory that was not freed.
-
- 10 Jul, 2017 1 commit
-
-
Ole H Nielsen authored
-
- 07 Jul, 2017 5 commits
-
-
Danny Auble authored
will have a time displayed when truncating time. Bug 3940.
-
Alejandro Sanchez authored
Otherwise we can end up printing Start times greater than End times, which is confusing when reading sacct output. A time of 0 is displayed as Unknown. Cosmetic change. Bug 3940.
-
Alejandro Sanchez authored
This behavior was introduced in bug 2504, commit 7fb0c981, and bug 2643, commit 988edf12, respectively. The reasoning is that sysadmins who see nodes with Reason "Not Responding", yet can manually ping/access the node, end up confused. That reason should only be set if the node is truly not responding, not if the HealthCheckProgram execution failed or returned a non-zero exit code. In that case, the program itself would take the appropriate actions, such as draining the node and setting an appropriate Reason. Bug 3931
-
Dominik Bartkiewicz authored
-
Dominik Bartkiewicz authored
-
- 06 Jul, 2017 8 commits
-
-
Dominik Bartkiewicz authored
-
Morris Jette authored
CID 171497
-
David Matthews authored
Bug 3963.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
A list_for_each() was being used to reconcile fed_jobs, and if fed_mgr_job_revoke() is called on a non-origin job it tries to purge the job from the job_list. That deadlocks, because list_for_each() is still holding the job_list's mutex.
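This is the classic self-deadlock pattern: the iterator holds the list's non-recursive mutex while the callback re-enters an API that tries to take the same mutex. A minimal C sketch of the pattern, with hypothetical names (list_t, list_purge, revoke) standing in for the actual Slurm internals:

    /* Sketch of the deadlock only; not Slurm's actual list code. */
    #include <pthread.h>
    #include <stdio.h>

    typedef struct list {
        pthread_mutex_t mutex;  /* non-recursive: re-locking from the
                                 * same thread self-deadlocks */
        int jobs[4];
        int count;
    } list_t;

    static list_t job_list = { PTHREAD_MUTEX_INITIALIZER, {1, 2, 3, 4}, 4 };

    /* Purge takes the list mutex itself -- fine when called on its own. */
    static void list_purge(list_t *l, int job)
    {
        pthread_mutex_lock(&l->mutex);   /* second lock from same thread */
        /* ... remove job from l->jobs ... */
        pthread_mutex_unlock(&l->mutex);
    }

    /* Iterate while holding the list mutex, as list_for_each() does. */
    static void list_for_each(list_t *l, void (*fn)(list_t *, int))
    {
        pthread_mutex_lock(&l->mutex);
        for (int i = 0; i < l->count; i++)
            fn(l, l->jobs[i]);           /* callback runs under the lock */
        pthread_mutex_unlock(&l->mutex);
    }

    /* Stand-in for fed_mgr_job_revoke() on a non-origin job. */
    static void revoke(list_t *l, int job)
    {
        printf("revoking job %d\n", job);
        list_purge(l, job);              /* deadlock: mutex already held */
    }

    int main(void)
    {
        list_for_each(&job_list, revoke); /* never returns */
        return 0;
    }

Built with -pthread, this hangs inside revoke(); the usual fix is to collect the jobs to purge during iteration and remove them after list_for_each() returns.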
-
Brian Christiansen authored
-
- 05 Jul, 2017 17 commits
-
-
Brian Christiansen authored
CIDs: 45332, 45327, 45326
-
Brian Christiansen authored
CID: 171885
-
Tim Wickberg authored
Bug 3957.
-
Morris Jette authored
-
Morris Jette authored
-
Brian Christiansen authored
-
Morris Jette authored
-
Brian Christiansen authored
Was initially added in 734d6f63 but was refactored out in the heterogeneous jobs branch.
-
Brian Christiansen authored
When an origin cluster is removed from the federation, it could leave federated jobs in the federation without an origin (e.g. a job that is viable on multiple siblings other than the origin cluster). The job should schedule amongst its siblings when the origin is gone.
-
Brian Christiansen authored
When a cluster is removed from the federation, pending jobs should remain pending.
1. If a job is pending on an origin cluster and the origin is being removed, then leave the pending job on the origin as a non-federated job and remove the other sibling jobs.
2. If the job is viable on only one cluster, then leave it pending as a non-federated job on the viable cluster.
3. If the origin cluster is being removed and the job is viable on multiple clusters other than the origin, then leave the sibling jobs as federated jobs and the remaining viable clusters will schedule amongst themselves to start the job.
-
Brian Christiansen authored
-
Brian Christiansen authored
Previously they were treated as only pending.
-
Brian Christiansen authored
With the addition of b9719be2, which deletes the job file in a separate thread, the job file could still exist when a new sibling job is being submitted as a requeued fed job. The file needs to be deleted before submitting a new fed sibling job.
-
Brian Christiansen authored
It wasn't doing it for origin jobs.
-
Brian Christiansen authored
The persistent connection was being destroyed, which closed the socket, so the response rc couldn't make it back to the originating cluster.
-
Brian Christiansen authored
-
Brian Christiansen authored
-