- 13 Jul, 2017 8 commits
-
-
Morris Jette authored
-
Tim Shaw authored
Bug 3979
-
Tim Wickberg authored
-
Danny Auble authored
Bug 3967
-
Danny Auble authored
Bug 3979 and 3989
-
Danny Auble authored
This reverts commit d49081df.
-
Danny Auble authored
Bug 3979 and 3989
-
Dominik Bartkiewicz authored
-
- 11 Jul, 2017 1 commit
-
-
Danny Auble authored
This isn't a memory leak, but it does show up as memory that was not freed.
-
- 10 Jul, 2017 1 commit
-
-
Ole H Nielsen authored
-
- 07 Jul, 2017 5 commits
-
-
Danny Auble authored
will have a time displayed when truncating time. Bug 3940.
-
Alejandro Sanchez authored
Otherwise we can end up printing Start times greater than End times, which is confusing when reading sacct output. A time of 0 is displayed as Unknown. Cosmetic change. Bug 3940.
-
Alejandro Sanchez authored
This behavior was introduced in bug 2504, commit 7fb0c981, and bug 2643, commit 988edf12, respectively. The reasoning is that sysadmins who see nodes with Reason "Not Responding", yet can manually ping/access the node, end up confused. That reason should only be set if the node is truly not responding, not if the HealthCheckProgram execution failed or returned a non-zero exit code. In that case, the program itself would take the appropriate actions, such as draining the node and setting an appropriate Reason. Bug 3931
-
Dominik Bartkiewicz authored
-
Dominik Bartkiewicz authored
-
- 06 Jul, 2017 8 commits
-
-
Dominik Bartkiewicz authored
-
Morris Jette authored
CID 171497
-
David Matthews authored
Bug 3963.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
A list_for_each() was being used to reconcile fed_jobs, and if fed_mgr_job_revoke() is called on a non-origin job it tries to purge the job from the job_list. That deadlocks, because list_for_each() is still holding the job_list's mutex.
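This is the classic self-deadlock pattern: the iterator holds the list's non-recursive mutex while the callback re-enters an API that tries to take the same mutex. A minimal C sketch of the pattern, with hypothetical names (list_t, list_purge, revoke) standing in for the actual Slurm internals:

    /* Sketch of the deadlock only; not Slurm's actual list code. */
    #include <pthread.h>
    #include <stdio.h>

    typedef struct list {
        pthread_mutex_t mutex;  /* non-recursive: re-locking from the
                                 * same thread self-deadlocks */
        int jobs[4];
        int count;
    } list_t;

    static list_t job_list = { PTHREAD_MUTEX_INITIALIZER, {1, 2, 3, 4}, 4 };

    /* Purge takes the list mutex itself -- fine when called on its own. */
    static void list_purge(list_t *l, int job)
    {
        pthread_mutex_lock(&l->mutex);   /* second lock from same thread */
        /* ... remove job from l->jobs ... */
        pthread_mutex_unlock(&l->mutex);
    }

    /* Iterate while holding the list mutex, as list_for_each() does. */
    static void list_for_each(list_t *l, void (*fn)(list_t *, int))
    {
        pthread_mutex_lock(&l->mutex);
        for (int i = 0; i < l->count; i++)
            fn(l, l->jobs[i]);           /* callback runs under the lock */
        pthread_mutex_unlock(&l->mutex);
    }

    /* Stand-in for fed_mgr_job_revoke() on a non-origin job. */
    static void revoke(list_t *l, int job)
    {
        printf("revoking job %d\n", job);
        list_purge(l, job);              /* deadlock: mutex already held */
    }

    int main(void)
    {
        list_for_each(&job_list, revoke); /* never returns */
        return 0;
    }

Built with -pthread, this hangs inside revoke(); the usual fix is to collect the jobs to purge during iteration and remove them after list_for_each() returns.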
-
Brian Christiansen authored
-
- 05 Jul, 2017 17 commits
-
-
Brian Christiansen authored
CIDs: 45332, 45327, 45326
-
Brian Christiansen authored
CID: 171885
-
Tim Wickberg authored
Bug 3957.
-
Morris Jette authored
-
Morris Jette authored
-
Brian Christiansen authored
-
Morris Jette authored
-
Brian Christiansen authored
Was initially added in 734d6f63 but was refactored out in the heterogeneous jobs branch.
-
Brian Christiansen authored
When an origin cluster is removed from the federation, it could leave federated jobs in the federation without an origin (e.g. a job that is viable on multiple siblings other than the origin cluster). The job should schedule amongst its siblings when the origin is gone.
-
Brian Christiansen authored
When a cluster is removed from the federation, pending jobs should remain pending.
1. If a job is pending on an origin cluster and the origin is being removed, then leave the pending job on the origin as a non-federated job and remove the other sibling jobs.
2. If the job is viable on only one cluster, then leave it pending as a non-federated job on the viable cluster.
3. If the origin cluster is being removed and the job is viable on multiple clusters other than the origin, then leave the sibling jobs as federated jobs and the remaining viable clusters will schedule amongst themselves to start the job.
-
Brian Christiansen authored
-
Brian Christiansen authored
Previously they were treated as only pending.
-
Brian Christiansen authored
With the addition of b9719be2, which deletes the job file in a separate thread, the job file could still exist when a new sibling job is being submitted as a requeued fed job. The file needs to be deleted before submitting a new fed sibling job.
-
Brian Christiansen authored
It wasn't doing it for origin jobs.
-
Brian Christiansen authored
The persistent connection was being destroyed, which closed the socket, so the response rc couldn't make it back to the originating cluster.
-
Brian Christiansen authored
-
Brian Christiansen authored
-