- 06 Jul, 2017 2 commits
-
-
Brian Christiansen authored
A list_for_each() was being used to reconcile fed_jobs. If fed_mgr_job_revoke() is called on a non-origin job, it tries to purge the job from the job_list, which deadlocks because list_for_each() is holding the job_list's mutex.
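The deadlock described above can be illustrated with a small stand-alone model. This is only a sketch of one common way around the problem (record the jobs during the walk, purge them after the lock is released); it is not necessarily what the commit does. The types and functions below (job_list_t, job_list_for_each, job_list_purge, reconcile_jobs) and the "odd ids are non-origin jobs" rule are hypothetical stand-ins, not Slurm's list API.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_JOBS 8

/* Hypothetical stand-in for a mutex-protected job list. */
typedef struct {
    pthread_mutex_t mutex;
    int job_ids[MAX_JOBS];
    size_t count;
} job_list_t;

typedef void (*job_visit_fn)(int job_id, void *arg);

/* Holds the list mutex for the whole walk, as the iteration above does. */
static void job_list_for_each(job_list_t *l, job_visit_fn fn, void *arg)
{
    pthread_mutex_lock(&l->mutex);
    for (size_t i = 0; i < l->count; i++)
        fn(l->job_ids[i], arg);
    pthread_mutex_unlock(&l->mutex);
}

/* Takes the same mutex, so calling this from inside a visit callback
 * would try to relock a mutex the thread already holds. */
static void job_list_purge(job_list_t *l, int job_id)
{
    pthread_mutex_lock(&l->mutex);
    for (size_t i = 0; i < l->count; i++) {
        if (l->job_ids[i] == job_id) {
            /* Order is not preserved: the last entry fills the hole. */
            l->job_ids[i] = l->job_ids[--l->count];
            break;
        }
    }
    pthread_mutex_unlock(&l->mutex);
}

/* Jobs recorded during the walk, to be purged once the lock is released. */
typedef struct {
    int ids[MAX_JOBS];
    size_t count;
} pending_purge_t;

/* Visit callback: only records the job. For illustration, odd ids are
 * assumed to be the non-origin jobs that need revoking. */
static void collect_revoked(int job_id, void *arg)
{
    pending_purge_t *p = arg;
    if (job_id % 2 != 0 && p->count < MAX_JOBS)
        p->ids[p->count++] = job_id;
}

/* Walk first, purge afterwards; returns the number of purged jobs. */
static size_t reconcile_jobs(job_list_t *l)
{
    pending_purge_t pending = { .count = 0 };

    job_list_for_each(l, collect_revoked, &pending);
    for (size_t i = 0; i < pending.count; i++)
        job_list_purge(l, pending.ids[i]);
    return pending.count;
}
```

The point of the split is that job_list_purge() is never reached while job_list_for_each() still holds the mutex.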
-
Brian Christiansen authored
-
- 05 Jul, 2017 26 commits
-
-
Brian Christiansen authored
CIDs: 45332, 45327, 45326
-
Brian Christiansen authored
CID: 171885
-
Tim Wickberg authored
Bug 3957.
-
Morris Jette authored
-
Morris Jette authored
-
Brian Christiansen authored
-
Morris Jette authored
-
Brian Christiansen authored
Was initially added in 734d6f63 but was refactored out in the heterogeneous jobs branch.
-
Brian Christiansen authored
When an origin cluster is removed from the federation, it could leave federated jobs in the federation without an origin (e.g. the job is viable on multiple siblings other than the origin cluster). The job should schedule amongst its siblings when the origin is gone.
-
Brian Christiansen authored
When a cluster is removed from the federation, pending jobs should remain pending.
1. If a job is pending on an origin cluster and the origin is being removed, then leave the pending job on the origin as a non-federated job and remove the other sibling jobs.
2. If the job is viable on only one cluster, then leave it pending as a non-federated job on the viable cluster.
3. If the origin cluster is being removed and the job is viable on multiple clusters other than the origin, then leave the sibling jobs as federated jobs and the remaining viable clusters will schedule amongst themselves to start the job.
-
Brian Christiansen authored
-
Brian Christiansen authored
Previously they were treated as only pending.
-
Brian Christiansen authored
With the addition of b9719be2, which deletes the job file in a separate thread, the job file could still exist when a new sibling job is being submitted as a requeued fed job. The file needs to be deleted before submitting a new fed sib job.
-
Brian Christiansen authored
It wasn't doing it for origin jobs.
-
Brian Christiansen authored
The persistent connection was being destroyed, which closed the socket, so the response rc couldn't make it back to the originating cluster.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
e.g. Jobs on cluster1 (fed:cluster) since multiple clusters could have the same node names.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
Previously, remote jobs were removed from the job_list as quickly as possible to prevent collisions with requeued jobs and to clean up the jobs, while the origin job stayed around until minjobage on the origin. But the origin job didn't have the details from the job that ran on a remote cluster. Now revoked jobs are simply not shown. The origin tracking job remains revoked and hidden, and the remote job stays around for display until minjobage. scontrol show jobs will show the job from the cluster that ran the job. The job is requeuable as long as the origin job is still in the origin cluster's job_list.
-
Brian Christiansen authored
Just check for the revoked state instead of checking whether it's a tracker job, since an origin job will be revoked if it can't run on the origin or if it's running on a remote cluster.
-
Tim Wickberg authored
-
Don Lipari authored
Bug 3938.
-
David Matthews authored
Bug 3954.
-
Gennaro Oliva authored
Bug 3947.
-
- 03 Jul, 2017 3 commits
-
-
Morris Jette authored
It's old code, but newly reported.
-
Alejandro Sanchez authored
_update_bb_resv() received a bb_spec whose units were originally always interpreted as powers of 1024 (IEC). This change supports both IEC/SI formats. Bug 3922
-
Alejandro Sanchez authored
Bug 3922
-
- 30 Jun, 2017 9 commits
-
-
Brian Christiansen authored
-
Brian Christiansen authored
With commit 3e00ede5, _pack_job_alloc_info_msg wasn't hitting the xassert as expected.
-
Morris Jette authored
-
Morris Jette authored
-
Alejandro Sanchez authored
burst_buffer logic modified to support sizes in both SI and IEC size units (e.g. M/MiB for powers of 1024, MB for powers of 1000). Bug 3922
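The unit convention above (M/MiB for powers of 1024, MB for powers of 1000) can be sketched as a small parser. This is an illustration only, not the burst_buffer plugin's code: the function name parse_size, the extension of the rule to the K, G and T prefixes, the case handling, and the "return 0 on error" convention are all assumptions.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert a size spec such as "4M", "4MiB" or "4MB"
 * to bytes. A bare prefix or a prefix followed by "iB" is read as a power
 * of 1024 (IEC); a prefix followed by "B" is read as a power of 1000 (SI).
 * Returns 0 on a parse error. Overflow is not checked in this sketch.
 */
static uint64_t parse_size(const char *spec)
{
    static const char prefixes[] = "KMGT";
    char *end = NULL;
    unsigned long long num = strtoull(spec, &end, 10);

    if (end == spec)
        return 0;               /* no digits at all */
    if (*end == '\0')
        return num;             /* plain byte count, no suffix */

    /* Position in "KMGT" gives the exponent: K=1, M=2, G=3, T=4. */
    const char *p = strchr(prefixes, toupper((unsigned char)*end));
    if (p == NULL)
        return 0;               /* unknown prefix */
    int power = (int)(p - prefixes) + 1;

    const char *rest = end + 1;
    uint64_t base;
    if (rest[0] == '\0' || strcmp(rest, "iB") == 0)
        base = 1024;            /* "M" or "MiB" */
    else if (strcmp(rest, "B") == 0)
        base = 1000;            /* "MB" */
    else
        return 0;               /* trailing garbage */

    uint64_t mult = 1;
    for (int i = 0; i < power; i++)
        mult *= base;
    return (uint64_t)num * mult;
}
```

With these assumptions, "4M" and "4MiB" both yield 4 * 1024 * 1024 bytes, while "4MB" yields 4,000,000 bytes.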
-
Dominik Bartkiewicz authored
This patch removes a window in which a message bound for the DBD could be packed with the non-DBD packing. This would result in a packed msg_type, but nothing else. When that message was given to the DBD, it would complain forever about an unpacking error. Bugs 3891 and 3939
-
Danny Auble authored
-
Danny Auble authored
list functions.
-
Danny Auble authored
# Conflicts:
#   src/slurmd/slurmd/req.c
#   src/srun/libsrun/allocate.c
-