- 27 Oct, 2016 40 commits
-
-
Brian Christiansen authored
Federated submissions
-
Brian Christiansen authored
e.g. allocation failure: Unspecified error
-
Brian Christiansen authored
-
Brian Christiansen authored
get_next_job_id() was returning a local id, and the fed_mgr was then turning that into a fed job id. This was a problem because get_next_job_id() only checked whether the local job id was in use; it could not check whether an existing job already had the fed job id. This was exposed in tests that did a reconfigure: the reconfigure loaded an old job_id_sequence, so the next job got an id that was already being used.
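A minimal sketch of the idea, in Python rather than Slurm's C, may help: build the federated id first, then check that id against the jobs already in use. The helper names and the 26-bit split between cluster id and local id are assumptions for illustration and are not taken from this log; under that assumed layout, the id 67119254 seen elsewhere in this log would decompose into cluster 1 and local id 10390.

```python
# Hypothetical sketch; names and the bit layout are assumptions, not Slurm code.
CLUSTER_ID_SHIFT = 26                       # assumed: low 26 bits hold the local id
LOCAL_ID_MASK = (1 << CLUSTER_ID_SHIFT) - 1


def make_fed_job_id(cluster_id, local_id):
    """Combine a cluster id and a local job id into one federated job id."""
    return (cluster_id << CLUSTER_ID_SHIFT) | (local_id & LOCAL_ID_MASK)


def next_job_id(sequence, cluster_id, in_use, first_id=1):
    """Return (fed_job_id, new_sequence), skipping fed ids that are in use.

    The collision check is done on the federated id, not on the local id.
    """
    for _ in range(LOCAL_ID_MASK):
        local_id = sequence
        # Advance the sequence, wrapping around at the end of the local range.
        sequence = sequence + 1 if sequence < LOCAL_ID_MASK else first_id
        fed_id = make_fed_job_id(cluster_id, local_id)
        if fed_id not in in_use:
            return fed_id, sequence
    raise RuntimeError("no free job id")
```

With a stale sequence that points at an id still held by a running job, the sketch skips that id instead of handing it out a second time.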
-
Brian Christiansen authored
The logic to talk to the correct compute nodes still needs to be implemented. It will come at a later date.
-
Brian Christiansen authored
-
Brian Christiansen authored
Will submit using federation submission logic. Scheduling logic to come.
-
Brian Christiansen authored
to make sure job_ptr is accessed within locks.
-
Brian Christiansen authored
In prep for refactoring _slurm_rpc_submit_batch_job to make sure the job_ptr is accessed within locks.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
It was picking a higher weighted federation over lower weighted federations because it had an earlier start time. This shouldn't happen because that's what the weights are for. e.g.
will_run_resp for fed1: start:2016-10-13T15:19:47 sys_usage:0.00 weight:2
will_run_resp for fed2: start:2016-10-13T15:19:48 sys_usage:0.00 weight:1
will_run_resp for fed3: start:2016-10-13T15:19:48 sys_usage:0.00 weight:1
Earliest cluster:fed1 time:1476393587 now:1476393588
Submitted federated job 67119254 to fed1(self)
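The difference between the two orderings can be sketched in Python using the values from the log above. This is one reading of the commit message, assuming that weight takes strict precedence and start time only breaks ties; the actual fix in Slurm may be more nuanced. The function names are hypothetical.

```python
# Hypothetical sketch; function names are illustrative, not Slurm's.
# will-run responses from the log above, start times as epoch seconds.
RESPONSES = [
    {"cluster": "fed1", "start": 1476393587, "weight": 2},
    {"cluster": "fed2", "start": 1476393588, "weight": 1},
    {"cluster": "fed3", "start": 1476393588, "weight": 1},
]


def pick_by_start_only(responses):
    """Old behavior as described: the earliest start wins, weight is ignored."""
    return min(responses, key=lambda r: r["start"])


def pick_by_weight_then_start(responses):
    """Assumed intent: lowest weight wins, earliest start breaks ties."""
    return min(responses, key=lambda r: (r["weight"], r["start"]))
```

Ordering by start time alone selects fed1, as in the log. Ordering by weight first selects one of the weight-1 clusters; here that is fed2, because `min` returns the first of several equal candidates.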
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
fedorigin fedoriginraw fedsiblings fedsiblingsraw
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
If it exists.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
cluster_rec->fed.name will be non-null and empty when the cluster is not part of a federation. Need to check fed.id instead. A fed.id of 0 means the cluster is not part of a federation.
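The pitfall can be illustrated with a small Python model of the record. The classes below are stand-ins for the C structures and are assumptions made for illustration; only the rule that a fed.id of 0 means "not in a federation" comes from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class FedInfo:
    name: str = ""   # non-null but empty when the cluster has no federation
    id: int = 0      # 0 means the cluster is not part of a federation


@dataclass
class ClusterRec:
    name: str
    fed: FedInfo = field(default_factory=FedInfo)


def in_federation_by_name(cluster):
    """Buggy check: the name is never null, so this is always True."""
    return cluster.fed.name is not None


def in_federation(cluster):
    """Check described in the commit: rely on fed.id instead of fed.name."""
    return cluster.fed.id != 0
```

For a standalone cluster such as `ClusterRec("c1")`, the name-based check reports membership while the id-based check does not.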
-
Brian Christiansen authored
-
Brian Christiansen authored
See previous unreverted commit.
-
Brian Christiansen authored
This reverts commit 2ec92d36a8ad7184897c9a322ba2d9978d2ccdbd.
-
Brian Christiansen authored
This is an example of how to do it. The problem is that select_jobinfo on the job_desc is packed using working_cluster's->plugin_id. job_desc's->select_jobinfo is only used by bluegene and alps code which will eventually go away.
-
Brian Christiansen authored
-
Brian Christiansen authored
sbatch will choose the federation or individual cluster with the fastest start time. A will-run to a federation will return the fastest start time of all clusters in the federation.
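As a rough Python sketch of that selection: a federation's will-run answer is taken to be the minimum start time over its clusters, and the submit target is whichever federation or standalone cluster answers with the earliest time. The function names and the dictionary shape are assumptions for illustration only.

```python
# Hypothetical sketch; names and data layout are illustrative, not Slurm's.
def fed_will_run(cluster_start_times):
    """A federation answers with the fastest start time of its clusters."""
    return min(cluster_start_times)


def pick_target(targets):
    """Pick the target name with the earliest start time.

    `targets` maps a federation or cluster name to a list of start times;
    a standalone cluster is modeled as a one-element list.
    """
    return min(targets, key=lambda name: fed_will_run(targets[name]))
```

For example, `pick_target({"fedA": [120, 100], "cluster3": [110]})` selects `fedA`, because one of its clusters can start at time 100.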
-
Brian Christiansen authored
with the sbatch -M<clusters> option. A federation will-run will return the fastest start time of all siblings in the federation.
-
Brian Christiansen authored
-
Brian Christiansen authored
Was moved in 5cfc577a
-
Brian Christiansen authored
-
Brian Christiansen authored
More to come. This sets up the controller side.
-
Brian Christiansen authored
-