- 17 May, 2017 31 commits
Brian Christiansen authored
-
Brian Christiansen authored
Only used in fed_mgr.c
-
Brian Christiansen authored
This will make it easier to add new proto types without having to modify protocol_defs.[ch]. job_lock and job_unlock are left to be handled by slurmctld_req since they aren't a "queued" type.
-
Brian Christiansen authored
Prevent a possible memory leak.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
All handled in _proc_multi_msg except for sib_job_[un]lock.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
This prevents deadlocks that occurred when fed_job_list_mutex was locked higher up the call stack and job_completion_logger was then called while the mutex was still held.
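A minimal C sketch of the pattern this commit describes, assuming a plain array stands in for the federated job list: completed entries are copied out while fed_job_list_mutex is held, the mutex is released, and only then is the completion logger called. Apart from fed_job_list_mutex, every name below (add_fed_job, mark_fed_job_done, process_fed_completions, log_job_completion) is hypothetical and not Slurm's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FED_JOBS 64

/* Hypothetical stand-in for the federated job list. */
static pthread_mutex_t fed_job_list_mutex = PTHREAD_MUTEX_INITIALIZER;
static uint32_t fed_job_ids[MAX_FED_JOBS];
static int fed_job_done[MAX_FED_JOBS];
static size_t fed_job_count = 0;
static int completions_logged = 0;

static void add_fed_job(uint32_t job_id)
{
    pthread_mutex_lock(&fed_job_list_mutex);
    assert(fed_job_count < MAX_FED_JOBS);
    fed_job_ids[fed_job_count] = job_id;
    fed_job_done[fed_job_count] = 0;
    fed_job_count++;
    pthread_mutex_unlock(&fed_job_list_mutex);
}

static void mark_fed_job_done(uint32_t job_id)
{
    pthread_mutex_lock(&fed_job_list_mutex);
    for (size_t i = 0; i < fed_job_count; i++)
        if (fed_job_ids[i] == job_id)
            fed_job_done[i] = 1;
    pthread_mutex_unlock(&fed_job_list_mutex);
}

/* Hypothetical stand-in for job_completion_logger(). It may take other
 * locks, so it must run with fed_job_list_mutex released. */
static void log_job_completion(uint32_t job_id)
{
    (void) job_id;
    /* trylock succeeds only if nobody currently holds the list mutex. */
    assert(pthread_mutex_trylock(&fed_job_list_mutex) == 0);
    pthread_mutex_unlock(&fed_job_list_mutex);
    completions_logged++;
}

/* Remove completed jobs under the mutex, then log them after unlocking. */
static size_t process_fed_completions(void)
{
    uint32_t pending[MAX_FED_JOBS];
    size_t npending = 0, kept = 0;

    pthread_mutex_lock(&fed_job_list_mutex);
    for (size_t i = 0; i < fed_job_count; i++) {
        if (fed_job_done[i]) {
            pending[npending++] = fed_job_ids[i];
        } else {
            fed_job_ids[kept] = fed_job_ids[i];
            fed_job_done[kept] = 0;
            kept++;
        }
    }
    fed_job_count = kept;
    pthread_mutex_unlock(&fed_job_list_mutex);

    /* The logger runs only after the list mutex has been dropped. */
    for (size_t i = 0; i < npending; i++)
        log_job_completion(pending[i]);
    return npending;
}
```

Copying the work out and acting on it after unlocking keeps the critical section short and means the logger can never re-enter a path that waits on fed_job_list_mutex.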
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
The fed_write_lock doesn't need to be held when destroying the persist_conn_server.
-
Brian Christiansen authored
-
Brian Christiansen authored
Since federated submissions are now asynchronous and because the working_cluster_rec can be multithreaded, it's better to have the federated will_runs in the client. This prevents the deadlocks and stalled persistent connections that could happen in the previous model.
-
Brian Christiansen authored
-
Brian Christiansen authored
With the change to the asynchronous model, it's better to have the cluster always get the lock from the origin cluster. Previously, the origin cluster would try to pick the one cluster that could start the job the soonest, so having only one sibling was more common. Now that sibling jobs are sent to all clusters, this is less common.
-
Brian Christiansen authored
Queue up the fed job completions.
-
Brian Christiansen authored
Federated submissions now happen asynchronously. Sibling jobs are submitted to the sibling cluster, which queues up the request to be handled later when it can get the job write lock. The sibling cluster then submits the job and sends a message back to the origin cluster, which is queued up as well. If the submission fails, the sibling cluster is removed from the job's active siblings.
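The queue-then-process flow described above can be sketched in C as a mutex-protected FIFO: the connection thread only appends a request and returns, and the scheduler drains the queue later. This is a sketch under stated assumptions; the message types, fed_queue_request, fed_process_queue, and handle_fed_msg are all hypothetical names, and the handler merely records job ids where a real one would take the job write lock, submit the job, and reply to the origin cluster.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_HANDLED 16

/* Hypothetical message types; not Slurm's protocol definitions. */
typedef enum { FED_MSG_SUBMIT_SIB, FED_MSG_SUBMIT_RESP } fed_msg_type_t;

typedef struct fed_msg {
    fed_msg_type_t type;
    uint32_t job_id;
    struct fed_msg *next;
} fed_msg_t;

static pthread_mutex_t fed_queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static fed_msg_t *fed_queue_head = NULL;
static fed_msg_t *fed_queue_tail = NULL;

static uint32_t handled_ids[MAX_HANDLED];
static int handled_count = 0;

/* Runs on the persistent-connection thread: it only appends to the queue,
 * so the connection is never held up waiting for the job write lock. */
static int fed_queue_request(fed_msg_type_t type, uint32_t job_id)
{
    fed_msg_t *msg = malloc(sizeof(*msg));
    if (!msg)
        return -1;
    msg->type = type;
    msg->job_id = job_id;
    msg->next = NULL;

    pthread_mutex_lock(&fed_queue_mutex);
    if (fed_queue_tail)
        fed_queue_tail->next = msg;
    else
        fed_queue_head = msg;
    fed_queue_tail = msg;
    pthread_mutex_unlock(&fed_queue_mutex);
    return 0;
}

/* Hypothetical stand-in handler: records the job id it was given. */
static void handle_fed_msg(const fed_msg_t *msg)
{
    assert(handled_count < MAX_HANDLED);
    handled_ids[handled_count++] = msg->job_id;
}

/* Runs later on the scheduler thread; drains requests in FIFO order. */
static int fed_process_queue(void)
{
    fed_msg_t *list;
    int count = 0;

    pthread_mutex_lock(&fed_queue_mutex);
    list = fed_queue_head;          /* detach the whole queue at once */
    fed_queue_head = fed_queue_tail = NULL;
    pthread_mutex_unlock(&fed_queue_mutex);

    while (list) {
        fed_msg_t *next = list->next;
        handle_fed_msg(list);
        free(list);
        list = next;
        count++;
    }
    return count;
}
```

Detaching the whole list under the queue mutex lets new requests keep arriving while the scheduler works through the detached batch.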
-
Brian Christiansen authored
The problem was that the origin cluster had to get the internal job write lock to test and set the fed cluster lock. This would hold up the persistent connection and lead to a deadlock. The solution is to create a separate table for tracking the federated job and the cluster lock, which is controlled by a separate lock. In addition, all communication on the persistent connection must be quick, so any communication that needs to modify the actual job is put onto a queue for the scheduler to handle later, keeping the persistent connection free. The response is sent back when the request is processed. This moves to an asynchronous model for communications between clusters in a federation.
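A hypothetical C sketch of the separate tracking table this commit introduces: the origin cluster records which cluster holds each federated job's lock, and the test-and-set is guarded by its own mutex (fed_lock_mutex here) instead of the internal job write lock. The names fed_track_job, fed_job_lock, and fed_job_unlock, the fixed-size array, and the convention that cluster id 0 means "unlocked" are all assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_FED_JOBS 128

/* Hypothetical origin-side entry; lock_cluster == 0 means unlocked. */
typedef struct {
    uint32_t job_id;
    uint32_t lock_cluster;
} fed_job_lock_t;

/* Separate lock for the table, independent of the job write lock. */
static pthread_mutex_t fed_lock_mutex = PTHREAD_MUTEX_INITIALIZER;
static fed_job_lock_t fed_locks[MAX_FED_JOBS];
static int fed_lock_count = 0;

/* Caller must hold fed_lock_mutex. */
static fed_job_lock_t *find_entry(uint32_t job_id)
{
    for (int i = 0; i < fed_lock_count; i++)
        if (fed_locks[i].job_id == job_id)
            return &fed_locks[i];
    return NULL;
}

/* Register a federated job in the origin cluster's table. */
static bool fed_track_job(uint32_t job_id)
{
    bool added = false;
    pthread_mutex_lock(&fed_lock_mutex);
    if (!find_entry(job_id) && fed_lock_count < MAX_FED_JOBS) {
        fed_locks[fed_lock_count].job_id = job_id;
        fed_locks[fed_lock_count].lock_cluster = 0;
        fed_lock_count++;
        added = true;
    }
    pthread_mutex_unlock(&fed_lock_mutex);
    return added;
}

/* Test-and-set: grant if unlocked or already held by this cluster. */
static bool fed_job_lock(uint32_t job_id, uint32_t cluster_id)
{
    bool granted = false;
    assert(cluster_id != 0);
    pthread_mutex_lock(&fed_lock_mutex);
    fed_job_lock_t *e = find_entry(job_id);
    if (e && (e->lock_cluster == 0 || e->lock_cluster == cluster_id)) {
        e->lock_cluster = cluster_id;
        granted = true;
    }
    pthread_mutex_unlock(&fed_lock_mutex);
    return granted;
}

/* Release only if the requesting cluster is the current holder. */
static bool fed_job_unlock(uint32_t job_id, uint32_t cluster_id)
{
    bool released = false;
    pthread_mutex_lock(&fed_lock_mutex);
    fed_job_lock_t *e = find_entry(job_id);
    if (e && e->lock_cluster == cluster_id) {
        e->lock_cluster = 0;
        released = true;
    }
    pthread_mutex_unlock(&fed_lock_mutex);
    return released;
}
```

Because the table has its own short-lived mutex, answering a lock request on the persistent connection never has to wait for the scheduler to release the job write lock.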
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
Prevents a second call to the database. This could happen when the origin job is cancelled and the sibling jobs report back that the job is gone as well.
-
Brian Christiansen authored
-
Brian Christiansen authored
Mimics how cancelled jobs are handled. The database will show the job's start_time as 0, but in the controller the start time will be the same as the end time. sacct will set the start time to the end time if there is an end time and the start time is 0.
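The sacct display rule in the last sentence fits in a small C function. This is a sketch of the rule as described, not sacct's actual code; display_start_time is a hypothetical name.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: a job recorded with start_time == 0 but a nonzero
 * end_time is displayed as having started at its end time. */
static time_t display_start_time(time_t start_time, time_t end_time)
{
    if (start_time == 0 && end_time != 0)
        return end_time;
    return start_time;
}
```

Jobs that did start keep their recorded start time, and a job with neither time set still shows 0.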
-
Brian Christiansen authored
-
Brian Christiansen authored
-
- 16 May, 2017 5 commits
Tim Shaw authored
bug 805
-
Brian Christiansen authored
To be able to set a default federated view for all status commands.
-
Brian Christiansen authored
even if --sibling is specified.
-
Brian Christiansen authored
to show federated view: sacct, scontrol, sinfo, sprio, squeue, sreport
-
Brian Christiansen authored
-
- 15 May, 2017 2 commits
Brian Christiansen authored
Show a tab in the cluster combo box to select a federated view for a given federation.
-
Morris Jette authored
-
- 13 May, 2017 2 commits
Morris Jette authored
-
Isaac Hartung authored
Bug 3695
-