Improve fed job locking (9fb07473) · Commits · Manuel G. Marciani / ces_slurm_simulator

Commit 9fb07473 authored Apr 25, 2017 by Brian Christiansen

Improve fed job locking

The problem was that the origin cluster had to take the internal job
write lock to test and set the fed cluster lock. This would hold up the
persistent connection and lead to a deadlock. The solution is to create
a separate table for tracking the federated job and the cluster lock,
which is controlled by a separate lock.
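
As a rough illustration of the idea (not the actual Slurm code, which is
written in C), the sketch below keeps the fed cluster lock in its own
table guarded by its own mutex, so a test-and-set never touches the job
write lock. The class name FedJobLockTable, its methods, and the cluster
names are hypothetical.

```python
import threading


class FedJobLockTable:
    """Hypothetical table: job_id -> id of the cluster holding the fed lock."""

    def __init__(self):
        # This mutex guards only the table, never the job records themselves.
        self._mutex = threading.Lock()
        self._owner = {}

    def try_lock(self, job_id, cluster_id):
        """Test-and-set: True if cluster_id holds the lock after the call."""
        with self._mutex:
            owner = self._owner.setdefault(job_id, cluster_id)
            return owner == cluster_id

    def unlock(self, job_id, cluster_id):
        """Release the lock only if cluster_id is the current owner."""
        with self._mutex:
            if self._owner.get(job_id) == cluster_id:
                del self._owner[job_id]
                return True
            return False


table = FedJobLockTable()
print(table.try_lock(42, "cluster_a"))  # first caller takes the lock
print(table.try_lock(42, "cluster_b"))  # second caller is refused
```

Because the critical section is a single dictionary operation, the thread
serving the persistent connection only ever waits for a very short time.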

In addition, all communication on the persistent connection must be
quick. Thus all communications that need to modify the actual job are
put onto a queue for the scheduler to handle later, so that the
persistent connection isn't held up. The response is sent back when the
request is processed. This moves to an asynchronous model for
communications between clusters in a federation.
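
The queueing model described here could be sketched roughly as follows.
This is an assumption-laden illustration, not the Slurm implementation:
FedRequestQueue, on_persist_message, process_pending, and the reply
callback are all hypothetical names, and the job update is a stand-in.

```python
import queue


class FedRequestQueue:
    """Hypothetical queue of job-modifying requests from sibling clusters."""

    def __init__(self):
        self._pending = queue.Queue()

    def on_persist_message(self, job_id, action, reply):
        """Runs on the persistent-connection thread: enqueue and return."""
        self._pending.put((job_id, action, reply))

    def process_pending(self, jobs):
        """Runs later in the scheduler, where taking the job lock is safe."""
        handled = 0
        while True:
            try:
                job_id, action, reply = self._pending.get_nowait()
            except queue.Empty:
                return handled
            jobs[job_id] = action   # stand-in for the real job update
            reply((job_id, "ok"))   # response goes back only after processing
            handled += 1


requests = FedRequestQueue()
jobs, replies = {}, []
requests.on_persist_message(7, "cancel", replies.append)  # returns at once
requests.process_pending(jobs)                            # scheduler pass
```

The connection handler never blocks on the job write lock; the sender
sees its response only once the scheduler has drained the queue.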

parent 7587038a
