Improve fed job locking
The problem was that the origin cluster had to take the internal job write lock in order to test and set the fed cluster lock. This would hold up the persistent connection and result in a deadlock. The solution is to create a separate table for tracking the federated job and its cluster lock, which is controlled by a separate lock. In addition, all communication on the persistent connection must be quick. Thus all communications that need to modify the actual job are put onto a queue for the scheduler to handle later, so that the persistent connection isn't held up. The response is sent back when the request is processed. This moves to an asynchronous model for communications between clusters in a federation.
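As a rough illustration of the pattern described above (not the actual implementation), the following Python sketch shows a federated-job table guarded by its own lock, plus a queue that defers job-modifying requests to the scheduler. All names here (`fed_jobs`, `fed_table_lock`, `job_write_lock`, `on_persistent_connection_message`, `scheduler_drain`, and the message fields) are hypothetical and chosen only for this example.

```python
import queue
import threading

# Hypothetical stand-ins; none of these names come from the real code base.
job_write_lock = threading.Lock()   # the internal job write lock
fed_table_lock = threading.Lock()   # separate lock guarding only the fed job table
fed_jobs = {}                       # job_id -> cluster holding the cluster lock
jobs = {}                           # job_id -> job state (the "actual job")
requests = queue.Queue()            # work deferred to the scheduler


def try_set_cluster_lock(job_id, cluster):
    """Test and set the cluster lock without touching the job write lock."""
    with fed_table_lock:
        holder = fed_jobs.get(job_id)
        if holder is None or holder == cluster:
            fed_jobs[job_id] = cluster
            return True
        return False


def on_persistent_connection_message(msg):
    """Quick path run on the persistent connection; never takes job_write_lock."""
    if msg["type"] == "lock":
        return try_set_cluster_lock(msg["job_id"], msg["cluster"])
    # Anything that modifies the actual job is queued; the reply comes later.
    requests.put(msg)
    return None


def scheduler_drain(send_response):
    """Run by the scheduler: apply queued requests, then send the responses."""
    while True:
        try:
            msg = requests.get_nowait()
        except queue.Empty:
            return
        with job_write_lock:
            jobs[msg["job_id"]] = msg["state"]
        send_response(msg["job_id"], "ok")
```

In this sketch the connection handler returns immediately even while another thread holds `job_write_lock`, which is the property that avoids stalling the persistent connection. The response callback is invoked only once the scheduler has processed the queued request.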