Fix for backfill launch job with reboot
This bug was likely the root cause of bug 3366. If the backfill scheduler allocates resources for a batch job and a node reboot is required, the batch launch RPC would be sent to the agent. At that point, there is a race condition between the agent and the job_time_limit() function testing for boot completion. If the job_time_limit() function ran first, it would trigger a second launch RPC request getting sent to the agent. bug 3366
Please register or sign in to comment