This fixes:
- In line with !527 (merged), a job can no longer be put in the `UniqueQueue` if it has no ID.
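As a rough illustration of that behaviour (the real `UniqueQueue` implementation may differ; the `Job` namedtuple and the internal `_seen_ids` set here are assumptions for the sketch):

```python
import queue
from collections import namedtuple

Job = namedtuple("Job", ["id", "name"])  # minimal stand-in for a real job object


class UniqueQueue(queue.Queue):
    """Sketch of a queue that accepts each job at most once, keyed by its ID."""

    def __init__(self, maxsize=0):
        super().__init__(maxsize)
        self._seen_ids = set()

    def put(self, job, block=True, timeout=None):
        # Jobs without an ID are rejected outright, matching the !527 change.
        job_id = getattr(job, "id", None)
        if not job_id:
            return
        # Duplicate IDs are silently ignored so each job is queued once.
        if job_id not in self._seen_ids:
            self._seen_ids.add(job_id)
            super().put(job, block, timeout)
```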
- Fixed an issue with failed jobs when chunks > 3, retrials > 5, and the job fulfills the 3 retrials.
The fix consists of adding a `copy(job)` in the `put` method.
I thought that whatever you put in a `multiprocessing.Queue()` is always a copy of your actual instance; however, I saw that it was affecting the jobs in the queue.
Since the copy is shallow (not a deep one, so it shouldn't be expensive), an alternative would be to pass `job.id`, `fail_count`, `out`, `err`, and `status`, and create a new job in the `log_recovery` process.
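A likely explanation for the mutation leaking through: `multiprocessing.Queue.put()` hands the object to a background feeder thread that pickles it later, so if the caller mutates the job before that happens, the mutated state can end up in the queue. A minimal sketch of the fix (`put_job` and the `SimpleNamespace` job are illustrative, not the actual code):

```python
import copy
import multiprocessing
from types import SimpleNamespace


def put_job(q, job):
    """Put a shallow copy of the job so later mutations by the caller
    cannot leak into the queued item."""
    # A shallow copy is enough here: only the scalar fields
    # (id, fail_count, status, ...) need to be frozen at put() time.
    q.put(copy.copy(job))
```

With the copy in place, mutating the original job after `put_job` no longer changes what the consumer sees.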
- Reorganized the log retrievals in case of failure.
Now, instead of reconnecting to the platform and not going through each job in the queue, AS will first get all the jobs from the queue and then reconnect only if a job has an issue.
This avoids the problem of not retrieving the remaining jobs that do have `err` and `out` files at the end of the main process.
The check of whether a job is recovered is now done only at the start and at the end, instead of in the middle.
Tomorrow, I'll add a check for the log-retrieval names to the regression test.
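The reorganised flow above can be sketched as drain-first, reconnect-on-failure (function and method names here, `retrieve_logs`, `platform.get_logs`, `platform.reconnect`, are assumptions, not the actual API):

```python
import queue


def retrieve_logs(jobs_queue, platform):
    """Sketch: drain every job from the queue first, then reconnect
    only when a retrieval actually fails."""
    # 1. Get ALL jobs out of the queue before touching the platform,
    #    so one bad job cannot block the jobs queued behind it.
    pending = []
    while True:
        try:
            pending.append(jobs_queue.get_nowait())
        except queue.Empty:
            break
    # 2. Process each job; reconnect only for jobs with an issue.
    failed = []
    for job in pending:
        try:
            platform.get_logs(job)   # hypothetical retrieval call
        except ConnectionError:
            platform.reconnect()     # hypothetical reconnect
            failed.append(job)
    return failed
```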
- Removed unused code.
- The pipeline should now always work; I believe the nature of the issue was tied to the CPU clock.