Earth Sciences / autosubmit · Merge request !531

Fixed DB test: fixes the failed jobs not being written to the DB because their logs were not retrieved

Open dbeltran requested to merge 4.1.12-dev-branch-failed-jobs into 4.1.12-dev-branch Dec 09, 2024
Overview 26 · Commits 7 · Pipelines 0 · Changes 6

This fixes:

  • !527 (merged): in line with that merge, a job can no longer be put in the UniqueQueue if it has no ID.

  • Fixed an issue with failed jobs when chunks > 3, retrials > 5, and the job goes through all 3 retrials.

The fix consists of adding a copy(job) call in the put method.

I was under the impression that whatever you put in a multiprocessing.Queue() is always a copy of the actual instance. However, I saw that it was affecting the jobs in the Queue.

Since I have to add a copy (not a deep one, though, so it shouldn't be expensive), an alternative would be to pass only job.id, fail_count, out, err, and status, and create a new job in the log_recovery process.
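The put-side copy could look roughly like this minimal sketch. The UniqueQueue name and the no-ID guard come from this MR; the internals here are assumed, and a plain queue.Queue stands in for the multiprocessing queue:

```python
import copy
import queue


class UniqueQueue(queue.Queue):
    """Sketch: queue that rejects duplicates and jobs without an ID."""

    def __init__(self):
        super().__init__()
        self._seen_ids = set()

    def put(self, job, block=True, timeout=None):
        # In line with !527: a job without an ID is never queued.
        if not getattr(job, "id", None):
            return
        if job.id in self._seen_ids:
            return
        self._seen_ids.add(job.id)
        # Shallow copy: cheap, but it detaches the queued snapshot from
        # the live job object that the main process keeps mutating.
        super().put(copy.copy(job), block, timeout)
```

Since the copy is shallow, nested mutable attributes are still shared with the original job, which is why it should stay inexpensive compared to a deepcopy.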

  • Reorganized the log retrieval in case of failure

Now, instead of reconnecting to the platform and not going through each job in the queue, AS gets all the jobs from the queue first and then reconnects if any job has an issue.

This avoids the issue of not retrieving the remaining jobs that do have err and out files at the end of the main process.
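The reorganized loop described above can be sketched roughly as follows. retrieve_logs, get_logs, and restore_connection are hypothetical names; the point is draining the queue before deciding whether to reconnect:

```python
def retrieve_logs(job_queue, platform):
    # Drain the whole queue first so no job is skipped.
    jobs, pending = [], []
    while not job_queue.empty():
        jobs.append(job_queue.get())

    # First pass: fetch logs for every job we can reach.
    for job in jobs:
        try:
            platform.get_logs(job)  # assumed helper name
        except ConnectionError:
            pending.append(job)

    # Reconnect once, and only if some job actually had an issue.
    if pending:
        platform.restore_connection()
        for job in pending:
            platform.get_logs(job)
```

A transient failure on one job no longer blocks the jobs behind it in the queue; they are processed first and the reconnect happens at most once per pass.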

  • The check of whether a job is recovered is now done only at the start and at the end, instead of in the middle.

Tomorrow, I'll add a check for the log retrieval names to the regression test.

  • Removes unused code

  • The pipeline should now always work; I believe the nature of the issue was tied to the CPU clock.

Edited Dec 09, 2024 by dbeltran
Source branch: 4.1.12-dev-branch-failed-jobs