Inconsistencies in COMPLETED/FAILED times and logs
This affects the log visualization, the correct identification of runs, and the performance metrics: some jobs report unrealistic metrics, which forces us to implement outlier-removal algorithms.
Take experiment a0uo as an example:
According to the GUI, the .err and .out logs of the last run point to logs from different runs. The cause may be that the latest .out has been compressed, so the GUI/API cannot find it. If that is the case, the lookup should be more robust: instead of pointing to the latest available log, it should point to the one that actually corresponds to the run.
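One possible way to make the lookup robust is to select the .out/.err files by the run's own timestamp instead of taking the newest file, and to accept compressed logs as well. Below is a minimal sketch in Python. It assumes a hypothetical naming scheme <job_name>.<YYYYMMDDhhmmss>.out|err, optionally followed by .gz. The function name find_run_logs is also hypothetical. The real Autosubmit/GUI file naming, and which timestamp identifies a run, may differ.

```python
import re

# Hypothetical log naming scheme: <job_name>.<YYYYMMDDhhmmss>.<out|err>[.gz]
LOG_RE = re.compile(r"^(?P<job>.+)\.(?P<ts>\d{14})\.(?P<kind>out|err)(?P<gz>\.gz)?$")


def find_run_logs(filenames, job_name, run_timestamp):
    """Return {"out": ..., "err": ...} for the logs of one specific run.

    Matches on the run timestamp instead of picking the newest file, and
    treats compressed (.gz) logs the same as plain ones. If several files
    match the same kind, the last one in `filenames` wins.
    """
    found = {}
    for name in filenames:
        m = LOG_RE.match(name)
        if m is None or m["job"] != job_name or m["ts"] != run_timestamp:
            continue  # not a log of this job, or a log of a different run
        found[m["kind"]] = name
    return found


# Usage example with hypothetical file names: the .out of the latest run is compressed.
names = [
    "a0uo_19900101_fc0_11_SIM.20240302151731.out",
    "a0uo_19900101_fc0_11_SIM.20240302151731.err",
    "a0uo_19900101_fc0_11_SIM.20240302152416.out.gz",
    "a0uo_19900101_fc0_11_SIM.20240302152416.err",
]
print(find_run_logs(names, "a0uo_19900101_fc0_11_SIM", "20240302152416"))
```

With this approach, a compressed .out no longer makes the lookup fall back to another run's log. Both files returned belong to the requested run, or the missing one is simply absent from the result.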
Next, the job history view shows a job that COMPLETED in only 30 seconds.
The job that COMPLETED in 8:43:53 supposedly ran later, but it is not the one shown in the tree view when we open it.
If we open the /appl/AS/AUTOSUBMIT_DATA/a0uo/tmp/a0uo_19900101_fc0_11_SIM_TOTAL_STATS file, we see that the FAILED runs have no real ending time: the third column only holds the placeholder 19700101020000. From this I infer that Autosubmit writes this information directly to the database when the job fails. The COMPLETED run lasted 8:43:53, like the one with counter 62 shown above. There is no trace of a run lasting 30 seconds.
20240301223744 20240301223905 19700101020000 FAILED
20240301233222 20240302044603 19700101020000 FAILED
20240302044634 20240302045337 19700101020000 FAILED
20240302045408 20240302045752 19700101020000 FAILED
20240302131902 20240302132206 19700101020000 FAILED
20240302151731 20240302152345 19700101020000 FAILED
20240302152416 20240302152614 20240303001007 COMPLETED
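The durations in the job history can be cross-checked against this file with a short script. Below is a minimal sketch in Python that parses the lines above. It assumes the four columns are submit time, start time, end time and status. It also assumes that an end time in 1970 means "no end time recorded". Both are my reading of the file, not a confirmed format. The function name parse_total_stats is hypothetical.

```python
from datetime import datetime

TS_FORMAT = "%Y%m%d%H%M%S"


def parse_total_stats(text):
    """Parse *_TOTAL_STATS lines into (submit, start, end, status, duration) tuples.

    Assumed column order: submit time, start time, end time, status.
    An end time in 1970 is treated as "not recorded", giving duration None.
    """
    runs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or incomplete lines
        submit, start, end, status = fields
        start_dt = datetime.strptime(start, TS_FORMAT)
        end_dt = datetime.strptime(end, TS_FORMAT)
        duration = None if end_dt.year == 1970 else end_dt - start_dt
        runs.append((submit, start, end, status, duration))
    return runs


STATS = """\
20240301223744 20240301223905 19700101020000 FAILED
20240301233222 20240302044603 19700101020000 FAILED
20240302044634 20240302045337 19700101020000 FAILED
20240302045408 20240302045752 19700101020000 FAILED
20240302131902 20240302132206 19700101020000 FAILED
20240302151731 20240302152345 19700101020000 FAILED
20240302152416 20240302152614 20240303001007 COMPLETED
"""

for submit, start, end, status, duration in parse_total_stats(STATS):
    print(start, status, duration)
```

Under these assumptions, the only run with a recorded end time is the COMPLETED one, lasting 8:43:53. No entry in the file lasts anywhere near 30 seconds.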
Unfortunately, there is no Autosubmit (AS) log for this run, so I cannot check there what happened. However, the same behaviour occurs in other experiments, so someone could trace it there.
Feel free to move the part of this issue that corresponds to Autosubmit to the other project. Sorry for opening a single issue for the whole problem: @ltenorio @dbeltran @bdepaula