How to deal with UNKNOWN status
Hi,

During the execution of experiment a034, jobs have repeatedly ended up in UNKNOWN status (white box) at various points (it can be POST, CLEAN, ...). Normally I change the status manually to READY and re-launch, and it works. This means the experiment needs constant monitoring, but at least it works.

Today, however, I realized that a job shown as UNKNOWN can in fact have completed. Here is a real example: the UNKNOWN status appeared at the CLEAN of chunk 78. I checked the error log. The final lines of a034_19500201_fc0_78_CLEAN.err:
++ ls -1 /gpfs/scratch/bsc32/bsc32774/a034/LOG_a034/a034_19500201_fc0_78_CLEAN_1549492.err /gpfs/scratch/bsc32/bsc32774/a034/LOG_a034/a034_19500201_fc0_78_CLEAN_1549492.out /gpfs/scratch/bsc32/bsc32774/a034/LOG_a034/a034_19500201_fc0_78_CLEAN.cmd
- failed_errfiles=
- set -e
- failed_jobs_qt=0
- failed_jobs_rt=0
- echo '1454262998 48 237 0 0 0'
- exit 0
The job did not produce "a034_19500201_fc0_78_CLEAN_COMPLETED", but it had in fact completed. To keep the experiment running, I set the status to COMPLETED, manually created a034_19500201_fc0_78_CLEAN_COMPLETED, and then re-launched the experiment.
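For the record, the manual check could be scripted roughly as follows. This is only a sketch of the workaround, not something Autosubmit provides: the function name `mark_completed_if_clean_exit` is hypothetical, it assumes the COMPLETED marker belongs in the same directory as the .err files (adjust if it lives elsewhere), and it treats a trace whose last line is "exit 0" as evidence that the job finished. It does not change the job status in Autosubmit; that still has to be set to COMPLETED separately.

```python
from pathlib import Path


def mark_completed_if_clean_exit(log_dir, job_name):
    """Create <job_name>_COMPLETED in log_dir if the job's newest .err trace ends with 'exit 0'.

    Returns True if the marker file was created, False otherwise.
    Assumption: the marker file sits next to the .err files.
    """
    log_dir = Path(log_dir)
    marker = log_dir / f"{job_name}_COMPLETED"
    if marker.exists():
        return False  # already marked, nothing to do

    # Candidate traces, e.g. <job_name>_1549492.err; use the most recently modified one.
    err_files = sorted(log_dir.glob(f"{job_name}*.err"), key=lambda p: p.stat().st_mtime)
    if not err_files:
        return False

    text = err_files[-1].read_text(errors="replace")
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # Drop the trace prefix ('+', '++', or '-' as pasted above) before comparing.
    if lines and lines[-1].lstrip("+- ") == "exit 0":
        marker.touch()
        return True
    return False
```

For the case above it would be called as `mark_completed_if_clean_exit("/gpfs/scratch/bsc32/bsc32774/a034/LOG_a034", "a034_19500201_fc0_78_CLEAN")`. A trailing "exit 0" is only a heuristic, so it is worth looking at the .out file as well before trusting it.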
I am reporting it so that we have a trace of what happened. I do not know whether it can be solved.
Thank you,
Valentina

@obellprat, @macosta, @dmanubens, @jvegas