Wrappers not working as expected with PISCES 1D parameter sensitivity analysis workflow, AS 3.14.0
This is a follow-up to another issue:
Modified workflow to construct a surrogate model for sensitivity analyses
It may also be related to
Horizontal-vertical wrapper fine tuning, autosubmit issue #640
I copy the previous message below:
Update:
- I launched a45b (account bsc32882) following your specifications, debug queue.
- I launched a5mr (account bsc32340) without vertical wrappers (as I used to do), bsc_es queue.
In both cases I see that some chunks were completed, but most failed: some before completing a single SIM, others after completing a few.
Failures seem related to hitting the wallclock, although I cannot always see the TIME LIMIT message in the logs. I also tried changing the job wallclock from 10 to 20 minutes (a44w on bsc32882), with the same issues. In summary: it seems to me that when jobs in a chunk start running before hitting the wallclock, some of them (not all) make it to the target. If the first sim hits the wallclock, the whole chunk fails.
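To check more systematically which job logs contain the TIME LIMIT message, something like the sketch below could be used. It assumes the scheduler is Slurm and that it writes its usual "DUE TO TIME LIMIT" cancellation line to the job error logs. The function name, the `*.err` pattern and the log directory are hypothetical and would need adapting to the actual experiment layout.

```python
from pathlib import Path

# Assumption: Slurm's usual cancellation text; adjust for other schedulers.
TIME_LIMIT_MARKER = "DUE TO TIME LIMIT"


def find_time_limit_logs(log_dir, pattern="*.err"):
    """Return the sorted names of log files in log_dir that contain the marker."""
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" avoids crashing on stray non-UTF-8 bytes in logs.
        text = path.read_text(errors="replace")
        if TIME_LIMIT_MARKER in text:
            hits.append(path.name)
    return hits
```

Calling `find_time_limit_logs(...)` on an experiment's log folder and comparing the result with the list of failed jobs would show which failures have no TIME LIMIT message and so may have another cause.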
I noted some different behaviours compared to previous runs with AS 3.13.0 (Nov 2021): this time I see jobs marked as running that are actually not running! I know because nothing in the runtime folders indicates a running sim. Also, only a few jobs are queueing simultaneously, where I would expect all 100 jobs (members) to run or queue simultaneously.
We can assume the NEMO simulation environment and the executable themselves are fine: I changed nothing compared to older experiments, and I just managed to complete another experiment (a5mj) with 2000 sequential sims without failures.
I'll copy this to autosubmit.