Long experiments with many jobs: autosubmit run fails.
Hi @jlope2 and @dmanubens,
One month ago I started running experiment a0dc over 1974-2015 with monthly chunks (i.e., a total of 504 chunks). The experiment comprises 25 ensemble members. Initially, I inadvertently asked for only 120 chunks, and that run finished successfully. Now I would like to run the remaining 384 chunks. To this end, I simply re-created the experiment after changing NUMCHUNKS to 504, recovered the experiment, put the restarts of chunk 119 in the right place, etc.
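As a quick sanity check of the chunk counts quoted above, here is the arithmetic written out. It assumes the period covers the calendar years 1974 to 2015 inclusive; the variable names are only illustrative and are not Autosubmit configuration keys.

```python
# Chunk bookkeeping for the experiment, assuming 1974..2015 inclusive.
first_year, last_year = 1974, 2015
chunks_per_year = 12          # monthly chunks
members = 25                  # ensemble members

total_chunks = (last_year - first_year + 1) * chunks_per_year
completed_chunks = 120        # chunks requested in the first run
remaining_chunks = total_chunks - completed_chunks

# Member/chunk combinations still to be simulated.
remaining_member_chunks = remaining_chunks * members

print(total_chunks, remaining_chunks, remaining_member_chunks)
```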
The problem I have now is related to autosubmit itself:
- I have been able to create the experiment
- But I can't run autosubmit monitor: it just takes too long, and I get this message:
Getting job list...
Plotting...
[CRITICAL] Unhandled exception on Autosubmit: maximum recursion depth exceeded while calling a Python object
Traceback (most recent call last):
File "/shared/earth/software/autosubmit/3.6.0-foss-2015a-Python-2.7.9/lib/python2.7/site-packages/autosubmit-3.6.0-py2.7.egg/autosubmit/autosubmit.py", line 294, in parse_args
args.filter_type, args.hide)
File "/shared/earth/software/autosubmit/3.6.0-foss-2015a-Python-2.7.9/lib/python2.7/site-packages/autosubmit-3.6.0-py2.7.egg/autosubmit/autosubmit.py", line 802, in monitor
monitor_exp.generate_output(expid, jobs, file_format, not hide)
File "/shared/earth/software/autosubmit/3.6.0-foss-2015a-Python-2.7.9/lib/python2.7/site-packages/autosubmit-3.6.0-py2.7.egg/autosubmit/monitor/monitor.py", line 169, in generate_output
graph = self.create_tree_list(expid, joblist)
File "/shared/earth/software/autosubmit/3.6.0-foss-2015a-Python-2.7.9/lib/python2.7/site-packages/autosubmit-3.6.0-py2.7.egg/autosubmit/monitor/monitor.py", line 125, in create_tree_list
self._add_children(job, exp, node_job)
File "/shared/earth/software/autosubmit/3.6.0-foss-2015a-Python-2.7.9/lib/python2.7/site-packages/autosubmit-3.6.0-py2.7.egg/autosubmit/monitor/monitor.py", line 148, in _add_children
self._add_children(child, exp, node_child)
File "/shared/earth/software/autosubmit/3.6.0-foss-2015a-Python-2.7.9/lib/python2.7/site-packages/autosubmit-3.6.0-py2.7.egg/autosubmit/monitor/monitor.py", line 148, in _add_children
self._add_children(child, exp, node_child)
File "/shared/earth/software/autosubmit/3.6.0-foss-2015a-Python-2.7.9/lib/python2.7/site-packages/autosubmit-3.6.0-py2.7.egg/autosubmit/monitor/monitor.py", line 148, in _add_children
self._add_children(child, exp, node_child)
File "/shared/earth/software/autosubmit/3.6.0-foss-2015a-Python-2.7.9/lib/python2.7/site-packages/autosubmit-3.6.0-py2.7.egg/autosubmit/monitor/monitor.py", line 148, in _add_children
self._add_children(child, exp, node_child)
File "/shared/earth/software/autosubmit/3.6.0-foss-2015a-Python-2.7.9/lib/python2.7/site-packages/autosubmit-3.6.0-py2.7.egg/autosubmit/monitor/monitor.py", line 148, in _add_children
self._add_children(child, exp, node_child)
File "/shared/earth/software/autosubmit/3.6.0-foss-2015a-Python-2.7.9/lib/python2.7/site-packages/autosubmit-3.6.0-py2.7.egg/autosubmit/monitor/monitor.py", line 148, in _add_children
self._add_children(child, exp, node_child)
RuntimeError: maximum recursion depth exceeded while calling a Python object
I have tried filtering the results to show only the READY jobs. That works, but the command can only be executed through an SBATCH job (otherwise, it is killed because it takes too long).
- I can't run autosubmit. When I do autosubmit run a0dc, I get the output below, then the process stops and nothing happens:
fmassonn@bscesautosubmit01: ~/logs/autosubmit >> tail -f log_a0dc_2016.12.15_19h42m13s_bscesautosubmit01
Preparing .lock file to avoid multiple instances with same expid.
Checking configuration files...
autosubmit_a0dc.conf OK
platforms_a0dc.conf OK
jobs_a0dc.conf OK
expdef_a0dc.conf OK
Configuration files OK
Starting job submission...
So I'm in the very uncomfortable situation of not being able to run my experiment... Could you help?
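In case it helps with the diagnosis, here is a minimal sketch of what I suspect is behind the "maximum recursion depth exceeded" error in the monitor traceback above: a recursive walk over a long chain of dependent jobs uses one stack frame per dependency level, so a deep enough chain hits the interpreter's limit. The `Job` class and the function names are hypothetical stand-ins, not Autosubmit's actual code, and the iterative version is only one possible alternative, not a proposed patch.

```python
import sys


class Job:
    """Hypothetical stand-in for a workflow job (not Autosubmit's class)."""

    def __init__(self, name):
        self.name = name
        self.children = []


def build_chain(n_jobs):
    """Build a linear chain in which each job has the next one as its child."""
    root = Job("job_1")
    current = root
    for i in range(2, n_jobs + 1):
        child = Job("job_%d" % i)
        current.children.append(child)
        current = child
    return root


def count_recursive(job):
    """Recursive walk: one Python stack frame per dependency level."""
    total = 1
    for child in job.children:
        total += count_recursive(child)
    return total


def count_iterative(root):
    """Same walk with an explicit stack, independent of the recursion limit."""
    seen = 0
    stack = [root]
    while stack:
        job = stack.pop()
        seen += 1
        stack.extend(job.children)
    return seen


if __name__ == "__main__":
    # A chain deeper than the interpreter limit breaks the recursive walk only.
    depth = sys.getrecursionlimit() * 2
    chain = build_chain(depth)
    try:
        count_recursive(chain)
    except RuntimeError as err:  # RecursionError subclasses RuntimeError in Python 3
        print("recursive walk failed:", err)
    print("iterative walk visited", count_iterative(chain), "jobs")
```

Raising the limit with `sys.setrecursionlimit` might serve as a stopgap, but I have not checked whether that is safe for the real job graph.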
Thanks,
François