Profiler fails to stop and causes critical error
Hello @dbeltran and @bdepaula, I am forwarding an issue that @pgoitia noticed as he was travelling back to Cantabria.
MERGE REQUEST: !456
Autosubmit Version
4.0.87
Summary
I had a flashback and realized that a last-minute change I made this week produces an error when you run the `monitor` function with the profiler enabled.
The error occurs because the `stop()` function throws an exception when the user tries to call `stop()` twice. This is due to a `_finished` flag that I introduced in merge request !372 (merged).
This did not happen before that MR was approved, but I still consider the flag necessary, so I propose some solutions:
* The first one is to split the error handling of `stop()` in two, so that when the profiler is stopped a second time, it simply returns:
```python
def stop(self) -> None:
    """Function to finish the profiling process."""
    if not self._started:
        raise AutosubmitCritical('Cannot stop the profiler because was not running.', 7074)
    if self._finished:
        return
    self._profiler.disable()
    self._mem_final += _get_current_memory()
    self._report()
    self._finished = True
```
* The second one is to do the same, but also write a log entry to report the repeated call. On the one hand, this option is good because any developer who is mistakenly using the profiler will notice the bug. On the other hand, any user profiling the `monitor` function will also receive these messages repeatedly. I don't like this option as much:
```python
def stop(self) -> None:
    """Function to finish the profiling process."""
    if not self._started:
        raise AutosubmitCritical('Cannot stop the profiler because was not running.', 7074)
    if self._finished:
        # Report the repeated call instead of failing
        Log.info("The profiler has already been stopped.")
        return
    self._profiler.disable()
    self._mem_final += _get_current_memory()
    self._report()
    self._finished = True
```
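The first option can be sketched as a self-contained class. This is not the Autosubmit `Profiler`: `IdempotentProfiler`, its `reports` list and the local `AutosubmitCritical` class are hypothetical stand-ins, memory tracking is left out, and the report is built with the standard `cProfile` and `pstats` modules. What it shows is that a second `stop()` returns early, so the report is produced only once, while stopping a profiler that never started still raises:

```python
import cProfile
import io
import pstats


class AutosubmitCritical(Exception):
    """Hypothetical stand-in for Autosubmit's AutosubmitCritical(message, code)."""

    def __init__(self, message: str, code: int) -> None:
        super().__init__(message)
        self.code = code


class IdempotentProfiler:
    """Sketch of option 1: a second stop() is a silent no-op."""

    def __init__(self) -> None:
        self._profiler = cProfile.Profile()
        self._started = False
        self._finished = False
        self.reports = []  # hypothetical: keeps reports instead of printing/saving them

    def start(self) -> None:
        self._profiler.enable()
        self._started = True

    def stop(self) -> None:
        if not self._started:
            # Stopping a profiler that never started is still a real error.
            raise AutosubmitCritical('Cannot stop the profiler because was not running.', 7074)
        if self._finished:
            # Already stopped and reported: nothing left to do.
            return
        self._profiler.disable()
        self._report()
        self._finished = True

    def _report(self) -> None:
        # Render the five most expensive entries by cumulative time.
        stream = io.StringIO()
        pstats.Stats(self._profiler, stream=stream).sort_stats('cumulative').print_stats(5)
        self.reports.append(stream.getvalue())


profiler = IdempotentProfiler()
profiler.start()
sum(range(10_000))  # some work to profile
profiler.stop()
profiler.stop()  # second call returns quietly instead of raising
print(len(profiler.reports))  # prints 1
```

Keeping the "never started" check as an error preserves the protection the `_finished` flag was introduced for, while the repeated stop becomes harmless.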
- The third one, although I do not like it as much, is to add another try/except statement to `monitor()` that covers the whole function and is used exclusively so that its `finally` clause calls `stop()`. This is a draft:
```python
@staticmethod
def monitor(..., profile=False, ...):
    # Start profiling if the flag has been used
    if profile:
        profiler = Profiler(expid)
        profiler.start()
    try:
        # Place the whole body of the monitor() function here.
        ...
    except AutosubmitError:
        # Re-raise the exceptions thrown by the monitoring process unchanged
        raise
    except AutosubmitCritical:
        raise
    except BaseException:
        raise
    finally:
        # Runs exactly once, whether monitoring succeeded or failed
        if profile:
            profiler.stop()
    return True
```
- The last idea is to decide whether the profiler output is really needed after a function fails. It probably is not, since the profiler is generally used to gather metrics for improving the performance of a function that already works. This is the quickest alternative, and it also avoids adding complexity to the code.
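The third and fourth ideas differ only in where `stop()` is called. The sketch below contrasts them using a hypothetical `CountingProfiler` that merely counts calls, and two hypothetical wrappers, `monitor_with_finally` and `monitor_success_only`, which take the monitoring work as a callable `body` instead of reproducing the real `monitor()` signature:

```python
from typing import Any, Callable


class CountingProfiler:
    """Hypothetical stand-in that only counts start() and stop() calls."""

    def __init__(self) -> None:
        self.started = 0
        self.stopped = 0

    def start(self) -> None:
        self.started += 1

    def stop(self) -> None:
        self.stopped += 1


def monitor_with_finally(profiler: CountingProfiler, body: Callable[[], Any],
                         profile: bool = True) -> Any:
    """Option 3: the finally clause calls stop() once, on success or failure."""
    if profile:
        profiler.start()
    try:
        return body()
    finally:
        if profile:
            profiler.stop()


def monitor_success_only(profiler: CountingProfiler, body: Callable[[], Any],
                         profile: bool = True) -> Any:
    """Option 4: only a run that completes produces profiler output."""
    if profile:
        profiler.start()
    result = body()  # an exception here propagates and skips stop()
    if profile:
        profiler.stop()
    return result


def failing_body() -> None:
    raise RuntimeError('monitor failed')


p3 = CountingProfiler()
try:
    monitor_with_finally(p3, failing_body)
except RuntimeError:
    pass
print(p3.stopped)  # prints 1

p4 = CountingProfiler()
try:
    monitor_success_only(p4, failing_body)
except RuntimeError:
    pass
print(p4.stopped)  # prints 0
```

With `try`/`finally`, `stop()` runs exactly once even when the body raises. With the success-only approach, a failure skips `stop()` entirely, so no report is produced; with a real `cProfile` profiler it would simply stay enabled until the process exits.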
Steps to reproduce
Run the `monitor` command with the profiler enabled for any experiment.
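A minimal, self-contained model of the failure may help reproduce it without setting up an experiment. It is a sketch under assumptions: `GuardedProfiler` and the local `AutosubmitCritical` class are hypothetical stand-ins, and the guard `if not self._started or self._finished` is a guess at what the current `stop()` does, inferred from the message in the log. Calling `stop()` a second time produces the same critical message:

```python
import cProfile


class AutosubmitCritical(Exception):
    """Hypothetical stand-in for Autosubmit's AutosubmitCritical(message, code)."""

    def __init__(self, message: str, code: int) -> None:
        super().__init__(message)
        self.message = message
        self.code = code


class GuardedProfiler:
    """Hypothetical, reduced model of the profiler after !372.

    ASSUMPTION: stop() rejects both "never started" and "already finished"
    with the same error, which matches the message seen in the log.
    """

    def __init__(self) -> None:
        self._profiler = cProfile.Profile()
        self._started = False
        self._finished = False

    def start(self) -> None:
        self._profiler.enable()
        self._started = True

    def stop(self) -> None:
        if not self._started or self._finished:
            raise AutosubmitCritical('Cannot stop the profiler because was not running.', 7074)
        self._profiler.disable()
        self._finished = True


profiler = GuardedProfiler()
profiler.start()
profiler.stop()  # first call succeeds
try:
    profiler.stop()  # a second call, as happens in monitor(), fails
except AutosubmitCritical as error:
    # prints: [CRITICAL] Cannot stop the profiler because was not running. [eCode=7074]
    print(f'[CRITICAL] {error.message} [eCode={error.code}]')
```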
What is the current bug behavior?
It fails with a critical error.
What is the expected correct behavior?
It should provide the profiler output and create the plot.
Relevant logs and/or screenshots
```
[CRITICAL] Cannot stop the profiler because was not running. [eCode=7074]
More info at https://autosubmit.readthedocs.io/en/master/troubleshooting/error-codes.html
Traceback (most recent call last):
  File "/home/mgimenez/Documents/local/dev/autosubmit/bin/autosubmit", line 37, in main
    Autosubmit.parse_args()
  File "/home/mgimenez/Documents/local/dev/autosubmit/autosubmit/autosubmit.py", line 664, in parse_args
    return Autosubmit.monitor(args.expid, args.output, args.list, args.filter_chunks, args.filter_status,
  File "/home/mgimenez/Documents/local/dev/autosubmit/autosubmit/autosubmit.py", line 2528, in monitor
    profiler.stop()
  File "/home/mgimenez/Documents/local/dev/autosubmit/autosubmit/profiler/profiler.py", line 63, in stop
    raise AutosubmitCritical('Cannot stop the profiler because was not running.', 7074)
log.log.AutosubmitCritical:
```