Profiler fails to stop and causes critical error

Hello @dbeltran and @bdepaula, I am forwarding an issue that @pgoitia noticed as he was travelling back to Cantabria.

MERGE REQUEST: !456

Autosubmit Version

4.0.87

Summary

I realized that a last-minute change I made this week causes an error when the monitor function is run with the profiler enabled.

The error occurs because stop() raises an exception when it is called a second time on the same profiler. This is due to the _finished flag that I introduced in merge request !372 (merged).
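
The failure mode can be reproduced in isolation. The sketch below is a simplified stand-in, not the actual Autosubmit Profiler: MiniProfiler and ProfilerError are hypothetical names, and the single combined check in stop() is an assumption about how the _finished flag is tested, inferred from the error message in the logs.

```python
class ProfilerError(Exception):
    """Hypothetical stand-in for AutosubmitCritical."""

    def __init__(self, message: str, code: int) -> None:
        super().__init__(message)
        self.code = code


class MiniProfiler:
    """Simplified stand-in that keeps only the two state flags."""

    def __init__(self) -> None:
        self._started = False
        self._finished = False

    def start(self) -> None:
        self._started = True

    def stop(self) -> None:
        # Assumed check: one condition rejects both "never started"
        # and "already finished", so a second stop() raises.
        if not self._started or self._finished:
            raise ProfilerError(
                'Cannot stop the profiler because was not running.', 7074)
        self._finished = True


def stop_twice() -> int:
    """Return the error code raised by the second stop(), or 0 if none."""
    profiler = MiniProfiler()
    profiler.start()
    profiler.stop()          # first call succeeds
    try:
        profiler.stop()      # second call hits the _finished check
    except ProfilerError as exc:
        return exc.code
    return 0
```

Under these assumptions, stop_twice() returns the same error code that appears in the logs, even though the profiler was started correctly.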

This did not happen before that MR was merged, but I think the flag is still needed, so I propose the following solutions:

* The first one is to split the check in stop() in two, so that stopping an already finished profiler simply returns:

def stop(self) -> None:
	"""Function to finish the profiling process."""
	if not self._started:
		raise AutosubmitCritical('Cannot stop the profiler because was not running.', 7074)
	if self._finished:
		return

	self._profiler.disable()
	self._mem_final += _get_current_memory()
	self._report()
	self._finished = True

* The second one is to do the same, but also write a log entry when it happens. On the one hand, this helps any developer who is misusing the profiler notice the bug. On the other hand, any user profiling the monitor function will also receive these messages repeatedly. I do not like this option as much:

def stop(self) -> None:
	"""Function to finish the profiling process."""
	if not self._started:
		raise AutosubmitCritical('Cannot stop the profiler because was not running.', 7074)
	if self._finished:
		Log.info("The profiler was already stopped; ignoring this call to stop().")
		return

	self._profiler.disable()
	self._mem_final += _get_current_memory()
	self._report()
	self._finished = True
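
The behaviour of the first two options can be checked with a small self-contained sketch. It is not the real Profiler class: IdempotentProfiler, the warn_on_repeat switch, and the reports counter are hypothetical, RuntimeError stands in for AutosubmitCritical, and the standard logging module stands in for Autosubmit's Log.

```python
import logging

logger = logging.getLogger("profiler_sketch")  # hypothetical logger name


class IdempotentProfiler:
    """Stand-in profiler whose stop() is safe to call more than once."""

    def __init__(self, warn_on_repeat: bool = False) -> None:
        self._started = False
        self._finished = False
        self._warn_on_repeat = warn_on_repeat  # True mimics the second option
        self.reports = 0  # how many times the report step actually ran

    def start(self) -> None:
        self._started = True

    def stop(self) -> None:
        # Stopping a profiler that never started is still an error.
        if not self._started:
            raise RuntimeError("The profiler was never started.")
        # A repeated stop is a no-op, optionally noted in the log.
        if self._finished:
            if self._warn_on_repeat:
                logger.info("The profiler was already stopped.")
            return
        self.reports += 1  # placeholder for disable() and _report()
        self._finished = True
```

Calling start() and then stop() twice leaves reports at 1, so the report is produced once and the second call neither raises nor repeats the work.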

The third one, although I do not like it as much, is to wrap the whole body of monitor in an additional try/except statement whose only purpose is to call stop() from its finally clause. This is a draft:

@staticmethod
def monitor(..., profile=False, ...):

    # Start profiling if the flag has been used
    if profile:
        profiler = Profiler(expid)
        profiler.start()

    try:
        # here is where you should set all the content of the monitor() function.
        ...

    # just elevate the exception thrown by the monitoring process
    except AutosubmitError:
        raise
    except AutosubmitCritical:
        raise
    except BaseException:
        raise

    finally:
        if profile:
            profiler.stop()

    return True
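
A runnable sketch of the same idea, under stated assumptions: CountingProfiler, run_with_profiler, and failing_body are hypothetical names, and the real monitor body is replaced by a callable. It shows that the finally clause runs stop() once on both the success path and the failure path, while the exception still propagates to the caller.

```python
from typing import Callable, Optional


class CountingProfiler:
    """Hypothetical stand-in that only counts start/stop calls."""

    def __init__(self) -> None:
        self.starts = 0
        self.stops = 0

    def start(self) -> None:
        self.starts += 1

    def stop(self) -> None:
        self.stops += 1


def run_with_profiler(body: Callable[[], None],
                      profiler: Optional[CountingProfiler] = None) -> bool:
    """Run body(), stopping the profiler once on every exit path."""
    if profiler is not None:
        profiler.start()
    try:
        body()
    finally:
        # Runs on normal return and while an exception propagates,
        # so no other stop() call is needed inside the function.
        if profiler is not None:
            profiler.stop()
    return True


def failing_body() -> None:
    """Simulate a failure inside the monitored work."""
    raise ValueError("simulated monitoring failure")
```

Because finally runs regardless of how the try block exits, except clauses that only re-raise are optional in this pattern; they are kept in the draft above to make the intent explicit.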

The last idea is to decide whether the profiler output is really needed after the function fails. It probably is not, since the profiler is generally used to gather metrics for improving the performance of a working function. This is the fastest alternative, and it avoids adding complexity to the code.

Steps to reproduce

Run the monitor command for any experiment with the profiler enabled.

What is the current bug behavior?

The command aborts with an AutosubmitCritical error (code 7074) when stop() is called.

What is the expected correct behavior?

It should provide the profiler output and create the plot.

Relevant logs and/or screenshots

 [CRITICAL] Cannot stop the profiler because was not running. [eCode=7074]
More info at https://autosubmit.readthedocs.io/en/master/troubleshooting/error-codes.html
Traceback (most recent call last):
  File "/home/mgimenez/Documents/local/dev/autosubmit/bin/autosubmit", line 37, in main
    Autosubmit.parse_args()
  File "/home/mgimenez/Documents/local/dev/autosubmit/autosubmit/autosubmit.py", line 664, in parse_args
    return Autosubmit.monitor(args.expid, args.output, args.list, args.filter_chunks, args.filter_status,
  File "/home/mgimenez/Documents/local/dev/autosubmit/autosubmit/autosubmit.py", line 2528, in monitor
    profiler.stop()
  File "/home/mgimenez/Documents/local/dev/autosubmit/autosubmit/profiler/profiler.py", line 63, in stop
    raise AutosubmitCritical('Cannot stop the profiler because was not running.', 7074)
log.log.AutosubmitCritical:

Edited Jul 10, 2024 by Pablo Goitia