[bug] Several Autosubmit and Autosubmit Config Parser have mutable default parameters
I had noticed it some time ago, and today decided to test the config parser while reading the diff function called in !395 (merged) .
Here's one example with the function from the Autosubmit4 Config Parser used by the workflow optimizations branch. The Python code in the config parser is as follows.
def detailed_deep_diff(self, current_data, last_run_data, differences={}):
"""
Returns a dictionary with for each key, the difference between the current configuration and the last_run_data
:param current_data: dictionary with the current data
:param last_run_data: dictionary with the last_run_data data
:return: differences: dictionary
"""
for key, val in current_data.items():
if isinstance(val, collections.abc.Mapping):
if key not in last_run_data.keys():
differences[key] = val
else:
self.detailed_deep_diff(last_run_data[key], val, differences)
else:
if key not in last_run_data.keys() or last_run_data[key] != val:
differences[key] = val
return differences
The bug is caused by differences={}
, whereas it should be (in many cases, it's not even should, but must/needs to be) differences=None
and then initialize the differences if needed.
Here's the type of bugs that this causes.
from autosubmitconfigparser.config.configcommon import AutosubmitConfig
c = AutosubmitConfig('a000')
current_data = {'a': 1}
# current data is {'a': 1}
differences = c.detailed_deep_diff({'b': 1}, current_data)
current_data.update(differences)
# differences is {'b': 1}
# current data is {'a': 1, 'b': 1}
differences = c.detailed_deep_diff({'c': 1}, current_data)
current_data.update(differences)
# differences is {'b': 1, 'c': 1} # <----- THIS IS WRONG! Because differences in the function "remembers" b:1
# current data is {'a': 1, 'b': 1, 'c': 1}
The detailed_deep_diff
function gives you the diff of dictionaries. Without paying attention, one might think that since you have no data, the first result ought to be just {'b': 1}
, and then the second result should contain {'c': 1'}
.
But the first time the function detailed_deep_diff
is called the differences
is initialized and during all the existence of the function object, it will hold a reference to that dictionary, and re-use it for every future call. The only way to avoid this bug is providing a value for that parameter.
If any developer uses that function and does not initialize the last parameter (which is valid in Python, and no IDE will complain, since we are giving a default parameter value) then the bug will happen the second time that's called.
Ref: https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments
This means that this doc is incorrect “Returns a dictionary with for each key, the difference between the current configuration and the last_run_data”. The function actually returns the dictionary with the difference between the current configuration and all the previous calls, when the default value is used.
Most of Autosubmit & Autosubmit Config Parser suffer from this bug, which is easy to fix, but can be hard to diagnose. I guess it'd be worth to check the API and testing suite for this too.