AddStep(): Return a list with the same necessary info for each output
When the self-defined function returns more than one output, the workflow built by AddStep() is a list instead of an object of class "startR_workflow". This doesn't cause a problem, because in Compute() we can pass an arbitrary item of the list, e.g., workflow$output1, which has the class "startR_workflow". However, this 'trick' is not intuitive, and there is no instruction telling users to do so.
Take ex2_11 for example. The function returns two outputs, ind_exp and ind_obs, and they have the same dimensions. The workflow is a list of 2, and both items are of class "startR_workflow".
class(workflow)
[1] "list"
class(workflow$ind_exp)
[1] "startR_workflow"
class(workflow$ind_obs)
[1] "startR_workflow"
If we pass workflow as the input of Compute(), we get an error:
Parameter 'workflow' must be an object of class 'startR_cube' as returned by Start or of class 'startR_workflow' as returned by AddStep.
We can use either workflow$ind_exp or workflow$ind_obs instead, and the result will be correct.
In fact, workflow$ind_exp and workflow$ind_obs are identical. Even if the two outputs don't share the same dimensions, the information in workflow$ind_exp and workflow$ind_obs that Compute() needs is still the same. For example, if I change the dimension of ind_obs to [asd = 2] (while ind_exp has [sdate = 4]), the only difference between workflow$ind_exp and workflow$ind_obs is attributes(workflow$ind_obs)$Dimensions. But this attribute is not used in Compute() at all, so the difference has no impact.
attributes(workflow$ind_obs)$Dimensions
asd dat var
NA 1 1
attributes(workflow$ind_exp)$Dimensions
sdate dat var
NA 1 1
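One way to check this kind of claim is to drop the Dimensions attribute from both objects and compare what is left. The sketch below uses mock objects, not real AddStep() output: make_mock() and strip_dims() are hypothetical helpers, and the mock structure (a list carrying a "Dimensions" attribute) is an assumption for illustration only.

```r
# Mock stand-ins for workflow$ind_exp and workflow$ind_obs (hypothetical
# structure; real startR_workflow objects carry more information).
make_mock <- function(first_dim) {
  dims <- c(NA, 1, 1)
  names(dims) <- c(first_dim, "dat", "var")
  structure(list(fun = "step"), Dimensions = dims, class = "startR_workflow")
}
ind_exp <- make_mock("sdate")
ind_obs <- make_mock("asd")

# Remove the one attribute that is expected to differ, then compare the rest.
strip_dims <- function(x) {
  attr(x, "Dimensions") <- NULL
  x
}
same_with_dims <- identical(ind_exp, ind_obs)
same_without_dims <- identical(strip_dims(ind_exp), strip_dims(ind_obs))
print(c(same_with_dims, same_without_dims))
```

With the real objects, the same comparison would confirm whether anything other than the Dimensions attribute differs between the two items.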
I guess there must be a reason why startR creates a workflow for each output (though the workflows are almost the same), but for now, I cannot find any potential problem if AddStep() only returned the workflow of the first output as a representative. The relevant code is here: https://earth.bsc.es/gitlab/es/startR/-/blob/master/R/AddStep.R#L122-142.
To avoid the error message above, we can add an additional check in Compute(), like:
if (!any(c('startR_cube', 'startR_workflow') %in% class(workflow))) {
  # Accept a list whose items are all workflows/cubes and use the first one.
  # inherits() also handles items that carry more than one class.
  if (is.list(workflow) &&
      all(sapply(workflow, inherits, c('startR_cube', 'startR_workflow')))) {
    workflow <- workflow[[1]]
    .warning("Parameter 'workflow' is a list whose items are all of class 'startR_workflow' or 'startR_cube'. Use the first item in the list as the workflow.")
  }
}
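To try the idea outside of startR, here is a minimal self-contained sketch using mock objects. pick_workflow() is a hypothetical helper, not part of startR, and it uses base warning() and stop() in place of the package's internal .warning(); the messages are illustrative only.

```r
# Hypothetical helper, not part of startR: returns a single workflow object,
# falling back to the first item when given a list of workflows.
pick_workflow <- function(workflow) {
  valid <- c("startR_cube", "startR_workflow")
  # A proper workflow or cube is returned unchanged.
  if (inherits(workflow, valid)) {
    return(workflow)
  }
  # A non-empty list whose items are all workflows/cubes: use the first item.
  if (is.list(workflow) && length(workflow) > 0 &&
      all(vapply(workflow, inherits, logical(1), what = valid))) {
    warning("Parameter 'workflow' is a list of workflows. ",
            "Using the first item.")
    return(workflow[[1]])
  }
  stop("Parameter 'workflow' must be of class 'startR_cube' or ",
       "'startR_workflow'.")
}

# Mock objects standing in for the output of AddStep() (assumption).
mock <- structure(list(), class = "startR_workflow")
workflow <- list(ind_exp = mock, ind_obs = mock)
picked <- suppressWarnings(pick_workflow(workflow))
print(class(picked))
```

Taking the first item relies on the observation above that the items hold the same information for Compute(); if that ever stops being true, this shortcut would need to be revisited.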
For now, I'll keep the function as it is and see whether we have any new findings in the future.
An-Chi