  • #109
Created Jul 09, 2021 by aho (@aho, Maintainer)

AddStep(): Return a list with the same necessary info for each output

When the user-defined function returns more than one output, the workflow built by AddStep() is a plain list rather than an object of class "startR_workflow". This does not cause a problem in practice, because Compute() accepts any single item of that list, e.g., workflow$output1, which does have the class "startR_workflow". However, this 'trick' is neither intuitive nor documented anywhere.

Take ex2_11 as an example. The function returns two outputs, ind_exp and ind_obs, which have the same dimensions. The workflow is a list of 2, and both items are of class "startR_workflow".

class(workflow)
[1] "list"
class(workflow$ind_exp)
[1] "startR_workflow"
class(workflow$ind_obs)
[1] "startR_workflow"

If we put workflow as the input of Compute(), we'll get an error:

Parameter 'workflow' must be an object of class 'startR_cube' as returned by Start or of class 'startR_workflow' as returned by AddStep.

We can use either workflow$ind_exp or workflow$ind_obs instead and the result will be correct.

In this example, workflow$ind_exp and workflow$ind_obs are identical. Even if the two outputs don't share the same dimensions, the information that Compute() needs from workflow$ind_exp and workflow$ind_obs is still the same. For example, if I change the dimension of ind_obs to [asd = 2] (while ind_exp has [sdate = 4]), the only difference between workflow$ind_exp and workflow$ind_obs is attributes(workflow$ind_obs)$Dimensions. This attribute is not used in Compute() at all, so the difference has no impact.

attributes(workflow$ind_obs)$Dimensions
asd dat var 
 NA   1   1 
attributes(workflow$ind_exp)$Dimensions
sdate   dat   var 
   NA     1     1 
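
One way to check this kind of claim is to compare the two items after dropping the 'Dimensions' attribute. The sketch below uses mock objects rather than real AddStep() output: real "startR_workflow" objects carry more attributes, and drop_dims() is a hypothetical helper, not part of startR.

```r
# Mock stand-ins for workflow$ind_exp and workflow$ind_obs (assumption:
# only the class and the 'Dimensions' attribute are reproduced here).
ind_exp <- structure(list(), class = "startR_workflow",
                     Dimensions = c(sdate = NA, dat = 1, var = 1))
ind_obs <- structure(list(), class = "startR_workflow",
                     Dimensions = c(asd = NA, dat = 1, var = 1))

# Hypothetical helper: remove the 'Dimensions' attribute before comparing.
drop_dims <- function(x) {
  attr(x, "Dimensions") <- NULL
  x
}

# The objects differ as they are ...
same_as_is <- identical(ind_exp, ind_obs)
# ... but match once 'Dimensions' is ignored.
same_without_dims <- identical(drop_dims(ind_exp), drop_dims(ind_obs))
```

With real workflow items, the same comparison would show whether anything other than 'Dimensions' differs between the outputs.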

I guess there must be a reason why startR creates a workflow for each output (though the workflows are almost the same), but so far I cannot find any potential problem if AddStep() returned only the first output as a representative. The relevant code is here: https://earth.bsc.es/gitlab/es/startR/-/blob/master/R/AddStep.R#L122-142.

To avoid the error message above, we can add an additional check in Compute(), like:

if (!inherits(workflow, c('startR_cube', 'startR_workflow'))) {
  # AddStep() returns a plain list when the function has several outputs;
  # check each item's class individually.
  if (is.list(workflow) &&
      all(vapply(workflow, inherits, logical(1), c('startR_cube', 'startR_workflow')))) {
    workflow <- workflow[[1]]
    .warning("Parameter 'workflow' is a list of 'startR_workflow' or 'startR_cube' objects. Using the first item in the list as the workflow.")
  }
}
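
The idea of this check can be tried outside startR with a standalone sketch. It assumes mock objects in place of real AddStep() output, uses base R's warning() and stop() instead of startR's internal helpers, and pick_workflow() is a hypothetical name, not an existing startR function.

```r
valid_classes <- c("startR_cube", "startR_workflow")

# Hypothetical helper: return a usable workflow object, falling back to the
# first item when a list of valid objects is supplied.
pick_workflow <- function(workflow) {
  if (inherits(workflow, valid_classes)) {
    return(workflow)
  }
  items_ok <- is.list(workflow) && length(workflow) > 0 &&
    all(vapply(workflow, inherits, logical(1), what = valid_classes))
  if (!items_ok) {
    stop("Parameter 'workflow' must be an object of class 'startR_cube' ",
         "or 'startR_workflow', or a list of such objects.")
  }
  warning("Parameter 'workflow' is a list; using its first item.")
  workflow[[1]]
}

# Mock of what AddStep() returns for a function with two outputs.
mock_item <- structure(list(), class = "startR_workflow")
workflow <- list(ind_exp = mock_item, ind_obs = mock_item)

picked <- suppressWarnings(pick_workflow(workflow))
```

Here picked has class "startR_workflow", while a list of unrelated items, such as list(1, 2), stops with an error instead of silently passing through.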

For now, I will keep the function as it is and see whether any new findings come up in the future.

An-Chi
