Earth Sciences / startR / Issues / #196 (Closed)
Issue created May 06, 2024 by vagudets (@vagudets, Maintainer)

Allow chunking over inner dimension that is defined across another dimension

Summary

When an inner dimension is declared as spanning a file dimension in the Start() call (inner_dim_1_across = 'dim_2'), chunking along inner_dim_1 is not allowed, because startR has no way to tell which files each chunk needs. This means that in some cases the chunking options are limited by how the data is organized in the files: for example, in some decadal models the time steps for a single initialization date are spread over several files, and we might still want to chunk along time.

The necessary development would be to add a way to organize the chunks according to the dependency between inner_dim_1 and dim_2 so that the list of files to load for each chunk can be correctly generated.
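
As a rough illustration of the bookkeeping this would involve, here is a minimal sketch in base R. It is not startR code: map_chunks_to_files() is a hypothetical helper, the file lengths in the usage call are made up, and the way the requested indices are split into chunks is an assumption rather than startR's actual chunking rule. It also handles a single value of the depended-on dimension (one syear); a real implementation would repeat this for each one.

```r
# Hypothetical helper, NOT part of startR: for an inner dimension that spans
# several files, work out which files (and which positions inside each file)
# every chunk needs.
#   file_lengths: length of the inner dimension in each file, in file order
#   indices:      requested global indices along the merged inner dimension
#   n_chunks:     number of chunks requested along the inner dimension
map_chunks_to_files <- function(file_lengths, indices, n_chunks) {
  # Global index at which each file ends
  file_end <- cumsum(file_lengths)
  # File that holds each requested global index
  file_id <- findInterval(indices, file_end, left.open = TRUE) + 1L
  # Position of each requested index inside its own file
  local_id <- indices - c(0, file_end)[file_id]
  # Assumed chunking rule: contiguous groups of near-equal size
  chunk_id <- ceiling(seq_along(indices) * n_chunks / length(indices))
  # One entry per chunk; each entry lists the local indices needed per file
  lapply(split(data.frame(file = file_id, index = local_id), chunk_id),
         function(x) split(x$index, x$file))
}

# Usage with made-up file lengths (2, 12 and 12 time steps) and the
# time indices requested in the example below (2 to 24), split in 2 chunks
files_per_chunk <- map_chunks_to_files(c(2, 12, 12), 2:24, 2)
str(files_per_chunk)
```

With these made-up lengths, the first chunk needs files 1 and 2 and the second chunk needs files 2 and 3, so file 2 has to be opened by both chunks. This per-chunk file list is the piece of information that Start() currently cannot derive when time is across chunk.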

Example

library(startR)

path_list <- "/esarchive//exp/CMIP6/dcppA-hindcast/HadGEM3-GC31-MM/DCPP/MOHC/HadGEM3-GC31-MM/dcppA-hindcast/$ensemble$/Amon/$var$/gn/v20200417/$var$_Amon_*_dcppA-hindcast_s$syear$-$ensemble$_gn_$chunk$.nc"
member <- c("r1i1p1f2", "r2i1p1f2", "r3i1p1f2")
variable <- "tas"
sdates_hcst <- c("1990", "1991", "1992", "1993")
time_ind <- seq(2, 24)
lats.min <- 10
lats.max <- 20
lons.min <- 0
lons.max <- 15

exp <- Start(dat = path_list,
             var = variable,
             syear = paste0(sdates_hcst),
             chunk = 'all',
             chunk_depends = 'syear',
             time = indices(time_ind),
             time_across = 'chunk',
             merge_across_dims = TRUE,
             largest_dims_length = TRUE,
             latitude = values(list(lats.min, lats.max)),
             latitude_reorder = Sort(decreasing = TRUE),
             longitude = values(list(lons.min, lons.max)),
             longitude_reorder = CircularSort(0, 360),
             ensemble = member,
             synonims = list(longitude = c('lon', 'longitude'),
                             latitude = c('lat', 'latitude')),
             return_vars = list(latitude = NULL, longitude = NULL,
                                time = c('syear', 'chunk')),
             retrieve = FALSE)

step <- Step(fun = mean,
             target_dims = c("dat", "var", "syear",
                             "latitude", "longitude", "ensemble"),
             output_dims = NULL)

wf <- AddStep(inputs = exp,
              step = step)

res <- Compute(wf,
               chunks = list(time = 2))

The resulting error message is:

Error in Start(dat = "/esarchive//exp/CMIP6/dcppA-hindcast/HadGEM3-GC31-MM/DCPP/MOHC/HadGEM3-GC31-MM/dcppA-hindcast/$ensemble$/Amon/$var$/gn/v20200417/$var$_Amon_*_dcppA-hindcast_s$syear$-$ensemble$_gn_$chunk$.nc",  : 
  Chunk over dimension 'time' is not allowed because 'time' is across 'chunk'.

Module and Package Version

startR_2.3.1

Edited Jun 03, 2024 by vagudets