Allow chunking over inner dimension that is defined across another dimension
Summary
When inner_dim_1 is defined as inner_dim_1_across = dim_2 in the Start() call, chunking along inner_dim_1 is not allowed because startR has no way to tell how to distribute the files correctly over the chunks. This means that in some cases the chunking options are limited by the way the data are organized in the files: for example, in some decadal models we might want to load time steps for an initialization date that are spread over several files.
The necessary development would be to add a way to organize the chunks according to the dependency between inner_dim_1 and dim_2, so that the list of files to load for each chunk can be generated correctly.
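As a rough illustration of the idea, the sketch below maps each chunk of a merged inner dimension to the files that contain its indices. It is only a base R sketch under stated assumptions, not startR code: the helper files_per_chunk() and its arguments are hypothetical, and the number of time steps per file (2, 12 and 12 here) is invented for the example. In a real implementation this mapping would presumably have to be built separately for each start date, since the files along 'chunk' depend on 'syear'.

```r
# Hypothetical helper (not part of startR): for an inner dimension that is
# merged across a file dimension, find which files each chunk needs.
#   steps_per_file: number of inner-dimension steps stored in each file
#   selected_steps: indices requested along the merged inner dimension
#   n_chunks:       number of chunks requested along the inner dimension
files_per_chunk <- function(steps_per_file, selected_steps, n_chunks) {
  # Last merged index covered by each file
  file_end <- cumsum(steps_per_file)
  # Index of the file that holds each selected step
  file_of_step <- findInterval(selected_steps, file_end, left.open = TRUE) + 1
  # Assign the selected steps to n_chunks contiguous groups of similar size
  chunk_id <- ceiling(seq_along(selected_steps) * n_chunks /
                      length(selected_steps))
  lapply(split(seq_along(selected_steps), chunk_id), function(i) {
    list(steps = selected_steps[i], files = unique(file_of_step[i]))
  })
}

# Assumed layout for one start date: three files with 2, 12 and 12 time steps
res <- files_per_chunk(steps_per_file = c(2, 12, 12),
                       selected_steps = 2:24,
                       n_chunks = 2)
# Chunk 1 covers steps 2 to 12 and needs files 1 and 2;
# chunk 2 covers steps 13 to 24 and needs files 2 and 3.
str(res)
```

With such a mapping, each chunk would only need to open the files listed for it, and a file shared by two chunks (file 2 above) would be read partially by both.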
Example
library(startR)
path_list <- "/esarchive//exp/CMIP6/dcppA-hindcast/HadGEM3-GC31-MM/DCPP/MOHC/HadGEM3-GC31-MM/dcppA-hindcast/$ensemble$/Amon/$var$/gn/v20200417/$var$_Amon_*_dcppA-hindcast_s$syear$-$ensemble$_gn_$chunk$.nc"
member <- c("r1i1p1f2", "r2i1p1f2", "r3i1p1f2")
variable <- "tas"
sdates_hcst <- c("1990", "1991", "1992", "1993")
time_ind <- seq(2, 24)
lats.min <- 10
lats.max <- 20
lons.min <- 0
lons.max <- 15
exp <- Start(dat = path_list,
             var = variable,
             syear = paste0(sdates_hcst),
             chunk = 'all',
             chunk_depends = 'syear',
             time = indices(time_ind),
             time_across = 'chunk',
             merge_across_dims = TRUE,
             largest_dims_length = TRUE,
             latitude = values(list(lats.min, lats.max)),
             latitude_reorder = Sort(decreasing = TRUE),
             longitude = values(list(lons.min, lons.max)),
             longitude_reorder = CircularSort(0, 360),
             ensemble = member,
             synonims = list(longitude = c('lon', 'longitude'),
                             latitude = c('lat', 'latitude')),
             return_vars = list(latitude = NULL, longitude = NULL,
                                time = c('syear', 'chunk')),
             retrieve = FALSE)
step <- Step(fun = mean,
             target_dims = c("dat", "var", "syear",
                             "latitude", "longitude", "ensemble"),
             output_dims = NULL)
wf <- AddStep(inputs = exp,
              step = step)
res <- Compute(wf,
               chunks = list(time = 2))
The resulting error message is:
Error in Start(dat = "/esarchive//exp/CMIP6/dcppA-hindcast/HadGEM3-GC31-MM/DCPP/MOHC/HadGEM3-GC31-MM/dcppA-hindcast/$ensemble$/Amon/$var$/gn/v20200417/$var$_Amon_*_dcppA-hindcast_s$syear$-$ensemble$_gn_$chunk$.nc", :
Chunk over dimension 'time' is not allowed because 'time' is across 'chunk'.
Module and Package Version
startR_2.3.1