Start(): Retrieve correct time steps when time is across file dimension and the time steps of the first files are skipped
Hi me,
Summary
When the time dimension spans multiple files, Start() loads the wrong time indices if the first requested index falls outside the first file and the files have different lengths along the time dimension.
For example, in the HadGEM3-GC31-MM model, the files have the following structure:
tas_Amon_HadGEM3-GC31-MM_dcppA-hindcast_s1991-r3i1p1f2_gn_199111-199112.nc
tas_Amon_HadGEM3-GC31-MM_dcppA-hindcast_s1991-r3i1p1f2_gn_199201-199212.nc
tas_Amon_HadGEM3-GC31-MM_dcppA-hindcast_s1991-r3i1p1f2_gn_199301-199312.nc
...
The model is initialized in November and the first file contains the first two time steps for November and December. The subsequent files contain all the months in each year from January to December, meaning 12 time steps per file.
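To make the expected indexing explicit, here is a minimal sketch in base R of how a global forecast-time index should map onto the files described above. The helper `locate_time_index()` and the vector `chunk_lengths` are hypothetical names used only for illustration. They are not part of startR and do not describe how Start() is implemented.

```r
# Hypothetical helper, not part of startR: map a global forecast-time index
# to the chunk (file) that holds it and to the position inside that chunk.
chunk_lengths <- c(2, 12, 12)  # Nov-Dec file, then two full-year files

locate_time_index <- function(global_index, chunk_lengths) {
  chunk_ends <- cumsum(chunk_lengths)          # last global index of each chunk
  chunk <- vapply(global_index,
                  function(i) which(i <= chunk_ends)[1],
                  integer(1))                  # first chunk whose end covers i
  chunk_starts <- c(0, chunk_ends)[chunk]      # global indices before that chunk
  data.frame(global = global_index,
             chunk = chunk,
             within_chunk = global_index - chunk_starts)
}

# Forecast time 2 is the last step of the first file; forecast time 3
# should be the first step (January) of the second file.
locate_time_index(c(2, 3, 14, 15), chunk_lengths)
```

Under these assumptions, forecast time 3 corresponds to position 1 of the second file, not position 3.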
The following Start() call loads forecast times 2 to 24 and returns the time steps in the correct order, because the first requested index lies inside the first file:
library(startR)
path_list <- paste0("/esarchive/exp/CMIP6/dcppA-hindcast/HadGEM3-GC31-MM/DCPP/MOHC/HadGEM3-GC31-MM/dcppA-hindcast/",
                    "$ensemble$/Amon/$var$/gn/v20200417/",
                    "$var$_Amon_*_dcppA-hindcast_s$syear$-$ensemble$_gn_$chunk$.nc")
# Where `$chunk$` refers to each of the strings designating the time steps
# in the file: `199111-199112`, `199201-199212`, `199301-199312`, etc.
sdates_hcst <- c("1990", "1991", "1992", "1993")
time_ind <- seq(2, 24)
lats.min <- 10
lats.max <- 20
lons.min <- 0
lons.max <- 15
exp <- Start(dat = path_list,
             var = "tas",
             syear = paste0(sdates_hcst),
             chunk = 'all',
             chunk_depends = 'syear',
             time = indices(time_ind),
             time_across = 'chunk',
             merge_across_dims = TRUE,
             largest_dims_length = TRUE,
             latitude = values(list(lats.min, lats.max)),
             latitude_reorder = Sort(decreasing = TRUE),
             longitude = values(list(lons.min, lons.max)),
             longitude_reorder = CircularSort(0, 360),
             ensemble = c("r1i1p1f2", "r2i1p1f2", "r3i1p1f2"),
             synonims = list(longitude = c('lon', 'longitude'),
                             latitude = c('lat', 'latitude')),
             return_vars = list(latitude = NULL, longitude = NULL,
                                time = c('syear', 'chunk')),
             retrieve = TRUE)
# The first time step is forecast time 2, December 1990, as expected.
attr(exp, "Variables")$common$time[1]
# [1] "1990-12-16 UTC"
However, if the first requested index n falls outside the first file defined by $chunk$ (here, n > 2), the resulting dates are wrong, because Start() retrieves the forecast times starting from the nth index of the second file instead of the nth forecast time overall:
time_ind <- seq(3, 24)
exp <- Start(dat = path_list,
             var = "tas",
             syear = paste0(sdates_hcst),
             chunk = 'all',
             chunk_depends = 'syear',
             time = indices(time_ind),
             time_across = 'chunk',
             merge_across_dims = TRUE,
             largest_dims_length = TRUE,
             latitude = values(list(lats.min, lats.max)),
             latitude_reorder = Sort(decreasing = TRUE),
             longitude = values(list(lons.min, lons.max)),
             longitude_reorder = CircularSort(0, 360),
             ensemble = c("r1i1p1f2", "r2i1p1f2", "r3i1p1f2"),
             synonims = list(longitude = c('lon', 'longitude'),
                             latitude = c('lat', 'latitude')),
             return_vars = list(latitude = NULL, longitude = NULL,
                                time = c('syear', 'chunk')),
             retrieve = TRUE)
# The first time step is forecast time 5 (March 1991), not forecast time 3
# (January 1991) as requested.
attr(exp, "Variables")$common$time[1]
# [1] "1991-03-16 UTC"
Furthermore, many of the time steps in the array are filled with NA values.
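The mismatch can be checked with base R alone. The sketch below assumes monthly forecast times starting in November 1990 for start date 1990. The first-of-month dates are illustrative only, since the files store mid-month time stamps, and the variable names are hypothetical.

```r
# Expected calendar month for each forecast time of start date 1990,
# assuming one step per month starting in November 1990.
first_month <- as.Date("1990-11-01")
forecast_months <- format(seq(first_month, by = "month", length.out = 24),
                          "%Y-%m")

# Month expected for the first requested index (forecast time 3).
expected_first <- forecast_months[3]

# Month of the first time stamp returned by Start() in the failing call.
observed_first <- format(as.Date("1991-03-16"), "%Y-%m")

# Forecast time that the returned time stamp actually corresponds to.
observed_forecast_time <- match(observed_first, forecast_months)

expected_first          # "1991-01"
observed_forecast_time  # 5
```

The returned date is two steps later than requested, which matches the length of the first file.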
Module and Package Version
startR_2.3.1 with R/4.2.1 on Hub (pending testing with R/4.1.2)
Other Relevant Information
This bugfix is needed in order to correctly load forecast times in SUNSET for these decadal models.
Victòria