Start(): Allow more than one file dimension in 'metadata_dims'
Summary
The parameter metadata_dims allows the user to specify the file dimensions along which to look for the metadata of the variable ($var$) in the files. For example, it is useful when:
- More than one variable is requested: in this case metadata_dims = "var" ensures that the variable metadata is retrieved for all the variables and not only for the variable in the first file that it finds. https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/faq.md#20-use-metadata_dims-to-retrieve-variable-metadata
- The first file is missing: in this case specifying another file dimension as metadata_dims will retrieve the variable metadata for all the files. https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/faq.md#19-get-metadata-when-the-first-file-does-not-exist

Discussions here: !85 (merged) #58 (closed)
However, when both (1) and (2) happen at the same time, retrieving the correct metadata is currently not possible unless 'var' is specified as the pattern dimension, because metadata_dims is restricted to only one file dimension. This is not mentioned in the documentation of Start(), and I have not been able to find the reason why it was decided to allow only one file dimension. Removing this restriction does not break any unit tests, so I guess that it was about performance and avoiding unnecessary repetitive scanning of files.
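To make the problem concrete, here is a toy model of the file grid. It is only a sketch and does not reflect how Start() is implemented: the availability matrix and the helper scan_metadata() are hypothetical, and "scanning" is reduced to checking whether at least one existing file is visited for each variable.

```r
# Toy model only: this is NOT the Start() implementation.
# Rows are variables, columns are file dates; TRUE means the file exists.
available <- matrix(c(FALSE, TRUE,
                      TRUE,  TRUE),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(var = c("tas", "prlr"),
                                    file_date = c("19990711", "19990715")))

# Hypothetical helper: for each visited variable, report whether at least
# one existing file is reached when scanning along the dimensions in 'dims'.
# Dimensions not listed in 'dims' are fixed at their first element.
scan_metadata <- function(available, dims) {
  vars <- rownames(available)
  dates <- colnames(available)
  if (!("var" %in% dims)) vars <- vars[1]
  if (!("file_date" %in% dims)) dates <- dates[1]
  vapply(vars, function(v) any(available[v, dates]), logical(1))
}

only_var  <- scan_metadata(available, "var")                 # tas is not found
only_date <- scan_metadata(available, "file_date")           # prlr is never visited
both_dims <- scan_metadata(available, c("var", "file_date")) # both are found
```

In this model, metadata for both variables is only reachable when both file dimensions are scanned, which is the combination the current restriction rules out.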
Example
library(startR)
hcst.path <- "/esarchive/exp/ncep/cfs-v2/weekly_mean/s2s/$var$_f24h/$var$_$file_date$.nc"
file_date <- c("19990711", "19990715")
variable <- c("tas", "prlr")
data <- Start(dat = hcst.path,
              var = variable,
              file_date = file_date,
              time = indices(1:4),
              latitude = values(list(0, 10)),
              latitude_reorder = Sort(),
              longitude = values(list(10, 10)),
              synonims = list(latitude = c('lat', 'latitude'),
                              longitude = c('lon', 'longitude'),
                              ensemble = c('member', 'ensemble', 'lev')),
              ensemble = 'all',
              metadata_dims = c('var', 'file_date'),
              return_vars = list(latitude = 'dat',
                                 longitude = 'dat',
                                 time = 'file_date'),
              retrieve = FALSE)
names(attr(data, "Variables")$common)
If metadata_dims = c('var', 'file_date'), only 'var' is used and the metadata for tas is missing (appears as NA) because the first tas file is a missing file. If metadata_dims = c('file_date', 'var'), only 'file_date' is used and only the metadata for the first variable is retrieved, so prlr is missing.
Start() returns the following warning:
Warning: Parameter 'metadata_dims' has too many elements which serve repetitive function. Keep 'file_date' only.
But in this particular case it's not really repetitive; both are needed in order to retrieve all the necessary metadata.
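Until the restriction is lifted, one possible stopgap (untested against the real files) would be to call Start() twice, once with metadata_dims = 'var' and once with metadata_dims = 'file_date', and merge the two metadata lists by hand. The sketch below assumes the variable metadata sits in attr(data, "Variables")$common as a named list, with NA or an absent entry where it could not be retrieved. merge_metadata() is a hypothetical helper, not part of startR, and the list contents are placeholders rather than real metadata.

```r
# Hypothetical helper, not part of startR: keep the entries of 'primary'
# and fill any absent or NA entry with the one from 'secondary'.
merge_metadata <- function(primary, secondary) {
  is_missing <- function(x) {
    is.null(x) || (is.atomic(x) && length(x) == 1 && is.na(x))
  }
  merged <- primary
  for (name in names(secondary)) {
    if (is_missing(merged[[name]])) {
      merged[[name]] <- secondary[[name]]
    }
  }
  merged
}

# Placeholder stand-ins for attr(data, "Variables")$common from the two calls.
# With metadata_dims = 'var', tas comes back as NA (first tas file missing).
by_var <- list(tas = NA, prlr = list(note = "placeholder prlr metadata"))
# With metadata_dims = 'file_date', only the first variable is retrieved.
by_date <- list(tas = list(note = "placeholder tas metadata"))

merged <- merge_metadata(by_var, by_date)
names(merged)
```

This doubles the file scanning and would not be needed if metadata_dims accepted both dimensions.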
Module and Package Version
R version: R >= 4.1.2