  • #203
Issue created Jul 26, 2024 by vagudets (@vagudets, Maintainer)

Start(): Allow more than one file dimension in 'metadata_dims'

Summary

The parameter metadata_dims lets the user specify the file dimensions along which Start() looks for the metadata of the variable ($var$) in the files. For example, it is useful when:

  1. More than one variable is requested: in this case metadata_dims = "var" ensures that the variable metadata is retrieved for all the variables and not only for the variable in the first file that it finds. https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/faq.md#20-use-metadata_dims-to-retrieve-variable-metadata
  2. The first file is missing: in this case specifying another file dimension as metadata_dims will retrieve the variable metadata for all the files. https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/faq.md#19-get-metadata-when-the-first-file-does-not-exist; Discussions here: !85 (merged) #58 (closed)

However, when both (1) and (2) happen at the same time, retrieving the correct metadata is currently not possible unless 'var' is specified as the pattern dimension, because metadata_dims is restricted to only one file dimension. This restriction is not mentioned in the documentation of Start(), and I have not been able to find the reason why only one file dimension is allowed. Removing the restriction does not break any unit tests, so I guess that it was introduced for performance reasons, to avoid unnecessary repeated scanning of files.

Example

library(startR)

hcst.path <- "/esarchive/exp/ncep/cfs-v2/weekly_mean/s2s/$var$_f24h/$var$_$file_date$.nc"
file_date <- c("19990711", "19990715")
variable <- c("tas", "prlr")

data <- Start(dat = hcst.path,
              var = variable,
              file_date = file_date,
              time = indices(1:4),
              latitude = values(list(0, 10)),
              latitude_reorder = Sort(),
              longitude = values(list(10, 10)),
              synonims = list(latitude = c('lat', 'latitude'),
                              longitude = c('lon', 'longitude'),
                              ensemble = c('member', 'ensemble', 'lev')),
              ensemble = 'all',
              metadata_dims = c('var', 'file_date'),
              return_vars = list(latitude = 'dat',
                                 longitude = 'dat',
                                 time = 'file_date'),
              retrieve = FALSE)

names(attr(data, "Variables")$common)

If metadata_dims = c('var', 'file_date'), only 'var' is used and the metadata for tas is missing (appears as NA) because the first tas file is a missing file. If metadata_dims = c('file_date', 'var'), only 'file_date' is used and only the metadata for the first variable is retrieved, so prlr is missing.

Start() returns the following warning:

Warning: Parameter 'metadata_dims' has too many elements which serve repetitive function. Keep 'file_date' only.

But in this particular case it's not really repetitive; both are needed in order to retrieve all the necessary metadata.
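
To illustrate why both dimensions are needed, here is a minimal sketch in plain R that does not call startR at all. It assumes a toy file table with the same two variables and two file dates as above, and it assumes that files are scanned along every dimension in metadata_dims while all other file dimensions stay fixed at their first value. The helper vars_with_metadata() and the exists column are hypothetical names made up for this illustration; they are not part of startR and do not reproduce its internal implementation.

```r
# Toy file table: one row per (file_date, var) combination.
files <- expand.grid(file_date = c("19990711", "19990715"),
                     var = c("tas", "prlr"),
                     stringsAsFactors = FALSE)

# Assumption for this sketch: only the first 'tas' file is missing.
files$exists <- !(files$var == "tas" & files$file_date == "19990711")

# Hypothetical helper (not a startR function): return the variables for which
# at least one existing file would be scanned, given the metadata dimensions.
vars_with_metadata <- function(files, file_dims, metadata_dims) {
  stopifnot(all(metadata_dims %in% file_dims))
  # File dimensions not listed in metadata_dims stay at their first value.
  fixed_dims <- setdiff(file_dims, metadata_dims)
  keep <- rep(TRUE, nrow(files))
  for (d in fixed_dims) {
    keep <- keep & files[[d]] == unique(files[[d]])[1]
  }
  # Only existing files can provide metadata.
  unique(files$var[keep & files$exists])
}

file_dims <- c("var", "file_date")
vars_with_metadata(files, file_dims, "var")                    # "prlr"
vars_with_metadata(files, file_dims, "file_date")              # "tas"
vars_with_metadata(files, file_dims, c("var", "file_date"))    # "tas" "prlr"
```

Under these assumptions, either single dimension leaves one variable without metadata, and only the combination of both covers tas and prlr, which is the behaviour requested in this issue.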

Module and Package Version

R version: R >= 4.1.2


Edited Jul 26, 2024 by vagudets