diff --git a/inst/doc/faq.md b/inst/doc/faq.md
index 6f1d1ac775e4d849af2b0e7ce0a82c8c162be421..4ad94a0407dd4f1b8a672528f542a828b1d943db 100644
--- a/inst/doc/faq.md
+++ b/inst/doc/faq.md
@@ -21,6 +21,7 @@ This document intends to be the first reference for any doubts that you may have
    15. [Specify extra function arguments in the workflow](#15-specify-extra-function-arguments-in-the-workflow)
    16. [Use parameter 'return_vars' in Start()](#16-use-parameter-return_vars-in-start)
    17. [Use parameter 'split_multiselected_dims' in Start()](#17-use-parameter-split_multiselected_dims-in-start)
+   18. [Use glob expression '*' to define the file path](#18-use-glob-expression-to-define-the-file-path)
 
 2. **Something goes wrong...**
 
@@ -388,35 +389,48 @@ Look at the position of `extra_queue_params` parameter in a full call of Compute
         wait = TRUE)
 ```
 
-### 8. Define a path with multiple dependencies
+### 8. Define a path with multiple dependencies
 
-The structure of the BSC Earth data repository 'esarchive' allows us to create a path pattern to the data by using different variables (between dollar symbol), such as `$var$`, for the variable name, or `$sdates$`, for the start date of the simulation. Here is an example for loading monthly simulations of system4_m1 data:
+The structure of the BSC Earth data repository 'esarchive' allows us to create a path pattern to the data by using different variables
+(enclosed in dollar signs), such as `$var$`, for the variable name, or `$sdates$`, for the start date of the simulation. We call these variables 'file dimensions'.
+Here is an example for loading monthly simulations of system4_m1 data:
 
 `path <- '/esarchive/exp/ecmwf/system4_m1/monthly_mean/$var$_f6h/$var$_$sdate$.nc'`
 
 The function Start() will require two parameters 'var' and 'sdate' to load the desired data.
 
-In some cases, the creation of the path could be a little bit more complicated. Some researchers create their own EC-Earth experiments which are identified by an experiment ID (`$expid$`) and with different model version (`$version`), even for different members (`$member$`):
+In some cases, one file dimension depends on another. Some researchers create their own EC-Earth experiments, which are identified by an experiment ID (`$expid$`) and have different members (`$member$`):
+
+| expid | member |
+|-------|----------|
+| a1st | r7i1p1f1 |
+| a1sx |r10i1p1f1 |
+
+In this case, 'member' has a different value under each 'expid'. Therefore, the parameter `member_depends = 'expid'` needs to be used in Start().
+
+However, in some other cases, the creation of the path can be more complicated. For example, each experiment ID (`$expid$`) can have different members (`$member$`) and also different model versions (`$version$`):
 
 | expid | member | version |
 |-------|----------|---------|
 | a1st | r7i1p1f1 |v20190302|
 | a1sx |r10i1p1f1 |v20190308|
 
-In this case, the variable member and version have different value depending on the expid (the member r10i1p1f1 does not exist for expid a1st). The path will include this varibles:
+In this case, the variables member and version have different values depending on the expid (the member r10i1p1f1 and version v20190308 do not exist for expid a1st). The path will include these variables:
 
 `path <- '/esarchive/exp/ecearth/$expid$/diags/CMIP/EC-Earth-Consortium/EC-Earth3/historical/$member$/Omon/$var$/gn/$version$/$var$_Omon_EC-Earth3_historical_$member$_gn_$year$.nc'`
 
-However, the following parameters are mandatory to make Start() aware of that they are not independent variables:
+The current Start() cannot deal with multiple dependencies. However, for this case, here is a workaround. The following parameters can be added to Start():
 
-```
+```r
   member_depends = 'expid',
   version_depends = 'expid',
+  member_depends = 'version',
+  version_depends = 'member',
 ```
 
 The final Start() call will look like:
 
-```
+```r
 yrh1 = 1960
 yrh2 = 2014
 years <- paste0(c(yrh1 : yrh2), '01-', c(yrh1 : yrh2), '12')
@@ -427,9 +441,11 @@ data <- Start(dat = repos,
               version = 'all',
               member_depends = 'expid',
               version_depends = 'expid',
+              member_depends = 'version',
+              version_depends = 'member',
               year = years,
               time = 'all',
-              region = indices(1 : 4),
+              region = indices(1:4),
               return_vars = list(time = NULL, region = NULL),
               retrieve = TRUE)
 ```
@@ -662,6 +678,20 @@ obs <- Start(dat = path.obs,
              retrieve = T)
 ```
 
+### 18. Use glob expression '*' to define the file path
+The standard way to define the file path for Start() is to use tags (i.e., $TAG_NAME$).
+The glob expression, or wildcard, '*', can also be used in the path definition, but it follows a different rule from the usual glob expansion.
+
+Please note that **'*' can only be used to replace a part of the path that is common to all the files**. For example, if all the required files have the folder 'EC-Earth-Consortium/' in their path, then this part can be substituted with '*/'.
+This saves the effort of spelling out a long, uncritical part of the path and makes the script cleaner.
+
+However, if the part replaced by '*' is not the same among all the files, Start() will use the pattern it finds in the first file to substitute '*'.
+As a result, the remaining files may not be found because the path pattern is wrong for them.
+For example, if the first file is under a folder named 'v20190302/' and the second file is under another one named 'v20190308/', and you define the path pattern as 'v*/', then Start() will use 'v20190302/' for both file paths.
+This differs from the common definition of a glob expression, which expands to match all the existing patterns, so please be careful when using it.
+
+There is a parameter 'path_glob_permissive' in Start(). If it is set to TRUE, any '*' in the filename itself will remain (i.e., it behaves as in the common definition), while the ones in the path to the filename will still be replaced by the pattern found in the first file.
+
 # Something goes wrong...