From 29762c4288c00e1c3aa74d9d7e5766a61b9c30f3 Mon Sep 17 00:00:00 2001
From: aho
Date: Fri, 26 Jun 2020 12:18:30 +0200
Subject: [PATCH 1/3] Revise how-to-8 and add how-to-18

---
 inst/doc/faq.md | 44 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/inst/doc/faq.md b/inst/doc/faq.md
index 6f1d1ac..2a24292 100644
--- a/inst/doc/faq.md
+++ b/inst/doc/faq.md
@@ -21,6 +21,7 @@ This document intends to be the first reference for any doubts that you may have
    15. [Specify extra function arguments in the workflow](#15-specify-extra-function-arguments-in-the-workflow)
    16. [Use parameter 'return_vars' in Start()](#16-use-parameter-return_vars-in-start)
    17. [Use parameter 'split_multiselected_dims' in Start()](#17-use-parameter-split_multiselected_dims-in-start)
+   18. [Use glob expression '*' to define the file path](#18-use-glob-expression-*-to-define-the-file-path)
 
 2. **Something goes wrong...**
 
@@ -388,35 +389,46 @@ Look at the position of `extra_queue_params` parameter in a full call of Compute
                  wait = TRUE)
 ```
 
-### 8. Define a path with multiple dependencies
+### 8. Define a path with multiple dependencies
 
 The structure of the BSC Earth data repository 'esarchive' allows us to create a path pattern to the data by using different variables (between dollar symbol), such as `$var$`, for the variable name, or `$sdates$`, for the start date of the simulation. Here is an example for loading monthly simulations of system4_m1 data:
 
 `path <- '/esarchive/exp/ecmwf/system4_m1/monthly_mean/$var$_f6h/$var$_$sdate$.nc'`
 
-The function Start() will require two parameters 'var' and 'sdate' to load the desired data.
+The function Start() will require two parameters, or we call them 'file dimension', 'var' and 'sdate' to load the desired data.
 
-In some cases, the creation of the path could be a little bit more complicated. Some researchers create their own EC-Earth experiments which are identified by an experiment ID (`$expid$`) and with different model version (`$version`), even for different members (`$member$`):
+In some cases, the file dimensions have dependence relationship. Some researchers create their own EC-Earth experiments which are identified by an experiment ID (`$expid$`) and with different members (`$member$`):
+
+| expid | member |
+|-------|----------|
+| a1st | r7i1p1f1 |
+| a1sx |r10i1p1f1 |
+
+In this case, 'member' under each 'expid' has different value. Therefore, the parameter `member_depends = 'expid'` needs to be used in Start().
+
+However, in some other cases, the creation of the path could be more complicated. For example, the experiment ID (`$expid$`) can have different members (`$member$`) and even with different model version (`$version`):
 
 | expid | member | version |
 |-------|----------|---------|
 | a1st | r7i1p1f1 |v20190302|
 | a1sx |r10i1p1f1 |v20190308|
 
-In this case, the variable member and version have different value depending on the expid (the member r10i1p1f1 does not exist for expid a1st). The path will include this varibles:
+In this case, the variable member and version have different value depending on the expid (the member r10i1p1f1 and version v20190302 do not exist for expid a1st). The path will include this varibles:
 
 `path <- '/esarchive/exp/ecearth/$expid$/diags/CMIP/EC-Earth-Consortium/EC-Earth3/historical/$member$/Omon/$var$/gn/$version$/$var$_Omon_EC-Earth3_historical_$member$_gn_$year$.nc'`
 
-However, the following parameters are mandatory to make Start() aware of that they are not independent variables:
+The current Start() can not deal with multiple dependencies. However, for this case, here is a workaround. The following parameters can be added to Start():
 
-```
+```r
               member_depends = 'expid',
               version_depends = 'expid',
+              member_depends = 'version',
+              version_depends = 'member',
 ```
 
 The final Start() call will look like:
 
-```
+```r
 yrh1 = 1960
 yrh2 = 2014
 years <- paste0(c(yrh1 : yrh2), '01-', c(yrh1 : yrh2), '12')
@@ -427,9 +439,11 @@ data <- Start(dat = repos,
               version = 'all',
               member_depends = 'expid',
               version_depends = 'expid',
+              member_depends = 'version',
+              version_depends = 'member',
               year = years,
               time = 'all',
-              region = indices(1 : 4),
+              region = indices(1:4),
               return_vars = list(time = NULL, region = NULL),
               retrieve = TRUE)
 ```
@@ -662,6 +676,20 @@ obs <- Start(dat = path.obs,
              retrieve = T)
 ```
 
+### 18. Use glob expression '*' to define the file path
+The standard way to define the file path for Start() is using tags (i.e., $TAG_NAME$).
+The glob expression, or wildcard, '*', can also be used in the path definition, while the rule is different from the common usage.
+
+Please note that **'*' can only be used to replace the common part of all the files**. For example, if all the required files have the folder 'EC-Earth-Consortium/' in their path, then this part can be substituted with '*/'.
+It can save some effort to define the long and uncritical path, and also make the script cleaner.
+
+However, if the part replaced by '*' is not same among all the files, Start() will use the first pattern it finds in the first file to substitute '*'.
+As a result, the rest files may not be found due to the wrong path pattern.
+For example, if the first file is under a folder named 'v20190302/' and the second file is under another one named 'v20190308/', and you define the path pattern as 'v*/', then Start() will use 'v20190302/' for both file paths.
+This is different from the common definition of glob expression that tries to expand to match all the existing patterns, so please be careful when using it.
+
+There is a parameter 'path_glob_permissive' in Start(). If set it to TRUE, the '*' in the filename itself will remain (i.e., as the common definition), while the ones in the path to the filename will still be replaced by the pattern in the first found file.
+
 
 # Something goes wrong...
 
-- 
GitLab


From 39006ffd3c7ebc911a59cc1352c22a5c6e38836d Mon Sep 17 00:00:00 2001
From: aho
Date: Fri, 26 Jun 2020 12:22:38 +0200
Subject: [PATCH 2/3] Fix hyperlink

---
 inst/doc/faq.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/inst/doc/faq.md b/inst/doc/faq.md
index 2a24292..601fd96 100644
--- a/inst/doc/faq.md
+++ b/inst/doc/faq.md
@@ -21,7 +21,7 @@ This document intends to be the first reference for any doubts that you may have
    15. [Specify extra function arguments in the workflow](#15-specify-extra-function-arguments-in-the-workflow)
    16. [Use parameter 'return_vars' in Start()](#16-use-parameter-return_vars-in-start)
    17. [Use parameter 'split_multiselected_dims' in Start()](#17-use-parameter-split_multiselected_dims-in-start)
-   18. [Use glob expression '*' to define the file path](#18-use-glob-expression-*-to-define-the-file-path)
+   18. [Use glob expression '*' to define the file path](#18-use-glob-expression-to-define-the-file-path)
 
 2. **Something goes wrong...**
 
-- 
GitLab


From eaac405d68f84046ba80f9c01c0c7f3a457807ef Mon Sep 17 00:00:00 2001
From: aho
Date: Fri, 26 Jun 2020 18:45:31 +0200
Subject: [PATCH 3/3] Revise confusing sentences in how-to-8

---
 inst/doc/faq.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/inst/doc/faq.md b/inst/doc/faq.md
index 601fd96..4ad94a0 100644
--- a/inst/doc/faq.md
+++ b/inst/doc/faq.md
@@ -391,11 +391,13 @@ Look at the position of `extra_queue_params` parameter in a full call of Compute
 
 ### 8. Define a path with multiple dependencies
 
-The structure of the BSC Earth data repository 'esarchive' allows us to create a path pattern to the data by using different variables (between dollar symbol), such as `$var$`, for the variable name, or `$sdates$`, for the start date of the simulation. Here is an example for loading monthly simulations of system4_m1 data:
+The structure of the BSC Earth data repository 'esarchive' allows us to create a path pattern to the data by using different variables
+(between dollar symbol), such as `$var$`, for the variable name, or `$sdates$`, for the start date of the simulation. We call these variables 'file dimension'.
+Here is an example for loading monthly simulations of system4_m1 data:
 
 `path <- '/esarchive/exp/ecmwf/system4_m1/monthly_mean/$var$_f6h/$var$_$sdate$.nc'`
 
-The function Start() will require two parameters, or we call them 'file dimension', 'var' and 'sdate' to load the desired data.
+The function Start() will require two parameters 'var' and 'sdate' to load the desired data.
 
 In some cases, the file dimensions have dependence relationship. Some researchers create their own EC-Earth experiments which are identified by an experiment ID (`$expid$`) and with different members (`$member$`):
 
-- 
GitLab
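
Note on how-to 18 from [PATCH 1/3]: the difference between the usual glob expansion and the "first file" rule described there can be sketched in base R. This is a toy illustration under stated assumptions, not startR code: the two file paths are hypothetical, and the `sub()` call only imitates the substitution rule as the FAQ text describes it.

```r
# Toy illustration only; this does not call Start(). The paths are made up.
files <- c("exp/a1st/v20190302/tos.nc",
           "exp/a1sx/v20190308/tos.nc")

# Conventional glob semantics: 'v*' matches each version folder separately,
# so both files match the pattern.
glob_hits <- grepl(utils::glob2rx("exp/*/v*/tos.nc"), files)

# Rule described in how-to 18: whatever 'v*' matched in the first file is
# reused for every other file.
first_match <- sub("^exp/[^/]+/(v[^/]*)/.*$", "\\1", files[1])
expanded <- sprintf("exp/%s/%s/tos.nc", c("a1st", "a1sx"), first_match)

# The second expanded path points to a version folder that does not exist
# for 'a1sx', so that file is not found.
found <- expanded %in% files

print(glob_hits)
print(found)
```

Under this rule the second file is missed, which is why the FAQ text restricts '*' to the part of the path that is common to all files.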