From a0fd8ef5f4ae7b3be6078018471b52e84415428e Mon Sep 17 00:00:00 2001
From: aho
Date: Tue, 24 Mar 2020 15:20:46 +0100
Subject: [PATCH 1/5] Add FAQ how-to-14 about finding error log in power9.

---
 inst/doc/faq.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/inst/doc/faq.md b/inst/doc/faq.md
index e7e6bac..e652b39 100644
--- a/inst/doc/faq.md
+++ b/inst/doc/faq.md
@@ -17,6 +17,7 @@ This document intends to be the first reference for any doubts that you may have
    11. [Select the longitude/latitude region](#11-select-the-longitudelatitude-region)
    12. [What will happen if reorder function is not used](#12-what-will-happen-if-reorder-function-is-not-used)
    13. [Load specific grid points data](#13-load-specific-grid-points-data)
+   14. [Find the error log when jobs are launched on Power9](#14-find-the-error-log-when-jobs-are-launched-on-power9)
@@ -449,6 +450,11 @@ If the values does not match the defined spatial point in the files, **Start** w
 An example of how to load several gridpoints and how to transform the data could be found in the Use Cases section [example 1.6](/inst/doc/usecase/ex1_6_gridpoint_data.R).
+### 14. Find the error log when jobs are launched on Power9
+
+Due to a connection problem, when Compute() dispatches jobs to Power9, each job in the ecFlow UI is marked with a 'Z' (zombie), regardless of whether the job has completed or failed.
+The zombie prevents the error log from being shown in the ecFlow UI output frame. Therefore, you need to log in to Power9, go to the 'temp_dir' listed in the cluster list in Compute(), and enter the job folder. Inside, you will find another folder with the same name as its parent; enter it and continue down to the innermost folder, where you will find 'Chunk.1.err'.
+For example, the path can be: "/gpfs/scratch/bsc32/bsc32734/startR_hpc/STARTR_CHUNKING_1665710775/STARTR_CHUNKING_1665710775/computation/lead_year_CHUNK_1/lon_CHUNK_1/lat_CHUNK_1/sdate_CHUNK_1/var_CHUNK_1/dataset_CHUNK_1/Chunk.1.err".
 ## Something goes wrong...
-- 
GitLab

From 645bb058b53b19d4446e65ffccfa01f306506ff1 Mon Sep 17 00:00:00 2001
From: aho
Date: Tue, 24 Mar 2020 16:15:11 +0100
Subject: [PATCH 2/5] Add FAQ how-to-15 about adding extra function argument in the workflow

---
 inst/doc/faq.md | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/inst/doc/faq.md b/inst/doc/faq.md
index e652b39..435d3d8 100644
--- a/inst/doc/faq.md
+++ b/inst/doc/faq.md
@@ -19,7 +19,8 @@ This document intends to be the first reference for any doubts that you may have
    13. [Load specific grid points data](#13-load-specific-grid-points-data)
    14. [Find the error log when jobs are launched on Power9](#14-find-the-error-log-when-jobs-are-launched-on-power9)
-
+   15. [Specify extra function arguments in the workflow](#15-specify-extra-function-arguments-in-the-workflow)
+
 2. **Something goes wrong...**
    1. [No space left on device](#1-no-space-left-on-device)
@@ -456,6 +457,22 @@ Due to a connection problem, when Compute() dispatches jobs to Power9, each job
 The zombie prevents the error log from being shown in the ecFlow UI output frame. Therefore, you need to log in to Power9, go to the 'temp_dir' listed in the cluster list in Compute(), and enter the job folder. Inside, you will find another folder with the same name as its parent; enter it and continue down to the innermost folder, where you will find 'Chunk.1.err'.
 For example, the path can be: "/gpfs/scratch/bsc32/bsc32734/startR_hpc/STARTR_CHUNKING_1665710775/STARTR_CHUNKING_1665710775/computation/lead_year_CHUNK_1/lon_CHUNK_1/lat_CHUNK_1/sdate_CHUNK_1/var_CHUNK_1/dataset_CHUNK_1/Chunk.1.err".
+### 15. Specify extra function arguments in the workflow
+
+The function's input arguments are not always limited to the data; sometimes extra information is required.
+The additional arguments should be specified in 'AddStep()'. The following example shows how to assign 'na.rm' in mean().
+
+```
+  func <- function(x, narm = TRUE) {  # add additional argument 'narm'
+    a <- apply(x, 2, mean, na.rm = narm)
+    dim(a) <- c(sdate = length(a))
+    return(a)
+  }
+  step <- Step(func, target_dims = c('ensemble', 'sdate'),
+               output_dims = c('sdate'))
+  wf <- AddStep(data, step, narm = TRUE)  # specify the additional argument 'narm'
+```
+
 ## Something goes wrong...
-- 
GitLab

From 1d833a0d9d7601d8ca2cd3246fd91c8aa86588f0 Mon Sep 17 00:00:00 2001
From: aho
Date: Tue, 24 Mar 2020 16:17:02 +0100
Subject: [PATCH 3/5] Context format fix

---
 inst/doc/faq.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/inst/doc/faq.md b/inst/doc/faq.md
index 435d3d8..05ff30b 100644
--- a/inst/doc/faq.md
+++ b/inst/doc/faq.md
@@ -18,9 +18,8 @@ This document intends to be the first reference for any doubts that you may have
    12. [What will happen if reorder function is not used](#12-what-will-happen-if-reorder-function-is-not-used)
    13. [Load specific grid points data](#13-load-specific-grid-points-data)
    14. [Find the error log when jobs are launched on Power9](#14-find-the-error-log-when-jobs-are-launched-on-power9)
-
    15. [Specify extra function arguments in the workflow](#15-specify-extra-function-arguments-in-the-workflow)
-
+
 2. **Something goes wrong...**
    1. [No space left on device](#1-no-space-left-on-device)
-- 
GitLab

From 2e0f9841b2d67549021b4c0bf2b9f3a2156d287c Mon Sep 17 00:00:00 2001
From: aho
Date: Thu, 26 Mar 2020 10:52:49 +0100
Subject: [PATCH 4/5] Add how-to-16 about return_vars

---
 inst/doc/faq.md | 69 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/inst/doc/faq.md b/inst/doc/faq.md
index 05ff30b..683e115 100644
--- a/inst/doc/faq.md
+++ b/inst/doc/faq.md
@@ -19,6 +19,7 @@ This document intends to be the first reference for any doubts that you may have
    13. [Load specific grid points data](#13-load-specific-grid-points-data)
    14. [Find the error log when jobs are launched on Power9](#14-find-the-error-log-when-jobs-are-launched-on-power9)
    15. [Specify extra function arguments in the workflow](#15-specify-extra-function-arguments-in-the-workflow)
+   16. [Use parameter 'return_vars' in Start()](#16-use-parameter-return_vars-in-Start)

 2. **Something goes wrong...**
@@ -472,6 +473,74 @@ The additional arguments should be specified in 'AddStep()'. The following examp
   wf <- AddStep(data, step, narm = TRUE) # specify the additional argument 'narm'
 ```
+### 16. Use parameter 'return_vars' in Start()
+
+Apart from the data array, you may also need to retrieve auxiliary variables stored in the netCDF files.
+The parameter 'return_vars' is used to request such variables.
+This parameter expects a named list. The names are the names of the variables to be fetched from the netCDF files, and each corresponding value can be:
+
+(1) NULL, if the variable is common along all the file dimensions (i.e., it will be retrieved only once from the first involved files)
+(2) a vector of the names of the file dimensions for which to retrieve the variable
+(3) a vector that includes the file dimension used for the path pattern specification (i.e., 'dat' in the example below)
+
+For the first and second options, the fetched variable values will be saved in *$Variables$common$<variable_name>*.
+For the third option, the fetched variable values will be saved in *$Variables$<dataset_name>$<variable_name>*.
+
+Note that if a variable is specified by values(), it will be automatically added to return_vars and its value will be NULL.
+
+Here is an example showing the above three ways.
+
+```
+  repos <- "/esarchive/exp/ecmwf/system5_m1/monthly_mean/tas_f6h/$var$_$sdate$.nc"
+  var <- 'tas'
+  lon.min <- 10
+  lon.max <- 20
+  lat.min <- 20
+  lat.max <- 30
+  data <- Start(dat = repos,  # file dimension for path pattern specification
+                var = var,
+                sdate = c('20170101', '20170401'),  # file dimension; 'time' is dependent on 'sdate'
+                ensemble = indices(1:5),
+                time = indices(1:3),  # inner dimension, also an auxiliary variable containing forecast time information
+                latitude = values(list(lat.min, lat.max)),  # inner dimension, common along all files
+                longitude = values(list(lon.min, lon.max)),  # inner dimension, common along all files
+                return_vars = list(time = 'sdate',  # option (2)
+                                   longitude = NULL,  # option (1)
+                                   latitude = NULL),  # option (1)
+                retrieve = FALSE
+  )
+
+```
+
+In the return_vars list, we request information for three variables. The 'time' values differ for each sdate, while longitude and latitude are common to all the files.
+You can use `str(data)` to see the information structure.
+
+```
+str(attr(data, 'Variables')$common)
+List of 3
+ $ time     : POSIXct[1:6], format: "2017-02-01 00:00:00" "2017-05-01 00:00:00" ...
+ $ longitude: num [1:37(1d)] 10 10.3 10.6 10.8 11.1 ...
+ $ latitude : num [1:36(1d)] 20.1 20.4 20.7 20.9 21.2 ...
+
+dim((attr(data, 'Variables')$common$time))
+sdate  time
+    2     3
+```
+
+It is not necessary in this example, but you can try changing the longitude entry in return_vars to `longitude = 'dat'` (option (3)).
+You will find that longitude moves from the $common list to the $dat1 list.
+
+```
+str(attr(data, 'Variables')$common)
+List of 2
+ $ time    : POSIXct[1:6], format: "2017-02-01 00:00:00" "2017-05-01 00:00:00" ...
+ $ latitude: num [1:36(1d)] 20.1 20.4 20.7 20.9 21.2 ...
+
+str(attr(data, 'Variables')$dat1)
+List of 1
+ $ longitude: num [1:37(1d)] 10 10.3 10.6 10.8 11.1 ...
+```
+
 ## Something goes wrong...
-- 
GitLab

From 1b8b9a49dfaa25dc08ada7cce4aba50be3476b47 Mon Sep 17 00:00:00 2001
From: aho
Date: Thu, 26 Mar 2020 11:16:34 +0100
Subject: [PATCH 5/5] fix hyperlink for how-to-16

---
 inst/doc/faq.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/inst/doc/faq.md b/inst/doc/faq.md
index 683e115..cb15d9d 100644
--- a/inst/doc/faq.md
+++ b/inst/doc/faq.md
@@ -19,7 +19,7 @@ This document intends to be the first reference for any doubts that you may have
    13. [Load specific grid points data](#13-load-specific-grid-points-data)
    14. [Find the error log when jobs are launched on Power9](#14-find-the-error-log-when-jobs-are-launched-on-power9)
    15. [Specify extra function arguments in the workflow](#15-specify-extra-function-arguments-in-the-workflow)
-   16. [Use parameter 'return_vars' in Start()](#16-use-parameter-return_vars-in-Start)
+   16. [Use parameter 'return_vars' in Start()](#16-use-parameter-return_vars-in-start)

 2. **Something goes wrong...**
-- 
GitLab