diff --git a/inst/doc/faq.md b/inst/doc/faq.md
index e7e6bac60e1ca16959d1776d962da3a0fcdd2efa..cb15d9df352598de332af9c35e930290c2172af5 100644
--- a/inst/doc/faq.md
+++ b/inst/doc/faq.md
@@ -17,8 +17,10 @@ This document intends to be the first reference for any doubts that you may have
   11. [Select the longitude/latitude region](#11-select-the-longitudelatitude-region)
   12. [What will happen if reorder function is not used](#12-what-will-happen-if-reorder-function-is-not-used)
   13. [Load specific grid points data](#13-load-specific-grid-points-data)
+  14. [Find the error log when jobs are launched on Power9](#14-find-the-error-log-when-jobs-are-launched-on-power9)
+  15. [Specify extra function arguments in the workflow](#15-specify-extra-function-arguments-in-the-workflow)
+  16. [Use parameter 'return_vars' in Start()](#16-use-parameter-return_vars-in-start)
-
 
 2. **Something goes wrong...**
 
    1. [No space left on device](#1-no-space-left-on-device)
@@ -449,6 +451,95 @@ If the values does not match the defined spatial point in the files, **Start** w
 An example of how to load several gridpoints and how to transform the data can be found in the Use Cases section, [example 1.6](/inst/doc/usecase/ex1_6_gridpoint_data.R).
 
+### 14. Find the error log when jobs are launched on Power9
+
+Due to a connection problem, when Compute() dispatches jobs to Power9, each job in the ecFlow UI is flagged with a 'Z' (zombie) beside it, no matter whether the job completed or failed.
+The zombie prevents the error log from being shown in the ecFlow UI output frame. Therefore, you need to log in to Power9, go to the 'temp_dir' indicated in the cluster list of Compute(), and enter the job folder. There you will find another folder with the same name as the previous layer; keep descending to the innermost folder, where you will find 'Chunk.1.err'.
+For example, the path can be: "/gpfs/scratch/bsc32/bsc32734/startR_hpc/STARTR_CHUNKING_1665710775/STARTR_CHUNKING_1665710775/computation/lead_year_CHUNK_1/lon_CHUNK_1/lat_CHUNK_1/sdate_CHUNK_1/var_CHUNK_1/dataset_CHUNK_1/Chunk.1.err".
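+To know where to look, keep in mind that 'temp_dir' is whatever you set in the cluster list when dispatching the computation. Below is a minimal sketch of such a Compute() call, following the usual options from the startR practical guide. The host name, user paths and resource numbers are hypothetical placeholders to be adapted to your own account.
+
+```
+  # Sketch of dispatching a workflow 'wf' to Power9 (placeholders only):
+  # 'queue_host', 'temp_dir' and 'ecflow_suite_dir' must be adapted to your
+  # own account and paths.
+  res <- Compute(wf,
+                 chunks = list(latitude = 2, longitude = 2),
+                 threads_load = 2,
+                 threads_compute = 4,
+                 cluster = list(queue_host = 'p9login1.bsc.es',  # hypothetical login node
+                                queue_type = 'slurm',
+                                temp_dir = '/gpfs/scratch/bsc32/bsc32734/startR_hpc/',
+                                cores_per_job = 4,
+                                job_wallclock = '00:10:00',
+                                max_jobs = 4,
+                                bidirectional = FALSE,
+                                polling_period = 10),
+                 ecflow_suite_dir = '/home/Earth/jdoe/startR_local/',  # hypothetical
+                 wait = TRUE)
+```
+
+With this call, the 'Chunk.1.err' files end up under the STARTR_CHUNKING_* folders inside that 'temp_dir' on Power9.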
+### 15. Specify extra function arguments in the workflow
+
+The input arguments of the function are not always only the data; sometimes extra information is required.
+Such additional arguments should be specified in AddStep(). The following example shows how to pass 'na.rm' to mean().
+
+```
+  func <- function(x, narm = FALSE) {  # additional argument 'narm' (default FALSE, matching mean())
+    a <- apply(x, 2, mean, na.rm = narm)
+    dim(a) <- c(sdate = length(a))
+    return(a)
+  }
+  step <- Step(func, target_dims = c('ensemble', 'sdate'),
+               output_dims = c('sdate'))
+  wf <- AddStep(data, step, narm = TRUE)  # specify the additional argument 'narm'
+```
+
+### 16. Use parameter 'return_vars' in Start()
+
+Apart from the data array, you may also need to retrieve auxiliary variables stored inside the netCDF files.
+The parameter 'return_vars' is used to request such variables.
+This parameter expects a named list. The names are the names of the variables to be fetched from the netCDF files, and each corresponding value can be:
+
+(1) NULL, if the variable is common along all the file dimensions (i.e., it will be retrieved only once from the first involved file)
+(2) a vector of the names of the file dimensions along which the variable values differ (the variable will be retrieved once for each combination of these file dimensions)
+(3) a vector including the name of the file dimension used for path pattern specification (i.e., 'dat' in the example below)
+
+For the first and second options, the fetched variable values are saved in *$Variables$common$<variable_name>*.
+For the third option, they are saved in *$Variables$<dataset_name>$<variable_name>* (e.g., *$Variables$dat1$longitude* below).
+
+Notice that if a dimension is specified by values(), the corresponding variable is automatically added to return_vars and its value is NULL.
+
+Here is an example showing the above three ways.
+
+```
+  repos <- "/esarchive/exp/ecmwf/system5_m1/monthly_mean/tas_f6h/$var$_$sdate$.nc"
+  var <- 'tas'
+  lon.min <- 10
+  lon.max <- 20
+  lat.min <- 20
+  lat.max <- 30
+  data <- Start(dat = repos,  # file dimension for path pattern specification
+                var = var,
+                sdate = c('20170101', '20170401'),  # file dimension; 'time' depends on 'sdate'
+                ensemble = indices(1:5),
+                time = indices(1:3),  # inner dimension; also an auxiliary variable containing forecast time information
+                latitude = values(list(lat.min, lat.max)),  # inner dimension, common along all files
+                longitude = values(list(lon.min, lon.max)),  # inner dimension, common along all files
+                return_vars = list(time = 'sdate',  # option (2)
+                                   longitude = NULL,  # option (1)
+                                   latitude = NULL),  # option (1)
+                retrieve = FALSE)
+```
+
+In the return_vars list, we request three variables: the 'time' values differ for each sdate, while longitude and latitude are common variables among all the files.
+You can use `str(data)` to inspect the structure.
+
+```
+str(attr(data, 'Variables')$common)
+List of 3
+ $ time     : POSIXct[1:6], format: "2017-02-01 00:00:00" "2017-05-01 00:00:00" ...
+ $ longitude: num [1:37(1d)] 10 10.3 10.6 10.8 11.1 ...
+ $ latitude : num [1:36(1d)] 20.1 20.4 20.7 20.9 21.2 ...
+
+dim(attr(data, 'Variables')$common$time)
+sdate  time 
+    2     3 
+```
+
+It is not necessary in this example, but you can try replacing the return_vars entry for longitude with `longitude = 'dat'` (option (3)).
+You will find that longitude moves from the $common list to the $dat1 list.
+
+```
+str(attr(data, 'Variables')$common)
+List of 2
+ $ time    : POSIXct[1:6], format: "2017-02-01 00:00:00" "2017-05-01 00:00:00" ...
+ $ latitude: num [1:36(1d)] 20.1 20.4 20.7 20.9 21.2 ...
+
+str(attr(data, 'Variables')$dat1)
+List of 1
+ $ longitude: num [1:37(1d)] 10 10.3 10.6 10.8 11.1 ...
+```
 
 ## Something goes wrong...