practical_guide_bsc.md

# Practical guide for processing large data sets at BSC's HPCs using startR

In this guide some practical examples are shown for you to see how to use startR to process large data sets in parallel on the BSC's HPCs (CTE-Power 9, Marenostrum 4, ...). See the main page of the [**startR**](README.md) project for a general overview of the features of startR, without actual guidance on how to use it.

With the constant increase of resolution (in all possible dimensions) of the weather and climate model outputs, and with the need for using computationally demanding analytical methodologies (e.g. bootstraping millions of times), it is becoming difficult or impossible to perform the analysis of such outputs with conventional tools. While tools exist to process large geospatial data sets on HPCs, they usually require adapting your data to specific formats, migrating to specific database systems, or require advanced knowledge of computer sciences or of a specific programming language or framework.

startR allows the R user to apply user-defined functions or procedures to large (as large as desired) collections of NetCDF files (no specific convention is required), transparently using computational resources in HPCs (multi-core, multi-node clusters) to minimize the time to solution. Although startR can be difficult to use if learnt from the documentation of its functions, it can also be used effortlessly if re-using and tweaking already existing startR scripts, like the ones provided later in this guide. startR scripts are written in R, and are short (usually under 30 lines of code), concise, and easy to read.

Other things you can expect to do with startR:
- Combining data from multiple model executions or observational data sources.
- Extracting values for a certain time period, geographical location, etc., from a collection of NetCDF files.
- Obtaining data arrays with results of analytical procedures that are to be plotted or stored as RData or NetCDF for later use in the analysis workflow.
- Applying a set of analytical procedures to the same data.

Things that are not supposed to be done with startR:
- Curating/homogenizing model output files or generating files to be stored under /esarchive following the department/community conventions.

If startR is suitable for your use case, you will then need to follow the configuration steps listed in the first section of this guide to make sure startR works on your workstation with the HPC of your choice.

Afterwards, you will need to understand and use six functions, all of them included in the startR package:
 - **Start()**, for declaing the data sets to process
 - **Step()** and **AddStep()**, for specifying the operation to be applied to the data
 - **Compute()**, for specifying the HPC to be employed, the number of chunks and cores, and to trigger the computation
 - **Collect()** and the **EC-Flow graphical user interface**, for monitoring of the progress and collection of results


## Configuring startR

At BSC, the only configuration step you need to follow is to set up passwordless connection with the HPC. You do not need to follow the complete deployment steps since all dependencies are already installed for you to use, but you can find them under the [**Deployment**](inst/doc/deployment.md) section.

Specifically, you need to set up passwordless, userless access from your machine to the HPC login node, and from the HPC login node to your machine if at all possible. In order to establish the connection in one of the directions, you need do the following:

1- generate an ssh pair of keys on the origin host if you do not have one, using `ssh-keygen -t rsa`

2- ssh to the destionation node and create a directory where to store it, using `ssh username@hostname_or_ip mkdir -p .ssh`. 'hostname_or_ip' refers to the host name or IP address of the login node of the selected HPC, and 'username' to your account name on the HPC, which may not coincide with the one in your workstation.

3- dump your public key on a new file under that folder, using `cat .ssh/id_rsa.pub | ssh username@hostname_or_ip 'cat >> .ssh/authorized_keys'`

4- adjust the permissions of the key repository, using `ssh username@hostname_or_ip "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"`

5- if your username is different on your workstation and on the login node of the HPC, add an entry in the file .ssh/config in your workstation as follows:
```
  Host short_name_of_the_host
    HostName hostname_or_ip
    User username
    IdentityFile ~/.ssh/id_rsa
```

After following these steps for the connections in both directions (although from the HPC to the workstation might not be possible), you are good to go.

Do not forget adding the following lines in your .bashrc on CTE-Power if you are planning to run on CTE-Power:
```
if [[ $BSC_MACHINE == "power" ]] ; then
  module unuse /apps/modules/modulefiles/applications
  module use /gpfs/projects/bsc32/software/rhel/7.4/ppc64le/POWER9/modules/all/
fi
```

You can add the following lines in your .bashrc file on your workstation for convenience:
```
alias ctp='ssh -X username@hostname_or_ip'
alias start='module load R CDO ecFlow'
```

## Using startR

If you have successfully gone through the configuration steps, you will just need to run the following commands in a terminal session and a fresh R session will pop up with the startR environment ready to use.

```
start
R
```

The library can be loaded as follows:

```R
library(startR)
```

### Start()

In order for startR to recognize the data sets you want to process, you first need to declare them. The first step in the declaration of a data set is to build a special path string that encodes where all the involved NetCDF files to be processed are stored. It contains some wildcards in those parts of the path that vary across files. This special path string is also called "path pattern".

Before defining an example path pattern, let's introduce some target NetCDF files. In the esarchive, we can find the following files:

```
/esarchive/exp/ecmwf/system5_m1/6hourly/
  |--tas/
  |   |--tas_19930101.nc
  |   |--tas_19930201.nc
  |   |        ...
  |   |--tas_20171201.nc
  |--tos/
      |--tos_19930101.nc
      |--tos_19930201.nc
      |        ...
      |--tos_20171201.nc
```

A path pattern that can be used to encode the location of these files in a compact way is the following:

```r
repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc'
```

The wildcards used (the pieces wrapped between '$' symbols) can be given any names you like. They do not necessarily need to be 'var' or 'sdate' or match any specific key word (although in this case, as explained later, the 'var' name will trigger a special feature of `Start()`).

Once the path pattern is specified, a `Start()` call can be built, in which you need to provide, as parameters, the specific values of interest for each of the wildcards (also called outer dimensions, or file dimensions), as well as for each of the dimensions inside the NetCDF files (inner dimensions).

You can check in advance which dimensions are inside the NetCDF files by using e.g. easyNCDF on one of the files:

```r
easyNCDF::NcReadDims('/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc',
                     var_names = 'tas')
```

This will show the names and lengths of the dimensions of the selected variable:

```r
      var longitude  latitude  ensemble      time 
        1      1296       640        25       860
```

*Note: If you check the dimensions of that file with `ncdump -h`, you will realize the 'var' dimension is actually not defined inside. `NcReadDims()` and `Start()`, though, perceive the different variables inside a file as if stored along a virtual dimension called 'var'. You can ignore this for now and assume 'var' is simply a file dimension (since it appears as a wildcard in the path pattern). Read more on this in the note at the end of this section.*

Once we know the dimension names, we have all the information we need to put the `Start()` call together:

```r
data <- Start(dat = repos,
              # outer dimensions
              var = 'tas',
              sdate = '19930101',
              # inner dimensions
              ensemble = 'all',
              time = 'all',
              latitude = 'all',
              longitude = 'all')
```

For each of the dimensions, the values of interest can be specified in three possible ways:
- Using one or more numeric indices, for example `sdate = indices(1)`, or `time = indices(1, 3, 5)`.
- Using one or more actual values, for example `sdate = values('19930101')`, or `ensemble = values(c('r1i1p1', 'r2i1p1'))`, or `latitude = values(10, 10.5, 11)`. The `values()` helper function can be omitted (as shown in the example).
- Using a list of two numeric values, for example `sdate = indices(list(5, 10))`. This will take all indices from the 5th to the 10th.
- Using a list of two actual values, for example `sdate = values(list('r1i1p1', 'r5i1p1'))` or `latitude = values(list(-45, 75))`. This will take all values, in order, placed between the two values specified (both ends included).
- Using the special keywords 'all', 'first' or 'last'.

Also, the dimensions specified in the `Start()` call do not need to follow any specific order, not even the actual order in the path pattern or inside the file. The order, though, can have an impact on the performance of `Start()` as explained later in this section.

Running the `Start()` call shown above will display some progress and information messages:

```r
* Exploring files... This will take a variable amount of time depending
*   on the issued request and the performance of the file server...
* Detected dimension sizes:
*         dat:    1
*         var:    1
*       sdate:    1
*    ensemble:   25
*        time:  860
*    latitude:  640
*   longitude: 1296
* Total size of involved data:
*   1 x 1 x 1 x 25 x 860 x 640 x 1296 x 8 bytes = 132.9 Gb
* Successfully discovered data dimensions.
Warning messages:
1: ! Warning: Parameter 'pattern_dims' not specified. Taking the first dimension,
!   'dat' as 'pattern_dims'. 
2: ! Warning: Could not find any pattern dim with explicit data set descriptions (in
!   the form of list of lists). Taking the first pattern dim, 'dat', as
!   dimension with pattern specifications.
```

The warnings shown are normal, and could be avoided with a more wordy specification of the parameters to the `Start()` function.

The dimensions of the selected data set and the total size are shown. As you have probably noticed, this `Start()` call is very fast, even though several GB of data are involved. This is because `Start()` is simply discovering the location and dimension of the involved data.

You can give a quick look to the collected metadata with `str(data)`.

```r
Class 'startR_header' length 9 Start(dat = "/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc",      var = "tas", sdate = "19930101", ensemble = "all", time = "all", latitude = "all",  ...
  ..- attr(*, "Dimensions")= Named num [1:7] 1 1 1 25 860 ...
  .. ..- attr(*, "names")= chr [1:7] "dat" "var" "sdate" "ensemble" ...
  ..- attr(*, "Variables")=List of 2
  .. ..$ common: NULL
  .. ..$ dat1  : NULL
  ..- attr(*, "ExpectedFiles")= chr [1, 1, 1] "/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc"
  ..- attr(*, "FileSelectors")=List of 1
  .. ..$ dat1:List of 3
  .. .. ..$ dat  :List of 1
  .. .. .. ..$ : chr "dat1"
  .. .. ..$ var  :List of 1
  .. .. .. ..$ : chr "tas"
  .. .. ..$ sdate:List of 1
  .. .. .. ..$ : chr "19930101"
  ..- attr(*, "PatternDim")= chr "dat"
```

The retrieved information can be accessed with the `attr()` function. For example:

```r
attr(data, 'FileSelectors')$dat1
```

If you are interested in actually loading the entire data set in your machine you can do so in two ways (*be careful, loading the data involved in the `Start()` call in this example will most likely stall your machine. Try it with a smaller region or a subset of forecast time steps*):
- adding the parameter `retrieve = TRUE` in your `Start()` call.
- evaluating the object returned by `Start()`: `data_load <- eval(data)`

You may realize that this functionality is similar to the `Load()` function in the s2dverification package. In fact, `Start()` is more advanced and flexible, although `Load()` is more mature and consistent for loading typical seasonal to decadal forecasting data. `Load()` will be adapted in the future to use `Start()` internally.

There are no constrains for the number or names of the outer or inner dimensions used in a `Start()` call. In other words, `Start()` will handle NetCDF files with any number of dimensions with any name, as well as files distributed across folders in complex ways, since you can use customized wildcards in the path pattern.

Explanation on the order of dimensions.

Synonims.

Dimensions across.

See the documentation on the `Start()` function (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information.

*Note on the 'var' dimension*: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array.*

```r
repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var_out$/$var_out$_$sdate$.nc'

data <- Start(dat = repos,
              # outer dimensions
              var_out = 'tas',
              sdate = '19930101',
              # inner dimensions
              var = 'tas',
              ensemble = 'all',
              time = 'all',
              latitude = 'all',
              longitude = 'all')
```

### Step() and AddStep()

Once the data sources are declared, you can define the operation to be applied. The operation needs to be encapsulated in the form of an R function receiving one or more multidimensional arrays (plus additional helper parameters) and returning one or more multidimensional arrays. For example:

```r
fun <- function(x) {
  r <- sqrt(sum(x ^ 2) / length(x))
  for (i in 1:100) {
    r <- sqrt(sum(x ^ 2) / length(x))
  }
  dim(r) <- c(time = 1)
  r
}
```

This function receives only one multidimensional array (with dimensions c('ensemble' and 'time'), although not expressed in the code), and returns one multidimensional array (with a single dimension c('time') of length 1).

Having the function, the startR Step for this operation can be defined with the function `Step()` which requires, for a proper functioning, to specify the names of the dimensions of the input arrays expected by the function (in this example, a single array with the dimensions 'ensemble' and 'time'), as well as the names of the dimensions of the arrays the function returns:

```r
step <- Step(fun = fun, 
             target_dims = c('ensemble', 'time'), 
             output_dims = c('time'))
```

Finally, a workflow of steps can be assembled as follows:

```r
wf <- AddStep(data, step)
```

Functions that receive or return multiple multidimensional arrays are also supported by specifying lists of vectors of dimension names as `target_dims` or `output_dims`.

It is not possible for now to define workflows with more than one step. This is pending future work.

Since functions wrapped with the `Step()` function will potentially be called thousands of times, it is recommended to keep them as light as possible by, for example, avoiding calls to the `library()` function to load other packages or interacting with files on disk. See the documentation on the parameter `use_libraries` of the `Step()` function, or consider adding additional parameters to the step function with extra information. 

### Compute()

Once the data sources are declared and the workflow is defined, you can proceed to specify the execution parameters (including which platform to run on) and trigger the execution.

Next, a few examples show StartR codes to process datasets locally and on two example HPCs at BSC: the fat nodes and CTE-Power.

#### Compute() locally

When only your own workstation is available, StartR can still be useful to process a very large dataset by chunks, avoiding a RAM memory overload and crash of the workstation. StartR will simply load the dataset by chunks and each of them will be processed sequentially. The operations defined in the workflow will be applied to each chunk, and the results will be stored on a temporary file. `Compute()` will finally gather and merge the results of each chunk and return a single data object, including one or multiple multidimensional data arrays, and additional metadata.

```r
res <- Compute(wf,
               chunks = list(latitude = 2,
                             longitude = 2),
               threads_load = 1,
               threads_compute = 2,
               silent = FALSE,
               debug = FALSE,
               wait = FALSE)
```

compute will return a data array, as if it was a variable in your R session

discuss ecFlow

discuss plotProfiling

#### Compute() on the fat nodes

```r
res <- Compute(wf,
               chunks = list(latitude = 2,
                             longitude = 2),
               threads_load = 1,
               threads_compute = 2,
               cluster = list(queue_host = 'bsceslogin01.bsc.es',
                              queue_type = 'slurm',
                              temp_dir = '/home/Earth/nmanuben/startR_tests/',
                              cores_per_job = 2,
                              job_wallclock = '00:10:00',
                              max_jobs = 4,
                              bidirectional = TRUE
                             ),
               ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/')
```

'queue_host' must match the 'short_name_of_the_host' you associated to the login node of the selected HPC in your .ssh/config file. 

#### Compute() on CTE-Power

```r
library(startR)

#repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc'
repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$-longitudeS1latitudeS1all/$var$_$sdate$.nc'
data <- Start(dat = repos,
              var = 'tas',
              #sdate = 'all',
              sdate = indices(1),
              ensemble = 'all',
              time = 'all',
              #latitude = 'all',
              latitude = indices(1:40),
              #longitude = 'all',
              longitude = indices(1:40),
              retrieve = FALSE)
lons <- attr(data, 'Variables')$common$longitude
lats <- attr(data, 'Variables')$common$latitude

fun <- function(x) apply(x + 1, 2, mean)
step <- Step(fun, c('ensemble', 'time'), c('time'))
wf <- AddStep(data, step)

res <- Compute(wf,
               chunks = list(latitude = 2,
                             longitude = 2),
               threads_load = 1,
               threads_compute = 2,
               cluster = list(queue_host = 'p9login1.bsc.es',
                              queue_type = 'slurm',
                              temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_tests/',
                              lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/',
                              r_module = 'R/3.5.0',
                              cores_per_job = 2,
                              job_wallclock = '00:10:00',
                              max_jobs = 4,
                              #extra_queue_params = list('#SBATCH --qos=bsc_es'),
                              bidirectional = FALSE,
                              polling_period = 10
                             ),
               ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/',
               ecflow_server = NULL,
               silent = FALSE,
               debug = FALSE,
               wait = FALSE)

result <- Collect(res, wait = TRUE)
```

### Additional information

#### Tricks and best practices

How to select number of chunks

What to do if my function requires all dimensions

#### Pending features

Computation of weekly means with startR is still pending future work. By now, it is not possible to do that because the metadata associated to each chunk, such as the dates, is not being sent to the `Compute()` function.

#### Example using experimental and (date-corresponding) observational data

```r
repos <- paste0('/esnas/exp/ecmwf/system4_m1/6hourly/',
                '$var$/$var$_$sdate$.nc')

system4 <- Start(dat = repos,
                 var = 'sfcWind',
                 #sdate = paste0(1981:2015, '1101'),
                 sdate = paste0(1981:1984, '1101'),
                 #time = indices((30*4+1):(120*4)),
                 time = indices((30*4+1):(30*4+4)),
                 ensemble = 'all',
                 #ensemble = indices(1:6),
                 #latitude = 'all',
                 latitude = indices(1:10),
                 #longitude = 'all',
                 longitude = indices(1:10),
                 return_vars = list(latitude = NULL,
                                    longitude = NULL,
                                    time = c('sdate')))

repos <- paste0('/esnas/recon/ecmwf/erainterim/6hourly/',
                '$var$/$var$_$file_date$.nc')

dates <- attr(system4, 'Variables')$common$time
dates_file <- sort(unique(gsub('-', '', sapply(as.character(dates),
substr, 1, 7))))

erai <-    Start(dat = repos,
                 var = 'sfcWind',
                 file_date = dates_file,
                 time = values(dates),
                 #latitude = 'all',
                 latitude = indices(1:10),
                 #longitude = 'all',
                 longitude = indices(1:10),
                 time_var = 'time',
                 time_tolerance = as.difftime(1, units = 'hours'),
                 time_across = 'file_date',
                 return_vars = list(latitude = NULL,
                                    longitude = NULL,
                                    time = 'file_date'),
                 merge_across_dims = TRUE,
                 split_multiselected_dims = TRUE)

step <- Step(eqmcv_atomic,
             list(a = c('ensemble', 'sdate'),
                  b = c('sdate')),
             list(c = c('ensemble', 'sdate')))

res <- Compute(step, list(system4, erai),
               chunks = list(latitude = 5,
                             longitude = 5,
                             time = 2),
               cluster = list(queue_host = 'bsceslogin01.bsc.es',
                              max_jobs = 4,
                              cores_per_job = 2),
               shared_dir = '/esnas/scratch/nmanuben/test_bychunk',
               wait = FALSE) 
```

### Example on MareNostrum 4

```r
library(startR)

repos <- paste0('/esarchive/exp/ecmwf/system5_m1/6hourly/',
                '$var$-longitudeS1latitudeS1all/$var$_$sdate$.nc')
# Slower alternative, using files with a less efficient NetCDF 
# compression configuration
#repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc'

data <- Start(dat = repos,
              var = 'tas',
              sdate = indices(1),
              ensemble = 'all',
              time = 'all',
              latitude = indices(1:40),
              longitude = indices(1:40),
              retrieve = FALSE)
lons <- attr(data, 'Variables')$common$longitude
lats <- attr(data, 'Variables')$common$latitude

fun <- function(x) apply(x + 1, 2, mean)
step <- Step(fun, c('ensemble', 'time'), c('time'))
wf <- AddStep(data, step)

res <- Compute(wf,
               chunks = list(latitude = 2,
                             longitude = 2),
               threads_load = 1,
               threads_compute = 2,
               cluster = list(queue_host = 'mn2.bsc.es',
                              queue_type = 'slurm',
                              data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/',
                              temp_dir = '/gpfs/scratch/pr1efe00/pr1efe03/startR_tests/',
                              lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.4/',
                              r_module = 'R/3.4.0',
                              cores_per_job = 2,
                              job_wallclock = '00:10:00',
                              max_jobs = 4,
                              extra_queue_params = list('#SBATCH --qos=prace'),
                              bidirectional = FALSE,
                              polling_period = 10,
                              special_setup = 'marenostrum4'
                             ),
               ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/',
               ecflow_server = NULL,
               silent = FALSE,
               debug = FALSE,
               wait = TRUE)
```

## Seasonal forecast verification example on cca

```r
crps <- function(x, y) {
  mean(SpecsVerification::EnsCrps(x, y, R.new = Inf))
}

library(startR)

repos <- '/perm/ms/spesiccf/c3ah/qa4seas/data/seasonal/g1x1/ecmf-system4/msmm/atmos/seas/tprate/12/ecmf-system4_msmm_atmos_seas_sfc_$date$_tprate_g1x1_init12.nc'

data <- Start(dat = repos,
              var = 'tprate',
              date = 'all',
              time = 'all',
              number = 'all',
              latitude = 'all',
              longitude = 'all',
              return_vars = list(time = 'date'))

dates <- attr(data, 'Variables')$common$time

repos <- '/perm/ms/spesiccf/c3ah/qa4seas/data/ecmf-ei_msmm_atmos_seas_sfc_19910101-20161201_t2m_g1x1_init02.nc'

obs <- Start(dat = repos,
             var = 't2m',
             time = values(dates),
             latitude = 'all',
             longitude = 'all',
             split_multiselected_dims = TRUE)

s <- Step(crps, target_dims = list(c('date', 'number'), c('date')),
                output_dims = NULL)
wf <- AddStep(list(data, obs), s)

r <- Compute(wf,
             chunks = list(latitude = 10,
                           longitude = 3),
             cluster = list(queue_host = 'cca',
                            queue_type = 'pbs',
                            max_jobs = 10,
                            init_commands = list('module load ecflow'),
                            r_module = 'R/3.3.1',
                            extra_queue_params = list('#PBS -l EC_billing_account=spesiccf')),
             ecflow_output_dir = '/perm/ms/spesiccf/c3ah/startR_test/',
             is_ecflow_output_dir_shared = FALSE
            )
```

### Compute() cluster template for Nord III

```r
cluster = list(queue_host = 'nord1.bsc.es',
               queue_type = 'lsf',
               data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/',
               temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_tests/',
               lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.3/',
               init_commands = list('module load intel/16.0.1'),
               r_module = 'R/3.3.0',
               cores_per_job = 2,
               job_wallclock = '00:10',
               max_jobs = 4,
               extra_queue_params = list('#BSUB -q bsc_es'),
               bidirectional = FALSE,
               polling_period = 10,
               special_setup = 'marenostrum4'
              )
```

### Compute() cluster template for MinoTauro

```r
cluster = list(queue_host = 'mt1.bsc.es',
               queue_type = 'slurm',
               data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/',
               temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_tests/',
               lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.3/',
               r_module = 'R/3.3.3',
               cores_per_job = 2,
               job_wallclock = '00:10:00',
               max_jobs = 4,
               extra_queue_params = list('#SBATCH --qos=bsc_es'),
               bidirectional = FALSE,
               polling_period = 10,
               special_setup = 'marenostrum4'
              )
```

### Example on CTE-Power using GPUs