Commit 6a125f06 authored by erifarov

Added Nord3-v2 settings to the practical guide

parent 49a6ed51
Pipeline #7313 passed with stage
in 60 minutes and 25 seconds
# Practical guide for processing large data sets with startR
This guide includes explanations and practical examples for you to learn how to use startR to efficiently process large data sets in parallel on the BSC's HPCs (CTE-Power 9, Marenostrum 4, ...). See the main page of the [**startR**](README.md) project for a general overview of the features of startR, without actual guidance on how to use it.
This guide includes explanations and practical examples for you to learn how to use startR to efficiently process large data sets in parallel on the BSC's HPCs (Nord3v2, CTE-Power 9, Marenostrum 4, ...). See the main page of the [**startR**](README.md) project for a general overview of the features of startR, without actual guidance on how to use it.
If you would like to start using startR right away on the BSC infrastructure, you can go directly through the "Configuring startR" section, copy/paste the basic startR script example shown at the end of the "Introduction" section into the text editor of your preference, adjust the paths and user names specified in the `Compute()` call, and run the code in an R session after loading the R and ecFlow modules.
## Index
1. [**Motivation**](inst/doc/practical_guide.md#motivation)
2. [**Introduction**](inst/doc/practical_guide.md#introduction)
......@@ -53,7 +53,7 @@ Afterwards, you will need to understand and use five functions, all of them incl
- **Compute()**, for specifying the HPC to be employed, the execution parameters (e.g. number of chunks and cores), and to trigger the computation
- **Collect()** and the **EC-Flow graphical user interface**, for monitoring of the progress and collection of results
Next, you can see an example startR script performing the ensemble mean of a small data set on CTE-Power9, for you to get a broad picture of how the startR functions interact and the information that is represented in a startR script. Note that the `queue_host`, `temp_dir` and `ecflow_suite_dir` parameters in the `Compute()` call are user-specific.
Next, you can see an example startR script performing the ensemble mean of a small data set on an HPC cluster such as Nord3v2 or CTE-Power9, for you to get a broad picture of how the startR functions interact and the information that is represented in a startR script. Note that the `queue_host`, `temp_dir` and `ecflow_suite_dir` parameters in the `Compute()` call are user-specific.
```r
library(startR)
......@@ -121,13 +121,23 @@ Specifically, you need to set up passwordless, userless access from your machine
After following these steps for the connections in both directions (although from the HPC to the workstation might not be possible), you are good to go.
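A quick way to confirm that the connection really is passwordless and userless is to ask `ssh` to fail instead of prompting. The following is a minimal sketch, not part of startR: `check_passwordless_ssh` is a hypothetical helper name, and the host you pass to it is whatever alias or host name you configured in the steps above.

```shell
# Sketch: returns 0 only if ssh can log in without prompting for anything.
# check_passwordless_ssh is a hypothetical helper, not provided by startR.
check_passwordless_ssh() {
  # BatchMode=yes makes ssh fail rather than ask for a password or passphrase.
  # No user name is given, so the login relies on your ssh configuration.
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}
```

For example, `check_passwordless_ssh <your_hpc_login_node> && echo "ok"` should print `ok` once the setup is complete; replace the placeholder with the host you configured.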
Do not forget adding the following lines in your .bashrc on CTE-Power if you are planning to run on CTE-Power:
Do not forget to add the following lines to your .bashrc on the HPC machine.
If you are planning to run on CTE-Power:
```
if [[ $BSC_MACHINE == "power" ]] ; then
module unuse /apps/modules/modulefiles/applications
module use /gpfs/projects/bsc32/software/rhel/7.4/ppc64le/POWER9/modules/all/
fi
```
If you are on Nord3-v2, then you'll have to add:
```
if [[ $BSC_MACHINE == "nord3v2" ]] ; then
module purge
module use /gpfs/projects/bsc32/software/suselinux/11/modules/all
module unuse /apps/modules/modulefiles/applications /apps/modules/modulefiles/compilers /apps/modules/modulefiles/tools /apps/modules/modulefiles/libraries /apps/modules/modulefiles/environment
fi
```
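If the same .bashrc is shared across several machines, the choice of module tree can be kept in one place. The sketch below only illustrates that dispatch: `startr_module_tree` is a hypothetical helper name, the two paths are the ones quoted in the blocks above, and the machine-specific `module unuse` and `module purge` lines from those blocks would still be needed alongside it.

```shell
# Sketch: map a BSC_MACHINE value to the module tree quoted in this guide.
# startr_module_tree is a hypothetical helper, not part of startR or the BSC setup.
startr_module_tree() {
  case "$1" in
    power)   echo "/gpfs/projects/bsc32/software/rhel/7.4/ppc64le/POWER9/modules/all/" ;;
    nord3v2) echo "/gpfs/projects/bsc32/software/suselinux/11/modules/all" ;;
    *)       return 1 ;;  # unknown or unset machine: leave the module path alone
  esac
}

# Only call "module use" on machines this sketch knows about.
if tree="$(startr_module_tree "${BSC_MACHINE:-}")"; then
  module use "$tree"
fi
```

On any other machine, or when `BSC_MACHINE` is not set, the helper returns a non-zero status and the `module use` call is skipped.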
You can add the following lines in your .bashrc file on your workstation for convenience:
```
......@@ -356,7 +366,7 @@ It is not possible for now to define workflows with more than one step, but this
Once the data sources are declared and the workflow is defined, you can proceed to specify the execution parameters (including which platform to run on) and trigger the execution with the `Compute()` function.
Next, a few examples are shown with `Compute()` calls to trigger the processing of a dataset locally (only on the machine where the R session is running) and on two different HPCs (the Earth Sciences fat nodes and CTE-Power9). However, let's first define a `Start()` call that involves a smaller subset of data in order not to make the examples too heavy.
Next, a few examples are shown with `Compute()` calls to trigger the processing of a dataset locally (only on the machine where the R session is running) and on different HPCs (the Earth Sciences fat nodes, CTE-Power9 and others). However, let's first define a `Start()` call that involves a smaller subset of data in order not to make the examples too heavy.
```r
library(startR)
......@@ -669,7 +679,7 @@ As mentioned above in the definition of the `cluster` parameters, it is strongly
The `Compute()` call with the parameters to run the example in this section on the BSC ES fat nodes is provided below (you will need to adjust some of the parameters before using it). As you can see, the only thing that needs to be changed to execute startR on a different HPC is the definition of the `cluster` parameters.
The `cluster` configuration for the fat nodes, CTE-Power 9, Marenostrum 4, Nord III, Minotauro and ECMWF cca/ccb are all provided at the very end of this guide.
The `cluster` configurations for the fat nodes, CTE-Power 9, Marenostrum 4, Nord3, Minotauro and ECMWF cca/ccb are all provided at the very end of this guide.
```r
res <- Compute(wf,
......@@ -1062,7 +1072,7 @@ cluster = list(queue_host = 'mn2.bsc.es',
)
```
### Nord III
### Nord3
```r
cluster = list(queue_host = 'nord1.bsc.es',
......