From bc1a1041a3251e35916f7285a62d4d7ba055c9c8 Mon Sep 17 00:00:00 2001
From: Nicolau Manubens
Date: Tue, 20 Nov 2018 21:03:34 +0100
Subject: [PATCH 01/20] Added first draft of practical guide.

---
 inst/doc/practical_guide.md | 436 ++++++++++++++++++++++++++++++++++++
 1 file changed, 436 insertions(+)
 create mode 100644 inst/doc/practical_guide.md

diff --git a/inst/doc/practical_guide.md b/inst/doc/practical_guide.md
new file mode 100644
index 0000000..f8f86a6
--- /dev/null
+++ b/inst/doc/practical_guide.md
@@ -0,0 +1,436 @@
# Practical guide for using startR at BSC

In this guide, some practical examples show how to use startR to process large data sets in parallel on your Earth Sciences department workstation or on the BSC HPCs.

In order to do so, you need to understand 4 functions, all of them included in the startR package:
 - Start() --> for declaring the data sets to process
 - Step() and AddStep() --> for specifying the operation to be applied to the data
 - Compute() --> for specifying the HPC to be employed, the number of chunks and cores, and to trigger the computation

## Start()

In order to declare the data sets you want to process, you first need to specify a special path that shows where the involved NetCDF files are stored, containing wildcards in those parts of the path that vary across files. This special path is also called a "path pattern".

Before defining an example path pattern, let's introduce some target NetCDF files. In esarchive, we can find the following files:

```
/esarchive/exp/ecmwf/system5_m1/6hourly/
  |--tas/
  |   |--tas_19930101.nc
  |   |--tas_19930201.nc
  |   |     ...
  |   |--tas_20171201.nc
  |--tos/
      |--tos_19930101.nc
      |--tos_19930201.nc
      |     ...
      |--tos_20171201.nc
```

A path pattern that defines the location of these files in a compact way is the following:

```r
repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc'
```

The names of the wildcards (the pieces wrapped between '$' symbols) can be chosen freely.

Once the path pattern is specified, a Start() call can be built, requesting the values of interest for each of the wildcards (also called outer dimensions), as well as for each of the dimensions inside the NetCDF files (inner dimensions).

You can check in advance which dimensions are inside the NetCDF files by inspecting one of them with the basic NetCDF tools:

```
ncdump -h /esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc
```

This would reveal the following inner dimensions: 'ensemble', 'time', 'latitude', and 'longitude'.

We can now put the Start call together:

```r
data <- Start(dat = repos,
              # outer dimensions
              var = 'tas',
              sdate = '19930101',
              # inner dimensions
              ensemble = 'all',
              time = 'all',
              latitude = 'all',
              longitude = 'all')
```

This will yield some output messages:

```r
* Exploring files... This will take a variable amount of time depending
* on the issued request and the performance of the file server...
* Detected dimension sizes:
*   dat:  1
*   var:  1
*   sdate:  1
*   ensemble:  25
*   time:  860
*   latitude:  640
*   longitude:  1296
* Total size of involved data:
*   1 x 1 x 1 x 25 x 860 x 640 x 1296 x 8 bytes = 132.9 Gb
* Successfully discovered data dimensions.
Warning messages:
1: ! Warning: Parameter 'pattern_dims' not specified. Taking the first dimension,
!   'dat' as 'pattern_dims'.
2: ! Warning: Could not find any pattern dim with explicit data set descriptions (in
!   the form of list of lists). Taking the first pattern dim, 'dat', as
!   dimension with pattern specifications.
```

The warnings shown are normal, and could be avoided with a more verbose specification of the parameters to the Start function.
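For instance, a minimal sketch of a more explicit call is shown next. The `pattern_dims` parameter name is quoted from the warning messages themselves, while the 'name' and 'path' fields used in the data set description are an assumption about the "list of lists" format the second warning refers to:

```r
data <- Start(# explicit data set description for the pattern dimension
              # ('name' and 'path' are assumed field names)
              dat = list(list(name = 'system5_m1', path = repos)),
              var = 'tas',
              sdate = '19930101',
              ensemble = 'all',
              time = 'all',
              latitude = 'all',
              longitude = 'all',
              # state explicitly which dimension holds the path pattern
              pattern_dims = 'dat')
```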
The dimensions of the selected data set and its total size are reported in the messages above.

As you will notice, this Start call is very fast, even though several GB of data are involved. This is because Start is simply discovering the location and dimensions of the involved data. You can take a quick look at the collected metadata with `str(data)`.

```r
Class 'startR_header' length 9 Start(dat = "/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc", var = "tas", sdate = "19930101", ensemble = "all", time = "all", latitude = "all", ...
  ..- attr(*, "Dimensions")= Named num [1:7] 1 1 1 25 860 ...
  .. ..- attr(*, "names")= chr [1:7] "dat" "var" "sdate" "ensemble" ...
  ..- attr(*, "Variables")=List of 2
  .. ..$ common: NULL
  .. ..$ dat1  : NULL
  ..- attr(*, "ExpectedFiles")= chr [1, 1, 1] "/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc"
  ..- attr(*, "FileSelectors")=List of 1
  .. ..$ dat1:List of 3
  .. .. ..$ dat  :List of 1
  .. .. .. ..$ : chr "dat1"
  .. .. ..$ var  :List of 1
  .. .. .. ..$ : chr "tas"
  .. .. ..$ sdate:List of 1
  .. .. .. ..$ : chr "19930101"
  ..- attr(*, "PatternDim")= chr "dat"
```

There are no constraints on the number or names of the outer or inner dimensions. In other words, Start will handle NetCDF files with any number of dimensions, with any names, as well as files distributed in complex ways, since you can use custom wildcards in the path pattern.

If you are interested in actually loading the entire data set into your machine *(be careful!)*, you can do so in two ways:
- adding the parameter `retrieve = TRUE` in your Start call.
- evaluating the object returned by Start: `data_load <- eval(data)`

You may notice that this functionality is similar to that of the `Load()` function in the s2dverification package. In fact, `Start()` is more advanced and flexible, although `Load()` is more mature and consistent for loading classic seasonal to decadal forecasting data. `Load()` will be adapted in the future to use `Start()` internally.

As you can see in the Start call we issued, we have requested specific values for the outer dimensions (e.g. `var = 'tas'` or `sdate = '19930101'`), but vectors of multiple values, numeric indices, or keywords can be used. For example, `var = c('tas', 'tos')`, `sdate = 1:5` or `sdate = 'all'`. See the documentation on the Start function on GitLab (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information.

## Step() and AddStep()

Once the data sources are declared, we can define the operation to be applied to them. The operation needs to be encapsulated in the form of an R function, receiving one or more multidimensional arrays (plus additional helper parameters, if needed) and returning one or more multidimensional arrays.
For example:

```r
fun <- function(x) {
  r <- sqrt(sum(x ^ 2) / length(x))
  for (i in 1:100) {
    r <- sqrt(sum(x ^ 2) / length(x))
  }
  dim(r) <- c(time = 1)
  r
}
```

Then, the startR Step for this operation can be defined with the function `Step`, which requires, in order to work properly, the names of the dimensions of the input arrays expected by the function (in this example, a single array with the dimensions 'ensemble' and 'time'), as well as the names of the dimensions of the arrays the function returns:

```r
step <- Step(fun = fun,
             target_dims = c('ensemble', 'time'),
             output_dims = c('time'))
```

Finally, a workflow of steps can be assembled as follows:

```r
wf <- AddStep(data, step)
```

If multiple data sources were to be provided to a step, they could be provided as a list (as done in the multi-source example further below).

It is not possible for now to define workflows with more than one step. This is pending future work.

Open question: how to deal with defining library(blabla) in the code of the function?

## Compute() locally

Once the data sources are declared and the workflow is defined, we can proceed to specify the execution parameters (including which platform to run on) and trigger the execution.

Open questions: is ecFlow required? Is CDO required?

```r
res <- Compute(wf,
               chunks = list(latitude = 2,
                             longitude = 2),
               threads_load = 1,
               threads_compute = 2,
               #cluster = list(queue_host = 'p9login1.bsc.es',
               #               queue_type = 'slurm',
               #               data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/gpfs/archive/bsc32/',
               #               temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_tests/',
               #               lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/',
               #               #init_commands = list('module load intel/16.0.1'),
               #               r_module = 'R/3.5.0',
               #               #ecflow_module = 'ecFlow/4.9.0-foss-2015a',
               #               #node_memory = NULL, #not working
               #               cores_per_job = 2,
               #               job_wallclock = '00:10:00',
               #               max_jobs = 4,
               #               extra_queue_params = list('#SBATCH --qos=bsc_es'),
               #               bidirectional = FALSE,
               #               polling_period = 10#,
               #               #special_setup = 'marenostrum4'
               #               ),
               #ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/',
               #ecflow_server = NULL,
               silent = FALSE,
               debug = FALSE,
               wait = FALSE)
```

Compute() will return a data array, as if it were a variable in your R session.

Still to be discussed: ecFlow, PlotProfiling, the use of metadata (the dates) in the Step, and a summary of all the code shown so far.

## Compute() on HPC

setup steps:

having startR installed on workstation and HPC (done)
having Step dependencies on HPC
having passwordless connection (how to?)
having rsync, ssh, ... on all machines
ecflow??
+having the data: +- either on a shared file system +- either on remote file systems (rsync) +- either on remote file systems (with special transfer mechanism, mn4) +not required to ssh manually to the HPC + +example on power9 + +```r +library(startR) + +#repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc' +repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$-longitudeS1latitudeS1all/$var$_$sdate$.nc' +data <- Start(dat = repos, + var = 'tas', + #sdate = 'all', + sdate = indices(1), + ensemble = 'all', + time = 'all', + #latitude = 'all', + latitude = indices(1:40), + #longitude = 'all', + longitude = indices(1:40), + retrieve = FALSE) +lons <- attr(data, 'Variables')$common$longitude +lats <- attr(data, 'Variables')$common$latitude + +fun <- function(x) apply(x + 1, 2, mean) +step <- Step(fun, c('ensemble', 'time'), c('time')) +wf <- AddStep(data, step) + +res <- Compute(wf, + chunks = list(latitude = 2, + longitude = 2), + threads_load = 1, + threads_compute = 2, + cluster = list(queue_host = 'p9login1.bsc.es', + queue_type = 'slurm', + data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/gpfs/archive/bsc32/', + temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_tests/', + lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/', + #init_commands = list('module load intel/16.0.1'), + r_module = 'R/3.5.0-foss-2018b', + #ecflow_module = 'ecFlow/4.9.0-foss-2015a', + #node_memory = NULL, #not working + cores_per_job = 2, + job_wallclock = '00:10:00', + max_jobs = 4, + extra_queue_params = list('#SBATCH --qos=bsc_es'), + bidirectional = FALSE, + polling_period = 10#, + #special_setup = 'marenostrum4' + ), + ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/', + ecflow_server = NULL, + silent = FALSE, + debug = FALSE, + wait = TRUE) +``` + +## Example using obs data / or more than one data source + +```r +crps <- function(x, y) { + mean(SpecsVerification::EnsCrps(x, y, R.new = Inf)) +} + +library(startR) + +repos <- '/perm/ms/spesiccf/c3ah/qa4seas/data/seasonal/g1x1/ecmf-system4/msmm/atmos/seas/tprate/12/ecmf-system4_msmm_atmos_seas_sfc_$date$_tprate_g1x1_init12.nc' + +data <- Start(dat = repos, + var = 'tprate', + date = 'all', + time = 'all', + number = 'all', + latitude = 'all', + longitude = 'all', + return_vars = list(time = 'date')) + +dates <- attr(data, 'Variables')$common$time + +repos <- '/perm/ms/spesiccf/c3ah/qa4seas/data/ecmf-ei_msmm_atmos_seas_sfc_19910101-20161201_t2m_g1x1_init02.nc' + +obs <- Start(dat = repos, + var = 't2m', + time = values(dates), + latitude = 'all', + longitude = 'all', + split_multiselected_dims = TRUE) + +s <- Step(crps, target_dims = list(c('date', 'number'), c('date')), + output_dims = NULL) +wf <- AddStep(list(data, obs), s) + +r <- Compute(wf, + chunks = list(latitude = 10, + longitude = 3), + cluster = list(queue_host = 'cca', + queue_type = 'pbs', + max_jobs = 10, + init_commands = list('module load ecflow'), + r_module = 'R/3.3.1', + extra_queue_params = list('#PBS -l EC_billing_account=spesiccf')), + ecflow_output_dir = '/perm/ms/spesiccf/c3ah/startR_test/', + is_ecflow_output_dir_shared = FALSE + ) +``` + +```r +repos <- paste0('/esnas/exp/ecmwf/system4_m1/6hourly/', + '$var$/$var$_$sdate$.nc') + +system4 <- Start(dat = repos, + var = 'sfcWind', + #sdate = paste0(1981:2015, '1101'), + sdate = paste0(1981:1984, '1101'), + #time = indices((30*4+1):(120*4)), + time = indices((30*4+1):(30*4+4)), + ensemble = 'all', + #ensemble = indices(1:6), + #latitude = 'all', + latitude = indices(1:10), + #longitude = 'all', + longitude = 
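                 # Note: indices(1:10) (here and for latitude above) selects only
                 # the first ten positions along the dimension, to keep the example
                 # small; the commented-out 'all' alternatives would load the full grid.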
indices(1:10), + return_vars = list(latitude = NULL, + longitude = NULL, + time = c('sdate'))) + +repos <- paste0('/esnas/recon/ecmwf/erainterim/6hourly/', + '$var$/$var$_$file_date$.nc') + +dates <- attr(system4, 'Variables')$common$time +dates_file <- sort(unique(gsub('-', '', sapply(as.character(dates), +substr, 1, 7)))) + +erai <- Start(dat = repos, + var = 'sfcWind', + file_date = dates_file, + time = values(dates), + #latitude = 'all', + latitude = indices(1:10), + #longitude = 'all', + longitude = indices(1:10), + time_var = 'time', + time_tolerance = as.difftime(1, units = 'hours'), + time_across = 'file_date', + return_vars = list(latitude = NULL, + longitude = NULL, + time = 'file_date'), + merge_across_dims = TRUE, + split_multiselected_dims = TRUE) + +step <- Step(eqmcv_atomic, + list(a = c('ensemble', 'sdate'), + b = c('sdate')), + list(c = c('ensemble', 'sdate'))) + +res <- Compute(step, list(system4, erai), + chunks = list(latitude = 5, + longitude = 5, + time = 2), + cluster = list(queue_host = 'bsceslogin01.bsc.es', + max_jobs = 4, + cores_per_job = 2), + shared_dir = '/esnas/scratch/nmanuben/test_bychunk', + wait = FALSE) +``` + +## Example on marenostrum 4 + +```r +library(startR) + +#repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc' +repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$-longitudeS1latitudeS1all/$var$_$sdate$.nc' +data <- Start(dat = repos, + var = 'tas', + #sdate = 'all', + sdate = indices(1), + ensemble = 'all', + time = 'all', + #latitude = 'all', + latitude = indices(1:40), + #longitude = 'all', + longitude = indices(1:40), + retrieve = FALSE) +lons <- attr(data, 'Variables')$common$longitude +lats <- attr(data, 'Variables')$common$latitude + +fun <- function(x) apply(x + 1, 2, mean) +step <- Step(fun, c('ensemble', 'time'), c('time')) +wf <- AddStep(data, step) + +res <- Compute(wf, + chunks = list(latitude = 2, + longitude = 2), + threads_load = 1, + threads_compute = 2, + cluster = list(queue_host = 'mn2.bsc.es', + queue_type = 'slurm', + data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/', + temp_dir = '/gpfs/scratch/pr1efe00/pr1efe03/startR_tests/', + lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.4/', + #init_commands = list('module load netcdf/4.4.1.1'), + r_module = 'R/3.4.0', + #ecflow_module = 'ecFlow/4.9.0-foss-2015a', + #node_memory = NULL, #not working + cores_per_job = 2, + job_wallclock = '00:10:00', + max_jobs = 4, + extra_queue_params = list('#SBATCH --qos=prace'), + bidirectional = FALSE, + polling_period = 10, + special_setup = 'marenostrum4' + ), + ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/', + ecflow_server = NULL, + silent = FALSE, + debug = FALSE, + wait = TRUE) +``` + +## Example on cca -- GitLab From cc718be2ef7b2d397ccecdbededa536cfd4d8ccf Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Thu, 10 Jan 2019 20:29:04 +0100 Subject: [PATCH 02/20] Progress on the practical guide. 
--- README.md | 27 ++++++--- {vignettes => inst/doc}/compute_profiling.png | Bin inst/doc/deployment.md | 56 ++++++++++++++++++ inst/doc/ecflow_monitoring.png | Bin 0 -> 63479 bytes inst/doc/practical_guide.md | 37 ++++++++++++ {vignettes => inst/doc}/start.md | 0 vignettes/deployment.md | 3 - 7 files changed, 111 insertions(+), 12 deletions(-) rename {vignettes => inst/doc}/compute_profiling.png (100%) create mode 100644 inst/doc/deployment.md create mode 100644 inst/doc/ecflow_monitoring.png rename {vignettes => inst/doc}/start.md (100%) delete mode 100644 vignettes/deployment.md diff --git a/README.md b/README.md index 7fd50ea..056b95b 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -## startR +## startR - Retrieval and processing of multidimensional datasets The startR package, developed at the Barcelona Supercomputing Center, implements the MapReduce paradigm (a.k.a. domain decomposition) on HPCs in a way transparent to the user and specially oriented to complex multidimensional datasets. @@ -11,10 +11,15 @@ When run, the script triggers the execution of the defined workflow. Furthermore startR is a project started at BSC with the aim to develop a tool that allows the user to automatically retrieve, homogenize and process multidimensional distributed data sets. It is an open source project that is open to external collaboration and funding, and will continuously evolve to support as many data set formats as possible while maximizing its efficiency. -An extensive part of this package is devoted to the automatic retrieval (from disk or store to RAM) and arrangement of multi-dimensional distributed data sets. This functionality is encapsulated in a single funcion called `Start()`, which is explained in detail in the [**Start()**](vignettes/start.md) documentation page and in `?Start`. +An extensive part of this package is devoted to the automatic retrieval (from disk or store to RAM) and arrangement of multi-dimensional distributed data sets. This functionality is encapsulated in a single funcion called `Start()`, which is explained in detail in the [**Start()**](inst/doc/start.md) documentation page and in `?Start`. ### Installation +See the [**Deployment**](inst/doc/deployment.md) documentation page for details on the set up steps. The most relevant system dependencies are listed next: +- netCDF-4 +- R with the startR, bigmemory and easyNCDF R packages +- For computation on UNIX HPCs: EC-Flow and a job scheduler (Slurm, PBS or LSF) + In order to install and load the latest published version of the package on CRAN, you can run the following lines in your R session: ```r @@ -22,17 +27,15 @@ install.packages('startR') library(startR) ``` -Also, you can install the latest stable version from this GitHub repository as follows: +Also, you can install the latest stable version from the GitLab repository as follows: ```r devtools::install_git('https://earth.bsc.es/gitlab/es/startR') ``` -See the [**Deployment**](vignettes/deployment.md) documentation page or the details in `?Compute` for a guide on deployment and set up steps, and additional technical aspects. - -### How to use +### How it works -An overview example of how to process a large data set is shown in the following. See the [**Start()**](vignettes/start.md) documentation page, as well as the documentation of the functions in the package for further details on usage. +An overview example of how to process a large data set is shown in the following. 
See the [**Start()**](inst/doc/start.md) documentation page, as well as the documentation of the functions in the package for further details on usage. The purpose of the example in this section is simply to illustrate how the user is expected to interact with the startR loading and distributed computing capability once the framework is deployed on the user workstation and computing cluster or HPC. @@ -99,7 +102,13 @@ res <- Compute(wf, wait = TRUE) ``` -#### 4. Profiling of the execution +#### 4. Monitoring the execution + +During the execution of the workflow, which is orchestrated by EC-Flow and a job scheduler (either Slurm, LSF or PBS), the status can be monitored using the EC-Flow graphical user interface. Pending tasks are coloured in blue, ongoing in green, and finished in yellow. + + + +#### 5. Profiling of the execution Additionally, profiling measurements of the execution are preserved together with the output data. Such measurements can be visualized with the `PlotProfiling` function made available in the source code of the startR package. @@ -112,4 +121,4 @@ PlotProfiling(attr(res, 'startR_compute_profiling')) You can click on the image to expand it. - + diff --git a/vignettes/compute_profiling.png b/inst/doc/compute_profiling.png similarity index 100% rename from vignettes/compute_profiling.png rename to inst/doc/compute_profiling.png diff --git a/inst/doc/deployment.md b/inst/doc/deployment.md new file mode 100644 index 0000000..2529e66 --- /dev/null +++ b/inst/doc/deployment.md @@ -0,0 +1,56 @@ +## Deployment of startR + +This section contains the information on system requirements and steps to set them up. Note that `startR` can be used for two different purposes, either only retrieving data locally, or retrieving plus processing data on a distributed HPC. The requirements for each purpose are detailed in separate sections. + +### Deployment steps for retrieving data locally + +A local or remote file system or THREDDS/OPeNDAP server providing the data to be retrieved must be accessible. + +1. Install netCDF-4 if retrieving data from NetCDF files (only option available by now): + - zlib >= 1.2.3 and HDF5 >= 1.8.0-beta1 are required + - Steps are detailed in https://www.unidata.ucar.edu/software/netcdf/docs/getting_and_building_netcdf.html + +2. Install R >= 2.14.1 + +3. Install the required R packages + - Installing `startR` will trigger the installation of other required packages +```r +devtools::install_git('https://earth.bsc.es/gitlab/es/startR') +``` + - Among others, the bigmemory package will be installed. + - If loading and processing NetCDF files (only option supported by now), install the easyNCDF package. + - If planning to interpolate the data with CDO (by using the `startR::Start` parameter called `transform`, or by using `s2dverification::CDORemap` in the workflow specified to `startR::Compute`), install s2dverification >= 2.8.4 and CDO (version 1.6.3 tested). CDO is not available for Windows. + +### Deployment steps for processing data on distributed HPCs + +For processing the data on a distributed HPC (cluster of multi-core nodes), your network should include your workstation, an optional EC-Flow host node, a HPC login node with acces to the HPC nodes. + +All machines must be UNIX based, with the "hostname", "date", "touch" and "sed" commands available. + +1. Set up passwordless, userless ssh access + - at least from your workstation to the HPC login node + - if possible, also from the HPC login node to your workstation + +2. 
Install the following libraries on your workstation:
  - rsync (>= 3.0.6)
  - scp
  - ssh
  - EC-Flow (>= 4.9.0)

3. If you are using a separate EC-Flow host node to control your EC-Flow workflows (optional), install EC-Flow (>= 4.9.0) on the EC-Flow host node

4. Install the following libraries on the HPC login node:
  - rsync (>= 3.0.6)
  - scp
  - ssh
  - EC-Flow (>= 4.9.0), as a Linux Environment Module (optional)
  - a job scheduler (Slurm, PBS or LSF) to distribute the workload across HPC nodes

5. Make sure the following requirements are fulfilled by all HPC nodes:
  - netCDF-4, if loading and processing NetCDF files (only option by now)
  - R (>= 2.14.1), as a Linux Environment Module
  - the startR package installed
  - if using CDO interpolation, install the s2dverification package and CDO 1.6.3
  - any other R packages required by the `startR::Compute` workflow
  - use of any other Environment Modules by the `startR::Compute` workflow is supported
  - a shared file system (with a unified access point) or a THREDDS/OPeNDAP server accessible across the HPC nodes and the HPC login node, where the necessary data can be uploaded from your workstation. A file system shared between your workstation and the HPC is also supported and advantageous. Use of a data transfer service between the workstation and the HPC is also supported under specific configurations.

diff --git a/inst/doc/ecflow_monitoring.png b/inst/doc/ecflow_monitoring.png
new file mode 100644
index 0000000000000000000000000000000000000000..012ed7898c84aced50dc6e6d24bff289699530cc
GIT binary patch
[binary image data for inst/doc/ecflow_monitoring.png (63479 bytes) omitted]
p|+bmY)q6-+jf(m4>vZJkb5wb{qr}Dkh10e7vYoxwLoiy1hS}^%Az6-l++= z0i{z8w7z|B9<1Rej`^VD0w6F)vF=D)5~XDRm*$BVF8mB^MK5JbxG3iDadYN$kfbb} zq6t41ycXLhB4j{yaTmk3FBuXfGJF zpnjgbdGn~$VGV!9ImY}|Ii%LoY4Tvx*H0szjQuh*>$ltZI6WcPGKreJrcFI|*^(Iw z>;R1h2i{s_#bAHle0J$5xx-Uvqu3Zk4C)51an}u4C2X!RW)2$entZUC<-^iZ z=Dqd|v+gd1#aN}SmWls9zg|Z-;A7`GcI)0f`N@WSqrY{i8~@s&s;GqUjsB%W^-#nr z{j)_~J-!BhQX()qnQS1Rop>1=HuohsKOlgLmXJMR7riTyYl3TpV9TY`o zU?TQo%oU<*LiF_W@4q)y2wP8WeejVa1i44-KQy9Zo}vlE@*e+V1ajJyNSTV99US_L zB>L`Mn+qHjqp&%38!;lkYJ=4;`}S>z;1x9+j*lw(c`2#VbAZm9=Zj3)*?f{;M{VXA z{nxW@->&tpqy2pT{Ctx0TuY-))ny!>3Eby(g7r| zl#l9Xl60866k~9QAiidcDAbEe<5U}ONp)wGYN}k9>!Buv%$zq*W%rLL?H8?FPOF3@ zQ>UZHmzBiVrJN`$;^MAaw+@;I;g=H9C} zZ{Dn`9*hJ|Tx%S>wX*2quoMSpe|2d@724vl0y9KY}FHeY^yl`O;?#l2HNibTGHSt_*n3~*BQLm0+ zWIYl27SF%>tXS1)^e^GS`l6Esi|vnLlObm-?V<$Y>h7?jd;bPG@RUzn?reh7nw(xK zY`oZlf8M=Y;$&zy3p>?RZA)=jfK^=*zC`*6p#7A#A!XjTcWuvK3L?{O|G|Ux*p3lL zk0xE>svo2RncbhWf`uE0ba;#END>q9J~>7Xa}v=ohXq4>u1_L@)7FiDVBpMu^ZF2l zoInTREdbAB+Rd9b&5IuYFP*Bou3gsmPPJnHN~<|Ne+h_f!ZSCw6{ec|TDyCPuB}MW z=X93KZR7$Dln!+i>VvEmv|F3STL=k`4x^reTYlso7@dS*^3Fs}9&af}u~_5S94;>p z?Num1ZR^RutTPG#SkYh_@4Xh)ccY-3j# zu|LrtEy-~ak&M_=D)rj+UAPf)a%s({43DjI zqp2~wwU?IUM2AlB*|sgs(Kc%dVrD5FsBE*lFCtK*gWhsBHh@b&^;-WJL^W6{!I?Lg z7tljz-nnD_z8&ZKrEi}j=M?er`90{fTCnUnZyW;AItXbh{0Z%kTC33iRfAmj>a&P| zK*sp(Da%o@*XW!b?etvQ(A;H@n>w2`Yj)YOugPV{7hBxR(m#~ds4kEE8FX^V2mdv@ z#y{IWzs|-7zuaFtx~w$T&^56mD*34Nf-e{Ec=ezYOl5603!buArvRVkvXwi}r@f$O zLKqUe0m-TDm32d@YS6wCGgedSQL}i}sNza2cFjm$Lo;e4_p)=_`keean;x4Nkzy}` zO9`1#OA{=(;$&f9_>?7hK3$4@Yk*QWj_G{4=9}YzIXhJElrQ#!F#Yzz<6D}=peKF| za|}y}!1GJjeWa0ZyyeQXxjc<*y*ir|`oZb@4k``$c6A>2FeIB#pT5s0KRde(IHQ&} zaD`Tml9EzHWaL%$(PVeIl_}``4m47Iti^BZ;`D7M@FkG-=9nyP`kj|I03}>oya~I11-T$ z;XIKmsDj?FqaW9%XmiFD0XtuoWK0kk5u?azfBkK)`#^nt4cNHy@^aCz(ZgN4ptstS-L-QQ+>Md8Et}YJ=mr#t!lP61^xI^d2!Xh6;ZXcWcRkn1MM@?DHliDm1 z+8$dFFiE~wW{Q$YP%pFkIr}GNn*BIiHQ7k!( zvZrN`tIb4M2Y}R(;dN0FPGs_q8W)~^eD(Ys41iF+@wRJNQ^vvG`@_PLtVj5CRQ|Wd zyS`r0Z@>L!Rto{xef;8+$n!lh?^1K3yGUD=Y+kXhiBUMVoM!<^1;MuoUZVW>^x*6x zpkK2L%D5${KUO}VKC2Ehsjs{ckgL*{{icI4)3=ko@aS+G-HH4;Cm$0{2Uk78m$z@M zTmAm*s)~rSo%))r1=m6A{Me})djwMe|3gr&xXe(BXDcvs8vhsjSZ?OEFa z5BTHFc(6Rtfy;(w7{@OAWOU6Z6{MD#y=OuV?nsGRx#4r1N6HDEcEeWePkCH-c_5;h z114=wXB&)#s@l5TEAP@zE2m{z=Omo_$XId()=}i}pFg{zfay%9z`YNXwvDbeh!t0! 
zi7HiHJ#BvV!b8DZ@IFiW8rykPahNP;)D5n`KJ?F)=|q0UV<&lb{}MqE<=4J2<>vh6 zue`0kM-Ui7g*M-%w`Aq{YKt(7ryFa9I;xK}WwPi{?tbY1{~&bEY!8?k5lR9T#PqFo zp>F#@PmUE{r0TB5vWs=#+RF3hw zQ}UoFfDpmKXo$I`SQUhf&WhA*5F;Yu`}A3XQz`b1Y3L{#k7XAwzW1Etga?Lws8Zw% zWJG6=%(R@GgDVd{UGgvh8RiC-%Pw=Qti1Uyh)2}RI1f}7CcmH1|Xc&V)IBP1rl3OefGRf#Lj50Us)%pth8>)QQUDpndjxulJl_a@G0oqwPIYtzDt$y-^wI?K$VhM6 zkGMv%6{rMcsxEBl1ko&zCgLFV;j6k6{42p=`$_%L%h2jz;M#g@Sd01d6DEca)zwA+ z6?g72xr?WwPDHV&#ifjyNPNX|=#S}qQe7H%r0*myH%2rRTRogUL?Wh@zI=Qka>@vu z!W*iy4&}TOQ}!1wd;}mp(D~{kHcUAp5LsUwJ!tsFksN;5ZZ~U*=rQhX-i$~dKpu(S zY0KUZzW3YOB#858Fcl4CO1XYL=93RS0h1U7Jt&4=3YpjP57R6xTr!;Mx##~_{Q9*I z4d7E&6I&)hqtie7?niblr6Z~s#OoWL{MBOkXCgirhPDXlj8N)~`&a!qONVWv+bZyf zl2K^YMuLDsuggva_R}W7vM5ntLv77XmOY)6#(YSk$VE$?n#j5a{A}C{G_z zCdN;-G#oB<5v>97`?Mc^_rwhGOB_figJ2dms?A)8%fL9x*M93cqcqV6V?XujFhWg| z07%sqK8X?2Iai0Z|EQ$k(S$3HQZyJIftoF&dVQ^Jid{d5M@ND~r5Ck3&ark>Yi-)2 zGN*T_Mo0;KFp542C0D;uwn?7#MXf59J51}b+9ZUCe~ro@w+yvsEpQEdNv@C~`hB$^ zLuh~H0`B)0Xn*_uuMp#3g8#1&<4N0=P19tCL+hE{N7Zi0k}H?Y>dySDW8$-Skua7XuVO}73I0CBwQ$_V5Pv<{2{b>Rcn+zckN z#oN0H2)F=*6g*}`c5M-tzIgq3NKy^W&G%eCzj^c6E;Mx~@z&r<=Mph-7t6J~wNV2S z_6VvC^U#p*ux0Ef99jj>*A^35MvEuc8-P0*b6F{CMh`xTiV_SRoa_ZzSTh zs!xzXUI8z$zpIBKNG|3e9-DPfhGh%_A9GuA|6x6%4oV&tU=G&12OkWyi0~q45-9&F z?WKftz|~12eQM><_#dc4y9kly__AGE&Z|UlDp3s5l|D#A?BY0cX*n#aT(yDZ>-4P4 zq;yhMT0#$=^cxU=^Ix@2XaA_|)2^xVY?vNSHS`4gQua|GI68UvIZy8oK?}yXMDJJh zpaZA@0GHfCS>kO4cU>Pu032{l@o5onrG*qal_?xFRL|+5Ayez?)_hg@P zh>lS!2gVcNLt9gmf1zr6#@0 zwy`=57c4mVAmekz(wZAqi{Jej(H%7W6v4(~x>j*juT1H*^n?_+Vt9e(D`^XR9Fcy{ zIJh1DwW0PuwB~b!z*YMXSyX>&1S{l=ivQb2aO#a}KzbgS2?zQ4&zY}2tXNgDufsvY zG9=ctg{EeHm+x6rU$5%WIo`e#SN&o+i&m~>&euJ`=q#&?5$sJ#+An~Z7d|}?YW{}f zBuJpA!qXPjT{neY(!gwO(i1|N%HjK3)+oi=PHfmyf>LNG)Go?UwpR0m(EU~@7@ zw5F_2DXmI>+cJ009yQ4h?A5EP>}(I5v^jI#njxA@b(32Jg&tK`uj;&Y=?oA&a3YQm zE?)8GM5Ct@4Z+c)jTEABr$GyJ@Ze#x?Tehr1ctxAovMbQkpnsP74|Ecxe*``lDZ}K zy+@5JU7Vy-m7jx;z_r4_q+Nijq7&zgT)p^iLlAd;{f|&^Ys&e3n#9g(=jYRDtVs#7 z`t-6PGZ-WWvJASTv0-%Wi$i$7&c-AI%d8o zu;NIz-TXo72+C{*Z)H#ch+(&R7iGO%T?H9c4*e{4vNg9SN9XWJ+led@_PibnvZBS5 zr!^lk=Q96_MOdO`ptJKHoj~0~j0>>>zH>Ylaj8ag0Gy(-n<#)*pnk?%Tq~+O&|qtR z$+ZKwgGTGl+7dIaXyd#D>7wK!Aoy(R@z$Q9>+havV=oqVjAP8EXbsRo%AH(KKrP>v zm&?vZ!N_rYCo7BC;)e7VgN3SAn(fMe+7Bn+g6aMm8~h z$Bqq&)o>e~kX>I{*#?LOqZ+EKtE_Udn^T*`>rW@gfB;mhozS~apTXZML5)^CyQYd? 
  - Step() and AddStep() --> for specifying the operation to be applied to the data
  - Compute() --> for specifying the HPC to be employed, the number of chunks and cores, and to trigger the computation
+But, in first place, you must follow the deployment steps to make sure startR will work with the HPC of your choice, and follow some tricks for a better experience.
+
+## Deployment at BSC
+
+The full deployment steps are detailed in the [**Deployment**](inst/doc/deployment.md) section. However at BSC you do not need to follow them since everything is already installed for you. You just need to set up passwordless access:
+1- generate an ssh pair of keys if you do not have one, using `ssh-keygen -t rsa`
+2- ssh to the HPC login node and create a directory where to store it, using `ssh username@hostname_or_ip mkdir -p .ssh`
+3- dump your public key on a new file under that folder, using `cat .ssh/id_rsa.pub | ssh username@hostname_or_ip 'cat >> .ssh/authorized_keys'`
+4- adjust the permissions, using `ssh username@hostname_or_ip "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"`
+5- if your username is different on your workstation and on the HPC login node, add an entry in the file .ssh/config in your workstation as follows:
+```
+ Host short_name_of_the_host
+ HostName hostname_or_ip
+ User username
+ IdentityFile ~/.ssh/id_rsa
+```
+
+You are almost good to go. Do not forget adding the following lines on your .bashrc on CTE-Power, if you are planning to run on CTE-Power:
+```
+if [[ $BSC_MACHINE == "power" ]] ; then
+  module unuse /apps/modules/modulefiles/applications
+  module use /gpfs/projects/bsc32/software/rhel/7.4/ppc64le/POWER9/modules/all/
+fi
+```
+
+Also, you can add the following lines on your .bashrc on your workstation for convenience:
+```
+alias ctp='ssh -X username@p9login1.bsc.es'
+alias start='module load R CDO ecFlow'
+```
+
+Then, when you open a new terminal session, you will just need to run the following commands and a fresh R session will pop up with the startR environment ready to use.
+```
+start
+R
+```
+
 ## Start()
 
 In order to declare the data sets you want to process, you first need to specify a special path that shows where all the involved NetCDF files you want to process are stored, containing some wildcards in those parts of the path that vary across files. This special path is also called "path pattern".
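As a complementary illustration of path patterns, the sketch below uses a made-up archive layout and a hypothetical `$member$` wildcard (none of these paths exist in esarchive); it only shows that any varying part of the path can be declared as a wildcard with a name of your choice.

```r
# Hypothetical layout: one file per variable, member and start date, e.g.
#   /some/archive/tas/r1/tas_19930101.nc
#   /some/archive/tas/r2/tas_19930101.nc
# Each varying piece of the path becomes a $...$ wildcard:
repos_example <- '/some/archive/$var$/$member$/$var$_$sdate$.nc'
```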
diff --git a/vignettes/start.md b/inst/doc/start.md similarity index 100% rename from vignettes/start.md rename to inst/doc/start.md diff --git a/vignettes/deployment.md b/vignettes/deployment.md deleted file mode 100644 index f2dd3e1..0000000 --- a/vignettes/deployment.md +++ /dev/null @@ -1,3 +0,0 @@ -## Deployment of startR - -This documentation page is work in progress. -- GitLab From 69fbd1f1e9942a9a5f5d36293a507bdb750781c7 Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Thu, 10 Jan 2019 20:39:07 +0100 Subject: [PATCH 03/20] Progress on practical guide. --- README.md | 4 ++-- inst/doc/deployment.md | 16 ++++++++-------- ...practical_guide.md => practical_guide_bsc.md} | 11 ++++++++--- 3 files changed, 18 insertions(+), 13 deletions(-) rename inst/doc/{practical_guide.md => practical_guide_bsc.md} (98%) diff --git a/README.md b/README.md index 056b95b..fa4956b 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,7 @@ devtools::install_git('https://earth.bsc.es/gitlab/es/startR') ### How it works -An overview example of how to process a large data set is shown in the following. See the [**Start()**](inst/doc/start.md) documentation page, as well as the documentation of the functions in the package for further details on usage. +An overview example of how to process a large data set is shown in the following. See the [**Start()**](inst/doc/start.md) documentation page, as well as the documentation of the functions in the package for further details on usage, or see real use cases in [**Using startR at BSC**](inst/doc/practical_guide_bsc.md). The purpose of the example in this section is simply to illustrate how the user is expected to interact with the startR loading and distributed computing capability once the framework is deployed on the user workstation and computing cluster or HPC. @@ -106,7 +106,7 @@ res <- Compute(wf, During the execution of the workflow, which is orchestrated by EC-Flow and a job scheduler (either Slurm, LSF or PBS), the status can be monitored using the EC-Flow graphical user interface. Pending tasks are coloured in blue, ongoing in green, and finished in yellow. - + #### 5. Profiling of the execution diff --git a/inst/doc/deployment.md b/inst/doc/deployment.md index 2529e66..c26b21c 100644 --- a/inst/doc/deployment.md +++ b/inst/doc/deployment.md @@ -6,13 +6,13 @@ This section contains the information on system requirements and steps to set th A local or remote file system or THREDDS/OPeNDAP server providing the data to be retrieved must be accessible. -1. Install netCDF-4 if retrieving data from NetCDF files (only option available by now): +#### 1. Install netCDF-4 if retrieving data from NetCDF files (only option available by now): - zlib >= 1.2.3 and HDF5 >= 1.8.0-beta1 are required - Steps are detailed in https://www.unidata.ucar.edu/software/netcdf/docs/getting_and_building_netcdf.html -2. Install R >= 2.14.1 +#### 2. Install R >= 2.14.1 -3. Install the required R packages +#### 3. Install the required R packages - Installing `startR` will trigger the installation of other required packages ```r devtools::install_git('https://earth.bsc.es/gitlab/es/startR') @@ -27,26 +27,26 @@ For processing the data on a distributed HPC (cluster of multi-core nodes), your All machines must be UNIX based, with the "hostname", "date", "touch" and "sed" commands available. -1. Set up passwordless, userless ssh access +#### 1. 
Set up passwordless, userless ssh access - at least from your workstation to the HPC login node - if possible, also from the HPC login node to your workstation -2. Install the following libraries on your workstation: +#### 2. Install the following libraries on your workstation: - rsync (>= 3.0.6) - scp - ssh - EC-Flow (>= 4.9.0) -3. If you are using a separate EC-Flow host node to control your EC-Flow workflows (optional), install EC-Flow (>= 4.9.0) on the EC-Flow host node +#### 3. If you are using a separate EC-Flow host node to control your EC-Flow workflows (optional), install EC-Flow (>= 4.9.0) on the EC-Flow host node -4. Install the following libraries on the HPC login node: +#### 4. Install the following libraries on the HPC login node: - rsync (>= 3.0.6) - scp - ssh - EC-Flow (>= 4.9.0), as a Linux Environment Module (optional) - Job scheduler (Slurm, PBS or LSF) to distribute the workload across HPC nodes -5. Make sure the following requirements are fulfilled by all HPC nodes: +#### 5. Make sure the following requirements are fulfilled by all HPC nodes: - netCDF-4 if loading and processing NetCDF files (only option) - R (>= 2.14.1), as a Linux Environment Module - startR package installed diff --git a/inst/doc/practical_guide.md b/inst/doc/practical_guide_bsc.md similarity index 98% rename from inst/doc/practical_guide.md rename to inst/doc/practical_guide_bsc.md index 7750145..66d3d3d 100644 --- a/inst/doc/practical_guide.md +++ b/inst/doc/practical_guide_bsc.md @@ -3,19 +3,24 @@ In this guide, some practical examples are shown for you to see how to use startR to process large data sets in parallel on your Earth Sciences department workstation or on the BSC's HPCs. In order to do so, you need to understand 4 functions, all of them included in the startR package: - - Start() --> for declaing the data sets to process - - Step() and AddStep() --> for specifying the operation to be applied to the data - - Compute() --> for specifying the HPC to be employed, the number of chunks and cores, and to trigger the computation + - **Start()**, for declaing the data sets to process + - **Step()** and **AddStep()**, for specifying the operation to be applied to the data + - **Compute()**, for specifying the HPC to be employed, the number of chunks and cores, and to trigger the computation But, in first place, you must follow the deployment steps to make sure startR will work with the HPC of your choice, and follow some tricks for a better experience. ## Deployment at BSC The full deployment steps are detailed in the [**Deployment**](inst/doc/deployment.md) section. However at BSC you do not need to follow them since everything is already installed for you. 
You just need to set up passwordless access: + 1- generate an ssh pair of keys if you do not have one, using `ssh-keygen -t rsa` + 2- ssh to the HPC login node and create a directory where to store it, using `ssh username@hostname_or_ip mkdir -p .ssh` + 3- dump your public key on a new file under that folder, using `cat .ssh/id_rsa.pub | ssh username@hostname_or_ip 'cat >> .ssh/authorized_keys'` + 4- adjust the permissions, using `ssh username@hostname_or_ip "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"` + 5- if your username is different on your workstation and on the HPC login node, add an entry in the file .ssh/config in your workstation as follows: ``` Host short_name_of_the_host -- GitLab From 44c18a79efbb57a15dc68392bbe770e5a540e75d Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Thu, 10 Jan 2019 20:43:16 +0100 Subject: [PATCH 04/20] Small fixes. --- inst/doc/deployment.md | 72 +++++++++++++++++++++--------------------- 1 file changed, 36 insertions(+), 36 deletions(-) diff --git a/inst/doc/deployment.md b/inst/doc/deployment.md index c26b21c..dcf1818 100644 --- a/inst/doc/deployment.md +++ b/inst/doc/deployment.md @@ -6,20 +6,20 @@ This section contains the information on system requirements and steps to set th A local or remote file system or THREDDS/OPeNDAP server providing the data to be retrieved must be accessible. -#### 1. Install netCDF-4 if retrieving data from NetCDF files (only option available by now): - - zlib >= 1.2.3 and HDF5 >= 1.8.0-beta1 are required - - Steps are detailed in https://www.unidata.ucar.edu/software/netcdf/docs/getting_and_building_netcdf.html +1. Install netCDF-4 if retrieving data from NetCDF files (only option available by now): + - zlib >= 1.2.3 and HDF5 >= 1.8.0-beta1 are required + - Steps are detailed in https://www.unidata.ucar.edu/software/netcdf/docs/getting_and_building_netcdf.html -#### 2. Install R >= 2.14.1 +2. Install R >= 2.14.1 -#### 3. Install the required R packages - - Installing `startR` will trigger the installation of other required packages +3. Install the required R packages + - Installing `startR` will trigger the installation of other required packages ```r devtools::install_git('https://earth.bsc.es/gitlab/es/startR') ``` - - Among others, the bigmemory package will be installed. - - If loading and processing NetCDF files (only option supported by now), install the easyNCDF package. - - If planning to interpolate the data with CDO (by using the `startR::Start` parameter called `transform`, or by using `s2dverification::CDORemap` in the workflow specified to `startR::Compute`), install s2dverification >= 2.8.4 and CDO (version 1.6.3 tested). CDO is not available for Windows. + - Among others, the bigmemory package will be installed. + - If loading and processing NetCDF files (only option supported by now), install the easyNCDF package. + - If planning to interpolate the data with CDO (by using the `startR::Start` parameter called `transform`, or by using `s2dverification::CDORemap` in the workflow specified to `startR::Compute`), install s2dverification >= 2.8.4 and CDO (version 1.6.3 tested). CDO is not available for Windows. ### Deployment steps for processing data on distributed HPCs @@ -27,30 +27,30 @@ For processing the data on a distributed HPC (cluster of multi-core nodes), your All machines must be UNIX based, with the "hostname", "date", "touch" and "sed" commands available. -#### 1. 
Set up passwordless, userless ssh access - - at least from your workstation to the HPC login node - - if possible, also from the HPC login node to your workstation - -#### 2. Install the following libraries on your workstation: - - rsync (>= 3.0.6) - - scp - - ssh - - EC-Flow (>= 4.9.0) - -#### 3. If you are using a separate EC-Flow host node to control your EC-Flow workflows (optional), install EC-Flow (>= 4.9.0) on the EC-Flow host node - -#### 4. Install the following libraries on the HPC login node: - - rsync (>= 3.0.6) - - scp - - ssh - - EC-Flow (>= 4.9.0), as a Linux Environment Module (optional) - - Job scheduler (Slurm, PBS or LSF) to distribute the workload across HPC nodes - -#### 5. Make sure the following requirements are fulfilled by all HPC nodes: - - netCDF-4 if loading and processing NetCDF files (only option) - - R (>= 2.14.1), as a Linux Environment Module - - startR package installed - - if using CDO interpolation, install the s2dverification package and CDO 1.6.3 - - any other R packages required by the `startR::Compute` workflow - - Use of any other Environment Modules by the `startR::Compute` workflow is supported - - A shared file system (with a unified access point) or THREDDS/OPeNDAP server is accessible across HPC nodes and HPC login node, where the necessary data can be uploaded from your workstation. A file system shared between your workstation and the HPC is also supported and advantageous. Use of a data transfer service between the workstation and the HPC is also supported under specific configurations. +1. Set up passwordless, userless ssh access + - at least from your workstation to the HPC login node + - if possible, also from the HPC login node to your workstation + +2. Install the following libraries on your workstation: + - rsync (>= 3.0.6) + - scp + - ssh + - EC-Flow (>= 4.9.0) + +3. If you are using a separate EC-Flow host node to control your EC-Flow workflows (optional), install EC-Flow (>= 4.9.0) on the EC-Flow host node + +4. Install the following libraries on the HPC login node: + - rsync (>= 3.0.6) + - scp + - ssh + - EC-Flow (>= 4.9.0), as a Linux Environment Module (optional) + - Job scheduler (Slurm, PBS or LSF) to distribute the workload across HPC nodes + +5. Make sure the following requirements are fulfilled by all HPC nodes: + - netCDF-4 if loading and processing NetCDF files (only option) + - R (>= 2.14.1), as a Linux Environment Module + - startR package installed + - if using CDO interpolation, install the s2dverification package and CDO 1.6.3 + - any other R packages required by the `startR::Compute` workflow + - Use of any other Environment Modules by the `startR::Compute` workflow is supported + - A shared file system (with a unified access point) or THREDDS/OPeNDAP server is accessible across HPC nodes and HPC login node, where the necessary data can be uploaded from your workstation. A file system shared between your workstation and the HPC is also supported and advantageous. Use of a data transfer service between the workstation and the HPC is also supported under specific configurations. -- GitLab From 8f352ef48f86c789919357dec6bce6e673cc85dd Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Fri, 11 Jan 2019 18:56:59 +0100 Subject: [PATCH 05/20] Progress on practical guide. 
--- README.md | 28 ++-- inst/doc/deployment.md | 46 +++--- inst/doc/practical_guide_bsc.md | 279 ++++++++++++++++++-------------- 3 files changed, 200 insertions(+), 153 deletions(-) diff --git a/README.md b/README.md index fa4956b..9eede53 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ ## startR - Retrieval and processing of multidimensional datasets -The startR package, developed at the Barcelona Supercomputing Center, implements the MapReduce paradigm (a.k.a. domain decomposition) on HPCs in a way transparent to the user and specially oriented to complex multidimensional datasets. +startR is an R package developed at the Barcelona Supercomputing Center which implements the MapReduce paradigm (a.k.a. domain decomposition) on HPCs in a way transparent to the user and specially oriented to complex multidimensional datasets. Following the startR framework, the user can represent in a one-page startR script all the information that defines a use case, including: - the involved (multidimensional) data sources and the distribution of the data files @@ -9,16 +9,19 @@ Following the startR framework, the user can represent in a one-page startR scri When run, the script triggers the execution of the defined workflow. Furthermore, the EC-Flow workflow manager is transparently used to dispatch tasks onto the HPC, and the user can employ its graphical interface for monitoring and control purposes. -startR is a project started at BSC with the aim to develop a tool that allows the user to automatically retrieve, homogenize and process multidimensional distributed data sets. It is an open source project that is open to external collaboration and funding, and will continuously evolve to support as many data set formats as possible while maximizing its efficiency. - An extensive part of this package is devoted to the automatic retrieval (from disk or store to RAM) and arrangement of multi-dimensional distributed data sets. This functionality is encapsulated in a single funcion called `Start()`, which is explained in detail in the [**Start()**](inst/doc/start.md) documentation page and in `?Start`. +startR is an open source project that is open to external collaboration and funding, and will continuously evolve to support as many data set formats as possible while maximizing its efficiency. + ### Installation See the [**Deployment**](inst/doc/deployment.md) documentation page for details on the set up steps. The most relevant system dependencies are listed next: - netCDF-4 - R with the startR, bigmemory and easyNCDF R packages -- For computation on UNIX HPCs: EC-Flow and a job scheduler (Slurm, PBS or LSF) +- For distributed computation: + - UNIX-based HPC (cluster of multi-processor nodes) + - a job scheduler (Slurm, PBS or LSF) + - EC-Flow >= 4.9.0 In order to install and load the latest published version of the package on CRAN, you can run the following lines in your R session: @@ -35,18 +38,19 @@ devtools::install_git('https://earth.bsc.es/gitlab/es/startR') ### How it works -An overview example of how to process a large data set is shown in the following. See the [**Start()**](inst/doc/start.md) documentation page, as well as the documentation of the functions in the package for further details on usage, or see real use cases in [**Using startR at BSC**](inst/doc/practical_guide_bsc.md). +An overview example of how to process a large data set is shown in the following. 
You can see real use cases in [**Using startR at BSC**](inst/doc/practical_guide_bsc.md), and you can find more information on the use of the `Start()` function in the [**Start()**](inst/doc/start.md) documentation page, as well as in the documentation of the functions in the package. -The purpose of the example in this section is simply to illustrate how the user is expected to interact with the startR loading and distributed computing capability once the framework is deployed on the user workstation and computing cluster or HPC. - -In this example, it is shown how a simple addition and averaging operation is performed, on BSC's CTE-Power HPC, over a multi-dimensional climate data set, which lives in the BSC-ES storage infrastructure. As mentioned in the introduction, the user will need to declare the involved data sources, the workflow of operations to carry out, and the computing environment and parameters. +The purpose of the example in this section is simply to illustrate how the user is expected to use startR once the framework is deployed on the workstation and HPC. It shows how a simple addition and averaging operation is performed on BSC's CTE-Power HPC, over a multi-dimensional climate data set, which lives in the BSC-ES storage infrastructure. As mentioned in the introduction, the user will need to declare the involved data sources, the workflow of operations to carry out, and the computing environment and parameters. #### 1. Declaration of data sources ```r library(startR) +# A path pattern is built repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc' + +# A Start() call is built with the indices to operate data <- Start(dat = repos, var = 'tas', sdate = '20180101', @@ -66,6 +70,8 @@ fun <- function(x) { # Expected inputs: # x: array with dimensions ('ensemble', 'time') apply(x + 1, 2, mean) + # Outputs: + # single array with dimensions ('time') } # A startR Step is defined, specifying its expected input and @@ -106,13 +112,13 @@ res <- Compute(wf, During the execution of the workflow, which is orchestrated by EC-Flow and a job scheduler (either Slurm, LSF or PBS), the status can be monitored using the EC-Flow graphical user interface. Pending tasks are coloured in blue, ongoing in green, and finished in yellow. - + #### 5. Profiling of the execution -Additionally, profiling measurements of the execution are preserved together with the output data. Such measurements can be visualized with the `PlotProfiling` function made available in the source code of the startR package. +Additionally, profiling measurements of the execution are provided together with the output data. Such measurements can be visualized with the `PlotProfiling()` function made available in the source code of the startR package. -This function has not been included as part of the official set of functions of the package because it requires a number of plotting libraries which can take time to load and, since the startR package is loaded in each of the worker jobs on the HPC or cluster, this could imply a substantial amount of time spent in repeatedly loading unused visualization libraries during the computing stage. 
+This function has not been included as part of the official set of functions of the package because it requires a number of extense plotting libraries which take time to load and, since the startR package is loaded in each of the worker jobs on the HPC or cluster, this could imply a substantial amount of time spent in repeatedly loading unused visualization libraries during the computing stage. ```r source('https://earth.bsc.es/gitlab/es/startR/raw/master/inst/PlotProfiling.R') diff --git a/inst/doc/deployment.md b/inst/doc/deployment.md index dcf1818..47325db 100644 --- a/inst/doc/deployment.md +++ b/inst/doc/deployment.md @@ -1,56 +1,56 @@ ## Deployment of startR -This section contains the information on system requirements and steps to set them up. Note that `startR` can be used for two different purposes, either only retrieving data locally, or retrieving plus processing data on a distributed HPC. The requirements for each purpose are detailed in separate sections. +This section contains the information on system requirements and the steps to set up such requirements. Note that `startR` can be used for two different purposes, either only retrieving data locally, or retrieving plus processing data on a distributed HPC. The requirements for each purpose are detailed in separate sections. ### Deployment steps for retrieving data locally -A local or remote file system or THREDDS/OPeNDAP server providing the data to be retrieved must be accessible. - -1. Install netCDF-4 if retrieving data from NetCDF files (only option available by now): - - zlib >= 1.2.3 and HDF5 >= 1.8.0-beta1 are required - - Steps are detailed in https://www.unidata.ucar.edu/software/netcdf/docs/getting_and_building_netcdf.html +1. Install netCDF-4 if retrieving data from NetCDF files (only file format supported by now): + - zlib (>= 1.2.3) and HDF5 (>= 1.8.0-beta1) are required by netCDF-4 + - Steps for installation of netCDF-4 are detailed in https://www.unidata.ucar.edu/software/netcdf/docs/getting_and_building_netcdf.html -2. Install R >= 2.14.1 +2. Install R (>= 2.14.1) 3. Install the required R packages - - Installing `startR` will trigger the installation of other required packages + - Installing `startR` will trigger the installation of the other required packages ```r devtools::install_git('https://earth.bsc.es/gitlab/es/startR') ``` - Among others, the bigmemory package will be installed. - - If loading and processing NetCDF files (only option supported by now), install the easyNCDF package. - - If planning to interpolate the data with CDO (by using the `startR::Start` parameter called `transform`, or by using `s2dverification::CDORemap` in the workflow specified to `startR::Compute`), install s2dverification >= 2.8.4 and CDO (version 1.6.3 tested). CDO is not available for Windows. + - If loading and processing NetCDF files (only file format supported by now), install the easyNCDF package. + - If planning to interpolate the data with CDO (either by using the `transform` parameter in `startR::Start`, or by using `s2dverification::CDORemap` in the workflow specified to `startR::Compute`), install s2dverification (>= 2.8.4) and CDO (version 1.6.3 tested). CDO is not available for Windows. + +A local or remote file system or THREDDS/OPeNDAP server providing the data to be retrieved must be accessible. 
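Once these local-deployment steps have been followed, a minimal check like the sketch below (merely illustrative, assuming the packages listed above were installed) can confirm that the main pieces are in place on the workstation.

```r
# All of these should load without errors on a correctly set up workstation
library(startR)
library(bigmemory)    # installed automatically as a dependency of startR
library(easyNCDF)     # needed when reading NetCDF files
# Only needed if CDO-based interpolation will be used:
# library(s2dverification)
```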
### Deployment steps for processing data on distributed HPCs -For processing the data on a distributed HPC (cluster of multi-core nodes), your network should include your workstation, an optional EC-Flow host node, a HPC login node with acces to the HPC nodes. +For processing the data on a distributed HPC (cluster of multi-processor, multi-core nodes), your workstation, an optional EC-Flow host node and a HPC login node with acces to the HPC nodes, should all be accessible in your network. -All machines must be UNIX based, with the "hostname", "date", "touch" and "sed" commands available. +All machines must be UNIX-based, with the "hostname", "date", "touch" and "sed" commands available. 1. Set up passwordless, userless ssh access - at least from your workstation to the HPC login node - if possible, also from the HPC login node to your workstation 2. Install the following libraries on your workstation: - - rsync (>= 3.0.6) - - scp - ssh + - scp + - rsync (>= 3.0.6) - EC-Flow (>= 4.9.0) 3. If you are using a separate EC-Flow host node to control your EC-Flow workflows (optional), install EC-Flow (>= 4.9.0) on the EC-Flow host node 4. Install the following libraries on the HPC login node: - - rsync (>= 3.0.6) - - scp - ssh + - scp + - rsync (>= 3.0.6) - EC-Flow (>= 4.9.0), as a Linux Environment Module (optional) - Job scheduler (Slurm, PBS or LSF) to distribute the workload across HPC nodes 5. Make sure the following requirements are fulfilled by all HPC nodes: - - netCDF-4 if loading and processing NetCDF files (only option) - - R (>= 2.14.1), as a Linux Environment Module - - startR package installed - - if using CDO interpolation, install the s2dverification package and CDO 1.6.3 - - any other R packages required by the `startR::Compute` workflow - - Use of any other Environment Modules by the `startR::Compute` workflow is supported - - A shared file system (with a unified access point) or THREDDS/OPeNDAP server is accessible across HPC nodes and HPC login node, where the necessary data can be uploaded from your workstation. A file system shared between your workstation and the HPC is also supported and advantageous. Use of a data transfer service between the workstation and the HPC is also supported under specific configurations. + - netCDF-4 is installed, if loading and processing NetCDF files (only supported format by now) + - R (>= 2.14.1) is installed as a Linux Environment Module + - the startR package is installed + - if using CDO interpolation, the s2dverification package and CDO 1.6.3 are installed + - any other R packages required by the `startR::Compute` workflow are installed + - any other Environment Modules used by the `startR::Compute` workflow are installed + - a shared file system (with a unified access point) or THREDDS/OPeNDAP server is accessible across HPC nodes and HPC login node, where the necessary data can be uploaded from your workstation. A file system shared between your workstation and the HPC is also supported and advantageous. Use of a data transfer service between the workstation and the HPC is also supported under specific configurations. 
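A rough way to confirm the passwordless-access requirement listed above, without leaving R, is sketched below; the user and host names are placeholders to be replaced by your own, and the check simply relies on ssh's batch mode, which fails instead of prompting for a password.

```r
# Should print 'ok' without prompting for a password if the access is set up correctly
status <- system("ssh -o BatchMode=yes username@hpc_login_node 'echo ok'")
if (status != 0) {
  warning("passwordless ssh to the HPC login node does not seem to be configured")
}
```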
diff --git a/inst/doc/practical_guide_bsc.md b/inst/doc/practical_guide_bsc.md index 66d3d3d..969bf87 100644 --- a/inst/doc/practical_guide_bsc.md +++ b/inst/doc/practical_guide_bsc.md @@ -1,21 +1,23 @@ # Practical guide for using startR at BSC -In this guide, some practical examples are shown for you to see how to use startR to process large data sets in parallel on your Earth Sciences department workstation or on the BSC's HPCs. +In this guide some practical examples are shown for you to see how to use startR to process large data sets in parallel on your workstation at the BSC ES or on the BSC's HPCs. -In order to do so, you need to understand 4 functions, all of them included in the startR package: +In order to do so, you will need to understand and use four functions, all of them included in the startR package: - **Start()**, for declaing the data sets to process - **Step()** and **AddStep()**, for specifying the operation to be applied to the data - **Compute()**, for specifying the HPC to be employed, the number of chunks and cores, and to trigger the computation -But, in first place, you must follow the deployment steps to make sure startR will work with the HPC of your choice, and follow some tricks for a better experience. +But, in first place, you must follow the deployment steps to make sure startR will work on your workstation with the HPC of your choice, and follow some tricks for a better experience. ## Deployment at BSC -The full deployment steps are detailed in the [**Deployment**](inst/doc/deployment.md) section. However at BSC you do not need to follow them since everything is already installed for you. You just need to set up passwordless access: +The full deployment steps are detailed in the [**Deployment**](inst/doc/deployment.md) section. However at BSC you do not need to follow all of them since all dependencies are already installed for you. -1- generate an ssh pair of keys if you do not have one, using `ssh-keygen -t rsa` +You only need to set up passwordless, userless access from your machine to the HPC login node, and from the HPC login node to your machine if at all possible. To establish the connection in one of the directions, you can do the following: -2- ssh to the HPC login node and create a directory where to store it, using `ssh username@hostname_or_ip mkdir -p .ssh` +1- generate an ssh pair of keys on the origin host if you do not have one, using `ssh-keygen -t rsa` + +2- ssh to the destionation node and create a directory where to store it, using `ssh username@hostname_or_ip mkdir -p .ssh` 3- dump your public key on a new file under that folder, using `cat .ssh/id_rsa.pub | ssh username@hostname_or_ip 'cat >> .ssh/authorized_keys'` @@ -29,7 +31,9 @@ The full deployment steps are detailed in the [**Deployment**](inst/doc/deployme IdentityFile ~/.ssh/id_rsa ``` -You are almost good to go. Do not forget adding the following lines on your .bashrc on CTE-Power, if you are planning to run on CTE-Power: +If you have followed these steps for the connections in the two directions (from HPC to workstation might not be possible), you are almost good to go. 
+ +Do not forget adding the following lines on your .bashrc on CTE-Power, if you are planning to run on CTE-Power: ``` if [[ $BSC_MACHINE == "power" ]] ; then module unuse /apps/modules/modulefiles/applications @@ -39,7 +43,7 @@ fi Also, you can add the following lines on your .bashrc on your workstation for convenience: ``` -alias ctp='ssh -X username@p9login1.bsc.es' +alias ctp='ssh -X username@hostname_or_ip' alias start='module load R CDO ecFlow' ``` @@ -51,7 +55,7 @@ R ## Start() -In order to declare the data sets you want to process, you first need to specify a special path that shows where all the involved NetCDF files you want to process are stored, containing some wildcards in those parts of the path that vary across files. This special path is also called "path pattern". +In order to declare the data sets you want to process, you first need to build a special path that shows where all the involved NetCDF files you want to process are stored, containing some wildcards in those parts of the path that vary across files. This special path is also called "path pattern". Before defining an example path pattern, let's introduce some target NetCDF files. In esarchive, we can find the following files: @@ -75,19 +79,19 @@ A path pattern that could be used to define the location of these files in a com repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc' ``` -The names of the wildcards used (the pieces wrapped between '$' symbols) can be given any names. +The wildcards used (the pieces wrapped between '$' symbols) can be given any names you like, they do not necessarily need to be 'var' or 'sdate' or match any other keyword. -Once the path pattern is specified, a Start() call can be built, requesting the values of interest for each of the wildcards (also called outer dimensions), as well as for each of the dimensions inside the NetCDF files (inner dimensions). +Once the path pattern is specified, a `Start()` call can be built, which requests the values of interest for each of the wildcards (also called outer dimensions), as well as for each of the dimensions inside the NetCDF files (inner dimensions). -You can check in advance which dimensions are inside the NetCDF files by checking one of them with the basic NetCDF tools: +You can check in advance which dimensions are inside the NetCDF files by using the basic NetCDF tools: ``` ncdump -h /esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc ``` -This would REVELAR the following inner dimensions: 'ensemble', 'time', 'latitude', and 'longitude'. +This would reveal the following inner dimensions: 'ensemble', 'time', 'latitude', and 'longitude'. -We can now put the Start call together: +We now have all the information we need to put the `Start()` call together: ```r data <- Start(dat = repos, @@ -101,7 +105,7 @@ data <- Start(dat = repos, longitude = 'all') ``` -This will yield some output messages: +This will display some progress and information messages: ```r * Exploring files... This will take a variable amount of time depending @@ -125,11 +129,11 @@ Warning messages: ! dimension with pattern specifications. ``` -The warnings shown are normal, and could be avoided with a more wordy specification of the parameters to the Start function. +The warnings shown are normal, and could be avoided with a more wordy specification of the parameters to the `Start()` function. -The dimensions of the selected data set and the total size are shown. +The dimensions of the selected data set and the total size are shown. 
As you have probably noticed, this `Start()` call is very fast, even though several GB of data are involved. This is because `Start()` is simply discovering the location and dimension of the involved data. -As you will notice, this Start call is very fast, even though several GB of data are involved. This is because Start is simply discovering the location and dimension of the involved data. You can give a quick look to the collected metadata with `str(data)`. +You can give a quick look to the collected metadata with `str(data)`. ```r Class 'startR_header' length 9 Start(dat = "/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc", var = "tas", sdate = "19930101", ensemble = "all", time = "all", latitude = "all", ... @@ -150,15 +154,15 @@ Class 'startR_header' length 9 Start(dat = "/esarchive/exp/ecmwf/system5_m1/6hou ..- attr(*, "PatternDim")= chr "dat" ``` -There are no constrains for the numer or names of the outer or inner dimensions. In other words, Start will handle NetCDF files with any number of dimensions with any name, as well as files distributed in complex ways, since you can use customized wildcards in the path pattern. +There are no constrains for the numer or names of the outer or inner dimensions. In other words, `Start()` will handle NetCDF files with any number of dimensions with any name, as well as files distributed across folders in complex ways, since you can use customized wildcards in the path pattern. -If you are interested in actually loading the entire data set in your machine *(be careful!)* you can do so in two ways: -- adding the parameter `retrieve = TRUE` in your Start call. -- evaluating the object returned by Start: `data_load <- eval(data)` +If you are interested in actually loading the entire data set in your machine you can do so in two ways (*be careful, doing so with the `Start()` call used in this example will most likely stall your machine. Try it with a smaller region or a subset of forecast time steps*): +- adding the parameter `retrieve = TRUE` in your `Start()` call. +- evaluating the object returned by `Start()`: `data_load <- eval(data)` -You may realize that this functionality is similar to the `Load()` function in the s2dverification package. In fact, `Start()` is more advanced and flexible, although `Load()` is more mature and consistent for loading classic seasonal to decadal forecasting data. `Load()` will be adapted in the future to use `Start()` internally. +You may realize that this functionality is similar to the `Load()` function in the s2dverification package. In fact, `Start()` is more advanced and flexible, although `Load()` is more mature and consistent for loading typical seasonal to decadal forecasting data. `Load()` will be adapted in the future to use `Start()` internally. -As you can see in the Start call we issued, we have requested specific values for the outer dimensions (e.g. `var = 'tas'` or `sdate = '19930101'`), but vectors of multiple values, numeric indices, or keywords can be used. For example, `var = c('tas', 'tos')`, `sdate = 1:5` or `sdate = 'all'`. See the documentation on the Start function on GitLab (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information. +As you can see in the issued `Start()` call, we have requested specific values for the outer dimensions (e.g. `var = 'tas'` or `sdate = '19930101'`), but vectors of multiple values, numeric indices, or keywords can be used. For example, `var = c('tas', 'tos')`, `sdate = 1:5` or `sdate = 'all'`. 
See the documentation on the `Start()` function (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information. ## Step() and AddStep() @@ -175,7 +179,9 @@ fun <- function(x) { } ``` -Then, the startR Step for this operation can be defined with the function `Step`, which required for a proper functioning to specify the names of the dimensions of the input arrays expected by the function (in this example, a single array with the dimensions 'ensemble' and 'time'), as well as the names of the dimensions the function returns: +This function receives only one multidimensional array (with dimensions c('ensemble' and 'time'), although not expressed in the code), and returns one multidimensional array (with a single dimension c('time') of length 1). + +Having the function, the startR Step for this operation can be defined with the function `Step()` which requires, for a proper functioning, to specify the names of the dimensions of the input arrays expected by the function (in this example, a single array with the dimensions 'ensemble' and 'time'), as well as the names of the dimensions of the arrays the function returns: ```r step <- Step(fun = fun, @@ -189,19 +195,21 @@ Finally, a workflow of steps can be assembled as follows: wf <- AddStep(data, step) ``` -If multiple data sources were to be provided to a step, they could be provided as a list. +Functions that receive or return multiple multidimensional arrays are also supported by specifying lists of vectors of dimension names as `target_dims` or `output_dims`. It is not possible for now to define workflows with more than one step. This is pending future work. -what about defining library(blabla) in the code of the function? how to deal with that? - +Since functions wrapped with the `Step()` function will potentially be called thousands of times, it is recommended to keep them as light as possible by, for example, avoiding calls to the `library()` function to load other packages. -## Compute() locally +## Compute() Once the data sources are declared and the workflow is defined, we can proceed to specify the execution parameters (including which platform to run on) and trigger the execution. -required ecFlow? -required CDO? +Next, a few examples show examples of StartR codes to process datasets locally and on two example HPCs at BSC: the fat nodes and CTE-Power. + +### Compute() locally + +When only your own workstation is available, StartR can still be useful to process a very large dataset by chunks, avoiding a RAM memory overload and crash of the workstation. StartR will simply load the dataset by chunks and each of them will be processed sequentially. The operations defined in the workflow will be applied to each chunk, and the results will be stored on a temporary file. `Compute()` will finally gather and merge the results of each chunk and return a single data object, including one or multiple multidimensional data arrays, and additional metadata. 
```r res <- Compute(wf, @@ -209,25 +217,6 @@ res <- Compute(wf, longitude = 2), threads_load = 1, threads_compute = 2, - #cluster = list(queue_host = 'p9login1.bsc.es', - # queue_type = 'slurm', - # data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/gpfs/archive/bsc32/', - # temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_tests/', - # lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/', - # #init_commands = list('module load intel/16.0.1'), - # r_module = 'R/3.5.0', - # #ecflow_module = 'ecFlow/4.9.0-foss-2015a', - # #node_memory = NULL, #not working - # cores_per_job = 2, - # job_wallclock = '00:10:00', - # max_jobs = 4, - # extra_queue_params = list('#SBATCH --qos=bsc_es'), - # bidirectional = FALSE, - # polling_period = 10#, - # #special_setup = 'marenostrum4' - # ), - #ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/', - #ecflow_server = NULL, silent = FALSE, debug = FALSE, wait = FALSE) @@ -239,26 +228,29 @@ discuss ecFlow discuss plotProfiling -discuss use of metadata (dates) in the Step - -summary of all code done so far: - -## Compute() on HPC +### Compute() on the fat nodes -setup steps: - -having startR installed on workstation and HPC (done) -having Step dependencies on HPC -having passwordless connection (how to?) -having rsync, ssh, ... on all machines -ecflow?? -having the data: -- either on a shared file system -- either on remote file systems (rsync) -- either on remote file systems (with special transfer mechanism, mn4) -not required to ssh manually to the HPC +```r +res <- Compute(wf, + chunks = list(latitude = 2, + longitude = 2), + threads_load = 1, + threads_compute = 2, + cluster = list(queue_host = 'bsceslogin01.bsc.es', + queue_type = 'slurm', + temp_dir = '/home/Earth/nmanuben/startR_tests/', + r_module = 'R/3.2.0', + cores_per_job = 2, + job_wallclock = '00:10:00', + max_jobs = 4 + ), + ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/', + silent = FALSE, + debug = FALSE, + wait = FALSE) +``` -example on power9 +### Compute() on CTE-Power ```r library(startR) @@ -312,54 +304,19 @@ res <- Compute(wf, wait = TRUE) ``` -## Example using obs data / or more than one data source - -```r -crps <- function(x, y) { - mean(SpecsVerification::EnsCrps(x, y, R.new = Inf)) -} - -library(startR) - -repos <- '/perm/ms/spesiccf/c3ah/qa4seas/data/seasonal/g1x1/ecmf-system4/msmm/atmos/seas/tprate/12/ecmf-system4_msmm_atmos_seas_sfc_$date$_tprate_g1x1_init12.nc' +## Additional information -data <- Start(dat = repos, - var = 'tprate', - date = 'all', - time = 'all', - number = 'all', - latitude = 'all', - longitude = 'all', - return_vars = list(time = 'date')) +### Tricks and best practices -dates <- attr(data, 'Variables')$common$time +How to select number of chunks -repos <- '/perm/ms/spesiccf/c3ah/qa4seas/data/ecmf-ei_msmm_atmos_seas_sfc_19910101-20161201_t2m_g1x1_init02.nc' +What to do if my function requires all dimensions -obs <- Start(dat = repos, - var = 't2m', - time = values(dates), - latitude = 'all', - longitude = 'all', - split_multiselected_dims = TRUE) +### Pending features -s <- Step(crps, target_dims = list(c('date', 'number'), c('date')), - output_dims = NULL) -wf <- AddStep(list(data, obs), s) +Computation of weekly means with startR is still pending future work. By now, it is not possible to do that because the metadata associated to each chunk, such as the dates, is not being sent to the `Compute()` function. 
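Regarding the "How to select number of chunks" point left as a stub above, a rough and unofficial rule of thumb is that the total size reported by `Start()`, divided by the number of chunks, should fit comfortably in the memory available to a single job; the figures below are only illustrative.

```r
# Illustrative back-of-the-envelope check (numbers are made up for the example)
total_size_gb <- 132.9                  # total size reported by Start() for the example request
n_chunks <- 4 * 2                       # e.g. chunks = list(latitude = 4, longitude = 2)
total_size_gb / n_chunks                # ~16.6 Gb per chunk: increase the number of chunks
                                        # if this exceeds the memory available to one job
```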
-r <- Compute(wf, - chunks = list(latitude = 10, - longitude = 3), - cluster = list(queue_host = 'cca', - queue_type = 'pbs', - max_jobs = 10, - init_commands = list('module load ecflow'), - r_module = 'R/3.3.1', - extra_queue_params = list('#PBS -l EC_billing_account=spesiccf')), - ecflow_output_dir = '/perm/ms/spesiccf/c3ah/startR_test/', - is_ecflow_output_dir_shared = FALSE - ) -``` +### Example using experimental and (date-corresponding) observational data ```r repos <- paste0('/esnas/exp/ecmwf/system4_m1/6hourly/', @@ -421,22 +378,23 @@ res <- Compute(step, list(system4, erai), wait = FALSE) ``` -## Example on marenostrum 4 +## Example on MareNostrum 4 ```r library(startR) +repos <- paste0('/esarchive/exp/ecmwf/system5_m1/6hourly/', + '$var$-longitudeS1latitudeS1all/$var$_$sdate$.nc') +# Slower alternative, using files with a less efficient NetCDF +# compression configuration #repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc' -repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$-longitudeS1latitudeS1all/$var$_$sdate$.nc' + data <- Start(dat = repos, var = 'tas', - #sdate = 'all', sdate = indices(1), ensemble = 'all', time = 'all', - #latitude = 'all', latitude = indices(1:40), - #longitude = 'all', longitude = indices(1:40), retrieve = FALSE) lons <- attr(data, 'Variables')$common$longitude @@ -456,10 +414,7 @@ res <- Compute(wf, data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/', temp_dir = '/gpfs/scratch/pr1efe00/pr1efe03/startR_tests/', lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.4/', - #init_commands = list('module load netcdf/4.4.1.1'), r_module = 'R/3.4.0', - #ecflow_module = 'ecFlow/4.9.0-foss-2015a', - #node_memory = NULL, #not working cores_per_job = 2, job_wallclock = '00:10:00', max_jobs = 4, @@ -475,4 +430,90 @@ res <- Compute(wf, wait = TRUE) ``` -## Example on cca +## Seasonal forecast verification example on cca + +```r +crps <- function(x, y) { + mean(SpecsVerification::EnsCrps(x, y, R.new = Inf)) +} + +library(startR) + +repos <- '/perm/ms/spesiccf/c3ah/qa4seas/data/seasonal/g1x1/ecmf-system4/msmm/atmos/seas/tprate/12/ecmf-system4_msmm_atmos_seas_sfc_$date$_tprate_g1x1_init12.nc' + +data <- Start(dat = repos, + var = 'tprate', + date = 'all', + time = 'all', + number = 'all', + latitude = 'all', + longitude = 'all', + return_vars = list(time = 'date')) + +dates <- attr(data, 'Variables')$common$time + +repos <- '/perm/ms/spesiccf/c3ah/qa4seas/data/ecmf-ei_msmm_atmos_seas_sfc_19910101-20161201_t2m_g1x1_init02.nc' + +obs <- Start(dat = repos, + var = 't2m', + time = values(dates), + latitude = 'all', + longitude = 'all', + split_multiselected_dims = TRUE) + +s <- Step(crps, target_dims = list(c('date', 'number'), c('date')), + output_dims = NULL) +wf <- AddStep(list(data, obs), s) + +r <- Compute(wf, + chunks = list(latitude = 10, + longitude = 3), + cluster = list(queue_host = 'cca', + queue_type = 'pbs', + max_jobs = 10, + init_commands = list('module load ecflow'), + r_module = 'R/3.3.1', + extra_queue_params = list('#PBS -l EC_billing_account=spesiccf')), + ecflow_output_dir = '/perm/ms/spesiccf/c3ah/startR_test/', + is_ecflow_output_dir_shared = FALSE + ) +``` + +## Compute() cluster template for Nord III + +```r +cluster = list(queue_host = 'nord1.bsc.es', + queue_type = 'lsf', + data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/', + temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_tests/', + lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.3/', + init_commands = list('module load intel/16.0.1'), + r_module = 'R/3.3.0', + 
cores_per_job = 2, + job_wallclock = '00:10', + max_jobs = 4, + extra_queue_params = list('#BSUB -q bsc_es'), + bidirectional = FALSE, + polling_period = 10, + special_setup = 'marenostrum4' + ) +``` + +## Compute() cluster template for MinoTauro + +```r +cluster = list(queue_host = 'mt1.bsc.es', + queue_type = 'slurm', + data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/', + temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_tests/', + lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.3/', + r_module = 'R/3.3.3', + cores_per_job = 2, + job_wallclock = '00:10:00', + max_jobs = 4, + extra_queue_params = list('#SBATCH --qos=bsc_es'), + bidirectional = FALSE, + polling_period = 10, + special_setup = 'marenostrum4' + ) +``` -- GitLab From 343829635dfca22aeb5885a60e6280c52497773e Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Fri, 11 Jan 2019 19:14:11 +0100 Subject: [PATCH 06/20] Small progress on practical guide. --- inst/doc/practical_guide_bsc.md | 36 +++++++++++++++++---------------- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/inst/doc/practical_guide_bsc.md b/inst/doc/practical_guide_bsc.md index 969bf87..ea64da7 100644 --- a/inst/doc/practical_guide_bsc.md +++ b/inst/doc/practical_guide_bsc.md @@ -7,7 +7,11 @@ In order to do so, you will need to understand and use four functions, all of th - **Step()** and **AddStep()**, for specifying the operation to be applied to the data - **Compute()**, for specifying the HPC to be employed, the number of chunks and cores, and to trigger the computation -But, in first place, you must follow the deployment steps to make sure startR will work on your workstation with the HPC of your choice, and follow some tricks for a better experience. +But, in first place, you should understand why and when StartR can be helpful, and then, once you are decided to use startR, you must follow the deployment steps to make sure startR will work on your workstation with the HPC of your choice, and follow some tricks for a better experience. + +## Motivation + + ## Deployment at BSC @@ -205,7 +209,7 @@ Since functions wrapped with the `Step()` function will potentially be called th Once the data sources are declared and the workflow is defined, we can proceed to specify the execution parameters (including which platform to run on) and trigger the execution. -Next, a few examples show examples of StartR codes to process datasets locally and on two example HPCs at BSC: the fat nodes and CTE-Power. +Next, a few examples show StartR codes to process datasets locally and on two example HPCs at BSC: the fat nodes and CTE-Power. 
### Compute() locally @@ -239,15 +243,12 @@ res <- Compute(wf, cluster = list(queue_host = 'bsceslogin01.bsc.es', queue_type = 'slurm', temp_dir = '/home/Earth/nmanuben/startR_tests/', - r_module = 'R/3.2.0', cores_per_job = 2, job_wallclock = '00:10:00', - max_jobs = 4 + max_jobs = 4, + bidirectional = TRUE ), - ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/', - silent = FALSE, - debug = FALSE, - wait = FALSE) + ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/') ``` ### Compute() on CTE-Power @@ -282,26 +283,23 @@ res <- Compute(wf, threads_compute = 2, cluster = list(queue_host = 'p9login1.bsc.es', queue_type = 'slurm', - data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/gpfs/archive/bsc32/', temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_tests/', lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/', - #init_commands = list('module load intel/16.0.1'), - r_module = 'R/3.5.0-foss-2018b', - #ecflow_module = 'ecFlow/4.9.0-foss-2015a', - #node_memory = NULL, #not working + r_module = 'R/3.5.0', cores_per_job = 2, job_wallclock = '00:10:00', max_jobs = 4, - extra_queue_params = list('#SBATCH --qos=bsc_es'), + #extra_queue_params = list('#SBATCH --qos=bsc_es'), bidirectional = FALSE, - polling_period = 10#, - #special_setup = 'marenostrum4' + polling_period = 10 ), ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/', ecflow_server = NULL, silent = FALSE, debug = FALSE, - wait = TRUE) + wait = FALSE) + +result <- Collect(res, wait = TRUE) ``` ## Additional information @@ -517,3 +515,7 @@ cluster = list(queue_host = 'mt1.bsc.es', special_setup = 'marenostrum4' ) ``` + +## Example on CTE-Power using GPUs + + -- GitLab From c5c56c07cf74a27922938899655132fabd1996c1 Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Fri, 25 Jan 2019 22:09:40 +0100 Subject: [PATCH 07/20] Progress on practical guide. --- inst/doc/practical_guide_bsc.md | 164 ++++++++++++++++++++++---------- 1 file changed, 116 insertions(+), 48 deletions(-) diff --git a/inst/doc/practical_guide_bsc.md b/inst/doc/practical_guide_bsc.md index ea64da7..2655d8a 100644 --- a/inst/doc/practical_guide_bsc.md +++ b/inst/doc/practical_guide_bsc.md @@ -1,33 +1,44 @@ -# Practical guide for using startR at BSC +# Practical guide for processing large data sets at BSC's HPCs using startR -In this guide some practical examples are shown for you to see how to use startR to process large data sets in parallel on your workstation at the BSC ES or on the BSC's HPCs. +In this guide some practical examples are shown for you to see how to use startR to process large data sets in parallel on the BSC's HPCs (CTE-Power 9, Marenostrum 4, ...). See the main page of the [**startR**](README.md) project for a general overview of the features of startR, without actual guidance on how to use it. -In order to do so, you will need to understand and use four functions, all of them included in the startR package: - - **Start()**, for declaing the data sets to process - - **Step()** and **AddStep()**, for specifying the operation to be applied to the data - - **Compute()**, for specifying the HPC to be employed, the number of chunks and cores, and to trigger the computation +With the constant increase of resolution (in all possible dimensions) of the weather and climate model outputs, and with the need for using computationally demanding analytical methodologies (e.g. bootstraping millions of times), it is becoming difficult or impossible to perform the analysis of such outputs with conventional tools. 
While tools exist to process large geospatial data sets on HPCs, they usually require adapting your data to specific formats, migrating to specific database systems, or require advanced knowledge of computer sciences or of a specific programming language or framework. -But, in first place, you should understand why and when StartR can be helpful, and then, once you are decided to use startR, you must follow the deployment steps to make sure startR will work on your workstation with the HPC of your choice, and follow some tricks for a better experience. +startR allows the R user to apply user-defined functions or procedures to large (as large as desired) collections of NetCDF files (no specific convention is required), transparently using computational resources in HPCs (multi-core, multi-node clusters) to minimize the time to solution. Although startR can be difficult to use if learnt from the documentation of its functions, it can also be used effortlessly if re-using and tweaking already existing startR scripts, like the ones provided later in this guide. startR scripts are written in R, and are short (usually under 30 lines of code), concise, and easy to read. -## Motivation +Other things you can expect to do with startR: +- Combining data from multiple model executions or observational data sources. +- Extracting values for a certain time period, geographical location, etc., from a collection of NetCDF files. +- Obtaining data arrays with results of analytical procedures that are to be plotted or stored as RData or NetCDF for later use in the analysis workflow. +- Applying a set of analytical procedures to the same data. +Things that are not supposed to be done with startR: +- Curating/homogenizing model output files or generating files to be stored under /esarchive following the department/community conventions. +If startR is suitable for your use case, you will then need to follow the configuration steps listed in the first section of this guide to make sure startR works on your workstation with the HPC of your choice. -## Deployment at BSC +Afterwards, you will need to understand and use six functions, all of them included in the startR package: + - **Start()**, for declaing the data sets to process + - **Step()** and **AddStep()**, for specifying the operation to be applied to the data + - **Compute()**, for specifying the HPC to be employed, the number of chunks and cores, and to trigger the computation + - **Collect()** and the **EC-Flow graphical user interface**, for monitoring of the progress and collection of results -The full deployment steps are detailed in the [**Deployment**](inst/doc/deployment.md) section. However at BSC you do not need to follow all of them since all dependencies are already installed for you. -You only need to set up passwordless, userless access from your machine to the HPC login node, and from the HPC login node to your machine if at all possible. To establish the connection in one of the directions, you can do the following: +## Configuring startR + +At BSC, the only configuration step you need to follow is to set up passwordless connection with the HPC. You do not need to follow the complete deployment steps since all dependencies are already installed for you to use, but you can find them under the [**Deployment**](inst/doc/deployment.md) section. + +Specifically, you need to set up passwordless, userless access from your machine to the HPC login node, and from the HPC login node to your machine if at all possible. 
In order to establish the connection in one of the directions, you need do the following: 1- generate an ssh pair of keys on the origin host if you do not have one, using `ssh-keygen -t rsa` -2- ssh to the destionation node and create a directory where to store it, using `ssh username@hostname_or_ip mkdir -p .ssh` +2- ssh to the destionation node and create a directory where to store it, using `ssh username@hostname_or_ip mkdir -p .ssh`. 'hostname_or_ip' refers to the host name or IP address of the login node of the selected HPC, and 'username' to your account name on the HPC, which may not coincide with the one in your workstation. 3- dump your public key on a new file under that folder, using `cat .ssh/id_rsa.pub | ssh username@hostname_or_ip 'cat >> .ssh/authorized_keys'` -4- adjust the permissions, using `ssh username@hostname_or_ip "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"` +4- adjust the permissions of the key repository, using `ssh username@hostname_or_ip "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"` -5- if your username is different on your workstation and on the HPC login node, add an entry in the file .ssh/config in your workstation as follows: +5- if your username is different on your workstation and on the login node of the HPC, add an entry in the file .ssh/config in your workstation as follows: ``` Host short_name_of_the_host HostName hostname_or_ip @@ -35,9 +46,9 @@ You only need to set up passwordless, userless access from your machine to the H IdentityFile ~/.ssh/id_rsa ``` -If you have followed these steps for the connections in the two directions (from HPC to workstation might not be possible), you are almost good to go. +After following these steps for the connections in both directions (although from the HPC to the workstation might not be possible), you are good to go. -Do not forget adding the following lines on your .bashrc on CTE-Power, if you are planning to run on CTE-Power: +Do not forget adding the following lines in your .bashrc on CTE-Power if you are planning to run on CTE-Power: ``` if [[ $BSC_MACHINE == "power" ]] ; then module unuse /apps/modules/modulefiles/applications @@ -45,23 +56,32 @@ if [[ $BSC_MACHINE == "power" ]] ; then fi ``` -Also, you can add the following lines on your .bashrc on your workstation for convenience: +You can add the following lines in your .bashrc file on your workstation for convenience: ``` alias ctp='ssh -X username@hostname_or_ip' alias start='module load R CDO ecFlow' ``` -Then, when you open a new terminal session, you will just need to run the following commands and a fresh R session will pop up with the startR environment ready to use. +## Using startR + +If you have successfully gone through the configuration steps, you will just need to run the following commands in a terminal session and a fresh R session will pop up with the startR environment ready to use. + ``` start R ``` -## Start() +The library can be loaded as follows: -In order to declare the data sets you want to process, you first need to build a special path that shows where all the involved NetCDF files you want to process are stored, containing some wildcards in those parts of the path that vary across files. This special path is also called "path pattern". +```R +library(startR) +``` + +### Start() + +In order for startR to recognize the data sets you want to process, you first need to declare them. 
The first step in the declaration of a data set is to build a special path string that encodes where all the involved NetCDF files to be processed are stored. It contains some wildcards in those parts of the path that vary across files. This special path string is also called "path pattern". -Before defining an example path pattern, let's introduce some target NetCDF files. In esarchive, we can find the following files: +Before defining an example path pattern, let's introduce some target NetCDF files. In the esarchive, we can find the following files: ``` /esarchive/exp/ecmwf/system5_m1/6hourly/ @@ -77,25 +97,33 @@ Before defining an example path pattern, let's introduce some target NetCDF file |--tos_20171201.nc ``` -A path pattern that could be used to define the location of these files in a compact way is the following: +A path pattern that can be used to encode the location of these files in a compact way is the following: ```r repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc' ``` -The wildcards used (the pieces wrapped between '$' symbols) can be given any names you like, they do not necessarily need to be 'var' or 'sdate' or match any other keyword. +The wildcards used (the pieces wrapped between '$' symbols) can be given any names you like. They do not necessarily need to be 'var' or 'sdate' or match any specific key word (although in this case, as explained later, the 'var' name will trigger a special feature of `Start()`). -Once the path pattern is specified, a `Start()` call can be built, which requests the values of interest for each of the wildcards (also called outer dimensions), as well as for each of the dimensions inside the NetCDF files (inner dimensions). +Once the path pattern is specified, a `Start()` call can be built, in which you need to provide, as parameters, the specific values of interest for each of the wildcards (also called outer dimensions, or file dimensions), as well as for each of the dimensions inside the NetCDF files (inner dimensions). -You can check in advance which dimensions are inside the NetCDF files by using the basic NetCDF tools: +You can check in advance which dimensions are inside the NetCDF files by using e.g. easyNCDF on one of the files: +```r +easyNCDF::NcReadDims('/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc', + var_names = 'tas') ``` -ncdump -h /esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc + +This will show the names and lengths of the dimensions of the selected variable: + +```r + var longitude latitude ensemble time + 1 1296 640 25 860 ``` -This would reveal the following inner dimensions: 'ensemble', 'time', 'latitude', and 'longitude'. +*Note: If you check the dimensions of that file with `ncdump -h`, you will realize the 'var' dimension is actually not defined inside. `NcReadDims()` and `Start()`, though, perceive the different variables inside a file as if stored along a virtual dimension called 'var'. You can ignore this for now and assume 'var' is simply a file dimension (since it appears as a wildcard in the path pattern). 
Read more on this in the note at the end of this section.* -We now have all the information we need to put the `Start()` call together: +Once we know the dimension names, we have all the information we need to put the `Start()` call together: ```r data <- Start(dat = repos, @@ -109,7 +137,16 @@ data <- Start(dat = repos, longitude = 'all') ``` -This will display some progress and information messages: +For each of the dimensions, the values of interest can be specified in three possible ways: +- Using one or more numeric indices, for example `sdate = indices(1)`, or `time = indices(1, 3, 5)`. +- Using one or more actual values, for example `sdate = values('19930101')`, or `ensemble = values(c('r1i1p1', 'r2i1p1'))`, or `latitude = values(10, 10.5, 11)`. The `values()` helper function can be omitted (as shown in the example). +- Using a list of two numeric values, for example `sdate = indices(list(5, 10))`. This will take all indices from the 5th to the 10th. +- Using a list of two actual values, for example `sdate = values(list('r1i1p1', 'r5i1p1'))` or `latitude = values(list(-45, 75))`. This will take all values, in order, placed between the two values specified (both ends included). +- Using the special keywords 'all', 'first' or 'last'. + +Also, the dimensions specified in the `Start()` call do not need to follow any specific order, not even the actual order in the path pattern or inside the file. The order, though, can have an impact on the performance of `Start()` as explained later in this section. + +Running the `Start()` call shown above will display some progress and information messages: ```r * Exploring files... This will take a variable amount of time depending @@ -158,19 +195,48 @@ Class 'startR_header' length 9 Start(dat = "/esarchive/exp/ecmwf/system5_m1/6hou ..- attr(*, "PatternDim")= chr "dat" ``` -There are no constrains for the numer or names of the outer or inner dimensions. In other words, `Start()` will handle NetCDF files with any number of dimensions with any name, as well as files distributed across folders in complex ways, since you can use customized wildcards in the path pattern. +The retrieved information can be accessed with the `attr()` function. For example: + +```r +attr(data, 'FileSelectors')$dat1 +``` -If you are interested in actually loading the entire data set in your machine you can do so in two ways (*be careful, doing so with the `Start()` call used in this example will most likely stall your machine. Try it with a smaller region or a subset of forecast time steps*): +If you are interested in actually loading the entire data set in your machine you can do so in two ways (*be careful, loading the data involved in the `Start()` call in this example will most likely stall your machine. Try it with a smaller region or a subset of forecast time steps*): - adding the parameter `retrieve = TRUE` in your `Start()` call. - evaluating the object returned by `Start()`: `data_load <- eval(data)` You may realize that this functionality is similar to the `Load()` function in the s2dverification package. In fact, `Start()` is more advanced and flexible, although `Load()` is more mature and consistent for loading typical seasonal to decadal forecasting data. `Load()` will be adapted in the future to use `Start()` internally. -As you can see in the issued `Start()` call, we have requested specific values for the outer dimensions (e.g. `var = 'tas'` or `sdate = '19930101'`), but vectors of multiple values, numeric indices, or keywords can be used. 
For example, `var = c('tas', 'tos')`, `sdate = 1:5` or `sdate = 'all'`. See the documentation on the `Start()` function (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information. +There are no constrains for the number or names of the outer or inner dimensions used in a `Start()` call. In other words, `Start()` will handle NetCDF files with any number of dimensions with any name, as well as files distributed across folders in complex ways, since you can use customized wildcards in the path pattern. + +Explanation on the order of dimensions. -## Step() and AddStep() +Synonims. -Once the data sources are declared, we can define the operation to be applied. The operation needs to be encapsulated in the form of an R function receiving one or more multidimensional arrays (plus additional helper parameters) and returning one or more multidimensional arrays. For example: +Dimensions across. + +See the documentation on the `Start()` function (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information. + +*Note on the 'var' dimension*: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array.* + +```r +repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var_out$/$var_out$_$sdate$.nc' + +data <- Start(dat = repos, + # outer dimensions + var_out = 'tas', + sdate = '19930101', + # inner dimensions + var = 'tas', + ensemble = 'all', + time = 'all', + latitude = 'all', + longitude = 'all') +``` + +### Step() and AddStep() + +Once the data sources are declared, you can define the operation to be applied. The operation needs to be encapsulated in the form of an R function receiving one or more multidimensional arrays (plus additional helper parameters) and returning one or more multidimensional arrays. For example: ```r fun <- function(x) { @@ -203,15 +269,15 @@ Functions that receive or return multiple multidimensional arrays are also suppo It is not possible for now to define workflows with more than one step. This is pending future work. 
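As an illustration of the support for multiple inputs mentioned above, the following sketch wraps a function that combines an experimental and an observational data source. It is only a sketch: `exp` and `obs` are assumed to be two data sets previously declared with `Start()`, and the function body and dimension names are illustrative.

```r
# Sketch of a step receiving two arrays (assumed inputs: 'exp' with
# dimensions 'ensemble' and 'time', and 'obs' with dimension 'time').
bias_fun <- function(exp, obs) {
  # Ensemble mean of the experiment minus the observed value, per time step
  apply(exp, 2, mean) - obs
}

bias_step <- Step(bias_fun,
                  target_dims = list(c('ensemble', 'time'), c('time')),
                  output_dims = c('time'))

# The data sources are provided as a list, in the same order as the
# parameters of the function
wf_bias <- AddStep(list(exp, obs), bias_step)
```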
-Since functions wrapped with the `Step()` function will potentially be called thousands of times, it is recommended to keep them as light as possible by, for example, avoiding calls to the `library()` function to load other packages. +Since functions wrapped with the `Step()` function will potentially be called thousands of times, it is recommended to keep them as light as possible by, for example, avoiding calls to the `library()` function to load other packages or interacting with files on disk. See the documentation on the parameter `use_libraries` of the `Step()` function, or consider adding additional parameters to the step function with extra information. -## Compute() +### Compute() -Once the data sources are declared and the workflow is defined, we can proceed to specify the execution parameters (including which platform to run on) and trigger the execution. +Once the data sources are declared and the workflow is defined, you can proceed to specify the execution parameters (including which platform to run on) and trigger the execution. Next, a few examples show StartR codes to process datasets locally and on two example HPCs at BSC: the fat nodes and CTE-Power. -### Compute() locally +#### Compute() locally When only your own workstation is available, StartR can still be useful to process a very large dataset by chunks, avoiding a RAM memory overload and crash of the workstation. StartR will simply load the dataset by chunks and each of them will be processed sequentially. The operations defined in the workflow will be applied to each chunk, and the results will be stored on a temporary file. `Compute()` will finally gather and merge the results of each chunk and return a single data object, including one or multiple multidimensional data arrays, and additional metadata. @@ -232,7 +298,7 @@ discuss ecFlow discuss plotProfiling -### Compute() on the fat nodes +#### Compute() on the fat nodes ```r res <- Compute(wf, @@ -251,7 +317,9 @@ res <- Compute(wf, ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/') ``` -### Compute() on CTE-Power +'queue_host' must match the 'short_name_of_the_host' you associated to the login node of the selected HPC in your .ssh/config file. + +#### Compute() on CTE-Power ```r library(startR) @@ -302,19 +370,19 @@ res <- Compute(wf, result <- Collect(res, wait = TRUE) ``` -## Additional information +### Additional information -### Tricks and best practices +#### Tricks and best practices How to select number of chunks What to do if my function requires all dimensions -### Pending features +#### Pending features Computation of weekly means with startR is still pending future work. By now, it is not possible to do that because the metadata associated to each chunk, such as the dates, is not being sent to the `Compute()` function. 
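Regarding the placeholder above on how to select the number of chunks, a rough starting point is to make sure each chunk fits comfortably in the RAM available to a job (ideally with 2x or 3x as much free RAM as the size of one chunk). The size of one chunk is approximately the total size reported by `Start()` divided by the product of the requested numbers of chunks. The helper below is plain R arithmetic, not part of startR, and the figures are taken from the `Start()` example earlier in this guide.

```r
# Rough estimate of the size of one chunk: total size reported by Start()
# divided by the product of the number of chunks along each dimension.
estimate_chunk_size_gb <- function(total_size_gb, chunks) {
  total_size_gb / prod(unlist(chunks))
}

# For the 132.9 Gb data set declared above, chunking in 10 x 3 pieces along
# latitude and longitude gives chunks of roughly 4.4 Gb each.
estimate_chunk_size_gb(132.9, list(latitude = 10, longitude = 3))
```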
-### Example using experimental and (date-corresponding) observational data +#### Example using experimental and (date-corresponding) observational data ```r repos <- paste0('/esnas/exp/ecmwf/system4_m1/6hourly/', @@ -376,7 +444,7 @@ res <- Compute(step, list(system4, erai), wait = FALSE) ``` -## Example on MareNostrum 4 +### Example on MareNostrum 4 ```r library(startR) @@ -477,7 +545,7 @@ r <- Compute(wf, ) ``` -## Compute() cluster template for Nord III +### Compute() cluster template for Nord III ```r cluster = list(queue_host = 'nord1.bsc.es', @@ -497,7 +565,7 @@ cluster = list(queue_host = 'nord1.bsc.es', ) ``` -## Compute() cluster template for MinoTauro +### Compute() cluster template for MinoTauro ```r cluster = list(queue_host = 'mt1.bsc.es', @@ -516,6 +584,6 @@ cluster = list(queue_host = 'mt1.bsc.es', ) ``` -## Example on CTE-Power using GPUs +### Example on CTE-Power using GPUs -- GitLab From a6200dbc755cc1c0e52edee49bdaafab586ae940 Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Mon, 28 Jan 2019 01:47:58 +0100 Subject: [PATCH 08/20] Progress on practical guide. --- inst/doc/practical_guide_bsc.md | 620 ++++++++++++++++++++++++++------ 1 file changed, 503 insertions(+), 117 deletions(-) diff --git a/inst/doc/practical_guide_bsc.md b/inst/doc/practical_guide_bsc.md index 2655d8a..9e5c4d4 100644 --- a/inst/doc/practical_guide_bsc.md +++ b/inst/doc/practical_guide_bsc.md @@ -1,42 +1,97 @@ -# Practical guide for processing large data sets at BSC's HPCs using startR +# Practical guide for processing large data sets at BSC using startR on HPCs -In this guide some practical examples are shown for you to see how to use startR to process large data sets in parallel on the BSC's HPCs (CTE-Power 9, Marenostrum 4, ...). See the main page of the [**startR**](README.md) project for a general overview of the features of startR, without actual guidance on how to use it. +This guide includes explanations and practical examples for you to learn how to use startR to efficiently process large data sets in parallel on the BSC's HPCs (CTE-Power 9, Marenostrum 4, ...). See the main page of the [**startR**](README.md) project for a general overview of the features of startR, without actual guidance on how to use it. -With the constant increase of resolution (in all possible dimensions) of the weather and climate model outputs, and with the need for using computationally demanding analytical methodologies (e.g. bootstraping millions of times), it is becoming difficult or impossible to perform the analysis of such outputs with conventional tools. While tools exist to process large geospatial data sets on HPCs, they usually require adapting your data to specific formats, migrating to specific database systems, or require advanced knowledge of computer sciences or of a specific programming language or framework. +If you would like to start using startR rightaway on the BSC infrastructure, you can directly go through the "Configuring startR" section, copy/paste the basic startR script example shown at the end of the "Motivation" section onto the text editor of your preference, adjust the paths and user names specified in the `Compute()` function, and run the code in an R session after loading the relevant modules. -startR allows the R user to apply user-defined functions or procedures to large (as large as desired) collections of NetCDF files (no specific convention is required), transparently using computational resources in HPCs (multi-core, multi-node clusters) to minimize the time to solution. 
Although startR can be difficult to use if learnt from the documentation of its functions, it can also be used effortlessly if re-using and tweaking already existing startR scripts, like the ones provided later in this guide. startR scripts are written in R, and are short (usually under 30 lines of code), concise, and easy to read. +## Motivation -Other things you can expect to do with startR: +What would you do if you had to apply a custom statistical analysis procedure to a 10TB climate data set? Probably, you would need to use a scripting language to write a procedure which is able to retrieve a subset of data from the file system (it would rarely be possible to handle all of it at once on a single node), encode the procedure in that language, and apply it carefully and efficiently to the data. Afterwards, you would need to think of and develop a mechanism to dispatch the job mutiple times in parallel to an HPC of your choice, each of the jobs processing a different subset of the data set. You could do this by hand, but ideally you would use EC-Flow or a similar general purpose workflow manager, which would orchestrate the work for you, and would allow you to monitor and control the progress, as well as keep an easy-to-understand record of what you did, to be reused in the future if needed. The mentioned solution, although it is the recommended way to go, is a demanding one and you could easily spend a few days until you get it running smoothly. Additionally, when developing the job script, you would be exposed to the difficulties of efficiently managing the data and applying the encoded procedure to it. + +With the constant increase of resolution (in all possible dimensions) of weather and climate model output, and with the growing need for using computationally demanding analytical methodologies (e.g. bootstraping with thousands of repetitions), this kind of divide-and-conquer approach is becoming indispensable. While tools exist to simplify and automate this complex procedure, they usually require adapting your data to specific formats, migrating to specific database systems, or an advanced knowledge of computer sciences or of specific programming languages or frameworks. + +startR is yet another tool which allows the R user to apply user-defined functions or procedures to collections of NetCDF files (see Note 1) as large as desired, transparently using computational resources in HPCs (see Note 2) to minimize the time to solution. Although it has been designed to require as few mandatory technical parameters as possible from the user, an experienced user can configure a number of additional parameters to adjust the execution. startR operates on, and provides as outputs, multidimensional arrays with named dimensions, a basic and widely used data structure in R, and this makes the framework more familiar to the general R user. + +Although startR can be difficult to use if learnt from the documentation of its functions, it can also be used with little effort if re-using and adapting already existing startR scripts, such as the ones provided later in this guide. startR scripts are written in R, and are short (usually under 50 lines of code), concise, and easy to read. + +Other things you can expect from startR: - Combining data from multiple model executions or observational data sources. - Extracting values for a certain time period, geographical location, etc., from a collection of NetCDF files. 
- Obtaining data arrays with results of analytical procedures that are to be plotted or stored as RData or NetCDF for later use in the analysis workflow. -- Applying a set of analytical procedures to the same data. Things that are not supposed to be done with startR: -- Curating/homogenizing model output files or generating files to be stored under /esarchive following the department/community conventions. +- Curating/homogenizing model output files or generating files to be stored under /esarchive following the department/community conventions. Although metadata is understood and used by startR, its handling is not 100% consistent yet. -If startR is suitable for your use case, you will then need to follow the configuration steps listed in the first section of this guide to make sure startR works on your workstation with the HPC of your choice. +If startR is suitable for your use case and decide to use it, you will then need to follow the configuration steps listed in the first section of this guide to make sure startR works on your workstation with the HPC of your choice. -Afterwards, you will need to understand and use six functions, all of them included in the startR package: - - **Start()**, for declaing the data sets to process +Afterwards, you will need to understand and use five functions, all of them included in the startR package: + - **Start()**, for declaing the data sets to be processed - **Step()** and **AddStep()**, for specifying the operation to be applied to the data - **Compute()**, for specifying the HPC to be employed, the number of chunks and cores, and to trigger the computation - **Collect()** and the **EC-Flow graphical user interface**, for monitoring of the progress and collection of results +Next, you can see an example of startR script performing an ensemble mean of a small data set on CTE-Power9, for you to get a broad picture of how the startR functions interact and the information that is represented in a startR script. Note that the `temp_dir` and `ecflow_suite_dir` parameters in the `Compute()` call are user-specific. + +```r +library(startR) + +repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc' +data <- Start(dat = repos, + var = 'tas', + sdate = '20180101', + ensemble = 'all', + time = 'all', + latitude = indices(1:40), + longitude = indices(1:40)) + +fun <- function(x) { + # Expected inputs: + # x: array with dimensions ('ensemble', 'time') + apply(x + 1, 2, mean) +} + +step <- Step(fun, + target_dims = c('ensemble', 'time'), + output_dims = c('time')) + +wf <- AddStep(data, step) + +res <- Compute(wf, + chunks = list(latitude = 2, + longitude = 2), + threads_load = 2, + threads_compute = 4, + cluster = list(queue_host = 'p9login1.bsc.es', + queue_type = 'slurm', + temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_hpc/', + lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/', + r_module = 'R/3.5.0-foss-2018b', + job_wallclock = '00:10:00', + cores_per_job = 4, + max_jobs = 4, + bidirectional = FALSE, + polling_period = 10 + ), + ecflow_suite_dir = '/home/Earth/nmanuben/startR_local/') +``` + +***Note 1****: The data files do not need to be migrated to a database system, nor have to comply any specific convention for their file names, name, number or order of the dimensions and variables, or distribution of files in the file system. 
Although only files in the NetCDF format are supported for now, plug-in functions (of approximately 200 lines of code) can be programmed to add support for additional file formats.* + +***Note 2****: The HPCs startR is designed to run on, are understood as multi-core multi-node clusters. startR relies on a shared file system across all HPC nodes, and does not implement any kind of distributed storage system for now.* ## Configuring startR -At BSC, the only configuration step you need to follow is to set up passwordless connection with the HPC. You do not need to follow the complete deployment steps since all dependencies are already installed for you to use, but you can find them under the [**Deployment**](inst/doc/deployment.md) section. +At BSC, the only configuration step you need to follow is to set up passwordless connection with the HPC. You do not need to follow the complete list of deployment steps since all dependencies are already installed for you to use, but you can find the steps listed under the [**Deployment**](inst/doc/deployment.md) section. Specifically, you need to set up passwordless, userless access from your machine to the HPC login node, and from the HPC login node to your machine if at all possible. In order to establish the connection in one of the directions, you need do the following: 1- generate an ssh pair of keys on the origin host if you do not have one, using `ssh-keygen -t rsa` -2- ssh to the destionation node and create a directory where to store it, using `ssh username@hostname_or_ip mkdir -p .ssh`. 'hostname_or_ip' refers to the host name or IP address of the login node of the selected HPC, and 'username' to your account name on the HPC, which may not coincide with the one in your workstation. +2- ssh to the destionation node and create a directory where to store the public key, using `ssh username@hostname_or_ip mkdir -p .ssh`. 'hostname_or_ip' refers to the host name or IP address of the login node of the selected HPC, and 'username' to your account name on the HPC, which may not coincide with the one in your workstation. 3- dump your public key on a new file under that folder, using `cat .ssh/id_rsa.pub | ssh username@hostname_or_ip 'cat >> .ssh/authorized_keys'` -4- adjust the permissions of the key repository, using `ssh username@hostname_or_ip "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"` +4- adjust the permissions of the repository of keys, using `ssh username@hostname_or_ip "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"` 5- if your username is different on your workstation and on the login node of the HPC, add an entry in the file .ssh/config in your workstation as follows: ``` @@ -58,7 +113,7 @@ fi You can add the following lines in your .bashrc file on your workstation for convenience: ``` -alias ctp='ssh -X username@hostname_or_ip' +alias ctp='ssh -XY username@hostname_or_ip' alias start='module load R CDO ecFlow' ``` @@ -73,7 +128,7 @@ R The library can be loaded as follows: -```R +```r library(startR) ``` @@ -121,7 +176,7 @@ This will show the names and lengths of the dimensions of the selected variable: 1 1296 640 25 860 ``` -*Note: If you check the dimensions of that file with `ncdump -h`, you will realize the 'var' dimension is actually not defined inside. `NcReadDims()` and `Start()`, though, perceive the different variables inside a file as if stored along a virtual dimension called 'var'. You can ignore this for now and assume 'var' is simply a file dimension (since it appears as a wildcard in the path pattern). 
Read more on this in the note at the end of this section.* +***Note****: If you check the dimensions of that file with `ncdump -h`, you will realize the 'var' dimension is actually not defined inside. `NcReadDims()` and `Start()`, though, perceive the different variables inside a file as if stored along a virtual dimension called 'var'. You can ignore this for now and assume 'var' is simply a file dimension (since it appears as a wildcard in the path pattern). Read more on this in Note 1 at the end of this section.* Once we know the dimension names, we have all the information we need to put the `Start()` call together: @@ -137,14 +192,14 @@ data <- Start(dat = repos, longitude = 'all') ``` -For each of the dimensions, the values of interest can be specified in three possible ways: -- Using one or more numeric indices, for example `sdate = indices(1)`, or `time = indices(1, 3, 5)`. -- Using one or more actual values, for example `sdate = values('19930101')`, or `ensemble = values(c('r1i1p1', 'r2i1p1'))`, or `latitude = values(10, 10.5, 11)`. The `values()` helper function can be omitted (as shown in the example). +For each of the dimensions, the values or indices of interest (a.k.a. selectors) can be specified in three possible ways: +- Using one or more numeric indices, for example `time = indices(c(1, 3, 5))`, or `sdate = indices(3:5)`. In the latter case, the third, fourht and fifth start dates appearing in the file system in alphabetical order would be selected ('19930301', '19930401' and '19930501'). +- Using one or more actual values, for example `sdate = values('19930101')`, or `ensemble = values(c('r1i1p1', 'r2i1p1'))`, or `latitude = values(c(10, 10.5, 11))`. The `values()` helper function can be omitted (as shown in the example). See Note 2 for details on how values are handled for inner dimensions. - Using a list of two numeric values, for example `sdate = indices(list(5, 10))`. This will take all indices from the 5th to the 10th. - Using a list of two actual values, for example `sdate = values(list('r1i1p1', 'r5i1p1'))` or `latitude = values(list(-45, 75))`. This will take all values, in order, placed between the two values specified (both ends included). - Using the special keywords 'all', 'first' or 'last'. -Also, the dimensions specified in the `Start()` call do not need to follow any specific order, not even the actual order in the path pattern or inside the file. The order, though, can have an impact on the performance of `Start()` as explained later in this section. +The dimensions specified in the `Start()` call do not need to follow any specific order, not even the actual order in the path pattern or inside the file. The order, though, can have an impact on the performance of `Start()` as explained in Note 3. Running the `Start()` call shown above will display some progress and information messages: @@ -172,10 +227,13 @@ Warning messages: The warnings shown are normal, and could be avoided with a more wordy specification of the parameters to the `Start()` function. -The dimensions of the selected data set and the total size are shown. As you have probably noticed, this `Start()` call is very fast, even though several GB of data are involved. This is because `Start()` is simply discovering the location and dimension of the involved data. +The dimensions of the selected data set and the total size are shown. As you have probably noticed, this `Start()` call is very fast, even though several GB of data are involved (see Note 4 on the size of the data in R). 
This is because `Start()` is simply discovering the location and dimension of the involved data. -You can give a quick look to the collected metadata with `str(data)`. +You can give a quick look to the collected metadata as follows: +```r +str(data) +``` ```r Class 'startR_header' length 9 Start(dat = "/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc", var = "tas", sdate = "19930101", ensemble = "all", time = "all", latitude = "all", ... ..- attr(*, "Dimensions")= Named num [1:7] 1 1 1 25 860 ... @@ -200,8 +258,23 @@ The retrieved information can be accessed with the `attr()` function. For exampl ```r attr(data, 'FileSelectors')$dat1 ``` +```r +$dat +$dat[[1]] +[1] "dat1" + + +$var +$var[[1]] +[1] "tas" + + +$sdate +$sdate[[1]] +[1] "19930101" +``` -If you are interested in actually loading the entire data set in your machine you can do so in two ways (*be careful, loading the data involved in the `Start()` call in this example will most likely stall your machine. Try it with a smaller region or a subset of forecast time steps*): +If you are interested in actually loading the entire data set in your machine you can do so in two ways (***be careful****, loading the data involved in the `Start()` call in this example will most likely stall your machine. Try it with a smaller region or a subset of forecast time steps*): - adding the parameter `retrieve = TRUE` in your `Start()` call. - evaluating the object returned by `Start()`: `data_load <- eval(data)` @@ -209,34 +282,19 @@ You may realize that this functionality is similar to the `Load()` function in t There are no constrains for the number or names of the outer or inner dimensions used in a `Start()` call. In other words, `Start()` will handle NetCDF files with any number of dimensions with any name, as well as files distributed across folders in complex ways, since you can use customized wildcards in the path pattern. -Explanation on the order of dimensions. - -Synonims. +There are a number of advanced parameters and features in `Start()` to handle heterogeneities across files involved in a `Start()` call, such as the `synonims` parameter, or to handle dimensions extending across multiple NetCDF files, such as the `*_across` parameter. See the documentation on the `Start()` function (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information. -Dimensions across. +***Note 1 ****on the 'var' dimension*: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. 
The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. -See the documentation on the `Start()` function (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information. +***Note 2 ****on providing values as selectors for inner dimensions*: when values are requested for a inner dimension, the corresponding numeric indices are automatically calculated by comparing the provided values with a variable inside the file with the same name as the dimension for which the values have been requested. In the last example where specific values are requested for the latitudes, the variable 'latitude' is automatically retrieved from the files. If the name of the variable does not coincide with the name of the dimension, the parameter `*_var` can be specified in the `Start()` call, as detailed in `?Start`. -*Note on the 'var' dimension*: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array.* +***Note 3 ****on the order of dimensions*: neither the file dimensions nor the inner dimensions need to be specified in the same order as they appear in the path pattern or inside the NetCDF files, respectively. The resulting arrays returned by `Start()` will have the dimensions in the same order as requested in `Start()`, so changing the order in the call can potentially trigger automatic reordering of data, which is time consuming. But, depending on the use case, it may be a good idea to ask for a specific dimension order so that the data is properly arranged for making posterior calculations more efficient. Remember that the order of the dimensions in R is "big endian"; the values are consecutive along the first (left-most) dimension. In contrast, the order of dimensions in NetCDF files is "little endian". 
This means that if you want to respect the order of the data values in memory as stored in the NetCDF files, you should request the dimensions in your `Start()` call in the opposite order. -```r -repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var_out$/$var_out$_$sdate$.nc' - -data <- Start(dat = repos, - # outer dimensions - var_out = 'tas', - sdate = '19930101', - # inner dimensions - var = 'tas', - ensemble = 'all', - time = 'all', - latitude = 'all', - longitude = 'all') -``` +***Note 4 ****on the size of the data in R*: if you check the size of the involved file in the example `Start()` call used above ('/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc'), you will realize it only weights 34GB. Why is the data reported to occupy 134GB then? This is due to two facts: by one side, NetCDF files are usually compressed, and their uncompressed size can be substantially greater. In this case, the uncompressed data would occupy about 72GB. Besides, the variable we are targetting in the example ('tas') is defined as a float variable inside the NetCDF file. This means each value is a 4-byte real number. However, R cannot represent 4-byte real numbers; it always takes 8 bytes to represent a real number. This is why, when float numbers are represented in R, the effective size of the data is doubled. In this case, the 72GB of uncompressed float numbers need to be represented using 132GB of RAM in R. ### Step() and AddStep() -Once the data sources are declared, you can define the operation to be applied. The operation needs to be encapsulated in the form of an R function receiving one or more multidimensional arrays (plus additional helper parameters) and returning one or more multidimensional arrays. For example: +Once the data sources are declared, you can define the operation to be applied to them. The operation needs to be encapsulated in the form of an R function receiving one or more multidimensional arrays with named dimensions (plus additional helper parameters) and returning one or more multidimensional arrays, which should also have named dimensions. For example: ```r fun <- function(x) { @@ -249,9 +307,9 @@ fun <- function(x) { } ``` -This function receives only one multidimensional array (with dimensions c('ensemble' and 'time'), although not expressed in the code), and returns one multidimensional array (with a single dimension c('time') of length 1). +As you see, the function only needs to operate on the essential dimensions, not on the whole set of dimensions of the data set. This example function receives only one multidimensional array (with dimensions 'ensemble' and 'time', although not expressed in the code), and returns one multidimensional array (with a single dimension 'time' of length 1). startR will automatically feed the function with subsets of data with only the essential dinmensions, but first, a startR "step" for the function has to be built with with the `Step()` function. 
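Before wrapping the function into a step, it can be convenient to check it on a small toy array that only carries the essential dimensions. This is an optional sanity check, not part of the workflow itself; the dimension sizes below are arbitrary.

```r
# Optional sanity check of the function defined above on a toy array with
# only the essential dimensions (arbitrary sizes).
toy <- array(rnorm(25 * 3), dim = c(ensemble = 25, time = 3))
fun(toy)  # returns one value per 'time' step
```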
-Having the function, the startR Step for this operation can be defined with the function `Step()` which requires, for a proper functioning, to specify the names of the dimensions of the input arrays expected by the function (in this example, a single array with the dimensions 'ensemble' and 'time'), as well as the names of the dimensions of the arrays the function returns: +The `Step()` function requires, as parameters, the names of the dimensions of the input arrays expected by the function (in this example, a single array with the dimensions 'ensemble' and 'time'), as well as the names of the dimensions of the arrays the function returns (in this example, a single array with the dimension 'time'): ```r step <- Step(fun = fun, @@ -259,130 +317,394 @@ step <- Step(fun = fun, output_dims = c('time')) ``` -Finally, a workflow of steps can be assembled as follows: +The step function should ideally expect arrays with the dimensions in the same order as requested in the `Start()` call, and consequently the dimension names specified in the `Step()` function should appear in the same order. If a different order was specified, startR would reorder the subsets for the step function to receive them in the expected dimension order. + +Functions that receive or return multiple multidimensional arrays are also supported. In such cases, lists of vectors of dimension names should be provided as `target_dims` or `output_dims`. + +Since functions wrapped with the `Step()` function will potentially be called thousands of times, it is recommended to keep them as light as possible by, for example, avoiding calls to the `library()` function to load other packages or interacting with files on disk. See the documentation on the parameter `use_libraries` of the `Step()` function, or consider adding additional parameters to the step function with extra information. + +Once the step is built, a workflow of steps can be assembled as follows: ```r wf <- AddStep(data, step) ``` -Functions that receive or return multiple multidimensional arrays are also supported by specifying lists of vectors of dimension names as `target_dims` or `output_dims`. - -It is not possible for now to define workflows with more than one step. This is pending future work. +If the step involved more than one data source, a list of data sources could be provided as first parameter. You can find examples using more than one data source further in this guide. -Since functions wrapped with the `Step()` function will potentially be called thousands of times, it is recommended to keep them as light as possible by, for example, avoiding calls to the `library()` function to load other packages or interacting with files on disk. See the documentation on the parameter `use_libraries` of the `Step()` function, or consider adding additional parameters to the step function with extra information. +It is not possible for now to define workflows with more than one step, but this is not a crucial gap since a step function can contain more than one statistical analysis procedure. Furthermore, it is usually enough to perform only the first or two first steps of the analysis workflow on the HPCs because, after these steps, the volume of data involved is reduced substantially and the analysis can go on with conventional methods. ### Compute() -Once the data sources are declared and the workflow is defined, you can proceed to specify the execution parameters (including which platform to run on) and trigger the execution. 
+Once the data sources are declared and the workflow is defined, you can proceed to specify the execution parameters (including which platform to run on) and trigger the execution with the `Compute()` function. + +Next, a few examples are shown with `Compute()` calls to trigger the processing of a dataset locally (only on the machine where the R session is running) and on two different HPCs (the Earth Sciences fat nodes and CTE-Power9). However, let's first define a `Start()` call that involves a smaller subset of data in order not to make the examples too heavy. + +```r +library(startR) + +repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc' +data <- Start(dat = repos, + var = 'tas', + sdate = '19930101', + ensemble = 'all', + time = indices(1:2), + latitude = 'all', + longitude = 'all') + +fun <- function(x) { + # Expected inputs: + # x: array with dimensions ('ensemble', 'time') + apply(x + 1, 2, mean) +} + +step <- Step(fun, + target_dims = c('ensemble', 'time'), + output_dims = c('time')) -Next, a few examples show StartR codes to process datasets locally and on two example HPCs at BSC: the fat nodes and CTE-Power. +wf <- AddStep(data, step) +``` + +The output of the `Start()` call follows, where the size of the selected data is reported. +```r +* Exploring files... This will take a variable amount of time depending +* on the issued request and the performance of the file server... +* Detected dimension sizes: +* dat: 1 +* var: 1 +* sdate: 1 +* ensemble: 25 +* time: 2 +* latitude: 640 +* longitude: 1296 +* Total size of involved data: +* 1 x 1 x 1 x 25 x 2 x 640 x 1296 x 8 bytes = 316.4 Mb +* Successfully discovered data dimensions. +Warning messages: +1: ! Warning: Parameter 'pattern_dims' not specified. Taking the first dimension, +! 'dat' as 'pattern_dims'. +2: ! Warning: Could not find any pattern dim with explicit data set descriptions (in +! the form of list of lists). Taking the first pattern dim, 'dat', as +! dimension with pattern specifications. +``` #### Compute() locally -When only your own workstation is available, StartR can still be useful to process a very large dataset by chunks, avoiding a RAM memory overload and crash of the workstation. StartR will simply load the dataset by chunks and each of them will be processed sequentially. The operations defined in the workflow will be applied to each chunk, and the results will be stored on a temporary file. `Compute()` will finally gather and merge the results of each chunk and return a single data object, including one or multiple multidimensional data arrays, and additional metadata. +When only your own workstation is available, startR can still be useful to process a very large dataset by chunks, thus avoiding a RAM memory overload and consequent crash of the workstation. startR will simply load and process the dataset by chunks, one after the other. The operations defined in the workflow will be applied to each chunk, and the results will be stored on a temporary file. `Compute()` will finally gather and merge the results of each chunk and return a single data object, including one or multiple multidimensional data arrays, and additional metadata. + +A list of the dimensions which to split the data along, and the number of slices (or "chunks") to make for each, is the only piece of information required for `Compute()` to run locally. It will only be possible to request chunks for those dimensions not required by any of the steps in the workflow built by `Step()` and `AddStep()`. 
+ +Following the worklfow of steps defined in the example, where the step uses 'time' and 'ensemble' as target dimensions, the dimensions remaining for chunking would be 'dat', 'var', 'sdate', 'latitude' and 'longitude'. Note that defining a step which has many target dimensions should be avoided as it will reduce the chunking options. + +As an example, we could request for computation performing two chunks along the 'latitude' dimension, and two chunks along the 'longitude' dimension. This would result in the data being processed in 4 chunks of about 80MB (the size of the involved data, 316MB, divided by 4). + +Calculate the size of the chunks before executing the computation, and make sure they fit in the RAM of your machine. You should have as much free RAM as 2x or 3x the expected size of one chunk. Read more on adjusting the number of chunks in the section "How to choose the number of chunks, jobs and cores". + +```r +res <- Compute(wf, + chunks = list(latitude = 2, + longitude = 2)) +``` + +Once the `Compute()` call is executed, the R session will wait for it to return the result. Progress messages will be shown, with a remaining time estimate after the first chunk has been processed. +```r +* Processing chunks... remaining time estimate soon... +* Loading chunk 1 out of 4 ... +* Processing... +* Remaining time estimate (at 2019-01-27 23:31:07) (neglecting merge +* time): 1.662909 mins +* Loading chunk 2 out of 4 ... +* Processing... +* Loading chunk 3 out of 4 ... +* Processing... +* Loading chunk 4 out of 4 ... +* Processing... +``` + +After all chunks have been processed, a summary of the execution and its performance is reported. +```r +* Computation ended successfully. +* Number of chunks: 4 +* Max. number of concurrent chunks (jobs): 1 +* Requested cores per job: NA +* Load threads per chunk: 1 +* Compute threads per chunk: 1 +* Total time (s): 123.636801242828 +* Chunking setup: 0.000516414642333984 +* Data upload to cluster: 0 +* All chunks: 123.416877985001 +* Transfer results from cluster: 0 +* Merge: 0.219406843185425 +* Each chunk: +* queue: +* mean: 0 +* min: 0 +* max: 0 +* job setup: +* mean: 0 +* min: 0 +* max: 0 +* load: +* mean: 3.07621091604233 +* min: 2.05077886581421 +* max: 5.05771064758301 +* compute: +* mean: 27.7766432762146 +* min: 27.4815557003021 +* max: 28.199684381485 +``` + +Also, some warning messages will be displayed corresponding to the execution of the `Start()` function to retrieve the data for each chunk. + +`Compute()` will return a list of data arrays in your R session, one data array for each output returned by the last step in the specified workflow. In the example here, ony one array is returned. +```r +str(res) +``` +```r +List of 1 + $ output1: num [1:2, 1, 1, 1, 1:640, 1:1296] 250 250 251 250 251 ... + - attr(*, "startR_compute_profiling")=List of 14 + ..$ nchunks : num 4 + ..$ concurrent_chunks: num 1 + ..$ cores_per_job : logi NA + ..$ threads_load : num 1 + ..$ threads_compute : num 1 + ..$ bychunks_setup : num 0.000516 + ..$ transfer : num 0 + ..$ queue : num 0 + ..$ job_setup : num 0 + ..$ load : num [1:4] 5.06 3.14 2.06 2.05 + ..$ compute : num [1:4] 28.2 27.8 27.5 27.7 + ..$ transfer_back : num 0 + ..$ merge : num 0.219 + ..$ total : num 124 +``` + +The configuration details and profiling information are attached as attributes to the returned list of arrays. 
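
If you want to inspect these profiling measurements programmatically (for example, to compare several executions), they can be extracted from the returned object as a plain named list. A minimal sketch, using only the attribute and component names shown in the `str()` output above:

```r
# Pull out the profiling information attached by Compute() to the returned object.
prof <- attr(res, 'startR_compute_profiling')

prof$nchunks   # number of chunks processed (4 in this example)
prof$total     # total elapsed time, in seconds
prof$load      # data loading time of each chunk
prof$compute   # computation time of each chunk
```

This is the same object that the `PlotProfiling()` function shown later in this document takes as input.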
+
+If you check the dimensions of one of the output arrays, you will see that it preserves named dimensions, and that the output dimensions of the workflow steps appear in the first (left-most) positions.
+
+```r
+dim(res$output1)
+```
+```r
+     time       dat       var     sdate  latitude longitude 
+        2         1         1         1       640      1296 
+```
+
+In addition to performing the computation in chunks, you can adjust the number of execution threads to use for the data retrieval stage (with `threads_load`) and for the computation (with `threads_compute`). Using more than 2 threads for the retrieval will usually be counterproductive, since two will already be able to make full use of the bandwidth between the workstation and the data repository. The optimal number of threads for the computation will depend on the number of processors in your machine, the number of cores they have, and the number of threads supported by each of them.
 
 ```r
 res <- Compute(wf,
                chunks = list(latitude = 2,
                              longitude = 2),
-               threads_load = 1,
-               threads_compute = 2,
-               silent = FALSE,
-               debug = FALSE,
-               wait = FALSE)
+               threads_load = 2,
+               threads_compute = 4)
+```
+```r
+* Computation ended successfully.
+*   Number of chunks: 4
+*   Max. number of concurrent chunks (jobs): 1
+*   Requested cores per job: NA
+*   Load threads per chunk: 2
+*   Compute threads per chunk: 4
+*   Total time (s): 44.6467976570129
+*     Chunking setup: 0.000483036041259766
+*     Data upload to cluster: 0
+*     All chunks: 44.4387269020081
+*     Transfer results from cluster: 0
+*     Merge: 0.207587718963623
+*   Each chunk:
+*     queue:
+*       mean: 0
+*       min: 0
+*       max: 0
+*     job setup:
+*       mean: 0
+*       min: 0
+*       max: 0
+*     load:
+*       mean: 3.08622789382935
+*       min: 2.77512788772583
+*       max: 3.93441939353943
+*     compute:
+*       mean: 8.02220791578293
+*       min: 8.01178908348083
+*       max: 8.03660178184509
 ```
 
-compute will return a data array, as if it was a variable in your R session
+#### Compute() on CTE-Power 9
 
-discuss ecFlow
+In order to run the computation on an HPC, such as the BSC CTE-Power 9, you will need to make sure the passwordless connection with the login node of that HPC is configured, as shown at the beginning of this guide. If possible, configure it in both directions. Also, you will need to know whether there is a shared file system between your workstation and that HPC, and will need information on the number of nodes, cores per node, threads per core, RAM memory per node, and type of workload manager used by that HPC (Slurm, PBS and LSF are supported).
 
-discuss plotProfiling
+You will need to add two parameters to your `Compute()` call: `cluster` and `ecflow_suite_dir`.
 
-#### Compute() on the fat nodes
+The parameter `ecflow_suite_dir` expects a path to a folder on the workstation where temporary files generated for the automatic management of the workflow will be stored. As you will see later, the EC-Flow workflow manager is used transparently for this purpose.
 
+The parameter `cluster` expects a list with a number of components that will have to be provided a bit differently depending on the HPC you want to run on. You can see next an example cluster configuration that will execute the previously defined workflow on CTE-Power 9.
```r res <- Compute(wf, chunks = list(latitude = 2, longitude = 2), - threads_load = 1, - threads_compute = 2, - cluster = list(queue_host = 'bsceslogin01.bsc.es', + threads_load = 2, + threads_compute = 4, + cluster = list(queue_host = 'p9login1.bsc.es', queue_type = 'slurm', - temp_dir = '/home/Earth/nmanuben/startR_tests/', - cores_per_job = 2, + temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_hpc/', + lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/', + r_module = 'R/3.5.0-foss-2018b', + cores_per_job = 4, job_wallclock = '00:10:00', max_jobs = 4, - bidirectional = TRUE + extra_queue_params = list('#SBATCH --mem-per-cpu=3000'), + bidirectional = FALSE, + polling_period = 10 ), - ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/') + ecflow_suite_dir = '/home/Earth/nmanuben/startR_local/' + ) ``` -'queue_host' must match the 'short_name_of_the_host' you associated to the login node of the selected HPC in your .ssh/config file. +The cluster components and options are explained next: +- `queue_host`: 'queue_host' must match the 'short_name_of_the_host' you associated to the login node of the selected HPC in your .ssh/config file. +- `queue_type`: +- `temp_dir`: +- `lib_dir`: +- `r_module`: +- `cores_per_job`: +- `job_wallclock`: +- `max_jobs`: +- `extra_queue_params`: better qos, memory, special release: '#SBATCH --reservation=test-rhel-7.5' +- `bidirectional`: +- `polling_period`: +EEEEEEEEEEEEEEEEEEEEEEEEE -#### Compute() on CTE-Power +After the `Compute()` call is executed, an EC-Flow server is automatically started on your workstation, which will orchestrate the work and dispatch jobs onto the HPC. Thanks to the use of EC-Flow, you will also be able to monitor visually the progress of the execution. See the "Collect and the EC-Flow GUI" section. + +The following messages will be displayed upon execution: ```r -library(startR) +* ATTENTION: Dispatching chunks on a remote cluster. Make sure +* passwordless access is properly set in both directions. +ping server(bscearth329:5678) succeeded in 00:00:00.001342 ~1 milliseconds +server is already started +* Processing chunks... +* Remaining time estimate soon... +``` -#repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc' -repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$-longitudeS1latitudeS1all/$var$_$sdate$.nc' -data <- Start(dat = repos, - var = 'tas', - #sdate = 'all', - sdate = indices(1), - ensemble = 'all', - time = 'all', - #latitude = 'all', - latitude = indices(1:40), - #longitude = 'all', - longitude = indices(1:40), - retrieve = FALSE) -lons <- attr(data, 'Variables')$common$longitude -lats <- attr(data, 'Variables')$common$latitude +At this point, you may want to check the jobs are being dispatched and executed properly onto the HPC. For that, you can either use the EC-Flow GUI (covered in the next section), or you can `ssh` to the login node of the HPC and check the status of the queue with `squeue` or `qstat`, as shown below. 
+``` +[bsc32473@p9login1 ~]$ squeue + JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) + 1142418 main /STARTR_ bsc32473 R 0:12 1 p9r3n08 + 1142419 main /STARTR_ bsc32473 R 0:12 1 p9r3n08 + 1142420 main /STARTR_ bsc32473 R 0:12 1 p9r3n08 + 1142421 main /STARTR_ bsc32473 R 0:12 1 p9r3n08 +``` -fun <- function(x) apply(x + 1, 2, mean) -step <- Step(fun, c('ensemble', 'time'), c('time')) -wf <- AddStep(data, step) +Here the output of the execution on CTE-Power 9 after waiting for about a minute: +```r +* Remaining time estimate (neglecting queue and merge time) (at +* 2019-01-28 01:16:59): 0 mins (46.22883 secs per chunk) +* Computation ended successfully. +* Number of chunks: 4 +* Max. number of concurrent chunks (jobs): 4 +* Requested cores per job: 4 +* Load threads per chunk: 2 +* Compute threads per chunk: 4 +* Total time (s): 58.3834805488586 +* Chunking setup: 1.13428020477295 +* Data upload to cluster: 0 +* All chunks: 52.9015641212463 +* Transfer results from cluster: 4.05326294898987 +* Merge: 0.294373273849487 +* Each chunk: +* queue: +* mean: 10.5 +* min: 7 +* max: 14 +* job setup: +* mean: 3.3917236328125 +* min: 1.16518902778625 +* max: 5.61825823783875 +* load: +* mean: 5.77260076999664 +* min: 5.27595114707947 +* max: 6.26925039291382 +* compute: +* mean: 23.7037765979767 +* min: 23.4937765598297 +* max: 23.9137766361237 +* Computation ended successfully. +Warning messages: +1: ! Warning: Parameter 'ecflow_server' has not been specified but execution on +!cluster has been requested. An ecFlow server instance will +!be created on localhost:5678. +2: ! Warning: ATTENTION: The source chunks will be removed from the +!system. Store the result after Collect() ends if needed. +``` + +As you have probably realized, this execution has been slower than the local execution, even if 4 simultaneous jobs have been executed on CTE-Power. This is due to the small size of the data being processed here. The overhead of queuing and starting jobs at CTE-Power is large compared to the required computation time for this amount of data. The benefit would be obvious in use cases with larger inputs. +Usually, in use cases with larger data inputs, it will be preferrable to add the parameter `wait = FALSE` to your `Compute()` call. With this parameter, `Compute()` will return an object with all the information on your startR execution which you will be able to store in your disk. After doing that, you will be able to close your R session and collect the results later on with the `Collect()` function. This is discussed in the next section. + +As mentioned above in the definition of the `cluster` parameters, it is strongly recommended to check the section on "How to choose the number of chunks, jobs and cores". + +#### Compute() on the fat nodes and other HPCs + +The `Compute()` call with the parameters to run the example in this section on the BSC ES fat nodes is provided below (you will need to adjust some of the parameters before using it). As you can see, the only thing that needs to be changed to execute startR on a different HPC is the definition of the `cluster` parameters. + +The `cluster` configuration for the fat nodes, CTE-Power 9, Marenostrum 4, Nord III, Minotauro and ECMWF cca/ccb are all provided at the very end of this guide. 
+ +```r res <- Compute(wf, chunks = list(latitude = 2, longitude = 2), - threads_load = 1, - threads_compute = 2, - cluster = list(queue_host = 'p9login1.bsc.es', + threads_load = 2, + threads_compute = 4, + cluster = list(queue_host = 'bsceslogin01.bsc.es', queue_type = 'slurm', - temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_tests/', - lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/', - r_module = 'R/3.5.0', + temp_dir = '/home/Earth/nmanuben/startR_hpc/', cores_per_job = 2, job_wallclock = '00:10:00', max_jobs = 4, - #extra_queue_params = list('#SBATCH --qos=bsc_es'), - bidirectional = FALSE, - polling_period = 10 + bidirectional = TRUE ), - ecflow_suite_dir = '/home/Earth/nmanuben/test_remove/', - ecflow_server = NULL, - silent = FALSE, - debug = FALSE, - wait = FALSE) + ecflow_suite_dir = '/home/Earth/nmanuben/startR_local/') +``` +### Collect() and the EC-Flow GUI + +EEEEEEEEEEEEEE + +```r result <- Collect(res, wait = TRUE) ``` +discuss collect +discuss ecFlow ### Additional information -#### Tricks and best practices +#### How to choose the number of chunks, jobs and cores + +EEEEEEEEEEEEE + +#### How to clean a failed execution + +EEEEEEEEEEEE -How to select number of chunks +#### Visualizing the profiling of the execution -What to do if my function requires all dimensions +EEEEEEEEEEEEEE +Discuss PlotProfiling() + +#### What to do if your function has too many target dimensions #### Pending features Computation of weekly means with startR is still pending future work. By now, it is not possible to do that because the metadata associated to each chunk, such as the dates, is not being sent to the `Compute()` function. -#### Example using experimental and (date-corresponding) observational data +### Other examples + +#### Using experimental and (date-corresponding) observational data ```r repos <- paste0('/esnas/exp/ecmwf/system4_m1/6hourly/', @@ -444,7 +766,11 @@ res <- Compute(step, list(system4, erai), wait = FALSE) ``` -### Example on MareNostrum 4 +#### Example of computation of weekly means + +#### Example with data on an irregular grid with selection of a region + +#### Example on MareNostrum 4 ```r library(startR) @@ -496,7 +822,9 @@ res <- Compute(wf, wait = TRUE) ``` -## Seasonal forecast verification example on cca +#### Example on CTE-Power using GPUs + +#### Seasonal forecast verification example on cca ```r crps <- function(x, y) { @@ -545,13 +873,63 @@ r <- Compute(wf, ) ``` -### Compute() cluster template for Nord III +### Compute() cluster templates + +#### CTE-Power9 + +```r +cluster = list(queue_host = 'p9login1.bsc.es', + queue_type = 'slurm', + temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_hpc/', + lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/', + r_module = 'R/3.5.0-foss-2018b', + cores_per_job = 4, + job_wallclock = '00:10:00', + max_jobs = 4, + bidirectional = FALSE, + polling_period = 10 + ) +``` + +#### BSC ES fat nodes + +```r +cluster = list(queue_host = 'bsceslogin01.bsc.es', + queue_type = 'slurm', + temp_dir = '/home/Earth/nmanuben/startR_hpc/', + cores_per_job = 2, + job_wallclock = '00:10:00', + max_jobs = 4, + bidirectional = TRUE + ) +``` + +#### Marenostrum 4 + +```r +cluster = list(queue_host = 'mn2.bsc.es', + queue_type = 'slurm', + data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/', + temp_dir = '/gpfs/scratch/pr1efe00/pr1efe03/startR_hpc/', + lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.4/', + r_module = 'R/3.4.0', + cores_per_job = 2, + job_wallclock = '00:10:00', + max_jobs = 4, + extra_queue_params = list('#SBATCH 
--qos=prace'), + bidirectional = FALSE, + polling_period = 10, + special_setup = 'marenostrum4' + ) +``` + +#### Nord III ```r cluster = list(queue_host = 'nord1.bsc.es', queue_type = 'lsf', data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/', - temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_tests/', + temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_hpc/', lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.3/', init_commands = list('module load intel/16.0.1'), r_module = 'R/3.3.0', @@ -565,13 +943,13 @@ cluster = list(queue_host = 'nord1.bsc.es', ) ``` -### Compute() cluster template for MinoTauro +#### MinoTauro ```r cluster = list(queue_host = 'mt1.bsc.es', queue_type = 'slurm', data_dir = '/gpfs/projects/bsc32/share/startR_data_repos/', - temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_tests/', + temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_hpc/', lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.3/', r_module = 'R/3.3.3', cores_per_job = 2, @@ -584,6 +962,14 @@ cluster = list(queue_host = 'mt1.bsc.es', ) ``` -### Example on CTE-Power using GPUs - +#### ECMWF ecgate (workstation) + cca/ccb (HPC) +```r +cluster = list(queue_host = 'cca', + queue_type = 'pbs', + max_jobs = 10, + init_commands = list('module load ecflow'), + r_module = 'R/3.3.1', + extra_queue_params = list('#PBS -l EC_billing_account=spesiccf') + ) +``` -- GitLab From 66ab948c45888ed5ebb3a46e1dc657bde6ba337b Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Mon, 28 Jan 2019 04:24:01 +0100 Subject: [PATCH 09/20] Progress in practical guide. --- README.md | 2 +- inst/doc/practical_guide_bsc.md | 200 +++++++++++++++++++++++++++----- 2 files changed, 172 insertions(+), 30 deletions(-) diff --git a/README.md b/README.md index 9eede53..131c296 100644 --- a/README.md +++ b/README.md @@ -38,7 +38,7 @@ devtools::install_git('https://earth.bsc.es/gitlab/es/startR') ### How it works -An overview example of how to process a large data set is shown in the following. You can see real use cases in [**Using startR at BSC**](inst/doc/practical_guide_bsc.md), and you can find more information on the use of the `Start()` function in the [**Start()**](inst/doc/start.md) documentation page, as well as in the documentation of the functions in the package. +An overview example of how to process a large data set is shown in the following. You can see real use cases in the [**Practical guide for processing large data sets with startR**](inst/doc/practical_guide_bsc.md), and you can find more information on the use of the `Start()` function in the [**Start()**](inst/doc/start.md) documentation page, as well as in the documentation of the functions in the package. The purpose of the example in this section is simply to illustrate how the user is expected to use startR once the framework is deployed on the workstation and HPC. It shows how a simple addition and averaging operation is performed on BSC's CTE-Power HPC, over a multi-dimensional climate data set, which lives in the BSC-ES storage infrastructure. As mentioned in the introduction, the user will need to declare the involved data sources, the workflow of operations to carry out, and the computing environment and parameters. 
diff --git a/inst/doc/practical_guide_bsc.md b/inst/doc/practical_guide_bsc.md
index 9e5c4d4..d41e727 100644
--- a/inst/doc/practical_guide_bsc.md
+++ b/inst/doc/practical_guide_bsc.md
@@ -1,9 +1,26 @@
-# Practical guide for processing large data sets at BSC using startR on HPCs
+# Practical guide for processing large data sets with startR
 
 This guide includes explanations and practical examples for you to learn how to use startR to efficiently process large data sets in parallel on the BSC's HPCs (CTE-Power 9, Marenostrum 4, ...). See the main page of the [**startR**](README.md) project for a general overview of the features of startR, without actual guidance on how to use it.
 
 If you would like to start using startR right away on the BSC infrastructure, you can directly go through the "Configuring startR" section, copy/paste the basic startR script example shown at the end of the "Motivation" section into the text editor of your preference, adjust the paths and user names specified in the `Compute()` function, and run the code in an R session after loading the relevant modules.
 
+## Index
+
+1. [**Motivation**](inst/doc/practical_guide_bsc.md#motivation)
+2. [**Introduction**](inst/doc/practical_guide_bsc.md#introduction)
+3. [**Configuring startR**](inst/doc/practical_guide_bsc.md#configuring-startr)
+4. [**Using startR**](inst/doc/practical_guide_bsc.md#using-startr)
+  4.1. [**Start()**](inst/doc/practical_guide_bsc.md#start)
+  4.2. [**Step() and AddStep()**](inst/doc/practical_guide_bsc.md#step-and-addstep)
+  4.3. [**Compute()**](inst/doc/practical_guide_bsc.md#compute)
+    4.3.1. [**Compute() locally**](inst/doc/practical_guide_bsc.md#compute-locally)
+    4.3.2. [**Compute() on CTE-Power 9**](inst/doc/practical_guide_bsc.md#compute-on-cte-power-9)
+    4.3.3. [**Compute() on the fat nodes and other HPCs**](inst/doc/practical_guide_bsc.md#compute-on-the-fat-nodes-and-other-hpcs)
+  4.4. [**Collect() and the EC-Flow GUI**](inst/doc/practical_guide_bsc.md#collect-and-the-ec-flow-gui)
+  4.5. [**Additional information**](inst/doc/practical_guide_bsc.md#additional-information)
+  4.6. [**Other examples**](inst/doc/practical_guide_bsc.md#other-examples)
+  4.7. [**Compute() cluster templates**](inst/doc/practical_guide_bsc.md#compute-cluster-templates)
+
 ## Motivation
 
 What would you do if you had to apply a custom statistical analysis procedure to a 10TB climate data set? Probably, you would need to use a scripting language to write a procedure which is able to retrieve a subset of data from the file system (it would rarely be possible to handle all of it at once on a single node), encode the procedure in that language, and apply it carefully and efficiently to the data. Afterwards, you would need to think of and develop a mechanism to dispatch the job multiple times in parallel to an HPC of your choice, each of the jobs processing a different subset of the data set. You could do this by hand, but ideally you would use EC-Flow or a similar general-purpose workflow manager, which would orchestrate the work for you, and would allow you to monitor and control the progress, as well as keep an easy-to-understand record of what you did, to be reused in the future if needed. The mentioned solution, although it is the recommended way to go, is a demanding one and you could easily spend a few days until you get it running smoothly. Additionally, when developing the job script, you would be exposed to the difficulties of efficiently managing the data and applying the encoded procedure to it.
@@ -22,7 +39,9 @@ Other things you can expect from startR: Things that are not supposed to be done with startR: - Curating/homogenizing model output files or generating files to be stored under /esarchive following the department/community conventions. Although metadata is understood and used by startR, its handling is not 100% consistent yet. -If startR is suitable for your use case and decide to use it, you will then need to follow the configuration steps listed in the first section of this guide to make sure startR works on your workstation with the HPC of your choice. +## Introduction + +In order to use startR you will need to follow the configuration steps listed in the first section of this guide to make sure startR works on your workstation with the HPC of your choice. Afterwards, you will need to understand and use five functions, all of them included in the startR package: - **Start()**, for declaing the data sets to be processed @@ -75,9 +94,9 @@ res <- Compute(wf, ecflow_suite_dir = '/home/Earth/nmanuben/startR_local/') ``` -***Note 1****: The data files do not need to be migrated to a database system, nor have to comply any specific convention for their file names, name, number or order of the dimensions and variables, or distribution of files in the file system. Although only files in the NetCDF format are supported for now, plug-in functions (of approximately 200 lines of code) can be programmed to add support for additional file formats.* +_**Note 1**_: The data files do not need to be migrated to a database system, nor have to comply any specific convention for their file names, name, number or order of the dimensions and variables, or distribution of files in the file system. Although only files in the NetCDF format are supported for now, plug-in functions (of approximately 200 lines of code) can be programmed to add support for additional file formats. -***Note 2****: The HPCs startR is designed to run on, are understood as multi-core multi-node clusters. startR relies on a shared file system across all HPC nodes, and does not implement any kind of distributed storage system for now.* +_**Note 2**_: The HPCs startR is designed to run on are understood as multi-core multi-node clusters. startR relies on a shared file system across all HPC nodes, and does not implement any kind of distributed storage system for now. ## Configuring startR @@ -176,7 +195,7 @@ This will show the names and lengths of the dimensions of the selected variable: 1 1296 640 25 860 ``` -***Note****: If you check the dimensions of that file with `ncdump -h`, you will realize the 'var' dimension is actually not defined inside. `NcReadDims()` and `Start()`, though, perceive the different variables inside a file as if stored along a virtual dimension called 'var'. You can ignore this for now and assume 'var' is simply a file dimension (since it appears as a wildcard in the path pattern). Read more on this in Note 1 at the end of this section.* +_**Note**: If you check the dimensions of that file with `ncdump -h`, you will realize the 'var' dimension is actually not defined inside. `NcReadDims()` and `Start()`, though, perceive the different variables inside a file as if stored along a virtual dimension called 'var'. You can ignore this for now and assume 'var' is simply a file dimension (since it appears as a wildcard in the path pattern). 
Read more on this in Note 1 at the end of this section._ Once we know the dimension names, we have all the information we need to put the `Start()` call together: @@ -274,7 +293,7 @@ $sdate[[1]] [1] "19930101" ``` -If you are interested in actually loading the entire data set in your machine you can do so in two ways (***be careful****, loading the data involved in the `Start()` call in this example will most likely stall your machine. Try it with a smaller region or a subset of forecast time steps*): +If you are interested in actually loading the entire data set in your machine you can do so in two ways (_**be careful**_, loading the data involved in the `Start()` call in this example will most likely stall your machine. Try it with a smaller region or a subset of forecast time steps*): - adding the parameter `retrieve = TRUE` in your `Start()` call. - evaluating the object returned by `Start()`: `data_load <- eval(data)` @@ -284,13 +303,13 @@ There are no constrains for the number or names of the outer or inner dimensions There are a number of advanced parameters and features in `Start()` to handle heterogeneities across files involved in a `Start()` call, such as the `synonims` parameter, or to handle dimensions extending across multiple NetCDF files, such as the `*_across` parameter. See the documentation on the `Start()` function (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information. -***Note 1 ****on the 'var' dimension*: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. +_**Note 1 **on the 'var' dimension_: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. 
However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. -***Note 2 ****on providing values as selectors for inner dimensions*: when values are requested for a inner dimension, the corresponding numeric indices are automatically calculated by comparing the provided values with a variable inside the file with the same name as the dimension for which the values have been requested. In the last example where specific values are requested for the latitudes, the variable 'latitude' is automatically retrieved from the files. If the name of the variable does not coincide with the name of the dimension, the parameter `*_var` can be specified in the `Start()` call, as detailed in `?Start`. +_**Note 2 **on providing values as selectors for inner dimensions_: when values are requested for a inner dimension, the corresponding numeric indices are automatically calculated by comparing the provided values with a variable inside the file with the same name as the dimension for which the values have been requested. In the last example where specific values are requested for the latitudes, the variable 'latitude' is automatically retrieved from the files. If the name of the variable does not coincide with the name of the dimension, the parameter `*_var` can be specified in the `Start()` call, as detailed in `?Start`. -***Note 3 ****on the order of dimensions*: neither the file dimensions nor the inner dimensions need to be specified in the same order as they appear in the path pattern or inside the NetCDF files, respectively. The resulting arrays returned by `Start()` will have the dimensions in the same order as requested in `Start()`, so changing the order in the call can potentially trigger automatic reordering of data, which is time consuming. But, depending on the use case, it may be a good idea to ask for a specific dimension order so that the data is properly arranged for making posterior calculations more efficient. Remember that the order of the dimensions in R is "big endian"; the values are consecutive along the first (left-most) dimension. In contrast, the order of dimensions in NetCDF files is "little endian". This means that if you want to respect the order of the data values in memory as stored in the NetCDF files, you should request the dimensions in your `Start()` call in the opposite order. +_**Note 3 **on the order of dimensions_: neither the file dimensions nor the inner dimensions need to be specified in the same order as they appear in the path pattern or inside the NetCDF files, respectively. The resulting arrays returned by `Start()` will have the dimensions in the same order as requested in `Start()`, so changing the order in the call can potentially trigger automatic reordering of data, which is time consuming. 
But, depending on the use case, it may be a good idea to ask for a specific dimension order so that the data is properly arranged for making posterior calculations more efficient. Remember that the order of the dimensions in R is "big endian"; the values are consecutive along the first (left-most) dimension. In contrast, the order of dimensions in NetCDF files is "little endian". This means that if you want to respect the order of the data values in memory as stored in the NetCDF files, you should request the dimensions in your `Start()` call in the opposite order. -***Note 4 ****on the size of the data in R*: if you check the size of the involved file in the example `Start()` call used above ('/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc'), you will realize it only weights 34GB. Why is the data reported to occupy 134GB then? This is due to two facts: by one side, NetCDF files are usually compressed, and their uncompressed size can be substantially greater. In this case, the uncompressed data would occupy about 72GB. Besides, the variable we are targetting in the example ('tas') is defined as a float variable inside the NetCDF file. This means each value is a 4-byte real number. However, R cannot represent 4-byte real numbers; it always takes 8 bytes to represent a real number. This is why, when float numbers are represented in R, the effective size of the data is doubled. In this case, the 72GB of uncompressed float numbers need to be represented using 132GB of RAM in R. +_**Note 4 **on the size of the data in R_: if you check the size of the involved file in the example `Start()` call used above ('/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc'), you will realize it only weighs 34GB. Why is the data reported to occupy 134GB then? This is due to two facts: by one side, NetCDF files are usually compressed, and their uncompressed size can be substantially greater. In this case, the uncompressed data would occupy about 72GB. Besides, the variable we are targetting in the example ('tas') is defined as a float variable inside the NetCDF file. This means each value is a 4-byte real number. However, R cannot represent 4-byte real numbers; it always takes 8 bytes to represent a real number. This is why, when float numbers are represented in R, the effective size of the data is doubled. In this case, the 72GB of uncompressed float numbers need to be represented using 132GB of RAM in R. ### Step() and AddStep() @@ -564,18 +583,17 @@ res <- Compute(wf, ``` The cluster components and options are explained next: -- `queue_host`: 'queue_host' must match the 'short_name_of_the_host' you associated to the login node of the selected HPC in your .ssh/config file. -- `queue_type`: -- `temp_dir`: -- `lib_dir`: -- `r_module`: -- `cores_per_job`: -- `job_wallclock`: -- `max_jobs`: -- `extra_queue_params`: better qos, memory, special release: '#SBATCH --reservation=test-rhel-7.5' -- `bidirectional`: -- `polling_period`: -EEEEEEEEEEEEEEEEEEEEEEEEE +- `queue_host`: must match the 'short_name_of_the_host' you associated to the login node of the selected HPC in your .ssh/config file. +- `queue_type`: one of 'slurm', 'pbs' or 'lsf'. +- `temp_dir`: directory on the HPC where to store temporary files. Must be accessible from the HPC login node and all HPC nodes. +- `lib_dir`: directory on the HPC where the startR R package and other required R packages are installed, accessible from all HPC nodes. These installed packages must be compatible with the R module specified in `r_module`. 
This parameter is optional; only required when the libraries are not installed in the R module. +- `r_module`: name of the UNIX environment module to be used for R. If not specified, 'module load R' will be used. +- `cores_per_job`: number of computing cores to be requested when submitting the job for each chunk to the HPC queue. Each node may be capable of supporting more than one computing thread. +- `job_wallclock`: amount of time to reserve the resources when submitting the job for each chunk. Must follow the specific format required by the specified `queue_type`. +- `max_jobs`: maximum number of jobs (chunks) to be queued simultaneously onto the HPC queue. Submitting too many jobs could overload the bandwidth between the HPC nodes and the storage system, or could overload the queue system. +- `extra_queue_params`: list of character strings with additional queue headers for the jobs to be submitted to the HPC. Mainly used to specify the amount of memory to book for each job (e.g. '#SBATCH --mem-per-cpu=30000'), to request special queuing (e.g. '#SBATCH --qos=bsc_es'), or to request use of specific software (e.g. '#SBATCH --reservation=test-rhel-7.5'). +- `bidirectional`: whether the connection between the R workstation and the HPC login node is bidirectional (TRUE) or unidirectional from the workstation to the login node (FALSE). +- `polling_period`: when the connection is unidirectional, the workstation will ask the HPC login node for results each `polling_period` seconds. An excessively small value can overload the login node or result in temporary banning. After the `Compute()` call is executed, an EC-Flow server is automatically started on your workstation, which will orchestrate the work and dispatch jobs onto the HPC. Thanks to the use of EC-Flow, you will also be able to monitor visually the progress of the execution. See the "Collect and the EC-Flow GUI" section. @@ -673,28 +691,152 @@ res <- Compute(wf, ### Collect() and the EC-Flow GUI -EEEEEEEEEEEEEE +Usually, in use cases where large data inputs are involved, it is convenient to add the parameter `wait = FALSE` to your `Compute()` call. With this parameter, `Compute()` will immediately return an object with information about your startR execution. You will be able to store this object onto disk. After doing that, you will not need to worry in case your workstation turns off in the middle of the computation. You will be able to close your R session, and collect the results later on with the `Collect()` function. + +```r +res <- Compute(wf, + chunks = list(latitude = 2, + longitude = 2), + threads_load = 2, + threads_compute = 4, + cluster = list(queue_host = 'p9login1.bsc.es', + queue_type = 'slurm', + temp_dir = '/gpfs/scratch/bsc32/bsc32473/startR_hpc/', + lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/', + r_module = 'R/3.5.0-foss-2018b', + cores_per_job = 4, + job_wallclock = '00:10:00', + max_jobs = 4, + extra_queue_params = list('#SBATCH --mem-per-cpu=3000'), + bidirectional = FALSE, + polling_period = 10 + ), + ecflow_suite_dir = '/home/Earth/nmanuben/startR_local/', + wait = FALSE + ) + +saveRDS(res, file = 'test_collect.Rds') +``` + +At this point, after storing the descriptor of the execution and before calling `Collect()`, you may want to visually check the status of the execution. You can do that with the EC-Flow graphical user interface. You need to open a new terminal, load the EC-Flow module if needed, and start the GUI: + +``` +module load ecFlow +ecflow_ui & +``` + +After doing that, a window will pop up. 
You will be able to monitor the status of your EC-Flow suites there. However, if it is the first time you are using the EC-Flow GUI with startR, you will need to register the EC-Flow server that has been started automatically by `Compute()`. You can open the top menu "Manage servers" > "New server" > set host to 'localhost' > set port to '5678' > save. + +Note that the host and port can be adjusted with the parameter `ecflow_server` in `Compute()`, which must be provided in the form `c(host = 'hostname', port = port_number)`. + +After registering the EC-Flow server, an expandable entry will appear, where you can see listed the jobs to be executed, one for each chunk, with their status represented by a colour. Gray means pending, blue means queuing, green means in progress, and yellow means completed. + +You will see that, if you are running on an HPC where the connection with its login node is unidirectional, the jobs remain blue (queuing). This is because the jobs, upon start or completion, cannot send the signals back. In order to retrieve this information, the `Collect()` function must be called from an R terminal. ```r +library(startR) + +res <- readRDS('test_collect.Rds') + result <- Collect(res, wait = TRUE) ``` -discuss collect -discuss ecFlow + +In this example, `Collect()` has been run with the parameter `wait = TRUE`. This will be a blocking call, in which `Collect()` will retrieve information from the HPC, including signals and outputs, each `polling_period` seconds (as described above). `Collect()` will not return until the results of all chunks have been received. Meanwhile, the status of the EC-Flow workflow on the EC-Flow GUI will be updated periodically and you will be able to monitor the status, as shown in the image below (image taken from another use case). + + + +Upon completion, `Collect()` returns the merged data array, as seen in the "Compute locally" section. + +```r +str(result) +``` +```r +List of 1 + $ output1: num [1:2, 1, 1, 1, 1:640, 1:1296] 251 252 249 246 245 ... + - attr(*, "startR_compute_profiling")=List of 14 + ..$ nchunks : num 4 + ..$ concurrent_chunks: num 4 + ..$ cores_per_job : num 4 + ..$ threads_load : num 2 + ..$ threads_compute : num 4 + ..$ bychunks_setup : num 1.16 + ..$ transfer : num 0 + ..$ queue : Named num [1:4] 18 18 18 18 + .. ..- attr(*, "names")= chr [1:4] "queue" "queue" "queue" "queue" + ..$ job_setup : Named num [1:4] 2.25 2.25 2.25 2.25 + .. ..- attr(*, "names")= chr [1:4] "job_setup" "job_setup" "job_setup" "job_setup" + ..$ load : Named num [1:4] 5.64 6.19 5.64 5.62 + .. ..- attr(*, "names")= chr [1:4] "load" "load" "load" "load" + ..$ compute : Named num [1:4] 24.5 24.4 24.3 24.4 + .. ..- attr(*, "names")= chr [1:4] "compute" "compute" "compute" "compute" + ..$ transfer_back : num 1.72 + ..$ merge : num 0.447 + ..$ total : logi NA +``` + +Note that, when the results are collected with `Collect()` instead of calling `Compute()` with the parameter `wait = TRUE`, it will not be possible to know the total time taken by the entire data processing workflow, but we will still be able to know the timings of most of the stages. + +You can also run `Collect()` with `wait = FALSE`. This will crash with an error if the results are not yet available, or will return the merged array otherwise. + +`Collect()` also has a parameter called `remove`, which by default is set to `TRUE` and triggers removal of all data results received from the HPC (and stored under `ecflow_suite_dir`). 
If you would like to preserve the data, you can set `remove = FALSE` and `Collect()` it as many times as desired. Alternatively, you can `Collect()` with `remove = TRUE` and store the merged array right after with `saveRDS()`. ### Additional information #### How to choose the number of chunks, jobs and cores -EEEEEEEEEEEEE +##### Number of chunks and memory per job + +Adjusting the number of chunks, simultaneous jobs, cores and memory per job is crucial for a successful and efficient execution of your startR workflow on the HPC. + +The most important choice is that of the number of chunks and memory per job. Your first priority should be to make the data chunks fit in the HPC nodes' RAM memory. + +After running the `Start()` call (or calls) for a use case, you will get a message on the terminal with the total size of the involved data. You should divide the data in as many chunks as required to make one of them fit in the blocks of RAM memory on the HPC. If you check the specifications of your target HPC, you will see that each node has a fixed amount of RAM (e.g., for CTE-Power 9, each node has 512GB of RAM). But that is not what you are looking for. You want to know the size of the RAM memory modules (e.g., for CTE-Power 9, the memory modules are of 32 GB). startR can not use memory regions allocated in multiple memory modules. + +Knowing the size of the involved data and the size of the memory modules, you can work out the ideal number of chunks with a simple division. If the data weighs 100GB and the memory modules are of 32GB, 4 chunks will be required ideally. + +However, we must take into account that the handling of these data chunks is not ideal, and that neither startR nor the functions to be applied to the data will manage the data in a 100% efficient manner. This is why the number of chunks should be finally doubled or multiplied by 3. + +The amount of memory requested for each job, in this case where the data to be processed is a few times larger than the memory modules, must be equal to the size of the memory modules. + +If the data to be processed was smaller than the memory modules, it could still be interesting to split it in chunks in order to parallelize the computation and get a faster result. In that case, the memory requested for each job must be equal to 2 or 3 times the size of one chunk. + +##### Maximum number of simultaneous jobs + +The maximum number of simultaneous jobs should be adjusted according to the capacity of the file system to deal with multiple simultanous file accesses, according to the bandwidth of the connection between the HPC nodes and the file system, and according to the number of jobs per user supported/allowed by the HPC queue. A recommended practice is to make a test with a subset of the data and try first with a small number of simultaneous jobs (e.g. 2), and keep repeating the test while increasing the number of simultaneous jobs, until the performance of the process is not improved due to any of the reasons mentioned before. + +##### Number of cores and computing threads + +The number of cores per job should be as large as possible, with two limitations: +- If the HPC nodes have each "M" total amount of memory with "m" amount of memory per memory module, and have each "N" amount of cores, the requested amount of cores "n" should be such that "n" / "N" does not exceed "m" / "M". +- Depending on the shape of the chunks, startR has a limited capability to exploit multiple cores. 
It is recommended to make small tests increasing the number of cores to work out a reasonable number of cores to be requested. #### How to clean a failed execution -EEEEEEEEEEEE +- Work out the startR execution ID, either by inspecting the execution description by `Compute()` when called with the parameter `wait = FALSE`, or by checking the `ecflow_suite_dir` with `ls -ltr`. +- ssh to the HPC login node and cancel all jobs of your startR execution. +- Close the R session from where your `Compute()` call was made. +- Remove the folder named with your startR execution ID under the `ecflow_suite_dir`. +- ssh to the HPC login node and remove the folder named with your startR execution ID under the `temp_dir`. +- Optionally remove the data under `data_dir` on the HPC login node if the file system is not shared between the workstation and the HPC and you do not want to keep the data in the `data_dir`, used as caché for future startR executions. +- Open the EC-Flow GUI and remove the workflow entry (a.k.a. suite) named with your startR execution ID with right click > "Remove". #### Visualizing the profiling of the execution -EEEEEEEEEEEEEE -Discuss PlotProfiling() +As seen in previous sections, profiling measurements of the execution are provided together with the data output. These measurements can be visualized with the `PlotProfiling()` function made available in the source code of the startR package. + +This function has not been included as part of the official set of functions of the package because it requires a number of extense plotting libraries which take time to load and, since the startR package is loaded in each of the worker jobs on the HPC, this could imply a substantial amount of time spent in repeatedly loading unused visualization libraries during the computing stage. + +The function takes as inputs one or a list of 'startR_compute_profiling' attribute objects from one or more `Compute()` executions. +```r +source('https://earth.bsc.es/gitlab/es/startR/raw/master/inst/PlotProfiling.R') +PlotProfiling(attr(res, 'startR_compute_profiling')) +``` + +A chart displays the timings for the different stages of the computation, as shown in the image below. Note that these results have been taken from a use case different to the ones used in this guide. + + + +You can click on the image to expand it. #### What to do if your function has too many target dimensions -- GitLab From 44ae53cf316a0f7338a6d75b5a514dc027099f7a Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Mon, 28 Jan 2019 04:36:03 +0100 Subject: [PATCH 10/20] Fixes in practical guide. --- inst/doc/practical_guide_bsc.md | 80 ++++++++++++++++----------------- 1 file changed, 39 insertions(+), 41 deletions(-) diff --git a/inst/doc/practical_guide_bsc.md b/inst/doc/practical_guide_bsc.md index d41e727..a7df125 100644 --- a/inst/doc/practical_guide_bsc.md +++ b/inst/doc/practical_guide_bsc.md @@ -10,16 +10,16 @@ If you would like to start using startR rightaway on the BSC infrastructure, you 2. [**Introduction**](inst/doc/practical_guide_bsc.md#introduction) 3. [**Configuring startR**](inst/doc/practical_guide_bsc.md#configuring-startr) 4. [**Using startR**](inst/doc/practical_guide_bsc.md#using-startr) - 4.1. [**Start()**](inst/doc/practical_guide_bsc.md#start) - 4.2. [**Step() and AddStep()**](inst/doc/practical_guide_bsc.md#step-and-addstep) - 4.3. [**Compute()**](inst/doc/practical_guide_bsc.md#compute) - 4.3.1. [**Compute() locally**](inst/doc/practical_guide_bsc.md#compute-locally) - 4.3.2. 
[**Compute() on CTE-Power 9**](inst/doc/practical_guide_bsc.md#compute-on-cte-power-9) - 4.3.3. [**Compute() on the fat nodes and other HPCs**](inst/doc/practical_guide_bsc.md#compute-on-the-fat-nodes-and-other-hpcs) - 4.4. [**Collect() and the EC-Flow GUI**](inst/doc/practical_guide_bsc.md#collect-and-the-ec-flow-gui) - 4.5. [**Additional information**](inst/doc/practical_guide_bsc.md#additional-information) - 4.6. [**Other examples**](inst/doc/practical_guide_bsc.md#other-examples) - 4.7. [**Compute() cluster templates**](inst/doc/practical_guide_bsc.md#compute-cluster-templates) + 1. [**Start()**](inst/doc/practical_guide_bsc.md#start) + 2. [**Step() and AddStep()**](inst/doc/practical_guide_bsc.md#step-and-addstep) + 3. [**Compute()**](inst/doc/practical_guide_bsc.md#compute) + 1. [**Compute() locally**](inst/doc/practical_guide_bsc.md#compute-locally) + 2. [**Compute() on CTE-Power 9**](inst/doc/practical_guide_bsc.md#compute-on-cte-power-9) + 3. [**Compute() on the fat nodes and other HPCs**](inst/doc/practical_guide_bsc.md#compute-on-the-fat-nodes-and-other-hpcs) + 4. [**Collect() and the EC-Flow GUI**](inst/doc/practical_guide_bsc.md#collect-and-the-ec-flow-gui) +5. [**Additional information**](inst/doc/practical_guide_bsc.md#additional-information) +6. [**Other examples**](inst/doc/practical_guide_bsc.md#other-examples) +7. [**Compute() cluster templates**](inst/doc/practical_guide_bsc.md#compute-cluster-templates) ## Motivation @@ -195,7 +195,7 @@ This will show the names and lengths of the dimensions of the selected variable: 1 1296 640 25 860 ``` -_**Note**: If you check the dimensions of that file with `ncdump -h`, you will realize the 'var' dimension is actually not defined inside. `NcReadDims()` and `Start()`, though, perceive the different variables inside a file as if stored along a virtual dimension called 'var'. You can ignore this for now and assume 'var' is simply a file dimension (since it appears as a wildcard in the path pattern). Read more on this in Note 1 at the end of this section._ +_**Note**_: If you check the dimensions of that file with `ncdump -h`, you will realize the 'var' dimension is actually not defined inside. `NcReadDims()` and `Start()`, though, perceive the different variables inside a file as if stored along a virtual dimension called 'var'. You can ignore this for now and assume 'var' is simply a file dimension (since it appears as a wildcard in the path pattern). Read more on this in Note 1 at the end of this section. Once we know the dimension names, we have all the information we need to put the `Start()` call together: @@ -293,7 +293,7 @@ $sdate[[1]] [1] "19930101" ``` -If you are interested in actually loading the entire data set in your machine you can do so in two ways (_**be careful**_, loading the data involved in the `Start()` call in this example will most likely stall your machine. Try it with a smaller region or a subset of forecast time steps*): +If you are interested in actually loading the entire data set in your machine you can do so in two ways (_**be careful**_, loading the data involved in the `Start()` call in this example will most likely stall your machine. Try it with a smaller region or a subset of forecast time steps): - adding the parameter `retrieve = TRUE` in your `Start()` call. 
- evaluating the object returned by `Start()`: `data_load <- eval(data)` @@ -303,13 +303,13 @@ There are no constrains for the number or names of the outer or inner dimensions There are a number of advanced parameters and features in `Start()` to handle heterogeneities across files involved in a `Start()` call, such as the `synonims` parameter, or to handle dimensions extending across multiple NetCDF files, such as the `*_across` parameter. See the documentation on the `Start()` function (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information. -_**Note 1 **on the 'var' dimension_: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. +**_Note 1 _**_on the 'var' dimension_: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. 
-_**Note 2 **on providing values as selectors for inner dimensions_: when values are requested for a inner dimension, the corresponding numeric indices are automatically calculated by comparing the provided values with a variable inside the file with the same name as the dimension for which the values have been requested. In the last example where specific values are requested for the latitudes, the variable 'latitude' is automatically retrieved from the files. If the name of the variable does not coincide with the name of the dimension, the parameter `*_var` can be specified in the `Start()` call, as detailed in `?Start`. +**_Note 2 _**_on providing values as selectors for inner dimensions_: when values are requested for a inner dimension, the corresponding numeric indices are automatically calculated by comparing the provided values with a variable inside the file with the same name as the dimension for which the values have been requested. In the last example where specific values are requested for the latitudes, the variable 'latitude' is automatically retrieved from the files. If the name of the variable does not coincide with the name of the dimension, the parameter `*_var` can be specified in the `Start()` call, as detailed in `?Start`. -_**Note 3 **on the order of dimensions_: neither the file dimensions nor the inner dimensions need to be specified in the same order as they appear in the path pattern or inside the NetCDF files, respectively. The resulting arrays returned by `Start()` will have the dimensions in the same order as requested in `Start()`, so changing the order in the call can potentially trigger automatic reordering of data, which is time consuming. But, depending on the use case, it may be a good idea to ask for a specific dimension order so that the data is properly arranged for making posterior calculations more efficient. Remember that the order of the dimensions in R is "big endian"; the values are consecutive along the first (left-most) dimension. In contrast, the order of dimensions in NetCDF files is "little endian". This means that if you want to respect the order of the data values in memory as stored in the NetCDF files, you should request the dimensions in your `Start()` call in the opposite order. +**_Note 3 _**_on the order of dimensions_: neither the file dimensions nor the inner dimensions need to be specified in the same order as they appear in the path pattern or inside the NetCDF files, respectively. The resulting arrays returned by `Start()` will have the dimensions in the same order as requested in `Start()`, so changing the order in the call can potentially trigger automatic reordering of data, which is time consuming. But, depending on the use case, it may be a good idea to ask for a specific dimension order so that the data is properly arranged for making posterior calculations more efficient. Remember that the order of the dimensions in R is "big endian"; the values are consecutive along the first (left-most) dimension. In contrast, the order of dimensions in NetCDF files is "little endian". This means that if you want to respect the order of the data values in memory as stored in the NetCDF files, you should request the dimensions in your `Start()` call in the opposite order. -_**Note 4 **on the size of the data in R_: if you check the size of the involved file in the example `Start()` call used above ('/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc'), you will realize it only weighs 34GB. 
Why is the data reported to occupy 134GB then? This is due to two facts: by one side, NetCDF files are usually compressed, and their uncompressed size can be substantially greater. In this case, the uncompressed data would occupy about 72GB. Besides, the variable we are targetting in the example ('tas') is defined as a float variable inside the NetCDF file. This means each value is a 4-byte real number. However, R cannot represent 4-byte real numbers; it always takes 8 bytes to represent a real number. This is why, when float numbers are represented in R, the effective size of the data is doubled. In this case, the 72GB of uncompressed float numbers need to be represented using 132GB of RAM in R. +**_Note 4 _**_on the size of the data in R_: if you check the size of the involved file in the example `Start()` call used above ('/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc'), you will realize it only weighs 34GB. Why is the data reported to occupy 134GB then? This is due to two facts: by one side, NetCDF files are usually compressed, and their uncompressed size can be substantially greater. In this case, the uncompressed data would occupy about 72GB. Besides, the variable we are targetting in the example ('tas') is defined as a float variable inside the NetCDF file. This means each value is a 4-byte real number. However, R cannot represent 4-byte real numbers; it always takes 8 bytes to represent a real number. This is why, when float numbers are represented in R, the effective size of the data is doubled. In this case, the 72GB of uncompressed float numbers need to be represented using 132GB of RAM in R. ### Step() and AddStep() @@ -780,11 +780,11 @@ You can also run `Collect()` with `wait = FALSE`. This will crash with an error `Collect()` also has a parameter called `remove`, which by default is set to `TRUE` and triggers removal of all data results received from the HPC (and stored under `ecflow_suite_dir`). If you would like to preserve the data, you can set `remove = FALSE` and `Collect()` it as many times as desired. Alternatively, you can `Collect()` with `remove = TRUE` and store the merged array right after with `saveRDS()`. -### Additional information +## Additional information -#### How to choose the number of chunks, jobs and cores +### How to choose the number of chunks, jobs and cores -##### Number of chunks and memory per job +#### Number of chunks and memory per job Adjusting the number of chunks, simultaneous jobs, cores and memory per job is crucial for a successful and efficient execution of your startR workflow on the HPC. @@ -800,17 +800,17 @@ The amount of memory requested for each job, in this case where the data to be p If the data to be processed was smaller than the memory modules, it could still be interesting to split it in chunks in order to parallelize the computation and get a faster result. In that case, the memory requested for each job must be equal to 2 or 3 times the size of one chunk. -##### Maximum number of simultaneous jobs +#### Maximum number of simultaneous jobs The maximum number of simultaneous jobs should be adjusted according to the capacity of the file system to deal with multiple simultanous file accesses, according to the bandwidth of the connection between the HPC nodes and the file system, and according to the number of jobs per user supported/allowed by the HPC queue. A recommended practice is to make a test with a subset of the data and try first with a small number of simultaneous jobs (e.g. 
2), and keep repeating the test while increasing the number of simultaneous jobs, until the performance of the process is not improved due to any of the reasons mentioned before. -##### Number of cores and computing threads +#### Number of cores and computing threads The number of cores per job should be as large as possible, with two limitations: - If the HPC nodes have each "M" total amount of memory with "m" amount of memory per memory module, and have each "N" amount of cores, the requested amount of cores "n" should be such that "n" / "N" does not exceed "m" / "M". - Depending on the shape of the chunks, startR has a limited capability to exploit multiple cores. It is recommended to make small tests increasing the number of cores to work out a reasonable number of cores to be requested. -#### How to clean a failed execution +### How to clean a failed execution - Work out the startR execution ID, either by inspecting the execution description by `Compute()` when called with the parameter `wait = FALSE`, or by checking the `ecflow_suite_dir` with `ls -ltr`. - ssh to the HPC login node and cancel all jobs of your startR execution. @@ -820,7 +820,7 @@ The number of cores per job should be as large as possible, with two limitations - Optionally remove the data under `data_dir` on the HPC login node if the file system is not shared between the workstation and the HPC and you do not want to keep the data in the `data_dir`, used as caché for future startR executions. - Open the EC-Flow GUI and remove the workflow entry (a.k.a. suite) named with your startR execution ID with right click > "Remove". -#### Visualizing the profiling of the execution +### Visualizing the profiling of the execution As seen in previous sections, profiling measurements of the execution are provided together with the data output. These measurements can be visualized with the `PlotProfiling()` function made available in the source code of the startR package. @@ -838,15 +838,13 @@ A chart displays the timings for the different stages of the computation, as sho You can click on the image to expand it. -#### What to do if your function has too many target dimensions +### What to do if your function has too many target dimensions -#### Pending features +### Pending features -Computation of weekly means with startR is still pending future work. By now, it is not possible to do that because the metadata associated to each chunk, such as the dates, is not being sent to the `Compute()` function. 
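As a rough, hypothetical illustration of the guidelines on chunk size, memory and cores given in this section (none of the figures below describe a real BSC machine; they are placeholders to show the arithmetic):

```r
# Hypothetical figures only, to illustrate the guidelines above.
data_size_gb  <- 1000                      # total size of the data to be processed
n_chunks      <- 50                        # number of chunks requested in Compute()
chunk_size_gb <- data_size_gb / n_chunks   # 20 GB per chunk
mem_per_job   <- 3 * chunk_size_gb         # request 2 to 3 times the chunk size: 60 GB

# Upper bound for the cores to request per job, following n / N <= m / M:
N <- 160                                   # cores per node (hypothetical)
M <- 512                                   # GB of total memory per node (hypothetical)
m <- 32                                    # GB per memory module (hypothetical)
n_max <- floor(N * m / M)                  # at most 10 cores per job in this example
```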
+## Other examples -### Other examples - -#### Using experimental and (date-corresponding) observational data +### Using experimental and (date-corresponding) observational data ```r repos <- paste0('/esnas/exp/ecmwf/system4_m1/6hourly/', @@ -908,11 +906,11 @@ res <- Compute(step, list(system4, erai), wait = FALSE) ``` -#### Example of computation of weekly means +### Example of computation of weekly means -#### Example with data on an irregular grid with selection of a region +### Example with data on an irregular grid with selection of a region -#### Example on MareNostrum 4 +### Example on MareNostrum 4 ```r library(startR) @@ -964,9 +962,9 @@ res <- Compute(wf, wait = TRUE) ``` -#### Example on CTE-Power using GPUs +### Example on CTE-Power using GPUs -#### Seasonal forecast verification example on cca +### Seasonal forecast verification example on cca ```r crps <- function(x, y) { @@ -1015,9 +1013,9 @@ r <- Compute(wf, ) ``` -### Compute() cluster templates +## Compute() cluster templates -#### CTE-Power9 +### CTE-Power9 ```r cluster = list(queue_host = 'p9login1.bsc.es', @@ -1033,7 +1031,7 @@ cluster = list(queue_host = 'p9login1.bsc.es', ) ``` -#### BSC ES fat nodes +### BSC ES fat nodes ```r cluster = list(queue_host = 'bsceslogin01.bsc.es', @@ -1046,7 +1044,7 @@ cluster = list(queue_host = 'bsceslogin01.bsc.es', ) ``` -#### Marenostrum 4 +### Marenostrum 4 ```r cluster = list(queue_host = 'mn2.bsc.es', @@ -1065,7 +1063,7 @@ cluster = list(queue_host = 'mn2.bsc.es', ) ``` -#### Nord III +### Nord III ```r cluster = list(queue_host = 'nord1.bsc.es', @@ -1085,7 +1083,7 @@ cluster = list(queue_host = 'nord1.bsc.es', ) ``` -#### MinoTauro +### MinoTauro ```r cluster = list(queue_host = 'mt1.bsc.es', @@ -1104,7 +1102,7 @@ cluster = list(queue_host = 'mt1.bsc.es', ) ``` -#### ECMWF ecgate (workstation) + cca/ccb (HPC) +### ECMWF ecgate (workstation) + cca/ccb (HPC) ```r cluster = list(queue_host = 'cca', -- GitLab From 8b2fe08567b9c9577d8227893988f9693e2480bf Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Mon, 28 Jan 2019 04:40:06 +0100 Subject: [PATCH 11/20] Fixes in practical guide. --- inst/doc/practical_guide_bsc.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/inst/doc/practical_guide_bsc.md b/inst/doc/practical_guide_bsc.md index a7df125..79610ac 100644 --- a/inst/doc/practical_guide_bsc.md +++ b/inst/doc/practical_guide_bsc.md @@ -10,13 +10,13 @@ If you would like to start using startR rightaway on the BSC infrastructure, you 2. [**Introduction**](inst/doc/practical_guide_bsc.md#introduction) 3. [**Configuring startR**](inst/doc/practical_guide_bsc.md#configuring-startr) 4. [**Using startR**](inst/doc/practical_guide_bsc.md#using-startr) - 1. [**Start()**](inst/doc/practical_guide_bsc.md#start) - 2. [**Step() and AddStep()**](inst/doc/practical_guide_bsc.md#step-and-addstep) - 3. [**Compute()**](inst/doc/practical_guide_bsc.md#compute) - 1. [**Compute() locally**](inst/doc/practical_guide_bsc.md#compute-locally) - 2. [**Compute() on CTE-Power 9**](inst/doc/practical_guide_bsc.md#compute-on-cte-power-9) - 3. [**Compute() on the fat nodes and other HPCs**](inst/doc/practical_guide_bsc.md#compute-on-the-fat-nodes-and-other-hpcs) - 4. [**Collect() and the EC-Flow GUI**](inst/doc/practical_guide_bsc.md#collect-and-the-ec-flow-gui) + 1. [**Start()**](inst/doc/practical_guide_bsc.md#start) + 2. [**Step() and AddStep()**](inst/doc/practical_guide_bsc.md#step-and-addstep) + 3. [**Compute()**](inst/doc/practical_guide_bsc.md#compute) + 1. 
[**Compute() locally**](inst/doc/practical_guide_bsc.md#compute-locally) + 2. [**Compute() on CTE-Power 9**](inst/doc/practical_guide_bsc.md#compute-on-cte-power-9) + 3. [**Compute() on the fat nodes and other HPCs**](inst/doc/practical_guide_bsc.md#compute-on-the-fat-nodes-and-other-hpcs) + 4. [**Collect() and the EC-Flow GUI**](inst/doc/practical_guide_bsc.md#collect-and-the-ec-flow-gui) 5. [**Additional information**](inst/doc/practical_guide_bsc.md#additional-information) 6. [**Other examples**](inst/doc/practical_guide_bsc.md#other-examples) 7. [**Compute() cluster templates**](inst/doc/practical_guide_bsc.md#compute-cluster-templates) @@ -303,13 +303,13 @@ There are no constrains for the number or names of the outer or inner dimensions There are a number of advanced parameters and features in `Start()` to handle heterogeneities across files involved in a `Start()` call, such as the `synonims` parameter, or to handle dimensions extending across multiple NetCDF files, such as the `*_across` parameter. See the documentation on the `Start()` function (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information. -**_Note 1 _**_on the 'var' dimension_: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. +***Note 1 ***_on the 'var' dimension_: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. 
This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. -**_Note 2 _**_on providing values as selectors for inner dimensions_: when values are requested for a inner dimension, the corresponding numeric indices are automatically calculated by comparing the provided values with a variable inside the file with the same name as the dimension for which the values have been requested. In the last example where specific values are requested for the latitudes, the variable 'latitude' is automatically retrieved from the files. If the name of the variable does not coincide with the name of the dimension, the parameter `*_var` can be specified in the `Start()` call, as detailed in `?Start`. +***Note 2 ***_on providing values as selectors for inner dimensions_: when values are requested for a inner dimension, the corresponding numeric indices are automatically calculated by comparing the provided values with a variable inside the file with the same name as the dimension for which the values have been requested. In the last example where specific values are requested for the latitudes, the variable 'latitude' is automatically retrieved from the files. If the name of the variable does not coincide with the name of the dimension, the parameter `*_var` can be specified in the `Start()` call, as detailed in `?Start`. -**_Note 3 _**_on the order of dimensions_: neither the file dimensions nor the inner dimensions need to be specified in the same order as they appear in the path pattern or inside the NetCDF files, respectively. The resulting arrays returned by `Start()` will have the dimensions in the same order as requested in `Start()`, so changing the order in the call can potentially trigger automatic reordering of data, which is time consuming. But, depending on the use case, it may be a good idea to ask for a specific dimension order so that the data is properly arranged for making posterior calculations more efficient. Remember that the order of the dimensions in R is "big endian"; the values are consecutive along the first (left-most) dimension. In contrast, the order of dimensions in NetCDF files is "little endian". This means that if you want to respect the order of the data values in memory as stored in the NetCDF files, you should request the dimensions in your `Start()` call in the opposite order. +***Note 3 ***_on the order of dimensions_: neither the file dimensions nor the inner dimensions need to be specified in the same order as they appear in the path pattern or inside the NetCDF files, respectively. The resulting arrays returned by `Start()` will have the dimensions in the same order as requested in `Start()`, so changing the order in the call can potentially trigger automatic reordering of data, which is time consuming. But, depending on the use case, it may be a good idea to ask for a specific dimension order so that the data is properly arranged for making posterior calculations more efficient. 
Remember that the order of the dimensions in R is "big endian"; the values are consecutive along the first (left-most) dimension. In contrast, the order of dimensions in NetCDF files is "little endian". This means that if you want to respect the order of the data values in memory as stored in the NetCDF files, you should request the dimensions in your `Start()` call in the opposite order. -**_Note 4 _**_on the size of the data in R_: if you check the size of the involved file in the example `Start()` call used above ('/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc'), you will realize it only weighs 34GB. Why is the data reported to occupy 134GB then? This is due to two facts: by one side, NetCDF files are usually compressed, and their uncompressed size can be substantially greater. In this case, the uncompressed data would occupy about 72GB. Besides, the variable we are targetting in the example ('tas') is defined as a float variable inside the NetCDF file. This means each value is a 4-byte real number. However, R cannot represent 4-byte real numbers; it always takes 8 bytes to represent a real number. This is why, when float numbers are represented in R, the effective size of the data is doubled. In this case, the 72GB of uncompressed float numbers need to be represented using 132GB of RAM in R. +***Note 4 ***_on the size of the data in R_: if you check the size of the involved file in the example `Start()` call used above ('/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc'), you will realize it only weighs 34GB. Why is the data reported to occupy 134GB then? This is due to two facts: by one side, NetCDF files are usually compressed, and their uncompressed size can be substantially greater. In this case, the uncompressed data would occupy about 72GB. Besides, the variable we are targetting in the example ('tas') is defined as a float variable inside the NetCDF file. This means each value is a 4-byte real number. However, R cannot represent 4-byte real numbers; it always takes 8 bytes to represent a real number. This is why, when float numbers are represented in R, the effective size of the data is doubled. In this case, the 72GB of uncompressed float numbers need to be represented using 132GB of RAM in R. ### Step() and AddStep() -- GitLab From 1a9810b60785ffe4fe3c8a81621e0b957bf518ae Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Mon, 28 Jan 2019 04:42:34 +0100 Subject: [PATCH 12/20] Fixes in practical guide. --- inst/doc/practical_guide_bsc.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/inst/doc/practical_guide_bsc.md b/inst/doc/practical_guide_bsc.md index 79610ac..49e02a0 100644 --- a/inst/doc/practical_guide_bsc.md +++ b/inst/doc/practical_guide_bsc.md @@ -303,13 +303,13 @@ There are no constrains for the number or names of the outer or inner dimensions There are a number of advanced parameters and features in `Start()` to handle heterogeneities across files involved in a `Start()` call, such as the `synonims` parameter, or to handle dimensions extending across multiple NetCDF files, such as the `*_across` parameter. See the documentation on the `Start()` function (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information. -***Note 1 ***_on the 'var' dimension_: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. 
The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. +_**Note 1 **__on the 'var' dimension_: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. -***Note 2 ***_on providing values as selectors for inner dimensions_: when values are requested for a inner dimension, the corresponding numeric indices are automatically calculated by comparing the provided values with a variable inside the file with the same name as the dimension for which the values have been requested. In the last example where specific values are requested for the latitudes, the variable 'latitude' is automatically retrieved from the files. If the name of the variable does not coincide with the name of the dimension, the parameter `*_var` can be specified in the `Start()` call, as detailed in `?Start`. 
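A hedged sketch of the `*_var` mechanism mentioned in this note follows. It assumes a hypothetical set of files whose latitude coordinate variable is named 'lat' instead of 'latitude' (this is not the case for the example files used in this guide):

```r
# Hypothetical sketch: requesting latitudes by value when the coordinate
# variable inside the files is called 'lat' rather than 'latitude'. The
# 'latitude_var' parameter (an instance of the *_var family) tells Start()
# which variable to compare the requested values against.
data <- Start(dat = repos,            # the path pattern defined earlier
              var = 'tas',
              sdate = '19930101',
              ensemble = 'all',
              time = 'all',
              latitude = values(list(-30, 30)),
              latitude_var = 'lat',
              longitude = 'all')
```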
+_**Note 2 **__on providing values as selectors for inner dimensions_: when values are requested for a inner dimension, the corresponding numeric indices are automatically calculated by comparing the provided values with a variable inside the file with the same name as the dimension for which the values have been requested. In the last example where specific values are requested for the latitudes, the variable 'latitude' is automatically retrieved from the files. If the name of the variable does not coincide with the name of the dimension, the parameter `*_var` can be specified in the `Start()` call, as detailed in `?Start`. -***Note 3 ***_on the order of dimensions_: neither the file dimensions nor the inner dimensions need to be specified in the same order as they appear in the path pattern or inside the NetCDF files, respectively. The resulting arrays returned by `Start()` will have the dimensions in the same order as requested in `Start()`, so changing the order in the call can potentially trigger automatic reordering of data, which is time consuming. But, depending on the use case, it may be a good idea to ask for a specific dimension order so that the data is properly arranged for making posterior calculations more efficient. Remember that the order of the dimensions in R is "big endian"; the values are consecutive along the first (left-most) dimension. In contrast, the order of dimensions in NetCDF files is "little endian". This means that if you want to respect the order of the data values in memory as stored in the NetCDF files, you should request the dimensions in your `Start()` call in the opposite order. +_**Note 3 **__on the order of dimensions_: neither the file dimensions nor the inner dimensions need to be specified in the same order as they appear in the path pattern or inside the NetCDF files, respectively. The resulting arrays returned by `Start()` will have the dimensions in the same order as requested in `Start()`, so changing the order in the call can potentially trigger automatic reordering of data, which is time consuming. But, depending on the use case, it may be a good idea to ask for a specific dimension order so that the data is properly arranged for making posterior calculations more efficient. Remember that the order of the dimensions in R is "big endian"; the values are consecutive along the first (left-most) dimension. In contrast, the order of dimensions in NetCDF files is "little endian". This means that if you want to respect the order of the data values in memory as stored in the NetCDF files, you should request the dimensions in your `Start()` call in the opposite order. -***Note 4 ***_on the size of the data in R_: if you check the size of the involved file in the example `Start()` call used above ('/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc'), you will realize it only weighs 34GB. Why is the data reported to occupy 134GB then? This is due to two facts: by one side, NetCDF files are usually compressed, and their uncompressed size can be substantially greater. In this case, the uncompressed data would occupy about 72GB. Besides, the variable we are targetting in the example ('tas') is defined as a float variable inside the NetCDF file. This means each value is a 4-byte real number. However, R cannot represent 4-byte real numbers; it always takes 8 bytes to represent a real number. This is why, when float numbers are represented in R, the effective size of the data is doubled. 
In this case, the 72GB of uncompressed float numbers need to be represented using 132GB of RAM in R. +_**Note 4 **__on the size of the data in R_: if you check the size of the involved file in the example `Start()` call used above ('/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc'), you will realize it only weighs 34GB. Why is the data reported to occupy 134GB then? This is due to two facts: by one side, NetCDF files are usually compressed, and their uncompressed size can be substantially greater. In this case, the uncompressed data would occupy about 72GB. Besides, the variable we are targetting in the example ('tas') is defined as a float variable inside the NetCDF file. This means each value is a 4-byte real number. However, R cannot represent 4-byte real numbers; it always takes 8 bytes to represent a real number. This is why, when float numbers are represented in R, the effective size of the data is doubled. In this case, the 72GB of uncompressed float numbers need to be represented using 132GB of RAM in R. ### Step() and AddStep() -- GitLab From dd09667febf86c995f872a28cd966b5d3c52ff9d Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Mon, 28 Jan 2019 04:44:02 +0100 Subject: [PATCH 13/20] Fixes in practical guide. --- inst/doc/practical_guide_bsc.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/inst/doc/practical_guide_bsc.md b/inst/doc/practical_guide_bsc.md index 49e02a0..5a3c4cb 100644 --- a/inst/doc/practical_guide_bsc.md +++ b/inst/doc/practical_guide_bsc.md @@ -303,13 +303,13 @@ There are no constrains for the number or names of the outer or inner dimensions There are a number of advanced parameters and features in `Start()` to handle heterogeneities across files involved in a `Start()` call, such as the `synonims` parameter, or to handle dimensions extending across multiple NetCDF files, such as the `*_across` parameter. See the documentation on the `Start()` function (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information. -_**Note 1 **__on the 'var' dimension_: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. 
+_**Note 1**_ _on the 'var' dimension_: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. -_**Note 2 **__on providing values as selectors for inner dimensions_: when values are requested for a inner dimension, the corresponding numeric indices are automatically calculated by comparing the provided values with a variable inside the file with the same name as the dimension for which the values have been requested. In the last example where specific values are requested for the latitudes, the variable 'latitude' is automatically retrieved from the files. If the name of the variable does not coincide with the name of the dimension, the parameter `*_var` can be specified in the `Start()` call, as detailed in `?Start`. +_**Note 2**_ _on providing values as selectors for inner dimensions_: when values are requested for a inner dimension, the corresponding numeric indices are automatically calculated by comparing the provided values with a variable inside the file with the same name as the dimension for which the values have been requested. In the last example where specific values are requested for the latitudes, the variable 'latitude' is automatically retrieved from the files. If the name of the variable does not coincide with the name of the dimension, the parameter `*_var` can be specified in the `Start()` call, as detailed in `?Start`. -_**Note 3 **__on the order of dimensions_: neither the file dimensions nor the inner dimensions need to be specified in the same order as they appear in the path pattern or inside the NetCDF files, respectively. The resulting arrays returned by `Start()` will have the dimensions in the same order as requested in `Start()`, so changing the order in the call can potentially trigger automatic reordering of data, which is time consuming. But, depending on the use case, it may be a good idea to ask for a specific dimension order so that the data is properly arranged for making posterior calculations more efficient. Remember that the order of the dimensions in R is "big endian"; the values are consecutive along the first (left-most) dimension. In contrast, the order of dimensions in NetCDF files is "little endian". 
This means that if you want to respect the order of the data values in memory as stored in the NetCDF files, you should request the dimensions in your `Start()` call in the opposite order. +_**Note 3**_ _on the order of dimensions_: neither the file dimensions nor the inner dimensions need to be specified in the same order as they appear in the path pattern or inside the NetCDF files, respectively. The resulting arrays returned by `Start()` will have the dimensions in the same order as requested in `Start()`, so changing the order in the call can potentially trigger automatic reordering of data, which is time consuming. But, depending on the use case, it may be a good idea to ask for a specific dimension order so that the data is properly arranged for making posterior calculations more efficient. Remember that the order of the dimensions in R is "big endian"; the values are consecutive along the first (left-most) dimension. In contrast, the order of dimensions in NetCDF files is "little endian". This means that if you want to respect the order of the data values in memory as stored in the NetCDF files, you should request the dimensions in your `Start()` call in the opposite order. -_**Note 4 **__on the size of the data in R_: if you check the size of the involved file in the example `Start()` call used above ('/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc'), you will realize it only weighs 34GB. Why is the data reported to occupy 134GB then? This is due to two facts: by one side, NetCDF files are usually compressed, and their uncompressed size can be substantially greater. In this case, the uncompressed data would occupy about 72GB. Besides, the variable we are targetting in the example ('tas') is defined as a float variable inside the NetCDF file. This means each value is a 4-byte real number. However, R cannot represent 4-byte real numbers; it always takes 8 bytes to represent a real number. This is why, when float numbers are represented in R, the effective size of the data is doubled. In this case, the 72GB of uncompressed float numbers need to be represented using 132GB of RAM in R. +_**Note 4**_ _on the size of the data in R_: if you check the size of the involved file in the example `Start()` call used above ('/esarchive/exp/ecmwf/system5_m1/6hourly/tas/tas_19930101.nc'), you will realize it only weighs 34GB. Why is the data reported to occupy 134GB then? This is due to two facts: by one side, NetCDF files are usually compressed, and their uncompressed size can be substantially greater. In this case, the uncompressed data would occupy about 72GB. Besides, the variable we are targetting in the example ('tas') is defined as a float variable inside the NetCDF file. This means each value is a 4-byte real number. However, R cannot represent 4-byte real numbers; it always takes 8 bytes to represent a real number. This is why, when float numbers are represented in R, the effective size of the data is doubled. In this case, the 72GB of uncompressed float numbers need to be represented using 132GB of RAM in R. ### Step() and AddStep() -- GitLab From 8b4179a2bfbd38c4a9bffaa2f8454864b401906d Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Mon, 28 Jan 2019 05:08:42 +0100 Subject: [PATCH 14/20] Fixes in practical guide. 
--- inst/doc/practical_guide_bsc.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/inst/doc/practical_guide_bsc.md b/inst/doc/practical_guide_bsc.md index 5a3c4cb..beae33f 100644 --- a/inst/doc/practical_guide_bsc.md +++ b/inst/doc/practical_guide_bsc.md @@ -2,7 +2,7 @@ This guide includes explanations and practical examples for you to learn how to use startR to efficiently process large data sets in parallel on the BSC's HPCs (CTE-Power 9, Marenostrum 4, ...). See the main page of the [**startR**](README.md) project for a general overview of the features of startR, without actual guidance on how to use it. -If you would like to start using startR rightaway on the BSC infrastructure, you can directly go through the "Configuring startR" section, copy/paste the basic startR script example shown at the end of the "Motivation" section onto the text editor of your preference, adjust the paths and user names specified in the `Compute()` function, and run the code in an R session after loading the relevant modules. +If you would like to start using startR rightaway on the BSC infrastructure, you can directly go through the "Configuring startR" section, copy/paste the basic startR script example shown at the end of the "Introduction" section onto the text editor of your preference, adjust the paths and user names specified in the `Compute()` function, and run the code in an R session after loading the R and ecFlow modules. ## Index @@ -23,9 +23,9 @@ If you would like to start using startR rightaway on the BSC infrastructure, you ## Motivation -What would you do if you had to apply a custom statistical analysis procedure to a 10TB climate data set? Probably, you would need to use a scripting language to write a procedure which is able to retrieve a subset of data from the file system (it would rarely be possible to handle all of it at once on a single node), encode the procedure in that language, and apply it carefully and efficiently to the data. Afterwards, you would need to think of and develop a mechanism to dispatch the job mutiple times in parallel to an HPC of your choice, each of the jobs processing a different subset of the data set. You could do this by hand, but ideally you would use EC-Flow or a similar general purpose workflow manager, which would orchestrate the work for you, and would allow you to monitor and control the progress, as well as keep an easy-to-understand record of what you did, to be reused in the future if needed. The mentioned solution, although it is the recommended way to go, is a demanding one and you could easily spend a few days until you get it running smoothly. Additionally, when developing the job script, you would be exposed to the difficulties of efficiently managing the data and applying the encoded procedure to it. +What would you do if you had to apply a custom statistical analysis procedure to a 10TB climate data set? Probably, you would need to use a scripting language to write a procedure which is able to retrieve a subset of data from the file system (it would rarely be possible to handle all of it at once on a single node), code the procedure in that language, and apply it carefully and efficiently to the data. Afterwards, you would need to think of and develop a mechanism to dispatch the job mutiple times in parallel to an HPC of your choice, each of the jobs processing a different subset of the data set. 
You could do this by hand but, ideally, you would rather use EC-Flow or a similar general purpose workflow manager which would orchestrate the work for you. Also, it would allow you to visually monitor and control the progress, as well as keep an easy-to-understand record of what you did, in case you need to re-use it in the future. The mentioned solution, although it is the recommended way to go, is a demanding one and you could easily spend a few days until you get it running smoothly. Additionally, when developing the job script, you would be exposed to the difficulties of efficiently managing the data and applying the coded procedure to it. -With the constant increase of resolution (in all possible dimensions) of weather and climate model output, and with the growing need for using computationally demanding analytical methodologies (e.g. bootstraping with thousands of repetitions), this kind of divide-and-conquer approach is becoming indispensable. While tools exist to simplify and automate this complex procedure, they usually require adapting your data to specific formats, migrating to specific database systems, or an advanced knowledge of computer sciences or of specific programming languages or frameworks. +With the constant increase of resolution (in all possible dimensions) of weather and climate model output, and with the growing need for using computationally demanding analytical methodologies (e.g. bootstraping with thousands of repetitions), this kind of divide-and-conquer approach becomes indispensable. While tools exist to simplify and automate this complex procedure, they usually require adapting your data to specific formats, migrating to specific database systems, or an advanced knowledge of computer sciences or of specific programming languages or frameworks. startR is yet another tool which allows the R user to apply user-defined functions or procedures to collections of NetCDF files (see Note 1) as large as desired, transparently using computational resources in HPCs (see Note 2) to minimize the time to solution. Although it has been designed to require as few mandatory technical parameters as possible from the user, an experienced user can configure a number of additional parameters to adjust the execution. startR operates on, and provides as outputs, multidimensional arrays with named dimensions, a basic and widely used data structure in R, and this makes the framework more familiar to the general R user. @@ -39,6 +39,10 @@ Other things you can expect from startR: Things that are not supposed to be done with startR: - Curating/homogenizing model output files or generating files to be stored under /esarchive following the department/community conventions. Although metadata is understood and used by startR, its handling is not 100% consistent yet. +_**Note 1**_: The data files do not need to be migrated to a database system, nor have to comply any specific convention for their file names, name, number or order of the dimensions and variables, or distribution of files in the file system. Although only files in the NetCDF format are supported for now, plug-in functions (of approximately 200 lines of code) can be programmed to add support for additional file formats. + +_**Note 2**_: The HPCs startR is designed to run on are understood as multi-core multi-node clusters. startR relies on a shared file system across all HPC nodes, and does not implement any kind of distributed storage system for now. 
+ ## Introduction In order to use startR you will need to follow the configuration steps listed in the first section of this guide to make sure startR works on your workstation with the HPC of your choice. @@ -94,25 +98,21 @@ res <- Compute(wf, ecflow_suite_dir = '/home/Earth/nmanuben/startR_local/') ``` -_**Note 1**_: The data files do not need to be migrated to a database system, nor have to comply any specific convention for their file names, name, number or order of the dimensions and variables, or distribution of files in the file system. Although only files in the NetCDF format are supported for now, plug-in functions (of approximately 200 lines of code) can be programmed to add support for additional file formats. - -_**Note 2**_: The HPCs startR is designed to run on are understood as multi-core multi-node clusters. startR relies on a shared file system across all HPC nodes, and does not implement any kind of distributed storage system for now. - ## Configuring startR At BSC, the only configuration step you need to follow is to set up passwordless connection with the HPC. You do not need to follow the complete list of deployment steps since all dependencies are already installed for you to use, but you can find the steps listed under the [**Deployment**](inst/doc/deployment.md) section. Specifically, you need to set up passwordless, userless access from your machine to the HPC login node, and from the HPC login node to your machine if at all possible. In order to establish the connection in one of the directions, you need do the following: -1- generate an ssh pair of keys on the origin host if you do not have one, using `ssh-keygen -t rsa` +1. generate an ssh pair of keys on the origin host if you do not have one, using `ssh-keygen -t rsa` -2- ssh to the destionation node and create a directory where to store the public key, using `ssh username@hostname_or_ip mkdir -p .ssh`. 'hostname_or_ip' refers to the host name or IP address of the login node of the selected HPC, and 'username' to your account name on the HPC, which may not coincide with the one in your workstation. +2. ssh to the destionation node and create a directory where to store the public key, using `ssh username@hostname_or_ip mkdir -p .ssh`. 'hostname_or_ip' refers to the host name or IP address of the login node of the selected HPC, and 'username' to your account name on the HPC, which may not coincide with the one in your workstation. -3- dump your public key on a new file under that folder, using `cat .ssh/id_rsa.pub | ssh username@hostname_or_ip 'cat >> .ssh/authorized_keys'` +3. dump your public key on a new file under that folder, using `cat .ssh/id_rsa.pub | ssh username@hostname_or_ip 'cat >> .ssh/authorized_keys'` -4- adjust the permissions of the repository of keys, using `ssh username@hostname_or_ip "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"` +4. adjust the permissions of the repository of keys, using `ssh username@hostname_or_ip "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"` -5- if your username is different on your workstation and on the login node of the HPC, add an entry in the file .ssh/config in your workstation as follows: +5. 
if your username is different on your workstation and on the login node of the HPC, add an entry in the file .ssh/config in your workstation as follows: ``` Host short_name_of_the_host HostName hostname_or_ip @@ -301,7 +301,7 @@ You may realize that this functionality is similar to the `Load()` function in t There are no constrains for the number or names of the outer or inner dimensions used in a `Start()` call. In other words, `Start()` will handle NetCDF files with any number of dimensions with any name, as well as files distributed across folders in complex ways, since you can use customized wildcards in the path pattern. -There are a number of advanced parameters and features in `Start()` to handle heterogeneities across files involved in a `Start()` call, such as the `synonims` parameter, or to handle dimensions extending across multiple NetCDF files, such as the `*_across` parameter. See the documentation on the `Start()` function (https://earth.bsc.es/gitlab/es/startR/blob/master/vignettes/start.md) or in `?Start` for more information. +There are a number of advanced parameters and features in `Start()` to handle heterogeneities across files involved in a `Start()` call, such as the `synonims` parameter, or to handle dimensions extending across multiple NetCDF files, such as the `*_across` parameter. See the documentation on the [**Start() function**](inst/doc/start.md) or in `?Start` for more information. _**Note 1**_ _on the 'var' dimension_: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. -- GitLab From 96ae2c59cab78f071eafdbfc9294bd3dcbbe24e0 Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Mon, 28 Jan 2019 12:25:33 +0100 Subject: [PATCH 15/20] Improvements in the user guide. 
--- ...ctical_guide_bsc.md => practical_guide.md} | 29 +++++++++++-------- 1 file changed, 17 insertions(+), 12 deletions(-) rename inst/doc/{practical_guide_bsc.md => practical_guide.md} (94%) diff --git a/inst/doc/practical_guide_bsc.md b/inst/doc/practical_guide.md similarity index 94% rename from inst/doc/practical_guide_bsc.md rename to inst/doc/practical_guide.md index beae33f..13659bc 100644 --- a/inst/doc/practical_guide_bsc.md +++ b/inst/doc/practical_guide.md @@ -2,7 +2,7 @@ This guide includes explanations and practical examples for you to learn how to use startR to efficiently process large data sets in parallel on the BSC's HPCs (CTE-Power 9, Marenostrum 4, ...). See the main page of the [**startR**](README.md) project for a general overview of the features of startR, without actual guidance on how to use it. -If you would like to start using startR rightaway on the BSC infrastructure, you can directly go through the "Configuring startR" section, copy/paste the basic startR script example shown at the end of the "Introduction" section onto the text editor of your preference, adjust the paths and user names specified in the `Compute()` function, and run the code in an R session after loading the R and ecFlow modules. +If you would like to start using startR rightaway on the BSC infrastructure, you can directly go through the "Configuring startR" section, copy/paste the basic startR script example shown at the end of the "Introduction" section onto the text editor of your preference, adjust the paths and user names specified in the `Compute()` call, and run the code in an R session after loading the R and ecFlow modules. ## Index @@ -45,15 +45,15 @@ _**Note 2**_: The HPCs startR is designed to run on are understood as multi-core ## Introduction -In order to use startR you will need to follow the configuration steps listed in the first section of this guide to make sure startR works on your workstation with the HPC of your choice. +In order to use startR you will need to follow the configuration steps listed in the "Configuring startR" section of this guide to make sure startR works on your workstation with the HPC of your choice. Afterwards, you will need to understand and use five functions, all of them included in the startR package: - **Start()**, for declaing the data sets to be processed - **Step()** and **AddStep()**, for specifying the operation to be applied to the data - - **Compute()**, for specifying the HPC to be employed, the number of chunks and cores, and to trigger the computation + - **Compute()**, for specifying the HPC to be employed, the execution parameters (e.g. number of chunks and cores), and to trigger the computation - **Collect()** and the **EC-Flow graphical user interface**, for monitoring of the progress and collection of results -Next, you can see an example of startR script performing an ensemble mean of a small data set on CTE-Power9, for you to get a broad picture of how the startR functions interact and the information that is represented in a startR script. Note that the `temp_dir` and `ecflow_suite_dir` parameters in the `Compute()` call are user-specific. +Next, you can see an example startR script performing the ensemble mean of a small data set on CTE-Power9, for you to get a broad picture of how the startR functions interact and the information that is represented in a startR script. Note that the `queue_host`, `temp_dir` and `ecflow_suite_dir` parameters in the `Compute()` call are user-specific. 
```r library(startR) @@ -100,9 +100,9 @@ res <- Compute(wf, ## Configuring startR -At BSC, the only configuration step you need to follow is to set up passwordless connection with the HPC. You do not need to follow the complete list of deployment steps since all dependencies are already installed for you to use, but you can find the steps listed under the [**Deployment**](inst/doc/deployment.md) section. +At BSC, the only configuration step you need to follow is to set up passwordless connection with the HPC. You do not need to follow the complete list of deployment steps under [**Deployment**](inst/doc/deployment.md) since all dependencies are already installed for you to use. -Specifically, you need to set up passwordless, userless access from your machine to the HPC login node, and from the HPC login node to your machine if at all possible. In order to establish the connection in one of the directions, you need do the following: +Specifically, you need to set up passwordless, userless access from your machine to the HPC login node, and from the HPC login node to your machine if at all possible. In order to establish the connection in one of the directions, you need to do the following: 1. generate an ssh pair of keys on the origin host if you do not have one, using `ssh-keygen -t rsa` @@ -155,7 +155,7 @@ library(startR) In order for startR to recognize the data sets you want to process, you first need to declare them. The first step in the declaration of a data set is to build a special path string that encodes where all the involved NetCDF files to be processed are stored. It contains some wildcards in those parts of the path that vary across files. This special path string is also called "path pattern". -Before defining an example path pattern, let's introduce some target NetCDF files. In the esarchive, we can find the following files: +Before defining an example path pattern, let's introduce some target NetCDF files. In the "esarchive" at BSC, we can find the following files: ``` /esarchive/exp/ecmwf/system5_m1/6hourly/ @@ -179,7 +179,7 @@ repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc' The wildcards used (the pieces wrapped between '$' symbols) can be given any names you like. They do not necessarily need to be 'var' or 'sdate' or match any specific key word (although in this case, as explained later, the 'var' name will trigger a special feature of `Start()`). -Once the path pattern is specified, a `Start()` call can be built, in which you need to provide, as parameters, the specific values of interest for each of the wildcards (also called outer dimensions, or file dimensions), as well as for each of the dimensions inside the NetCDF files (inner dimensions). +Once the path pattern is specified, a `Start()` call can be built, in which you need to provide, as parameters, the specific values of interest for each of the dimensions defined by the wildcards (also called outer dimensions, or file dimensions), as well as for each of the dimensions inside the NetCDF files (inner dimensions). You can check in advance which dimensions are inside the NetCDF files by using e.g. easyNCDF on one of the files: @@ -213,7 +213,7 @@ data <- Start(dat = repos, For each of the dimensions, the values or indices of interest (a.k.a. selectors) can be specified in three possible ways: - Using one or more numeric indices, for example `time = indices(c(1, 3, 5))`, or `sdate = indices(3:5)`. 
In the latter case, the third, fourht and fifth start dates appearing in the file system in alphabetical order would be selected ('19930301', '19930401' and '19930501'). -- Using one or more actual values, for example `sdate = values('19930101')`, or `ensemble = values(c('r1i1p1', 'r2i1p1'))`, or `latitude = values(c(10, 10.5, 11))`. The `values()` helper function can be omitted (as shown in the example). See Note 2 for details on how values are handled for inner dimensions. +- Using one or more actual values, for example `sdate = values('19930101')`, or `ensemble = values(c('r1i1p1', 'r2i1p1'))`, or `latitude = values(c(10, 10.5, 11))`. The `values()` helper function can be omitted (as shown in the example). See Note 2 for details on how value selectors are handled when specified for inner dimensions. - Using a list of two numeric values, for example `sdate = indices(list(5, 10))`. This will take all indices from the 5th to the 10th. - Using a list of two actual values, for example `sdate = values(list('r1i1p1', 'r5i1p1'))` or `latitude = values(list(-45, 75))`. This will take all values, in order, placed between the two values specified (both ends included). - Using the special keywords 'all', 'first' or 'last'. @@ -301,11 +301,11 @@ You may realize that this functionality is similar to the `Load()` function in t There are no constrains for the number or names of the outer or inner dimensions used in a `Start()` call. In other words, `Start()` will handle NetCDF files with any number of dimensions with any name, as well as files distributed across folders in complex ways, since you can use customized wildcards in the path pattern. -There are a number of advanced parameters and features in `Start()` to handle heterogeneities across files involved in a `Start()` call, such as the `synonims` parameter, or to handle dimensions extending across multiple NetCDF files, such as the `*_across` parameter. See the documentation on the [**Start() function**](inst/doc/start.md) or in `?Start` for more information. +There are a number of advanced parameters and features in `Start()` to handle heterogeneities across files involved in a `Start()` call, such as the `synonims` parameter, or to handle dimensions extending across multiple NetCDF files, such as the `*_across` parameter. See the documentation on the [**Start()**](inst/doc/start.md) function or `?Start` for more information. -_**Note 1**_ _on the 'var' dimension_: as mentioned above in this section, `NcVarReader()` is showing as if a virtual dimension 'var' appears inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered a "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify indices for each of them. However, as exception, they are automatically understood to be the same dimension, and the target variable name specified as index for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would crash throw an error. 
The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. +_**Note 1**_ _on the 'var' dimension_: as mentioned above in this section, `NcVarReader()` reports a virtual dimension 'var' as if it appeared inside the file. The existence of this dimension is justified by the fact that, many times, NetCDF files contain more than one variable. The 'var' dimension should hence be considered an "inner" dimension. But, in our example, the dimension 'var' is also defined as a file dimension in the path pattern. So, following the logic of `Start()`, there would be two 'var' dimensions, one of them outer and the other inner, and we should consequently specify selectors for each of them. However, as an exception, they are automatically understood to be the same dimension, and the target variable name specified as selector for the outer 'var' dimension is also re-used to select the variable inside the file. This is a feature triggered only by the 'var' dimension name and, if other dimension names appeared more than once as inner or outer dimensions, `Start()` would throw an error. The feature described here is useful for the very common case where file paths contain the variable name and that variable is the only climate variable inside the file. If this feature was not available, one could still define the data set as shown in the code snippet below, where there would be some redundancy in the `Start()` call and in the dimensions of the resulting array. -_**Note 2**_ _on providing values as selectors for inner dimensions_: when values are requested for a inner dimension, the corresponding numeric indices are automatically calculated by comparing the provided values with a variable inside the file with the same name as the dimension for which the values have been requested. In the last example where specific values are requested for the latitudes, the variable 'latitude' is automatically retrieved from the files. If the name of the variable does not coincide with the name of the dimension, the parameter `*_var` can be specified in the `Start()` call, as detailed in `?Start`. +_**Note 2**_ _on providing values as selectors for inner dimensions_: when values are requested for an inner dimension, the corresponding numeric indices are automatically calculated by comparing the provided values with a variable inside the file with the same name as the dimension for which the values have been requested. In the last example, where specific values are requested for the latitudes, the variable 'latitude' is automatically retrieved from the files. If the name of the variable does not coincide with the name of the dimension, the parameter `*_var` can be specified in the `Start()` call, as detailed in `?Start`. _**Note 3**_ _on the order of dimensions_: neither the file dimensions nor the inner dimensions need to be specified in the same order as they appear in the path pattern or inside the NetCDF files, respectively. The resulting arrays returned by `Start()` will have the dimensions in the same order as requested in `Start()`, so changing the order in the call can potentially trigger automatic reordering of data, which is time consuming.
But, depending on the use case, it may be a good idea to ask for a specific dimension order so that the data is properly arranged for making posterior calculations more efficient. Remember that the order of the dimensions in R is "big endian"; the values are consecutive along the first (left-most) dimension. In contrast, the order of dimensions in NetCDF files is "little endian". This means that if you want to respect the order of the data values in memory as stored in the NetCDF files, you should request the dimensions in your `Start()` call in the opposite order. @@ -841,6 +841,11 @@ You can click on the image to expand it. ### What to do if your function has too many target dimensions ### Pending features +- Adding feature for `Compute()` to run on multiple HPCs or workstations. +- Adding plug-in to read CSV files. +- Supporting multiple steps in a workflow defined by `AddStep()`. +- Adding feature in `Start()` to read sparse grid points. +- Allowing for chunking along "essential" (a.k.a. "target") dimensions. ## Other examples -- GitLab From e38039b75318db2836d082b83091617b4cfa8aa6 Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Mon, 28 Jan 2019 12:27:20 +0100 Subject: [PATCH 16/20] Fixed in links. --- README.md | 2 +- inst/doc/practical_guide.md | 28 ++++++++++++++-------------- 2 files changed, 15 insertions(+), 15 deletions(-) diff --git a/README.md b/README.md index 131c296..664e1c8 100644 --- a/README.md +++ b/README.md @@ -38,7 +38,7 @@ devtools::install_git('https://earth.bsc.es/gitlab/es/startR') ### How it works -An overview example of how to process a large data set is shown in the following. You can see real use cases in the [**Practical guide for processing large data sets with startR**](inst/doc/practical_guide_bsc.md), and you can find more information on the use of the `Start()` function in the [**Start()**](inst/doc/start.md) documentation page, as well as in the documentation of the functions in the package. +An overview example of how to process a large data set is shown in the following. You can see real use cases in the [**Practical guide for processing large data sets with startR**](inst/doc/practical_guide.md), and you can find more information on the use of the `Start()` function in the [**Start()**](inst/doc/start.md) documentation page, as well as in the documentation of the functions in the package. The purpose of the example in this section is simply to illustrate how the user is expected to use startR once the framework is deployed on the workstation and HPC. It shows how a simple addition and averaging operation is performed on BSC's CTE-Power HPC, over a multi-dimensional climate data set, which lives in the BSC-ES storage infrastructure. As mentioned in the introduction, the user will need to declare the involved data sources, the workflow of operations to carry out, and the computing environment and parameters. diff --git a/inst/doc/practical_guide.md b/inst/doc/practical_guide.md index 13659bc..0aa31fe 100644 --- a/inst/doc/practical_guide.md +++ b/inst/doc/practical_guide.md @@ -6,20 +6,20 @@ If you would like to start using startR rightaway on the BSC infrastructure, you ## Index -1. [**Motivation**](inst/doc/practical_guide_bsc.md#motivation) -2. [**Introduction**](inst/doc/practical_guide_bsc.md#introduction) -3. [**Configuring startR**](inst/doc/practical_guide_bsc.md#configuring-startr) -4. [**Using startR**](inst/doc/practical_guide_bsc.md#using-startr) - 1. [**Start()**](inst/doc/practical_guide_bsc.md#start) - 2. 
[**Step() and AddStep()**](inst/doc/practical_guide_bsc.md#step-and-addstep) - 3. [**Compute()**](inst/doc/practical_guide_bsc.md#compute) - 1. [**Compute() locally**](inst/doc/practical_guide_bsc.md#compute-locally) - 2. [**Compute() on CTE-Power 9**](inst/doc/practical_guide_bsc.md#compute-on-cte-power-9) - 3. [**Compute() on the fat nodes and other HPCs**](inst/doc/practical_guide_bsc.md#compute-on-the-fat-nodes-and-other-hpcs) - 4. [**Collect() and the EC-Flow GUI**](inst/doc/practical_guide_bsc.md#collect-and-the-ec-flow-gui) -5. [**Additional information**](inst/doc/practical_guide_bsc.md#additional-information) -6. [**Other examples**](inst/doc/practical_guide_bsc.md#other-examples) -7. [**Compute() cluster templates**](inst/doc/practical_guide_bsc.md#compute-cluster-templates) +1. [**Motivation**](inst/doc/practical_guide.md#motivation) +2. [**Introduction**](inst/doc/practical_guide.md#introduction) +3. [**Configuring startR**](inst/doc/practical_guide.md#configuring-startr) +4. [**Using startR**](inst/doc/practical_guide.md#using-startr) + 1. [**Start()**](inst/doc/practical_guide.md#start) + 2. [**Step() and AddStep()**](inst/doc/practical_guide.md#step-and-addstep) + 3. [**Compute()**](inst/doc/practical_guide.md#compute) + 1. [**Compute() locally**](inst/doc/practical_guide.md#compute-locally) + 2. [**Compute() on CTE-Power 9**](inst/doc/practical_guide.md#compute-on-cte-power-9) + 3. [**Compute() on the fat nodes and other HPCs**](inst/doc/practical_guide.md#compute-on-the-fat-nodes-and-other-hpcs) + 4. [**Collect() and the EC-Flow GUI**](inst/doc/practical_guide.md#collect-and-the-ec-flow-gui) +5. [**Additional information**](inst/doc/practical_guide.md#additional-information) +6. [**Other examples**](inst/doc/practical_guide.md#other-examples) +7. [**Compute() cluster templates**](inst/doc/practical_guide.md#compute-cluster-templates) ## Motivation -- GitLab From 103a0b917192193225b7e26f49a800e68a47bd4b Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Fri, 1 Feb 2019 17:30:10 +0100 Subject: [PATCH 17/20] Updated MergeArrayDims. 
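For context, `.MergeArrayDims()` extends two named dimension vectors with length-1 dimensions until they share a common set of dimension names and, with this change, additionally returns the merged vector, taking the maximum length wherever the two inputs disagree. A minimal sketch of the expected merged output (the dimension names and lengths are made up for illustration):

```r
# Hypothetical dimensions of two data cubes
dims1 <- c(dat = 1, var = 1, sdate = 1, time = 860)
dims2 <- c(dat = 1, var = 1, sdate = 3, time = 1)

# Once both vectors have been extended to the same set of names, the
# merged dimensions are their element-wise maximum:
merged <- pmax(dims1, dims2)
merged
#   dat   var sdate  time
#     1     1     3   860
```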
--- R/ByChunks.R | 2 +- R/Start.R | 4 ++-- R/Utils.R | 7 ++++++- 3 files changed, 9 insertions(+), 4 deletions(-) diff --git a/R/ByChunks.R b/R/ByChunks.R index c43f4d1..933fc71 100644 --- a/R/ByChunks.R +++ b/R/ByChunks.R @@ -233,7 +233,7 @@ ByChunks <- function(step_fun, cube_headers, ..., chunks = 'auto', if (is.null(all_dims_merged)) { all_dims_merged <- i } else { - all_dims_merged <- startR:::.MergeArrayDims(all_dims_merged, i)[[1]] + all_dims_merged <- startR:::.MergeArrayDims(all_dims_merged, i)[[3]] } } all_dimnames <- names(all_dims_merged) diff --git a/R/Start.R b/R/Start.R index fc05ee5..168fae7 100644 --- a/R/Start.R +++ b/R/Start.R @@ -2461,12 +2461,12 @@ print("-> PROCEEDING TO CROP VARIABLES") total_inner_dims <- inner_dims } else { new_dims <- .MergeArrayDims(total_inner_dims, inner_dims) - total_inner_dims <- pmax(new_dims[[1]], new_dims[[2]]) + total_inner_dims <- new_dims[[3]] } } } new_dims <- .MergeArrayDims(dim(array_of_files_to_load), total_inner_dims) - final_dims <- pmax(new_dims[[1]], new_dims[[2]])[dim_names] + final_dims <- new_dims[[3]][dim_names] # final_dims_fake is the vector of final dimensions after having merged the # 'across' file dimensions with the respective 'across' inner dimensions, and # after having broken into multiple dimensions those dimensions for which diff --git a/R/Utils.R b/R/Utils.R index 6f7b6a1..0d1fdc6 100644 --- a/R/Utils.R +++ b/R/Utils.R @@ -528,6 +528,11 @@ chunk <- function(chunk, n_chunks, selectors) { # It expects as inputs two named numeric vectors, and it extends them # with dimensions of length 1 until an ordered common dimension # format is reached. +# The first output is dims1 extended with 1s. +# The second output is dims2 extended with 1s. +# The third output is a merged dimension vector. If dimensions with +# the same name are found in the two inputs, and they have a different +# length, the maximum is taken. .MergeArrayDims <- function(dims1, dims2) { new_dims1 <- c() new_dims2 <- c() @@ -555,7 +560,7 @@ chunk <- function(chunk, n_chunks, selectors) { new_dims1 <- c(new_dims1, dims_to_add) new_dims2 <- c(new_dims2, dims2) } - list(new_dims1, new_dims2) + list(new_dims1, new_dims2, pmax(new_dims1, new_dims2)) } # This function takes two named arrays and merges them, filling with -- GitLab From 7bb9c79c7a80cfc6f3d2f8d17c2c98a26441661e Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Tue, 5 Feb 2019 19:36:29 +0100 Subject: [PATCH 18/20] Enhancement in polling. 
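Chunk results are picked up by polling a shared directory, so a partially written or partially transferred `.Rds` file must never be visible under its final name. A minimal sketch of the write-then-rename pattern this relies on (the directory and file names are illustrative only):

```r
out_dir <- tempdir()                  # illustrative output directory
filename <- 'output1__dat_1__var_1_'  # illustrative chunk file name
res_chunk <- array(rnorm(10), dim = c(ensemble = 2, time = 5))

# Write the result under a temporary name first...
saveRDS(res_chunk, file = paste0(out_dir, '/', filename, '.Rds.tmp'))

# ...and only rename it to its final name once the write has completed,
# so a process polling for '*.Rds' files never sees a half-written result.
file.rename(paste0(out_dir, '/', filename, '.Rds.tmp'),
            paste0(out_dir, '/', filename, '.Rds'))
```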
--- R/Collect.R | 3 ++- R/Compute.R | 6 +++--- R/Utils.R | 22 ++++++++++++++++++++-- inst/chunking/load_process_save_chunk.R | 6 +++++- 4 files changed, 30 insertions(+), 7 deletions(-) diff --git a/R/Collect.R b/R/Collect.R index 44cfda5..2325d55 100644 --- a/R/Collect.R +++ b/R/Collect.R @@ -43,7 +43,8 @@ Collect <- function(startr_exec, wait = TRUE, remove = TRUE) { } done <- FALSE attempt <- 1 - sum_received_chunks <- 0 + sum_received_chunks <- sum(grepl('output.*\\.Rds', + list.files(ecflow_suite_dir_suite))) if (cluster[['bidirectional']]) { t_transfer_back <- NA } else { diff --git a/R/Compute.R b/R/Compute.R index 29d6be0..69c1bce 100644 --- a/R/Compute.R +++ b/R/Compute.R @@ -4,13 +4,13 @@ Compute <- function(workflow, chunks = 'auto', ecflow_server = NULL, silent = FALSE, debug = FALSE, wait = TRUE) { # Check workflow - if (!any(c('startR_cube', 'startR_workflow') %in% class(workflow))) { - stop("Parameter 'workflow' must be an object of class 'startR_cube' as ", + if (!any(c('startR_header', 'startR_workflow') %in% class(workflow))) { + stop("Parameter 'workflow' must be an object of class 'startR_header' as ", "returned by Start or of class 'startR_workflow' as returned by ", "AddStep.") } - if ('startR_cube' %in% class(workflow)) { + if ('startR_header' %in% class(workflow)) { #machine_free_ram <- 1000000000 #max_ram_ratio <- 0.5 #data_size <- prod(c(attr(workflow, 'Dimensions'), 8)) diff --git a/R/Utils.R b/R/Utils.R index 0d1fdc6..699468a 100644 --- a/R/Utils.R +++ b/R/Utils.R @@ -742,8 +742,26 @@ chunk <- function(chunk, n_chunks, selectors) { found_chunk <- which(found_chunks_str == paste(chunk_indices_on_file, collapse = '_'))[1] if (length(found_chunk) > 0) { - array_of_chunks[[i]] <- readRDS(paste0(shared_dir, '/', - chunk_files_original[found_chunk])) + num_tries <- 5 + found <- FALSE + try_num <- 1 + while ((try_num <= num_tries) && !found) { + array_of_chunks[[i]] <- try({ + readRDS(paste0(shared_dir, '/', + chunk_files_original[found_chunk])) + }) + if (('try-error' %in% class(array_of_chunks[[i]]))) { + message("Waiting for an incomplete file transfer...") + Sys.sleep(5) + } else { + found <- TRUE + } + try_num <- try_num + 1 + } + if (!found) { + stop("Could not open one of the chunks. Might be a large chunk ", + "in transfer. Merge aborted, files have been preserved.") + } } } diff --git a/inst/chunking/load_process_save_chunk.R b/inst/chunking/load_process_save_chunk.R index 8a5843c..a8a31a8 100644 --- a/inst/chunking/load_process_save_chunk.R +++ b/inst/chunking/load_process_save_chunk.R @@ -104,7 +104,11 @@ for (component in names(res)) { for (i in 1:total_specified_dims) { filename <- paste0(filename, param_dimnames[i], '_', chunk_indices[i], '__') } - saveRDS(res[[component]], file = paste0(out_dir, '/', filename, '.Rds')) + # Saving in a temporary file, then renaming. This way, the polling mechanism + # won't transfer back results before the save is completed. + saveRDS(res[[component]], file = paste0(out_dir, '/', filename, '.Rds.tmp')) + file.rename(paste0(out_dir, '/', filename, '.Rds.tmp'), + paste0(out_dir, '/', filename, '.Rds')) } rm(res) gc() -- GitLab From 041ef18db7687fae50a459047f00355ecc07f68d Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Tue, 5 Feb 2019 19:40:03 +0100 Subject: [PATCH 19/20] Properly naming cubes. 
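With this naming, the header returned by a declarative `Start()` call carries the class 'startR_cube', which is what `AddStep()`, `Compute()` and `ByChunks()` check for, while a data set retrieved with `retrieve = TRUE` is tagged 'startR_array' instead. A quick sketch of how this can be inspected, reusing the guide's path pattern (class names as introduced in the diff below, not re-verified here):

```r
library(startR)

# Path pattern reused from the guide
repos <- '/esarchive/exp/ecmwf/system5_m1/6hourly/$var$/$var$_$sdate$.nc'

# Declarative call: no data is loaded, only a header is returned
data <- Start(dat = repos, var = 'tas', sdate = '19930101',
              ensemble = 'all', time = 'all',
              latitude = 'all', longitude = 'all')

class(data)
# Expected to include "startR_cube"; issuing the same call with
# retrieve = TRUE would instead return an array tagged "startR_array".
```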
--- R/AddStep.R | 8 ++++---- R/ByChunks.R | 6 +++--- R/Compute.R | 8 ++++---- R/Start.R | 4 ++-- 4 files changed, 13 insertions(+), 13 deletions(-) diff --git a/R/AddStep.R b/R/AddStep.R index fece572..c34e1b0 100644 --- a/R/AddStep.R +++ b/R/AddStep.R @@ -5,20 +5,20 @@ AddStep <- function(inputs, step_fun, ...) { } # Check inputs - if (any(c('startR_header', 'startR_workflow') %in% class(inputs))) { + if (any(c('startR_cube', 'startR_workflow') %in% class(inputs))) { inputs <- list(inputs) names(inputs) <- 'input1' } if (is.list(inputs)) { if (any(!sapply(inputs, - function(x) any(c('startR_header', + function(x) any(c('startR_cube', 'startR_workflow') %in% class(x))))) { stop("Parameter 'inputs' must be one or a list of objects of the class ", - "'startR_header' or 'startR_workflow'.") + "'startR_cube' or 'startR_workflow'.") } } else { stop("Parameter 'inputs' must be one or a list of objects of the class ", - "'startR_header' or 'startR_workflow'.") + "'startR_cube' or 'startR_workflow'.") } # Consistency checks diff --git a/R/ByChunks.R b/R/ByChunks.R index 933fc71..f9b4027 100644 --- a/R/ByChunks.R +++ b/R/ByChunks.R @@ -27,12 +27,12 @@ ByChunks <- function(step_fun, cube_headers, ..., chunks = 'auto', MergeArrays <- startR:::.MergeArrays # Check input headers - if ('startR_header' %in% class(cube_headers)) { + if ('startR_cube' %in% class(cube_headers)) { cube_headers <- list(cube_headers) } if (!all(sapply(lapply(cube_headers, class), - function(x) 'startR_header' %in% x))) { - stop("All objects passed in 'cube_headers' must be of class 'startR_header', ", + function(x) 'startR_cube' %in% x))) { + stop("All objects passed in 'cube_headers' must be of class 'startR_cube', ", "as returned by Start().") } diff --git a/R/Compute.R b/R/Compute.R index 69c1bce..12c2dce 100644 --- a/R/Compute.R +++ b/R/Compute.R @@ -4,13 +4,13 @@ Compute <- function(workflow, chunks = 'auto', ecflow_server = NULL, silent = FALSE, debug = FALSE, wait = TRUE) { # Check workflow - if (!any(c('startR_header', 'startR_workflow') %in% class(workflow))) { - stop("Parameter 'workflow' must be an object of class 'startR_header' as ", + if (!any(c('startR_cube', 'startR_workflow') %in% class(workflow))) { + stop("Parameter 'workflow' must be an object of class 'startR_cube' as ", "returned by Start or of class 'startR_workflow' as returned by ", "AddStep.") } - if ('startR_header' %in% class(workflow)) { + if ('startR_cube' %in% class(workflow)) { #machine_free_ram <- 1000000000 #max_ram_ratio <- 0.5 #data_size <- prod(c(attr(workflow, 'Dimensions'), 8)) @@ -55,7 +55,7 @@ Compute <- function(workflow, chunks = 'auto', attr(workflow$fun, 'UseLibraries'), attr(workflow$fun, 'UseAttributes')) - if (!all(sapply(workflow$inputs, class) == 'startR_header')) { + if (!all(sapply(workflow$inputs, class) == 'startR_cube')) { stop("Workflows with only one step supported by now.") } # Run ByChunks with the combined operation diff --git a/R/Start.R b/R/Start.R index 168fae7..bac9c66 100644 --- a/R/Start.R +++ b/R/Start.R @@ -2868,7 +2868,7 @@ print(str(picked_vars)) FileSelectors = file_selectors, PatternDim = found_pattern_dim) ) - attr(data_array, 'class') <- c('startR_cube', attr(data_array, 'class')) + attr(data_array, 'class') <- c('startR_array', attr(data_array, 'class')) data_array } else { if (!silent) { @@ -2898,7 +2898,7 @@ print(str(picked_vars)) NULL }) ) - attr(start_call, 'class') <- c('startR_header', attr(start_call, 'class')) + attr(start_call, 'class') <- c('startR_cube', attr(start_call, 'class')) start_call 
} } -- GitLab From 9d8aca2c95f617a490f57227488ebe5b612cfceb Mon Sep 17 00:00:00 2001 From: Nicolau Manubens Date: Tue, 5 Feb 2019 19:44:00 +0100 Subject: [PATCH 20/20] Bumped version to v0.1.1. --- DESCRIPTION | 2 +- startR-manual.pdf | Bin 143986 -> 143986 bytes 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index ba95e15..c914a29 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: startR Title: Automatically Retrieve Multidimensional Distributed Data Sets -Version: 0.1.0 +Version: 0.1.1 Authors@R: c( person("BSC-CNS", role = c("aut", "cph")), person("Nicolau", "Manubens", , "nicolau.manubens@bsc.es", role = c("aut", "cre")), diff --git a/startR-manual.pdf b/startR-manual.pdf index fb1a048ade6bde2ce33c175e50928af4f34a1650..37dadec8798bed8e656b3fb8714ae5a469ab8e5b 100644 GIT binary patch delta 1050 zcmV+#1m*kkU`T1`{Ary@P!f5O(ll|S0L`KZZen;- zt6GiS*@(%U?n_2JNiRUL7sDs5$Ymp|f=HHZsN{@}y>Rq-?1m!~*4rkREMo;N z1*=Mu>vyl2fQQ#(504s2GxH{qMBgo`6i1^l&P?18B}t%N`U4Pu#Usj(nhD}5!8k=#(3?o2`@C#Bd_yf9Cf z1}wti1Ir!Jtc8NCeNk=nwdjzaC?5f|1o>V9fwaWZKJ}B5b@)Fh`+!57R(VcS73-db z4;Xtn74VE}1(r9g| zm^a()Lj_HLt411F3i7FgL_TJyXqY^h36;kEM3jn;_cjfJ5}xiJJf(T|s6?JgsUA(O z251KXR;h**<+>m;^i+>eaJ=7lYg%UFSkDby6uN=!xr>K3_Ub2n?;{w;ZO=Mp-KZH@ z|E*AoK5E@hkOz|*Tku(#nFOxcwt$|;ZK-V<*>HM)(F63duI&Wq`|GI*rTn!8LO6^H znjarp+K++zOT{fApMf4F;{JAi{n}U-US`@bQ$$J0q51bDiZ?6;-q0sTG@f-M`sUa7 zHxvJ<8Lb2+uytYI(_+6bZU!~S5W0zw!(&P^M z;xR6UD2Thg_i8zx%x{;v9_~ojp3{2w@j|Z=SuO@s!zO$St4K~7vk{ZBS+j2hi~$6c z=Q_KC{{@Ht1p&AJ1p=5C2{JG>F*!6dGcuQ;9ReB&GB7nUIW#jfGPlDW0x=LnMKm%( zH$+1*GBZUtHZ?OaH8U_lG(j~oFhw{sMM6YDJ|H|rG%`XrL_;w$GetKxH8U_ZGcZ9k UK{YZkML07>LPSBgH!cD@1Vk(3$^ZZW delta 1050 zcmV+#1m*kk}7HyoL;-Zr^p87pWh zSXGi-zkAIDJiH!zc+^OmnKy|f`ff?3I2wg~oz%#CqlNGmr*<_`@=2dD9fb-Le>w8X%O47a_qvBpjT3AEe`olG+xG=CZLI;J1 z;jnVs3qA#J^vjggWX*CW88xc^*-`spT?xU~IYs*AYDaB`Y_4ivDf6FMX2nTqqm-0y zbS=|iVr$A%%J%EQAgnPx7(+{{0nUdKL&QLzfXFR`IpL~*pe;FygBAFoQ!(lOVvk<5 zB!GyqAuS>eBw`pOrw3#ByP-lFGTS6}M7NR;?E?Og{G!7;r9yO>+-E$*A&2lBWPj)L z9vW@5<0u@NZ2!qgJq*D##oC6tY05tNySmLXYrd{4z8`JcKrs}L`K?NsSZ7m9qqU`C z-fXuI6*R4X8fjoD$fphx`Iw=iVe()mR2ugaQ7S&(+cXGDc)EM=l;+u^5_u-2dNj2f zpdAEQr5aL{>w?J8Q$0Sx@qXW}X_<**JvVSs=mxguE*{$0tDp3}k6;|PJ?oftqh?_J zw?ZZQsC7R<9!zR%!DnS=61ZmD0(u^|rM78g!|6qT575iHwiBT5uLl##RsPxnAsj{p z&5sW)?Z?3VrQ(*5&p?k7aeq6%er+rZFEeeJDWat0(ENK6#T%9aZ|D;v8qc~Bee>)4 zn~DF_j8*~@*t)RqX|dlIHv<`r)b@fM4lm8^qSci2x10!8wJv``i1a#$-oNbj4lScl% zZ}+of+=);y>$V-q8+x1@`tgC8TS_xAuZo)GtdB z@fa6F6vW-$d$pWT=C?~-4|k+%&uP8;c%j!3EEj{RVH3WERU{{k*@$6rU$buni~$6R zd^&=I{{@Ht1p&AJ1p=5C2{AG;GB7kZG&Gl>9ReB&F)}eSFf=zbG`GVY0x=LnIW{sd zLoq}`LNqrwMMOq1MMg6