At BSC-ES there is a collaborative effort to share resources for data analysis in R and to develop in-house R tools that are maintained by the R developer team. The R tools can be used for postprocessing experiments: loading data, computing prediction scores and indices, calibrating experiments, as well as plotting, formatting and saving data. Furthermore, by dividing the data into chunks, you can speed up your execution with startR. You will find more information about our in-house packages further down, in the 'In-house Packages' section.
You can join the Earth RTools mailing list to receive the latest news and updates. Check the list of R tips below to learn about the BSC-ES infrastructure and other useful tricks.
The Workstations, the BSC-ES Hub and the HPC machines use environment modules, which are maintained by our IT team. Each module contains a set of software packages that can only be used if the module has been loaded in the environment.
For example, to load CDO and R on a workstation, load the corresponding modules with the 'module load' command:
module load CDO/1.9.8-foss-2015a
module load R/4.1.2-foss-2015a-bare
The currently maintained R module versions are:
Check the wiki page for each machine to see if you need to follow any additional steps to be able to load the correct modules.
The R modules contain the latest released versions of our in-house R packages, as well as many other R packages that may be used by people in the department. There is no need to install packages locally yourself: load the corresponding module and check whether the R package is already installed. If it is not, you may open an issue in the Requests GitLab project (https://earth.bsc.es/gitlab/es/requests/-/issues) and tag Stamen Miroslavov Minkov (@smirosla) to ask him to install it on the machines.
Some packages may require additional modules; see the ‘R tips’ section below.
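After loading an R module, a quick way to check whether the packages you need are already installed is to ask R directly. A minimal sketch; `missing_packages()` is a hypothetical helper (not part of any in-house package) built on base R's `requireNamespace()`:

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded
# in the current R session (i.e. with the R module currently loaded)
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Example: check a few department and CRAN packages
missing_packages(c("s2dv", "startR", "CSTools", "ncdf4"))
```

Any package name it returns is a candidate for a request in the Requests GitLab project.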
The list of R packages installed in the latest maintained R modules (as of 2024-02-02) can be found here, and a list of functions in each department R package (as of 2021-01-18) can be found here or in the documentation for each package on CRAN.
A monthly meeting takes place in the department to discuss the plans and priorities of the R tools (or any other topic that needs discussion). Here you can find a document where the minutes are gathered:
Brief Summary and links to in-house packages: Package Summary
See specific information for each tool:
We use issues to identify and address bugs and to propose new developments in the codebase. If you find a problem in our tools, please read this document to see whether and how you should report an issue. Guidelines for R-related questions in Earth Sciences
If you want to test a function under development from GitLab, sourcing the file directly from GitLab is very useful. You can of course save the file and source it locally, but whenever the file is updated you need to update your local copy manually, which is not convenient.
If the repository is public, you can use either of these two approaches.
1. If you have the git repo cloned under your personal directory:
path <- "/home/Earth/aho/s2dv/R/"  # the git repo path
# Source every .R file in the package's R/ directory
ff <- list.files(path, pattern = "\\.R$", full.names = TRUE)
invisible(lapply(ff, source))
# Load all the packages that the sourced functions depend on
lib <- c('parallel', 'abind', 'bigmemory', 'future', 'multiApply', 'PCICt',
         'ClimProjDiags', 'ncdf4', 'plyr', 'easyNCDF')
invisible(lapply(lib, library, character.only = TRUE))
2. If you don't have a local git repo:
Source the file using its raw file URL, for example:
source("https://earth.bsc.es/gitlab/external/cstools/-/raw/master/R/s2dv_cube.R")
Remember that you need to use the raw URL, which can be found in the upper-right corner of the file view. Note that you may also need to load other packages or source other functions used by the sourced function; with the first method above, these dependencies are already taken care of.
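When the function you want to test calls other functions that are also under development, you can source several raw files in one go. A minimal sketch, assuming the raw-URL pattern of the CSTools example above; `gitlab_raw_urls()`, `source_gitlab_files()` and the file list in the commented example are hypothetical and must be adapted to the files you actually need:

```r
# Hypothetical helper: build raw-file URLs for a public GitLab project,
# following the pattern <gitlab>/<group>/<project>/-/raw/<branch>/<path>
gitlab_raw_urls <- function(project, files, branch = "master",
                            gitlab = "https://earth.bsc.es/gitlab") {
  paste(gitlab, project, "-", "raw", branch, files, sep = "/")
}

# Hypothetical helper: source each raw file into the global environment
source_gitlab_files <- function(project, files, branch = "master") {
  urls <- gitlab_raw_urls(project, files, branch)
  invisible(lapply(urls, source))
}

# Example (illustrative file list, adapt it to your case):
# source_gitlab_files("external/cstools", c("R/s2dv_cube.R", "R/other_file.R"))
```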
If the repository is not public, see this slide
If you use any R packages, whether developed in or outside the department, you can include a citation in your research outputs.
Reference: https://ropensci.org/blog/2021/11/16/how-to-cite-r-and-r-packages/
To get the most up-to-date citation text, you can simply use the R function `citation()`. For example:
> citation("s2dv")
To cite package 's2dv' in publications use:
  BSC-CNS, An-Chi Ho and Nuria Perez-Zanon (2023). s2dv: A Set of Common Tools for Seasonal to Decadal Verification. R package version 2.0.0. https://CRAN.R-project.org/package=s2dv
A BibTeX entry for LaTeX users is
  @Manual{,
    title = {s2dv: A Set of Common Tools for Seasonal to Decadal Verification},
    author = {{BSC-CNS} and An-Chi Ho and Nuria Perez-Zanon},
    year = {2023},
    note = {R package version 2.0.0},
    url = {https://CRAN.R-project.org/package=s2dv},
  }
Note that you may need to change the version number if your research was done with a previous version.
To cite R software along with its base packages, type citation(). The year varies with the R version you use.
> citation()
To cite R in publications use:
  R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
A BibTeX entry for LaTeX users is
  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2021},
    url = {https://www.R-project.org/},
  }
We have invested a lot of time and effort in creating R, please cite it when using it for data analysis. See also ‘citation("pkgname")’ for citing R packages.
Some packages also have their own publication, which you may want to cite as well:
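If you manage your references with BibTeX, you can write the entries for R and for every package you used into a single `.bib` file. A minimal sketch using base R's `citation()` and `toBibtex()`; the helper name `write_software_bib()`, the file name `software.bib` and the package list are assumptions:

```r
# Hypothetical helper: write BibTeX entries for R itself and for a set of
# packages into one .bib file
write_software_bib <- function(pkgs, file = "software.bib") {
  # citation() without arguments returns the entry for R itself
  entries <- c(list(citation()), lapply(pkgs, citation))
  # Convert each citation to BibTeX lines, separated by an empty line
  bib <- unlist(lapply(entries, function(x) c(as.character(toBibtex(x)), "")))
  writeLines(bib, file)
  invisible(file)
}

# Example with department packages (they must be installed):
# write_software_bib(c("s2dv", "startR", "CSIndicators"))
```

As in the output shown above, the generated entries have empty keys (`@Manual{,`), so add a key to each entry before citing it.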
Pérez-Zanón, N., Caron, L.-P., Terzago, S., Van Schaeybroeck, B., Lledó, L., Manubens, N., Roulin, E., Alvarez-Castro, M. C., Batté, L., Bretonnière, P.-A., Corti, S., Delgado-Torres, C., Domínguez, M., Fabiano, F., Giuntoli, I., von Hardenberg, J., Sánchez-García, E., Torralba, V., and Verfaillie, D.: Climate Services Toolbox (CSTools) v4.0: from climate forecasts to climate forecast information, Geosci. Model Dev., 15, 6115–6142, https://doi.org/10.5194/gmd-15-6115-2022, 2022.
Pérez-Zanón, N., Ho, A., Chou, C., Lledó, L., Marcos-Matamoros, R., Rifà, E., and González-Reviriego, N.: CSIndicators: Get tailored climate indicators for applications in your sector, Climate Services, https://doi.org/10.1016/j.cliser.2023.100393, 2023.
Manubens, N., Caron, L.-P., Hunter, A., Bellprat, O., Exarchou, E., Fučkar, N. S., Garcia-Serrano, J., Massonnet, F., Ménégoz, M., Sicardi, V., Batté, L., Prodhomme, C., Torralba, V., Cortesi, N., Mula-Valls, O., Serradell, K., Guemas, V., and Doblas-Reyes, F. J.: An R package for climate forecast verification, Environmental Modelling & Software, 103, 29–42, https://doi.org/10.1016/j.envsoft.2018.01.018, 2018.
You can also mention the packages in the acknowledgements section. Here is an example from 'How Reliable Are Decadal Climate Predictions of Near-Surface Air Temperature?' (Verfaillie et al., 2020):
We acknowledge the use of the s2dverification (Manubens et al. 2018), startR (BSC/CNS and Manubens 2020), SpecsVerification (Siegert 2017), CSTools (Pérez-Zanón et al. 2019), ClimProjDiags (BSC/CNS et al. 2020), and boot (Davison and Hinkley 1997; Canty and Ripley 2020) R (R Core Team 2013) software packages.
We recommend adding a sentence to your manuscript that gives readers the full list of software used. For example:
All analyses were performed using R Statistical Software (v4.1.2; R Core Team 2021). Temperature data were processed with the R packages startR (v2.3.0; BSC-CNS et al. 2023) and s2dv (v2.0.0; BSC-CNS et al. 2023). The indices were calculated with the R package CSIndicators (v1.0.1; Pérez-Zanón et al. 2023).
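To fill in the version numbers in a sentence like the one above, you can query them from the R session used for the analysis. A minimal sketch; `software_versions()` is a hypothetical helper built on base R's `packageVersion()` and `R.version`:

```r
# Hypothetical helper: report the R version and the versions of given packages
software_versions <- function(pkgs) {
  r_version <- paste(R.version$major, R.version$minor, sep = ".")
  pkg_versions <- vapply(pkgs, function(p) as.character(packageVersion(p)),
                         character(1))
  c(R = r_version, pkg_versions)
}

# Example: print "name (vX.Y.Z)" pairs ready to paste into the manuscript.
# Replace the packages with the ones you used, e.g. c("startR", "s2dv").
v <- software_versions(c("stats", "utils"))
cat(paste0(names(v), " (v", v, ")"), sep = "\n")
```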
You can launch RStudio Server on a workstation or on Nord3v2; it opens the RStudio IDE as a web page. Please follow the instructions:
General R solutions that are useful in the department but don't belong exclusively to an in-house R package are listed here.
1. BSC-ES infrastructure
How to access, what you will find (servers, machines, partitions, modules) and how to open RStudio: check the slides and share them with your colleagues. slides
How to access the BSC Hub and run R from the terminal and from VSCode: ghr_-_bschub_demo.pdf
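2. How to change the CDO version from your open R session
> system('module load CDO/1.5.3-foss-2015a')
The following have been reloaded with a version change:
  1) CDO/1.6.3-foss-2015a => CDO/1.5.3-foss-2015a
In this case, the CDO version has been changed from a newer to an older one. Note that `system()` runs the command in a new child shell, so the change only applies within that call and does not modify the environment of the R session itself; to run CDO with the selected version, load the module and call CDO within the same `system()` call.
Remember that you can see the full list of available CDO versions by running `module av CDO` in your terminal.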
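A minimal sketch of running CDO with a given module version from R. Since each `system()` call starts a new shell, the module is loaded and CDO is run in one command. `with_module()` is a hypothetical helper; the module name is the one used above, and it assumes the `module` command is available in the shell started by `system()`, as in the example above:

```r
# Hypothetical helper: build a shell command that loads the given modules
# and then runs `cmd`, all in the same shell
with_module <- function(modules, cmd) {
  loads <- paste("module load", modules, collapse = " && ")
  paste(loads, cmd, sep = " && ")
}

cmd <- with_module("CDO/1.5.3-foss-2015a", "cdo --version")
# On a BSC-ES machine, run CDO 1.5.3 for this call only:
# system(cmd)
```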
3. How to load the dependencies of the R packages sf, terra and mapview
These packages can be used after loading the following modules (note that the loading order may matter):
(on workstation)
module load R/4.1.2-foss-2015a-bare
module load GDAL/2.2.1-foss-2015a
module load PROJ/4.8.0-foss-2015a
module load GEOS/3.7.2-foss-2015a-Python-2.7.9
(on nord3v2)
module load R/4.1.2-foss-2019b
module load GDAL/3.3.2-foss-2019b-Python-3.7.4
module load PROJ/7.2.1-foss-2019b
module load GEOS/3.7.2-foss-2019b-Python-3.7.4
(on hub)
module load R/4.2.1-foss-2021b
module load GDAL/3.5.2-foss-2021b-Python-3.9.6
module load PROJ/9.1.0-foss-2021b
module load GEOS/3.11.0-GCC-11.2.0
Note: avoid including them in your .bashrc; load them only when they are needed.
Note 2: to use the library RNetCDF, `module load HDF5/1.10.5-foss-2015a` is required.
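After loading the modules, you can check from R which GDAL, PROJ and GEOS versions sf was linked against, and compare them with the modules you loaded. A minimal sketch; `geo_stack_versions()` is a hypothetical helper around `sf::sf_extSoftVersion()` that returns `NULL` when sf cannot be loaded (for example, because a module is missing):

```r
# Hypothetical helper: report the external libraries used by sf,
# or NULL if sf cannot be loaded in this session
geo_stack_versions <- function() {
  if (!requireNamespace("sf", quietly = TRUE)) {
    message("sf could not be loaded: check the GDAL/PROJ/GEOS modules")
    return(NULL)
  }
  sf::sf_extSoftVersion()
}

geo_stack_versions()
```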
4. How to load the dependencies of the R package rgdal
This package can be used after loading the following modules in this specific order:
- for R 3.6.1:
module load R/3.6.1-foss-2015a-bare
module load GDAL/2.2.1-foss-2015a-GEOS-3.8.0
# if necessary, add:
module load PROJ/6.1.1-foss-2015a
- for R 4.1.2:
module load R/4.1.2-foss-2015a-bare
module load GDAL/2.2.1-foss-2015a
# if necessary, add:
module load PROJ/6.1.1-foss-2015a
Note: avoid including them in your .bashrc; load them only when they are needed.
5. How to avoid the Load error in R 3.6.1: 'cdo -griddes' core dumped
To avoid a Load error caused by the command `cdo -griddes`, a different version of HDF5 is required:
module load HDF5/1.8.14-foss-2015a
The same requirement also applies to s2dv::CDORemap and startR::CDORemapper.
6. How to use the 'rmapshaper' library on workstations and Nord3v2
To correctly use the R library rmapshaper, you need to load the following modules:
- On workstation:
module load R/4.1.2-foss-2015a-bare
module load libprotobuf/3.5.1-foss-2015a libjq/1.5.-foss-2015a nodejs/10.21.0-foss-2015a
- On Nord3v2:
module load R/4.1.2-foss-2019b
module load protobuf/3.7.1-GCCcore-8.3.0 jq/1.5-GCCcore-8.3.0 nodejs/10.21.0-GCCcore-8.3.0
7. How to avoid plotting issues (such as fuzzy labels) on Nord3
To save good-quality plots created on Nord3, the library 'ragg' is necessary. It is already installed, so in your R code or R session, before generating the plot, load the library and use the agg_png() function to define the name, size and resolution of your plot. Then create your plot and close the device. Here are two examples:
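library(CSTools)
library(ragg)
# Ex1: probability density functions of two synthetic forecasts
agg_png("fig1.png", width = 1000, height = 500, units = 'px', res = 144)
fcst <- data.frame(fcst1 = rnorm(mean = 25, sd = 3, n = 30),
                   fcst2 = rnorm(mean = 23, sd = 4.5, n = 30))
PlotForecastPDF(fcst, tercile.limits = c(20, 26))
dev.off()
# Ex2: most likely tercile map from three dummy probability fields (lat x lon)
x <- array(1:(20 * 10), dim = c(lat = 10, lon = 20)) / 200
p_a <- x * 0.6
p_b <- (1 - x) * 0.6
p_c <- 1 - (p_a + p_b)
lons <- seq(0, 359.5, length.out = 20)
lats <- seq(-89.5, 89.5, length.out = 10)
agg_png("fig2.png", width = 1000, height = 1000, units = 'px', res = 144)
PlotMostLikelyQuantileMap(list(p_a, p_b, p_c), lons, lats,
                          toptitle = 'Most likely tercile map',
                          bar_titles = paste('% of belonging to', c('a', 'b', 'c')),
                          brks = 20)
dev.off()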
If you want to save your plot in PostScript (.ps) format, you don't need this library. You can adapt the following lines to your plotting function:
postscript('name_and_path_to_the_file.ps')
any_plotting_function()  # placeholder: replace with your own plotting call
dev.off()
8. Error *** caught segfault *** address 0x18, cause 'memory not mapped'
If this error appears, check that the `/dev/shm/` partition is empty. If leftover files are occupying this partition, the process you are running may fail; remove the files and re-run your code.
If the error persists, test your code with a smaller data sample to rule out a problem in the code itself, since this error message can indicate that you are requesting more memory than is available.
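Both checks can be done from R. A minimal sketch; `shm_files()` and `array_size_gib()` are hypothetical helpers, and the array dimensions in the example are illustrative, not taken from a real experiment:

```r
# Hypothetical helper: list your own files in /dev/shm with their size in MB
shm_files <- function(dir = "/dev/shm") {
  files <- list.files(dir, full.names = TRUE)
  info <- file.info(files)
  mine <- info[!is.na(info$uname) & info$uname == Sys.info()[["user"]], ]
  data.frame(file = rownames(mine), size_mb = mine$size / 1024^2,
             row.names = NULL)
}

# Hypothetical helper: memory needed by a numeric (double) array,
# at 8 bytes per value, in GiB
array_size_gib <- function(dims) prod(dims) * 8 / 1024^3

shm_files()
# e.g. 25 members x 20 start dates x 12 months x 256 lat x 512 lon
array_size_gib(c(25, 20, 12, 256, 512))  # 5.859375, i.e. about 5.86 GiB
```

Comparing such an estimate with the memory of the node you are using shows whether reducing the sample (or processing it in chunks, e.g. with startR) is needed.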
9. Special characters
In case of problems with accents or special characters, try changing the locale of the R session to a suitable one, for example: `Sys.setlocale("LC_ALL", "en_US.UTF-8")`.
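To change the locale only when it is needed, you can first check whether the session already uses UTF-8. A minimal sketch; `ensure_utf8()` is a hypothetical helper, and it assumes the `en_US.UTF-8` locale is installed on the machine:

```r
# Hypothetical helper: switch the session to a UTF-8 locale if it is not
# already using one; returns TRUE if the session uses UTF-8 afterwards
ensure_utf8 <- function(locale = "en_US.UTF-8") {
  if (!isTRUE(l10n_info()[["UTF-8"]])) {
    # Sys.setlocale() returns "" (with a warning) if the locale is unavailable
    Sys.setlocale("LC_ALL", locale)
  }
  isTRUE(l10n_info()[["UTF-8"]])
}

ensure_utf8()
```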
10. Test functions under development from Git
There are several ways to source the functions under development. Depending on the attributes of the function and the use case, the most suitable way can vary. Check the slides: source_git_function.pdf
11. How to load the dependencies of the R package RNetCDF
This package can be used after loading the following modules (note that the loading order may matter):
(on workstation)
module load R/4.1.2-foss-2015a-bare
module load HDF5/1.10.5-foss-2015a
(on nord3v2)
module load R/4.1.2-foss-2019b
module load HDF5/1.10.5-gompi-2019b
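Once the modules are loaded, a quick way to check that RNetCDF works is to write a small file and read it back. A minimal sketch; the helper name `check_rnetcdf()`, the temporary file and the variable name `tas` are illustrative:

```r
# Hypothetical helper: write a small 2 x 3 variable to a temporary NetCDF
# file and read it back; returns TRUE if the round trip works,
# NA if RNetCDF cannot be loaded
check_rnetcdf <- function() {
  if (!requireNamespace("RNetCDF", quietly = TRUE)) return(NA)
  file <- tempfile(fileext = ".nc")
  on.exit(unlink(file))
  values <- matrix(as.numeric(1:6), nrow = 2)

  # Define dimensions (in R order) and a double variable, then write it
  nc <- RNetCDF::create.nc(file)
  RNetCDF::dim.def.nc(nc, "lon", 2)
  RNetCDF::dim.def.nc(nc, "lat", 3)
  RNetCDF::var.def.nc(nc, "tas", "NC_DOUBLE", c("lon", "lat"))
  RNetCDF::var.put.nc(nc, "tas", values)
  RNetCDF::close.nc(nc)

  # Reopen the file and compare what was read with what was written
  nc <- RNetCDF::open.nc(file)
  read_back <- RNetCDF::var.get.nc(nc, "tas")
  RNetCDF::close.nc(nc)
  isTRUE(all.equal(read_back, values))
}

check_rnetcdf()
```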