# BSC R Trainings

A public repository to store materials and hands-on exercises for R-related BSC trainings.
## Getting started
To make it easy for you to get started with GitLab, here's a list of recommended next steps.
Already a pro? Just edit this README.md and make it your own. Want to make it easy? [Use the template at the bottom](#editing-this-readme)!
## Add your files
- [ ] [Create](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#create-a-file) or [upload](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#upload-a-file) files
- [ ] [Add files using the command line](https://docs.gitlab.com/ee/gitlab-basics/add-file.html#add-a-file-using-the-command-line) or push an existing Git repository with the following command:
```
cd existing_repo
git remote add origin https://earth.bsc.es/gitlab/vagudets/bsc-trainings-r.git
git branch -M main
git push -uf origin main
```
## Integrate with your tools
- [ ] [Set up project integrations](https://earth.bsc.es/gitlab/vagudets/bsc-trainings-r/-/settings/integrations)
## Collaborate with your team
- [ ] [Invite team members and collaborators](https://docs.gitlab.com/ee/user/project/members/)
- [ ] [Create a new merge request](https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html)
- [ ] [Automatically close issues from merge requests](https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#closing-issues-automatically)
- [ ] [Enable merge request approvals](https://docs.gitlab.com/ee/user/project/merge_requests/approvals/)
- [ ] [Automatically merge when pipeline succeeds](https://docs.gitlab.com/ee/user/project/merge_requests/merge_when_pipeline_succeeds.html)
## Test and Deploy
Use the built-in continuous integration in GitLab.
- [ ] [Get started with GitLab CI/CD](https://docs.gitlab.com/ee/ci/quick_start/index.html)
- [ ] [Analyze your code for known vulnerabilities with Static Application Security Testing (SAST)](https://docs.gitlab.com/ee/user/application_security/sast/)
- [ ] [Deploy to Kubernetes, Amazon EC2, or Amazon ECS using Auto Deploy](https://docs.gitlab.com/ee/topics/autodevops/requirements.html)
- [ ] [Use pull-based deployments for improved Kubernetes management](https://docs.gitlab.com/ee/user/clusters/agent/)
- [ ] [Set up protected environments](https://docs.gitlab.com/ee/ci/environments/protected_environments.html)
***
# Editing this README
When you're ready to make this README your own, just edit this file and use the handy template below (or feel free to structure it however you want - this is just a starting point!). Thank you to [makeareadme.com](https://www.makeareadme.com/) for this template.
## Suggestions for a good README
Every project is different, so consider which of these sections apply to yours. The sections used in the template are suggestions for most open source projects. Also keep in mind that while a README can be too long and detailed, too long is better than too short. If you think your README is too long, consider utilizing another form of documentation rather than cutting out information.
## Name
Choose a self-explanatory name for your project.
## Description
Let people know what your project can do specifically. Provide context and add a link to any reference visitors might be unfamiliar with. A list of Features or a Background subsection can also be added here. If there are alternatives to your project, this is a good place to list differentiating factors.
## Badges
On some READMEs, you may see small images that convey metadata, such as whether or not all the tests are passing for the project. You can use Shields to add some to your README. Many services also have instructions for adding a badge.
## Visuals
Depending on what you are making, it can be a good idea to include screenshots or even a video (you'll frequently see GIFs rather than actual videos). Tools like ttygif can help, but check out Asciinema for a more sophisticated method.
## Installation
Within a particular ecosystem, there may be a common way of installing things, such as using Yarn, NuGet, or Homebrew. However, consider the possibility that whoever is reading your README is a novice and would like more guidance. Listing specific steps helps remove ambiguity and gets people using your project as quickly as possible. If it only runs in a specific context, like a particular programming language version or operating system, or has dependencies that have to be installed manually, also add a Requirements subsection.
## Usage
Use examples liberally, and show the expected output if you can. It's helpful to inline the smallest example of usage that you can demonstrate, while providing links to more sophisticated examples if they are too long to reasonably include in the README.
## Support
Tell people where they can go for help. It can be any combination of an issue tracker, a chat room, an email address, etc.
## Roadmap
If you have ideas for releases in the future, it is a good idea to list them in the README.
## Contributing
State if you are open to contributions and what your requirements are for accepting them.
For people who want to make changes to your project, it's helpful to have some documentation on how to get started. Perhaps there is a script that they should run or some environment variables that they need to set. Make these steps explicit. These instructions could also be useful to your future self.
You can also document commands to lint the code or run tests. These steps help to ensure high code quality and reduce the likelihood that the changes inadvertently break something. Having instructions for running tests is especially helpful if it requires external setup, such as starting a Selenium server for testing in a browser.
## Authors and acknowledgment
Show your appreciation to those who have contributed to the project.
## License
For open source projects, say how it is licensed.
## Project status
If you have run out of energy or time for your project, put a note at the top of the README saying that development has slowed down or stopped completely. Someone may choose to fork your project or volunteer to step in as a maintainer or owner, allowing your project to keep going. You can also make an explicit request for maintainers.
***
# Example SUNSET recipe
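For reference, this is the completed recipe for this use case. It is the same template used in Hands-on 3 below, with the blanks filled in to evaluate Meteo-France System 7 surface temperature against ERA5 on the `esarchive` filesystem.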
```yaml
Description:
  Author: V. Agudetse
  Description: Analysis of MF System 7 with temperature
Analysis:
  Horizon: Seasonal
  Variables:
    name: tas
    freq: monthly_mean
    units: K
  Datasets:
    System:
      name: Meteo-France-System7
    Multimodel: no
    Reference:
      name: ERA5
  Time:
    sdate: '1101'
    fcst_year: '2020'
    hcst_start: '1993'
    hcst_end: '2016'
    ftime_min: 1
    ftime_max: 2
  Region:
    name: "EU"
    latmin: 20
    latmax: 80
    lonmin: -20
    lonmax: 40
  Regrid:
    method: bilinear
    type: 'r360x181'
  Workflow:
    Anomalies:
      compute: yes
      cross_validation: yes
      save: 'none'
    Calibration:
      method: mse_min
      save: 'none'
    Skill:
      metric: RPSS, BSS10, BSS90
      cross_validation: yes
      save: 'all'
    Probabilities:
      percentiles: [[1/3, 2/3], [1/10, 9/10]]
      save: 'none'
    Visualization:
      plots: skill_metrics, forecast_ensemble_mean, most_likely_terciles
      multi_panel: no
      projection: cylindrical_equidistant
      mask_terciles: no # CHECK
      dots: no # CHECK
  ncores: 10
  remove_NAs: yes
  Output_format: S2S4E
Run:
  filesystem: esarchive
  Loglevel: INFO
  Terminal: yes
  output_dir: /esarchive/scratch/vagudets/auto-s2s-outputs/
  code_dir: /esarchive/scratch/vagudets/repos/auto-s2s/
```
# Hands-on 3: Verification Workflow with SUNSET
## Goal
Create a SUNSET recipe and use the functions in the suite to reproduce the verification workflow from the previous hands-on exercises.
## 0. Copy the recipe template and modify it
```shell
# Create a copy of the recipe in your directory, replacing <your_account> with your account and <your_name> with your name
cp /gpfs/scratch/nct01/nct01001/d2_handson_R/sunset/use_cases/PATC_2023/recipe_PATC_2023.yml /gpfs/scratch/nct01/<your_account>/recipe_PATC_2023-<your_name>.yml
# Open the recipe with a text editor such as vim or emacs
vim /gpfs/scratch/nct01/<your_account>/recipe_PATC_2023-<your_name>.yml
```
Once you have opened your recipe, it is time to edit the contents. For this example, we will evaluate monthly means of surface temperature (tas), using Meteo-France System 7 data as our experiment and ERA5 as our reference dataset, for the November initialization.
```yaml
Description:
  Author: <___>
  Description: Calibration and skill assessment of MeteoFrance System 7 surface temperature
Analysis:
  Horizon: Seasonal
  Variables:
    name: tas
    freq: monthly_mean
    units: <___> # Choose your units: C or K
  Datasets:
    System:
      name: Meteo-France-System7
    Multimodel: no
    Reference:
      name: ERA5
  Time:
    sdate: '1101'
    fcst_year: '2020'
    hcst_start: '1993'
    hcst_end: '2016'
    ftime_min: <___> # Choose the first time step! A number from 1 to 6
    ftime_max: <___> # Choose the last time step! A number from 1 to 6
  Region:
    name: "EU"
    latmin: 20
    latmax: 80
    lonmin: -20
    lonmax: 40
  Regrid:
    method: bilinear
    type: 'r360x181' # options: to_system, to_reference, self-defined grid
  Workflow:
    Anomalies:
      compute: yes
      cross_validation: yes
      save: 'none'
    Calibration:
      method: <___>
      save: 'none'
    Skill:
      metric: RPSS, BSS10, BSS90
      cross_validation: yes
      save: 'all'
    Probabilities:
      percentiles: [[1/3, 2/3], [1/10, 9/10]]
      save: 'none'
    Visualization:
      plots: skill_metrics, forecast_ensemble_mean, most_likely_terciles
      multi_panel: no
      projection: cylindrical_equidistant
  ncores: 10
  remove_NAs: yes
  Output_format: S2S4E
Run:
  Loglevel: INFO
  Terminal: yes
  output_dir: /gpfs/scratch/nct01/<your_account>/
  code_dir: /gpfs/scratch/nct01/nct01001/d2_handson_R/sunset/
```
## 1. Load the required SUNSET modules and read the recipe
Open an R session again by typing `R` in the terminal.
To run SUNSET, we must set the directory where the code is located as our working directory. Then, each SUNSET module must be sourced so that its functions are available within our R session.
```r
# Set the working directory
setwd("/gpfs/scratch/nct01/nct01001/d2_handson_R/sunset/")
# Load modules
source("modules/Loading/Loading.R")
source("modules/Units/Units.R")
source("modules/Calibration/Calibration.R")
source("modules/Anomalies/Anomalies.R")
source("modules/Skill/Skill.R")
source("modules/Saving/Saving.R")
source("modules/Visualization/Visualization.R")
# Read recipe
recipe_file <- "/gpfs/scratch/nct01/<your_account>/recipe_PATC_2023-<your_name>.yml"
recipe <- prepare_outputs(recipe_file)
```
The function `prepare_outputs()` creates a unique folder for the logs, data files and plots that result from the execution of your recipe, inside the directory you specified. It also runs a check over the recipe to detect any potential errors, misspellings or missing arguments. At the end of the check, a message is displayed indicating whether or not the recipe passed the check, along with the list of errors and warnings.
**Questions**
Read the logs!
(1) Did your recipe pass the check? Did you get any warnings?
(2) Where will your outputs be saved? Copy and paste this directory somewhere, so that you can check it later!
*Tip*: The recipe is now stored as a `list` containing all the information of the original YAML file, plus some extra things! If you want to see any particular element of the recipe from the R session, you can simply access that element in the list. For example:
```r
# Checking the variable name
recipe$Analysis$Variables$name
# Checking the output directory
recipe$Run$output_dir
```
## 2. Load the data and change the units
The **Loading** module retrieves the information from the recipe and loads the data that has been requested in it. It loads the experiment data for the hindcast period, the reference data for the corresponding period, and the experiment forecast if a forecast year has been requested.
For certain variables, such as temperature, precipitation or sea level pressure, the user can specify the units in which to load the data. The **Units** module reads the original units as stored in the netCDF files and performs any unit conversions needed to match the request in the recipe. It also verifies that all of the loaded datasets share the same units, even when no specific unit has been requested. For this reason, users are strongly encouraged to run it even if they did not request any unit conversion.
```r
# Load datasets
data <- Loading(recipe)
# Change units
data <- Units(recipe, data)
```
**Questions**
(1) What is the structure of `data`? What is the class of the objects in `data`? *Tip*: you can use functions like `class()`, `names()` or `str()` to gain information about the structure of the object and its contents.
```r
class(data)
names(data)
str(data, max.level = 2)
# You can access any of the three objects with the `$` operator:
class(data$____)
```
(2) Pay attention to the log messages: Did your units get converted? Are the new units what you expect? You can check the metadata of any of the objects in `data`. SUNSET also provides the `data_summary()` function, which lets you have a quick look at your objects:
```r
# Check the new units and data of hcst, fcst and/or obs. Are they all the same?
data$____$attrs$Variable$metadata$tas$units
data_summary(data$____, recipe)
```
(3) What are the dimensions of the datasets? Are they consistent with what is requested in the recipe? *Tip*: Check the data summary!
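If you want to check the dimensions directly, the array of values is stored in the `$data` element of each object. A minimal check, assuming you look at the hindcast first (the same call works for `fcst` and `obs`):
```r
# Dimensions of the hindcast data array
dim(data$hcst$data)
```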
## 3. Calibrate the data and compute the anomalies
SUNSET has a few modules to perform post-processing on the experimental and the reference datasets. The **Calibration** module performs the bias correction method indicated in the recipe, using the `CSTools::CST_Calibration()` function.
The **Anomalies** module removes the climatologies using functions like `CSTools::CST_Anomaly()` and `s2dv::Clim()`, and also returns the full fields in case they are needed for any future computations.
```r
# Calibrate the data
data <- Calibration(recipe, data)
# Compute anomalies
data <- Anomalies(recipe, data)
```
**Questions**
(1) Verify that you now have anomaly values instead of the original full field. *Tip*: Use `data_summary()` like in the previous example and pay attention to the new values.
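For a quick check, you can summarize the post-processed hindcast (the same call works for `fcst` and `obs`):
```r
# After the Anomalies module, the values should be centered around zero
data_summary(data$hcst, recipe)
```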
## 4. Evaluate the model skill and compute the probability thresholds
The **Skill** module returns a list of all the evaluation metrics requested in the recipe, in the shape of multi-dimensional arrays. In this case, we will compute three metrics:
- **RPSS (Ranked Probability Skill Score)**: This skill score measures how well a forecast predicts the probability of the tercile categories (below normal, normal and above normal), compared to the climatology.
- **BSS10 and BSS90 (Brier Skill Score)**: These skill scores measure how well a forecast predicts the probability of extreme events below the 10th percentile (BSS10) and above the 90th percentile (BSS90), compared to the climatology. Both metrics are constructed in the same way, as shown below.
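Both scores follow the standard skill-score construction: the probability score of the forecast is compared to that of a climatological forecast, so positive values indicate that the forecast beats climatology, and 1 is a perfect score:
```math
\mathrm{RPSS} = 1 - \frac{\mathrm{RPS}_{\mathrm{forecast}}}{\mathrm{RPS}_{\mathrm{climatology}}}, \qquad \mathrm{BSS} = 1 - \frac{\mathrm{BS}_{\mathrm{forecast}}}{\mathrm{BS}_{\mathrm{climatology}}}
```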
The `Probabilities()` function returns the probability values for each requested category for the hindcast and forecast data, as well as the hindcast percentile values corresponding to each threshold.
```r
# Compute skill metrics
skill_metrics <- Skill(recipe, data)
# Compute percentiles and probability bins
probabilities <- Probabilities(recipe, data)
```
**Questions**
(1) What is the structure of `skill_metrics`? Which metrics were computed? What dimensions do they have? *Tip*: use `str()` and `names()`.
(2) What is the structure of `probabilities`? Can you identify the probability categories and the percentiles? *Tip*: use `str()`.
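A minimal sketch to answer both questions from the R session:
```r
# Which metrics were computed, and with what dimensions?
names(skill_metrics)
str(skill_metrics, max.level = 1)
# Probability categories and percentile thresholds
str(probabilities, max.level = 3)
```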
## 5. Plotting the results
Now, let's visualize the information that was computed!
The **Visualization** module will generate the three types of maps we requested in the recipe:
- Skill maps to visualize the skill distribution of the model, for each metric.
- The ensemble mean of the calibrated forecast anomalies.
- A map showing the most likely tercile category for each point in the grid.
With the significance option in the `Visualization()` function, you can choose whether or not to shade the grid points that are statistically significant in each skill metric plot.
```r
# Plot data
Visualization(recipe, data,
skill_metrics = skill_metrics,
probabilities = probabilities,
significance = TRUE)
```
Now, you can `cd` to the output directory and inspect the contents of the `plots/` subdirectory. The plots are PNG files that can be visualized with the `display` command. They have descriptive names that include the content of the plot, the initialization date and the forecast time.
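If you are still inside the R session, you can also list the generated plots without leaving R. A small convenience snippet, assuming the plots were saved under the output directory stored in the recipe:
```r
# List the plot files created by the Visualization module
list.files(file.path(recipe$Run$output_dir, "plots"), recursive = TRUE)
```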
**Questions**
(1) Let's take a look at the forecast ensemble mean. What is the sign of the anomalies over Spain? In what regions are the anomalous temperatures strongest?
(2) Let's take a look at the skill metrics RPSS, BSS10 and BSS90. In what regions and for which metrics is the forecast most skillful? *Tip*: Positive values indicate that the model is a better predictor than the climatology, with 1 being the perfect score.
(3) Let's take a look at the Most Likely Terciles plots. This plot indicates the probability of the temperature being below normal, near normal or above normal. What is the most likely category for Spain?