|
|
## Introduction
|
|
|
|
|
|
|
|
|
Auto-S2S is the GitLab repository for the ESS Verification Suite, a modular tool for subseasonal to seasonal to decadal forecast verification workflows. Each module is a separate part of the code that performs a specific task, so that parts of the workflow can be skipped or reordered.
|
|
|
The datasets, forecast horizon, time period, skill metrics to compute and other parameters are specified by the user in a configuration file called a "recipe".
|
|
|
|
|
|
|
|
|
- Modules currently available: Loading, Calibration, Skill, Saving
|
|
|
- Modules in development: Visualization
|
|
|
- Future modules: Downscaling, Aggregation, Indicators
|
|
|
|
|
|
|
|
|
This tool is in the early stages of development, so the code and the information in this wiki may be subject to frequent changes and updates. This wiki contains all the information needed to use the available modules.
|
|
|
|
|
|
Find an example script to run the ESS Verification Suite [in the Auto-S2S code Snippets](https://earth.bsc.es/gitlab/es/auto-s2s/-/snippets/93).
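For orientation, here is a rough sketch of what such a launcher script might look like. This is not the suite's actual code: the file paths, recipe location and function signatures below are assumptions, so refer to the linked snippet for the maintained version.

```r
# Hypothetical launcher script: paths, recipe location and function
# signatures are assumptions; see the Snippets link above for the real example.

# Source the module code (paths are assumptions)
source("modules/Loading/Loading.R")          # defines load_datasets()
source("modules/Calibration/Calibration.R")  # defines calibrate_datasets()
source("modules/Skill/Skill.R")              # defines compute_skill_metrics() and compute_probabilities()

recipe <- "recipes/seasonal_example.yml"     # hypothetical recipe path

# Each module takes the recipe plus the output of the previous module.
data       <- load_datasets(recipe)                     # list: $hcst, $obs, $fcst
calibrated <- calibrate_datasets(data, recipe)          # list: $hcst, $fcst
skill      <- compute_skill_metrics(calibrated$hcst, data$obs, recipe)
probs      <- compute_probabilities(calibrated$hcst, recipe)
```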
|
|
|
|
|
|
## Recipes
|
|
|
|
|
|
|
|
|
In order to use the Verification Suite, users must define a 'recipe' containing all the information pertaining to their workflow.
|
|
|
|
|
|
|
|
|
Here is an example of a recipe to load monthly mean ECMWF System 5 data from `/esarchive/`, with a 1993 to 2016 hindcast period, the corresponding ERA5 observations, and a 2020 forecast for the November initialization, for the months of November and December.
|
|
|
The observations will be interpolated to the experiment grid (Regrid type: 'to_system') using bilinear interpolation. The hindcast and forecast will be calibrated using Quantile Mapping, and the Ranked Probability Skill Score (RPSS) and Continuous Ranked Probability Skill Score (CRPSS) will be computed.
|
|
|
The terciles (1/3, 2/3), quartiles (1/4, 2/4, 3/4), extremes (1/10, 9/10) and their corresponding probability bins will also be computed. Any output files will be saved to the output directory.
|
|
|
|
|
|
```yaml
Description:
  Author: V. Agudetse
  Info: ECMWF System5 Seasonal Forecast Example recipe (monthly mean, tas)

Analysis:
  Horizon: seasonal # Mandatory, str: 'subseasonal', 'seasonal', or 'decadal'
  Variables:
    name: tas # Mandatory, str: variable name in /esarchive/
    freq: monthly_mean # Mandatory, str: 'monthly_mean' or 'daily_mean'
  Datasets:
    System:
      name: system5c3s # Mandatory, str: System codename.
    Reference:
      name: era5 # Mandatory, str: Reference codename.
  Time:
    sdate: '1101' # Mandatory, int: Start date, 'mmdd'
    fcst_year: '2020' # Optional, int: Forecast initialization year 'YYYY'
    hcst_start: '1993' # Mandatory, int: Hindcast initialization start year 'YYYY'
    hcst_end: '2016' # Mandatory, int: Hindcast initialization end year 'YYYY'
    ftime_min: 1 # Mandatory, int: First forecast time step in months. Starts at "1".
    ftime_max: 2 # Mandatory, int: Last forecast time step in months. Starts at "1".
  Region:
    latmin: -10 # Mandatory, int: minimum latitude
    latmax: 10 # Mandatory, int: maximum latitude
    lonmax: 20 # Mandatory, int: maximum longitude
  Regrid:
    method: bilinear # Mandatory, str: Interpolation method.
    type: to_system # Mandatory, str: 'to_system', 'to_reference', 'none',
                    # or a CDO-accepted grid.
  Workflow:
    Calibration:
      method: mse_min # Mandatory, str: Calibration method.
    Skill:
      metric: RPSS CRPSS EnsCorr # Mandatory, str: Skill metric or list of skill metrics, separated by commas or spaces.
    Probabilities:
      percentiles: [[1/3, 2/3], [1/10, 9/10], [1/4, 2/4, 3/4]] # Optional: Thresholds for quantiles and probability categories. Each set of thresholds should be enclosed within brackets.
    Indicators:
      index: no # This feature is not implemented yet
  ncores: 4 # Optional, int: number of cores to be used in parallel computation. If left empty, defaults to 1.
  remove_NAs: TRUE # Optional, bool: Whether to remove NAs. If left empty, defaults to FALSE.
  Output_format: S2S4E # This feature is not implemented yet

Run:
  Loglevel: INFO # This feature is not implemented yet
  output_dir: /esarchive/scratch/vagudets/repos/auto-s2s/out-logs/
  code_dir: /esarchive/scratch/vagudets/repos/auto-s2s/
```
|
|
|
|
|
|
## List of /esarchive/ datasets
|
|
|
|
|
|
Here is a list of the datasets that can currently be loaded by the tool. To request that an additional dataset be added, please open an issue.
|
|
|
|
|
|
### Seasonal datasets
|
|
|
|
|
|
Systems:
|
|
|
| Forecast System           | Monthly mean    | Daily mean | Recipe name |
|---------------------------|-----------------|------------|-------------|
| **ECMWF SEAS5**           | Yes             | Yes        | system5c3s  |
| **DWD GFCS 2.1**          | Yes             | No         | system21_m1 |
| **CMCC 3.5**              | Yes             | No         | system35c3s |
| **MeteoFrance System 7**  | Yes             | No         | system7c3s  |
| **JMA System 2**          | Yes             | No         | system2c3s  |
| **ECCC CanCM4i**          | May to November | No         | eccc1       |
|
|
|
|
|
|
Observations:
|
|
|
| Reference    | Monthly mean | Daily mean | Recipe name |
|--------------|--------------|------------|-------------|
| **ERA5**     | Yes          | Yes        | era5        |
| **ERA5Land** | `tas` only   | Yes        | era5land    |
| **UERRA**    | No           | `tas` only | uerra       |
|
|
|
|
|
|
### Decadal datasets
|
|
|
|
|
|
Systems:
|
|
|
| Forecast System         | Monthly mean | Daily mean | Forecast (DCPP-B)? |
|-------------------------|--------------|------------|--------------------|
| **BCC-CSM2-MR**         | Yes          | Yes        | No                 |
| **CanESM5**             | Yes          | Yes        | Yes                |
| **CESM1-1-CAM5-CMIP5**  | Yes          | WIP        | No                 |
| **CMCC-CM2-SR5**        | Yes          | No         | Yes                |
| **EC-Earth3-i1**        | Yes          | Yes        | No                 |
| **EC-Earth3-i2**        | Yes          | Yes        | No                 |
| **EC-Earth3-i4**        | Yes          | Yes        | Yes                |
| **HadGEM3-GC3.1-MM**    | Yes          | Yes        | Yes                |
| **IPSL-CM6A-LR**        | Yes          | Yes        | No                 |
| **MIROC6**              | Yes          | Yes        | No                 |
| **MPI-ESM1.2-HR**       | Yes          | Yes        | No                 |
| **MPI-ESM1.2-LR**       | Yes          | No         | Yes                |
| **MRI-ESM2-0**          | Yes          | Yes        | No                 |
| **NorCPM1-i1**          | Yes          | Yes        | No                 |
| **NorCPM1-i2**          | Yes          | Yes        | No                 |
|
|
|
|
|
|
Observations:
|
|
|
| Reference      | Monthly mean | Daily mean |
|----------------|--------------|------------|
| **GHCNv4**     | Yes          | No         |
| **ERA5**       | Yes          | Yes        |
| **JRA-55**     | Yes          | Yes        |
| **GISTEMPv4**  | Yes          | No         |
| **HadCRUT4**   | Yes          | No         |
| **HadSLP2**    | Yes          | No         |
|
|
|
|
|
|
## Loading module
|
|
|
|
|
|
|
|
|
The Loading module retrieves the data requested in the recipe from /esarchive/, interpolates it to the desired grids if interpolation has been requested, and converts it to objects of class `s2dv_cube`, which can be passed on to the other modules in the tool. An `s2dv_cube` object is a list containing the data array in the element `$data`, along with many other elements that store the metadata.
|
|
|
|
|
|
The output of the main function, `load_datasets()`, is a list containing the hindcast, observations and forecast, named `$hcst`, `$obs` and `$fcst` respectively. `$fcst` will be `NULL` if no forecast years have been requested.
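As a quick illustration (a sketch only; the `$lat` element name follows the classic `s2dv_cube` structure and may differ in newer CSTools versions), the returned list can be inspected like any other R list:

```r
# 'data' is assumed to be the list returned by load_datasets()
dim(data$hcst$data)   # named dimensions of the hindcast array
                      # (e.g. syear, time, latitude, longitude, ensemble;
                      #  exact names depend on the loaded data)
data$hcst$lat         # latitude values stored in the s2dv_cube metadata
is.null(data$fcst)    # TRUE if no forecast years were requested in the recipe
```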
|
|
|
|
|
|
### Regridding
|
|
|
|
|
|
|
|
|
The Loading module can use `CDORemap()` to interpolate the loaded data. The interpolation methods that can be specified in the recipe under Regrid:method are those accepted by CDO: "conservative", "bilinear", "bicubic", "distance-weighted", "con2", "laf" and "nn". See the [CDO User's Guide](https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf) for more details.
|
|
|
|
|
|
The target grid is specified in Regrid:type. The options are:

- `'to_system'`: The observations are interpolated to the system grid.
- `'to_reference'`: The hindcast and forecast are interpolated to the reference grid.
- `'none'`: No interpolation is performed when loading the data.
- A CDO-accepted grid format, such as `'r360x180'`, or the path to a netCDF file. In this case, both the system and the reference will be interpolated to this grid. See the FAQ section for more details.
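As a rough illustration of what this regridding amounts to (a standalone sketch, not the module's actual code, assuming the `CDORemap()` function from the s2dv package and the classic `$lon`/`$lat` elements of an `s2dv_cube`):

```r
library(s2dv)

# 'obs' is assumed to be the $obs s2dv_cube returned by load_datasets().
regridded <- CDORemap(data_array = obs$data,
                      lons = obs$lon,
                      lats = obs$lat,
                      grid = "r360x180",    # target grid: a CDO-accepted name,
                                            # or the system/reference grid
                      method = "bilinear",  # Regrid:method from the recipe
                      crop = FALSE)

# The result is a list with the interpolated field and the new coordinates.
str(regridded$data_array)
regridded$lons
regridded$lats
```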
|
|
|
|
|
|
## Calibration module
|
|
|
|
|
|
|
|
|
The Calibration module performs bias correction on the loaded data. It accepts the output of the Loading module as input, and also requires the recipe. It applies a calibration method to the hindcast and forecast data using the observations as a reference, and returns the calibrated data and its metadata as `s2dv_cube` objects.
|
|
|
|
|
|
|
|
|
The output of the main function, `calibrate_datasets()`, is a list containing the calibrated hindcast and forecast, named `$hcst` and `$fcst` respectively. `$fcst` will be `NULL` if no forecast years have been requested.
|
|
|
|
|
|
### Calibration methods currently available
|
|
|
|
|
|
|
|
|
The calibration method can be requested in the Workflow:Calibration:method section of the recipe. **The user can only request one calibration method per recipe.** This is a list of the methods currently available:
|
|
|
|
|
|
- `'raw'`: No calibration is performed. A warning will show up on the terminal when calibrate_datasets() is called, and the function will return the uncalibrated hindcast and forecast.
|
|
|
|
|
|
- Daily data: Quantile mapping `'qmap'`.
|
|
|
|
|
|
For more details, see the [CSTools documentation](https://CRAN.R-project.org/package=CSTools) for CST_QuantileMapping().
|
|
|
|
|
|
- Monthly data: `'bias'`, `'evmos'`, `'mse_min'`, `'crps_min'`, and `'rpc-based'`.
|
|
|
|
|
|
For more details, see the [CSTools documentation](https://CRAN.R-project.org/package=CSTools) for CST_Calibration(). A standalone sketch of an equivalent call is shown below.
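As a reference point for the monthly-data methods above, here is a standalone sketch of the underlying CSTools call. It is not the module's code: the dimension names are assumptions and may not match those used inside the suite.

```r
library(CSTools)

# 'hcst' and 'obs' are assumed to be s2dv_cube objects with matching start
# dates; the member and start-date dimension names below are assumptions.
hcst_cal <- CST_Calibration(exp = hcst, obs = obs,
                            cal.method = "mse_min",        # method from the recipe
                            eval.method = "leave-one-out",
                            memb_dim = "member",
                            sdate_dim = "sdate",
                            ncores = 4)
```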
|
|
|
|
|
|
## Skill module
|
|
|
|
|
|
The Skill module is the part of the workflow that computes the metrics to assess the quality of a forecast. It accepts the output of the Calibration module as input, and also requires the recipe. It consists of two main functions:
|
|
|
|
|
|
**compute_skill_metrics()**: Computes the verification metrics requested in Workflow:Skill:metric. The user can request an unlimited number of verification metrics per recipe. The following metrics are currently available:
|
|
|
|
|
|
- `EnsCorr`: Ensemble Mean Correlation.
|
|
|
- `RPS`: Ranked Probability Score.
|
|
|
- `RPSS`: Ranked Probability Skill Score.
|
|
|
- `FRPS`: Fair Ranked Probability Score.
|
|
|
- `FRPSS`: Fair Ranked Probability Skill Score.
|
|
|
- `CRPS`: Continuous Ranked Probability Score.
|
|
|
- `CRPSS`: Continuous Ranked Probability Skill Score.
|
|
|
- `BSS10`: Brier Skill Score (lower extreme).
|
|
|
- `BSS90`: Brier Skill Score (upper extreme).
|
|
|
|
|
|
**Note**: For the following metrics: `EnsCorr`, `FRPS`, `RPSS`, `FRPSS`, `BSS90`, `BSS10`, if `_specs` is added at the end of the metric name (e.g. `RPSS_specs`), it will be computed using SpecsVerification.
|
|
|
|
|
|
The output of `compute_skill_metrics()` is a list containing one or more arrays with named dimensions; usually 'time', 'longitude' and 'latitude'. For more details on the specific output for each metric, see the documentation for [s2dv](https://CRAN.R-project.org/package=s2dv) and [SpecsVerification](https://CRAN.R-project.org/package=SpecsVerification).
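Following the example recipe (a hedged sketch; the argument list of `compute_skill_metrics()` and the exact names of the returned elements are assumptions), the requested metrics could be accessed as follows:

```r
# Hypothetical call: calibrated hindcast, observations and the recipe as inputs.
skill_metrics <- compute_skill_metrics(calibrated$hcst, data$obs, recipe)

names(skill_metrics)      # one element per requested metric,
                          # e.g. RPSS, CRPSS and EnsCorr for the example recipe
dim(skill_metrics[[1]])   # named dimensions: time, latitude, longitude
```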
|
|
|
|
|
|
**compute_probabilities()** returns a list of two elements containing the values corresponding to the thresholds requested in Workflow:Probabilities:percentiles (`$percentiles`), as well as their probability bins (`$probs`). Each element contains arrays with named dimensions 'time', 'longitude' and 'latitude'.
|
|
|
For example, if the extremes (1/10, 9/10) are requested, the output will be:
|
|
|
- `$percentiles`:
  - `percentile_10`: The 10th percentile, or lower extreme.
  - `percentile_90`: The 90th percentile, or upper extreme.
- `$probs`:
  - `prob_b10`: Probability of falling below the 10th percentile.
  - `prob_10_to_90`: Probability of falling between the 10th and the 90th percentiles.
  - `prob_a90`: Probability of falling above the 90th percentile.
|
|
|
|
|
|
**Note**: When naming the variables, the probability thresholds are converted to percentiles and rounded to the nearest integer to avoid dots in variable or file names. However, this is just a naming convention; the computations are performed based on the original thresholds specified in the recipe.
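Continuing the hedged example (argument names are assumptions), the arrays for the extremes requested in the example recipe could be inspected like this:

```r
# Hypothetical call with the calibrated hindcast and the recipe as inputs.
probabilities <- compute_probabilities(calibrated$hcst, recipe)

# Percentile values and probability bins for the extremes (1/10, 9/10):
dim(probabilities$percentiles$percentile_10)  # time, latitude, longitude
dim(probabilities$percentiles$percentile_90)
dim(probabilities$probs$prob_b10)             # probability below the 10th percentile
dim(probabilities$probs$prob_10_to_90)        # probability between the extremes
dim(probabilities$probs$prob_a90)             # probability above the 90th percentile
```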
|
|
|
|
|
|
## Saving module
|
|
|
|
|
|
The Saving module contains several functions that export the data (the calibrated hindcast and forecast, the corresponding observations, the skill metrics, percentiles and probabilities) to netCDF files.
|
|
|
|
|
|
Several subdirectories are created in the output directory specified in the recipe. Their structure is as follows:
|
|
|
|
|
|
If fcst_year has been requested:
|
|
|
|
|
|
`output_dir/<calibration_method>-<frequency>/<forecast_date>/<var>/`
|
|
|
|
|
|
If fcst_year is empty:
|
|
|
|
|
|
`output_dir/<calibration_method>-<frequency>/hcst-<mmdd>/<var>/`
|
|
|
|
|
|
Please take this structure into account when defining Run:output_dir, to avoid unintentionally overwriting previous data.
|
|
|
For the example recipe above, the final output directory will be:
|
|
|
|
|
|
`/esarchive/scratch/vagudets/repos/auto-s2s/out-logs/mse_min-monthly_mean/20201101/tas/`
|
|
|
|
|
|
The calibrated hindcast and forecast are saved in files named `<var>_<yyyymmdd>.nc`, where `var` is the name of the variable, `yyyy` is the year and `mmdd` is the initialization date. There is one file per year loaded. The observations are saved in the same format, in files named `<var>-obs_<yyyymmdd>.nc`.
|
|
|
|
|
|
All of the skill metrics with (time, latitude, longitude) dimensions are saved to a common file named `<var>-skill_month<mm>.nc`, where `mm` is the initialization month. Each metric is stored as a variable within the file.
|
|
|
|
|
|
The file containing the requested quantiles is named `<var>-percentiles_month<mm>.nc`. For each year of the hindcast period there is also a file named `<var>-probs_<yyyymmdd>.nc` containing the probability bins.
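To make the naming convention concrete, here is a small sketch that reconstructs the output path of the example recipe from its fields. It is purely illustrative; the Saving module builds these paths internally.

```r
# Illustrative only: rebuild the output path described above from recipe fields.
output_dir  <- "/esarchive/scratch/vagudets/repos/auto-s2s/out-logs/"  # Run:output_dir
calibration <- "mse_min"        # Workflow:Calibration:method
frequency   <- "monthly_mean"   # Variables:freq
fcst_date   <- "20201101"       # fcst_year followed by sdate
variable    <- "tas"            # Variables:name

outdir <- paste0(output_dir, calibration, "-", frequency, "/",
                 fcst_date, "/", variable, "/")
outdir
# "/esarchive/scratch/vagudets/repos/auto-s2s/out-logs/mse_min-monthly_mean/20201101/tas/"
```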
|
|
|
|
|
|
## FAQ
|
|
|
|
|
|
**Can a new metric/method/module/functionality be added to the ESS Verification Suite?**
|
|
|
|
|
|
To request a new feature, please open an issue in the Auto-S2S repository describing what you need and adding any information you think could be of use. The dev team will assess its viability and priority status and work to implement it whenever possible.
|
|
|
|
|
|
**How can I interpolate my data to a custom grid?**
|
|
|
|
|
|
To interpolate both your hindcast/forecast and your observations to a third grid, you need to set the Regrid:type parameter in the Auto-S2S recipe to an accepted grid description. The tool uses CDO to regrid your data when loading it, and it accepts the following formats as regridding input:
|
|
|
|
|
|
1. A predefined grid name in a format that CDO accepts, e.g. 'r360x180' or 't106grid'. More information on predefined grid names can be found in the [CDO User's Guide](https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf), in the "Predefined grids" section.
|
|
|
|
|
|
2. The path to a netCDF file that has the grid you want to use. Make sure that the netCDF file only contains one grid, to ensure that CDO is actually reading the information you want to use. More information can be found in the [CDO User's Guide](https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf), in the "Grids from data files" section.
|
|
|
|
|
|
3. The path to a grid description file, which is a simple ASCII file containing keywords and parameters that describe the grid in a format that CDO understands. You can find more information in the [CDO User's Guide](https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf) section "CDO grids", and **examples are available in the Auto-S2S repository** under `conf/grid_description/`.
|
|