example.md

Example
=======

Next you can see an example of usage of s2dverification spanning its
four modules ([Data retrieval](data_retrieval.md), 
[Basic statistics](basic_statistics.md), [Verification](verification.md) 
and [Visualisation](visualisation.md)).

The goal of the example is to load, analyse and asses the skill of two 
sample forecasts of the near-surface air temperature over the Atlantic
region against a reference reanalysis.

Loading the data
----------------

First the package is loaded and attached.
Then a list is built with information on the location of each dataset to load.

```r
library(s2dverification)

expA <- list(name = 'experimentA',
             path = file.path('/path/to/experiments/$EXP_NAME$/monthly_mean',
                              '$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc'))
expB <- list(name = 'experimentB',
             path = file.path('/path/to/experiments/$EXP_NAME$/monthly_mean',
                              '$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc'))
obsX <- list(name = 'observationX',
             path = file.path('/path/to/observations/$OBS_NAME$/monthly_mean',
                              '$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc'))
```

Then the data is loaded with `Load()` providing the previously built lists
-to specify the desired datasets- and other parameters to select the Earth
surface region of interest, starting dates and forecast time steps to load
data from.
In this example, the requested format is 2-dimensional: all the loaded data
will be remapped onto the specified common 'grid' via CDO libraries.

```r
sdates <- paste0(1991:1993, '1101')
data <- Load('tas', list(expA, expB), list(obsX),
             sdates = sdates,
             leadtimemin = 2, leadtimemax = 7,
             latmin = -10, latmax = 60,
             lonmin = 100, lonmax = 250, 
             output = 'lonlat', grid = 't106grid',
             method = 'distance-weighted')
## * The load call you issued is:
## *   Load(var = "tas", exp = list(structure(list(name = "experimentA", path =
## *        "/path/to/experiments/$EXP_NAME$/monthly_mean/$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc"),
## *        .Names = c("name", "path")), structure(list(name =
## *        "experimentB", path =
## *        "/path/to/experiments/$EXP_NAME$/monthly_mean/$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc"),
## *        .Names = c("name", "path"))), obs = list(structure(list(name =
## *        "observationX", path =
## *        "/path/to/observations/$OBS_NAME$/monthly_mean/$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc"),
## *        .Names = c("name", "path"))), sdates = c("19911101",
## *        "19921101", "19931101"), grid = "t106grid", output = "lonlat",
## *        storefreq = "monthly", ...)
## * See the full call in '$load_parameters' after Load() finishes.
## * Fetching first experimental files to work out 'var_exp' size...
## * Exploring dimensions... /path/to/experiments/experimentA/monthly_mean/tas/tas_19911101.nc                  
## * Success. Detected dimensions of experimental data: 2, 5, 3, 6, 63, 134
## * Fetching first observational files to work out 'var_obs' size...
## * Exploring dimensions... /path/to/observations/observationX/monthly_mean/tas/tas_199112.nc                  
## * Success. Detected dimensions of observational data: 1, 1, 3, 6, 63, 134
## * Will now proceed to read and process 24 data files:
## *   /path/to/experiments/experimentA/monthly_mean/tas/tas_19911101.nc
## *   /path/to/experiments/experimentA/monthly_mean/tas/tas_19921101.nc
## *   /path/to/experiments/experimentA/monthly_mean/tas/tas_19931101.nc
## *   /path/to/experiments/experimentB/monthly_mean/tas/tas_19911101.nc
## *   /path/to/experiments/experimentB/monthly_mean/tas/tas_19921101.nc
## *   /path/to/experiments/experimentB/monthly_mean/tas/tas_19931101.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199112.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199201.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199202.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199203.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199204.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199205.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199212.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199301.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199302.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199303.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199304.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199305.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199312.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199401.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199402.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199403.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199404.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199405.nc
## * Total size of requested data:  13372128 bytes.
## *   - Experimental data:  ( 2 x 5 x 3 x 6 x 63 x 134 ) x 8 bytes = 12156480 bytes.
## *   - Observational data: ( 1 x 1 x 3 x 6 x 63 x 134 ) x 8 bytes = 1215648 bytes.
## * If size of requested data is close to or above the free shared RAM memory, R will crash.
## starting worker pid=14895 on localhost:11924 at 15:11:37.477
## starting worker pid=14916 on localhost:11924 at 15:11:37.716
## starting worker pid=14932 on localhost:11924 at 15:11:37.891
## starting worker pid=14948 on localhost:11924 at 15:11:38.110
## starting worker pid=14964 on localhost:11924 at 15:11:38.360
## starting worker pid=14980 on localhost:11924 at 15:11:38.540
## starting worker pid=14996 on localhost:11924 at 15:11:38.718
## starting worker pid=15012 on localhost:11924 at 15:11:38.908
## * Loading... This may take several minutes...
## * Progress: 0% + 33.33% + 33.33% + 33.33%
```

The output consists of two arrays of data (experimental and observational
data) with labelled dimensions, a list of loaded files, a list of not found
files and a call stamp to exactly reproduce as needed, among others.
See [**Data
retrieval**](data_retrieval.md)
for a full explanation of the capabilities and outputs of `Load()`.

Keep in mind that the two arrays obtained from `Load()` will have the 
following dimensions in the following order:
```r
  c(n. of datasets, n. of members, n. of starting dates, n. of lead-times, n. of latitudes, n. of longitudes)
```

Checking and summarizing data
-----------------------------

<a name="snippet1"></a>
Next, the data of the first member, starting date and leadtime of each dataset
can be visualised on an equidistant projection with `PlotEquiMap()` to check
that the loaded data is as expected.
```r
PlotEquiMap(data$mod[1, 1, 1, 1, , ], data$lon, data$lat)
PlotEquiMap(data$mod[2, 1, 1, 1, , ], data$lon, data$lat)
PlotEquiMap(data$obs[1, 1, 1, 1, , ], data$lon, data$lat)
```

<img src="equi_map_raw_all.png" />

See the full code used to obtain this figure in 
[Snippet 1](snippets.md#snippet1).

The values for some starting dates at a single grid point of an experimental
dataset can be plotted together with the observations with the function 
`PlotAno()`:
```r
mod <- data$mod[, , , , 30, 60, drop = FALSE]
dim(mod) <- dim(mod)[1:4]
obs <- data$obs[, , , , 30, 60, drop = FALSE]
dim(obs) <- dim(obs)[1:4]
PlotAno(mod, obs, sdates, 
        fileout = c('raw_expA_obsX.eps', 'raw_expB_obsX.eps'))
```

<img src="raw_expA_obsX.png" />
<img src="raw_expB_obsX.png" />

We can work out the exact coordinates of the selected grid cell as follows:
```r
data$lon[60]
## 166.5
data$lat[30]
## 27.47649
```

At this point, additional numerical checks could be carried out to ensure the
data retrieval stage has been sucessful.

The next usual step is to perform basic statistics on the raw data such as
computing the climatologies with bias correction and computing the anomalies.
This is straightforward with the functions `Clim()` and `Ano()`. Before, 
however, it is usual to average the data over the region of interest instead
of focusing on a single grid cell. This can be done directly requesting the 
data to `Load()` with the 'areave' output type, which will apply area weights 
and calculate total average:
```r
data <- Load('tas', list(expA, expB), list(obsX),
             sdates = sdates,
             leadtimemin = 2, leadtimemax = 7,
             latmin = -10, latmax = 60,
             lonmin = 100, lonmax = 250, 
             output = 'areave', grid = 't106grid',
             method = 'distance-weighted')
```

It is also common at this point to use `Smoothing()` to calculate running 
means along the lead-times dimension or `Trend()` to compute and remove the
trend of the raw data. ¿¿??

Then we calculate per-pair climatologies with bias correction with `Clim()`:
```r
clim <- Clim(mod, obs)
PlotClim(clim$clim_exp, clim$clim_obs, fileout = 'clim_expA_expB_obsX.eps')
```

<img src="clim_expA_expB_obsX.png" />

And the anomalies with `Ano()`, which literally substracts the climatologies 
to the raw data:
```r
ano_mod <- Ano(mod, clim$clim_exp)
ano_obs <- Ano(obs, clim$clim_obs)
PlotAno(ano_mod, ano_obs, sdates, 
        fileout = c('ano_expA_obsX.eps', 'ano_expB_obsX.eps'))
```

<img src="ano_expA_obsX.png" />
<img src="ano_expB_obsX.png" />