example.md 12.4 KB
Newer Older
Example
=======

Next you can see an example of usage of s2dverification spanning its
Nicolau Manubens's avatar
Nicolau Manubens committed
four modules ([Data retrieval](data_retrieval.md), 
[Basic statistics](basic_statistics.md), [Verification](verification.md) 
and [Visualisation](visualisation.md)).
Nicolau Manubens's avatar
Nicolau Manubens committed
The goal of the example is to load and analyse the skill of two sample 
forecasts of the near-surface air temperature over the Atlanticregion and 
Nicolau Manubens's avatar
Nicolau Manubens committed
assess their skill against a reference reanalysis.  
The data used in this example has gone already through a first post-processing
stage in which the monthly means have been computed.
Nicolau Manubens's avatar
Nicolau Manubens committed
Loading data
------------
Nicolau Manubens's avatar
Nicolau Manubens committed
First the package is loaded and attached.  
Then a list is built with information on the location of each dataset to load.  
A path pattern is specified for each dataset, in this case, with the wildcards
'$VAR_NAME$', '$START_DATE$', '$YEAR$', '$MONTH$', '$EXP_NAME$' and 
'$OBS_NAME$'. These wildcards will be automatically replaced by `Load()` with
the corresponding values (the requested variable name, the requested starting
dates, years or months, the dataset names, ...).  
See the available wildcards in `?Load`.

```r
library(s2dverification)

expA <- list(name = 'experimentA',
Nicolau Manubens's avatar
Nicolau Manubens committed
             path = file.path('/path/to/experiments/$EXP_NAME$/monthly_mean',
                              '$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc'))
expB <- list(name = 'experimentB',
Nicolau Manubens's avatar
Nicolau Manubens committed
             path = file.path('/path/to/experiments/$EXP_NAME$/monthly_mean',
                              '$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc'))
obsX <- list(name = 'observationX',
Nicolau Manubens's avatar
Nicolau Manubens committed
             path = file.path('/path/to/observations/$OBS_NAME$/monthly_mean',
                              '$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc'))
```

Then the data is loaded with `Load()` providing the previously built lists
-to specify the desired datasets- and other parameters to select the Earth
surface region of interest, starting dates and forecast time steps to load
Nicolau Manubens's avatar
Nicolau Manubens committed
data from.  
In this example, the requested format is 2-dimensional: all the loaded data
will be remapped onto the specified common 'grid' via CDO libraries.

```r
sdates <- paste0(1991:1993, '1101')
data <- Load('tas', list(expA, expB), list(obsX),
             sdates = sdates,
             leadtimemin = 2, leadtimemax = 7,
             latmin = -10, latmax = 60,
             lonmin = 100, lonmax = 250, 
             output = 'lonlat', grid = 't106grid',
             method = 'distance-weighted')
## * The load call you issued is:
## *   Load(var = "tas", exp = list(structure(list(name = "experimentA", path =
## *        "/path/to/experiments/$EXP_NAME$/monthly_mean/$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc"),
## *        .Names = c("name", "path")), structure(list(name =
## *        "experimentB", path =
## *        "/path/to/experiments/$EXP_NAME$/monthly_mean/$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc"),
## *        .Names = c("name", "path"))), obs = list(structure(list(name =
## *        "observationX", path =
## *        "/path/to/observations/$OBS_NAME$/monthly_mean/$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc"),
## *        .Names = c("name", "path"))), sdates = c("19911101",
## *        "19921101", "19931101"), grid = "t106grid", output = "lonlat",
## *        storefreq = "monthly", ...)
## * See the full call in '$load_parameters' after Load() finishes.
## * Fetching first experimental files to work out 'var_exp' size...
## * Exploring dimensions... /path/to/experiments/experimentA/monthly_mean/tas/tas_19911101.nc                  
## * Success. Detected dimensions of experimental data: 2, 5, 3, 6, 63, 134
## * Fetching first observational files to work out 'var_obs' size...
## * Exploring dimensions... /path/to/observations/observationX/monthly_mean/tas/tas_199112.nc                  
## * Success. Detected dimensions of observational data: 1, 1, 3, 6, 63, 134
## * Will now proceed to read and process 24 data files:
## *   /path/to/experiments/experimentA/monthly_mean/tas/tas_19911101.nc
## *   /path/to/experiments/experimentA/monthly_mean/tas/tas_19921101.nc
## *   /path/to/experiments/experimentA/monthly_mean/tas/tas_19931101.nc
## *   /path/to/experiments/experimentB/monthly_mean/tas/tas_19911101.nc
## *   /path/to/experiments/experimentB/monthly_mean/tas/tas_19921101.nc
## *   /path/to/experiments/experimentB/monthly_mean/tas/tas_19931101.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199112.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199201.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199202.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199203.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199204.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199205.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199212.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199301.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199302.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199303.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199304.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199305.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199312.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199401.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199402.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199403.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199404.nc
## *   /path/to/observations/observationX/monthly_mean/tas/tas_199405.nc
## * Total size of requested data:  13372128 bytes.
## *   - Experimental data:  ( 2 x 5 x 3 x 6 x 63 x 134 ) x 8 bytes = 12156480 bytes.
## *   - Observational data: ( 1 x 1 x 3 x 6 x 63 x 134 ) x 8 bytes = 1215648 bytes.
## * If size of requested data is close to or above the free shared RAM memory, R will crash.
## starting worker pid=14895 on localhost:11924 at 15:11:37.477
## starting worker pid=14916 on localhost:11924 at 15:11:37.716
## starting worker pid=14932 on localhost:11924 at 15:11:37.891
## starting worker pid=14948 on localhost:11924 at 15:11:38.110
## starting worker pid=14964 on localhost:11924 at 15:11:38.360
## starting worker pid=14980 on localhost:11924 at 15:11:38.540
## starting worker pid=14996 on localhost:11924 at 15:11:38.718
## starting worker pid=15012 on localhost:11924 at 15:11:38.908
## * Loading... This may take several minutes...
## * Progress: 0% + 33.33% + 33.33% + 33.33%
```

The output consists of two arrays of data (experimental and observational
data) with labelled dimensions, a list of loaded files, a list of not found
Nicolau Manubens's avatar
Nicolau Manubens committed
files and a call stamp to exactly reproduce as needed, among others.  
See [**Data
retrieval**](data_retrieval.md)
for a full explanation of the capabilities and outputs of `Load()`.

Keep in mind that the two arrays obtained from `Load()` will have the 
following dimensions in the following order:
  c(n. of datasets, n. of members, n. of starting dates, n. of lead-times, n. of latitudes, n. of longitudes)
Nicolau Manubens's avatar
Nicolau Manubens committed
```
Checking and summarizing data
-----------------------------

<a name="snippet1"></a>
Next, the data of the first member, starting date and leadtime of each dataset
can be visualised on an equidistant projection with `PlotEquiMap()` to check
that the loaded data is as expected.
```r
PlotEquiMap(data$mod[1, 1, 1, 1, , ], data$lon, data$lat)
PlotEquiMap(data$mod[2, 1, 1, 1, , ], data$lon, data$lat)
PlotEquiMap(data$obs[1, 1, 1, 1, , ], data$lon, data$lat)
```

<img src="equi_map_raw_all.png" />

See the full code used to obtain this figure in 
[Snippet 1](snippets.md#snippet1).

The values for some starting dates at a single grid point of an experimental
dataset can be plotted together with the observations with the function 
`PlotAno()`:
```r
mod <- data$mod[, , , , 30, 60, drop = FALSE]
dim(mod) <- dim(mod)[1:4]
obs <- data$obs[, , , , 30, 60, drop = FALSE]
dim(obs) <- dim(obs)[1:4]
Nicolau Manubens's avatar
Nicolau Manubens committed
PlotAno(mod, obs, gsub('1101', '1201', sdates),
        toptitle = paste0("Experiment ", c("A", "B"), 
                          ": raw 'tas' at 166.5ºE, 27.47ºN."), 
        ytitle = "K",
        fileout = c('raw_expA_obsX.eps', 'raw_expB_obsX.eps'))
Nicolau Manubens's avatar
Nicolau Manubens committed
<img src="raw_expA_obsX.png" width="600" />
<img src="raw_expB_obsX.png" width="600" />

Nicolau Manubens's avatar
Nicolau Manubens committed
Each coloured region represents data corresponding to a single starting date.  
Nicolau Manubens's avatar
Nicolau Manubens committed
The bold line represents the mean value and the thin lines represent the values
Nicolau Manubens's avatar
Nicolau Manubens committed
of each ensemble member. The upper and lower limits of each region are 
defined, respectively, by the maximum and minimum values among the members.  
Nicolau Manubens's avatar
Nicolau Manubens committed
The black line represents the reference reanalysis values.

We can see in the plot how the 'tas' reaches a minimum in February/March.
We can work out the exact coordinates of the selected grid cell as follows:
```r
data$lon[60]
## 166.5
data$lat[30]
Nicolau Manubens's avatar
Nicolau Manubens committed
## 27.47649
At this point, additional numerical checks could be carried out to ensure the
data retrieval stage has been sucessful.
Nicolau Manubens's avatar
Nicolau Manubens committed
Bias correction
---------------

Nicolau Manubens's avatar
Nicolau Manubens committed
The next common step is to perform basic statistics on the raw data such as
computing the climatologies with bias correction to account for the drift 
Nicolau Manubens's avatar
Nicolau Manubens committed
dependency on the initial conditions in the verification stage.  
This is straightforward with the function `Clim()`. Before, however, it is 
usual to average the data over the region of interest instead of focusing on a 
single grid cell. This can be done by directly requesting the data to `Load()` 
with the 'areave' output type, which will apply area weights and calculate the 
total average:
```r
data <- Load('tas', list(expA, expB), list(obsX),
             sdates = sdates,
             leadtimemin = 2, leadtimemax = 7,
             latmin = -10, latmax = 60,
             lonmin = 100, lonmax = 250, 
             output = 'areave', grid = 't106grid',
             method = 'distance-weighted')
```

Nicolau Manubens's avatar
Nicolau Manubens committed
Then we calculate and plot bias corrected per-pair climatologies with `Clim()`
and `PlotClim()`:
```r
clim <- Clim(mod, obs)
Nicolau Manubens's avatar
Nicolau Manubens committed
PlotClim(clim$clim_exp, clim$clim_obs, monini = 12,
Nicolau Manubens's avatar
Nicolau Manubens committed
         toptitle = "Bias corrected climatologies of 'tas' over Atlantic region.",
         listexp = c('Experiment A', 'Experiment B'),
         listobs = c('Observation X'),
Nicolau Manubens's avatar
Nicolau Manubens committed
         ytitle = "K", fileout = 'clim_expA_expB_obsX.eps')
Nicolau Manubens's avatar
Nicolau Manubens committed
<img src="clim_expA_expB_obsX.png" width="600" />
Nicolau Manubens's avatar
Nicolau Manubens committed
Each line in this plot represents the climatology of each member of the 
experimental datasets. A single climatology could be obtained providing the
parameter memb = FALSE to `Clim()`.

We can obtain the anomalies with `Ano()`, which simply substracts the 
Nicolau Manubens's avatar
Nicolau Manubens committed
climatologies to the corresponding starting dates and members of the raw data.  
These can be plotted with `PlotAno()`:
```r
ano_mod <- Ano(mod, clim$clim_exp)
ano_obs <- Ano(obs, clim$clim_obs)
Nicolau Manubens's avatar
Nicolau Manubens committed
PlotAno(ano_mod, ano_obs, gsub('1101', '1201', sdates), 
Nicolau Manubens's avatar
Nicolau Manubens committed
        toptitle = paste0("Experiment ", c("A", "B"), 
Nicolau Manubens's avatar
Nicolau Manubens committed
                          ": bias corrected 'tas' anomalies over Atlantic region."),
Nicolau Manubens's avatar
Nicolau Manubens committed
        ytitle = 'K', linezero = TRUE,
        fileout = c('ano_expA_obsX.eps', 'ano_expB_obsX.eps'))
Nicolau Manubens's avatar
Nicolau Manubens committed
<img src="ano_expA_obsX.png" width="600" />
<img src="ano_expB_obsX.png" width="600" />

To complete the bias correction we would need to add the observed climatologies 
Nicolau Manubens's avatar
Nicolau Manubens committed
to these anomalies. The working units of the package, however, are the 
anomalies.
Nicolau Manubens's avatar
Nicolau Manubens committed

Nicolau Manubens's avatar
Nicolau Manubens committed
Other basic statistics such as trends, detrended anomalies, frequency 
filtering, EOF/PCA and empirical model generation are implemented in the 
package.  
Nicolau Manubens's avatar
Nicolau Manubens committed
Check [**Basic statistics**](basic_statistics.md) for a thorough explanation.
Also see [**Visualisation**](visualisation.md) for details on visualisation 
tools.

Nicolau Manubens's avatar
Nicolau Manubens committed
Assessing the quality of a forecast
-----------------------------------
Nicolau Manubens's avatar
Nicolau Manubens committed

Nicolau Manubens's avatar
Nicolau Manubens committed
As a measure of the skill of the two forecasts we will compute their 
correlation against the reanalysis.

First we compute the ensemble mean and then calculate the Pearson correlation
with `Corr()` and plot the results with `PlotVsLTime()`, which allows to plot
any other computed variable with confidence intervals and signification values.
```r
corr <- Corr(Mean1Dim(ano_mod, 2), Mean1Dim(ano_obs, 2))
PlotVsLTime(corr, toptitle = "Correlations with Observation X over Atlantic region.",
            ytitle = 'corr', monini = 12, 
            listexp = c('Exp A', 'Exp B'),
            fileout = 'corr_expA_expB_obsX.eps')
```
Nicolau Manubens's avatar
Nicolau Manubens committed

Nicolau Manubens's avatar
Nicolau Manubens committed
<img src="corr_expA_expB_obsX.png" width="600" />
Nicolau Manubens's avatar
Nicolau Manubens committed

See [**Verification**](verification.md) for a detailed explanation of the 
Nicolau Manubens's avatar
Nicolau Manubens committed
available deterministic and probabilistic scores.