diff --git a/DESCRIPTION b/DESCRIPTION index 2c87e1472aeed292a148f0e89adbc82e9fc0453b..07433f279959e102a655e5b3b8d32c51e0a00770 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,7 +1,7 @@ Package: CSTools Title: Assessing Skill of Climate Forecasts on Seasonal-to-Decadal Timescales -Version: 3.0.0 +Version: 3.1.0 Authors@R: c( person("Nuria", "Perez-Zanon", , "nuria.perez@bsc.es", role = c("aut", "cre"), comment = c(ORCID = "0000-0001-8568-3071")), person("Louis-Philippe", "Caron", , "louis-philippe.caron@bsc.es", role = "aut", comment = c(ORCID = "0000-0001-5221-0147")), @@ -15,13 +15,17 @@ Authors@R: c( person("Deborah", "Verfaillie", , "deborah.verfaillie@bsc.es", role = "aut"), person("Lauriane", "Batte", , "lauriane.batte@meteo.fr", role = "ctb"), person("Filippo", "Cali Quaglia", , "filippo.caliquaglia@gmail.com", role = "ctb"), + person("Chihchung", "Chou", , "chihchung.chou@bsc.es", role = "ctb"), + person("Nicola", "Cortesi", , "nicola.cortesi@bsc.es", role = "ctb"), person("Susanna", "Corti", , "s.corti@isac.cnr.it", role = "ctb"), person("Paolo", "Davini", , "p.davini@isac.cnr.it", role = "ctb"), + person("Marta", "Dominguez", , "mdomingueza@aemet.es", role = "ctb"), person("Federico", "Fabiano", , "f.fabiano@isac.cnr.it", role = "ctb"), person("Ignazio", "Giuntoli", , "i.giuntoli@isac.cnr.it", role = "ctb"), person("Raul", "Marcos", , "raul.marcos@bsc.es", role = "ctb"), person("Niti", "Mishra", , "niti.mishra@bsc.es", role = "ctb"), person("Jesus", "Peña", , "jesus.pena@bsc.es", role = "ctb"), + person("Francesc", "Roura-Adserias", , "francesc.roura@bsc.es", role = "ctb"), person("Silvia", "Terzago", , "s.terzago@isac.cnr.it", role = "ctb"), person("Danila", "Volpi", , "d.volpi@isac.cnr.it", role = "ctb"), person("BSC-CNS", role = c("cph"))) @@ -42,12 +46,13 @@ Description: Exploits dynamical seasonal forecasts in order to provide Van Schaeybroeck et al. (2019) . Yiou et al. (2013) . 
Depends: - R (>= 3.4.2), + R (>= 3.4.0), maps Imports: s2dverification, + s2dv, rainfarmr, - multiApply, + multiApply (>= 2.1.1), qmap, ClimProjDiags, ncdf4, @@ -56,6 +61,7 @@ Imports: data.table, reshape2, ggplot2, + RColorBrewer, graphics, grDevices, stats, @@ -65,7 +71,9 @@ Suggests: zeallot, testthat, knitr, - rmarkdown + markdown, + rmarkdown, + startR VignetteBuilder: knitr License: Apache License 2.0 Encoding: UTF-8 diff --git a/NAMESPACE b/NAMESPACE index bd5d0f1649b1d8529ffb6d16118bc8c87ec186ab..9f9a0e3481c2eaa1eb04b26776b497b25783564e 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -17,19 +17,29 @@ export(CST_MultiMetric) export(CST_MultivarRMSE) export(CST_QuantileMapping) export(CST_RFSlope) +export(CST_RFTemp) export(CST_RFWeights) export(CST_RainFARM) +export(CST_RegimesAssign) export(CST_SaveExp) export(CST_SplitDim) +export(CST_WeatherRegimes) +export(Calibration) export(EnsClustering) export(MergeDims) export(MultiEOF) export(PlotCombinedMap) export(PlotForecastPDF) export(PlotMostLikelyQuantileMap) +export(PlotPDFsOLE) +export(PlotTriangles4Categories) export(RFSlope) +export(RFTemp) export(RainFARM) +export(RegimesAssign) +export(SaveExp) export(SplitDim) +export(WeatherRegime) export(as.s2dv_cube) export(s2dv_cube) import(abind) @@ -41,6 +51,7 @@ import(rainfarmr) import(s2dverification) import(stats) importFrom(ClimProjDiags,SelBox) +importFrom(RColorBrewer,brewer.pal) importFrom(data.table,CJ) importFrom(data.table,data.table) importFrom(data.table,setkey) @@ -57,16 +68,23 @@ importFrom(grDevices,png) importFrom(grDevices,postscript) importFrom(grDevices,svg) importFrom(grDevices,tiff) +importFrom(graphics,axis) importFrom(graphics,box) importFrom(graphics,image) importFrom(graphics,layout) importFrom(graphics,mtext) importFrom(graphics,par) +importFrom(graphics,plot) importFrom(graphics,plot.new) +importFrom(graphics,points) +importFrom(graphics,polygon) +importFrom(graphics,text) +importFrom(graphics,title) importFrom(maps,map) importFrom(plyr,.) importFrom(plyr,dlply) importFrom(reshape2,melt) +importFrom(s2dv,Reorder) importFrom(utils,glob2rx) importFrom(utils,head) importFrom(utils,tail) diff --git a/NEWS.md b/NEWS.md index 3e625ed09e2933e7f4255ceef9fc1ad89e8af161..fab70c708389c95d71195da561f6b346cfed7d2b 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,28 @@ +### CSTools 3.1.0 +**Submission date to CRAN: XX-06-2020** + +- New features: + + EnsClustering vignette + + EnsClustering has a new parameter 'time_dim' + + CST_BiasCorrection has na.rm paramter + + CST_Anomaly allows to smooth the climatology with filter.span parameter + + PlotTriangles4Categories new plotting function to convert any 3-d numerical array to a grid of coloured triangles. 
+ + CST_WeatherRegimes/WeatherRegimes and CST_RegimesAssign/RegimesAssign new functions to compute weather regimes and assign anomaly fields to them + PlotPDFsOLE plots two Gaussian probability density functions and the optimal linear estimation + CST_RFTemp/RFTemp functions available for downscaling temperature + Weather Regimes vignette + +- Fixes + CST_Anomaly handles exp, obs or both + PlotForecastPDF vignette displays figures correctly + Calibration function is exposed to users + MultiMetric vignette: fixed typo in text description + RainFARM checks that 'slope' is not a vector + DESCRIPTION specifies the minimum multiApply version required + EnsClustering has a fixed 'closest_member' output + PlotCombinedMap handles masks correctly + CST_SaveExp uses multiApply and saves the time dimension correctly + ### CSTools 3.0.0 **Submission date to CRAN: 10-02-2020** @@ -8,6 +33,7 @@ - Fixes + CST_Calibration handles missing values + BEI functions handle missing values + ### CSTools 2.0.0 **Submission date to CRAN: 25-11-2019** @@ -33,6 +59,7 @@ - Adding reference to S2S4E H2020 project into the DESCRIPTION file - Adding NEWS.md file + ### CSTools 1.0.1 **Release date on CRAN: 19-06-2019** diff --git a/R/CST_Anomaly.R b/R/CST_Anomaly.R index e76c46ab67164a77196460bf0e7dbb890d785819..6e33c3411d1000abb0c15e4994aa9d7fbd473b23 100644 --- a/R/CST_Anomaly.R +++ b/R/CST_Anomaly.R @@ -12,7 +12,7 @@ #'@param cross A logical value indicating whether cross-validation should be applied or not. Default = FALSE. #'@param memb A logical value indicating whether Clim() computes one climatology for each experimental data #'product member(TRUE) or it computes one sole climatology for all members (FALSE). Default = TRUE. -#' +#'@param filter_span A numeric value indicating the degree of smoothing. This option is only available if parameter \code{cross} is set to FALSE. #'@param dim_anom An integer indicating the dimension along which the climatology will be computed. It #'usually corresponds to 3 (sdates) or 4 (ftime). Default = 3.
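For orientation, here is a minimal standalone sketch of what the `filter_span` argument documented above does: it is passed as the `span` of a loess fit along the forecast-time dimension of the climatology. The synthetic series and the span value 0.5 are illustrative only, not recommended settings.

```r
# Loess smoothing of a noisy climatology over forecast time; 'span' plays the
# role of 'filter_span' (illustrative values only)
set.seed(1)
clim <- sin(seq(0, pi, length.out = 90)) + rnorm(90, sd = 0.2)
day <- seq_along(clim)
clim_smooth <- predict(loess(clim ~ day, span = 0.5))
plot(day, clim, type = "l")
lines(day, clim_smooth, lwd = 2)
```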
#' @@ -44,22 +44,29 @@ #'anom4 <- CST_Anomaly(exp = exp, obs = obs, cross = FALSE, memb = FALSE) #'str(anom4) #' +#'anom5 <- CST_Anomaly(lonlat_data$exp) +#' +#'anom6 <- CST_Anomaly(obs = lonlat_data$obs) +#' #'@seealso \code{\link[s2dverification]{Ano_CrossValid}}, \code{\link[s2dverification]{Clim}} and \code{\link{CST_Load}} #' #' #'@export -CST_Anomaly <- function(exp = NULL, obs = NULL, cross = FALSE, memb = TRUE, dim_anom = 3) { +CST_Anomaly <- function(exp = NULL, obs = NULL, cross = FALSE, memb = TRUE, + filter_span = NULL, dim_anom = 3) { - if (!inherits(exp, 's2dv_cube') || !inherits(obs, 's2dv_cube')) { + if (!inherits(exp, 's2dv_cube') & !is.null(exp) || + !inherits(obs, 's2dv_cube') & !is.null(obs)) { stop("Parameter 'exp' and 'obs' must be of the class 's2dv_cube', ", "as output by CSTools::CST_Load.") } - if (!is.null(obs) & dim(obs$data)['member'] != 1) { + if (!is.null(obs)) { + if (dim(obs$data)['member'] != 1) { stop("The length of the dimension 'member' in the component 'data' ", "of the parameter 'obs' must be equal to 1.") + } } - case_exp = case_obs = 0 if (is.null(exp) & is.null(obs)) { stop("One of the parameter 'exp' or 'obs' cannot be NULL.") @@ -75,6 +82,7 @@ CST_Anomaly <- function(exp = NULL, obs = NULL, cross = FALSE, memb = TRUE, dim_ warning("Parameter 'obs' is not provided and will be recycled.") } + if (!is.null(names(dim(exp$data))) & !is.null(names(dim(obs$data)))) { if (all(names(dim(exp$data)) %in% names(dim(obs$data)))) { dimnames <- names(dim(exp$data)) @@ -88,8 +96,8 @@ CST_Anomaly <- function(exp = NULL, obs = NULL, cross = FALSE, memb = TRUE, dim_ } dim_exp <- dim(exp$data) dim_obs <- dim(obs$data) + dimnames_data <- names(dim_exp) - if (dim_exp[dim_anom] == 1 | dim_obs[dim_anom] == 1) { stop("The length of dimension 'dim_anom' in label 'data' of the parameter ", "'exp' and 'obs' must be greater than 1.") @@ -137,7 +145,22 @@ CST_Anomaly <- function(exp = NULL, obs = NULL, cross = FALSE, memb = TRUE, dim_ # Without cross-validation } else { tmp <- Clim(var_exp = exp$data, var_obs = obs$data, memb = memb) - + if (!is.null(filter_span)) { + if (is.numeric(filter_span)) { + pos_dims <- names(dim(tmp$clim_exp)) + reorder <- match(pos_dims, c('ftime', + pos_dims[-which(pos_dims == 'ftime')])) + tmp$clim_obs <- aperm(apply(tmp$clim_obs, c(1 : + length(dim(tmp$clim_obs)))[-which(names(dim(tmp$clim_obs)) == 'ftime')], + .Loess, loess_span = filter_span), reorder) + tmp$clim_exp <- aperm(apply(tmp$clim_exp, c(1 : + length(dim(tmp$clim_exp)))[-which(names(dim(tmp$clim_exp)) == 'ftime')], + .Loess, loess_span = filter_span), reorder) + } else { + warning("Paramater 'filter_span' is not numeric and any filter", + " is being applied.") + } + } if (memb) { clim_exp <- tmp$clim_exp clim_obs <- tmp$clim_obs @@ -186,3 +209,10 @@ CST_Anomaly <- function(exp = NULL, obs = NULL, cross = FALSE, memb = TRUE, dim_ return(list(exp = exp, obs = obs)) } } +.Loess <- function(clim, loess_span) { + data <- data.frame(ensmean = clim, day = 1 : length(clim)) + loess_filt <- loess(ensmean ~ day, data, span = loess_span) + output <- predict(loess_filt) + return(output) +} + diff --git a/R/CST_BiasCorrection.R b/R/CST_BiasCorrection.R index 89821dca448b11bda9c4d5c0f832c83f9d14fb1f..1da7fb5be8a8679d022a7d5179dcbdc2830acd26 100644 --- a/R/CST_BiasCorrection.R +++ b/R/CST_BiasCorrection.R @@ -5,6 +5,7 @@ #' #'@param exp an object of class \code{s2dv_cube} as returned by \code{CST_Load} function, containing the seasonal forecast experiment data in the element named \code{$data} 
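Referring back to the `.Loess` helper added at the end of CST_Anomaly.R above: inside `CST_Anomaly` it is applied along 'ftime' for every combination of the remaining dimensions through `apply`/`aperm`. The following self-contained sketch reproduces that pattern on a small named array; the dimension sizes and span are illustrative.

```r
# Smooth along 'ftime' for each member, then restore the original dimension
# order with aperm(), mirroring the filter_span branch of CST_Anomaly
clim <- array(rnorm(5 * 30), dim = c(member = 5, ftime = 30))
smooth_ftime <- function(x, span = 0.5) predict(loess(x ~ seq_along(x), span = span))
other_dims <- which(names(dim(clim)) != "ftime")
clim_smooth <- aperm(apply(clim, other_dims, smooth_ftime), c(2, 1))
dim(clim_smooth)  # 5 x 30 again (member x ftime)
```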
#'@param obs an object of class \code{s2dv_cube} as returned by \code{CST_Load} function, containing the observed data in the element named \code{$data}. +#'@param na.rm a logical value indicating whether missing values should be stripped before the computation proceeds. By default it is set to FALSE. #' #'@return an object of class \code{s2dv_cube} containing the bias corrected forecasts in the element called \code{$data} with the same dimensions of the experimental data. #' @@ -30,7 +31,7 @@ #'a <- CST_BiasCorrection(exp = exp, obs = obs) #'str(a) #'@export -CST_BiasCorrection <- function(exp, obs) { +CST_BiasCorrection <- function(exp, obs, na.rm = FALSE) { if (!inherits(exp, 's2dv_cube') || !inherits(obs, 's2dv_cube')) { stop("Parameter 'exp' and 'obs' must be of the class 's2dv_cube', ", "as output by CSTools::CST_Load.") @@ -41,7 +42,7 @@ CST_BiasCorrection <- function(exp, obs) { "of the parameter 'obs' must be equal to 1.") } dimnames <- names(dim(exp$data)) - BiasCorrected <- BiasCorrection(exp = exp$data, obs = obs$data) + BiasCorrected <- BiasCorrection(exp = exp$data, obs = obs$data, na.rm = na.rm) pos <- match(dimnames, names(dim(BiasCorrected))) BiasCorrected <- aperm(BiasCorrected, pos) names(dim(BiasCorrected)) <- dimnames @@ -51,7 +52,7 @@ CST_BiasCorrection <- function(exp, obs) { return(exp) } -BiasCorrection <- function (exp, obs) { +BiasCorrection <- function(exp, obs, na.rm = FALSE) { if (!all(c('member', 'sdate') %in% names(dim(exp)))) { stop("Parameter 'exp' must have the dimensions 'member' and 'sdate'.") @@ -69,6 +70,16 @@ BiasCorrection <- function (exp, obs) { warning("Parameter 'obs' contains NA values.") } + if (!is.logical(na.rm)) { + na.rm <- FALSE + warning("Parameter 'na.rm' must be a logical, so it has been set to FALSE.") + } + + if (length(na.rm) > 1) { + na.rm <- na.rm[1] + warning("Parameter 'na.rm' has length greater than 1, and only the first element is used.") + } + target_dims_obs <- 'sdate' if ('member' %in% names(dim(obs))) { target_dims_obs <- c('member', target_dims_obs) @@ -76,11 +87,11 @@ BiasCorrection <- function (exp, obs) { BiasCorrected <- Apply(data = list(var_obs = obs, var_exp = exp), target_dims = list(target_dims_obs, c('member', 'sdate')), - fun = .sbc)$output1 + fun = .sbc, na.rm = na.rm)$output1 return(BiasCorrected) } -.sbc <- function(var_obs, var_exp) { +.sbc <- function(var_obs, var_exp, na.rm = FALSE) { nmembers <- dim(var_exp)['member'][] ntime <- dim(var_exp)['sdate'][] if (all(names(dim(var_exp)) != c('member','sdate'))) { @@ -96,10 +107,10 @@ BiasCorrection <- function (exp, obs) { obs <- var_obs[-t] # parameters - sd_obs <- sd(obs) - sd_exp <- sd(hcst) - clim_exp <- mean(hcst) - clim_obs <- mean(obs) + sd_obs <- sd(obs, na.rm = na.rm) + sd_exp <- sd(hcst, na.rm = na.rm) + clim_exp <- mean(hcst, na.rm = na.rm) + clim_obs <- mean(obs, na.rm = na.rm) # bias corrected forecast corrected[ , t] <- ((fcst - clim_exp) * (sd_obs / sd_exp)) + clim_obs diff --git a/R/CST_Calibration.R b/R/CST_Calibration.R index a7f1924b208663909a774c94e0096f2e3d824428..ca29039764d5cbe108f1c340cf9188963972141e 100644 --- a/R/CST_Calibration.R +++ b/R/CST_Calibration.R @@ -34,12 +34,9 @@ #'str(a) #'@export -CST_Calibration <- function(exp, obs, - cal.method = "mse_min", - eval.method = "leave-one-out", - multi.model = F, - na.fill = T, - ncores = 1) { +CST_Calibration <- function(exp, obs, cal.method = "mse_min", + eval.method = "leave-one-out", multi.model = FALSE, + na.fill = TRUE, ncores = 1) { if (!inherits(exp, "s2dv_cube") || !inherits(obs,
"s2dv_cube")) { stop("Parameter 'exp' and 'obs' must be of the class 's2dv_cube', ", "as output by CSTools::CST_Load.") @@ -92,13 +89,16 @@ CST_Calibration <- function(exp, obs, #' #'@seealso \code{\link{CST_Load}} #' - -Calibration <- function(exp, obs, - cal.method = "mse_min", - eval.method = "leave-one-out", - multi.model = F, - na.fill = T, - ncores = 1) { +#'@examples +#'mod1 <- 1 : (1 * 3 * 4 * 5 * 6 * 7) +#'dim(mod1) <- c(dataset = 1, member = 3, sdate = 4, ftime = 5, lat = 6, lon = 7) +#'obs1 <- 1 : (1 * 1 * 4 * 5 * 6 * 7) +#'dim(obs1) <- c(dataset = 1, member = 1, sdate = 4, ftime = 5, lat = 6, lon = 7) +#'a <- Calibration(exp = mod1, obs = obs1) +#'str(a) +#'@export +Calibration <- function(exp, obs, cal.method = "mse_min", eval.method = "leave-one-out", + multi.model = FALSE, na.fill = TRUE, ncores = 1) { dim.exp <- dim(exp) amt.dims.exp <- length(dim.exp) diff --git a/R/CST_EnsClustering.R b/R/CST_EnsClustering.R index 62d2924423992837b292ee7301521eea0599f799..93a34d9ffa9f9caf05b8cd5b693b7be4b52220f4 100644 --- a/R/CST_EnsClustering.R +++ b/R/CST_EnsClustering.R @@ -52,7 +52,7 @@ #' spatial dimensions named "lon" and "lat", and dimensions "dataset", "member", "ftime", "sdate". #' @param time_moment Decides the moment to be applied to the time dimension. Can be either 'mean' (time mean), #' 'sd' (standard deviation along time) or 'perc' (a selected percentile on time). -#' If 'perc' the keyword 'time_percentile' is also needed. +#' If 'perc' the keyword 'time_percentile' is also used. #' @param time_percentile Set the percentile in time you want to analyse (used for `time_moment = "perc"). #' @param numclus Number of clusters (scenarios) to be calculated. #' If set to NULL the number of ensemble members divided by 10 is used, with a minimum of 2 and a maximum of 8. @@ -61,7 +61,10 @@ #' @param variance_explained variance (percentage) to be explained by the set of EOFs. #' Defaults to 80. Not used if numpcs is specified. #' @param numpcs Number of EOFs retained in the analysis (optional). -#' @param cluster_dim Dimension along which to cluster. Typically "member" or "sdate". This can also be a list like c("member", "sdate"). +#' @param cluster_dim Dimension along which to cluster. Typically "member" or "sdate". +#' This can also be a list like c("member", "sdate"). +#' @param time_dim String or character array with name(s) of dimension(s) over which to compute statistics. +#' If omitted c("ftime", "sdate", "time") are searched in this order. #' @param verbose Logical for verbose output #' @return A list with elements \code{$cluster} (cluster assigned for each member), #' \code{$freq} (relative frequency of each cluster), \code{$closest_member} @@ -70,22 +73,23 @@ #' \code{$lon} (selected longitudes of output fields), #' \code{$lat} (selected longitudes of output fields). #' @examples +#'\donttest{ #' exp <- lonlat_data$exp #' # Example 1: Cluster on all start dates, members and models #' res <- CST_EnsClustering(exp, numclus = 3, #' cluster_dim = c("member", "dataset", "sdate")) #' iclus = res$cluster[2, 1, 3] -#'\donttest{ +#' #' print(paste("Cluster of 2. member, 1. dataset, 3. 
sdate:", iclus)) #' print(paste("Frequency (numerosity) of cluster (", iclus, ") :", res$freq[iclus])) #' library(s2dverification) #' PlotEquiMap(res$repr_field[iclus, , ], exp$lon, exp$lat, #' filled.continents = FALSE, #' toptitle = paste("Representative field of cluster", iclus)) -#'} +#' #' # Example 2: Cluster on members retaining 4 EOFs during #' # preliminary dimensional reduction -#'\donttest{ +#' #' res <- CST_EnsClustering(exp, numclus = 3, numpcs = 4, cluster_dim = "member") #' # Example 3: Cluster on members, retain 80% of variance during #' # preliminary dimensional reduction @@ -100,7 +104,7 @@ #'@export CST_EnsClustering <- function(exp, time_moment = "mean", numclus = NULL, lon_lim = NULL, lat_lim = NULL, - variance_explained = 80, numpcs = NULL, + variance_explained = 80, numpcs = NULL, time_dim = NULL, time_percentile = 90, cluster_dim = "member", verbose = F) { if (!inherits(exp, "s2dv_cube")) { @@ -112,7 +116,7 @@ CST_EnsClustering <- function(exp, time_moment = "mean", numclus = NULL, time_moment = time_moment, numclus = numclus, lon_lim = lon_lim, lat_lim = lat_lim, variance_explained = variance_explained, numpcs = numpcs, - time_percentile = time_percentile, + time_percentile = time_percentile, time_dim = time_dim, cluster_dim = cluster_dim, verbose = verbose) return(result) @@ -136,7 +140,7 @@ CST_EnsClustering <- function(exp, time_moment = "mean", numclus = NULL, #' @param lon Vector of longitudes. #' @param time_moment Decides the moment to be applied to the time dimension. Can be either 'mean' (time mean), #' 'sd' (standard deviation along time) or 'perc' (a selected percentile on time). -#' If 'perc' the keyword 'time_percentile' is also needed. +#' If 'perc' the keyword 'time_percentile' is also used. #' @param time_percentile Set the percentile in time you want to analyse (used for `time_moment = "perc"). #' @param numclus Number of clusters (scenarios) to be calculated. #' If set to NULL the number of ensemble members divided by 10 is used, with a minimum of 2 and a maximum of 8. @@ -145,7 +149,10 @@ CST_EnsClustering <- function(exp, time_moment = "mean", numclus = NULL, #' @param variance_explained variance (percentage) to be explained by the set of EOFs. #' Defaults to 80. Not used if numpcs is specified. #' @param numpcs Number of EOFs retained in the analysis (optional). -#' @param cluster_dim Dimension along which to cluster. Typically "member" or "sdate". This can also be a list like c("member", "sdate"). +#' @param cluster_dim Dimension along which to cluster. Typically "member" or "sdate". +#' This can also be a list like c("member", "sdate"). +#' @param time_dim String or character array with name(s) of dimension(s) over which to compute statistics. +#' If omitted c("ftime", "sdate", "time") are searched in this order. #' @param verbose Logical for verbose output #' @return A list with elements \code{$cluster} (cluster assigned for each member), #' \code{$freq} (relative frequency of each cluster), \code{$closest_member} @@ -155,27 +162,50 @@ CST_EnsClustering <- function(exp, time_moment = "mean", numclus = NULL, #' \code{$lat} (selected longitudes of output fields). 
#' #' @examples +#'\donttest{ #' exp <- lonlat_data$exp #' res <- EnsClustering(exp$data, exp$lat, exp$lon, numclus = 3, #' cluster_dim = c("member", "dataset", "sdate")) +#'} #'@export EnsClustering <- function(data, lat, lon, time_moment = "mean", numclus = NULL, lon_lim = NULL, lat_lim = NULL, variance_explained = 80, - numpcs = NULL, time_percentile = 90, + numpcs = NULL, time_percentile = 90, time_dim = NULL, cluster_dim = "member", verbose = T) { + # Check/detect time_dim + if (is.null(time_dim)) { + time_dim_names <- c("ftime", "sdate", "time") + time_dim_num <- which(time_dim_names %in% names(dim(data))) + if (length(time_dim_num) > 0) { + # Find time dimension with length > 1 + ilong <- which(dim(data)[time_dim_names[time_dim_num]] > 1) + if (length(ilong) > 0) { + time_dim <- time_dim_names[time_dim_num[ilong[1]]] + } else { + stop("No time dimension longer than one found.") + } + } else { + stop("Could not automatically detect a target time dimension ", + "in the provided data in 'data'.") + } + .printv(paste("Selected time dim:", time_dim), verbose) + } + # Apply time_moment if (time_moment == "mean") { .printv("Considering the time_moment: mean", verbose) - exp <- apply(data, c(1, 2, 3, 5, 6), mean) + exp <- Apply(data, target_dims = time_dim, mean)$output1 } else if (time_moment == "sd") { .printv("Considering the time_moment: sd", verbose) - exp <- apply(data, c(1, 2, 3, 5, 6), sd) + exp <- Apply(data, target_dims = time_dim, sd)$output1 } else if (time_moment == "perc") { .printv(paste0("Considering the time_moment: percentile ", sprintf("%5f", time_percentile)), verbose) - exp <- apply(data, c(1, 2, 3, 5, 6), quantile, time_percentile / 100.) + exp <- Apply(data, target_dims = time_dim, + function(x) {quantile(as.vector(x), + time_percentile / 100.)})$output1 } else { stop(paste0("Invalid time_moment '", time_moment, "' specified!")) } @@ -186,6 +216,20 @@ EnsClustering <- function(data, lat, lon, time_moment = "mean", numclus = NULL, lon_lim = lon_lim, lat_lim = lat_lim, variance_explained = variance_explained, numpcs = numpcs, verbose = verbose) + + # Expand result$closest_member into indices in cluster_dim dimensions + cm=result$closest_member + cml <- vector(mode = "list", length = length(cluster_dim)) + cum <- cm * 0 + dim_cd <- dim(exp)[cluster_dim] + for (i in rev(seq_along(cluster_dim))) { + cml[[i]] <- floor((cm - cum - 1) / prod(dim_cd[-i])) + 1 + cum <- cum + (cml[[i]] - 1) * prod(dim_cd[-i]) + dim_cd <- dim_cd[-i] + } + names(cml) <- cluster_dim + result$closest_member <- cml + return(append(result, list(lat = lat, lon = lon))) } diff --git a/R/CST_QuantileMapping.R b/R/CST_QuantileMapping.R index 25651241451c685d30f88f97c128ee3e80aad4dc..3923569ae3ea59355fc2ba36dbd2d11f05d99d4a 100644 --- a/R/CST_QuantileMapping.R +++ b/R/CST_QuantileMapping.R @@ -14,11 +14,11 @@ #' #'@details The different methods are: #'\itemize{ -#'\item{'PTF'} {fits a parametric transformations to the quantile-quantile relation of observed and modelled values. See \code{\link[qmap]{fitQmapPTF}}.} -#' \item{'DIST'} {fits a theoretical distribution to observed and to modelled time series. See \code{\link[qmap]{fitQmapDIST}}.} -#'\item{'RQUANT'} {estimates the values of the quantile-quantile relation of observed and modelled time series for regularly spaced quantiles using local linear least square regression. 
See \code{\link[qmap]{fitQmapRQUANT}}.} -#'\item{'QUANT'} {estimates values of the empirical cumulative distribution function of observed and modelled time series for regularly spaced quantiles. See \code{\link[qmap]{fitQmapQUANT}}.} -#'\item{'SSPLIN'} {fits a smoothing spline to the quantile-quantile plot of observed and modelled time series. See \code{\link[qmap]{fitQmapSSPLIN}}}.} +#'\item{'PTF'} {fits a parametric transformations to the quantile-quantile relation of observed and modelled values. See \code{?qmap::fitQmapPTF}.} +#' \item{'DIST'} {fits a theoretical distribution to observed and to modelled time series. See \code{?qmap::fitQmapDIST}.} +#'\item{'RQUANT'} {estimates the values of the quantile-quantile relation of observed and modelled time series for regularly spaced quantiles using local linear least square regression. See \code{?qmap::fitQmapRQUANT}.} +#'\item{'QUANT'} {estimates values of the empirical cumulative distribution function of observed and modelled time series for regularly spaced quantiles. See \code{?qmap::fitQmapQUANT}.} +#'\item{'SSPLIN'} {fits a smoothing spline to the quantile-quantile plot of observed and modelled time series. See \code{?qmap::fitQmapSSPLIN}.}} #'All methods accepts some common arguments: #'\itemize{ #'\item{wet.day} {logical indicating whether to perform wet day correction or not.(Not available in 'DIS' method)} @@ -29,7 +29,7 @@ #'@import multiApply #'@import abind #' -#'@seealso \code{\link[qmap]{fitQmap}} and \code{\link[qmap]{doQmap}} +#'@seealso \code{qmap::fitQmap} and \code{qmap::doQmap} #'@examples #'library(qmap) #'exp <- 1 : (1 * 5 * 10 * 6 * 2 * 3) diff --git a/R/CST_RFTemp.R b/R/CST_RFTemp.R new file mode 100644 index 0000000000000000000000000000000000000000..cebac85aa555252b19ba156d7d2d4e2a60f946a3 --- /dev/null +++ b/R/CST_RFTemp.R @@ -0,0 +1,563 @@ +#' @rdname CST_RFTemp +#' @title Temperature downscaling of a CSTools object using lapse rate +#' correction or a reference field +#' @author Jost von Hardenberg - ISAC-CNR, \email{j.vonhardenberg@isac.cnr.it} +#' @description This function implements a simple lapse rate correction of a +#' temperature field (an object of class 's2dv_cube' as provided by +#' `CST_Load`) as input. +#' The input lon grid must be increasing (but can be modulo 360). +#' The input lat grid can be irregularly spaced (e.g. a Gaussian grid) +#' The output grid can be irregularly spaced in lon and/or lat. +#' @references Method described in ERA4CS MEDSCOPE milestone M3.2: +#' High-quality climate prediction data available to WP4 +#' [https://www.medscope-project.eu/the-project/deliverables-reports/]([https://www.medscope-project.eu/the-project/deliverables-reports/) +#' and in H2020 ECOPOTENTIAL Deliverable No. 8.1: +#' High resolution (1-10 km) climate, land use and ocean change scenarios +#' [https://www.ecopotential-project.eu/images/ecopotential/documents/D8.1.pdf](https://www.ecopotential-project.eu/images/ecopotential/documents/D8.1.pdf) +#' @param data An object of the class 's2dv_cube' as returned by `CST_Load`, +#' containing the temperature fields to downscale. +#' The data object is expected to have an element named \code{$data} +#' with at least two spatial dimensions named "lon" and "lat". +#' (these default names can be changed with the \code{lon_dim} and +#' \code{lat_dim} parameters) +#' @param oro An object of the class 's2dv_cube' as returned by `CST_Load`, +#' containing fine scale orography (in meters). +#' The destination downscaling area must be contained in the orography field. 
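Stepping back to the CST_QuantileMapping documentation above: the methods listed there are the fitting options of the qmap package, which CST_QuantileMapping calls through `fitQmap`/`doQmap`. A minimal sketch on synthetic series (the distributions and the `qstep` value are illustrative only):

```r
# Empirical quantile mapping with qmap, the backend wrapped by CST_QuantileMapping
library(qmap)
set.seed(1)
obs <- rgamma(1000, shape = 2, scale = 3)  # "observed" series (illustrative)
mod <- rgamma(1000, shape = 3, scale = 2)  # "modelled" series (illustrative)
fit <- fitQmap(obs, mod, method = "QUANT", wet.day = FALSE, qstep = 0.01)
mod_adj <- doQmap(mod, fit)
quantile(mod_adj, c(0.1, 0.5, 0.9))  # adjusted quantiles now track the observations
quantile(obs, c(0.1, 0.5, 0.9))
```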
+#' @param xlim vector with longitude bounds for downscaling; +#' the full input field is downscaled if `xlim` and `ylim` are not specified. +#' @param ylim vector with latitude bounds for downscaling +#' @param lapse float with environmental lapse rate +#' @param lon_dim string with name of longitude dimension +#' @param lat_dim string with name of latitude dimension +#' @param time_dim a vector of character string indicating the name of temporal dimension. By default, it is set to NULL and it considers "ftime", "sdate" and "time" as temporal dimensions. +#' @param verbose logical if to print diagnostic output +#' @param nolapse logical, if true `oro` is interpreted as a fine-scale +#' climatology and used directly for bias correction +#' @param compute_delta logical if true returns only a delta to be used for +#' out-of-sample forecasts. Returns an object of the class 's2dv_cube', +#' containing a delta. Activates `nolapse = TRUE`. +#' @param delta An object of the class 's2dv_cube', containing a delta +#' to be applied to the downscaled input data. Activates `nolapse = TRUE`. +#' The grid of this object must coincide with that of the required output. +#' @param method string indicating the method used for interpolation: +#' "nearest" (nearest neighbours followed by smoothing with a circular +#' uniform weights kernel), "bilinear" (bilinear interpolation) +#' The two methods provide similar results, but nearest is slightly better +#' provided that the fine-scale grid is correctly centered as a subdivision +#' of the large-scale grid +#' @return CST_RFTemp() returns a downscaled CSTools object +#' (i.e., of the class 's2dv_cube'). +#' @export +#' @import multiApply +#' @examples +#' # Generate simple synthetic data and downscale by factor 4 +#' t <- rnorm(7 * 6 * 2 * 3 * 4)*10 + 273.15 + 10 +#' dim(t) <- c(dataset = 1, member = 2, sdate = 3, ftime = 4, lat = 6, lon = 7) +#' lon <- seq(3, 9, 1) +#' lat <- seq(42, 47, 1) +#' exp <- list(data = t, lat = lat, lon = lon) +#' attr(exp, 'class') <- 's2dv_cube' +#' o <- runif(29*29)*3000 +#' dim(o) <- c(lat = 29, lon = 29) +#' lon <- seq(3, 10, 0.25) +#' lat <- seq(41, 48, 0.25) +#' oro <- list(data = o, lat = lat, lon = lon) +#' attr(oro, 'class') <- 's2dv_cube' +#' res <- CST_RFTemp(exp, oro, xlim=c(4,8), ylim=c(43, 46), lapse=6.5) + +CST_RFTemp <- function(data, oro, xlim = NULL, ylim = NULL, lapse = 6.5, + lon_dim = "lon", lat_dim = "lat", time_dim = NULL, + nolapse = FALSE, verbose = FALSE, compute_delta = FALSE, + method = "bilinear", delta = NULL) { + + if (!inherits(data, "s2dv_cube")) { + stop("Parameter 'data' must be of the class 's2dv_cube', ", + "as output by CSTools::CST_Load.") + } + if (!inherits(oro, "s2dv_cube")) { + stop("Parameter 'oro' must be of the class 's2dv_cube', ", + "as output by CSTools::CST_Load.") + } + if (!is.null(delta)) { + if (!inherits(delta, "s2dv_cube")) { + stop("Parameter 'delta' must be of the class 's2dv_cube', ", + "as output by CSTools::CST_Load.") + } + } + + res <- RFTemp(data$data, data$lon, data$lat, + oro$data, oro$lon, oro$lat, xlim, ylim, lapse, + lon_dim = lon_dim, lat_dim = lat_dim, time_dim = time_dim, + nolapse = nolapse, verbose = verbose, method = method, + compute_delta = compute_delta, delta = delta$data) + + data$data <- res$data + data$lon <- res$lon + data$lat <- res$lat + + return(data) +} + +#' @rdname RFTemp +#' @title Temperature downscaling of a CSTools object using lapse rate +#' correction (reduced version) +#' @author Jost von Hardenberg - ISAC-CNR, 
\email{j.vonhardenberg@isac.cnr.it} +#' @description This function implements a simple lapse rate correction of a +#' temperature field (a multidimensional array) as input. +#' The input lon grid must be increasing (but can be modulo 360). +#' The input lat grid can be irregularly spaced (e.g. a Gaussian grid) +#' The output grid can be irregularly spaced in lon and/or lat. +#' @references Method described in ERA4CS MEDSCOPE milestone M3.2: +#' High-quality climate prediction data available to WP4 +#' [https://www.medscope-project.eu/the-project/deliverables-reports/]([https://www.medscope-project.eu/the-project/deliverables-reports/) +#' and in H2020 ECOPOTENTIAL Deliverable No. 8.1: +#' High resolution (1-10 km) climate, land use and ocean change scenarios +#' [https://www.ecopotential-project.eu/images/ecopotential/documents/D8.1.pdf](https://www.ecopotential-project.eu/images/ecopotential/documents/D8.1.pdf) +#' @param data Temperature array to downscale. +#' The input array is expected to have at least two dimensions named +#' "lon" and "lat" by default +#' (these default names can be changed with the \code{lon_dim} and +#' \code{lat_dim} parameters) +#' @param lon Vector or array of longitudes. +#' @param lat Vector or array of latitudes. +#' @param lonoro Vector or array of longitudes corresponding to the fine orography. +#' @param latoro Vector or array of latitudes corresponding to the fine orography. +#' @param oro Array containing fine-scale orography (in m) +#' The destination downscaling area must be contained in the orography field. +#' @param xlim vector with longitude bounds for downscaling; +#' the full input field is downscaled if `xlim` and `ylim` are not specified. +#' @param ylim vector with latitude bounds for downscaling +#' @param lapse float with environmental lapse rate +#' @param lon_dim string with name of longitude dimension +#' @param lat_dim string with name of latitude dimension +#' @param time_dim a vector of character string indicating the name of temporal dimension. By default, it is set to NULL and it considers "ftime", "sdate" and "time" as temporal dimensions. +#' @param verbose logical if to print diagnostic output +#' @param nolapse logical, if true `oro` is interpreted as a +#' fine-scale climatology and used directly for bias correction +#' @param compute_delta logical if true returns only a delta to be used for +#' out-of-sample forecasts. +#' @param delta matrix containing a delta to be applied to the downscaled +#' input data. The grid of this matrix is supposed to be same as that of +#' the required output field +#' @param method string indicating the method used for interpolation: +#' "nearest" (nearest neighbours followed by smoothing with a circular +#' uniform weights kernel), "bilinear" (bilinear interpolation) +#' The two methods provide similar results, but nearest is slightly better +#' provided that the fine-scale grid is correctly centered as a subdivision +#' of the large-scale grid +#' @return CST_RFTemp() returns a downscaled CSTools object +#' @return RFTemp() returns a list containing the fine-scale +#' longitudes, latitudes and the downscaled fields. 
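Before the examples, a back-of-the-envelope sketch of the lapse-rate correction described above: the downscaled value is the interpolated temperature minus the fine-scale height deviation from the smoothed orography times the lapse rate. All numbers are illustrative.

```r
# Lapse-rate correction at a single fine-grid point (illustrative numbers)
lapse      <- 6.5    # environmental lapse rate, K per km
oro_fine   <- 1250   # fine-scale elevation at the point (m)
oro_smooth <- 800    # locally smoothed (large-scale) elevation (m)
t_interp   <- 281.0  # temperature interpolated from the coarse grid (K)
t_down <- t_interp - (oro_fine - oro_smooth) * lapse / 1000
t_down               # about 278.1 K: cooler, as the point lies above the mean orography
```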
+#' @export +#' @import multiApply +#' @examples +#' # Generate simple synthetic data and downscale by factor 4 +#' t <- rnorm(7 * 6 * 4 * 3) * 10 + 273.15 + 10 +#' dim(t) <- c(sdate = 3, ftime = 4, lat = 6, lon = 7) +#' lon <- seq(3, 9, 1) +#' lat <- seq(42, 47, 1) +#' o <- runif(29 * 29) * 3000 +#' dim(o) <- c(lat = 29, lon = 29) +#' lono <- seq(3, 10, 0.25) +#' lato <- seq(41, 48, 0.25) +#' res <- RFTemp(t, lon, lat, o, lono, lato, xlim = c(4, 8), ylim = c(43, 46), +#' lapse = 6.5) + +RFTemp <- function(data, lon, lat, oro, lonoro, latoro, + xlim = NULL, ylim = NULL, lapse = 6.5, + lon_dim = "lon", lat_dim = "lat", time_dim = NULL, + nolapse = FALSE, verbose = FALSE, compute_delta = FALSE, + method = "bilinear", delta = NULL) { + + # Check/detect time_dim + if (is.null(time_dim)) { + time_dim_names <- c("ftime", "sdate", "time") + time_dim_num <- which(time_dim_names %in% names(dim(data))) + if (length(time_dim_num) > 0) { + # Find time dimension with length > 0 + ilong <- which(dim(data)[time_dim_names[time_dim_num]] > 0) + if (length(ilong) > 0) { + time_dim <- time_dim_names[time_dim_num[ilong]] + } else { + stop("No time dimension longer than zero found.") + } + } else { + stop("Could not automatically detect a target time dimension ", + "in the provided data in 'data'.") + } + warning(paste("Selected time dim:", time_dim, "\n")) + } + + # Repeatedly apply .downscalet + if (is.null(delta)) { + result <- Apply(data, target_dims = c(lon_dim, lat_dim, time_dim), + fun = .downscalet, lon, lat, oro, lonoro, latoro, + xlim = xlim, ylim = ylim, lapse = lapse, + nolapse = nolapse, verbose = verbose, method = method, + compute_delta = compute_delta) + } else { + result <- Apply(list(data, delta), + target_dims = list(c(lon_dim, lat_dim, time_dim), + c(lon_dim, lat_dim)), + fun = .downscalet_delta, lon, lat, oro, lonoro, latoro, + xlim = xlim, ylim = ylim, lapse = lapse, + nolapse = nolapse, verbose = verbose, method = method, + compute_delta = compute_delta) + } + + names(dim(result$data))[1] <- names(dim(data))[names(dim(data)) == lon_dim] + names(dim(result$data))[2] <- names(dim(data))[names(dim(data)) == lat_dim] + result$lon <- array(result$lon[1:dim(result$lon)[1]]) + result$lat <- array(result$lat[1:dim(result$lat)[1]]) + names(dim(result$lon)) <- lon_dim + names(dim(result$lat)) <- lat_dim + return(result) +} + +#' Lapse-rate temperature correction downscaling +#' +#' @description Downscales a temperature field using a lapse-rate +#' correction based on a reference orography. Time-averaging is done on all +#' dimensions after the first two. +#' @author Jost von Hardenberg, \email{j.vonhardenberg@isac.cnr.it} +#' @param lon vector of input longitudes +#' @param lat vector of input latitudes +#' @param t matrix of input temperature data +#' @param lono vector of orography longitudes +#' @param lato vector of orography latitudes +#' @param oro matrix of topographical elevations (in meters) +#' The destination downscaling area must be contained in the orography field. +#' @param xlim vector of longitude bounds; the full input field is downscaled if `xlim` and `ylim` are not specified. 
+#' @param ylim vector of latitude bounds +#' @param radius smoothing radius expressed in longitude units +#' (default is half a large-scale pixel) +#' @param lapse environmental lapse rate (in K/Km) +#' @param nolapse logical, if true `oro` is interpreted as a fine-scale +#' climatology and used directly for bias correction +#' @param compute_delta logical if true returns only a delta to be used for +#' out-of-sample forecasts. +#' @param delta matrix containing a delta to be applied to the input data. +#' The grid of this matrix is supposed to be same as +#' that of the required output field +#' @param verbose logical if to print diagnostic output +#' @return A downscaled temperature matrix +#' @examples +#' lon = 5:20 +#' lat = 35:40 +#' t = runif(16 * 6); dim(t) = c(16, 6) +#' lono = seq(5, 20, 0.1) +#' lato = seq(35, 40, 0.1) +#' o = runif(151 * 51) * 2000; dim(o) = c(151, 51) +#' td = .downscalet(t, lon, lat, o, lono, lato, c(8, 12), c(36, 38)) +#' @noRd +.downscalet <- function(t, lon, lat, oro, lono, lato, + xlim = NULL, ylim = NULL, + radius = 0, lapse = 6.5, nolapse = FALSE, + verbose = FALSE, compute_delta = FALSE, + method = "bilinear", delta = NULL) { + + if (!is.null(delta) & compute_delta) { + stop("Cannot `compute_delta` and provide `delta` at the same time.") + } + + tdim <- dim(t) + ldim <- FALSE + if (length(tdim) == 3) { + nt <- tdim[3] + } else if (length(tdim) == 2) { + nt <- 1 + dim(t) <- c(tdim, 1) + } else if (length(tdim) < 2) { + stop("Input array must have at least two dimensions") + } else { + ldim <- TRUE + dim(t) <- c(tdim[1:2], time = prod(tdim[-(1:2)])) + nt <- dim(t)[3] + } + if (lon[2] <= lon[1]) { + stop("Longitudes must be monotone increasing.") + } + # Regularize lon coords to monotone increasing + lon[lon >= lon[1]] <- lon[lon >= lon[1]] - 360 + if (lon[length(lon)] < 0) { lon <- lon + 360 } + lono[lono >= lono[1]] <- lono[lono >= lono[1]] - 360 + if (lono[length(lono)] < 0) { lono <- lono + 360 } + + dxl <- (lon[2] - lon[1]) / 2 + dyl <- (lat[2] - lat[1]) / 2 + if (radius == 0) { + radius <- dxl + } + if (is.null(xlim)) { + xlim <- c(lon[1] + dxl, lon[length(lon)] - dxl) + } + if (is.null(ylim)) { + ylim <- c(lat[1] + dyl, lat[length(lat)] - dyl) + if (ylim[1] > ylim[2]) { + ylim <- ylim[2:1] + } + } + #Add buffer + lon1 <- xlim[1] - radius + lon2 <- xlim[2] + radius + lat1 <- ylim[1] - radius + lat2 <- ylim[2] + radius + + orocut <- oro[(lono <= lon2) & (lono >= lon1), + (lato <= lat2) & (lato >= lat1)] + lonocut <- lono[(lono <= lon2) & (lono >= lon1)] + latocut <- lato[(lato <= lat2) & (lato >= lat1)] + + dxol <- lonocut[2] - lonocut[1] + nrad <- as.integer(radius / abs(dxol) + 0.5) + + if (length(lonocut) == 0 | length(latocut) == 0) { + stop("Orography not available for selected area") + } + + if (is.null(delta) & compute_delta & nolapse) { + if (verbose) { + print("Time averaging") + } + # If we just need the delta we can work with time averages + t <- apply(t, c(1, 2), mean) + dim(t) <- c(dim(t), time = 1) + } + + if (!(is.null(delta) & compute_delta & !nolapse)) { + # This calculation is not needed if we just need the delta + # and lapse rate is used + if (verbose) { + print(paste("Interpolation using", method, "method")) + } + tout <- .interp2d(t, lon, lat, lonocut, latocut, method = method) + # Fine-scale smooth interpolated input field + if (method == "nearest") { + if (verbose) { + print(paste("Smoothing interpolated field")) + } + tout <- .smooth(tout, nrad) + } + } + if (is.null(delta)) { + # Compute delta + if (nolapse) { + # oro is a 
reference fine-scale field: use that directly + # for bias correcting the interpolated input field's temporal mean + t1 <- orocut - apply(tout, c(1, 2), mean) + } else { + # Lapse-rate correction + orocuts <- .smooth(orocut, nrad) + t1 <- -(orocut - orocuts) * lapse / 1000. + } + if (compute_delta) { + # Return delta + tout <- t1[(lonocut <= xlim[2]) & (lonocut >= xlim[1]), + (latocut <= ylim[2]) & (latocut >= ylim[1])] + } else { + # Apply delta + if (verbose) { + print("Adding computed delta") + } + for (i in seq_len(nt)) { + tout[, , i] <- tout[, , i] + t1 + } + tout <- tout[(lonocut <= xlim[2]) & (lonocut >= xlim[1]), + (latocut <= ylim[2]) & (latocut >= ylim[1]), ] + if (ldim) { + dim(tout) <- c(dim(tout)[1:2], tdim[-(1:2)]) + } + } + } else { + # Just apply the provided delta + tout <- tout[(lonocut <= xlim[2]) & (lonocut >= xlim[1]), + (latocut <= ylim[2]) & (latocut >= ylim[1]), ] + if (any(dim(delta)[1:2] != dim(tout)[1:2])) { + stop("Provided delta has not the same dimensions as required output") + } + if (verbose) { + print("Adding provided delta") + } + for (i in seq_len(nt)) { + tout[, , i] <- tout[, , i] + delta + } + if (ldim) { + dim(tout) <- c(dim(tout)[1:2], tdim[-(1:2)]) + } + } + lonocut <- lonocut[(lonocut <= xlim[2]) & (lonocut >= xlim[1])] + latocut <- latocut[(latocut <= ylim[2]) & (latocut >= ylim[1])] + + return(list(data = tout, lon = lonocut, lat = latocut)) +} + +# Special driver version of .downscalet to apply delta +.downscalet_delta <- function(t, delta, lon, lat, oro, lono, lato, + xlim = NULL, ylim = NULL, radius = 0, + lapse = 6.5, nolapse = FALSE, verbose = FALSE, + compute_delta = FALSE, method = "bilinear") { + res <- .downscalet(t, lon, lat, oro, lono, lato, xlim = xlim, + ylim = ylim, radius = radius, lapse = lapse, + nolapse = nolapse, verbose = verbose, + compute_delta = compute_delta, delta = delta, + method = method) +} + +#' Nearest neighbour interpolation +#' +#' @description The input field is interpolated onto the output +#' coordinate grid using nearest neighbours or bilinear interpolation. +#' The input lon grid must be monotone increasing. +#' The input lat grid can be irregularly spaced (e.g. a Gaussian grid) +#' The output grid can be irregularly spaced in lon and/or lat. +#' @author Jost von Hardenberg, \email{j.vonhardenberg@isac.cnr.it} +#' @param z matrix with the input field to interpolate (assumed to +#' include also a third time dimension) +#' @param lon vector of input longitudes +#' @param lat vector of input latitudes +#' @param lonp vector of output longitudes +#' @param latp vector of output latitudes +#' @param method string indicating the interpolation method +#' ("nearest" or "bilinear" (default)) +#' @return The interpolated field. 
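A tiny numeric sketch of the bilinear weighting used by the interpolation for each output point; the fractional offsets inside the coarse cell and the corner values are illustrative.

```r
# Bilinear weights for an output point at fractional position (x, y) in a cell
x <- 0.3; y <- 0.7
w00 <- (1 - x) * (1 - y); w10 <- x * (1 - y)
w01 <- (1 - x) * y;       w11 <- x * y
corners <- c(z00 = 10, z10 = 12, z01 = 11, z11 = 13)  # illustrative corner values
sum(corners * c(w00, w10, w01, w11))                  # interpolated value
w00 + w10 + w01 + w11                                 # the weights always sum to 1
```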
+#' @examples +#' lon = 5:11 +#' lat = 35:40 +#' z = runif(7 * 6 * 2); dim(z) = c(7, 6, 2) +#' lonp = seq(5, 10, 0.2) +#' latp = seq(35, 40, 0.2) +#' zo <- .interp2d(z, lon, lat, lonp, latp, method = "nearest") +#' @noRd +.interp2d <- function(z, lon, lat, lonp, latp, method="bilinear") { + + nx <- length(lonp) + ny <- length(latp) + nt <- dim(z)[3] + # Interpolate nn assuming a regular grid + zo <- array(0., c(nx, ny, nt)) + dy <- lat[2] - lat[1] + dx <- lon[2] - lon[1] + + if (method == "nearest") { + jj <- 1:length(latp) + for (j in 1:length(latp)) { + jj[j] <- which.min(abs(latp[j]-lat)) + } + ii <- ((lonp - (lon[1] - dx / 2)) %/% dx + 1) + # If lon are global and periodic attempt to fix issues + if ((lon[1] - lon[length(lon)]) %% 360 == dx) { + ii[ii <= 0] <- ii[ii <= 0] + length(lon) + ii[ii > length(lon)] <- ii[ii > length(lon)] - length(lon) + } + if ((ii[1] <= 0) | (ii[length(ii)] > length(lon)) | + (jj[1] <= 0) | (jj[length(jj)] > length(lat))) { + stop("Downscaling area not contained in input data") + } + zo[, , ] <- z[ii, jj, ] + } else { + jj <- 1:length(latp) + jj2 <- jj + for (j in 1:length(latp)) { + jmin <- which.min(abs(latp[j]-lat)) + if ( (((latp[j]-lat[jmin]) >= 0) & ( dy >= 0)) | + (((latp[j]-lat[jmin]) < 0) & ( dy <= 0))) { + jj[j] <- jmin + jj2[j] <- jmin + 1 + } else { + jj2[j] <- jmin + jj[j] <- jmin - 1 + } + } + ii <- ((lonp - lon[1]) %/% dx + 1) + ii2 <- ii + 1 + # If lon are global and periodic attempt to fix issues + if ((lon[1] - lon[length(lon)]) %% 360 == dx) { + ii[ii <= 0] <- ii[ii <= 0] + length(lon) + ii[ii > length(lon)] <- ii[ii > length(lon)] - length(lon) + ii2[ii2 <= 0] <- ii2[ii2 <= 0] + length(lon) + ii2[ii2 > length(lon)] <- ii2[ii2 > length(lon)] - length(lon) + } + if ((ii[1] <= 0) | (ii[length(ii)] > length(lon)) | + (jj[1] <= 0) | (jj[length(jj)] > length(lat)) | + (ii2[1] <= 0) | (ii2[length(ii2)] > length(lon)) | + (jj2[1] <= 0) | (jj2[length(jj2)] > length(lat))) { + stop("Downscaling area not contained in input data") + } + xx <- ((lonp - lon[ii]) / dx) %% 360 + yy <- (latp - lat[jj]) / (lat[jj2] - lat[jj]) + xx <- xx[row(zo[, , 1])] + yy <- yy[col(zo[, , 1])] + dim(xx) <- c(nx, ny) + dim(yy) <- c(nx, ny) + w00 <- (1 - xx) * (1 - yy) + w10 <- xx * (1 - yy) + w01 <- (1 - xx) * yy + w11 <- xx * yy + for (k in seq_len(nt)) { + zo[, , k] <- z[ii, jj, k] * w00 + z[ii2, jj, k] * w10 + + z[ii, jj2, k] * w01 + z[ii2, jj2, k] * w11 + } + } + names(dim(zo)) <- names(dim(z)) + return(zo) +} + +#' Smoothening using convolution with a circular kernel +#' +#' @description The input field is convolved with a circular kernel with equal +#' weights. Takes into account missing values. +#' @author Jost von Hardenberg, \email{j.vonhardenberg@isac.cnr.it} +#' @param z matrix with the input field to smoothen, with dimensions `c(ns, ns)` +#' @param sdim the smoothing kernel radius in pixel +#' @return The smoothened field. +#' @examples +#' z <- rnorm(64 * 64) +#' dim(z) <- c(64, 64) +#' zs <- smooth(z, 8) +#' sd(zs) +#' # [1] 0.1334648 +#' @noRd +.smooth <- function(z, sdim) { + nsx <- dim(z)[1] + nsy <- dim(z)[2] + zdim <- dim(z) + if (length(dim(z)) == 2) { + dim(z) <- c(dim(z), time = 1) + nt <- 1 + } else { + nt <- dim(z)[3] + } + imask <- !is.finite(z) + z[imask] <- 0. 
+ mask <- matrix(0., nsx, nsy) + for (i in 1:nsx) { + for (j in 1:nsy) { + kx <- i - 1 + ky <- j - 1 + if (i > nsx / 2 + 1) { + kx <- i - nsx - 1 + } + if (j > nsy / 2 + 1) { + ky <- j - nsy - 1 + } + r2 <- kx * kx + ky * ky + mask[i, j] <- (r2 <= (sdim * sdim)) + } + } + fm <- fft(mask) + zf <- array(0., c(nsx, nsy, nt)) + for (k in seq_len(nt)) { + zf[, , k] <- Re(fft(fm * fft(z[, , k]), inverse = TRUE) + ) / sum(mask) / length(fm) + if (sum(imask[, , k]) > 0) { + zz <- z[, , k] + zz[!imask[, , k]] <- 1.0 + zz <- zf[, , k] / (Re(fft(fm * fft(zz), inverse = TRUE)) / + sum(mask) / length(fm)) + zz[imask[, , k]] <- NA + zf[, , k] <- zz + } + } + dim(zf) <- zdim + return(zf) +} diff --git a/R/CST_RainFARM.R b/R/CST_RainFARM.R index ff4d624dc7fa6f8f12f4183332bd4b79ba89f601..49065a974711aa9732778ee46e8ca9717ee667b9 100644 --- a/R/CST_RainFARM.R +++ b/R/CST_RainFARM.R @@ -237,6 +237,12 @@ RainFARM <- function(data, lon, lat, nf, weights = 1., nens = 1, } warning(paste("Selected time dim:", time_dim)) } + # Check if slope is an array + if (length(slope) > 1) { + warning("Parameter 'slope' has length > 1 and only the first ", + "element will be used.") + slope <- as.numeric(slope[1]) + } # Perform common calls r <- lon_lat_fine(lon, lat, nf) diff --git a/R/CST_RegimesAssign.R b/R/CST_RegimesAssign.R new file mode 100644 index 0000000000000000000000000000000000000000..60f0d7045a622bb70526b7f7248b12130cd87637 --- /dev/null +++ b/R/CST_RegimesAssign.R @@ -0,0 +1,362 @@ +#' @rdname CST_RegimesAssign +#' @title Function for matching a field of anomalies with +#' a set of maps used as a reference (e.g. clusters obtained from the WeatherRegime function) +#' +#' @author Verónica Torralba - BSC, \email{veronica.torralba@bsc.es} +#' +#' @description This function performs the matching between a field of anomalies and a set +#' of maps which will be used as a reference. The anomalies will be assigned to the reference map +#' for which the minimum Eucledian distance (method=’distance’) or highest spatial correlation +#' (method=‘ACC’) is obtained. +#' +#'@references Torralba, V. (2019) Seasonal climate prediction for the wind energy sector: methods and tools +#' for the development of a climate service. Thesis. Available online: \url{https://eprints.ucm.es/56841/} +#' +#'@param data a 's2dv_cube' object. + +#'@param ref_maps a 's2dv_cube' object as the output of CST_WeatherRegimes. +#'@param method whether the matching will be performed in terms of minimum distance (default = ’distance’) or +#' the maximum spatial correlation (method = ’ACC’) between the maps. +#'@param composite a logical parameter indicating if the composite maps are computed or not (default = FALSE). +#'@param memb a logical value indicating whether to compute composites for separate members (default FALSE) or as unique ensemble (TRUE). +#'This option is only available for when parameter 'composite' is set to TRUE and the data object has a dimension named 'member'. +#'@param ncores the number of multicore threads to use for parallel computation. 
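To make the 'distance' matching criterion concrete, here is a self-contained sketch of the latitude-weighted RMS comparison that assigns one anomaly map to the closest reference regime; the fields are synthetic and regime 2 is recovered by construction.

```r
# Assign a target map to the reference regime with the smallest
# latitude-weighted RMS difference (the 'distance' method)
set.seed(1)
lat <- c(35, 45, 55, 65)
ref <- array(rnorm(4 * 5 * 3), dim = c(lat = 4, lon = 5, cluster = 3))
target <- ref[, , 2] + rnorm(20, sd = 0.1)    # slightly perturbed copy of regime 2
w <- sqrt(cos(lat * pi / 180))                # latitude weights, as in RegimesAssign
rms <- apply(ref, 3, function(r) sqrt(mean(((r - target) * w)^2)))
which.min(rms)                                # expected assignment: cluster 2
```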
+#'@return A list with two elements \code{$data} (a 's2dv_cube' object containing the composites cluster=1,..,K for case (*1) +# or only k=1 for any specific cluster, i.e., case (*2)) (only when composite = 'TRUE') and \code{$statistics} that includes +#' \code{$pvalue} (array with the same structure as \code{$data} containing the pvalue of the composites obtained through a t-test +#' that accounts for the serial dependence of the data with the same structure as Composite.)(only when composite = 'TRUE'), +#' \code{$cluster} (array with the same dimensions as data (except latitude and longitude which are removed) indicating the ref_maps to which each point is allocated.) , +#' \code{$frequency} (A vector of integers (from k=1,...k n reference maps) indicating the percentage of assignations corresponding to each map.), +#'@import s2dverification +#'@import multiApply +#'@examples +#'\dontrun{ +#'regimes <- CST_WeatherRegimes(data = lonlat_data$obs, EOFs = FALSE, ncenters = 4) +#'res1 <- CST_RegimesAssign(data = lonlat_data$exp, ref_maps = regimes, composite = FALSE) +#'res2 <- CST_RegimesAssign(data = lonlat_data$exp, ref_maps = regimes, composite = TRUE) +#'} +#'@export +#' + +CST_RegimesAssign <- function(data, ref_maps, + method = "distance", + composite = FALSE, + memb = FALSE, ncores = NULL) { + if (!inherits(data, 's2dv_cube')) { + stop("Parameter 'data' must be of the class 's2dv_cube', ", + "as output by CSTools::CST_Load.") + } + + if (!inherits(ref_maps, 's2dv_cube')) { + stop("Parameter 'ref_maps' must be of the class 's2dv_cube', ", + "as output by CSTools::CST_Load.") + } + + if ('lat' %in% names(data)){ + lat <- data$lat + } else { + lat <- NULL + } + result <- Apply(data = list(data = data$data, ref_maps = ref_maps$data), + lat = lat, fun = RegimesAssign, + target_dims = list(names(dim(data$data)), c('lat', 'lon', 'cluster')), + method = method, memb = memb, composite = composite, ncores = ncores) + + if (composite){ + data$data <- result$composite + data$statistics <- result[-1] + } else { + data <- NULL + data$statistics <- result + } + + return(data) +} + +#' @rdname RegimesAssign +#' @title Function for matching a field of anomalies with +#' a set of maps used as a reference (e.g. clusters obtained from the WeatherRegime function). +#' +#' @author Verónica Torralba - BSC, \email{veronica.torralba@bsc.es} +#' +#' @description This function performs the matching between a field of anomalies and a set +#' of maps which will be used as a reference. The anomalies will be assigned to the reference map +#' for which the minimum Eucledian distance (method=’distance’) or highest spatial correlation +#' (method=‘ACC’) is obtained. +#' +#'@references Torralba, V. (2019) Seasonal climate prediction for the wind energy sector: methods and tools for the development of a climate service. Thesis. Available online: \url{https://eprints.ucm.es/56841/} +#' +#'@param data an array containing anomalies with named dimensions: dataset, member, sdate, ftime, lat and lon. +#'@param ref_maps array with 3-dimensions ('lon', 'lat', 'cluster') containing the maps/clusters that will be used as a reference for the matching. +#'@param method whether the matching will be performed in terms of minimum distance (default = ’distance’) or +#' the maximum spatial correlation (method=’ACC’) between the maps. +#'@param lat a vector of latitudes corresponding to the positions provided in data and ref_maps. 
+#'@param composite a logical parameter indicating if the composite maps are computed or not (default=FALSE). +#'@param memb a logical value indicating whether to compute composites for separate members (default FALSE) or as unique ensemble (TRUE). +#'This option is only available for when parameter 'composite' is set to TRUE and the data object has a dimension named 'member'. +#'@param ncores the number of multicore threads to use for parallel computation. +#'@return A list with elements \code{$composite} (3-d array (lon, lat, k) containing the composites k=1,..,K for case (*1) +# or only k=1 for any specific cluster, i.e., case (*2)) (only if composite='TRUE'), +#' \code{$pvalue} ( array with the same structure as \code{$composite} containing the pvalue of the composites obtained through a t-test +#' that accounts for the serial dependence of the data with the same structure as Composite.) (only if composite='TRUE'), +#' \code{$cluster} (array with the same dimensions as data (except latitude and longitude which are removed) indicating the ref_maps to which each point is allocated.) , +#' \code{$frequency} (A vector of integers (from k = 1, ... k n reference maps) indicating the percentage of assignations corresponding to each map.), +#' +#'@import s2dverification +#'@import multiApply +#'@examples +#'\dontrun{ +#'regimes <- WeatherRegime(data = lonlat_data$obs$data, lat = lonlat_data$obs$lat, +#' EOFs = FALSE, ncenters = 4)$composite +#'res1 <- RegimesAssign(data = lonlat_data$exp$data, ref_maps = drop(regimes), +#' lat = lonlat_data$exp$lat, composite = FALSE) +#'} +#'@export + +RegimesAssign <- function(data, ref_maps, lat, method = "distance", composite = FALSE, + memb = FALSE, ncores = NULL) { + + if (is.null(names(dim(data)))) { + stop("Parameter 'data' must be an array with named dimensions.") + } + if (is.null(ref_maps)) { + stop("Parameter 'ref_maps' must be specified.") + } + + if (is.null(lat)) { + stop("Parameter 'lat' must be specified.") + } + if (is.null(names(dim(ref_maps)))) { + stop("Parameter 'ref_maps' must be an array with named dimensions.") + } + if (!is.logical(memb)) { + stop("Parameter 'memb' must be logical.") + } + if (!is.logical(composite)) { + stop("Parameter 'memb' must be logical.") + } + dimData <- names(dim(data)) + + if (!all( c('lat', 'lon') %in% dimData)) { + stop("Parameter 'data' must contain the named dimensions 'lat' and 'lon'.") + } + + dimRef <- names(dim(ref_maps)) + + if (!all( c('cluster', 'lat', 'lon') %in% dimRef)) { + stop("Parameter 'ref_maps' must contain the named dimensions + 'cluster','lat' and 'lon'.") + } + + + if (length(lat) != dim(data)['lat'] | (length(lat) != dim(ref_maps)['lat']) ) { + stop(" Parameter 'lat' does not match with the dimension 'lat' in the + parameter 'data' or in the parameter 'ref_maps'.") + } + + + if ('sdate' %in% dimData && 'ftime' %in% dimData) { + nsdates <- dim(data)['sdate'] + nftimes <- dim(data)['ftime'] + data <- MergeDims(data, + merge_dims = c('ftime','sdate'), + rename_dim = 'time') + } else if ('sdate' %in% dimData | 'ftime' %in% dimData) { + names(dim(data))[which(dimData == 'sdate' | dimData == 'ftime') ] = 'time' + } else { + if (!('time' %in% dimData)) { + stop("Parameter 'data' must have temporal dimensions.") + } + } + ref_maps <- drop(ref_maps) + index <- Apply(data = list(ref = ref_maps, target = data), + target_dims = list(c('lat', 'lon', 'cluster'), c('lat', 'lon')), + fun = .RegimesAssign, + lat = lat, method = method, + ncores = ncores)[[1]] + + nclust <- dim(ref_maps)['cluster'] + 
freqs <- rep(NA, nclust) + for (n in 1:nclust) { + freqs[n] <- (length(which(index == n)) / length(index)) * 100 + } + + if (composite){ + poslon <- which(names(dim(data)) == 'lon') + poslat <- which(names(dim(data)) == 'lat') + postime <- which(names(dim(data)) == 'time') + posdim <- setdiff(1:length(dim(data)), c(postime, poslat, poslon)) + dataComp <- aperm(data, c(poslon, poslat, postime, posdim)) + + if (any(is.na(index))) { + recon <-list( + composite = InsertDim(array(NA, dim = c(dim(dataComp)[-postime])), + postime, dim(ref_maps)['composite.cluster']), + pvalue = InsertDim(array(NA, dim = c(dim(dataComp)[-postime])), + postime, dim(ref_maps)['composite.cluster'])) + } else { + if (memb) { + dataComp <- MergeDims(dataComp, merge_dims = c('time', 'member'), rename_dim = 'time') + index <- MergeDims(index, merge_dims = c('time', 'member'), rename_dim = 'time') + } + recon <- + Apply(data = list(var = dataComp, occ = index), + target_dims = list(c('lon', 'lat', 'time'), c('time')), + fun = Composite, + K = dim(ref_maps)['cluster']) + } + + output <- list(composite = recon$composite, + pvalue = recon$pvalue, + cluster = index, + frequency = freqs) + } else{ + + output <- list(cluster = index, + frequency = freqs) + } + + return(output) +} + +.RegimesAssign <- function(ref, target, method = 'distance', lat, composite=FALSE) { + posdim <- which(names(dim(ref)) == 'cluster') + poslat <- which(names(dim(ref)) == 'lat') + poslon <- which(names(dim(ref)) == 'lon') + + nclust <- dim(ref)[posdim] + if (all(dim(ref)[-posdim] != dim(target))) { + stop('The target should have the same dimensions [lat,lon] that + the reference ') + } + + if (is.null(names(dim(ref))) | is.null(names(dim(target)))) { + stop( + 'The arrays should include dimensions names ref[cluster,lat,lon] + and target [lat,lon]' + ) + } + + + if (length(lat) != dim(ref)[poslat]) { + stop('latitudes do not match with the maps') + } + + if (is.na(max(target))){ + assign <- NA + + } else{ + + + # This dimensions are reorganized + ref <- aperm(ref, c(posdim, poslat, poslon)) + target <- aperm(target, + c(which(names(dim(target)) == 'lat'), + which(names(dim(target)) == 'lon'))) + + # weights are defined + latWeights <- InsertDim(sqrt(cos(lat * pi / 180)), 2, dim(ref)[3]) + + + rmsdiff <- function(x, y) { + dims <- dim(x) + ndims <- length(dims) + if (ndims != 2 | ndims != length(dim(y))) { + stop('x and y should be maps') + } + map_diff <- NA * x + for (i in 1:dims[1]) { + for (j in 1:dims[2]) { + map_diff[i, j] <- (x[i, j] - y[i, j]) ^ 2 + } + } + rmsdiff <- sqrt(mean(map_diff)) + return(rmsdiff) + } + + if (method == 'ACC') { + corr <- rep(NA, nclust) + for (i in 1:nclust) { + corr[i] <- + ACC(InsertDim(InsertDim( + InsertDim(ref[i, , ] * latWeights, 1, 1), 2, 1 + ), 3, 1), + InsertDim(InsertDim( + InsertDim(target * latWeights, 1, 1), 2, 1 + ), 3, 1))$ACC[2] + } + assign <- which(corr == max(corr)) + } + + if (method == 'distance') { + rms <- rep(NA, nclust) + for (i in 1:nclust) { + rms[i] <- rmsdiff(ref[i, , ] * latWeights, target * latWeights) + } + assign <- which(rms == min(rms)) + } + } + + return(assign) +} + + +Composite <- function(var, occ, lag = 0, eno = FALSE, K = NULL, fileout = NULL) { + + if ( dim(var)[3] != length(occ) ) { + stop("Temporal dimension of var is not equal to length of occ.") + } + if (is.null(K)) { + K <- max(occ) + } + composite <- array(dim = c(dim(var)[1:2], composite = K)) + tvalue <- array(dim = dim(var)[1:2]) + dof <- array(dim = dim(var)[1:2]) + pvalue <- array(dim = c(dim(var)[1:2], composite = 
K)) + + if (eno == TRUE) { + n_tot <- Eno(var, posdim = 3) + } else { + n_tot <- length(occ) + } + mean_tot <- Mean1Dim(var, posdim = 3, narm = TRUE) + stdv_tot <- apply(var, c(1, 2), sd, na.rm = TRUE) + + for (k in 1 : K) { + if (length(which(occ == k)) >= 1) { + indices <- which(occ == k) + lag + toberemoved <- which(0 > indices | indices > dim(var)[3]) + + if (length(toberemoved) > 0) { + indices <- indices[-toberemoved] + } + if (eno == TRUE) { + n_k <- Eno(var[, , indices], posdim = 3) + } else { + n_k <- length(indices) + } + if (length(indices) == 1) { + composite[, , k] <- var[, , indices] + warning(paste("Composite", k, "has length 1 and pvalue is NA.")) + } else { + composite[, , k] <- Mean1Dim(var[, , indices], posdim = 3, narm = TRUE) + } + stdv_k <- apply(var[, , indices], c(1, 2), sd, na.rm = TRUE) + + tvalue <- (mean_tot - composite[, , k]) / + sqrt(stdv_tot ^ 2 / n_tot + stdv_k ^ 2 / n_k) + dof <- (stdv_tot ^ 2 / n_tot + stdv_k ^ 2 / n_k) ^ 2 / + ((stdv_tot ^ 2 / n_tot) ^ 2 / (n_tot - 1) + + (stdv_k ^ 2 / n_k) ^ 2 / (n_k - 1)) + pvalue[, , k] <- 2 * pt(-abs(tvalue), df = dof) + } + } + if (is.null(fileout) == FALSE) { + output <- list(composite = composite, pvalue = pvalue) + save(output, file = paste(fileout, '.sav', sep = '')) + } + + invisible(list(composite = composite, pvalue = pvalue)) +} + + diff --git a/R/CST_SaveExp.R b/R/CST_SaveExp.R index 1b3b4b1011179ec8a529c3914b40d64287f651a5..b98f3668b3316260c9e28b9a4658b795566b68c5 100644 --- a/R/CST_SaveExp.R +++ b/R/CST_SaveExp.R @@ -16,9 +16,9 @@ #' #'@seealso \code{\link{CST_Load}}, \code{\link{as.s2dv_cube}} and \code{\link{s2dv_cube}} #' -#'@import s2dverification #'@import ncdf4 -#'@import abind +#'@importFrom s2dv Reorder +#'@import multiApply #' #'@examples #'\dontrun{ @@ -39,33 +39,96 @@ CST_SaveExp <- function(data, destination = "./CST_Data") { stop("Parameter 'data' must be of the class 's2dv_cube', ", "as output by CSTools::CST_Load.") } -dimname <- names(dim(data$data)) - if (any(dimname == "time")) { - dimname[which(dimname == "time")] <- "ftime" - names(dim(data$data))[which(dimname == "time")] <- "ftime" + sdates <- lapply(1:length(data$Datasets), function(x) { + data$Datasets[[x]]$InitializationDates[[1]]})[[1]] + if (!is.character(attributes(data$Variable)$units)) { + units <- attributes(data$Variable)$variable$units + } else { + units <- attributes(data$Variable)$units + } + cdo_grid_name = attr(data$lon, 'cdo_grid_name') + projection = attr(data$lon, 'projection') + var_name <- data$Variable$varName + time_values <- data$Dates$start + dim(time_values) <- c(time = length(time_values) / length(sdates), + sdate = length(sdates)) + SaveExp(data = data$data, lon = data$lon, lat = data$lat, + Dataset = names(data$Datasets), var_name = var_name, + units = units, cdo_grid_name = cdo_grid_name, projection = projection, + startdates = sdates, Dates = time_values, destination) +} +#'Save an experiment in a format compatible with CST_Load +#'@description This function is created for compatibility with CST_Load/Load for saving post-processed datasets such as those calibrated of downscaled with CSTools functions +#' +#'@author Perez-Zanon Nuria, \email{nuria.perez@bsc.es} +#' +#'@param data an multi-dimensional array with named dimensions (longitude, latitude, time, member, sdate) +#'@param lon vector of logitud corresponding to the longitudinal dimension in data +#'@param lat vector of latitud corresponding to the latitudinal dimension in data +#'@param Dataset a vector of character string indicating the names of 
the datasets +#'@param var_name a character string indicating the name of the variable to be saved +#'@param units a character string indicating the units of the variable +#'@param startdates a vector of dates indicating the initialization date of each simulations +#'@param Dates a matrix of dates with two dimension 'time' and 'sdate'. +#'@param cdo_grid_name a character string indicating the name of the grid e.g.: 'r360x181' +#'@param projection a character string indicating the projection name +#'@param destination a character string indicating the path where to store the NetCDF files +#' +#'@return the function creates as many files as sdates per dataset. Each file could contain multiple members +#' The path will be created with the name of the variable and each Datasets. +#' +#'@import ncdf4 +#'@importFrom s2dv Reorder +#'@import multiApply +#' +#'@examples +#'\dontrun{ +#'data <- lonlat_data$exp$data +#'lon <- lonlat_data$exp$lon +#'lat <- lonlat_data$exp$lat +#'Dataset <- 'XXX' +#'var_name <- 'tas' +#'units <- 'k' +#'startdates <- lapply(1:length(lonlat_data$exp$Datasets), +#' function(x) { +#' lonlat_data$exp$Datasets[[x]]$InitializationDates[[1]]})[[1]] +#'Dates <- lonlat_data$exp$Dates$start +#'dim(Dates) <- c(time = length(Dates)/length(startdates), sdate = length(startdates)) +#'cdo_grid_name = attr(lonlat_data$exp$lon, 'cdo_grid_name') +#'projection = attr(lonlat_data$exp$lon, 'projection') +#'destination = './path/' +#'SaveExp(data, lon, lat, Dataset, var_name, units, startdates, Dates, +#' cdo_grid_name, projection, destination) +#'} +#'@export +SaveExp <- function(data, lon, lat, Dataset, var_name, units, startdates, Dates, + cdo_grid_name, projection, destination) { + dimname <- names(dim(data)) + if (any(dimname == "ftime")) { + dimname[which(dimname == "ftime")] <- "time" + names(dim(data))[which(dimname == "ftime")] <- "time" } if (any(dimname == "memb")) { dimname[which(dimname == "memb")] <- "member" - names(dim(data$data))[which(dimname == "memb")] <- "member" + names(dim(data))[which(dimname == "memb")] <- "member" } if (any(dimname == "ensemble")) { dimname[which(dimname == "ensemble")] <- "member" - names(dim(data$data))[which(dimname == "ensemble")] <- "member" + names(dim(data))[which(dimname == "ensemble")] <- "member" } - if (any(dimname == "longitude")) { - dimname[which(dimname == "longitude")] <- "lon" - names(dim(data$data))[which(dimname == "longitude")] <- "lon" + if (any(dimname == "lon")) { + dimname[which(dimname == "lon")] <- "longitude" + names(dim(data))[which(dimname == "lon")] <- "longitude" } - if (any(dimname == "latitude")) { - dimname[which(dimname == "latitude")] <- "lat" - names(dim(data$data))[which(dimname == "latitude")] <- "lat" + if (any(dimname == "lat")) { + dimname[which(dimname == "lat")] <- "latitude" + names(dim(data))[which(dimname == "lat")] <- "latitude" } -names(dim(data$data)) <- dimname - + names(dim(data)) <- dimname if (is.null(dimname)) { stop("Element 'data' in parameter 'data' must have named dimensions.") } -sdate_pos <- which(dimname == "sdate") + sdate_pos <- which(dimname == "sdate") if (length(sdate_pos) == 0) { stop("Element 'data' in parameter 'data' hasn't 'sdate' dimension.") @@ -73,24 +136,23 @@ sdate_pos <- which(dimname == "sdate") stop("Element 'data' in parameter 'data' has more than one 'sdate'", " dimension.") } -n_sdates <- dim(data$data)[sdate_pos] # number of files to create -dataset_pos <- which(dimname == "dataset" | dimname == "dat") -dims <- dim(data$data) + dataset_pos <- which(dimname == 
"dataset" | dimname == "dat") + dims <- dim(data) if (length(dataset_pos) == 0) { warning("Element 'data' in parameter 'data' hasn't 'dataset' dimension. ", "All data is stored in the same 'dataset' folder.") - data$data <- InsertDim(var = data$data, posdim = 1, lendim = 1) - names(dim(data$data))[1] <- "dataset" + data$data <- InsertDim(var = data, posdim = 1, lendim = 1) + names(dim(data))[1] <- "dataset" dimname <- c("dataset", dimname) dataset_pos = 1 } else if (length(dataset_pos) > 1) { stop("Element 'data' in parameter 'data' has more than one 'dataset'", " dimension.") } -n_datasets <- dim(data$data)[dataset_pos] # number of folder by dataset -# dataset names: -datasets <- names(data$Datasets) + n_datasets <- dim(data)[dataset_pos] # number of folder by dataset + # dataset names: + datasets <- Dataset if (n_datasets > length(datasets)) { warning("Dimension 'dataset' in element 'data' from parameter 'data' ", "is greater than those listed in element 'Datasets' and the ", @@ -102,15 +164,14 @@ datasets <- names(data$Datasets) " first element will be used.") datasets <- datasets[1 : n_datasets] } -# var names: + # var names: if ('var' %in% dimname) { var_pos <- which(dimname == 'var') if (dims[var_pos] == 1) { - data$data <- adrop(data$data, drop = var_pos) - dimname <- names(dim(data$data)) + data <- adrop(data, drop = var_pos) + dimname <- names(dim(data)) } } -var_name <- data$Variable$varName if (length(var_name) != 1) { stop("One variable name must be included in element 'Variable$varName' ", "of parameter 'data'.") @@ -120,20 +181,20 @@ var_name <- data$Variable$varName "must be a character string.") } -known_dim_names <- c("var", "lat", "latitude", "lon", "longitude", "time", - "ftime", "sdate", "dataset", "dat", "nlevel", "levels") -dims_var <- NULL -list_pos <- 1 + known_dim_names <- c("var", "lat", "latitude", "lon", "longitude", "time", + "ftime", "sdate", "dataset", "dat", "nlevel", "levels") + dims_var <- NULL + list_pos <- 1 if (any(dimname == 'longitude') | any(dimname == 'lon')) { dim_lon <- ncdim_def(name = 'lon', units = 'degrees', - vals = as.vector(data$lon), longname = 'longitude') + vals = as.vector(lon), longname = 'longitude') dims_var[[list_pos]] <- dim_lon list_pos <- list_pos + 1 } if (any(dimname == 'latitude') | any(dimname == 'lat')) { dim_lat <- ncdim_def(name = 'lat', units = 'degrees_north', - vals = as.vector(data$lat), longname = 'latitude') + vals = as.vector(lat), longname = 'latitude') dims_var[[list_pos]] <- dim_lat list_pos <- list_pos + 1 } @@ -143,15 +204,13 @@ list_pos <- 1 stop("Ask for saving realizations or further dimensions to the mantainer.") } else { dim_memb <- ncdim_def(name = 'ensemble', units = "adim", - vals = 1 : dim(data$data)[which(dimname == 'member')], + vals = 1 : dim(data)[which(dimname == 'member')], longname = 'ensemble', create_dimvar = TRUE) dims_var[[list_pos]] <- dim_memb list_pos <- list_pos + 1 } } - # Lead-time depends on the start date - nlt <- length(data$Dates$start)/n_sdates if (any(dimname == 'level')) { stop("Ask for saving 3Dim fields to the mantainer.") @@ -160,49 +219,57 @@ list_pos <- 1 for (i in 1 : n_datasets) { path <- file.path(destination, datasets[i], var_name) dir.create(path, recursive = TRUE) - startdate <- gsub("-", "", data$Datasets[[i]]$InitializationDates[[1]]) - file_name <- paste0(var_name, "_", startdate, ".nc") - full_filename <- file.path(path, file_name) - - data_dataset <- Subset(data$data, along = which(dimname == 'dataset'), indices = i) - standard_order <- c("lon", "lat", "member", 
"ftime") - change_names <- c("lon", "lat", "ensemble", "ftime") - for (j in 1 : n_sdates) { - n_data <- s2dverification::Subset(data_dataset, - along = which(dimname == 'sdate'), - indices = j, drop = TRUE) - pos_standard_order <- match( standard_order, names(dim(n_data))) - n_data <- aperm(n_data, pos_standard_order) - - names(dim(n_data)) <- change_names + startdate <- gsub("-", "", startdates) - # Lead-time depends on the start date - # The correct time should be selected from $Dates$start - time_values <- as.Date(substr(data$Dates$start[(j * nlt - nlt + 1):(j * nlt)], - 1, 10)) - - if (any(dimname == 'time') | any(dimname == 'ftime')) { - dim_time <- ncdim_def(name = 'time', units = 'days since 1970-01-01', - vals = as.numeric(time_values), - longname = 'time', unlim = TRUE) - if (i == 1 & j == 1) { - dims_var[[list_pos]] <- dim_time - list_pos <- list_pos + 1 - } - } - if (!is.character(attributes(data$Variable)$units)) { - units = attributes(data$Variable)$variable$units - } else { - units = attributes(data$Variable)$units - } - datanc <- ncvar_def(name = var_name, - units = units, - dim = dims_var, missval = -99999) - file_nc <- nc_create(full_filename[j], datanc) - ncvar_put(file_nc, datanc, n_data) - ncatt_put(file_nc, datanc, 'coordinates', attr(data$lon, 'cdo_grid_name')) - ncatt_put(file_nc, datanc, 'projection', attr(data$lon, 'projection')) - nc_close(file_nc) - } + dim(startdate) <- c(sdate = length(startdate)) + Apply(list(data, startdate, Dates), + target_dims = list(c('member', 'time', 'latitude', 'longitude'), + NULL, 'time'), + fun = .saveExp, var_name = var_name, units = units, + dims_var = dims_var, cdo_grid_name = cdo_grid_name, projection = projection, + destination = path) } } + +# data is an array with dimensions: member, time, lat, lon: +# Dates is a vector of the dates for the time dimension +# dims_var is a list with the ncdim_def of common variables in dataset: member, lat and lon: +# data <- 1:(3 * 4 * 5 * 6) +# dim(data) <- c(longitude = 3, latitude = 4, time = 5, member = 6) +# var_name <- 'tas' +# units <- 'K' +# lon <- 1:3 +# lat <- 1:4 +# sdate = '19001101' +# destination = '/esarchive/scratch/nperez/git/Flor/cstools/' +# dims_var = list(ncdim_def(name = 'lon', units = 'degrees', +# vals = as.vector(lon), longname = 'longitude'), +# ncdim_def(name = 'lat', units = 'degrees_north', +# vals = as.vector(lat), longname = 'latitude'), +# ncdim_def(name = 'ensemble', units = "adim", +# vals = 1 : 6, +# longname = 'ensemble', create_dimvar = TRUE)) +#Dates <- as.Date(c("1900-11-01", "1900-12-01", "1901-01-01", "1901-02-01", "1901-03-01")) +#.saveExp(data, sdate, Dates, var_name, units, dims_var, cdo_grid_name = 'r360x181', projection = 'none', destination) +.saveExp <- function(data, sdate, Dates, var_name, units, dims_var, + cdo_grid_name, projection, destination) { + dim_names <- names(dim(data)) + if (any(dim_names != c('longitude', 'latitude', 'member', 'time'))) { + data <- Reorder(data, c('longitude', 'latitude', 'member', 'time')) + } + dim_time <- ncdim_def(name = 'time', units = 'days since 1970-01-01', + vals = as.numeric(Dates), + longname = 'time', unlim = TRUE) + list_pos = length(dims_var) + 1 + dims_var[[list_pos]] <- dim_time + datanc <- ncvar_def(name = var_name, + units = units, + dim = dims_var, missval = -99999) + file_name <- paste0(var_name, "_", sdate, ".nc") + full_filename <- file.path(destination, file_name) + file_nc <- nc_create(full_filename, datanc) + ncvar_put(file_nc, datanc, data) + ncatt_put(file_nc, datanc, 'coordinates', 
cdo_grid_name) + ncatt_put(file_nc, datanc, 'projection', projection) + nc_close(file_nc) +} diff --git a/R/CST_WeatherRegimes.R b/R/CST_WeatherRegimes.R new file mode 100644 index 0000000000000000000000000000000000000000..72ab3987e38e2ab4b7f45bcc2b10016f3eecbb92 --- /dev/null +++ b/R/CST_WeatherRegimes.R @@ -0,0 +1,303 @@ +#' @rdname CST_WeatherRegimes +#' @title Function for Calculating the Cluster analysis +#' +#' @author Verónica Torralba - BSC, \email{veronica.torralba@bsc.es} +#' +#' @description This function computes the weather regimes from a cluster analysis. +#'It is applied on the array \code{data} in a 's2dv_cube' object. The dimensionality of this object can be also reduced +#'by using PCs obtained from the application of the #'EOFs analysis to filter the dataset. +#'The cluster analysis can be performed with the traditional k-means or those methods +#'included in the hclust (stats package). +#' +#'@references Cortesi, N., V., Torralba, N., González-Reviriego, A., Soret, and F.J., Doblas-Reyes (2019). +#' Characterization of European wind speed variability using weather regimes. Climate Dynamics,53, +#' 4961–4976, doi:10.1007/s00382-019-04839-5. +#'@references Torralba, V. (2019) Seasonal climate prediction for the wind energy sector: methods and tools +#' for the development of a climate service. Thesis. Available online: \url{https://eprints.ucm.es/56841/} +#' +#'@param data a 's2dv_cube' object +#'@param ncenters Number of clusters to be calculated with the clustering function. +#'@param EOFs Whether to compute the EOFs (default = 'TRUE') or not (FALSE) to filter the data. +#'@param neofs number of modes to be kept (default = 30). +#'@param varThreshold Value with the percentage of variance to be explained by the PCs. +#' Only sufficient PCs to explain this much variance will be used in the clustering. +#'@param method Different options to estimate the clusters. The most traditional approach is the k-means analysis (default=’kmeans’) +#'but the function also support the different methods included in the hclust . These methods are: +#'"ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). +#' For more details about these methods see the hclust function documentation included in the stats package. +#'@param iter.max Parameter to select the maximum number of iterations allowed (Only if method='kmeans' is selected). +#'@param nstart Parameter for the cluster analysis determining how many random sets to choose (Only if method='kmeans' is selected). +#'@param ncores The number of multicore threads to use for parallel computation. 
+#'@return A list with two elements \code{$data} (a 's2dv_cube' object containing the composites cluster=1,..,K for case (*1) +# or only k=1 for any specific cluster, i.e., case (*2)) and \code{$statistics} that includes +#' \code{$pvalue} (array with the same structure as \code{$data} containing the pvalue of the composites obtained through a t-test that accounts for the serial dependence.), +#' \code{cluster} (A matrix or vector with integers (from 1:k) indicating the cluster to which each time step is allocated.), +#' \code{persistence} (Percentage of days in a month/season before a cluster is replaced for a new one (only if method=’kmeans’ has been selected.)), +#' \code{frequency} (Percentage of days in a month/season belonging to each cluster (only if method=’kmeans’ has been selected).), +#'@import s2dverification +#'@import multiApply +#'@examples +#'\dontrun{ +#'res1 <- CST_WeatherRegimes(data = lonlat_data$obs, EOFs = FALSE, ncenters = 4) +#'res2 <- CST_WeatherRegimes(data = lonlat_data$obs, EOFs = TRUE, ncenters = 3) +#'} +#'@export +#' +CST_WeatherRegimes <- function(data, ncenters = NULL, + EOFs = TRUE, neofs = 30, + varThreshold = NULL, + method = "kmeans", + iter.max = 100, nstart = 30, + ncores = NULL) { + if (!inherits(data, 's2dv_cube')) { + stop("Parameter 'data' must be of the class 's2dv_cube', ", + "as output by CSTools::CST_Load.") + } + if ('lon' %in% names(data)){ + lon <- data$lon + }else { + lon <- NULL + } + result <- WeatherRegime(data$data,ncenters = ncenters, + EOFs = EOFs, neofs = neofs, + varThreshold = varThreshold, lon = lon, + lat = data$lat, method = method, + iter.max=iter.max, nstart = nstart, + ncores = ncores) + data$data <- result$composite + data$statistics <- result[-1] + return(data) +} + +#' @rdname WeatherRegimes +#' @title Function for Calculating the Cluster analysis +#' +#' @author Verónica Torralba - BSC, \email{veronica.torralba@bsc.es} +#' +#' @description This function computes the weather regimes from a cluster analysis. +#'It can be applied over the dataset with dimensions +#'c(year/month, month/day, lon, lat), or by using PCs obtained from the application of the +#'EOFs analysis to filter the dataset. +#'The cluster analysis can be performed with the traditional k-means or those methods +#'included in the hclust (stats package). +#' +#'@references Cortesi, N., V., Torralba, N., González-Reviriego, A., Soret, and F.J., Doblas-Reyes (2019). +#' Characterization of European wind speed variability using weather regimes. Climate Dynamics,53, +#' 4961–4976, doi:10.1007/s00382-019-04839-5. +#'@references Torralba, V. (2019) Seasonal climate prediction for the wind energy sector: methods and tools +#' for the development of a climate service. Thesis. Available online: \url{https://eprints.ucm.es/56841/} +#' +#'@param data an array containing anomalies with named dimensions with at least start date 'sdate', forecast time 'ftime', latitude 'lat' and longitude 'lon'. +#'@param ncenters Number of clusters to be calculated with the clustering function. +#'@param EOFs Whether to compute the EOFs (default = 'TRUE') or not (FALSE) to filter the data. +#'@param neofs number of modes to be kept only if EOFs = TRUE has been selected. (default = 30). +#'@param varThreshold Value with the percentage of variance to be explained by the PCs. +#' Only sufficient PCs to explain this much variance will be used in the clustering. +#'@param lon Vector of longitudes. +#'@param lat Vector of latitudes. +#'@param method Different options to estimate the clusters. 
The most traditional approach is the k-means analysis (default=’kmeans’) +#'but the function also support the different methods included in the hclust . These methods are: +#'"ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). +#' For more details about these methods see the hclust function documentation included in the stats package. +#'@param iter.max Parameter to select the maximum number of iterations allowed (Only if method='kmeans' is selected). +#'@param nstart Parameter for the cluster analysis determining how many random sets to choose (Only if method='kmeans' is selected). +#'@param ncores The number of multicore threads to use for parallel computation. +#'@return A list with elements \code{$composite} (array with at least 3-d ('lat', 'lon', 'cluster') containing the composites k=1,..,K for case (*1) +# or only k=1 for any specific cluster, i.e., case (*2)), +#' \code{pvalue} (array with at least 3-d ('lat','lon','cluster') with the pvalue of the composites obtained through a t-test that accounts for the serial +# dependence of the data with the same structure as Composite.), +#' \code{cluster} (A matrix or vector with integers (from 1:k) indicating the cluster to which each time step is allocated.), +#' \code{persistence} (Percentage of days in a month/season before a cluster is replaced for a new one (only if method=’kmeans’ has been selected.)), +#' \code{frequency} (Percentage of days in a month/season belonging to each cluster (only if method=’kmeans’ has been selected).), +#'@import s2dverification +#'@import multiApply +#'@examples +#'\dontrun{ +#'res <- WeatherRegime(data = lonlat_data$obs$data, lat = lonlat_data$obs$lat, +#' EOFs = FALSE, ncenters = 4) +#'} +#'@export + +WeatherRegime <- function(data, ncenters = NULL, + EOFs = TRUE,neofs = 30, + varThreshold = NULL, lon = NULL, + lat = NULL, method = "kmeans", + iter.max=100, nstart = 30, + ncores = NULL) { + + if (is.null(names(dim(data)))) { + stop("Parameter 'data' must be an array with named dimensions.") + } + + if (is.null(lat)) { + stop("Parameter 'lat' must be specified.") + } + + dimData <- names(dim(data)) + + if ('sdate' %in% dimData && 'ftime' %in% dimData) { + nsdates <- dim(data)['sdate'] + nftimes <- dim(data)['ftime'] + data <- MergeDims(data, + merge_dims = c('ftime', 'sdate'), + rename_dim = 'time') + } else if ('sdate' %in% dimData | 'ftime' %in% dimData) { + names(dim(data))[which(dimData == 'sdate' | dimData == 'ftime') ] = 'time' + } else { + if (!('time' %in% dimData)) { + stop("Parameter 'data' must have temporal dimensions.") + } + } + + + output <- Apply(data = list(data), + target_dims = c('time','lat','lon'), + fun = .WeatherRegime, + EOFs = EOFs, neofs = neofs, + varThreshold = varThreshold, + lon = lon, lat = lat, + ncenters = ncenters, + method = method, + ncores = ncores) + + if (method == 'kmeans' && 'sdate' %in% dimData && 'ftime' %in% dimData) { + + # The frequency and the persistency are computed as they are useful + # parameters in the cluster analysis + extra_output <- Apply(data = output$cluster, + target_dims = 'time', + fun = .freqPer, + nsdates = nsdates, + nftimes = nftimes , + ncenters = ncenters) + + output$cluster <- t(array(output$cluster, dim = c(nftimes, nsdates))) + names(dim(output$cluster)) <- c('sdate', 'ftime') + + output <- list(composite = output$composite, + pvalue = output$pvalue, + cluster = output$cluster, + frequency = extra_output$frequency, + persistence = extra_output$persistence) 
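    # Illustrative sketch (assumed toy values, not taken from the package) of what
    # .freqPer computes for one start date with nftimes = 6 and ncenters = 2:
    #   clust <- c(1, 1, 2, 2, 2, 1)   # cluster assigned to each forecast time
    #   rle(clust)                     # runs: values 1, 2, 1 with lengths 2, 3, 1
    #   frequency[1, ]                 # c(50, 50): % of time steps in clusters 1 and 2
    #   persistence[1, ]               # c(1.5, 3): mean run length before the regime changes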
+ } + return(output) +} + +.WeatherRegime <- function(data, ncenters = NULL, EOFs = TRUE,neofs = 30, + varThreshold = NULL, lon = NULL, + lat = NULL, method = "kmeans", + iter.max=100, nstart = 30) { + + if (is.null(names(dim(data)))) { + stop("Parameter 'data' must be an array with 'time', 'lat' and 'lon' dimensions.") + } + + if (!is.null(lat) && dim(data)['lat'] != length(lat)) { + stop("The length of the paramter 'lat' does not match with the ['lat'] dimension of + the parameter 'data'.") + } + if (is.null(ncenters)) { + stop("Parameter 'ncenters' must be specified.") + } + if (EOFs == TRUE && is.null(lon)) { + stop("Parameter 'lon' must be specified.") + } + if (is.null(lat)) { + stop("Parameter 'lat' must be specified.") + } + + nlon <- dim(data)['lat'] + nlat <- dim(data)['lon'] + + if (any(is.na(data))){ + nas_test <- MergeDims(data, merge_dims = c('lat','lon'), + rename_dim = 'space',na.rm = TRUE) + if (dim(nas_test)['space']== c(nlat*nlon)){ + stop("Parameter 'data' contains NAs in the 'time' dimensions.") + } + } + + if (EOFs == TRUE) { + if (is.null(varThreshold)) { + dataPC <- EOF(data, + lat = as.vector(lat), + lon = as.vector(lon), + neofs = neofs) + cluster_input <- dataPC$PC + } else { + dataPC <- EOF(data, + lat = as.vector(lat), + lon = as.vector(lon), + neofs = neofs) + minPC <- + head(as.numeric(which(cumsum(dataPC$var) > varThreshold)), 1) + cluster_input <- dataPC$PC[, 1:minPC] + } + } else { + + dataW <- aperm(Apply(data, target_dims = 'lat', + function (x, la) { + x * cos(la * pi / 180)}, + la = lat)[[1]], c(2, 1, 3)) + + cluster_input <- MergeDims(dataW, merge_dims = c('lat','lon'), + rename_dim = 'space',na.rm = TRUE) + + } + + if (method == "kmeans") { + + clust <- kmeans( + cluster_input, + centers = ncenters, + iter.max = iter.max, + nstart = nstart, + trace = FALSE) + + result <- array(0, c(ncenters, nlat, nlon)) + # the order of the data dimensions is changed ('lat','lon','time') + result <- Composite(aperm(data,c(2, 3, 1)), clust$cluster) + + } else { + result <- hclust(dist(cluster_input), method = method) + clusterCut <- cutree(result, ncenters) + result <- Composite(aperm(data, c(2, 3, 1)), clusterCut) + } + result <- lapply(1:length(result), + function (n) { + names(dim(result[[n]])) <- c("lat", "lon", "cluster") + return (result[[n]]) + }) + + names(result) <- c('composite','pvalue') + + if (method == "kmeans") { + clust <- as.array(clust$cluster) + names(dim(clust)) <- 'time' + return(list( + composite = result$composite, + pvalue = result$pvalue, + cluster = clust)) + } else { + clust <- as.array(clusterCut) + names(dim(clust)) <- 'time' + return(list( + composite = result$composite, + pvalue = result$pvalue, + cluster = clust)) + } +} + +.freqPer<- function (clust, nsdates, nftimes, ncenters){ + frequency <- persistence <- matrix(NA, nsdates, ncenters) + x <- as.vector(clust) + for (i in 1:nsdates) { + occurences <-rle(x[((i * nftimes) + 1 - nftimes):(i * nftimes)]) + for (j in 1:ncenters) { + frequency[i, j] <-(sum(occurences$lengths[occurences$values == j]) / nftimes) * 100 + persistence[i, j] <- mean(occurences$lengths[occurences$values == j]) + } + } + return(list(frequency = frequency, + persistence = persistence)) +} diff --git a/R/PlotCombinedMap.R b/R/PlotCombinedMap.R index 96784eb88674539cc5b26150f3c1d35567066fff..7f8f5d64e0191ab40cd7e88b25d23fd95f9269e9 100644 --- a/R/PlotCombinedMap.R +++ b/R/PlotCombinedMap.R @@ -43,7 +43,17 @@ #' display_range = c(0, 1), #' bar_titles = paste('% of belonging to', c('a', 'b', 'c')), #' brks = 20, width 
= 10, height = 8) - +#' +#'Lon <- c(0:40, 350:359) +#'Lat <- 51:26 +#'data <- rnorm(51 * 26 * 3) +#'dim(data) <- c(map = 3, lon = 51, lat = 26) +#'mask <- sample(c(0,1), replace = TRUE, size = 51 * 26) +#'dim(mask) <- c(lat = 26, lon = 51) +#'PlotCombinedMap(data, lon = Lon, lat = Lat, map_select_fun = max, +#' display_range = range(data), mask = mask, +#' width = 12, height = 8) +#' #'@export PlotCombinedMap <- function(maps, lon, lat, map_select_fun, display_range, @@ -349,9 +359,19 @@ PlotCombinedMap <- function(maps, lon, lat, # Add overplot on top #---------------------- if (!is.null(mask)) { + dims_mask <- dim(mask) + latb <- sort(lat, index.return = TRUE) + dlon <- lon[2:dims_mask[2]] - lon[1:(dims_mask[2] - 1)] + wher <- which(dlon > (mean(dlon) + 1)) + if (length(wher) > 0) { + lon[(wher + 1):dims_mask[2]] <- lon[(wher + 1):dims_mask[2]] - 360 + } + lonb <- sort(lon, index.return = TRUE) + cols_mask <- sapply(seq(from = 0, to = 1, length.out = 10), function(x) adjustcolor(col_mask, alpha.f = x)) - image(lon, lat, t(mask), axes = FALSE, col = cols_mask, + image(lonb$x, latb$x, t(mask)[lonb$ix, latb$ix], + axes = FALSE, col = cols_mask, breaks = seq(from = 0, to = 1, by = 0.1), xlab='', ylab='', add = TRUE, xpd = TRUE) if (!exists('coast_color')) { diff --git a/R/PlotForecastPDF.R b/R/PlotForecastPDF.R index c40cc32c3b57211e9d14f2555c67541ec18b523d..14e6cd3aadca4cda3f6d040e526b8bcc5a0b206a 100644 --- a/R/PlotForecastPDF.R +++ b/R/PlotForecastPDF.R @@ -220,7 +220,7 @@ PlotForecastPDF <- function(fcst, tercile.limits, extreme.limits = NULL, obs = N }) attr <- attr(hatch.ls, "split_labels") for (i in 1:length(hatch.ls)) { - hatch.ls[[i]] <- cbind(hatch.ls[[i]], attr[i, ]) + hatch.ls[[i]] <- cbind(hatch.ls[[i]], attr[i, ], row.names = NULL) } hatch.df <- do.call("rbind", hatch.ls) # Compute max y for each extreme category @@ -229,7 +229,7 @@ PlotForecastPDF <- function(fcst, tercile.limits, extreme.limits = NULL, obs = N }) attr <- attr(max.ls, "split_labels") for (i in 1:length(max.ls)) { - max.ls[[i]] <- cbind(max.ls[[i]], attr[i, ]) + max.ls[[i]] <- cbind(max.ls[[i]], attr[i, ], row.names = NULL) } max.df <- do.call("rbind", max.ls) } @@ -323,7 +323,8 @@ PlotForecastPDF <- function(fcst, tercile.limits, extreme.limits = NULL, obs = N pct2$lab.pos <- as.vector(apply(extreme.limits, 1, function(x) {c(x[1], NA, x[2])})) pct2 <- merge(pct2, max.df, by = c("init", "extremes")) # include potentially missing groups - pct2 <- pct2[CJ(levels(pct2$init), factor(c("Below P10", "Normal", "Above P90"), + pct2 <- pct2[CJ(factor(levels(pct2$init), levels = levels(pct2$init)), + factor(c("Below P10", "Normal", "Above P90"), levels = c("Below P10", "Normal", "Above P90"))), ] } #------------------------ diff --git a/R/PlotPDFsOLE.R b/R/PlotPDFsOLE.R new file mode 100644 index 0000000000000000000000000000000000000000..fc4ad76e230b535456f6c6e5aa5930dc42761424 --- /dev/null +++ b/R/PlotPDFsOLE.R @@ -0,0 +1,246 @@ +#' Plotting two probability density gaussian functions and the optimal linear +#' estimation (OLE) as result of combining them. +#' +#' @author Eroteida Sanchez-Garcia - AEMET, //email{esanchezg@aemet.es} +#' +#' @description This function plots two probability density gaussian functions +#' and the optimal linear estimation (OLE) as result of combining them. +#' +#' @param pdf_1 A numeric array with a dimension named 'statistic', containg +#' two parameters: mean' and 'standard deviation' of the first gaussian pdf +#' to combining. 
+#' @param pdf_2 A numeric array with a dimension named 'statistic', containg +#' two parameters: mean' and 'standard deviation' of the second gaussian pdf +#' to combining. +#' @param nsigma (optional) A numeric value for setting the limits of X axis. +#' (Default nsigma = 3). +#' @param plotfile (optional) A filename where the plot will be saved. +#' (Default: the plot is not saved). +#' @param width (optional) A numeric value indicating the plot width in +#' units ("in", "cm", or "mm"). (Default width = 30). +#' @param height (optional) A numeric value indicating the plot height. +#' (Default height = 15). +#' @param units (optional) A character value indicating the plot size +#' unit. (Default units = 'cm'). +#' @param dpi (optional) A numeric value indicating the plot resolution. +#' (Default dpi = 300). +#' +#' @return PlotPDFsOLE() returns a ggplot object containing the plot. +#' +#' @import ggplot2 +#' +#' @examples +#' # Example 1 +#' pdf_1 <- c(1.1,0.6) +#' attr(pdf_1, "name") <- "NAO1" +#' dim(pdf_1) <- c(statistic = 2) +#' pdf_2 <- c(1,0.5) +#' attr(pdf_2, "name") <- "NAO2" +#' dim(pdf_2) <- c(statistic = 2) +#' +#' PlotPDFsOLE(pdf_1, pdf_2) +#' +#'@export +PlotPDFsOLE <- function(pdf_1, pdf_2, nsigma = 3, plotfile = NULL, + width = 30, height = 15, + units = "cm", dpi = 300) { + y <- type <- NULL + + if(!is.null(plotfile)){ + if (!is.numeric(dpi)) { + stop("Parameter 'dpi' must be numeric.") + } + if (length(dpi) > 1) { + warning("Parameter 'dpi' has length greater than 1 and ", + "only the first element will be used.") + dpi <- dpi[1] + } + if (!is.character(units)) { + stop("Parameter 'units' must be character") + } + if (length(units) > 1) { + warning("Parameter 'units' has length greater than 1 and ", + "only the first element will be used.") + units <- units[1] + } + if(!(units %in% c("in", "cm", "mm"))) { + stop("Parameter 'units' must be equal to 'in', 'cm' or 'mm'.") + } + if (!is.numeric(height)) { + stop("Parameter 'height' must be numeric.") + } + if (length(height) > 1) { + warning("Parameter 'height' has length greater than 1 and ", + "only the first element will be used.") + height <- height[1] + } + if (!is.numeric(width)) { + stop("Parameter 'width' must be numeric.") + } + if (length(width) > 1) { + warning("Parameter 'width' has length greater than 1 and ", + "only the first element will be used.") + width <- width[1] + } + if (!is.character(plotfile)) { + stop("Parameter 'plotfile' must be a character string ", + "indicating the path and name of output png file.") + } + } + if (!is.numeric(nsigma)) { + stop("Parameter 'nsigma' must be numeric.") + } + if (length(nsigma) > 1) { + warning("Parameter 'nsigma' has length greater than 1 and ", + "only the first element will be used.") + nsigma <- nsigma[1] + } + if (!is.array(pdf_1)) { + stop("Parameter 'pdf_1' must be an array.") + } + if (!is.array(pdf_2)) { + stop("Parameter 'pdf_2' must be an array.") + } + if (!is.numeric(pdf_1)) { + stop("Parameter 'pdf_1' must be a numeric array.") + } + if (!is.numeric(pdf_2)) { + stop("Parameter 'pdf_2' must be a numeric array.") + } + if (is.null(names(dim(pdf_1))) || + is.null(names(dim(pdf_2)))) { + stop("Parameters 'pdf_1' and 'pdf_2' ", + "should have dimmension names.") + } + if(!('statistic' %in% names(dim(pdf_1)))) { + stop("Parameter 'pdf_1' must have dimension 'statistic'.") + } + if(!('statistic' %in% names(dim(pdf_2)))) { + stop("Parameter 'pdf_2' must have dimension 'statistic'.") + } + if (length(dim(pdf_1)) != 1) { + stop("Parameter 'pdf_1' must have only 
dimension 'statistic'.") + } + if (length(dim(pdf_2)) != 1) { + stop("Parameter 'pdf_2' must have only dimension 'statistic'.") + } + if ((dim(pdf_1)['statistic'] != 2) || (dim(pdf_2)['statistic'] != 2)) { + stop("Length of dimension 'statistic'", + "of parameter 'pdf_1' and 'pdf_2' must be equal to 2.") + } + if(!is.null(attr(pdf_1, "name"))){ + if(!is.character(attr(pdf_1, "name"))){ + stop("The 'name' attribute of parameter 'pdf_1' must be a character ", + "indicating the name of the variable of parameter 'pdf_1'.") + } + } + if(!is.null(attr(pdf_2, "name"))){ + if(!is.character(attr(pdf_2, "name"))){ + stop("The 'name' attribute of parameter 'pdf_2' must be a character ", + "indicating the name of the variable of parameter 'pdf_2'.") + } + } + if(is.null(attr(pdf_1, "name"))){ + name1 <- "variable 1" + } else { + name1 <- attr(pdf_1, "name") + } + if(is.null(attr(pdf_2, "name"))){ + name2 <- "Variable 2" + } else { + name2 <- attr(pdf_2, "name") + } + + #----------------------------------------------------------------------------- + # Set parameters of gaussian distributions (mean and sd) + #----------------------------------------------------------------------------- + mean1 <- pdf_1[1] + sigma1 <- pdf_1[2] + mean2 <- pdf_2[1] + sigma2 <- pdf_2[2] + pdfBest <- CombinedPDFs(pdf_1, pdf_2) + meanBest <- pdfBest[1] + sigmaBest <- pdfBest[2] + + + #----------------------------------------------------------------------------- + # Plot the gaussian distributions + #----------------------------------------------------------------------------- + nameBest <- paste0(name1, " + ", name2) + graphicTitle <- "OPTIMAL LINEAR ESTIMATION" + xlimSup <- max(nsigma * sigmaBest + meanBest, nsigma * sigma1 + mean1, + nsigma * sigma2 + mean2) + xlimInf <- min(-nsigma * sigmaBest+meanBest, - nsigma * sigma1 + mean1, + -nsigma * sigma2 + mean2) + # deltax <- 0.02 + deltax <- (xlimSup - xlimInf) / 10000 + + x <- seq(xlimInf, xlimSup, deltax) + df1 <- data.frame(x = x, y = dnorm(x, mean = mean1, sd = sigma1), + type = name1) + df2 <- data.frame(x = x, y = dnorm(x, mean = mean2, sd = sigma2), + type = name2) + df3 <- data.frame(x = x, y = dnorm(x, mean = meanBest, sd = sigmaBest), + type = nameBest) + df123 <- rbind(df1, df2, df3) + label1 <- paste0(name1, ": N(mean=",round(mean1, 2), ", sd=", round(sigma1, 2), + ")") + label2 <- paste0(name2, ": N(mean=",round(mean2, 2), ", sd=", round(sigma2, 2), + ")") + labelBest <- paste0(nameBest, ": N(mean=",round(meanBest,2), ", sd=", + round(sigmaBest, 2), ")") + cols <- c("#DC3912", "#13721A", "#1F5094") + names(cols) <- c(name1, name2, nameBest) + g <- ggplot(df123) + geom_line(aes(x, y, colour = type), size = rel(1.2)) + + g <- g + scale_colour_manual(values = cols, + limits = c(name1, name2, nameBest), + labels = c(label1, label2, labelBest)) + g <- g + theme(plot.title=element_text(size=rel(1.1), colour="black", + face= "bold"), + axis.text.x = element_text(size=rel(1.2)), + axis.text.y = element_text(size=rel(1.2)), + axis.title.x = element_blank(), + legend.title = element_blank(), + legend.position = c(1,1), legend.justification = c(1,1), + legend.text = element_text(face = "bold")) + g <- g + ggtitle(graphicTitle) + g <- g + labs(y="probability", size=rel(1.9)) + g <- g + stat_function(fun = dnorm_limit, args = list(mean=mean1, sd=sigma1), + fill = cols[name1], alpha=0.2, geom="area") + g <- g + stat_function(fun = dnorm_limit, args = list(mean=mean2, sd=sigma2), + fill = cols[name2], alpha=0.2, geom="area") + g <- g + stat_function(fun = dnorm_limit, args = 
list(mean=meanBest, + sd=sigmaBest), + fill = cols[nameBest], alpha=0.2, geom="area") + + + #----------------------------------------------------------------------------- + # Save to plotfile if needed, and return plot + #----------------------------------------------------------------------------- + if (!is.null(plotfile)) { + ggsave(plotfile, g, width = width, height = height, units = units, dpi = dpi) + } + return(g) +} + +# Auxiliar function to plot +CombinedPDFs <- function(pdf_1, pdf_2) { + mean_1 <- pdf_1[1] + sigma_1 <- pdf_1[2] + mean_2 <- pdf_2[1] + sigma_2 <- pdf_2[2] + a_1 <- (sigma_2^2)/((sigma_1^2)+(sigma_2^2)) + a_2 <- (sigma_1^2)/((sigma_1^2)+(sigma_2^2)) + pdf_mean <- a_1*mean_1 + a_2*mean_2 + pdf_sigma <- sqrt((sigma_1^2)*(sigma_2^2)/((sigma_1^2)+(sigma_2^2))) + data <- c(pdf_mean, pdf_sigma) + dim(data) <- c(statistic = 2) + return(data) +} + +dnorm_limit <- function(x,mean,sd){ + y <- dnorm(x,mean,sd) + y[x mean+sd] <- NA + return(y) +} diff --git a/R/PlotTriangles4Categories.R b/R/PlotTriangles4Categories.R new file mode 100644 index 0000000000000000000000000000000000000000..fcfb36bbb68b7b58019634d9417ce1ad0609523a --- /dev/null +++ b/R/PlotTriangles4Categories.R @@ -0,0 +1,429 @@ +#'Function to convert any 3-d numerical array to a grid of coloured triangles. +#' +#'This function converts a 3-d numerical data array into a coloured +#'grid with triangles. It is useful for a slide or article to present tabular results as +#'colors instead of numbers. This can be used to compare the outputs of two or four categories ( +#'e.g. modes of variability, clusters, or forecast systems). +#' +#'@param data array with three named dimensions: 'dimx', 'dimy', 'dimcat', +#' containing the values to be displayed in a coloured image with triangles. +#'@param brks A vector of the color bar intervals. The length must be one more +#' than the parameter 'cols'. Use ColorBar() to generate default values. +#'@param cols A vector of valid colour identifiers for color bar. The length +#' must be one less than the parameter 'brks'. Use ColorBar() to generate +#' default values. +#'@param toptitle A string of the title of the grid. Set NULL as default. +#'@param sig_data logical array with the same dimensions as 'data' to add layers +#' to the plot. A value of TRUE at a grid cell will draw a dot/symbol on the +#' corresponding triangle of the plot. Set NULL as default. +#'@param pch_sig symbol to be used to represent sig_data. Takes 18 +#' (diamond) by default. See 'pch' in par() for additional +#' accepted options. +#'@param col_sig colour of the symbol to represent sig_data. +#'@param cex_sig parameter to increase/reduce the size of the symbols used +#' to represent sig_data. +#'@param xlab A logical value (TRUE) indicating if xlabels should be plotted +#'@param ylab A logical value (TRUE) indicating if ylabels should be plotted +#'@param xlabels A vector of labels of the x-axis The length must be +#' length of the col of parameter 'data'. Set the sequence from 1 to the +#' length of the row of parameter 'data' as default. +#'@param xtitle A string of title of the x-axis. Set NULL as default. +#'@param ylabels A vector of labels of the y-axis The length must be +#' length of the row of parameter 'data'. Set the sequence from 1 to the +#' length of the row of parameter 'data' as default. +#'@param ytitle A string of title of the y-axis. Set NULL as default. +#'@param legend A logical value to decide to draw the color bar legend or not. +#' Set TRUE as default. 
+#'@param lab_legend A vector of labels indicating what is represented in each +#'category (i.e. triangle). Set the sequence from 1 to the length of +#' the categories (2 or 4). +#'@param cex_leg a number to indicate the increase/reductuion of the lab_legend used +#' to represent sig_data. +#'@param col_leg color of the legend (triangles). +#'@param fileout A string of full directory path and file name indicating where +#' to save the plot. If not specified (default), a graphics device will pop up. +#'@param size_units A string indicating the units of the size of the device +#' (file or window) to plot in. Set 'px' as default. See ?Devices and the +#' creator function of the corresponding device. +#'@param res A positive number indicating resolution of the device (file or window) +#' to plot in. See ?Devices and the creator function of the corresponding device. +#'@param figure.width a numeric value to control the width of the plot. +#'@param ... The additional parameters to be passed to function ColorBar() in +#' s2dverification for color legend creation. +#'@return A figure in popup window by default, or saved to the specified path. +#' +#'@author History:\cr +#'1.0 - 2020-10 (V.Torralba, \email{veronica.torralba@bsc.es}) - Original code +#' +#'@examples +#'#Example with random data +#' arr1<- arr1<- array(runif(n = 12 * 7 * 4, min=-1, max=1),dim = c(12,7,4)) +#' names(dim(arr1)) <- c('dimx','dimy','dimcat') +#'arr2<- array(TRUE,dim = dim(arr1)) +#'arr2[which(arr1 < 0.3)] = FALSE +#'PlotTriangles4Categories(data = arr1, +#' cols = c('white','#fef0d9','#fdd49e','#fdbb84','#fc8d59', +#' '#e34a33','#b30000', '#7f0000'), +#' brks = c(-1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 1), +#' lab_legend = c('NAO+', 'BL','AR','NAO-'), +#' xtitle = "Target month", ytitle = "Lead time", +#' xlabels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", +#' "Aug", "Sep", "Oct", "Nov", "Dec")) +#'@importFrom grDevices dev.new dev.off dev.cur +#'@importFrom graphics plot points polygon text title axis +#'@importFrom RColorBrewer brewer.pal +#'@export +PlotTriangles4Categories <- function(data, brks = NULL, cols = NULL, + toptitle = NULL, sig_data = NULL, + pch_sig = 18, col_sig = 'black', + cex_sig = 1, xlab = TRUE, ylab = TRUE, + xlabels = NULL, xtitle = NULL, + ylabels = NULL, ytitle = NULL, + legend = TRUE, lab_legend = NULL, + cex_leg = 1, col_leg = 'black', + fileout = NULL, size_units = 'px', + res = 100, figure.width = 1, ...) { + # Checking the dimensions + if (length(dim(data)) != 3) { + stop("Parameter 'data' must be an array with three dimensions.") + } + + if (any(is.na(data))){ + stop("Parameter 'data' cannot contain NAs.") + } + + if (is.null(names(dim(data)))) { + stop("Parameter 'data' must be an array with named dimensions.") + }else{ + if (!any(names(dim(data)) == 'dimx') | !any(names(dim(data)) == 'dimy') | + !any(names(dim(data)) == 'dimcat')) { + stop("Parameter 'data' should contain 'dimx', 'dimy' and 'dimcat' dimension names. 
") + } + } + + if (!is.null(sig_data)) { + if (!is.logical(sig_data)) { + stop("Parameter 'sig_data' array must be logical.")} + else if (length(dim(sig_data)) != 3) { + stop("Parameter 'sig_data' must be an array with three dimensions.") + }else if (any(dim(sig_data) != dim(data))){ + stop("Parameter 'sig_data' must be an array with the same dimensions as 'data'.") + }else if(!is.null(names(dim(sig_data)))) { + if (any(names(dim(sig_data)) != names(dim(data)))) { + stop("Parameter 'sig_data' must be an array with the same named dimensions as 'data'.")} + } + } + + if (dim(data)['dimcat'] != 4 && dim(data)['dimcat'] != 2) { + stop( + "Parameter 'data' should contain a dimcat dimension with length equals + to two or four as only two or four categories can be plotted.") + } + + # Checking what is available and generating missing information + if (!is.null(lab_legend) && + length(lab_legend) != 4 && length(lab_legend) != 2) { + stop("Parameter 'lab_legend' should contain two or four names.") + } + + datadim <- dim(data) + nrow <- dim(data)['dimy'] + ncol <- dim(data)['dimx'] + ncat <- dim(data)['dimcat'] + + # If there is any filenames to store the graphics, process them + # to select the right device + if (!is.null(fileout)) { + deviceInfo <- .SelectDevice(fileout = fileout, + width = 80 * ncol * figure.width, + height = 80 * nrow, + units = size_units, res = res) + saveToFile <- deviceInfo$fun + fileout <- deviceInfo$files + } + + # Open connection to graphical device + if (!is.null(fileout)) { + saveToFile(fileout) + } else if (names(dev.cur()) == 'null device') { + dev.new(units = size_units, res = res, + width = 8 * figure.width, height = 5) + } + + if (is.null(xlabels)){ + xlabels = 1:ncol + } + if (is.null(ylabels)){ + ylabels = 1:nrow + } + + if (!is.null(brks) && !is.null(cols)) { + if (length(brks) != length(cols) + 1) { + stop("The length of the parameter 'brks' must be one more than 'cols'.") + } + } + if (is.null(brks)) { + brks <- seq(min(data, na.rm = T), max(data, na.rm = T), length.out = 9) + } + if (is.null(cols)) { + cols <- rev(brewer.pal(length(brks) - 1, 'RdBu')) + } + + # The colours for each triangle/category are defined + data_cat <- array(cols[length(cols)], dim = datadim) + names(dim(data_cat)) <- names(dim(data)) + for (i in (length(cols) - 1):1) { + data_cat[data < brks[i + 1]] <- cols[i] + } + + if(legend){ + layout(matrix(c(1, 2, 1, 3), 2, 2, byrow = T), + widths = c(10, 3.4), heights = c(10, 3.5)) + par(oma = c(1, 1, 1, 1), mar = c(5, 4, 0, 0)) + if(is.null(lab_legend)) { + lab_legend = 1:ncat + } + } + + plot(ncol, nrow, xlim = c(0, ncol), ylim=c(0, nrow), xaxs = "i", yaxs = 'i', type = "n", + xaxt = "n", yaxt = "n", ann = F, axes = F) + + box(col = 'black',lwd = 1) + + if (! 
is.null(toptitle)){ + title(toptitle, cex = 1.5) + } + + if (!is.null(xtitle)){ + mtext(side = 1, text = xtitle, line = 4, cex = 1.5) + } + if (!is.null(ytitle)){ + mtext(side = 2, text = ytitle, line = 2.5, cex = 1.5) + } + + if (xlab){ + axis(1, at =(1:ncol) - 0.5, las = 2, labels = xlabels, cex.axis = 1.5) + } + if (ylab){ + axis(2, at = (1:nrow) - 0.5, las = 2, labels = ylabels, cex.axis = 1.5) + } + + + #The triangles are plotted + for(p in 1:ncol){ + for(l in 1:nrow){ + if (ncat == 4){ + coord_triangl <- list(xs=list(c(p-1, p-0.5, p-1),c(p-1, p-0.5, p),c(p, p-0.5, p),c(p-1, p-0.5, p)), + ys=list( c(l-1, -0.5+l, l), c(l-1, -0.5+l, l-1),c(l-1, -0.5+l, l),c(l, -0.5+l, l))) + + coord_sig <- list(x=c(p-0.75,p-0.5,p-0.25,p-0.5),y=c(l-0.5,l-0.75,l-0.5,l-0.25)) + } + + if (ncat==2){ + coord_triangl <- list(xs=list(c(p-1, p, p-1),c(p-1, p, p)), + ys=list(c(l-1, l, l),c(l-1,l-1, l))) + coord_sig <- list(x=c(p-(2/3),p-(1/3)),y=c(l-(1/3),l-(2/3))) + } + for (n in 1:ncat) { + polygon(coord_triangl$xs[[n]], + coord_triangl$ys[[n]], + col = Subset( + data_cat, + along = c('dimcat', 'dimx', 'dimy'), + indices = list(n, p, l))) + if (!is.null(sig_data) && + Subset(sig_data,along = c('dimcat', 'dimx', 'dimy'), + indices = list(n, p, l))) { + points( + x = coord_sig$x[n], + y = coord_sig$y[n], + pch = pch_sig, + cex = cex_sig, + col = col_sig + ) + } + } + } + } + + # legend + + if(legend){ + # Colorbar + par(mar=c(0,0,0,0)) + ColorBar(brks = brks, cols = cols, vertical = T, draw_ticks = T, draw_separators = T, + # extra_margin = c(0,0,2.5,0),label_scale = 1.5,...) + extra_margin = c( 0, 0, 0, 0), label_scale = 1.5, ...) + + par(mar = c(0.5, 2.5, 0.5, 2.5)) + plot(1, 1, xlim = c(0, 1), ylim =c(0, 1), xaxs = "i", yaxs = 'i', type = "n", + xaxt = "n", yaxt = "n", ann = F, axes = F) + + box(col = col_leg) + p = l = 1 + if (ncat == 4){ + coord_triangl <- list(xs = list(c(p -1, p - 0.5, p - 1), c(p - 1, p - 0.5, p), + c(p, p - 0.5, p), c(p - 1, p - 0.5, p)), + ys = list(c(l - 1, - 0.5 + l, l), c(l - 1, - 0.5 + l, l - 1), + c(l - 1, - 0.5 + l, l), c(l, - 0.5 + l, l))) + + coord_sig <- list(x = c(p - 0.75, p - 0.5, p - 0.25, p - 0.5), + y = c(l - 0.5, l - 0.75, l - 0.5, l - 0.25)) + } + + if (ncat==2){ + coord_triangl<- list(xs=list(c(p-1, p, p),c(p-1, p, p-1)), + ys=list( c(l-1,l-1, l), c(l-1, l, l))) + coord_sig<- list(x=c(p-(2/3),p-(1/3)),y=c(l-(1/3),l-(2/3))) + } + for (n in 1:ncat) { + polygon(coord_triangl$xs[[n]], + coord_triangl$ys[[n]],border=col_leg) + text(x=coord_sig$x[[n]],y=coord_sig$y[[n]],labels = lab_legend[n],cex=cex_leg,col=col_leg) + + } + } + + # If the graphic was saved to file, close the connection with the device + if (!is.null(fileout)) dev.off() +} +.SelectDevice <- function(fileout, width, height, units, res) { + # This function is used in the plot functions to check the extension of the + # files where the graphics will be stored and select the right R device to + # save them. + # If the vector of filenames ('fileout') has files with different + # extensions, then it will only accept the first one, changing all the rest + # of the filenames to use that extension. + + # We extract the extension of the filenames: '.png', '.pdf', ... 
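  # For instance (hypothetical filename, shown only to illustrate the pattern used just
  # below): regmatches("regimes_plot.png", regexpr("\\.[a-zA-Z0-9]*$", "regimes_plot.png"))
  # returns ".png", while a filename with no extension yields character(0).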
+ ext <- regmatches(fileout, regexpr("\\.[a-zA-Z0-9]*$", fileout)) + + if (length(ext) != 0) { + # If there is an extension specified, select the correct device + ## units of width and height set to accept inches + if (ext[1] == ".png") { + saveToFile <- function(fileout) { + png(filename = fileout, width = width, height = height, res = res, units = units) + } + } else if (ext[1] == ".jpeg") { + saveToFile <- function(fileout) { + jpeg(filename = fileout, width = width, height = height, res = res, units = units) + } + } else if (ext[1] %in% c(".eps", ".ps")) { + saveToFile <- function(fileout) { + postscript(file = fileout, width = width, height = height) + } + } else if (ext[1] == ".pdf") { + saveToFile <- function(fileout) { + pdf(file = fileout, width = width, height = height) + } + } else if (ext[1] == ".svg") { + saveToFile <- function(fileout) { + svg(filename = fileout, width = width, height = height) + } + } else if (ext[1] == ".bmp") { + saveToFile <- function(fileout) { + bmp(filename = fileout, width = width, height = height, res = res, units = units) + } + } else if (ext[1] == ".tiff") { + saveToFile <- function(fileout) { + tiff(filename = fileout, width = width, height = height, res = res, units = units) + } + } else { + .warning("file extension not supported, it will be used '.eps' by default.") + ## In case there is only one filename + fileout[1] <- sub("\\.[a-zA-Z0-9]*$", ".eps", fileout[1]) + ext[1] <- ".eps" + saveToFile <- function(fileout) { + postscript(file = fileout, width = width, height = height) + } + } + # Change filenames when necessary + if (any(ext != ext[1])) { + .warning(paste0("some extensions of the filenames provided in 'fileout' are not ", ext[1],". The extensions are being converted to ", ext[1], ".")) + fileout <- sub("\\.[a-zA-Z0-9]*$", ext[1], fileout) + } + } else { + # Default filenames when there is no specification + .warning("there are no extensions specified in the filenames, default to '.eps'") + fileout <- paste0(fileout, ".eps") + saveToFile <- postscript + } + + # return the correct function with the graphical device, and the correct + # filenames + list(fun = saveToFile, files = fileout) +} + +.warning <- function(...) { + # Function to use the 'warning' R function with our custom settings + # Default: no call information, indent to 0, exdent to 3, + # collapse to \n + args <- list(...) 
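  # Usage sketch (hypothetical message): .warning("unsupported file extension") raises a
  # regular R warning whose text is prefixed with the default tag "! Warning: "; long
  # messages are wrapped with strwrap() (indent 0, exdent 3) and the wrapped pieces are
  # joined with "\n!", so continuation lines also begin with "!".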
+ + ## In case we need to specify warning arguments + if (!is.null(args[["call."]])) { + call <- args[["call."]] + } else { + ## Default: don't show info about the call where the warning came up + call <- FALSE + } + if (!is.null(args[["immediate."]])) { + immediate <- args[["immediate."]] + } else { + ## Default value in warning function + immediate <- FALSE + } + if (!is.null(args[["noBreaks."]])) { + noBreaks <- args[["noBreaks."]] + } else { + ## Default value warning function + noBreaks <- FALSE + } + if (!is.null(args[["domain"]])) { + domain <- args[["domain"]] + } else { + ## Default value warning function + domain <- NULL + } + args[["call."]] <- NULL + args[["immediate."]] <- NULL + args[["noBreaks."]] <- NULL + args[["domain"]] <- NULL + + ## To modify strwrap indent and exdent arguments + if (!is.null(args[["indent"]])) { + indent <- args[["indent"]] + } else { + indent <- 0 + } + if (!is.null(args[["exdent"]])) { + exdent <- args[["exdent"]] + } else { + exdent <- 3 + } + args[["indent"]] <- NULL + args[["exdent"]] <- NULL + + ## To modify paste collapse argument + if (!is.null(args[["collapse"]])) { + collapse <- args[["collapse"]] + } else { + collapse <- "\n!" + } + args[["collapse"]] <- NULL + + ## Warning tag + if (!is.null(args[["tag"]])) { + tag <- args[["tag"]] + } else { + tag <- "! Warning: " + } + args[["tag"]] <- NULL + + warning(paste0(tag, paste(strwrap( + args, indent = indent, exdent = exdent + ), collapse = collapse)), call. = call, immediate. = immediate, + noBreaks. = noBreaks, domain = domain) +} + diff --git a/man/CST_Anomaly.Rd b/man/CST_Anomaly.Rd index 07691ea7d7ff5a6105b0e0f4e53a9373d4fdccab..1157416852e78b063b6c4db1606e2d3e2b8401cf 100644 --- a/man/CST_Anomaly.Rd +++ b/man/CST_Anomaly.Rd @@ -4,7 +4,14 @@ \alias{CST_Anomaly} \title{Anomalies relative to a climatology along selected dimension with or without cross-validation} \usage{ -CST_Anomaly(exp = NULL, obs = NULL, cross = FALSE, memb = TRUE, dim_anom = 3) +CST_Anomaly( + exp = NULL, + obs = NULL, + cross = FALSE, + memb = TRUE, + filter_span = NULL, + dim_anom = 3 +) } \arguments{ \item{exp}{an object of class \code{s2dv_cube} as returned by \code{CST_Load} function, containing the seasonal forecast experiment data in the element named \code{$data}.} @@ -16,6 +23,8 @@ CST_Anomaly(exp = NULL, obs = NULL, cross = FALSE, memb = TRUE, dim_anom = 3) \item{memb}{A logical value indicating whether Clim() computes one climatology for each experimental data product member(TRUE) or it computes one sole climatology for all members (FALSE). Default = TRUE.} +\item{filter_span}{a numeric value indicating the degree of smoothing. This option is only available if parameter \code{cross} is set to FALSE.} + \item{dim_anom}{An integer indicating the dimension along which the climatology will be computed. It usually corresponds to 3 (sdates) or 4 (ftime). 
Default = 3.} } @@ -52,6 +61,10 @@ str(anom3) anom4 <- CST_Anomaly(exp = exp, obs = obs, cross = FALSE, memb = FALSE) str(anom4) +anom5 <- CST_Anomaly(lonlat_data$exp) + +anom6 <- CST_Anomaly(obs = lonlat_data$obs) + } \seealso{ \code{\link[s2dverification]{Ano_CrossValid}}, \code{\link[s2dverification]{Clim}} and \code{\link{CST_Load}} diff --git a/man/CST_BiasCorrection.Rd b/man/CST_BiasCorrection.Rd index a1b415fb525a9f2e3c72171e631c2633e0aefcd6..55c325a2db6dc53f8b315c4c0afc718746f79b1a 100644 --- a/man/CST_BiasCorrection.Rd +++ b/man/CST_BiasCorrection.Rd @@ -4,12 +4,14 @@ \alias{CST_BiasCorrection} \title{Bias Correction based on the mean and standard deviation adjustment} \usage{ -CST_BiasCorrection(exp, obs) +CST_BiasCorrection(exp, obs, na.rm = FALSE) } \arguments{ \item{exp}{an object of class \code{s2dv_cube} as returned by \code{CST_Load} function, containing the seasonal forecast experiment data in the element named \code{$data}} \item{obs}{an object of class \code{s2dv_cube} as returned by \code{CST_Load} function, containing the observed data in the element named \code{$data}.} + +\item{na.rm}{a logical value indicating whether missing values should be stripped before the computation proceeds, by default it is set to FALSE.} } \value{ an object of class \code{s2dv_cube} containing the bias corrected forecasts in the element called \code{$data} with the same dimensions of the experimental data. diff --git a/man/CST_Calibration.Rd b/man/CST_Calibration.Rd index 891e2e5fc10fb6751d4e0a6b08cd5b71232a0ed4..76812a438ce7f4cd0b42975e7feef911d6f9b48c 100644 --- a/man/CST_Calibration.Rd +++ b/man/CST_Calibration.Rd @@ -9,8 +9,8 @@ CST_Calibration( obs, cal.method = "mse_min", eval.method = "leave-one-out", - multi.model = F, - na.fill = T, + multi.model = FALSE, + na.fill = TRUE, ncores = 1 ) } diff --git a/man/CST_EnsClustering.Rd b/man/CST_EnsClustering.Rd index 154541d55a3adb839e923ef409fe8cb82cdd59ff..a7ca4a9c2d86dd839e6f4316ac0082b274f1c6fa 100644 --- a/man/CST_EnsClustering.Rd +++ b/man/CST_EnsClustering.Rd @@ -12,6 +12,7 @@ CST_EnsClustering( lat_lim = NULL, variance_explained = 80, numpcs = NULL, + time_dim = NULL, time_percentile = 90, cluster_dim = "member", verbose = F @@ -24,7 +25,7 @@ spatial dimensions named "lon" and "lat", and dimensions "dataset", "member", "f \item{time_moment}{Decides the moment to be applied to the time dimension. Can be either 'mean' (time mean), 'sd' (standard deviation along time) or 'perc' (a selected percentile on time). -If 'perc' the keyword 'time_percentile' is also needed.} +If 'perc' the keyword 'time_percentile' is also used.} \item{numclus}{Number of clusters (scenarios) to be calculated. If set to NULL the number of ensemble members divided by 10 is used, with a minimum of 2 and a maximum of 8.} @@ -38,9 +39,13 @@ Defaults to 80. Not used if numpcs is specified.} \item{numpcs}{Number of EOFs retained in the analysis (optional).} +\item{time_dim}{String or character array with name(s) of dimension(s) over which to compute statistics. +If omitted c("ftime", "sdate", "time") are searched in this order.} + \item{time_percentile}{Set the percentile in time you want to analyse (used for `time_moment = "perc").} -\item{cluster_dim}{Dimension along which to cluster. Typically "member" or "sdate". This can also be a list like c("member", "sdate").} +\item{cluster_dim}{Dimension along which to cluster. Typically "member" or "sdate". 
+This can also be a list like c("member", "sdate").} \item{verbose}{Logical for verbose output} } @@ -94,22 +99,23 @@ the maximum distance between a member in a cluster and the cluster centroid deviation for each cluster (i.e. how much the cluster is compact). } \examples{ +\donttest{ exp <- lonlat_data$exp # Example 1: Cluster on all start dates, members and models res <- CST_EnsClustering(exp, numclus = 3, cluster_dim = c("member", "dataset", "sdate")) iclus = res$cluster[2, 1, 3] -\donttest{ + print(paste("Cluster of 2. member, 1. dataset, 3. sdate:", iclus)) print(paste("Frequency (numerosity) of cluster (", iclus, ") :", res$freq[iclus])) library(s2dverification) PlotEquiMap(res$repr_field[iclus, , ], exp$lon, exp$lat, filled.continents = FALSE, toptitle = paste("Representative field of cluster", iclus)) -} + # Example 2: Cluster on members retaining 4 EOFs during # preliminary dimensional reduction -\donttest{ + res <- CST_EnsClustering(exp, numclus = 3, numpcs = 4, cluster_dim = "member") # Example 3: Cluster on members, retain 80\% of variance during # preliminary dimensional reduction diff --git a/man/CST_QuantileMapping.Rd b/man/CST_QuantileMapping.Rd index ad8f4b6ccaf59532d503ea314e29a337fca1b2f4..e78c8d563745e077b54edd3f8d904e07510c9bf6 100644 --- a/man/CST_QuantileMapping.Rd +++ b/man/CST_QuantileMapping.Rd @@ -42,11 +42,11 @@ This function is a wrapper from fitQmap and doQmap from package 'qmap'to be appl \details{ The different methods are: \itemize{ -\item{'PTF'} {fits a parametric transformations to the quantile-quantile relation of observed and modelled values. See \code{\link[qmap]{fitQmapPTF}}.} -\item{'DIST'} {fits a theoretical distribution to observed and to modelled time series. See \code{\link[qmap]{fitQmapDIST}}.} -\item{'RQUANT'} {estimates the values of the quantile-quantile relation of observed and modelled time series for regularly spaced quantiles using local linear least square regression. See \code{\link[qmap]{fitQmapRQUANT}}.} -\item{'QUANT'} {estimates values of the empirical cumulative distribution function of observed and modelled time series for regularly spaced quantiles. See \code{\link[qmap]{fitQmapQUANT}}.} -\item{'SSPLIN'} {fits a smoothing spline to the quantile-quantile plot of observed and modelled time series. See \code{\link[qmap]{fitQmapSSPLIN}}}.} +\item{'PTF'} {fits a parametric transformations to the quantile-quantile relation of observed and modelled values. See \code{?qmap::fitQmapPTF}.} +\item{'DIST'} {fits a theoretical distribution to observed and to modelled time series. See \code{?qmap::fitQmapDIST}.} +\item{'RQUANT'} {estimates the values of the quantile-quantile relation of observed and modelled time series for regularly spaced quantiles using local linear least square regression. See \code{?qmap::fitQmapRQUANT}.} +\item{'QUANT'} {estimates values of the empirical cumulative distribution function of observed and modelled time series for regularly spaced quantiles. See \code{?qmap::fitQmapQUANT}.} +\item{'SSPLIN'} {fits a smoothing spline to the quantile-quantile plot of observed and modelled time series. 
See \code{?qmap::fitQmapSSPLIN}.}} All methods accepts some common arguments: \itemize{ \item{wet.day} {logical indicating whether to perform wet day correction or not.(Not available in 'DIS' method)} @@ -85,7 +85,7 @@ res <- CST_QuantileMapping(exp = exp, obs = obs, sample_dims = 'time', } } \seealso{ -\code{\link[qmap]{fitQmap}} and \code{\link[qmap]{doQmap}} +\code{qmap::fitQmap} and \code{qmap::doQmap} } \author{ Nuria Perez-Zanon, \email{nuria.perez@bsc.es} diff --git a/man/CST_RFTemp.Rd b/man/CST_RFTemp.Rd new file mode 100644 index 0000000000000000000000000000000000000000..8ab5b6f329b63174cab568199ee4e5245b8927e5 --- /dev/null +++ b/man/CST_RFTemp.Rd @@ -0,0 +1,107 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/CST_RFTemp.R +\name{CST_RFTemp} +\alias{CST_RFTemp} +\title{Temperature downscaling of a CSTools object using lapse rate +correction or a reference field} +\usage{ +CST_RFTemp( + data, + oro, + xlim = NULL, + ylim = NULL, + lapse = 6.5, + lon_dim = "lon", + lat_dim = "lat", + time_dim = NULL, + nolapse = FALSE, + verbose = FALSE, + compute_delta = FALSE, + method = "bilinear", + delta = NULL +) +} +\arguments{ +\item{data}{An object of the class 's2dv_cube' as returned by `CST_Load`, +containing the temperature fields to downscale. +The data object is expected to have an element named \code{$data} +with at least two spatial dimensions named "lon" and "lat". +(these default names can be changed with the \code{lon_dim} and +\code{lat_dim} parameters)} + +\item{oro}{An object of the class 's2dv_cube' as returned by `CST_Load`, +containing fine scale orography (in meters). +The destination downscaling area must be contained in the orography field.} + +\item{xlim}{vector with longitude bounds for downscaling; +the full input field is downscaled if `xlim` and `ylim` are not specified.} + +\item{ylim}{vector with latitude bounds for downscaling} + +\item{lapse}{float with environmental lapse rate} + +\item{lon_dim}{string with name of longitude dimension} + +\item{lat_dim}{string with name of latitude dimension} + +\item{time_dim}{a vector of character string indicating the name of temporal dimension. By default, it is set to NULL and it considers "ftime", "sdate" and "time" as temporal dimensions.} + +\item{nolapse}{logical, if true `oro` is interpreted as a fine-scale +climatology and used directly for bias correction} + +\item{verbose}{logical if to print diagnostic output} + +\item{compute_delta}{logical if true returns only a delta to be used for +out-of-sample forecasts. Returns an object of the class 's2dv_cube', +containing a delta. Activates `nolapse = TRUE`.} + +\item{method}{string indicating the method used for interpolation: +"nearest" (nearest neighbours followed by smoothing with a circular +uniform weights kernel), "bilinear" (bilinear interpolation) +The two methods provide similar results, but nearest is slightly better +provided that the fine-scale grid is correctly centered as a subdivision +of the large-scale grid} + +\item{delta}{An object of the class 's2dv_cube', containing a delta +to be applied to the downscaled input data. Activates `nolapse = TRUE`. +The grid of this object must coincide with that of the required output.} +} +\value{ +CST_RFTemp() returns a downscaled CSTools object +(i.e., of the class 's2dv_cube'). +} +\description{ +This function implements a simple lapse rate correction of a +temperature field (an object of class 's2dv_cube' as provided by +`CST_Load`) as input. 
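Before the grid requirements listed next, it may help to see how the `compute_delta` and `delta` arguments described above are meant to interact. The following is an illustrative sketch only, with synthetic data and arbitrary grids, mirroring the pattern exercised in the package tests: a correction field is pre-computed once and then applied again to further, possibly out-of-sample, data.

```
library(CSTools)
# Synthetic coarse temperature field (illustrative values only)
t <- rnorm(7 * 6 * 2 * 3 * 4) * 10 + 273.15 + 10
dim(t) <- c(dataset = 1, member = 2, sdate = 3, ftime = 4, lat = 6, lon = 7)
exp <- list(data = t, lat = seq(42, 47, 1), lon = seq(3, 9, 1))
attr(exp, 'class') <- 's2dv_cube'
# Synthetic fine-scale orography covering the downscaling area
o <- runif(29 * 29) * 3000
dim(o) <- c(lat = 29, lon = 29)
oro <- list(data = o, lat = seq(41, 48, 0.25), lon = seq(3, 10, 0.25))
attr(oro, 'class') <- 's2dv_cube'
# Step 1: pre-compute the correction field (delta) from a hindcast-like object
delta <- CST_RFTemp(exp, oro, time_dim = c("ftime", "sdate", "member"),
                    nolapse = TRUE, compute_delta = TRUE)
# Step 2: apply the stored delta when downscaling other data
down <- CST_RFTemp(exp, oro, time_dim = c("ftime", "sdate", "member"),
                   nolapse = TRUE, delta = delta)
```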
+The input lon grid must be increasing (but can be modulo 360). +The input lat grid can be irregularly spaced (e.g. a Gaussian grid) +The output grid can be irregularly spaced in lon and/or lat. +} +\examples{ +# Generate simple synthetic data and downscale by factor 4 +t <- rnorm(7 * 6 * 2 * 3 * 4)*10 + 273.15 + 10 +dim(t) <- c(dataset = 1, member = 2, sdate = 3, ftime = 4, lat = 6, lon = 7) +lon <- seq(3, 9, 1) +lat <- seq(42, 47, 1) +exp <- list(data = t, lat = lat, lon = lon) +attr(exp, 'class') <- 's2dv_cube' +o <- runif(29*29)*3000 +dim(o) <- c(lat = 29, lon = 29) +lon <- seq(3, 10, 0.25) +lat <- seq(41, 48, 0.25) +oro <- list(data = o, lat = lat, lon = lon) +attr(oro, 'class') <- 's2dv_cube' +res <- CST_RFTemp(exp, oro, xlim=c(4,8), ylim=c(43, 46), lapse=6.5) +} +\references{ +Method described in ERA4CS MEDSCOPE milestone M3.2: +High-quality climate prediction data available to WP4 +[https://www.medscope-project.eu/the-project/deliverables-reports/]([https://www.medscope-project.eu/the-project/deliverables-reports/) +and in H2020 ECOPOTENTIAL Deliverable No. 8.1: +High resolution (1-10 km) climate, land use and ocean change scenarios +[https://www.ecopotential-project.eu/images/ecopotential/documents/D8.1.pdf](https://www.ecopotential-project.eu/images/ecopotential/documents/D8.1.pdf) +} +\author{ +Jost von Hardenberg - ISAC-CNR, \email{j.vonhardenberg@isac.cnr.it} +} diff --git a/man/CST_RegimesAssign.Rd b/man/CST_RegimesAssign.Rd new file mode 100644 index 0000000000000000000000000000000000000000..43df97c45b387185eb00ecb937fcbc18eab46694 --- /dev/null +++ b/man/CST_RegimesAssign.Rd @@ -0,0 +1,58 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/CST_RegimesAssign.R +\name{CST_RegimesAssign} +\alias{CST_RegimesAssign} +\title{Function for matching a field of anomalies with +a set of maps used as a reference (e.g. clusters obtained from the WeatherRegime function)} +\usage{ +CST_RegimesAssign( + data, + ref_maps, + method = "distance", + composite = FALSE, + memb = FALSE, + ncores = NULL +) +} +\arguments{ +\item{data}{a 's2dv_cube' object.} + +\item{ref_maps}{a 's2dv_cube' object as the output of CST_WeatherRegimes.} + +\item{method}{whether the matching will be performed in terms of minimum distance (default = ’distance’) or +the maximum spatial correlation (method = ’ACC’) between the maps.} + +\item{composite}{a logical parameter indicating if the composite maps are computed or not (default = FALSE).} + +\item{memb}{a logical value indicating whether to compute composites for separate members (default FALSE) or as unique ensemble (TRUE). +This option is only available for when parameter 'composite' is set to TRUE and the data object has a dimension named 'member'.} + +\item{ncores}{the number of multicore threads to use for parallel computation.} +} +\value{ +A list with two elements \code{$data} (a 's2dv_cube' object containing the composites cluster=1,..,K for case (*1) + \code{$pvalue} (array with the same structure as \code{$data} containing the pvalue of the composites obtained through a t-test + that accounts for the serial dependence of the data with the same structure as Composite.)(only when composite = 'TRUE'), + \code{$cluster} (array with the same dimensions as data (except latitude and longitude which are removed) indicating the ref_maps to which each point is allocated.) 
, + \code{$frequency} (A vector of integers (from k=1,...k n reference maps) indicating the percentage of assignations corresponding to each map.), +} +\description{ +This function performs the matching between a field of anomalies and a set +of maps which will be used as a reference. The anomalies will be assigned to the reference map +for which the minimum Eucledian distance (method=’distance’) or highest spatial correlation +(method=‘ACC’) is obtained. +} +\examples{ +\dontrun{ +regimes <- CST_WeatherRegimes(data = lonlat_data$obs, EOFs = FALSE, ncenters = 4) +res1 <- CST_RegimesAssign(data = lonlat_data$exp, ref_maps = regimes, composite = FALSE) +res2 <- CST_RegimesAssign(data = lonlat_data$exp, ref_maps = regimes, composite = TRUE) +} +} +\references{ +Torralba, V. (2019) Seasonal climate prediction for the wind energy sector: methods and tools +for the development of a climate service. Thesis. Available online: \url{https://eprints.ucm.es/56841/} +} +\author{ +Verónica Torralba - BSC, \email{veronica.torralba@bsc.es} +} diff --git a/man/CST_WeatherRegimes.Rd b/man/CST_WeatherRegimes.Rd new file mode 100644 index 0000000000000000000000000000000000000000..a243a76ae2f0484babf3f67075467ed54fa7991a --- /dev/null +++ b/man/CST_WeatherRegimes.Rd @@ -0,0 +1,72 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/CST_WeatherRegimes.R +\name{CST_WeatherRegimes} +\alias{CST_WeatherRegimes} +\title{Function for Calculating the Cluster analysis} +\usage{ +CST_WeatherRegimes( + data, + ncenters = NULL, + EOFs = TRUE, + neofs = 30, + varThreshold = NULL, + method = "kmeans", + iter.max = 100, + nstart = 30, + ncores = NULL +) +} +\arguments{ +\item{data}{a 's2dv_cube' object} + +\item{ncenters}{Number of clusters to be calculated with the clustering function.} + +\item{EOFs}{Whether to compute the EOFs (default = 'TRUE') or not (FALSE) to filter the data.} + +\item{neofs}{number of modes to be kept (default = 30).} + +\item{varThreshold}{Value with the percentage of variance to be explained by the PCs. +Only sufficient PCs to explain this much variance will be used in the clustering.} + +\item{method}{Different options to estimate the clusters. The most traditional approach is the k-means analysis (default=’kmeans’) +but the function also support the different methods included in the hclust . These methods are: +"ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). 
+For more details about these methods see the hclust function documentation included in the stats package.} + +\item{iter.max}{Parameter to select the maximum number of iterations allowed (Only if method='kmeans' is selected).} + +\item{nstart}{Parameter for the cluster analysis determining how many random sets to choose (Only if method='kmeans' is selected).} + +\item{ncores}{The number of multicore threads to use for parallel computation.} +} +\value{ +A list with two elements \code{$data} (a 's2dv_cube' object containing the composites cluster=1,..,K for case (*1) + \code{$pvalue} (array with the same structure as \code{$data} containing the pvalue of the composites obtained through a t-test that accounts for the serial dependence.), + \code{cluster} (A matrix or vector with integers (from 1:k) indicating the cluster to which each time step is allocated.), + \code{persistence} (Percentage of days in a month/season before a cluster is replaced for a new one (only if method=’kmeans’ has been selected.)), + \code{frequency} (Percentage of days in a month/season belonging to each cluster (only if method=’kmeans’ has been selected).), +} +\description{ +This function computes the weather regimes from a cluster analysis. +It is applied on the array \code{data} in a 's2dv_cube' object. The dimensionality of this object can be also reduced +by using PCs obtained from the application of the #'EOFs analysis to filter the dataset. +The cluster analysis can be performed with the traditional k-means or those methods +included in the hclust (stats package). +} +\examples{ +\dontrun{ +res1 <- CST_WeatherRegimes(data = lonlat_data$obs, EOFs = FALSE, ncenters = 4) +res2 <- CST_WeatherRegimes(data = lonlat_data$obs, EOFs = TRUE, ncenters = 3) +} +} +\references{ +Cortesi, N., V., Torralba, N., González-Reviriego, A., Soret, and F.J., Doblas-Reyes (2019). +Characterization of European wind speed variability using weather regimes. Climate Dynamics,53, +4961–4976, doi:10.1007/s00382-019-04839-5. + +Torralba, V. (2019) Seasonal climate prediction for the wind energy sector: methods and tools +for the development of a climate service. Thesis. Available online: \url{https://eprints.ucm.es/56841/} +} +\author{ +Verónica Torralba - BSC, \email{veronica.torralba@bsc.es} +} diff --git a/man/Calibration.Rd b/man/Calibration.Rd index 9f884671ebb4bdf6b6412a5f56c2f7a28df3b603..64452279fce1be4719bcdfdaa18864398fd01ee2 100644 --- a/man/Calibration.Rd +++ b/man/Calibration.Rd @@ -9,8 +9,8 @@ Calibration( obs, cal.method = "mse_min", eval.method = "leave-one-out", - multi.model = F, - na.fill = T, + multi.model = FALSE, + na.fill = TRUE, ncores = 1 ) } @@ -37,6 +37,14 @@ Four types of member-by-member bias correction can be performed. The \code{bias} Both in-sample or our out-of-sample (leave-one-out cross validation) calibration are possible. } +\examples{ +mod1 <- 1 : (1 * 3 * 4 * 5 * 6 * 7) +dim(mod1) <- c(dataset = 1, member = 3, sdate = 4, ftime = 5, lat = 6, lon = 7) +obs1 <- 1 : (1 * 1 * 4 * 5 * 6 * 7) +dim(obs1) <- c(dataset = 1, member = 1, sdate = 4, ftime = 5, lat = 6, lon = 7) +a <- Calibration(exp = mod1, obs = obs1) +str(a) +} \references{ Doblas-Reyes F.J, Hagedorn R, Palmer T.N. The rationale behind the success of multi-model ensembles in seasonal forecasting-II calibration and combination. Tellus A. 2005;57:234-252. 
doi:10.1111/j.1600-0870.2005.00104.x diff --git a/man/EnsClustering.Rd b/man/EnsClustering.Rd index 2fd8a3f1036e46a2e7b6c0eedc8372a08205cbd3..510fc1686d6e62a618052db96aa2fe2985ef3bd4 100644 --- a/man/EnsClustering.Rd +++ b/man/EnsClustering.Rd @@ -15,6 +15,7 @@ EnsClustering( variance_explained = 80, numpcs = NULL, time_percentile = 90, + time_dim = NULL, cluster_dim = "member", verbose = T ) @@ -28,7 +29,7 @@ EnsClustering( \item{time_moment}{Decides the moment to be applied to the time dimension. Can be either 'mean' (time mean), 'sd' (standard deviation along time) or 'perc' (a selected percentile on time). -If 'perc' the keyword 'time_percentile' is also needed.} +If 'perc' the keyword 'time_percentile' is also used.} \item{numclus}{Number of clusters (scenarios) to be calculated. If set to NULL the number of ensemble members divided by 10 is used, with a minimum of 2 and a maximum of 8.} @@ -44,7 +45,11 @@ Defaults to 80. Not used if numpcs is specified.} \item{time_percentile}{Set the percentile in time you want to analyse (used for `time_moment = "perc").} -\item{cluster_dim}{Dimension along which to cluster. Typically "member" or "sdate". This can also be a list like c("member", "sdate").} +\item{time_dim}{String or character array with name(s) of dimension(s) over which to compute statistics. +If omitted c("ftime", "sdate", "time") are searched in this order.} + +\item{cluster_dim}{Dimension along which to cluster. Typically "member" or "sdate". +This can also be a list like c("member", "sdate").} \item{verbose}{Logical for verbose output} } @@ -62,10 +67,12 @@ and returns a number of scenarios, with representative members for each of them. The clustering is performed in a reduced EOF space. } \examples{ +\donttest{ exp <- lonlat_data$exp res <- EnsClustering(exp$data, exp$lat, exp$lon, numclus = 3, cluster_dim = c("member", "dataset", "sdate")) } +} \author{ Federico Fabiano - ISAC-CNR, \email{f.fabiano@isac.cnr.it} diff --git a/man/PlotCombinedMap.Rd b/man/PlotCombinedMap.Rd index 616b84f98b6951040beb33961f285f571c4523ad..c45d1afb594c362a6ecf8e4e641cf7223f8a0e1c 100644 --- a/man/PlotCombinedMap.Rd +++ b/man/PlotCombinedMap.Rd @@ -82,6 +82,17 @@ PlotCombinedMap(list(a, b, c), lons, lats, display_range = c(0, 1), bar_titles = paste('\% of belonging to', c('a', 'b', 'c')), brks = 20, width = 10, height = 8) + +Lon <- c(0:40, 350:359) +Lat <- 51:26 +data <- rnorm(51 * 26 * 3) +dim(data) <- c(map = 3, lon = 51, lat = 26) +mask <- sample(c(0,1), replace = TRUE, size = 51 * 26) +dim(mask) <- c(lat = 26, lon = 51) +PlotCombinedMap(data, lon = Lon, lat = Lat, map_select_fun = max, + display_range = range(data), mask = mask, + width = 12, height = 8) + } \seealso{ \code{PlotCombinedMap} and \code{PlotEquiMap} diff --git a/man/PlotPDFsOLE.Rd b/man/PlotPDFsOLE.Rd new file mode 100644 index 0000000000000000000000000000000000000000..f2e2be8c8c75cd70dabba63accaa5aafbcc37cc3 --- /dev/null +++ b/man/PlotPDFsOLE.Rd @@ -0,0 +1,67 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/PlotPDFsOLE.R +\name{PlotPDFsOLE} +\alias{PlotPDFsOLE} +\title{Plotting two probability density gaussian functions and the optimal linear +estimation (OLE) as result of combining them.} +\usage{ +PlotPDFsOLE( + pdf_1, + pdf_2, + nsigma = 3, + plotfile = NULL, + width = 30, + height = 15, + units = "cm", + dpi = 300 +) +} +\arguments{ +\item{pdf_1}{A numeric array with a dimension named 'statistic', containg +two parameters: mean' and 'standard deviation' of the first gaussian pdf +to 
combining.} + +\item{pdf_2}{A numeric array with a dimension named 'statistic', containg +two parameters: mean' and 'standard deviation' of the second gaussian pdf + to combining.} + +\item{nsigma}{(optional) A numeric value for setting the limits of X axis. +(Default nsigma = 3).} + +\item{plotfile}{(optional) A filename where the plot will be saved. +(Default: the plot is not saved).} + +\item{width}{(optional) A numeric value indicating the plot width in +units ("in", "cm", or "mm"). (Default width = 30).} + +\item{height}{(optional) A numeric value indicating the plot height. +(Default height = 15).} + +\item{units}{(optional) A character value indicating the plot size +unit. (Default units = 'cm').} + +\item{dpi}{(optional) A numeric value indicating the plot resolution. +(Default dpi = 300).} +} +\value{ +PlotPDFsOLE() returns a ggplot object containing the plot. +} +\description{ +This function plots two probability density gaussian functions +and the optimal linear estimation (OLE) as result of combining them. +} +\examples{ +# Example 1 +pdf_1 <- c(1.1,0.6) +attr(pdf_1, "name") <- "NAO1" +dim(pdf_1) <- c(statistic = 2) +pdf_2 <- c(1,0.5) +attr(pdf_2, "name") <- "NAO2" +dim(pdf_2) <- c(statistic = 2) + +PlotPDFsOLE(pdf_1, pdf_2) + +} +\author{ +Eroteida Sanchez-Garcia - AEMET, //email{esanchezg@aemet.es} +} diff --git a/man/PlotTriangles4Categories.Rd b/man/PlotTriangles4Categories.Rd new file mode 100644 index 0000000000000000000000000000000000000000..6e95df38351cb9f4c7bfcfa749e36af265f0d160 --- /dev/null +++ b/man/PlotTriangles4Categories.Rd @@ -0,0 +1,129 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/PlotTriangles4Categories.R +\name{PlotTriangles4Categories} +\alias{PlotTriangles4Categories} +\title{Function to convert any 3-d numerical array to a grid of coloured triangles.} +\usage{ +PlotTriangles4Categories( + data, + brks = NULL, + cols = NULL, + toptitle = NULL, + sig_data = NULL, + pch_sig = 18, + col_sig = "black", + cex_sig = 1, + xlab = TRUE, + ylab = TRUE, + xlabels = NULL, + xtitle = NULL, + ylabels = NULL, + ytitle = NULL, + legend = TRUE, + lab_legend = NULL, + cex_leg = 1, + col_leg = "black", + fileout = NULL, + size_units = "px", + res = 100, + figure.width = 1, + ... +) +} +\arguments{ +\item{data}{array with three named dimensions: 'dimx', 'dimy', 'dimcat', +containing the values to be displayed in a coloured image with triangles.} + +\item{brks}{A vector of the color bar intervals. The length must be one more +than the parameter 'cols'. Use ColorBar() to generate default values.} + +\item{cols}{A vector of valid colour identifiers for color bar. The length +must be one less than the parameter 'brks'. Use ColorBar() to generate +default values.} + +\item{toptitle}{A string of the title of the grid. Set NULL as default.} + +\item{sig_data}{logical array with the same dimensions as 'data' to add layers +to the plot. A value of TRUE at a grid cell will draw a dot/symbol on the +corresponding triangle of the plot. Set NULL as default.} + +\item{pch_sig}{symbol to be used to represent sig_data. Takes 18 +(diamond) by default. 
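Referring back to the PlotPDFsOLE entry above: besides the on-screen example shown there, the `plotfile`, `width`, `height`, `units` and `dpi` arguments write the figure to disk instead. A minimal sketch with the same synthetic PDFs and an arbitrary output file name:

```
library(CSTools)
pdf_1 <- c(1.1, 0.6)          # mean and standard deviation of the first pdf
attr(pdf_1, "name") <- "NAO1"
dim(pdf_1) <- c(statistic = 2)
pdf_2 <- c(1, 0.5)            # mean and standard deviation of the second pdf
attr(pdf_2, "name") <- "NAO2"
dim(pdf_2) <- c(statistic = 2)
# Save the combined-PDF figure to a png file rather than plotting on screen
PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = "nao_ole.png",
            width = 30, height = 15, units = "cm", dpi = 300)
```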
See 'pch' in par() for additional
+accepted options.}
+
+\item{col_sig}{colour of the symbol used to represent sig_data.}
+
+\item{cex_sig}{parameter to increase/reduce the size of the symbols used
+to represent sig_data.}
+
+\item{xlab}{A logical value (TRUE by default) indicating whether the xlabels should be plotted.}
+
+\item{ylab}{A logical value (TRUE by default) indicating whether the ylabels should be plotted.}
+
+\item{xlabels}{A vector of labels for the x-axis. Its length must be equal to the
+'dimx' dimension of parameter 'data'. By default, the sequence from 1 to the
+length of 'dimx' is used.}
+
+\item{xtitle}{A string with the title of the x-axis. Set NULL as default.}
+
+\item{ylabels}{A vector of labels for the y-axis. Its length must be equal to the
+'dimy' dimension of parameter 'data'. By default, the sequence from 1 to the
+length of 'dimy' is used.}
+
+\item{ytitle}{A string with the title of the y-axis. Set NULL as default.}
+
+\item{legend}{A logical value to decide whether to draw the color bar legend or not.
+Set TRUE as default.}
+
+\item{lab_legend}{A vector of labels indicating what is represented in each
+category (i.e. triangle). By default, the sequence from 1 to the number of
+categories (2 or 4) is used.}
+
+\item{cex_leg}{a number to increase/reduce the size of the text in lab_legend.}
+
+\item{col_leg}{color of the legend (triangles).}
+
+\item{fileout}{A string of the full directory path and file name indicating where
+to save the plot. If not specified (default), a graphics device will pop up.}
+
+\item{size_units}{A string indicating the units of the size of the device
+(file or window) to plot in. Set 'px' as default. See ?Devices and the
+creator function of the corresponding device.}
+
+\item{res}{A positive number indicating the resolution of the device (file or window)
+to plot in. See ?Devices and the creator function of the corresponding device.}
+
+\item{figure.width}{a numeric value to control the width of the plot.}
+
+\item{...}{The additional parameters to be passed to function ColorBar() in
+s2dverification for color legend creation.}
+}
+\value{
+A figure in a popup window by default, or saved to the specified path.
+}
+\description{
+This function converts a 3-d numerical data array into a coloured
+grid with triangles. It is useful for a slide or article to present tabular results as
+colours instead of numbers. This can be used to compare the outputs of two or four categories
+(e.g. modes of variability, clusters, or forecast systems).
+} +\examples{ +#Example with random data +arr1<- arr1<- array(runif(n = 12 * 7 * 4, min=-1, max=1),dim = c(12,7,4)) +names(dim(arr1)) <- c('dimx','dimy','dimcat') +arr2<- array(TRUE,dim = dim(arr1)) +arr2[which(arr1 < 0.3)] = FALSE +PlotTriangles4Categories(data = arr1, + cols = c('white','#fef0d9','#fdd49e','#fdbb84','#fc8d59', + '#e34a33','#b30000', '#7f0000'), + brks = c(-1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 1), + lab_legend = c('NAO+', 'BL','AR','NAO-'), + xtitle = "Target month", ytitle = "Lead time", + xlabels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", + "Aug", "Sep", "Oct", "Nov", "Dec")) +} +\author{ +History:\cr +1.0 - 2020-10 (V.Torralba, \email{veronica.torralba@bsc.es}) - Original code +} diff --git a/man/RFTemp.Rd b/man/RFTemp.Rd new file mode 100644 index 0000000000000000000000000000000000000000..106ae6e218f2037ac5987d16e49641ac90c044d8 --- /dev/null +++ b/man/RFTemp.Rd @@ -0,0 +1,114 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/CST_RFTemp.R +\name{RFTemp} +\alias{RFTemp} +\title{Temperature downscaling of a CSTools object using lapse rate +correction (reduced version)} +\usage{ +RFTemp( + data, + lon, + lat, + oro, + lonoro, + latoro, + xlim = NULL, + ylim = NULL, + lapse = 6.5, + lon_dim = "lon", + lat_dim = "lat", + time_dim = NULL, + nolapse = FALSE, + verbose = FALSE, + compute_delta = FALSE, + method = "bilinear", + delta = NULL +) +} +\arguments{ +\item{data}{Temperature array to downscale. +The input array is expected to have at least two dimensions named +"lon" and "lat" by default +(these default names can be changed with the \code{lon_dim} and +\code{lat_dim} parameters)} + +\item{lon}{Vector or array of longitudes.} + +\item{lat}{Vector or array of latitudes.} + +\item{oro}{Array containing fine-scale orography (in m) +The destination downscaling area must be contained in the orography field.} + +\item{lonoro}{Vector or array of longitudes corresponding to the fine orography.} + +\item{latoro}{Vector or array of latitudes corresponding to the fine orography.} + +\item{xlim}{vector with longitude bounds for downscaling; +the full input field is downscaled if `xlim` and `ylim` are not specified.} + +\item{ylim}{vector with latitude bounds for downscaling} + +\item{lapse}{float with environmental lapse rate} + +\item{lon_dim}{string with name of longitude dimension} + +\item{lat_dim}{string with name of latitude dimension} + +\item{time_dim}{a vector of character string indicating the name of temporal dimension. By default, it is set to NULL and it considers "ftime", "sdate" and "time" as temporal dimensions.} + +\item{nolapse}{logical, if true `oro` is interpreted as a +fine-scale climatology and used directly for bias correction} + +\item{verbose}{logical if to print diagnostic output} + +\item{compute_delta}{logical if true returns only a delta to be used for +out-of-sample forecasts.} + +\item{method}{string indicating the method used for interpolation: +"nearest" (nearest neighbours followed by smoothing with a circular +uniform weights kernel), "bilinear" (bilinear interpolation) +The two methods provide similar results, but nearest is slightly better +provided that the fine-scale grid is correctly centered as a subdivision +of the large-scale grid} + +\item{delta}{matrix containing a delta to be applied to the downscaled +input data. 
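A short aside on the PlotTriangles4Categories example above: it builds the logical array `arr2` but never passes it to the function. The hedged sketch below reuses the same synthetic data to show how `sig_data` and its companion arguments are intended to be used, overlaying a symbol on every triangle flagged as TRUE; the particular symbol, colour and size are arbitrary choices.

```
library(CSTools)
arr1 <- array(runif(n = 12 * 7 * 4, min = -1, max = 1), dim = c(12, 7, 4))
names(dim(arr1)) <- c('dimx', 'dimy', 'dimcat')
arr2 <- array(TRUE, dim = dim(arr1))
names(dim(arr2)) <- c('dimx', 'dimy', 'dimcat')
arr2[which(arr1 < 0.3)] <- FALSE
# Mark the triangles flagged in arr2 with a black diamond
PlotTriangles4Categories(data = arr1, sig_data = arr2,
                         pch_sig = 18, col_sig = 'black', cex_sig = 0.8,
                         cols = c('white', '#fef0d9', '#fdd49e', '#fdbb84', '#fc8d59',
                                  '#e34a33', '#b30000', '#7f0000'),
                         brks = c(-1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 1),
                         lab_legend = c('NAO+', 'BL', 'AR', 'NAO-'),
                         xtitle = "Target month", ytitle = "Lead time")
```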
The grid of this matrix is supposed to be same as that of +the required output field} +} +\value{ +CST_RFTemp() returns a downscaled CSTools object + +RFTemp() returns a list containing the fine-scale +longitudes, latitudes and the downscaled fields. +} +\description{ +This function implements a simple lapse rate correction of a +temperature field (a multidimensional array) as input. +The input lon grid must be increasing (but can be modulo 360). +The input lat grid can be irregularly spaced (e.g. a Gaussian grid) +The output grid can be irregularly spaced in lon and/or lat. +} +\examples{ +# Generate simple synthetic data and downscale by factor 4 +t <- rnorm(7 * 6 * 4 * 3) * 10 + 273.15 + 10 +dim(t) <- c(sdate = 3, ftime = 4, lat = 6, lon = 7) +lon <- seq(3, 9, 1) +lat <- seq(42, 47, 1) +o <- runif(29 * 29) * 3000 +dim(o) <- c(lat = 29, lon = 29) +lono <- seq(3, 10, 0.25) +lato <- seq(41, 48, 0.25) +res <- RFTemp(t, lon, lat, o, lono, lato, xlim = c(4, 8), ylim = c(43, 46), + lapse = 6.5) +} +\references{ +Method described in ERA4CS MEDSCOPE milestone M3.2: +High-quality climate prediction data available to WP4 +[https://www.medscope-project.eu/the-project/deliverables-reports/]([https://www.medscope-project.eu/the-project/deliverables-reports/) +and in H2020 ECOPOTENTIAL Deliverable No. 8.1: +High resolution (1-10 km) climate, land use and ocean change scenarios +[https://www.ecopotential-project.eu/images/ecopotential/documents/D8.1.pdf](https://www.ecopotential-project.eu/images/ecopotential/documents/D8.1.pdf) +} +\author{ +Jost von Hardenberg - ISAC-CNR, \email{j.vonhardenberg@isac.cnr.it} +} diff --git a/man/RegimesAssign.Rd b/man/RegimesAssign.Rd new file mode 100644 index 0000000000000000000000000000000000000000..c145e8ef6ad5b84478ecdcdbbedc95e47d9c5067 --- /dev/null +++ b/man/RegimesAssign.Rd @@ -0,0 +1,61 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/CST_RegimesAssign.R +\name{RegimesAssign} +\alias{RegimesAssign} +\title{Function for matching a field of anomalies with +a set of maps used as a reference (e.g. clusters obtained from the WeatherRegime function).} +\usage{ +RegimesAssign( + data, + ref_maps, + lat, + method = "distance", + composite = FALSE, + memb = FALSE, + ncores = NULL +) +} +\arguments{ +\item{data}{an array containing anomalies with named dimensions: dataset, member, sdate, ftime, lat and lon.} + +\item{ref_maps}{array with 3-dimensions ('lon', 'lat', 'cluster') containing the maps/clusters that will be used as a reference for the matching.} + +\item{lat}{a vector of latitudes corresponding to the positions provided in data and ref_maps.} + +\item{method}{whether the matching will be performed in terms of minimum distance (default = ’distance’) or +the maximum spatial correlation (method=’ACC’) between the maps.} + +\item{composite}{a logical parameter indicating if the composite maps are computed or not (default=FALSE).} + +\item{memb}{a logical value indicating whether to compute composites for separate members (default FALSE) or as unique ensemble (TRUE). 
+This option is only available for when parameter 'composite' is set to TRUE and the data object has a dimension named 'member'.} + +\item{ncores}{the number of multicore threads to use for parallel computation.} +} +\value{ +A list with elements \code{$composite} (3-d array (lon, lat, k) containing the composites k=1,..,K for case (*1) + \code{$pvalue} ( array with the same structure as \code{$composite} containing the pvalue of the composites obtained through a t-test + that accounts for the serial dependence of the data with the same structure as Composite.) (only if composite='TRUE'), + \code{$cluster} (array with the same dimensions as data (except latitude and longitude which are removed) indicating the ref_maps to which each point is allocated.) , + \code{$frequency} (A vector of integers (from k = 1, ... k n reference maps) indicating the percentage of assignations corresponding to each map.), +} +\description{ +This function performs the matching between a field of anomalies and a set +of maps which will be used as a reference. The anomalies will be assigned to the reference map +for which the minimum Eucledian distance (method=’distance’) or highest spatial correlation +(method=‘ACC’) is obtained. +} +\examples{ +\dontrun{ +regimes <- WeatherRegime(data = lonlat_data$obs$data, lat = lonlat_data$obs$lat, + EOFs = FALSE, ncenters = 4)$composite +res1 <- RegimesAssign(data = lonlat_data$exp$data, ref_maps = drop(regimes), + lat = lonlat_data$exp$lat, composite = FALSE) +} +} +\references{ +Torralba, V. (2019) Seasonal climate prediction for the wind energy sector: methods and tools for the development of a climate service. Thesis. Available online: \url{https://eprints.ucm.es/56841/} +} +\author{ +Verónica Torralba - BSC, \email{veronica.torralba@bsc.es} +} diff --git a/man/SaveExp.Rd b/man/SaveExp.Rd new file mode 100644 index 0000000000000000000000000000000000000000..40ace2dbab6cc9bb1c84292ce7387f6a3e991dc0 --- /dev/null +++ b/man/SaveExp.Rd @@ -0,0 +1,73 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/CST_SaveExp.R +\name{SaveExp} +\alias{SaveExp} +\title{Save an experiment in a format compatible with CST_Load} +\usage{ +SaveExp( + data, + lon, + lat, + Dataset, + var_name, + units, + startdates, + Dates, + cdo_grid_name, + projection, + destination +) +} +\arguments{ +\item{data}{an multi-dimensional array with named dimensions (longitude, latitude, time, member, sdate)} + +\item{lon}{vector of logitud corresponding to the longitudinal dimension in data} + +\item{lat}{vector of latitud corresponding to the latitudinal dimension in data} + +\item{Dataset}{a vector of character string indicating the names of the datasets} + +\item{var_name}{a character string indicating the name of the variable to be saved} + +\item{units}{a character string indicating the units of the variable} + +\item{startdates}{a vector of dates indicating the initialization date of each simulations} + +\item{Dates}{a matrix of dates with two dimension 'time' and 'sdate'.} + +\item{cdo_grid_name}{a character string indicating the name of the grid e.g.: 'r360x181'} + +\item{projection}{a character string indicating the projection name} + +\item{destination}{a character string indicating the path where to store the NetCDF files} +} +\value{ +the function creates as many files as sdates per dataset. Each file could contain multiple members +The path will be created with the name of the variable and each Datasets. 
+} +\description{ +This function is created for compatibility with CST_Load/Load for saving post-processed datasets such as those calibrated of downscaled with CSTools functions +} +\examples{ +\dontrun{ +data <- lonlat_data$exp$data +lon <- lonlat_data$exp$lon +lat <- lonlat_data$exp$lat +Dataset <- 'XXX' +var_name <- 'tas' +units <- 'k' +startdates <- lapply(1:length(lonlat_data$exp$Datasets), + function(x) { + lonlat_data$exp$Datasets[[x]]$InitializationDates[[1]]})[[1]] +Dates <- lonlat_data$exp$Dates$start +dim(Dates) <- c(time = length(Dates)/length(startdates), sdate = length(startdates)) +cdo_grid_name = attr(lonlat_data$exp$lon, 'cdo_grid_name') +projection = attr(lonlat_data$exp$lon, 'projection') +destination = './path/' +SaveExp(data, lon, lat, Dataset, var_name, units, startdates, Dates, + cdo_grid_name, projection, destination) +} +} +\author{ +Perez-Zanon Nuria, \email{nuria.perez@bsc.es} +} diff --git a/man/WeatherRegimes.Rd b/man/WeatherRegimes.Rd new file mode 100644 index 0000000000000000000000000000000000000000..3f77841e89095623fb0e8d5b50624e12d28aae97 --- /dev/null +++ b/man/WeatherRegimes.Rd @@ -0,0 +1,79 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/CST_WeatherRegimes.R +\name{WeatherRegime} +\alias{WeatherRegime} +\title{Function for Calculating the Cluster analysis} +\usage{ +WeatherRegime( + data, + ncenters = NULL, + EOFs = TRUE, + neofs = 30, + varThreshold = NULL, + lon = NULL, + lat = NULL, + method = "kmeans", + iter.max = 100, + nstart = 30, + ncores = NULL +) +} +\arguments{ +\item{data}{an array containing anomalies with named dimensions with at least start date 'sdate', forecast time 'ftime', latitude 'lat' and longitude 'lon'.} + +\item{ncenters}{Number of clusters to be calculated with the clustering function.} + +\item{EOFs}{Whether to compute the EOFs (default = 'TRUE') or not (FALSE) to filter the data.} + +\item{neofs}{number of modes to be kept only if EOFs = TRUE has been selected. (default = 30).} + +\item{varThreshold}{Value with the percentage of variance to be explained by the PCs. +Only sufficient PCs to explain this much variance will be used in the clustering.} + +\item{lon}{Vector of longitudes.} + +\item{lat}{Vector of latitudes.} + +\item{method}{Different options to estimate the clusters. The most traditional approach is the k-means analysis (default=’kmeans’) +but the function also support the different methods included in the hclust . These methods are: +"ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). 
+For more details about these methods see the hclust function documentation included in the stats package.} + +\item{iter.max}{Parameter to select the maximum number of iterations allowed (Only if method='kmeans' is selected).} + +\item{nstart}{Parameter for the cluster analysis determining how many random sets to choose (Only if method='kmeans' is selected).} + +\item{ncores}{The number of multicore threads to use for parallel computation.} +} +\value{ +A list with elements \code{$composite} (array with at least 3-d ('lat', 'lon', 'cluster') containing the composites k=1,..,K for case (*1) + \code{pvalue} (array with at least 3-d ('lat','lon','cluster') with the pvalue of the composites obtained through a t-test that accounts for the serial + \code{cluster} (A matrix or vector with integers (from 1:k) indicating the cluster to which each time step is allocated.), + \code{persistence} (Percentage of days in a month/season before a cluster is replaced for a new one (only if method=’kmeans’ has been selected.)), + \code{frequency} (Percentage of days in a month/season belonging to each cluster (only if method=’kmeans’ has been selected).), +} +\description{ +This function computes the weather regimes from a cluster analysis. +It can be applied over the dataset with dimensions +c(year/month, month/day, lon, lat), or by using PCs obtained from the application of the +EOFs analysis to filter the dataset. +The cluster analysis can be performed with the traditional k-means or those methods +included in the hclust (stats package). +} +\examples{ +\dontrun{ +res <- WeatherRegime(data = lonlat_data$obs$data, lat = lonlat_data$obs$lat, + EOFs = FALSE, ncenters = 4) +} +} +\references{ +Cortesi, N., V., Torralba, N., González-Reviriego, A., Soret, and F.J., Doblas-Reyes (2019). +Characterization of European wind speed variability using weather regimes. Climate Dynamics,53, +4961–4976, doi:10.1007/s00382-019-04839-5. + +Torralba, V. (2019) Seasonal climate prediction for the wind energy sector: methods and tools +for the development of a climate service. Thesis. 
Available online: \url{https://eprints.ucm.es/56841/} +} +\author{ +Verónica Torralba - BSC, \email{veronica.torralba@bsc.es} +} diff --git a/tests/testthat/test-CST_EnsClustering.R b/tests/testthat/test-CST_EnsClustering.R index 8c507db3ae0c86992fc51e4d476d5c6f583b2bfd..2e7364d9b7e5c7e44af6b53eacf60210f5193e11 100644 --- a/tests/testthat/test-CST_EnsClustering.R +++ b/tests/testthat/test-CST_EnsClustering.R @@ -37,7 +37,8 @@ test_that("Sanity and Functionality tests", { res <- CST_EnsClustering(exp, numclus = 3, cluster_dim = c("member", "sdate")) expect_equivalent(dim(res$cluster), dim(exp$data)[c(2, 3, 1)]) expect_equivalent(dim(res$freq), c(cluster = 3, dim(exp$data)[1])) - expect_equivalent(dim(res$closest_member), c(cluster = 3, dim(exp$data)[1])) + expect_equivalent(dim(res$closest_member$sdate), c(cluster = 3, + dim(exp$data)[1])) expect_equivalent(dim(res$repr_field), c(cluster = 3, dim(exp$data)[c(5, 6)], dim(exp$data)[1])) expect_equivalent(dim(res$composites), c(cluster = 3, @@ -48,13 +49,13 @@ test_that("Sanity and Functionality tests", { cluster_dim = "member") # The closest member of each cluster should be member of that cluster for (i in 1:3) { - expect_equivalent(res$cluster[res$closest_member[i, 1, 1], 1, 1], i) + expect_equivalent(res$cluster[res$closest_member$member[i, 1, 1], 1, 1], i) } res <- CST_EnsClustering(exp, numclus = 3, numpcs = 8, cluster_dim = c("member")) for (i in 1:3) { - expect_equivalent(res$cluster[res$closest_member[i, 1, 1], 1, 1], i) + expect_equivalent(res$cluster[res$closest_member$member[i, 1, 1], 1, 1], i) } res <- CST_EnsClustering(exp, numclus = 3, variance_explained = 80, diff --git a/tests/testthat/test-CST_RFTemp.R b/tests/testthat/test-CST_RFTemp.R new file mode 100644 index 0000000000000000000000000000000000000000..536e44f0e0092d0266cf253271c5df6138c7666e --- /dev/null +++ b/tests/testthat/test-CST_RFTemp.R @@ -0,0 +1,71 @@ +context("Generic tests") +test_that("Sanity checks and simple use cases", { + # Generate simple synthetic data + t <- rnorm(2 * 6 * 6 * 2 * 3 * 4) * 10 + 273.15 + 10 + dim(t) <- c(dataset = 2, member = 2, sdate = 3, ftime = 4, lat = 6, lon = 6) + lon <- seq(4, 9, 1) + lat <- seq(42, 47, 1) + exp <- list(data = t, lat = lat, lon = lon) + o <- runif(29 * 29) * 3000 + dim(o) <- c(lat = 29, lon = 29) + lon <- seq(3.125, 10.125, 0.25) - 100 + lat <- seq(41.125, 48.125, 0.25) - 60 + oro <- list(data = o, lat = lat, lon = lon) + attr(oro, "class") <- "s2dv_cube" + + expect_error( + res <- CST_RFTemp(exp, oro, xlim = c(1, 3), ylim = c(1, 3)), + paste("Parameter 'data' must be of the class", + "'s2dv_cube', as output by CSTools::CST_Load.")) + attr(exp, "class") <- "s2dv_cube" + + expect_error( + res <- CST_RFTemp(exp, oro, xlim = c(1, 3), ylim = c(1, 3)), + "Orography not available for selected area" + ) + + oro$lon <- oro$lon + 100 + oro$lat <- oro$lat + 60 + + expect_error( + res <- CST_RFTemp(exp, oro, xlim = c(3, 8), ylim = c(43, 46)), + "Downscaling area not contained in input data" + ) + + expect_warning( + resl <- CST_RFTemp(exp, oro, lapse = 6.5), + "Selected time dim: ftime" + ) + + dimexp <- dim(exp$data) + expect_equal(dim(resl$data), c(lon = 16, lat = 16, dimexp["ftime"], + dimexp["sdate"], dimexp["dataset"], + dimexp["member"])) + expect_warning( + resd <- CST_RFTemp(exp, oro, time_dim = c("ftime", "sdate", "member"), + nolapse = TRUE), NA) + dim(resd$data) <- c(16, 16, 4 * 3 * 2 * 2) + resm <- apply(resd$data, c(1, 2), mean) + expect_equal(mean(oro$data[7:22, 7:22]), mean(resm), tolerance = 1e-10) + + # Test 
precomputation of delta based with nolapse + delta <- CST_RFTemp(exp, oro, time_dim = c("ftime", "sdate", "member"), + nolapse = TRUE, compute_delta = TRUE) + resdnl <- CST_RFTemp(exp, oro, time_dim = c("ftime", "sdate", "member"), + nolapse = TRUE, delta = delta) + expect_equal(mean(resdnl$data), mean(resd$data), tolerance = 1e-10) + + # Test precomputation of delta based with lapse rate + delta <- CST_RFTemp(exp, oro, time_dim = c("ftime", "sdate", "member"), + lapse = 6.5, compute_delta = TRUE) + resdl <- CST_RFTemp(exp, oro, time_dim = c("ftime", "sdate", "member"), + delta = delta) + expect_equal(mean(resdl$data), mean(resl$data), tolerance = 1e-10) + + expect_warning( + resd <- CST_RFTemp(exp, oro, time_dim = c("ftime", "sdate", "member"), + nolapse = TRUE, method = "nearest"), NA) + dim(resd$data) <- c(16, 16, 4 * 3 * 2 * 2) + resm <- apply(resd$data, c(1, 2), mean) + expect_equal(mean(oro$data[7:22, 7:22]), mean(resm), tolerance = 1e-10) +}) diff --git a/tests/testthat/test-CST_RegimesAssign.R b/tests/testthat/test-CST_RegimesAssign.R new file mode 100644 index 0000000000000000000000000000000000000000..09dad48f959ac418538e9520077b825ceafe2264 --- /dev/null +++ b/tests/testthat/test-CST_RegimesAssign.R @@ -0,0 +1,121 @@ +context("Generic tests") +test_that("Sanity checks", { + expect_error( + CST_RegimesAssign(data = 1), + paste0("Parameter 'data' must be of the class 's2dv_cube', as output by ", + "CSTools::CST_Load.")) + + data1 <- 1 : 20 + data1 <- list(data = data1) + class(data1) <- 's2dv_cube' + expect_error( + CST_RegimesAssign(data = data1,ref_maps=1), + paste0("Parameter 'ref_maps' must be of the class 's2dv_cube', as output by ", + "CSTools::CST_Load.")) + + regimes <- 1:20 + dim(regimes) <- c(lat = 5, lon=2, cluster=2) + regimes <- list(data=regimes) + class(regimes) <- 's2dv_cube' + expect_error( + CST_RegimesAssign(data = data1,ref_maps = regimes), + paste0("Parameter 'data' must be an array with named dimensions.")) + + data1 <- 1 : 20 + dim(data1) <- c(lat = 5, lon=4) + data1 <- list(data = data1 , lat=1:5) + class(data1) <- 's2dv_cube' + expect_error( + CST_RegimesAssign(data = data1,ref_maps = regimes), + paste0("Parameter 'data' must have temporal dimensions.")) + + data1 <- 1 : 20 + dim(data1) <- c(time=20) + data1 <- list(data = data1) + class(data1) <- 's2dv_cube' + + expect_error( + CST_RegimesAssign(data = data1,ref_maps = regimes), + paste0("Parameter 'lat' must be specified.")) + + + data1 <- 1 : 20 + dim(data1) <- c(time=20) + data1 <- list(data = data1,lat=1:5) + class(data1) <- 's2dv_cube' + + expect_error( + CST_RegimesAssign(data = data1,ref_maps = regimes), + paste0("Parameter 'data' must contain the named dimensions 'lat' and 'lon'.")) + + data1 <- 1: 20 + dim(data1) <- c(lat = 2, lon=5, time=2) + data1 <- list(data = data1, lat=1:5) + class(data1) <- 's2dv_cube' + + expect_error( + CST_RegimesAssign(data = data1,ref_maps = regimes), + " Parameter 'lat' does not match with the dimension 'lat' in the + parameter 'data' or in the parameter 'ref_maps'.") + + + data1 <- 1: 20 + dim(data1) <- c(lat = 5, lon=2, time=2) + data1 <- list(data = data1, lat=1:5) + class(data1) <- 's2dv_cube' + + expect_equal(names(CST_RegimesAssign(data = data1, ref_maps = regimes)$statistics), + c('cluster', 'frequency')) + + expect_equal(names( + CST_RegimesAssign( + data = data1, + ref_maps = regimes, + composite = TRUE)$statistics), c('pvalue', 'cluster', 'frequency')) + + expect_equal(names(dim( + CST_RegimesAssign( + data = data1, + ref_maps = regimes, + composite = 
TRUE)$data)), c('lon', 'lat', 'composite.cluster')) + + data1 <- 1: 160 + dim(data1) <- c(lat = 5, lon=2, time=2, member=8) + data1 <- list(data = data1, lat=1:5) + class(data1) <- 's2dv_cube' + + expect_equal(names(dim( + CST_RegimesAssign( + data = data1, + ref_maps = regimes, + composite = TRUE)$data)), c('lon', 'lat', 'composite.cluster', 'member')) + + expect_equal(names(dim( + CST_RegimesAssign( + data = data1, + ref_maps = regimes, + composite = TRUE)$statistics$cluster)), c('time', 'member')) + + regimes <- 1:60 + dim(regimes) <- c(lat = 5, lon=2, cluster=6) + regimes <- list(data=regimes) + class(regimes) <- 's2dv_cube' + expect_equal(max(CST_RegimesAssign(data = data1, ref_maps = regimes, + composite = FALSE)$statistics$cluster), + unname(dim(regimes$data)['cluster'])) + + + regimes <- 1:60 + dim(regimes) <- c(lat = 5, lon=2, cluster=3, member=2) + regimes <- list(data=regimes) + class(regimes) <- 's2dv_cube' + expect_equal(names(dim(CST_RegimesAssign(data = data1, ref_maps = regimes, + composite = FALSE)$statistics$cluster)),c('time','member','member')) + + + + + + + +}) diff --git a/tests/testthat/test-CST_WeatherRegimes.R b/tests/testthat/test-CST_WeatherRegimes.R new file mode 100644 index 0000000000000000000000000000000000000000..5f2967a172f393659dea3c93f8f1cc06986a9099 --- /dev/null +++ b/tests/testthat/test-CST_WeatherRegimes.R @@ -0,0 +1,103 @@ +context("Generic tests") +test_that("Sanity checks", { + expect_error( + CST_WeatherRegimes(data = 1), + paste0("Parameter 'data' must be of the class 's2dv_cube', as output by ", + "CSTools::CST_Load.")) + + data1 <- 1 : 20 + data1 <- list(data = data1) + class(data1) <- 's2dv_cube' + expect_error( + CST_WeatherRegimes(data = data1), + paste0("Parameter 'data' must be an array with named dimensions.")) + + data1 <- 1 : 20 + dim(data1) <- c(lat = 5, lon = 4) + data1 <- list(data = data1 , lat = 1:5) + class(data1) <- 's2dv_cube' + expect_error( + CST_WeatherRegimes(data = data1), + paste0("Parameter 'data' must have temporal dimensions.")) + + data1 <- 1 : 20 + dim(data1) <- c(time = 20) + data1 <- list(data = data1) + class(data1) <- 's2dv_cube' + expect_error( + CST_WeatherRegimes(data = data1) , + paste0("Parameter 'lat' must be specified.")) + + data1 <- 1 : 400 + dim(data1) <- c(time = 20, lat = 5, lon = 4) + data1 <- list(data = data1, lat = 1:5) + class(data1) <- 's2dv_cube' + expect_error( + CST_WeatherRegimes(data = data1), + paste0("Parameter 'ncenters' must be specified.")) + + expect_error( + CST_WeatherRegimes(data = data1, ncenters = 3), + paste0("Parameter 'lon' must be specified.")) + + expect_equal( + names(dim(CST_WeatherRegimes(data = data1, ncenters = 3, EOFs = FALSE)$data)), + c('lat', 'lon', 'cluster')) + + data1 <- 1 : 400 + dim(data1) <- c(sdate = 2, ftime = 10, lat = 5, lon = 4) + data1 <- list(data = data1, lat = 1:5) + class(data1) <- 's2dv_cube' + nclusters <- 3 + + expect_equal( + dim(CST_WeatherRegimes(data = data1 , + ncenters = nclusters, + EOFs = FALSE)$statistics$frequency), c(2, nclusters)) + expect_equal( + names(dim(CST_WeatherRegimes(data = data1, nclusters, EOFs = FALSE)$data)), + c('lat', 'lon', 'cluster')) + + data1 <- 1 : 400 + dim(data1) <- c(sdate = 2, ftime = 10, lat = 5, lon = 4) + data1 <- list(data = data1, lat = 1:5 ,lon = 1:4) + class(data1) <- 's2dv_cube' + + expect_equal( + names(CST_WeatherRegimes(data = data1 , ncenters = 4)$statistics), + c('pvalue', 'cluster', 'frequency', 'persistence')) + + expect_equal( + names(CST_WeatherRegimes(data = data1 , ncenters = 4, method = 
'ward.D')$statistics), + c('pvalue', 'cluster')) + + expect_equal( + names(dim(CST_WeatherRegimes(data = data1, ncenters = 4)$data)), + c('lat', 'lon', 'cluster')) + + data1 <- 1 : 400 + dim(data1) <- c(time = 20, lat = 5, lon = 4) + data1[4,,] <- NA + data1 <- list(data = data1, lat = 1:5 ,lon = 1:4) + class(data1) <- 's2dv_cube' + expect_error( + CST_WeatherRegimes(data = data1, ncenters = 3, EOFs = FALSE), + paste0("Parameter 'data' contains NAs in the 'time' dimensions.")) + + data1 <- 1 : 400 + dim(data1) <- c(time = 20, lat = 5, lon = 4) + data1[,2,3] <- NA + data1 <- list(data = data1, lat = 1:5 ,lon = 1:4) + class(data1) <- 's2dv_cube' + expect_equal( + any(is.na(CST_WeatherRegimes(data = data1, ncenters = 3, EOFs = FALSE)$data)), + TRUE) + expect_equal( + names(dim(CST_WeatherRegimes(data = data1, ncenters = 3, EOFs = FALSE)$data)), + c('lat', 'lon', 'cluster')) +}) + + + + + diff --git a/tests/testthat/test-PlotPDFsOLE.R b/tests/testthat/test-PlotPDFsOLE.R new file mode 100644 index 0000000000000000000000000000000000000000..bd1279d9df403c795910d5d9b921f00b6df38592 --- /dev/null +++ b/tests/testthat/test-PlotPDFsOLE.R @@ -0,0 +1,121 @@ +context("Generic tests") + +test_that("Sanity checks", { + pdf_1 <- c(1.1,0.6) + attr(pdf_1, "name") <- "NAO1" + dim(pdf_1) <- c(statistic = 2) + pdf_2 <- c(1,0.5) + attr(pdf_2, "name") <- "NAO2" + dim(pdf_2) <- c(statistic = 2) + + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = "plot.png", + width = 30, height = 15, + units = "cm", dpi = '300') , + "Parameter 'dpi' must be numeric.") + + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = "plot.png", + width = 30, height = 15, + units = 20, dpi = 300) , + "Parameter 'units' must be character") + + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = "plot.png", + width = 30, height = 15, + units = "dm", dpi = 300) , + "Parameter 'units' must be equal to 'in', 'cm' or 'mm'.") + + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = "plot.png", + width = 30, height = '15', + units = "cm", dpi = 300) , + "Parameter 'height' must be numeric.") + + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = "plot.png", + width = list(30), height = 15, + units = "cm", dpi = 300) , + "Parameter 'width' must be numeric.") + + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = 0, + width = 30, height = 15, + units = "cm", dpi = 300) , + paste0("Parameter 'plotfile' must be a character string ", + "indicating the path and name of output png file.")) + + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = '3', plotfile = NULL, + width = 30, height = 15, + units = "cm", dpi = 300) , + "Parameter 'nsigma' must be numeric.") + + pdf_1 <- list(1.1,0.6) + attr(pdf_1, "name") <- "NAO1" + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = NULL, + width = 30, height = 15, + units = "cm", dpi = 300) , + "Parameter 'pdf_1' must be an array.") + + pdf_1 <- c('1.1','0.6') + attr(pdf_1, "name") <- "NAO1" + dim(pdf_1) <- c(statistic = 2) + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = NULL, + width = 30, height = 15, + units = "cm", dpi = 300) , + "Parameter 'pdf_1' must be a numeric array.") + + pdf_1 <- c(1.1,0.6) + attr(pdf_1, "name") <- "NAO1" + dim(pdf_1) <- c(2) + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = NULL, + width = 30, height = 15, + units = "cm", dpi = 300) , + paste0("Parameters 'pdf_1' and 'pdf_2' ", + "should have dimmension names.")) + + pdf_1 <- c(1.1,0.6) + attr(pdf_1, "name") <- "NAO1" + dim(pdf_1) <- 
c(statisti = 2) + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = NULL, + width = 30, height = 15, + units = "cm", dpi = 300) , + "Parameter 'pdf_1' must have dimension 'statistic'.") + + pdf_1 <- c(1.1,0.6) + attr(pdf_1, "name") <- "NAO1" + dim(pdf_1) <- c(statistic = 2, model = 1) + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = NULL, + width = 30, height = 15, + units = "cm", dpi = 300) , + "Parameter 'pdf_1' must have only dimension 'statistic'.") + + pdf_1 <- c(1.1, 0.6, 0.2) + attr(pdf_1, "name") <- "NAO1" + dim(pdf_1) <- c(statistic = 3) + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = NULL, + width = 30, height = 15, + units = "cm", dpi = 300) , + paste0("Length of dimension 'statistic'", + "of parameter 'pdf_1' and 'pdf_2' must be equal to 2.")) + + pdf_1 <- c(1.1, 0.6) + attr(pdf_1, "name") <- 12 + dim(pdf_1) <- c(statistic = 2) + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = NULL, + width = 30, height = 15, + units = "cm", dpi = 300) , + paste0("The 'name' attribute of parameter 'pdf_1' must be a character ", + "indicating the name of the variable of parameter 'pdf_1'.")) + + pdf_1 <- c(1.1,0.6) + attr(pdf_1, "name") <- "NAO1" + dim(pdf_1) <- c(statistic = 2) + pdf_2 <- c(1,0.5) + attr(pdf_2, "name") <- 12 + dim(pdf_2) <- c(statistic = 2) + + expect_error(PlotPDFsOLE(pdf_1, pdf_2, nsigma = 3, plotfile = NULL, + width = 30, height = 15, + units = "cm", dpi = 300) , + paste0("The 'name' attribute of parameter 'pdf_2' must be a character ", + "indicating the name of the variable of parameter 'pdf_2'.")) + + + + +}) diff --git a/tests/testthat/test-PlotTriangles4Categories.R b/tests/testthat/test-PlotTriangles4Categories.R new file mode 100644 index 0000000000000000000000000000000000000000..8105cc91388aba7a1ac20f80041907a5d7a2de0d --- /dev/null +++ b/tests/testthat/test-PlotTriangles4Categories.R @@ -0,0 +1,65 @@ +context("Generic tests") +test_that("Sanity checks", { + expect_error( + PlotTriangles4Categories(data = 1:20), + paste0("Parameter 'data' must be an array with three dimensions.")) + + data1 <- array(runif(min = -1, max = 1, n = 30), dim=c(5,3,2)) + expect_error( + PlotTriangles4Categories(data = data1), + paste0("Parameter 'data' must be an array with named dimensions.")) + + data1 <- runif(min = -1, max = 1, n = 30) + dim(data1) <- c(dim1 = 5, dim2 = 2, dim3 = 3) + expect_error( + PlotTriangles4Categories(data = data1), + paste0("Parameter 'data' should contain 'dimx', 'dimy' and 'dimcat' dimension names.")) + + data1 <- runif(min = -1, max = 1, n = 30) + dim(data1) <- c(dimx = 5, dimy =2 , dimcat=3) + expect_error( + PlotTriangles4Categories(data = data1), + paste0("Parameter 'data' should contain a dimcat dimension with length equals + to two or four as only two or four categories can be plotted")) + + data1 <- runif(min = -1, max = 1, n = 30) + data1[5:10] <- NA + dim(data1) <- c(dimx = 5, dimy =3 , dimcat=2) + expect_error( + PlotTriangles4Categories(data = data1), + paste0("Parameter 'data' cannot contain NAs.")) + + data1 <- runif(min = -1, max = 1, n = 30) + dim(data1) <- c(dimx = 5, dimy =3 , dimcat=2) + expect_error( + PlotTriangles4Categories(data = data1,sig_data = 0.5), + paste0("Parameter 'sig_data' array must be logical.")) + + expect_error( + PlotTriangles4Categories(data = data1, sig_data = TRUE), + paste0("Parameter 'sig_data' must be an array with three dimensions.")) + + sig1 <- array(TRUE, dim=c(5,2,3)) + expect_error( + PlotTriangles4Categories(data = data1, sig_data = sig1), + 
paste0("Parameter 'sig_data' must be an array with the same dimensions as 'data'")) + + sig1 <- array(TRUE, dim= c(5,3,2)) + dim(sig1) <- c(dimy = 5, dimx =3 , dimcat=2) + expect_error( + PlotTriangles4Categories(data = data1, sig_data = sig1), + paste0("Parameter 'sig_data' must be an array with the same named dimensions as 'data'.")) + + data1 <- runif(min = -1, max = 1, n = 30) + dim(data1) <- c(dimx = 5, dimy =3 , dimcat=2) + expect_error( + PlotTriangles4Categories(data = data1, lab_legend = c('1','2','3')), + paste0("Parameter 'lab_legend' should contain two or four names.")) + +expect_error( + PlotTriangles4Categories(data = data1, brks=c(-1,0,1),cols=c('blue','red','black')), + paste0("The length of the parameter 'brks' must be one more than 'cols'.")) + + + +}) diff --git a/vignettes/BestEstimateIndex_vignette.Rmd b/vignettes/BestEstimateIndex_vignette.Rmd index e577eeb2cffbfe417333d123d76b48c3f52d8179..e549533f389fec6b3eb074a22cdf9d61109cfdf3 100644 --- a/vignettes/BestEstimateIndex_vignette.Rmd +++ b/vignettes/BestEstimateIndex_vignette.Rmd @@ -1,134 +1,174 @@ ---- -author: "Eroteida Sánchez-García" -date: "`r Sys.Date()`" -output: rmarkdown::html_vignette -vignette: > - %\VignetteEngine{knitr::knitr} - %\VignetteIndexEntry{Achiving Best Estimate Index} - %\usepackage[utf8]{inputenc} ---- - -Achiving the Precipitation Best prediction giving the NAO index -------------------------------------------------------------------- - -The boreal winter precipitation forecast, acumulated from November to March, can be improved by considering the NAO index. The first step is to find the best estimation of winter NAO, given by two Seasonal Forecast System (SFS). The second step is to employ the enhanced NAO index pdf to produce weights for a SFS (it could be the same or a different SFS from the previous one). The third step is to apply this weights to a precipitation field. The methodology has been proved to improve the skill on precipitation forecast in the iberian peninsula given the relation between the winter precipitation and the NAO index in seasonal time scale (Sánchez-García, E., Voces-Aboy, J., Navascués, N., & Rodríguez-Camino, E. (2019). Regionally improved seasonal forecast of precipitation through Best estimation of winter NAO, Adv. Sci. Res., 16, 165174, ). - -This document is aim to ilustrate a practical use of the functions included in CSTools applying this methodology. - - -## Loading packages and data - -Open an R sesion and load library CSTools: - -``` -library(CSTools) -``` -The required data to applied this methodology are: -- the winter index NAO for two (or three) different SFS in a reference period (hindcast) and in a future simulation (forecast), -- the observed (reconstructed) NAO index in the reference period and -- the acumulated precipitation field from a SFS that aims to be improved (hindcast or hindcast and forecast) - -Given the memory limitations, the following example uses synthetic data. - -The first SFS is a dynamical model containing 25 ensemble members and its output will be save in the object `NAO_hind1` for the 20 years reference period and `NAO_fcst1` for a next season forecast. -The second SFS is a empirical model, so, the `NAO_hind2` and `NAO_fcst2` are characterized by a mean and standard deviation saved in the 'statistics' dimension. 
- -The synthetic data is created by running the following lines: - - -``` -NAO_obs <- rnorm(3, sd=3) -dim(NAO_obs) <- c(time = 3) -NAO_hind1 <- rnorm(3 * 6, sd=3) -dim(NAO_hind1) <- c(time = 3, member = 6) -NAO_hind2_mean <- rnorm(3, sd=3) -NAO_hind2_sd <- rnorm(3, mean=5, sd=1) -NAO_hind2 <- cbind(NAO_hind2_mean,NAO_hind2_sd) -dim(NAO_hind2) <- c(time=3, statistic=2) - -``` - -The acumulated precipiation field (only march for memory limiations) is loaded by running: - -``` -lonlat_prec$data <- apply(lonlat_prec$data, c(1, 2, 3, 5, 6), sum) -``` - - -## 1- Best Estimate Index NAO - -The function `PDFBest` performes the next steps: - - improves the PDF NAO index for each SFS (it could use bias correction method) and - - does a statistical combination of these improved NAO indexes. -Its output is an array containing the parameters (mean and starndar deviation) of the PDF for the reference period. - - -``` -pdf_hind_best <- BEI_PDFBest(NAO_obs, NAO_hind1, NAO_hind2, index_fcst1 = NULL, - index_fcst2 = NULL, method_BC = 'none', - time_dim_name = 'time', na.rm = FALSE) -``` - - -## 2- Compute weights using the Best Estimation of Index NAO - - -An array of weights is calculated for a SFS. This SFS could be a different SFS than the ones in section 1. -The function WeightIndex computes these weights for each ensemble's member based on the best NAO PDF estimate. - - -``` -weights_hind <- BEI_Weights(NAO_hind1, pdf_hind_best) -names(dim(weights_hind))[1] <- 'sdate' -``` - -The expected dimensions of these weights are 'member' and temporal dimension. - - -## 3- Apply weights to a precipitation field - - -The function `CST_BEI_Weighting` computes the ensemble mean or the terciles probabilities for a climate variable. -The corrected precipitation field is obtained by running: - -``` -em_prec <- CST_BEI_Weighting(lonlat_prec, weights_hind, type = 'ensembleMean', time_dim_name = 'sdate') -prob_prec <- CST_BEI_Weighting(lonlat_prec, weights_hind, type = 'probs', ) -``` - -## Comparison and visualization - -The original model output can be compared against the BEI corrected field. -To do this a equiprobability weights are created and they are applied to the original precipitation field: - - -``` -aweights_raw <- rep(1/6, 3 * 6) -dim(aweights_raw) <- dim(weights_hind) - -em_prec_raw <- CST_BEI_Weighting(lonlat_prec, aweights_raw, type = 'ensembleMean', time_dim_name = 'sdate') -prob_prec_raw <- CST_BEI_Weighting(lonlat_prec, aweights_raw, type = 'probs', time_dim_name = 'sdate') - -``` - -A map with the probability that the total precipitation will be in the lower/normal/upper tercile based on the Best Estimate Index NAO could be obtained using the 'PlotEquiMap' function or 'PlotMostLikelyQuantileMap' function from 'CSToools' package. - -The following figures show the probabilities for lower tercile for precipitation from November to March 2012/13 for ECMWF S5 system applying the methodology exposed or not, obtained using real data: -- NAO indices from the ECMWF-S5 dynamical model and the S-ClimWaRe empirical model from AEMET from 1997 to 2016 for computing the Best Estimation of Index NAO fro this hindcast period. -- The winter precipitation (from November to March) from 1997 to 2016 over Iberia Peninsula from he ECMWF-S5 dynamical model with resolution 0.5º x 0.5º, to weighting with the previous Best Estimation of Index NAO. 
-
-![](./Figures/BestEstimateIndex_fig1.png){width=70%}
-
-![](./Figures/BestEstimateIndex_fig3.png){width=70%}
-
-
-
-In a similar way we can plot the map with the probability that the total precipitation from November 2012 to
-March 2013, for example, will be in the lower tercile from ECMWF Seasonal Forecast System 5 (raw) to compare results running the code:
-
-
-
-![](./Figures/BestEstimateIndex_fig2.png){width=70%}
-
+---
+author: "Eroteida Sánchez-García"
+date: "`r Sys.Date()`"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteEngine{knitr::knitr}
+  %\VignetteIndexEntry{Achieving Best Estimate Index}
+  %\usepackage[utf8]{inputenc}
+---
+
+Achieving the best precipitation prediction given the NAO index
+-------------------------------------------------------------------
+
+The boreal winter precipitation forecast, accumulated from November to March, can be improved by taking the NAO index into account. The first step is to find the best estimate of the winter NAO, given by two Seasonal Forecast Systems (SFSs). The second step is to use the improved NAO index PDF to produce weights for an SFS (which can be the same as or different from the previous ones). The third step is to apply these weights to a precipitation field. This methodology has been shown to improve the skill of the precipitation forecast over the Iberian Peninsula, given the relation between winter precipitation and the NAO index at the seasonal time scale (Sánchez-García, E., Voces-Aboy, J., Navascués, N., & Rodríguez-Camino, E. (2019). Regionally improved seasonal forecast of precipitation through Best estimation of winter NAO, Adv. Sci. Res., 16, 165-174).
+
+This document aims to illustrate a practical use of the functions included in CSTools to apply this methodology.
+
+
+## Loading packages and data
+
+Open an R session and load the CSTools library:
+
+```
+library(CSTools)
+```
+The data required to apply this methodology are:
+- the observed (reconstructed) NAO index in the reference period,
+- the winter NAO index for two different SFSs (SFS1 and SFS2, to be combined) in a reference period (hindcast) and in a future simulation (forecast), and
+- the winter NAO index and the accumulated precipitation field from the SFS that is to be improved (hindcast and forecast).
+
+Given the memory limitations, the following example uses synthetic data.
+
+The SFS1 system is a dynamical model containing 25 ensemble members; its output will be saved in the object `NAO_hind1` for the 20-year reference period and in `NAO_fcst1` for the forecast of the next season.
+The second SFS, SFS2, is an empirical model, so `NAO_hind2` and `NAO_fcst2` are characterized by a mean and a standard deviation saved in the 'statistic' dimension.
+The model to be improved is a dynamical model containing 25 ensemble members.
+The synthetic data is created by running the following lines:
+
+
+```
+# observations
+NAO_obs <- rnorm(20, sd=3)
+dim(NAO_obs) <- c(time = 20)
+
+# hindcast and forecast of a dynamical SFS 1
+NAO_hind1 <- rnorm(20 * 2 * 25, sd=2.5)
+dim(NAO_hind1) <- c(time = 20, member = 50)
+NAO_fcst1 <- rnorm(2*51, sd=2.5)
+dim(NAO_fcst1) <- c(time = 1, member = 102)
+
+# hindcast and forecast of an empirical SFS 2
+NAO_hind2_mean <- rnorm(20, sd=3)
+NAO_hind2_sd <- rnorm(20, mean=5, sd=1)
+NAO_hind2 <- cbind(NAO_hind2_mean, NAO_hind2_sd)
+dim(NAO_hind2) <- c(time=20, statistic=2)
+NAO_fcst2_mean <- rnorm(1, sd=3)
+NAO_fcst2_sd <- rnorm(1, mean=5, sd=1)
+NAO_fcst2 <- cbind(NAO_fcst2_mean, NAO_fcst2_sd)
+dim(NAO_fcst2) <- c(time=1, statistic=2)
+
+```
+
+The winter NAO index and the accumulated precipitation field from the dynamical SFS to be improved can be created by running:
+
+```
+# NAO index of the SFS used to compute weights for each ensemble member
+NAO_hind <- rnorm(20 * 25, sd=2.5)
+dim(NAO_hind) <- c(time = 20, member = 25)
+NAO_fcst <- rnorm(51, sd=2.5)
+dim(NAO_fcst) <- c(time = 1, member = 51)
+
+# The accumulated precipitation field
+prec_hind <- rnorm(20 * 25 * 21 * 31, mean=30, sd=10)
+dim(prec_hind) <- c(time = 20, member = 25, lat = 21, lon = 31)
+prec_hind <- list(data = prec_hind)
+class(prec_hind) <- 's2dv_cube'
+prec_fcst <- rnorm(51 * 21 * 31, mean=25, sd=8)
+dim(prec_fcst) <- c(time = 1, member = 51, lat = 21, lon = 31)
+prec_fcst <- list(data = prec_fcst)
+class(prec_fcst) <- 's2dv_cube'
+
+```
+
+
+## 1- Best Estimate Index NAO
+
+The function `BEI_PDFBest` performs the following steps:
+ - it improves the PDF of the NAO index for each SFS (optionally applying a bias correction method) and
+ - it statistically combines these improved NAO indices.
+Its output is an array containing the parameters (mean and standard deviation) of the PDF for the reference period (hindcast) or the forecast period.
+
+
+```
+# for hindcast
+pdf_hind_best <- BEI_PDFBest(NAO_obs, NAO_hind1, NAO_hind2, index_fcst1 = NULL,
+                             index_fcst2 = NULL, method_BC = 'none',
+                             time_dim_name = 'time', na.rm = FALSE)
+# for forecast
+pdf_fcst_best <- BEI_PDFBest(NAO_obs, NAO_hind1, NAO_hind2, index_fcst1 = NAO_fcst1,
+                             index_fcst2 = NAO_fcst2, method_BC = 'none',
+                             time_dim_name = 'time', na.rm = FALSE)
+```
+
+
+## 2- Compute weights using the Best Estimation of Index NAO
+
+
+An array of weights is calculated for the SFS. This SFS can be the same as or different from the ones used in section 1.
+The function `BEI_Weights` computes these weights for each ensemble member based on the best NAO PDF estimate.
+
+
+```
+# for hindcast
+weights_hind <- BEI_Weights(NAO_hind, pdf_hind_best)
+# for forecast
+weights_fcst <- BEI_Weights(NAO_fcst, pdf_fcst_best)
+
+```
+
+The expected dimensions of these weights are 'member' and the temporal dimension.
+
+
+## 3- Apply weights to a precipitation field
+
+
+The function `CST_BEI_Weighting` computes the ensemble mean or the tercile probabilities for a climate variable.
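Conceptually, the weighting applied below is a weighted average across ensemble members. The following toy sketch uses made-up numbers and is not part of the BEI functions; it only illustrates what happens at a single grid point and start date:

```
# Toy illustration only (hypothetical values, not the CSTools internals)
member_precip  <- c(10, 20, 30)     # precipitation of three hypothetical members
member_weights <- c(0.5, 0.3, 0.2)  # hypothetical BEI weights (they sum to 1)
sum(member_precip * member_weights) # weighted ensemble mean: 17
```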
+The ensemble mean and the tercile probabilities of the weighted precipitation field are obtained by running:
+
+```
+# for hindcast
+em_prec_hind <- CST_BEI_Weighting(prec_hind, weights_hind, type = 'ensembleMean')
+prob_prec_hind <- CST_BEI_Weighting(prec_hind, weights_hind, type = 'probs')
+
+# for forecast
+em_prec_fcst <- CST_BEI_Weighting(prec_fcst, weights_fcst, type = 'ensembleMean')
+prob_prec_fcst <- CST_BEI_Weighting(prec_fcst, weights_fcst, type = 'probs')
+
+```
+
+## Comparison and visualization
+
+The original model output can be compared against the BEI-corrected field.
+To do this, equiprobability weights are created for the hindcast period and applied to the original precipitation field:
+
+
+```
+aweights_raw <- rep(1/25, 20 * 25)
+dim(aweights_raw) <- dim(weights_hind)
+
+em_prec_raw <- CST_BEI_Weighting(prec_hind, aweights_raw, type = 'ensembleMean')
+prob_prec_raw <- CST_BEI_Weighting(prec_hind, aweights_raw, type = 'probs')
+
+```
+
+A map with the probability that the total precipitation will be in the lower/normal/upper tercile based on the Best Estimate Index NAO can be obtained using the 'PlotEquiMap' or 'PlotMostLikelyQuantileMap' functions from the 'CSTools' package.
+
+The following figures show the probabilities of the lower tercile for precipitation from November to March 2012/13 for the ECMWF S5 system, with and without applying the methodology described above, obtained using real data:
+- NAO indices from the ECMWF-S5 dynamical model and the S-ClimWaRe empirical model from AEMET, from 1997 to 2016, used to compute the Best Estimation of Index NAO for this hindcast period.
+- The winter precipitation (from November to March) from 1997 to 2016 over the Iberian Peninsula from the ECMWF-S5 dynamical model at 0.5º x 0.5º resolution, to be weighted with the previous Best Estimation of Index NAO.
+
+
+![](./Figures/BestEstimateIndex_fig1.png){width=70%}
+
+![](./Figures/BestEstimateIndex_fig3.png){width=70%}
+
+
+
+In a similar way, we can plot the map with the probability that the total precipitation from November 2012 to
+March 2013, for example, will be in the lower tercile from the ECMWF Seasonal Forecast System 5 (raw), to compare the results:
+
+
+
+![](./Figures/BestEstimateIndex_fig2.png){width=70%}
+
 ![](./Figures/BestEstimateIndex_fig4.png){width=70%}
\ No newline at end of file
diff --git a/vignettes/ENSclustering_vignette.Rmd b/vignettes/ENSclustering_vignette.Rmd
new file mode 100644
index 0000000000000000000000000000000000000000..af8d5c6fd2fac1820509ff1426bef452bf861379
--- /dev/null
+++ b/vignettes/ENSclustering_vignette.Rmd
@@ -0,0 +1,225 @@
+---
+author: "Ignazio Giuntoli and Federico Fabiano - CNR-ISAC"
+date: "`r Sys.Date()`"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteEngine{knitr::knitr}
+  %\VignetteIndexEntry{Ensemble Clustering}
+  %\usepackage[utf8]{inputenc}
+---
+
+Ensemble clustering
+-----------------------------------------
+
+### Introduction
+Ensemble forecasts provide insight into average weather conditions from sub-seasonal to seasonal timescales. With large ensembles, it is useful to group members according to similar characteristics and to select the most representative member of each cluster. This makes it possible to characterize forecast scenarios in a multi-model (or single-model) ensemble prediction. In particular, at the regional level, this function can be used to identify the subset of ensemble members that best represent the full range of possible outcomes, which may in turn be fed to downscaling applications.
The choice of the ensemble members can be done by selecting different options within the function in order to meet the requirements of specific climate information products that can be tailored to different regions and user needs.
+
+For a detailed description of the tool, please refer to the CSTools guide:
+
+### Steps of the vignette
+
+This vignette consists of six main steps: 1. Preliminary setup, 2. Loading the data, 3. Launching Ensemble clustering, 4. Results retrieval, 5. Results mapping, 6. Final notes.
+
+### 1. Preliminary setup
+
+To run this vignette, install and load the *CSTools* library
+
+```r
+install.packages("CSTools")
+library(CSTools)
+```
+
+and the following packages:
+
+```r
+install.packages("s2dv")
+library(s2dv)
+```
+
+### 2. Loading the data
+
+For our example we will use the sample seasonal temperature data provided within the CSTools package.
+Data can be loaded as follows:
+
+```r
+datalist <- CSTools::lonlat_data$exp
+```
+
+The data have the following dimensions:
+
+```r
+dim(datalist$data)
+dataset  member   sdate   ftime     lat     lon
+      1      15       6       3      22      53
+```
+
+Therefore, the number of members is 15, the number of start dates is 6, and the number of forecast time steps is 3. The lat and lon dimensions refer to a 22x53 grid.
+
+### 3. Launching Ensemble clustering
+
+Prior to launching EnsClustering, let's define the number of clusters, e.g., 4:
+
+```r
+numcl = 4
+```
+
+Let's launch the clustering using 4 clusters (numclus), 4 EOFs (numpcs), 'mean' as the moment to apply to the time dimension (time_moment), and both members and start dates, i.e. 'c("member", "sdate")', as the dimensions over which the clustering is performed (cluster_dim).
+
+```r
+results <- CST_EnsClustering(datalist, numclus = numcl, numpcs = 4,
+                             time_moment = 'mean', cluster_dim = c('member', 'sdate'))
+```
+
+EnsClustering produces the following outputs, saved in the object results:
+
+```r
+names(results)
+#[1] "cluster"        "freq"           "closest_member" "repr_field"
+#[5] "composites"     "lat"            "lon"
+
+```
+
+where:
+
+- $cluster contains the cluster assigned to each 'member' and 'sdate'
+- $freq contains the relative frequency of each 'cluster'
+- $closest_member contains the representative member of each 'cluster'
+- $repr_field contains the fields of the representative member of each 'cluster', over 'lat' and 'lon'
+- $composites contains the list of mean fields for 'cluster', 'lat' and 'lon'
+- $lat contains the 22 grid point latitudes
+- $lon contains the 53 grid point longitudes
+
+So, for instance, the overall frequency per cluster can be displayed by querying '$freq' in 'results', obtaining:
+
+```r
+results$freq
+
+         [,1]
+[1,] 35.55556
+[2,] 27.77778
+[3,] 21.11111
+[4,] 15.55556
+```
+
+Further, the cluster number to which each 'member - start-date' pair is assigned can be displayed by querying '$cluster' in 'results', as shown below (the members (15) are in rows and the start dates (6) are in columns, i.e. 15*6 pairs):
+
+```r
+results$cluster
+
+      [,1] [,2] [,3] [,4] [,5] [,6]
+ [1,]    3    2    1    1    4    2
+ [2,]    2    4    4    2    3    1
+ [3,]    3    3    1    3    4    2
+ [4,]    2    1    1    2    1    1
+ [5,]    4    4    1    3    3    1
+ [6,]    4    1    1    2    4    1
+ [7,]    1    2    1    2    2    1
+ [8,]    4    2    1    2    4    1
+ [9,]    3    3    1    1    4    3
+[10,]    1    2    1    2    4    1
+[11,]    3    3    2    3    3    1
+[12,]    3    1    2    1    2    2
+[13,]    2    1    1    3    3    1
+[14,]    2    2    4    1    1    2
+[15,]    3    4    1    2    3    2
+```
+
+### 4. Results retrieval
+
+Let's keep in mind that the aim is to identify the 'member - start-date' pairs to be retained in order to reduce the ensemble while keeping a good representativeness.
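As a quick check of how the 90 pairs spread over the four clusters, the assignments shown above can be tabulated with base R (an optional step; dividing the counts by 90 and multiplying by 100 should reproduce `results$freq` up to rounding):

```r
# Count how many 'member - start-date' pairs fall in each cluster
table(results$cluster)
```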
+To achieve this we can get an idea of how the clustering has performed by looking at the average cluster spatial patterns, which are found in the 'composites' element of 'results' and have the following dimensions (we will map them in step 5):
+
+```r
+dim(results$composites)
+cluster     lat     lon dataset
+      4      22      53       1
+```
+
+while the 'repr_field' element of 'results' provides the spatial pattern of the member lying closest to the centroid (it has the same dimensions as 'composites'):
+
+```r
+dim(results$repr_field)
+cluster     lat     lon dataset
+      4      22      53       1
+```
+
+Finally, the 'member - start-date' pairs to be picked as representative of each cluster are found in the 'closest_member' element.
+
+```r
+results$closest_member
+
+$member
+     [,1]
+[1,]    6
+[2,]    1
+[3,]   11
+[4,]    6
+
+$sdate
+     [,1]
+[1,]    6
+[2,]    6
+[3,]    1
+[4,]    5
+```
+
+### 5. Results mapping
+
+The following lines produce a multiplot of the representative temperature anomaly spatial pattern of each cluster (Figure 1).
+These are the realizations (member - start-date pairs) closest to the cluster centroids, stored in 'repr_field' as noted above.
+
+```r
+EnsMean <- MeanDims(datalist$data, c('member', 'sdate', 'ftime'))
+EnsMean <- InsertDim(Reorder(EnsMean, c("lat", "lon", "dataset")),
+                     posdim = 1, lendim = 4, name = 'cluster')
+
+PlotLayout(PlotEquiMap, plot_dims = c("lat", "lon"),
+           var = results$repr_field[,,,1] - EnsMean[,,,1],
+           lon = results$lon, lat = results$lat, filled.continents = FALSE,
+           titles = c("1","2","3","4"), brks = seq(-2, 2, 0.5),
+           fileout = "EnsClus_4clus_both_mem_std_Fig1.png")
+```
+
+![Figure 1 - Representative temperature anomalies of each cluster.](./Figures/EnsClus_4clus_both_mem_std_Fig1.png){width=100%}
+
+
+
+The lines below produce a multiplot of the temperature anomaly patterns of each 'member - start-date' pair and the respective cluster to which they are assigned (Figure 2). We limit the multiplot to the first 24 out of 90 combinations (i.e. 15 members * 6 start_dates = 90) for figure clarity.
+
+```r
+ExpMean <- MeanDims(datalist$data, 'ftime')
+EnsMean <- InsertDim(InsertDim(InsertDim(EnsMean[1,,,],
+                     posdim = 1, lendim = 6, name = 'sdate'),
+                     posdim = 1, lendim = 15, name = 'member'),
+                     posdim = 1, lendim = 1, name = 'dataset')
+ExpMeanSd <- Reorder(ExpMean - EnsMean, c('dataset', 'sdate', 'member', 'lat', 'lon'))
+PlotLayout(PlotEquiMap, plot_dims = c("lat", "lon"),
+           var = ExpMeanSd[, , 1:4, , ], title_scale = 0.7,
+           ncol = 6, nrow = 4, row_titles = paste('member', 1:4), col_titles = paste('sdate', 1:6),
+           lon = results$lon, lat = results$lat, filled.continents = FALSE,
+           titles = as.character(t(results$cluster[1:4, 1:6,])), brks = seq(-2, 2, 0.5),
+           width = 24, height = 20, size_units = 'cm',
+           fileout = "EnsClus_4clus_both_mem_std_Fig2.png")
+```
+
+![Figure 2 - Temperature anomaly pattern per 'member - start_date' pair, with the cluster to which it is assigned displayed as the map title.](./Figures/EnsClus_4clus_both_mem_std_Fig2.png){width=100%}
+
+### Final notes
+
+It should be noted that the clustering can be carried out with respect to 'members' or 'start-dates' alone, in which case the selection is no longer done over all 'member - start-date' pairs (90) but along the 15 members (with all start-dates) or along the 6 start-dates (with all members).
+The clustering command with respect to 'members' is listed below: + +```r +print('clustering over members') +results_memb <- CST_EnsClustering(datalist, numclus = numcl, numpcs = 4, time_moment = 'mean', cluster_dim = 'member') +``` + +Also, in addition to 'mean', the 'time_moment' can be set to 'sd', i.e. standard deviation along time, or 'perc', i.e. a percentile along time (in which case the 'time_percentile' argument is also needed). The latter option 'perc' may be useful when considering extremes, so for instance while analysing extreme temperatures with a high percentile (e.g. 90). +The clustering command with respect to 'member - start-dates' pairs over the 90th percentile of temperature (i.e. high temperatures) is listed below: + +```r +print('clustering using time_percentile') +results_perc <- CST_EnsClustering(datalist, numclus = numcl, numpcs = 4, time_moment = 'perc', time_percentile = 90, cluster_dim = c('member', 'sdate')) +``` + +Finally, the parameter 'time_dim' lets the user specify the dimension(s) over which the 'time_moment' is applied (i.e. mean, sd, perc). If 'time_dim' is not specified, the dimension is sought automatically picking the first between 'ftime', 'sdate', and 'time'. diff --git a/vignettes/Figures/EnsClus_4clus_both_mem_std_Fig1.png b/vignettes/Figures/EnsClus_4clus_both_mem_std_Fig1.png new file mode 100644 index 0000000000000000000000000000000000000000..9bfcb5f7777d671db02852407e2ba69bf1997885 Binary files /dev/null and b/vignettes/Figures/EnsClus_4clus_both_mem_std_Fig1.png differ diff --git a/vignettes/Figures/EnsClus_4clus_both_mem_std_Fig2.png b/vignettes/Figures/EnsClus_4clus_both_mem_std_Fig2.png new file mode 100644 index 0000000000000000000000000000000000000000..ed3938e4be4b1498e69302090f2ae364fb6f015f Binary files /dev/null and b/vignettes/Figures/EnsClus_4clus_both_mem_std_Fig2.png differ diff --git a/vignettes/Figures/Obs_Persistence.png b/vignettes/Figures/Obs_Persistence.png new file mode 100644 index 0000000000000000000000000000000000000000..6e4c33058489ed0491441ea7d77179861cbd1ebc Binary files /dev/null and b/vignettes/Figures/Obs_Persistence.png differ diff --git a/vignettes/Figures/PlotForecastPDF_ex1.png b/vignettes/Figures/PlotForecastPDF_ex1.png new file mode 100644 index 0000000000000000000000000000000000000000..d7067133054902e6b661f828f8a89b0bb285aa68 Binary files /dev/null and b/vignettes/Figures/PlotForecastPDF_ex1.png differ diff --git a/vignettes/Figures/PlotForecastPDF_ex2.png b/vignettes/Figures/PlotForecastPDF_ex2.png new file mode 100644 index 0000000000000000000000000000000000000000..67b4af1e1b523c564047fb545baecc11a4dbac2c Binary files /dev/null and b/vignettes/Figures/PlotForecastPDF_ex2.png differ diff --git a/vignettes/Figures/PlotForecastPDF_ex3.png b/vignettes/Figures/PlotForecastPDF_ex3.png new file mode 100644 index 0000000000000000000000000000000000000000..10b1e63b4766e8aff8d10419b849d8677f197c58 Binary files /dev/null and b/vignettes/Figures/PlotForecastPDF_ex3.png differ diff --git a/vignettes/Figures/PlotForecastPDF_ex4.png b/vignettes/Figures/PlotForecastPDF_ex4.png new file mode 100644 index 0000000000000000000000000000000000000000..34df30d73e73f1d1d8e5ff4c34e57c2d487f4c2f Binary files /dev/null and b/vignettes/Figures/PlotForecastPDF_ex4.png differ diff --git a/vignettes/Figures/observed_regimes.png b/vignettes/Figures/observed_regimes.png new file mode 100644 index 0000000000000000000000000000000000000000..1bb134a108ea158a23e743b3d4d0ca94634ff47e Binary files /dev/null and 
b/vignettes/Figures/observed_regimes.png differ diff --git a/vignettes/Figures/predicted_regimes.png b/vignettes/Figures/predicted_regimes.png new file mode 100644 index 0000000000000000000000000000000000000000..354afb7db5ff6d9d219ad6445abcd0976070d6f6 Binary files /dev/null and b/vignettes/Figures/predicted_regimes.png differ diff --git a/vignettes/MultiModelSkill_vignette.Rmd b/vignettes/MultiModelSkill_vignette.Rmd index 8b4c6a7c25e69f53ebb8c92df217e7f8f8fda4e3..734652fe67fdd3416825c40a7cf9d81673dd740f 100644 --- a/vignettes/MultiModelSkill_vignette.Rmd +++ b/vignettes/MultiModelSkill_vignette.Rmd @@ -155,7 +155,7 @@ While other relevant data is being stored in the corresponding element of the ob [1] "glosea5" "ecmwf/system4_m1" "meteofrance/system5_m1" "erainterim" ``` -In the second element of the list `AnomDJF`, `metric`, the third dimension contains the lower limit of the 95% confidence interval, the correlation, the upper limit of the 95% confidence interval and the 95% significance level given by a one-sided T-test. +In the element $data of the `AnomDJF` object, the third dimension contains the lower limit of the 95% confidence interval, the correlation, the upper limit of the 95% confidence interval and the 95% significance level given by a one-sided T-test. ```r diff --git a/vignettes/PlotForecastPDF.Rmd b/vignettes/PlotForecastPDF.Rmd index c6d8961828e70e8a4b17b4bb6a4144258402d140..98b2ae144003e7d876689a827325cdf0eadf9fdc 100644 --- a/vignettes/PlotForecastPDF.Rmd +++ b/vignettes/PlotForecastPDF.Rmd @@ -1,5 +1,5 @@ --- -author: "Francesc Roura and Llorenç Lledó" +author: "Francesc Roura-Adserias and Llorenç Lledó" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > @@ -23,8 +23,9 @@ library(CSTools) The first step is to put your forecasts in an appropriate format. For this vignette we generate some random values from two normal distributions. The PlotForecastPDF by default will plot the ensemble members, the estimated density distributions and the tercile probabilities. ```{r,fig.show = 'hide',warning=F} -fcst <- data.frame(fcst1=rnorm(mean=25,sd=3,n=30),fcst2=rnorm(mean=23,sd=4.5,n=30)) -PlotForecastPDF(fcst,tercile.limits=c(20,26)) +fcst <- data.frame(fcst1 = rnorm(mean = 25, sd = 3, n = 30), + fcst2 = rnorm(mean = 23, sd = 4.5, n = 30)) +PlotForecastPDF(fcst, tercile.limits = c(20, 26)) ``` ![Example 1](./Figures/PlotForecastPDF_ex1.png) @@ -33,15 +34,47 @@ PlotForecastPDF(fcst,tercile.limits=c(20,26)) Changing the title, the forecast labels or the units will be needed in most cases. ```{r,fig.show = 'hide',warning=F} -fcst <- data.frame(fcst1=rnorm(mean=25,sd=3,n=30),fcst2=rnorm(mean=23,sd=4.5,n=30)) -PlotForecastPDF(fcst,tercile.limits=c(20,26),var.name="Temperature (ºC)",title="Forecasts valid on 2019-01-01 at Sunny Hills",fcst.names = c("model a","model b")) +fcst <- data.frame(fcst1 = rnorm(mean = 25, sd = 3, n = 30), + fcst2 = rnorm(mean = 23, sd = 4.5, n = 30)) +PlotForecastPDF(fcst, tercile.limits = c(20, 26), var.name = "Temperature (ºC)", + title = "Forecasts valid on 2019-01-01 at Sunny Hills", + fcst.names = c("model a", "model b")) ``` ![Example 2](./Figures/PlotForecastPDF_ex2.png) ### 3.- Adding extremes and observed values We can add the probability of extreme values and the observed values. The tercile and extreme limits can be specified for each panel separately, as well as the observed values. 
+ +```{r,fig.show = 'hide',warning=F} +fcst <- data.frame(fcst1 = rnorm(mean = 25, sd = 3, n = 30), + fcst2 = rnorm(mean = 28, sd = 4.5, n = 30), fcst3 = rnorm(mean = 17, sd = 3, n = 30)) +PlotForecastPDF(fcst, tercile.limits = rbind(c(20, 26), c(22, 28), c(15, 22)), + var.name = "Temperature (ºC)", title = "Forecasts at Sunny Hills", + fcst.names = c("January", "February", "March"), obs = c(21, 24, 17), + extreme.limits = rbind(c(18, 28), c(20, 30), c(12, 24))) +``` + +The same example using a forecast in array format is provided. ```{r,fig.show = 'hide',warning=F} -fcst <- data.frame(fcst1=rnorm(mean=25,sd=3,n=30),fcst2=rnorm(mean=28,sd=4.5,n=30),fcst3=rnorm(mean=17,sd=3,n=30)) -PlotForecastPDF(fcst,tercile.limits=rbind(c(20,26),c(22,28),c(15,22)),var.name="Temperature (ºC)",title="Forecasts at Sunny Hills",fcst.names = c("January","February","March"),obs=c(21,24,17),extreme.limits = rbind(c(18,28),c(20,30),c(12,24))) +fcst <- array(cbind(cbind(rnorm(mean = 25, sd = 3, n = 30), + rnorm(mean = 23, sd = 4.5, n = 30)), rnorm(mean = 17, sd = 3, n = 30)), + dim = c(members = 30, 3)) +PlotForecastPDF(fcst, tercile.limits = rbind(c(20, 26), c(22, 28), c(15, 22)), + var.name = "Temperature (ºC)", title = "Forecasts at Sunny Hills", + fcst.names = c("January", "February", "March"), obs = c(21, 24, 17), + extreme.limits = rbind(c(18, 28), c(20, 30), c(12, 24))) ``` ![Example 3](./Figures/PlotForecastPDF_ex3.png) + +### 4.- Example using lonlat_data +An example using the lonlat_data from CSTools is provided. + +```{r,fig.show = 'hide',warning=F} +fcst <- data.frame(fcst1 = lonlat_data$exp$data[1,,1,1,1,1] - 273.15, + fcst2 = lonlat_data$exp$data[1,,1,2,1,1] - 273.15) +PlotForecastPDF(fcst, tercile.limits = c(5, 7), extreme.limits = c(4, 8), + var.name = "Temperature (ºC)", + title = "Forecasts valid on 2000-11 at sample mediterranean region", + fcst.names = c("November", "December")) +``` +![Example 4](./Figures/PlotForecastPDF_ex4.png) diff --git a/vignettes/WeatherRegimes_vignette.Rmd b/vignettes/WeatherRegimes_vignette.Rmd new file mode 100644 index 0000000000000000000000000000000000000000..62e4883db6b55406f5100f7d6cb4e9187859f3d2 --- /dev/null +++ b/vignettes/WeatherRegimes_vignette.Rmd @@ -0,0 +1,164 @@ +--- +author: "Verónica Torralba, Nicola Cortesi and Núria Pérez-Zanón" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteEngine{knitr::knitr} + %\VignetteIndexEntry{Weather Regime Analysis} + %\usepackage[utf8]{inputenc} +--- + + +Weather regime analysis +======================== + +Weather regimes are a set of patterns able to characterize the different circulation structures occurring each month/season. This vignette aims to illustrate a step-by-step example of how to perform a weather regime assessment using CSTools functionalities. + +### 1- Required packages + +The functions to compute the Weather Regimes are part of CSTools while plotting functions are included in s2dverification package: + +```r +library(CSTools) +library(s2dverification) +library(zeallot) +``` + + +### 2- Retrive data from files + +The data employed in this example are described below. +- Sea level pressure (psl): this has been selected as the circulation variable, however other variables such as geopotential at 500 hPa can be also used. +- Region: Euro-Atlantic domain [85.5ºW-45ºE; 27-81ºN]. +- Datasets: seasonal predictions from ECMWF System 4 ([**Molteni et al. 
2011**](https://www.ecmwf.int/sites/default/files/elibrary/2011/11209-new-ecmwf-seasonal-forecast-system-system-4.pdf)) and ERA-Interim reanalysis ([**Dee et al. 2011**](http://onlinelibrary.wiley.com/doi/10.1002/qj.828/pdf)) as a reference dataset.
+- Period: 1991-2010. Only 20 years have been selected for illustrative purposes, but the full hindcast period could be used for the analysis.
+
+
+```r
+sdates <- paste0(1991:2010, '1201')
+c(exp, obs) %<-% CST_Load(var = 'psl', exp = 'system4_m1',
+                          obs = 'erainterim',
+                          sdates = sdates,
+                          storefreq = 'daily',
+                          leadtimemin = 1, leadtimemax = 31,
+                          latmin = 27, latmax = 81,
+                          lonmin = 274.5, lonmax = 45, output = 'lonlat')
+```
+
+
+Notice that you need the files to be stored locally on your computer or server, with the correct configuration file. If you are interested in running this vignette, contact nuria.perez at bsc.es to get a data sample.
+
+The objects returned by `CST_Load()` are of the s2dv_cube class. They contain, among other elements, the array with the requested data.
+
+
+```r
+> dim(exp$data)
+dataset  member   sdate   ftime     lat     lon
+      1      15      20      31      77     186
+> dim(obs$data)
+dataset  member   sdate   ftime     lat     lon
+      1       1      20      31      77     186
+```
+
+
+### 3- Daily anomalies based on a smoothed climatology
+
+The weather regime classification is based on daily anomalies, which have been computed by following these steps:
+
+
+```r
+c(ano_exp, ano_obs) %<-% CST_Anomaly(exp = exp, obs = obs, filter_span = 1)
+```
+
+The LOESS filter has been applied to the climatology to remove the short-term variability and retain the annual cycle. The parameter `filter_span` should be adapted to the length of the period used to compute the climatology (e.g. season, month, week, ...). In this example we are using daily data, so we have selected `filter_span = 1`.
+
+
+### 4- Weather regimes in observations
+
+The `CST_WeatherRegimes()` function is used to define the clusters based on the sea level pressure anomalies from ERA-Interim. This function is based on the [*kmeans function*](http://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html)
+from the stats R package. In this example we have made different assumptions: four clusters (`ncenters = 4`) will be produced, and the Empirical Orthogonal Functions are not used to filter the data (`EOFs = FALSE`), in order to take the extreme values into account. More details about the methodology can be found in Cortesi et al. 2018 (submitted).
+
+
+```r
+WR_obs <- CST_WeatherRegimes(data = ano_obs, EOFs = FALSE, ncenters = 4)
+```
+
+
+`CST_WeatherRegimes()` provides an s2dv_cube object with several elements. In `$data` the four weather regime composites are stored, while `$statistics` contains extra information (`$pvalue`, `$cluster`, `$persistence` and `$frequency`), which are the parameters needed for the weather regime assessment. Further details about the outputs provided by the `CST_WeatherRegimes()` function can be found in the package documentation or by typing `?CST_WeatherRegimes` in the R session.
+
+
+
+### 5- Visualisation of the observed weather regimes
+
+
+To plot the composite maps of each regime and the mean frequencies of each cluster, we have employed the `PlotLayout()` and `PlotEquiMap()` functions available in s2dverification. The object `WR_obs$data` is divided by 100 to change from Pa to hPa. As `WR_obs$statistics$frequency` provides the monthly frequencies, the climatological frequencies are obtained as the average across the 20 years of the monthly frequencies.
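The k-means step starts from random initial centroids, so the exact cluster numbering and frequencies can vary between runs. If exact reproducibility of the figures below is needed, a seed can be fixed before the clustering call; this is an optional sketch, not part of the original workflow:

```r
# Optional: fix the RNG seed so that repeated k-means runs give identical clusters
set.seed(1234)
WR_obs <- CST_WeatherRegimes(data = ano_obs, EOFs = FALSE, ncenters = 4)
```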
Note that these frequencies could change slightly between runs as a consequence of the randomness inherent to the iterative process involved in the k-means.
+
+```r
+clim_frequencies <- paste0('freq = ',
+                           round(Mean1Dim(WR_obs$statistics$frequency, 1), 1), '%')
+PlotLayout(PlotEquiMap, c(1, 2), lon = obs$lon, lat = obs$lat,
+           var = WR_obs$data / 100,
+           titles = paste0(paste0('Cluster ', 1:4), ' (', clim_frequencies, ')'),
+           filled.continents = FALSE,
+           axelab = FALSE, draw_separators = TRUE, subsampleg = 1,
+           brks = seq(-16, 16, by = 2),
+           bar_extra_labels = c(2, 0, 0, 0), fileout = './Figures/observed_regimes.png')
+```
+
+
+
+### 6- Visualisation of the observed regime persistence
+
+Persistence measures the average number of days a regime lasts before evolving to a different regime. To plot the regime persistence for each start date, we have employed the `PlotTriangles4Categories()` function available in CSTools.
+
+```r
+freq_obs <- WR_obs$statistics$persistence
+freq_obs[is.na(freq_obs)] <- 0
+dim(freq_obs) <- c(dimy = 20, dimcat = 4, dimx = 1)
+
+PlotTriangles4Categories(freq_obs, toptitle = 'Persistence',
+                         xtitle = 'Start Dates', ytitle = '', xlab = FALSE,
+                         ylabels = substr(sdates, 1, 4), cex_leg = 0.6,
+                         lab_legend = c('AR', 'NAO-', 'BL', 'NAO+'), figure.width = .7)
+
+```
+
+
+
+
+
+### 7- Weather regimes in the predictions
+
+Predicted anomalies for each day, month, member and lead time are matched with the observed clusters (obtained in step 4). The assignment of the anomalies to a pre-defined set of clusters guarantees that the predicted weather regimes have very similar spatial structures to the observed regimes, which is an essential requirement for the verification of weather regimes. This is an example of how to produce a set of weather regimes based on the predictions that can be verified with the observational dataset, but this approach can also be used in an operational context, in which the probability of occurrence of each cluster could be estimated.
+
+
+The matching is based on the minimization of the Euclidean distance (`method = 'distance'`), but it can also be done in terms of spatial correlation (`method = 'ACC'`). However, the computational efficiency is superior for the distance method.
+
+
+```r
+WR_exp <- CST_RegimesAssign(data = ano_exp, ref_maps = WR_obs, method = 'distance', composite = TRUE, memb = TRUE)
+```
+
+
+`CST_RegimesAssign()` provides an object of the 's2dv_cube' class which lists several elements. The composites are stored in the `$data` element, which in the case of `memb = TRUE` corresponds to the mean composites over all members. `$pvalue`, `$cluster` and `$frequency` are stored in the element `$statistics`. These elements contain similar information to that provided by the `WeatherRegime()` function. Further details about the output can be found in the package documentation.
+
+### 8- Visualisation of the predicted weather regimes
+
+
+The outputs of `RegimesAssign()` are plotted below so that they can be compared with those obtained for the observational classification (step 5).
+
+
+```r
+PlotLayout(PlotEquiMap, c(1, 2), lon = exp$lon, lat = exp$lat,
+           var = WR_exp$data / 100,
+           titles = paste0(paste0('Cluster ', 1:4), ' (', paste0('freq = ',
+                           round(WR_exp$statistics$frequency, 1), '%'), ')'),
+           filled.continents = FALSE,
+           axelab = FALSE, draw_separators = TRUE,
+           subsampleg = 1, brks = seq(-16, 16, by = 2),
+           bar_extra_labels = c(2, 0, 0, 0), fileout = './Figures/predicted_regimes.png')
+```
+
+
+
+Observed and predicted weather regimes are very similar, although their frequencies are slightly different. Cluster 1 is the Atlantic Ridge and cluster 3 the Blocking pattern, while clusters 4 and 2 are the positive and negative phases of the NAO. These patterns can change depending on the period analyzed.
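To quantify how close the two classifications are, the climatological frequencies of the observed and predicted regimes can be placed side by side. This is a short sketch reusing the objects created above; it assumes the frequency structures shown in steps 5 and 8 (observed: start date by cluster; predicted: one value per cluster):

```r
# Observed vs. predicted climatological frequency (%) of each regime
freq_table <- data.frame(cluster   = 1:4,
                         observed  = round(Mean1Dim(WR_obs$statistics$frequency, 1), 1),
                         predicted = round(WR_exp$statistics$frequency, 1))
freq_table
```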