diff --git a/inst/doc/figures/subseasonal_1.png b/inst/doc/figures/subseasonal_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..d5ca71a307dcf958543935bf93516cdb59ab27b9
Binary files /dev/null and b/inst/doc/figures/subseasonal_1.png differ
diff --git a/inst/doc/figures/subseasonal_2.png b/inst/doc/figures/subseasonal_2.png
new file mode 100644
index 0000000000000000000000000000000000000000..6c71dc71cf1f54e90ccc0adc7d6dddfdfd29927e
Binary files /dev/null and b/inst/doc/figures/subseasonal_2.png differ
diff --git a/inst/doc/figures/subseasonal_3.png b/inst/doc/figures/subseasonal_3.png
new file mode 100644
index 0000000000000000000000000000000000000000..e5e3d8c2f67db15c38a3f0e1f48f2431af1cab88
Binary files /dev/null and b/inst/doc/figures/subseasonal_3.png differ
diff --git a/inst/doc/figures/subseasonal_4.png b/inst/doc/figures/subseasonal_4.png
new file mode 100644
index 0000000000000000000000000000000000000000..a9d1b051d7dfff9576316c1e95e57062d1752fa7
Binary files /dev/null and b/inst/doc/figures/subseasonal_4.png differ
diff --git a/inst/doc/figures/subseasonal_5.png b/inst/doc/figures/subseasonal_5.png
new file mode 100644
index 0000000000000000000000000000000000000000..1d339b017ca298561177358e997abbf8f3f1bef8
Binary files /dev/null and b/inst/doc/figures/subseasonal_5.png differ
diff --git a/inst/doc/usecase.md b/inst/doc/usecase.md
index 8224bbe28ac775da3ecd5c6c32072fe547d6e31a..47ee89e526d4a21718f7f9651ad436db76271d0f 100644
--- a/inst/doc/usecase.md
+++ b/inst/doc/usecase.md
@@ -102,5 +102,13 @@ Find more explanation of this use case in FAQ [How-to-27](inst/doc/faq.md#27-uti
  12. [Transform and chunk spatial dimensions](inst/doc/usecase/ex2_12_transform_and_chunk.R)
  This use case provides an example of transforming and chunking latitude and longitude dimensions.
- 13. [Load irregular grid data and interpolate it in the workflow](inst/doc/usecase/ex2_13_irregular_regrid.R)
- This script shows how to load irregular grid data by Start(), then regrid it by s2dv::CDORemap in the workflow. It is a solution before Start() can deal with irregular regridding directly.
+ 13. [Load irregular grid data and interpolate it in the workflow](inst/doc/usecase/ex2_13_irregular_regrid.R)
+ This script shows how to load irregular grid data with Start(), then regrid it
+by s2dv::CDORemap in the workflow. It is a workaround until Start() can handle irregular regridding directly.
+
+3. **Verification workflows**
+ 1. [Weekly ECV Subseasonal Hindcast Verification](inst/doc/usecase/ex3_1_SubseasonalECVHindcast.md)
+ This is a practical case that computes monthly skill scores for the ECMWF/S2S-ENSForhc
+subseasonal hindcast, using ERA5 as the reference dataset. The ECV is air temperature at surface level (tas).
+Note that this is a heavy use case and takes a long time to run.
+
diff --git a/inst/doc/usecase/ex3_1_SubseasonalECVHindcast.md b/inst/doc/usecase/ex3_1_SubseasonalECVHindcast.md
new file mode 100644
index 0000000000000000000000000000000000000000..85015130a5c60462493cb96dd168f410d7d995af
--- /dev/null
+++ b/inst/doc/usecase/ex3_1_SubseasonalECVHindcast.md
@@ -0,0 +1,275 @@
# Weekly ECV Subseasonal Hindcast Verification

This is a practical case that computes monthly skill scores for the ECMWF/S2S-ENSForhc subseasonal hindcast (Vitart et al., 2008), using ERA5 as the reference dataset. The ECV is air temperature at surface level (tas).
Note that this is a heavy use case and takes a long time to run.

Here, we aim to calculate the skill scores of the corresponding hindcast of 2016.

To do this, we will detect the Mondays and Thursdays of 2016, which are the days of the week on which this model is initialized (this is the *$sdate$*).
However, for each initialization we will analyze the 20 previous years (this is the *$syear$*).

The following figure shows how the subseasonal hindcasts are stored:

![Subseasonal hindcast storage scheme](../figures/subseasonal_1.png)

After loading the startR package, define the path to the hindcast, including the wildcards $var$, $sdate$ and $syear$:

```r
library(startR)

ecmwf_path <- paste0('/esarchive/exp/ecmwf/s2s-monthly_ensforhc/',
                     'weekly_mean/$var$_f24h/$sdate$/$var$_$syear$.nc')
```

Now, create the sequence of start dates in 2016 following the scheme in this figure:

![Start date sequence scheme](../figures/subseasonal_2.png)

```r
forecast.year <- 2016
# Mondays
sdates.seq.mon <- format(seq(as.Date(paste(forecast.year, 01, 04, sep = '-')),
                             as.Date(paste(forecast.year, 12, 31, sep = '-')),
                             by = 'weeks'), format = '%Y%m%d')
# Thursdays (Monday + 3 days)
sdates.seq.thu <- format(seq(as.Date(paste(forecast.year, 01, 04, sep = '-')) + 3,
                             as.Date(paste(forecast.year, 12, 31, sep = '-')),
                             by = 'weeks'), format = '%Y%m%d')
# Join the dates in chronological order
sdates.seq <- c(sdates.seq.mon, sdates.seq.thu)
ind <- order(as.Date(sdates.seq, format = '%Y%m%d'))
sdates.seq <- sdates.seq[ind]

# Leap years: remove the 29th of February
pos.bis <- which(sdates.seq == paste0(forecast.year, "0229"))
if (length(pos.bis) != 0) sdates.seq <- sdates.seq[-pos.bis]

exp <- Start(dat = ecmwf_path,
             var = 'tas',
             sdate = sdates.seq,
             syear = 'all', # previously called 'hdate'
             time = 'all',
             ensemble = 'all',
             latitude = indices(1:121),
             longitude = indices(1:240),
             syear_depends = 'sdate',
             return_vars = list(latitude = NULL,
                                longitude = NULL,
                                time = c('sdate', 'syear')),
             num_procs = 16,
             retrieve = FALSE)
```

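Since `retrieve = FALSE`, `exp` contains no data yet, only the metadata needed to build the workflow. As an optional sanity check, you can list the attached attributes and inspect the hindcast time stamps, which show the 1-week lag discussed in the note below. This is a minimal sketch, assuming the usual startR metadata layout for the `return_vars` requested above:

```r
# exp is only a declaration of the data: no values are loaded yet,
# but the metadata collected by Start() is already available
names(attributes(exp))                      # metadata fields attached by Start()
str(attributes(exp)$Variables$common$time)  # time values per sdate/syear (1-week lag)
```
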
Now, we define the corresponding dates to load the reference dataset:

_(NOTE: The time attributes of `exp` returned by Start() cannot be used to define the obs dates, because the time values stored with the hindcast data have a 1-week lag. E.g., the file for 19600104 has the time values 1960-01-11, 1960-01-18, 1960-01-25 and 1960-02-01.)_

```r
# Corresponding ERA5
## Generate the syears from sdates.seq
syears <- t(sapply(sdates.seq, function(x) {
             year <- as.numeric(substr(x, 1, 4))
             paste0((year - 20):(year - 1), substr(x, 5, 8))
           }))
names(dim(syears)) <- c('sdate', 'syear')

## Generate the times from the syears
Sys.setenv(TZ = 'UTC') # this helps to get UTC times
times <- lapply(syears, function(x) {
           x <- as.Date(x, format = "%Y%m%d", tz = "UTC")
           x <- seq(x, x + 25, by = 'week') # the 4 weekly time steps
           format(x, "%Y%m%d")
         })
times <- Reduce(c, times)
# 103 start dates = 52 Mondays + 52 Thursdays - the 29th of February
dim(times) <- c(time = 4, sdate = 103, syear = 20)
times <- s2dv::Reorder(times, c(2, 3, 1))

obs_path <- paste0("/esarchive/recon/ecmwf/era5/weekly_mean/",
                   "$var$_f1h-240x121/$var$_$file_date$.nc")

obs <- Start(dat = obs_path,
             var = 'tas',
             file_date = times,
             latitude = indices(1:121),
             longitude = indices(1:240),
             split_multiselected_dims = TRUE,
             return_vars = list(latitude = NULL,
                                longitude = NULL,
                                time = 'file_date'),
             num_procs = 16,
             retrieve = FALSE)
```

The next step is to define our function to compute the scores:

![Scheme of the score computation](../figures/subseasonal_3.png)

```r
score_calc <- function(hindcast, reference, sdates.seq, scores = 'all') {

  # Anomaly computation
  reference <- s2dv::InsertDim(reference, pos = 3, len = 1, name = 'ensemble')
  clim <- s2dv:::.Clim(hindcast, reference, time_dim = 'syear',
                       memb_dim = 'ensemble', memb = FALSE)
  hindcast <- s2dv::Ano(hindcast, clim$clim_exp)
  reference <- s2dv::Ano(reference, clim$clim_obs)

  # Create objects to store the outputs
  Scores <- NULL
  Skill_Scores <- NULL

  for (month in 1:12) {
    # Take only the data of one month (indices of each month in sdates.seq)
    startdates <- which(as.integer(substr(sdates.seq, 5, 6)) == month)
    hindcast_month <- ClimProjDiags::Subset(hindcast, 'sdate',
                                            list(startdates))
    reference_month <- ClimProjDiags::Subset(reference, 'sdate',
                                             list(startdates))
    # Reshape the dimensions to be [sdate, ensemble]
    dim(hindcast_month) <- c(sdate = prod(dim(hindcast_month)[c('sdate', 'syear')]),
                             dim(hindcast_month)['ensemble'])
    dim(reference_month) <- c(sdate = prod(dim(reference_month)[c('sdate', 'syear')]),
                              dim(reference_month)['ensemble'])

    if (any(c('fairrpss', 'frpss', 'all') %in% tolower(scores))) {
      Scores <- c(Scores,
                  unlist(easyVerification::veriApply("FairRpss",
                           fcst = hindcast_month, obs = reference_month,
                           prob = c(1/3, 2/3), tdim = 1, ensdim = 2)))
    }
    if (any(c('faircrpss', 'fcrpss', 'all') %in% tolower(scores))) {
      Scores <- c(Scores,
                  unlist(easyVerification::veriApply("FairCrpss",
                           fcst = hindcast_month, obs = reference_month,
                           tdim = 1, ensdim = 2)))
    }
    if (any(c('enscorr', 'corr', 'all') %in% tolower(scores))) {
      fcst_mean <- rowMeans(hindcast_month)
      Scores <- c(Scores,
                  cor = cor(fcst_mean, reference_month,
                            use = "complete.obs"),
                  cor.pv = cor.test(fcst_mean, reference_month,
                                    use = "complete.obs")$p.value)
    }
    # With a single probability threshold, the Fair RPSS reduces to the
    # Fair Brier skill score for that percentile
    if (any(c('bss10', 'fbss10', 'all') %in% tolower(scores))) {
      Scores <- c(Scores,
                  unlist(easyVerification::veriApply("FairRpss",
                           fcst = hindcast_month, obs = reference_month,
                           prob = (1/10), tdim = 1, ensdim = 2)))
    }
    if (any(c('bss90', 'fbss90', 'all') %in% tolower(scores))) {
      Scores <- c(Scores,
                  unlist(easyVerification::veriApply("FairRpss",
                           fcst = hindcast_month, obs = reference_month,
                           prob = (9/10), tdim = 1, ensdim = 2)))
    }
    Skill_Scores <- cbind(Skill_Scores, Scores)
    Scores <- NULL
  }
  return(Skill_Scores)
}
```

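Before wiring `score_calc` into the workflow, it can be sanity-checked on synthetic data shaped like a single grid point of the real case. This is a minimal sketch, not part of the original script: the 11-member ensemble size and the random values are illustrative assumptions, the s2dv and ClimProjDiags packages must be installed, and only the light 'enscorr' score is computed:

```r
# Toy arrays mimicking one grid point: 103 start dates x 20 hindcast years,
# with an assumed ensemble size of 11 (random values, for illustration only)
set.seed(1)
toy_hindcast <- array(rnorm(103 * 20 * 11),
                      dim = c(sdate = 103, syear = 20, ensemble = 11))
toy_reference <- array(rnorm(103 * 20), dim = c(sdate = 103, syear = 20))
toy_result <- score_calc(toy_hindcast, toy_reference, sdates.seq,
                         scores = 'enscorr')
dim(toy_result)  # expected: 2 (cor, cor.pv) x 12 (months)
```
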
The workflow is created with Step() and AddStep():

![Scheme of the workflow](../figures/subseasonal_4.png)

```r
step <- Step(fun = score_calc,
             target_dims = list(hindcast = c('sdate', 'syear', 'ensemble'),
                                reference = c('sdate', 'syear')),
             output_dims = list(c('scores', 'month')),
             use_libraries = c('easyVerification',
                               'SpecsVerification',
                               's2dv', 'ClimProjDiags'))
# Workflow:
wf <- AddStep(list(exp, obs), step, sdates.seq = sdates.seq, scores = 'all')
```

Finally, execute the analysis, for instance, on Nord3v2:

```r
#----------- modify according to your personal info -----------
  queue_host = 'nord4' # your own host name for Nord3v2
  temp_dir = '/gpfs/scratch/bsc32/bsc32339/startR_hpc/'
  ecflow_suite_dir = '/home/Earth/nperez/startR_local/' # your own local directory
#---------------------------------------------------------------
result <- Compute(wf,
                  chunks = list(time = 4, longitude = 2, latitude = 2),
                  threads_load = 2,
                  threads_compute = 12,
                  cluster = list(queue_host = queue_host,
                                 queue_type = 'slurm',
                                 cores_per_job = 12,
                                 temp_dir = temp_dir,
                                 polling_period = 10,
                                 job_wallclock = '03:00:00',
                                 max_jobs = 16,
                                 bidirectional = FALSE),
                  ecflow_suite_dir = ecflow_suite_dir,
                  wait = TRUE)
```
*Notice that each chunk may take around 2 hours to compute. Consider setting `wait = FALSE` (see the [practical guide](https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/practical_guide.md#collect-and-the-ec-flow-gui)).*

If 'all' scores are requested, the order along the 'scores' dimension of the output will be (see Wilks, 2011, for the definition of these scores):
 - FairRpss_skillscore (1), FairRpss_sd (2),
 - FairCrpss_skillscore (3), FairCrpss_sd (4),
 - EnsCorr (5), EnsCorr_pv (6),
 - FairBSS10 (7), FairBSS10_sd (8),
 - FairBSS90 (9), FairBSS90_sd (10)

If a subset of 'scores' is requested, the output follows the same order, omitting the scores not declared. It is useful to display the dimension names of the output to understand its order before creating the plots. Significance can also be assessed from the standard deviation; a simplified method is shown here:

```r
dim(result$output1)
library(multiApply)     # to use Apply and calculate significance
library(ClimProjDiags)  # to use ArrayToList, facilitating the use of PlotLayout
library(s2dv)           # to use PlotLayout combined with PlotEquiMap
# Significant if the skill score is positive and exceeds its standard
# deviation times the normal quantile for the requested p-value
FRPSS.pv <- Apply(result$output1, target_dims = c('scores'),
                  fun = function(x, my.pvalue) {
                    (x[1] > 0) & (x[1] > x[2] * qnorm(1 - my.pvalue))
                  },
                  my.pvalue = 0.05, ncores = 4)$output1

sig <- ArrayToList(FRPSS.pv[, 1, 1, 1, , ], 1, names = 'dots',
                   level = 'sublist')
vars <- ArrayToList(result$output1[1, , 1, 1, 1, , ], 1, names = '')
PlotLayout(fun = PlotEquiMap,
           plot_dims = c('latitude', 'longitude'),
           var = vars,
           lon = attributes(exp)$Variables$common$longitude,
           lat = attributes(exp)$Variables$common$latitude,
           brks = seq(-1, 1, 0.2),
           filled.continents = FALSE, sizetit = NULL, dot_symbol = '/',
           special_args = sig,
           toptitle = 'S2S FairRPSS - Hindcast 2016 - sweek 1',
           title_scale = 0.7,
           titles = month.name, bar_scale = 0.8,
           fileout = 'startR/inst/doc/figures/subseasonal_5.png')
```

![Monthly FairRPSS of the 2016 hindcast, first lead time](../figures/subseasonal_5.png)

This multipanel plot shows the monthly FairRPSS skill score corresponding to the first lead time (i.e., the first week of each month) of the 20-year hindcast of 2016. Blue (red) grid points correspond to low (high) values of the FairRPSS, where predictability is low (high). The highest values are found in tropical regions, but the predictability differs from month to month.

Given the high number of significant grid points, an alternative way to display this information is to filter out the non-significant points, as sketched below.

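For instance, the following sketch (the `vars_masked` helper is hypothetical, not part of the original script) reuses the `sig` and `vars` lists built above to blank out the non-significant grid points and plots the masked fields with the same layout:

```r
# Sketch: mask non-significant grid points (FALSE in the 'dots' masks) with NA
vars_masked <- lapply(seq_along(vars), function(m) {
  v <- vars[[m]]
  v[!sig[[m]]$dots] <- NA  # keep only points passing the significance test
  v
})
# Plot the masked fields with the same layout as before
PlotLayout(fun = PlotEquiMap,
           plot_dims = c('latitude', 'longitude'),
           var = vars_masked,
           lon = attributes(exp)$Variables$common$longitude,
           lat = attributes(exp)$Variables$common$latitude,
           brks = seq(-1, 1, 0.2),
           filled.continents = FALSE, sizetit = NULL,
           toptitle = 'S2S FairRPSS (significant points) - Hindcast 2016 - sweek 1',
           title_scale = 0.7, titles = month.name, bar_scale = 0.8)
```
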
*References*

Wilks, D. S. (2011). Statistical methods in the atmospheric sciences. Elsevier/Academic Press.

Vitart, F., Buizza, R., Alonso Balmaseda, M., Balsamo, G., Bidlot, J.-R., Bonet, A., Fuentes, M., Hofstadler, A., Molteni, F., & Palmer, T. N. (2008). The new VarEPS-monthly forecasting system: A first step towards seamless prediction. Quarterly Journal of the Royal Meteorological Society, 134(636), 1789–1799. https://doi.org/10.1002/qj.322

*Credits to:*
 - *Original code: Andrea Manrique*
 - *Adaptation: Núria Pérez-Zanón*
 - *Code improvement: An-Chi Ho*