diff --git a/MODULES b/MODULES
index e2709d66a4f84df02c03d753c44c801c44e8c60e..6654c0920d879c620e6878f08d901170513dae2c 100644
--- a/MODULES
+++ b/MODULES
@@ -17,6 +17,16 @@ if [[ $BSC_MACHINE == "nord3v2" ]]; then
     module load PROJ/9.0.0-GCCcore-8.3.0
     module load Phantomjs/2.1.1
 
+elif [[ $HOSTNAME == "bsceshub02.bsc.es" ]]; then
+
+    module purge
+    module load CDO/1.9.8-foss-2021b
+    module load R/4.2.1-foss-2021b
+    module load GEOS/3.11.0-GCC-11.2.0
+    module load GDAL/3.5.2-foss-2021b-Python-3.9.6
+    module load PROJ/9.1.0-foss-2021b
+    module load Phantomjs/2.1.1
+
 else
 
     module purge
diff --git a/NEWS.md b/NEWS.md
index e8cda565873c88b36148465b22808a32c3478b4c..40f97c3adbb979911614fc9f44bbe7444c84cd17 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,21 +1,60 @@
+SUNSET v2.0.0
+=============
+
+Modules: Loading, Units, Anomalies, Calibration, Downscaling, Indices, Skill, Saving, Visualization, Scorecards
+
+New features:
+- New module for unit conversion, available for temperature, precipitation and pressure variables.
+- New Indices module with the following indices available: NAO, Niño1+2, Niño3, Niño3.4 and Niño4.
+- New Scorecards module to create Scorecard visualizations of the computed skill metrics.
+- New Downscaling module making use of the in-house CSDownscale package.
+- Recipe splitting feature: A recipe can be split into 'atomic recipes' and the same workflow can be easily parallelized for multiple forecast systems, references, variables, start dates and regions.
+- Option to load, process, save and plot multiple variables in one atomic recipe.
+- Possibility to use Autosubmit 4 as a workflow manager to run recipes in parallel.
+- New SUNSET launcher script to split a recipe and run it in parallel directly in the cluster with SLURM or with Autosubmit.
+- Option to load the tas-tos blend by requesting the 'tas-tos' variable.
+- For each module there is the possibility to choose whether or not to save the forecast, hindcast and/or observations.
+- New skill metrics MSSS and MSE available.
+- New use cases with hands-on tutorials.
+- GRIB file loading.
+
+Summary of bugfixes/improvements:
+
+- The names of the module functions have changed to be the same as the name of the module (e.g. load_datasets() has become Loading()). The old names will be deprecated in a future release.
+- New datasets and variables added to the seasonal archive.
+- The launcher script and the prepare_outputs() function have a new "disable_unique_ID" option, which removes the numerical identifier that is added to the name of the output folder when it is generated.
+- The seasonal and decadal loading functions have been renamed and can now be sourced and called from the same Loading module.
+- Bugfix in the recipe checker: checks are now completed even when the time horizon is not correct.
+- The functions have been adapted to the new s2dv_cube structure in CSTools>5.0.0.
+- The metric 'corr' (correlation for each ensemble member) has been renamed to 'corr_individual_members'.
+- Datasets saved under 'daily' folders in esarchive can now be loaded along with 'daily_mean' datasets.
+- Start date information has been added to the plot names to avoid overwriting plots from different start dates.
+- In the Visualization section of the recipe, the user can specify which plots to generate, whether they should be in a single-panel or multi-panel layout, and choose between different projections.
+- General improvements to the plots: color scheme, units, titles and subtitles, layout and colorbars.
+- Anomalies module can compute anomalies when hindcast and observations do not share the same grid. +- Added module configuration to run SUNSET in the BSC hub. +- Language specification has been added to lubridate functions to ensure they work well in all language locales. + ESS Verification Suite v1.1.0 +============================= Modules: Loading, Anomalies, Calibration, Skill, Saving, Visualization New features: -New module for anomaly computation. -New 'Scorecards' output format (different file names and paths from the default format). -New 'recipe checker' feature in prepare_outputs(): It runs a series of checks on the recipe to detect potential errors, typos, or missing information. +- New module for anomaly computation. +- New 'Scorecards' output format (different file names and paths from the default format). +- New 'recipe checker' feature in prepare_outputs(): It runs a series of checks on the recipe to detect potential errors, typos, or missing information. + Summary of fixes/improvements: -Changed the names of the seasonal systems from the names in /esarchive to the official names in the CDS. -Fixed a bug in the conversion of precipitation units. -Fixed a bug related to the consistency between experiment and observation dates for some systems. -Function parameters have been simplified and uniformized. -Improvements in the logging functionality for longer messages. -Improvements to the plots generated by the Visualization module. -compute_probabilities() now returns the fcst probabilities as well. +- Changed the names of the seasonal systems from the names in /esarchive to the official names in the CDS. +- Fixed a bug in the conversion of precipitation units. +- Fixed a bug related to the consistency between experiment and observation dates for some systems. +- Function parameters have been simplified and uniformized. +- Improvements in the logging functionality for longer messages. +- Improvements to the plots generated by the Visualization module. +- compute_probabilities() now returns the fcst probabilities as well. ESS Verification Suite v1.0.0 ============================= diff --git a/README.md b/README.md index 5758fbf3b3808efe9a8027a66e6278500d1786b1..2ae44e8e6d6619cbe8a08bff72c644f3c33c9163 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@ -SUNSET: SUbseasoNal to decadal climate forecast post-processing and asSEmenT suite +SUNSET: SUbseasoNal to decadal climate forecast post-processing and asSEssmenT suite ====================== This is the Git project for the SUNSET, a collaborative in-house tool developed at BSC-ES for research projects and operational workflows involving subseasonal to seasonal to decadal forecast verification. 
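The module-function renaming and the new prepare_outputs() option mentioned in the NEWS above can be illustrated with a short sketch. This is a hedged example, not an official quickstart: the recipe path, the sourced file locations and the exact prepare_outputs() signature are assumptions for illustration.

```r
# Hedged sketch of the v2.0.0 interface described in the NEWS entries above.
# The sourced paths and the prepare_outputs() argument layout are assumptions.
source("tools/prepare_outputs.R")    # assumed location of prepare_outputs()
source("modules/Loading/Loading.R")  # Loading() is the renamed load_datasets()

recipe <- prepare_outputs("recipes/atomic_recipe.yml",  # hypothetical recipe path
                          disable_unique_ID = TRUE)     # new option per the NEWS entry
data <- Loading(recipe)  # formerly load_datasets(recipe); old name to be deprecated
```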
diff --git a/autosubmit/auto-scorecards.sh b/autosubmit/auto-scorecards.sh index 4b5273725bed84811e1267048d035a0e2f712a28..c30f643f3be53f216ead66675a9545a0e159198a 100644 --- a/autosubmit/auto-scorecards.sh +++ b/autosubmit/auto-scorecards.sh @@ -2,8 +2,8 @@ ############ AUTOSUBMIT INPUTS ############ proj_dir=%PROJDIR% -outdir=%OUTDIR% -recipe=%RECIPE% +outdir=%common.OUTDIR% +recipe=%common.RECIPE% ############################### cd $proj_dir diff --git a/conda_installation/environment-sunset.yml b/conda_installation/environment-sunset.yml index 0427f523a39363df46c5f9227066e4c36178e566..27dbd8b96bfb39955ba302c6712bc3a8cca0ef36 100644 --- a/conda_installation/environment-sunset.yml +++ b/conda_installation/environment-sunset.yml @@ -165,6 +165,7 @@ dependencies: - r-docopt=0.7.1=r42hc72bb7e_3 - r-doparallel=1.0.17=r42hc72bb7e_2 - r-dotcall64=1.0_2=r42h61816a4_2 + - r-dplyr=1.1.2=r42ha503ecb_1 - r-e1071=1.7_13=r42ha503ecb_1 - r-easyncdf=0.1.2=r42hc72bb7e_1 - r-easyverification=0.4.5=r42ha503ecb_0 @@ -187,6 +188,7 @@ dependencies: - r-globals=0.16.2=r42hc72bb7e_1 - r-glue=1.6.2=r42h57805ef_2 - r-gridextra=2.3=r42hc72bb7e_1005 + - r-gridgraphics=0.5_1=r42hc72bb7e_2 - r-gtable=0.3.4=r42hc72bb7e_0 - r-highr=0.10=r42hc72bb7e_1 - r-htmltools=0.5.6=r42ha503ecb_0 diff --git a/conda_installation/load_sunset.bash b/conda_installation/load_sunset.bash new file mode 100755 index 0000000000000000000000000000000000000000..add332b4a99210f3a890b14da77ee466f8b2ba8d --- /dev/null +++ b/conda_installation/load_sunset.bash @@ -0,0 +1,9 @@ +#!/bin/bash + +prefix=$1 + +conda env create --file environment-sunset.yml --prefix $prefix + +conda activate $prefix + +R -e "options(timeout = 600) ; install.packages('CSTools', repos='https://ftp.cixug.es/CRAN/')" diff --git a/conf/archive.yml b/conf/archive.yml index cca68c22bd5d2d42b2af9964a1f0cbc44d77874b..61f62be230b4ff05021a60a17e38e5e4d446cff9 100644 --- a/conf/archive.yml +++ b/conf/archive.yml @@ -54,7 +54,8 @@ esarchive: src: "exp/meteofrance/system7c3s/" monthly_mean: {"tas":"monthly_mean/tas_f6h/", "g500":"monthly_mean/g500_f12h/", "prlr":"monthly_mean/prlr_f24h/", "sfcWind": "monthly_mean/sfcWind_f6h/", - "tasmax":"monthly_mean/tasmax_f6h/", "tasmin": "monthly_mean/tasmin_f6h/"} + "tasmax":"monthly_mean/tasmax_f6h/", "tasmin": "monthly_mean/tasmin_f6h/", + "tos":"monthly_mean/tos_f6h/"} nmember: fcst: 51 hcst: 25 @@ -253,3 +254,26 @@ mars: monthly_mean: {"tas":""} calendar: "standard" reference_grid: "conf/grid_description/griddes_GRIB_system5_m1.txt" + +sample: + src: + System: + ECMWF-SEAS5.1: + name: "ECMWF SEAS5" + institution: "European Centre for Medium-Range Weather Forecasts" + src: + monthly_mean: {"tas":"", "prlr":""} + nmember: + fcst: 15 + hcst: 15 + calendar: "proleptic_gregorian" + time_stamp_lag: "0" + reference_grid: "conf/grid_description/griddes_GRIB_system51_m1.txt" + Reference: + ERA5: + name: "ERA5" + institution: "European Centre for Medium-Range Weather Forecasts" + src: "GRIB_era5_tas/" + monthly_mean: {"tas":"", "prlr":""} + calendar: "standard" + reference_grid: "conf/grid_description/griddes_GRIB_system5_m1.txt" diff --git a/modules/test_GRIB.R b/example_scripts/test_GRIB.R similarity index 100% rename from modules/test_GRIB.R rename to example_scripts/test_GRIB.R diff --git a/modules/test_parallel_GRIB.R b/example_scripts/test_parallel_GRIB.R similarity index 100% rename from modules/test_parallel_GRIB.R rename to example_scripts/test_parallel_GRIB.R diff --git a/launch_SUNSET.sh b/launch_SUNSET.sh index 
eb8fcf4638e3c96d422db3aad0bfd0af2740885c..f859a8b283203cbc2c3a75885cfa5d95988d8bc3 100644
--- a/launch_SUNSET.sh
+++ b/launch_SUNSET.sh
@@ -118,7 +118,7 @@ if [ $run_method == "sbatch" ]; then
     scorecards=$( head -4 $tmpfile | tail -1)
 
     # Create directory for slurm output
-    logdir=${codedir}/out-logs/slurm_logs/
+    logdir=${outdir}/logs/slurm/
     mkdir -p $logdir
     echo "Slurm job logs will be stored in $logdir"
@@ -129,7 +129,7 @@ if [ $run_method == "sbatch" ]; then
     verification_job_list=()
     echo "Submitting verification jobs..."
     # Loop over atomic recipes
-    for atomic_recipe in ${outdir}/logs/recipes/atomic_recipe_??.yml; do
+    for atomic_recipe in ${outdir}/logs/recipes/atomic_recipe_*.yml; do
         job_number=$(($job_number + 1))
         job_name=$(basename $outdir)_$(printf %02d $job_number)
         outfile=${logdir}/run-${job_name}.out
diff --git a/modules/Downscaling/tmp/Analogs.R b/modules/Downscaling/tmp/Analogs.R
index 99fc45e7aae1b8ac48d1e92aa254cfe3745f5591..73b76ae56cd114536a07ec410e172b09e33b3067 100644
--- a/modules/Downscaling/tmp/Analogs.R
+++ b/modules/Downscaling/tmp/Analogs.R
@@ -1,16 +1,24 @@
 #'@rdname CST_Analogs
-#'@title Downscaling using Analogs based on large scale fields.
+#'@title Downscaling using Analogs based on coarse scale fields.
 #'
 #'@author J. Ramon, \email{jaume.ramon@bsc.es}
 #'
 #'@description This function performs a downscaling using Analogs. To compute
-#'the analogs given a coarse-scale field, the function looks for days with
-#'similar conditions in the historical observations. The coarse scale and
-#'observation data can be either global or regional. In the latter case, the
-#'region is defined by the user. In principle, the coarse and observation data
-#'should be of the same variable, although different variables can also be admitted.
-#'The analogs function will find the N best analogs based in Minimum Euclidean
-#'distance.
+#'the analogs given a coarse-scale field, the function looks for days with similar conditions
+#'in the historical observations. The analogs function determines the N best analogs based
+#'on Euclidean distance, distance correlation, or Spearman's correlation metrics. To downscale
+#'a local-scale variable, either the variable itself or another large-scale variable
+#'can be used as the predictor. In the first scenario, analogs are searched between
+#'the observation and model data of the same local-scale variable. In the latter scenario,
+#'the function identifies the day in the observation data that most closely resembles
+#'the large-scale pattern of interest in the model. Once it identifies the date of
+#'the best analog, the function extracts the corresponding local-scale variable for that day
+#'from the observation of the local-scale variable. The local-scale and large-scale
+#'variables can be retrieved from independent regions. The input data for the first case must
+#'include 'exp' and 'obs', while in the second case 'obs', 'obsL' and 'exp' are the
+#'required input fields. Users can perform the downscaling process over subregions,
+#'identified through the 'region' argument, instead of focusing
+#'on the entire area of the loaded data.
 #'
 #'The search of analogs must be done in the longest dataset posible, but might
 #'require high-memory computational resources. This is important since it is
@@ -21,25 +29,24 @@
 #'is intended to downscale climate prediction data (i.e., sub-seasonal, seasonal
 #'and decadal predictions) but can admit climate projections or reanalyses. It does
 #'not have constrains of specific region or variables to downscale.
-#'@param exp an 's2dv' object with named dimensions containing the experimental field on
-#'the coarse scale for which the downscaling is aimed. The object must have, at least,
-#'the dimensions latitude, longitude, start date and time. The object is expected to be
-#'already subset for the desired region. Data can be in one or two integrated regions, e.g.,
-#'crossing the Greenwich meridian. To get the correct results in the latter case,
-#'the borders of the region should be specified in the parameter 'region'. See parameter
-#''region'.
-#'@param obs an 's2dv' object with named dimensions containing the observational field.
-#'The object must have, at least, the dimensions latitude, longitude and start date.
-#'The object is expected to be already subset for the desired region.
-#'@param obs2 an 's2dv' object with named dimensions containing a different observational
-#'field to that in 'obs'. If provided, these observations will be used in the training,
-#'i.e. the searching for analogs, so that they should be in a coarser grid to those in
-#''obs'. Training with observations on a grid with a spatial resolution closer to that
-#'in 'exp', will in principle ensure better results. The object must have, at least, the
-#'dimensions latitude, longitude and start date. The object is expected to be already
-#'subset for the desired region.
-#'@param grid_exp a character vector with a path to an example file of the exp data.
-#'It can be either a path to another NetCDF file which to read the target grid from
+#'@param exp an 's2dv_cube' object with named dimensions containing the experimental field
+#'on the coarse scale for the variable targeted for downscaling (in case obsL is not provided)
+#'or for the large-scale variable used as the predictor (if obsL is provided).
+#'The object must have, at least, the dimensions latitude, longitude, start date and time.
+#'The object is expected to be already subset for the desired region. Data can be in one
+#'or two integrated regions, e.g., crossing the Greenwich meridian. To get the correct
+#'results in the latter case, the borders of the region should be specified in the parameter
+#''region'. See parameter 'region'.
+#'@param obs an 's2dv_cube' object with named dimensions containing the observational field
+#'for the variable targeted for downscaling. The object must have, at least, the dimensions
+#'latitude, longitude and start date. The object is expected to be already subset for the
+#'desired region.
+#'@param obsL an 's2dv_cube' object with named dimensions containing the observational
+#'field of the large-scale variable. The object must have, at least, the dimensions latitude,
+#'longitude and start date. The object is expected to be already subset for the desired region.
+#'@param grid_exp a character vector with a path to an example file of the 'exp' data
+#'(the local-scale variable when it is itself the predictor, or the large-scale predictor
+#'variable when 'obsL' is provided). It can be either a path to another NetCDF file which to read the target grid from
 #'(a single grid must be defined in such file) or a character vector indicating the
 #'coarse grid to be passed to CDO, and it must be a grid recognised by CDO.
 #'@param nanalogs an integer indicating the number of analogs to be searched
@@ -56,6 +63,9 @@
 #''data' in exp and obs. Default set to "time".
 #'@param member_dim a character vector indicating the member dimension name in the element
 #''data' in exp and obs. Default set to "member".
+#'@param metric a character vector to select the analog specification method. Only these
+#'options are valid: "dist" (i.e., Euclidean distance), "dcor" (i.e., distance correlation)
+#'or "cor" (i.e., Spearman's correlation). The default metric is "dist".
 #'@param region a numeric vector indicating the borders of the region defined in exp.
 #'It consists of four elements in this order: lonmin, lonmax, latmin, latmax. lonmin refers
 #'to the left border, while lonmax refers to the right border. latmin indicates the lower
@@ -68,7 +78,7 @@
 #'of the window. It is recommended to be set to TRUE. Default to TRUE.
 #'@param ncores an integer indicating the number of cores to use in parallel computation.
 #'The default value is NULL.
-#'@return An 's2dv' object. The element 'data' contains the dowscaled field, 'lat' the
+#'@return An 's2dv_cube' object. The element 'data' contains the downscaled field, 'lat' the
 #'downscaled latitudes, and 'lon' the downscaled longitudes. If fun_analog is set to NULL
 #'(default), the output array in 'data' also contains the dimension 'analog' with the best
 #'analog days.
@@ -81,31 +91,40 @@
 #'dim(obs) <- c(lat = 12, lon = 15, sdate = 5, time = 30)
 #'obs_lons <- seq(0,6, 6/14)
 #'obs_lats <- seq(0,6, 6/11)
-#'exp <- s2dv_cube(data = exp, lat = exp_lats, lon = exp_lons)
-#'obs <- s2dv_cube(data = obs, lat = obs_lats, lon = obs_lons)
+#'exp <- s2dv_cube(data = exp, coords = list(lat = exp_lats, lon = exp_lons))
+#'obs <- s2dv_cube(data = obs, coords = list(lat = obs_lats, lon = obs_lons))
 #'downscaled_field <- CST_Analogs(exp = exp, obs = obs, grid_exp = 'r360x180')
 #'@export
-CST_Analogs <- function(exp, obs, grid_exp, obs2 = NULL, nanalogs = 3, fun_analog = NULL, lat_dim = "lat",
-                        lon_dim = "lon", sdate_dim = "sdate", time_dim = "time", member_dim = "member",
-                        region = NULL, return_indices = FALSE, loocv_window = TRUE, ncores = NULL) {
-
-  # input exp and obs must be s2dv_cube objects
-  if (!inherits(exp,'s2dv_cube')) {
-    stop("Parameter 'exp' must be of the class 's2dv_cube'")
-  }
+CST_Analogs <- function(exp, obs, obsL = NULL, grid_exp, nanalogs = 3,
+                        fun_analog = NULL, lat_dim = "lat", lon_dim = "lon", sdate_dim = "sdate",
+                        time_dim = "time", member_dim = "member", metric = "dist", region = NULL,
+                        return_indices = FALSE, loocv_window = TRUE, ncores = NULL) {
 
   # input exp and obs must be s2dv_cube objects
   if (!inherits(obs,'s2dv_cube')) {
     stop("Parameter 'obs' must be of the class 's2dv_cube'")
   }
 
-  res <- Analogs(exp = exp$data, obs = obs$data, exp_lats = exp$coords[[lat_dim]],
-                 exp_lons = exp$coords[[lon_dim]], obs_lats = obs$coords[[lat_dim]],
-                 obs_lons = obs$coords[[lon_dim]], grid_exp = grid_exp, nanalogs = nanalogs,
-                 fun_analog = fun_analog, lat_dim = lat_dim, lon_dim = lon_dim,
-                 sdate_dim = sdate_dim, time_dim = time_dim, member_dim = member_dim,
-                 region = region, return_indices = return_indices, loocv_window = loocv_window,
-                 ncores = ncores)
+  if (!is.null(obsL)) {
+    # input obsL must be an s2dv_cube object
+    if (!inherits(obsL,'s2dv_cube')) {
+      stop("Parameter 'obsL' must be of the class 's2dv_cube'")
+    }
+  }
+  # input exp must be an s2dv_cube object
+  if (!inherits(exp,'s2dv_cube')) {
+    stop("Parameter 'exp' must be of the class 's2dv_cube'")
+  }
+
+  res <- Analogs(exp = exp$data, obs = obs$data, obsL = obsL$data,
+                 exp_lats = exp$coords[[lat_dim]], exp_lons = exp$coords[[lon_dim]],
+                 obs_lats = obs$coords[[lat_dim]], obs_lons = obs$coords[[lon_dim]],
+                 obsL_lats = obsL$coords[[lat_dim]], obsL_lons = obsL$coords[[lon_dim]],
+                 grid_exp = grid_exp, nanalogs = nanalogs, fun_analog = fun_analog,
+                 lat_dim = lat_dim, lon_dim = lon_dim, sdate_dim = sdate_dim,
+                 time_dim = time_dim, member_dim = member_dim, metric = metric,
+                 region = region, return_indices = return_indices,
+                 loocv_window = loocv_window, ncores = ncores)
 
   # Modify data, lat and lon in the origina s2dv_cube, adding the downscaled data
   exp$data <- res$data
@@ -129,13 +148,21 @@ CST_Analogs <- function(exp, obs, grid_exp, obs2 = NULL, nanalogs = 3, fun_analo
 #'@author Ll. Lledó, \email{llorenc.lledo@ecmwf.int}
 #'
 #'@description This function performs a downscaling using Analogs. To compute
-#'the analogs given a coarse-scale field, the function looks for days with
-#'similar conditions in the historical observations. The coarse scale and
-#'observation data can be either global or regional. In the latter case, the
-#'region is defined by the user. In principle, the coarse and observation data
-#'should be of the same variable, although different variables can also be admitted.
-#'The analogs function will find the N best analogs based in Minimum Euclidean
-#'distance.
+#'the analogs given a coarse-scale field, the function looks for days with similar conditions
+#'in the historical observations. The analogs function determines the N best analogs based
+#'on Euclidean distance, distance correlation, or Spearman's correlation metrics. To downscale
+#'a local-scale variable, either the variable itself or another large-scale variable
+#'can be used as the predictor. In the first scenario, analogs are searched between
+#'the observation and model data of the same local-scale variable. In the latter scenario,
+#'the function identifies the day in the observation data that most closely resembles
+#'the large-scale pattern of interest in the model. Once it identifies the date of
+#'the best analog, the function extracts the corresponding local-scale variable for that day
+#'from the observation of the local-scale variable. The local-scale and large-scale
+#'variables can be retrieved from independent regions. The input data for the first case must
+#'include 'exp' and 'obs', while in the second case 'obs', 'obsL' and 'exp' are the
+#'required input fields. Users can perform the downscaling process over subregions,
+#'identified through the 'region' argument, instead of focusing
+#'on the entire area of the loaded data.
 #'
 #'The search of analogs must be done in the longest dataset posible, but might
 #'require high-memory computational resources. This is important since it is
@@ -146,19 +173,20 @@ CST_Analogs <- function(exp, obs, grid_exp, obs2 = NULL, nanalogs = 3, fun_analo
 #'is intended to downscale climate prediction data (i.e., sub-seasonal, seasonal
 #'and decadal predictions) but can admit climate projections or reanalyses. It does
 #'not have constrains of specific region or variables to downscale.
-#'@param exp an array with named dimensions containing the experimental field on the
-#'coarse scale for which the downscaling is aimed. The object must have, at least,
-#'the dimensions latitude, longitude, start date and time. The object is expected to be
-#'already subset for the desired region. Data can be in one or two integrated regions, e.g.,
-#'crossing the Greenwich meridian. To get the correct results in the latter case,
-#'the borders of the region should be specified in the parameter 'region'. See parameter
-#''region'.
-#'@param obs an array with named dimensions containing the observational field. The object
-#'must have, at least, the dimensions latitude, longitude, start date and time. The object
-#'is expected to be already subset for the desired region. Optionally, 'obs' can have the
-#'dimension 'window', containing the sampled fields into which the function will look for
-#'the analogs. See function 'generate_window()'. Otherwise, the function will look for
-#'analogs using all the possible fields contained in obs.
+#'@param exp an array with named dimensions containing the experimental field
+#'on the coarse scale for the variable targeted for downscaling (in case obsL is not provided)
+#'or for the large-scale variable used as the predictor (if obsL is provided).
+#'The object must have, at least, the dimensions latitude, longitude, start date and time.
+#'The object is expected to be already subset for the desired region. Data can be in one
+#'or two integrated regions, e.g., crossing the Greenwich meridian. To get the correct
+#'results in the latter case, the borders of the region should be specified in the parameter
+#''region'. See parameter 'region'.
+#'@param obs an array with named dimensions containing the observational field for the variable
+#'targeted for downscaling. The object must have, at least, the dimensions latitude, longitude,
+#'start date and time. The object is expected to be already subset for the desired region.
+#'Optionally, 'obs' can have the dimension 'window', containing the sampled fields into which
+#'the function will look for the analogs. See function 'generate_window()'. Otherwise,
+#'the function will look for analogs using all the possible fields contained in obs.
 #'@param exp_lats a numeric vector containing the latitude values in 'exp'. Latitudes must
 #'range from -90 to 90.
 #'@param exp_lons a numeric vector containing the longitude values in 'exp'. Longitudes
@@ -167,17 +195,21 @@
 #'range from -90 to 90.
 #'@param obs_lons a numeric vector containing the longitude values in 'obs'. Longitudes
 #'can range from -180 to 180 or from 0 to 360.
-#'@param grid_exp a character vector with a path to an example file of the exp data.
+#'@param obsL an array with named dimensions containing the observational
+#'field of the large-scale variable. The object must have, at least, the dimensions latitude,
+#'longitude and start date. The object is expected to be already subset for the desired region.
+#'Optionally, 'obsL' can have the dimension 'window', containing the sampled fields into which
+#'the function will look for the analogs. See function 'generate_window()'. Otherwise,
+#'the function will look for analogs using all the possible fields contained in obsL.
+#'@param obsL_lats a numeric vector containing the latitude values in 'obsL'. Latitudes must
+#'range from -90 to 90.
+#'@param obsL_lons a numeric vector containing the longitude values in 'obsL'. Longitudes
+#'can range from -180 to 180 or from 0 to 360.
+#'@param grid_exp a character vector with a path to an example file of the 'exp' data
+#'(the local-scale variable when it is itself the predictor, or the large-scale predictor
+#'variable when 'obsL' is provided).
 #'It can be either a path to another NetCDF file which to read the target grid from
 #'(a single grid must be defined in such file) or a character vector indicating the
 #'coarse grid to be passed to CDO, and it must be a grid recognised by CDO.
-#'@param obs2 an 's2dv' object with named dimensions containing a different observational
-#'field to that in 'obs'. If provided, these observations will be used in the training,
-#'i.e. the searching for analogs, so that they should be in a coarser grid to those in
-#''obs'. Training with observations on a grid with a spatial resolution closer to that
-#'in 'exp', will in principle ensure better results. The object must have, at least, the
-#'dimensions latitude, longitude and start date. The object is expected to be already
-#'subset for the desired region.
 #'@param nanalogs an integer indicating the number of analogs to be searched.
 #'@param fun_analog a function to be applied over the found analogs. Only these options
 #'are valid: "mean", "wmean", "max", "min", "median" or NULL. If set to NULL (default),
@@ -192,6 +224,9 @@
 #''data' in exp and obs. Default set to "time".
 #'@param member_dim a character vector indicating the member dimension name in the element
 #''data' in exp and obs. Default set to "member".
+#'@param metric a character vector to select the analog specification method. Only these
+#'options are valid: "dist" (i.e., Euclidean distance), "dcor" (i.e., distance correlation)
+#'or "cor" (i.e., Spearman's correlation). The default metric is "dist".
 #'@param region a numeric vector indicating the borders of the region defined in exp.
 #'It consists of four elements in this order: lonmin, lonmax, latmin, latmax. lonmin refers
 #'to the left border, while lonmax refers to the right border. latmin indicates the lower
@@ -209,7 +244,7 @@
 #'@import multiApply
 #'@import CSTools
 #'@importFrom s2dv InsertDim CDORemap
-#'@importFrom FNN get.knnx
+#'@importFrom energy dcor
 #'
 #'@seealso \code{\link[s2dverification]{CDORemap}}
 #'
@@ -229,11 +264,11 @@
 #'downscaled_field <- Analogs(exp = exp, obs = obs, exp_lats = exp_lats, exp_lons = exp_lons,
 #'obs_lats = obs_lats, obs_lons = obs_lons, grid_exp = 'r360x180')
 #'@export
-Analogs <- function(exp, obs, exp_lats, exp_lons, obs_lats, obs_lons, grid_exp, obs2 = NULL,
-                    obs2_lats = NULL, obs2_lons = NULL, nanalogs = 3, fun_analog = NULL,
-                    lat_dim = "lat", lon_dim = "lon", sdate_dim = "sdate", time_dim = "time",
-                    member_dim = "member", region = NULL, return_indices = FALSE,
-                    loocv_window = TRUE, ncores = NULL) {
+Analogs <- function(exp, obs, exp_lats = NULL, exp_lons = NULL, obs_lats, obs_lons,
+                    grid_exp, obsL = NULL, obsL_lats = NULL, obsL_lons = NULL, nanalogs = 3,
+                    fun_analog = NULL, lat_dim = "lat", lon_dim = "lon", sdate_dim = "sdate",
+                    time_dim = "time", member_dim = "member", metric = "dist", region = NULL,
+                    return_indices = FALSE, loocv_window = TRUE, ncores = NULL) {
   #-----------------------------------
   # Checkings
   #-----------------------------------
@@ -264,7 +299,7 @@ Analogs <- function(exp, obs, exp_lats, exp_lons, obs_lats, obs_lons, grid_exp,
     stop("Parameter 'time_dim' must be of the class 'character'")
   }
 
-  # Do not allow synonims for lat (latitude), lon (longitude) and time (sdate) dimension names
+  # Do not allow synonyms for lat (latitude), lon (longitude) and time (sdate) dimension names
   if (is.na(match(lon_dim, names(dim(exp)))) | is.na(match(lon_dim, names(dim(obs))))) {
     stop("Missing longitude dimension in 'exp' and/or 'obs', or does not match the parameter ",
          "'lon_dim'")
   }
@@ -305,38 +340,41 @@ Analogs <- function(exp, obs, exp_lats, exp_lons, obs_lats, obs_lons, grid_exp,
     }
   }
 
-  if (!is.null(obs2)) {
+  if (!is.null(obsL)) {
+
     # the code is not yet prepared to handle members in the observations
-    if (member_dim %in% names(dim(obs2))) {
-      if (identical(as.numeric(dim(obs2)[member_dim]), 1)) {
-        obs2 <- ClimProjDiags::Subset(x = obs2, along = member_dim, indices = 1, drop = 'selected')
+    if (member_dim %in% names(dim(obsL))) {
+      if (identical(as.numeric(dim(obsL)[member_dim]), 1)) {
+        obsL <- ClimProjDiags::Subset(x = obsL, along = member_dim, indices = 1, drop = 'selected')
       } else {
-        stop("Not implemented for observations with members ('obs2' can have 'member_dim', ",
+        stop("Not implemented for observations with members ('obsL' can have 'member_dim', ",
              "but it should be of length = 1).")
       }
     }
 
-    if (is.null(obs2_lats) | is.null(obs2_lons)) {
+    if (is.null(obsL_lats) | is.null(obsL_lons)) {
       stop("Missing latitudes and/or longitudes for the provided training observations. Please ",
-           "provide them with the parametres 'obs2_lats' and 'obs2_lons'")
+           "provide them with the parameters 'obsL_lats' and 'obsL_lons'")
     }
 
-    if (is.na(match(lon_dim, names(dim(obs2))))) {
-      stop("Missing longitude dimension in 'obs2', or does not match the parameter 'lon_dim'")
+    if (is.na(match(lon_dim, names(dim(obsL))))) {
+      stop("Missing longitude dimension in 'obsL', or does not match the parameter 'lon_dim'")
     }
 
-    if (is.na(match(lat_dim, names(dim(obs2))))) {
-      stop("Missing latitude dimension in 'obs2', or does not match the parameter 'lat_dim'")
+    if (is.na(match(lat_dim, names(dim(obsL))))) {
+      stop("Missing latitude dimension in 'obsL', or does not match the parameter 'lat_dim'")
     }
 
-    if (is.na(match(sdate_dim, names(dim(obs2))))) {
-      stop("Missing start date dimension in 'obs2', or does not match the parameter 'sdate_dim'")
+    if (is.na(match(sdate_dim, names(dim(obsL))))) {
+      stop("Missing start date dimension in 'obsL', or does not match the parameter 'sdate_dim'")
    }
 
-    if (is.na(match(time_dim, names(dim(obs2))))) {
-      stop("Missing time dimension in 'obs2', or does not match the parameter 'time_dim'")
+    if (is.na(match(time_dim, names(dim(obsL))))) {
+      stop("Missing time dimension in 'obsL', or does not match the parameter 'time_dim'")
    }
+  }
+
   ## ncores
   if (!is.null(ncores)) {
     if (!is.numeric(ncores) | any(ncores %% 1 != 0) | any(ncores < 0) |
@@ -350,15 +388,20 @@ Analogs <- function(exp, obs, exp_lats, exp_lons, obs_lats, obs_lons, grid_exp,
     stopifnot(fun_analog %in% c("mean", "wmean", "max", "min", "median"))
   }
 
-  if (!is.null(obs2)) {
-    obs_train <- obs2
-    obs_train_lats <- obs2_lats
-    obs_train_lons <- obs2_lons
+  # metric method to be used to specify the analogs
+  stopifnot(metric %in% c("cor", "dcor", "dist"))
+
+  if (!is.null(obsL)) {
+    obs_train <- obsL
+    obs_train_lats <- obsL_lats
+    obs_train_lons <- obsL_lons
   } else {
     obs_train <- obs
     obs_train_lats <- obs_lats
     obs_train_lons <- obs_lons
-  }
+  }
 
   # Correct indices later if cross-validation
   loocv_correction <- FALSE
@@ -366,24 +409,40 @@ Analogs <- function(exp, obs, exp_lats, exp_lons, obs_lats, obs_lons, grid_exp,
     loocv_correction <- TRUE
   }
 
-  #-----------------------------------
-  # Interpolate high-res observations to the coarse grid
-  #-----------------------------------
+  # crop downscaling region, if the argument region is provided.
+ if (!is.null(region) & is.null(obsL)) { + # if a border is equally distant from two different grids, the map will be cropped from the grid having smaller coordinate + + a <- which.min(abs((region[1]-obs_lons))) + b <- which.min(abs((region[2]-obs_lons))) + c <- which.min(abs((region[3]-obs_lats))) + d <- which.min(abs((region[4]-obs_lats))) + obs <- ClimProjDiags::Subset(x = obs, along = list(lon_dim,lat_dim), + indices = list(a:b,c:d), drop = 'selected') + } + if (is.null(region)) { - warning("The borders of the downscaling region have not been provided. Assuming the four borders of the ", - "downscaling region are defined by the first and last elements of the parametres 'exp_lats' and ", - "'exp_lons'.") - region <- c(exp_lons[1], exp_lons[length(exp_lons)], exp_lats[1], exp_lats[length(exp_lats)]) + warning("The borders of the downscaling region have not been provided. ", + "Assuming the four borders of the downscaling region are defined by the ", + "first and last elements of the parameters 'exp_lats' and 'exp_lons'.") + region <- c(exp_lons[1], exp_lons[length(exp_lons)], exp_lats[1], + exp_lats[length(exp_lats)]) } - obs_interpolated <- Interpolation(exp = obs_train, lats = obs_train_lats, lons = obs_train_lons, - target_grid = grid_exp, lat_dim = lat_dim, lon_dim = lon_dim, - method_remap = "conservative", region = region, ncores = ncores) + obs_interpolated <- Interpolation(exp = obs_train, lats = obs_train_lats, lons = obs_train_lons, + target_grid = grid_exp, lat_dim = lat_dim, lon_dim = lon_dim, + method_remap = "conservative", region = region, + ncores = ncores) + # If after interpolating 'obs' data the coordinates do not match, the exp data is interpolated to # the same grid to force the matching - if (!.check_coords(lat1 = as.numeric(obs_interpolated$lat), lat2 = exp_lats, lon1 = as.numeric(obs_interpolated$lon), lon2 = exp_lons)) { - exp_interpolated <- Interpolation(exp = exp, lats = exp_lats, lons = exp_lons, target_grid = grid_exp, - lat_dim = lat_dim, lon_dim = lon_dim, method_remap = "conservative", + if (!.check_coords(lat1 = as.numeric(obs_interpolated$lat), + lat2 = exp_lats, + lon1 = as.numeric(obs_interpolated$lon), + lon2 = exp_lons)) { + exp_interpolated <- Interpolation(exp = exp, lats = exp_lats, lons = exp_lons, + target_grid = grid_exp, lat_dim = lat_dim, + lon_dim = lon_dim, method_remap = "conservative", region = region, ncores = ncores)$data } else { exp_interpolated <- exp @@ -391,40 +450,57 @@ Analogs <- function(exp, obs, exp_lats, exp_lons, obs_lats, obs_lons, grid_exp, # Create window if user does not have it in the training observations if ( !("window" %in% names(dim(obs_interpolated$data))) ) { - obs_train_interpolated <- .generate_window(obj = obs_interpolated$data, sdate_dim = sdate_dim, - time_dim = time_dim, loocv = loocv_window, ncores = ncores) - obs_hres <- .generate_window(obj = obs, sdate_dim = sdate_dim, time_dim = time_dim, loocv = loocv_window, ncores = ncores) + obs_train_interpolated <- .generate_window(obj = obs_interpolated$data, sdate_dim = sdate_dim, + time_dim = time_dim, loocv = loocv_window, + ncores = ncores) + if (!is.null(obsL)) { + if ( ("window" %in% names(dim(obs))) ) { + stop("Either both obs and obsL should include 'window' dimension or none.") + } + } + obs_hres <- .generate_window(obj = obs, sdate_dim = sdate_dim, time_dim = time_dim, + loocv = loocv_window, ncores = ncores) + } else { obs_train_interpolated <- obs_interpolated$data - dim(obs_train_interpolated) <- 
dim(obs_train_interpolated)[-which(names(dim(obs_train_interpolated))=="time")]
+    dim(obs_train_interpolated) <- dim(ClimProjDiags::Subset(x = obs_train_interpolated,
+                                                             along = time_dim, indices = 1, drop = 'selected'))
+
+    if (!is.null(obsL)) {
+      if ( !("window" %in% names(dim(obs))) ) {
+        stop("Either both obs and obsL should include 'window' dimension or none.")
+      }
+    }
     obs_hres <- obs
-    dim(obs_hres) <- dim(obs_hres)[-which(names(dim(obs_hres))=="time")]
+    dim(obs_hres) <- dim(ClimProjDiags::Subset(x = obs_hres,
+                                               along = time_dim, indices = 1, drop = 'selected'))
+
   }
 
   #-----------------------------------
   # Reshape train and test
   #-----------------------------------
-  res.data <- Apply(list(obs_train_interpolated, exp_interpolated, obs_hres),
-                    target_dims = list(c("window", lat_dim, lon_dim), c(lat_dim, lon_dim),
-                                       c("window", lat_dim, lon_dim)),
-                    fun = function(tr, te, ob) .analogs(train = tr, test = te, obs_hres = ob, k = nanalogs,
-                    fun_analog = fun_analog), ncores = ncores)$output1
 
-  # Return the indices of the best analogs
-  if (return_indices) {
-    res.ind <- Apply(list(obs_train_interpolated, exp_interpolated, obs_hres),
+  RES <- Apply(list(obs_train_interpolated, exp_interpolated, obs_hres),
               target_dims = list(c("window", lat_dim, lon_dim), c(lat_dim, lon_dim),
                                  c("window", lat_dim, lon_dim)),
-                     fun = function(tr, te, ob) .analogs(train = tr, test = te, obs_hres = ob, k = nanalogs,
-                     fun_analog = fun_analog, return_indices = TRUE), ncores = ncores, output_dims = 'ind')$output1
+               fun = function(tr, te, ob) .analogs(train = tr, test = te, obs_hres = ob,
+                                                   k = nanalogs, metric = metric, fun_analog = fun_analog),
+               ncores = ncores) ## output1 -> data, output2 -> index, output3 -> metric
+
+  res.data <- RES$output1
+
+  # Return the indices of the best analogs
+  if (return_indices) {
+    res.ind <- RES$output2
 
     # If cross-validation has been applied, correct the indices
     if (loocv_correction) {
       nsdates <- dim(res.ind)[names(dim(res.ind)) == sdate_dim]
       ntimes <- dim(res.ind)[names(dim(res.ind)) == time_dim]
-      res.ind <- Apply(res.ind, target_dims = c("ind", sdate_dim), function(x)
+      res.ind <- Apply(res.ind, target_dims = c("index", sdate_dim), function(x)
                        sapply(1:nsdates, function(s) seq(ntimes * nsdates)[
                        - (ntimes * (s - 1) + 1:ntimes)][x[, s]]),
-                       output_dims = c("ind", sdate_dim), ncores = ncores)$output1
+                       output_dims = c("index", sdate_dim), ncores = ncores)$output1
     }
 
     # restore ensemble dimension in observations if it existed originally
@@ -433,72 +509,92 @@ Analogs <- function(exp, obs, exp_lats, exp_lons, obs_lats, obs_lons, grid_exp,
     }
 
     res <- list(data = res.data, ind = res.ind, obs = obs, lon = obs_lons, lat = obs_lats)
-  }
-  else {
+  } else {
 
     # restore ensemble dimension in observations if it existed originally
-    if (restore_ens) {
-      obs <- s2dv::InsertDim(obs, posdim = 1, lendim = 1, name = member_dim)
-    }
-
-    res <- list(data = res.data, obs = obs, lon = obs_lons, lat = obs_lats)
+    if (restore_ens) {
+      obs <- s2dv::InsertDim(obs, posdim = 1, lendim = 1, name = member_dim)
+    }
+    res <- list(data = res.data, obs = obs, lon = obs_lons, lat = obs_lats)
   }
 
   return(res)
 }
 
 # For each element in test, find the indices of the k nearest neigbhors in train
-.analogs <- function(train, test, obs_hres, k, fun_analog, return_indices = FALSE) {
-
-  require(FNN)
-
-  # train and obs_hres dim: 3 dimensions window, lat and lon (in this order)
+.analogs <- function(train, test, obs_hres, k, fun_analog, metric = NULL, return_indices = FALSE) {
+
+  # train and obs_hres dim: 3 dimensions window, lat and lon (in this order)
   # test dim: 2 dimensions lat and lon (in this order)
 
   # Number of lats/lons of the high-resolution data
   space_dims_hres <- dim(obs_hres)[c(2,3)]
 
   # Reformat train and test as an array with (time, points)
-  train <- apply(train, 1, as.vector); names(dim(train))[1] <- "space"
+  train <- apply(train, 1, as.vector); names(dim(train))[1] <- "space"
   test <- as.vector(test)
   obs_hres <- apply(obs_hres, 1, as.vector); names(dim(obs_hres))[1] <- "space"
-
+
   # Identify and remove NA's
-  idx_na_tr <- is.na(train[ , 1])
+  dum <- which(!apply(train, 2, function(x) all(is.na(x))))[1] ## first time step (column) that is not all-NA; used to locate NAs in space
+  idx_na_tr <- is.na(train[ , dum]) # NA in space
+  idy_na_tr <- is.na(train[1, ]) # NA in time
   idx_na_te <- is.na(test)
   idx_na <- idx_na_tr | idx_na_te
-  tr_wo_na <- t(train[!idx_na , ])
+  tr_wo_na <- t(train[!idx_na , !idy_na_tr ])
   te_wo_na <- test[!idx_na]
   te_wo_na <- InsertDim(data = te_wo_na, posdim = 1, lendim = 1, name = "time")
 
-  knn.ind <- get.knnx(tr_wo_na, te_wo_na, k)
-  dist <- knn.ind$nn.dist
-  idx <- knn.ind$nn.index
-
-  # Either return only the indices or the analogs
-  if (return_indices) {
-    res <- as.numeric(idx)
+  if (all(is.na(test))) {
+    res <- array(NA, space_dims_hres)
+    res_ind <- array(NA, k)
+    names(dim(res_ind)) <- c("index")
+    res_metric <- array(NA, k)
+    names(dim(res_metric)) <- c("metric")
   } else {
+    if (metric == "dist") {
+      dist_all <- sqrt(rowSums((sweep(tr_wo_na, 2, te_wo_na[1,]))^2)) # Euclidean distance
+      best_vals <- head(sort(dist_all), k)
+      idx <- match(best_vals, dist_all)
+    } else if (metric == "cor") {
+      cor_all <- apply(tr_wo_na, 1, function(x) cor(x, te_wo_na[1, ], method = "spearman")) # Spearman correlation
+      best_vals <- head(sort(cor_all, decreasing = TRUE), k)
+      idx <- match(best_vals, cor_all)
+    } else if (metric == "dcor") {
+#      require(energy)
+      dcor_all <- apply(tr_wo_na, 1, function(x) .dcor(x, te_wo_na[1, ])) # distance correlation
+      best_vals <- head(sort(dcor_all, decreasing = TRUE), k)
+      idx <- match(best_vals, dcor_all)
+    }
+    if (isTRUE(any(idy_na_tr))) {
+      dum <- (1:length(idy_na_tr))[!idy_na_tr]
+      idx <- dum[idx]
+    }
+    res_ind <- array(idx, k)
+    names(dim(res_ind)) <- c("index")
+    res_metric <- array(best_vals, c(k))
+    names(dim(res_metric)) <- c("metric")
     res <- obs_hres[ , idx]
     dim(res) <- c(space_dims_hres, analogs = k)
 
     if (!is.null(fun_analog)) {
      if (fun_analog == "wmean") {
-        weight <- 1 / dist
+        if (metric == "dist") {
+          weight <- 1 / best_vals
+        } else {
+          weight <- best_vals
+        }
        res <- apply(res, c(1,2), function(x) weighted.mean(x, weight))
      } else if (fun_analog == "min") {
-        res<-res[,,which.min(dist)]
+        res <- res[, , which.min(best_vals)]
      } else if (fun_analog == "max") {
-        res<-res[,,which.max(dist)]
+        res <- res[, , which.max(best_vals)]
      } else {
        res <- apply(res, c(1,2), fun_analog)
      }
    }
  }
-
-  return(res)
+  return(list(res, res_ind, res_metric))
 }
 
-
 # Add the dimension window to an array that contains, at least, the start date and time
 # dimensions
 # object has at least dimensions sdate and time
@@ -548,3 +644,38 @@
 
   return(obj_window)
 }
+
+# Distance correlation function
+.dcor <- function(x, y) {
+  n <- length(x)
+
+  # Calculate Euclidean distances for x
+  dist_x <- as.matrix(dist(matrix(x)))
+  # Calculate Euclidean distances for y
+  dist_y <- as.matrix(dist(matrix(y)))
+
+  # Centering matrices
+  H <- diag(n) - 1/n
+
+  # Centered distance matrices
+  dist_centered_x <- H %*% dist_x %*% H
+  dist_centered_y <- H %*% dist_y %*% H
+
+  # Calculate the product of mean-centered distance matrices
+  B <- dist_centered_x %*% t(dist_centered_y)
+  C <- dist_centered_x %*% t(dist_centered_x)
+  D <- dist_centered_y %*% t(dist_centered_y)
+
+  # Calculate the distance covariance
+  dcov_xy <- sqrt(sum(diag(B)))
+
+  # Calculate the variances
+  cov_xx <- sqrt(sum(diag(C)))
+  cov_yy <- sqrt(sum(diag(D)))
+
+  # Calculate the distance correlation
+  dcor_xy <- dcov_xy / sqrt(cov_xx * cov_yy)
+
+  return(dcor_xy)
+}
+
diff --git a/modules/Downscaling/tmp/Intbc.R b/modules/Downscaling/tmp/Intbc.R
index dc5d050b8565088215b248573a8d4e976ea5c62a..4b9527323aa58ecd67daf71870c5755e7b31492b 100644
--- a/modules/Downscaling/tmp/Intbc.R
+++ b/modules/Downscaling/tmp/Intbc.R
@@ -82,7 +82,7 @@ CST_Intbc <- function(exp, obs, target_grid, bc_method, int_method = NULL, point
   res <- Intbc(exp = exp$data, obs = obs$data, exp_lats = exp$coords[[lat_dim]], exp_lons = exp$coords[[lon_dim]],
                obs_lats = obs$coords[[lat_dim]], obs_lons = obs$coords[[lon_dim]], target_grid = target_grid,
                int_method = int_method, bc_method = bc_method, points = points,
-               source_file_exp = exp$attrs$source_files[1], source_file_obs = obs$attrs$source_files[1],
+               source_file_exp = exp$coords$source_files[1], source_file_obs = obs$coords$source_files[1],
                method_point_interp = method_point_interp, lat_dim = lat_dim, lon_dim = lon_dim,
                sdate_dim = sdate_dim, member_dim = member_dim, region = region, ncores = ncores, ...)
diff --git a/modules/Downscaling/tmp/Interpolation.R b/modules/Downscaling/tmp/Interpolation.R
index ed79f4fd6aa0441fecb0bd89d8c87fccc100cb6c..5d5f70b8d074f50c4ef391c78bb32f51c75ed191 100644
--- a/modules/Downscaling/tmp/Interpolation.R
+++ b/modules/Downscaling/tmp/Interpolation.R
@@ -69,7 +69,7 @@ CST_Interpolation <- function(exp, points = NULL, method_remap = NULL, target_gr
   }
 
   res <- Interpolation(exp = exp$data, lats = exp$coords[[lat_dim]], lons = exp$coords[[lon_dim]],
-                       source_file = exp$attrs$source_files[1], points = points,
+                       source_file = exp$coords$source_files[1], points = points,
                        method_remap = method_remap, target_grid = target_grid, lat_dim = lat_dim,
                        lon_dim = lon_dim, region = region, method_point_interp = method_point_interp,
                        ncores = ncores)
diff --git a/modules/Downscaling/tmp/Intlr.R b/modules/Downscaling/tmp/Intlr.R
index 36a7f11b27964650836d80dea0ed30703e567ee2..f108877b7d55a6016b437a17a8144d6c68b28198 100644
--- a/modules/Downscaling/tmp/Intlr.R
+++ b/modules/Downscaling/tmp/Intlr.R
@@ -99,9 +99,9 @@ CST_Intlr <- function(exp, obs, lr_method, target_grid = NULL, points = NULL, in
     stop("Parameter 'obs' must be of the class 's2dv_cube'")
   }
 
   res <- Intlr(exp = exp$data, obs = obs$data, exp_lats = exp$coords[[lat_dim]], exp_lons = exp$coords[[lon_dim]],
                obs_lats = obs$coords[[lat_dim]], obs_lons = obs$coords[[lon_dim]], points = points,
-               source_file_exp = exp$attrs$source_files[1], source_file_obs = obs$attrs$source_files[1],
+               source_file_exp = exp$coords$source_files[1], source_file_obs = obs$coords$source_files[1],
                target_grid = target_grid, lr_method = lr_method, int_method = int_method,
                method_point_interp = method_point_interp, predictors = predictors,
                lat_dim = lat_dim, lon_dim = lon_dim, sdate_dim = sdate_dim, time_dim = time_dim,
diff --git a/modules/Downscaling/tmp/LogisticReg.R b/modules/Downscaling/tmp/LogisticReg.R
index
a85a1b3ff2d36741c3371e63be57f8b44d7b57ee..67fa1b28c9d8bf508d3f2b0d39c6a4940ade29c1 100644 --- a/modules/Downscaling/tmp/LogisticReg.R +++ b/modules/Downscaling/tmp/LogisticReg.R @@ -86,6 +86,7 @@ #'res <- CST_LogisticReg(exp = exp, obs = obs, int_method = 'bil', target_grid = 'r1280x640', #'probs_cat = c(1/3, 2/3)) #'@export + CST_LogisticReg <- function(exp, obs, target_grid, int_method = NULL, log_reg_method = "ens_mean", probs_cat = c(1/3,2/3), return_most_likely_cat = FALSE, points = NULL, method_point_interp = NULL, lat_dim = "lat", lon_dim = "lon", sdate_dim = "sdate", @@ -106,7 +107,7 @@ CST_LogisticReg <- function(exp, obs, target_grid, int_method = NULL, log_reg_me int_method = int_method, log_reg_method = log_reg_method, points = points, method_point_interp = method_point_interp, lat_dim = lat_dim, lon_dim = lon_dim, sdate_dim = sdate_dim, member_dim = member_dim, - source_file_exp = exp$attrs$source_files[1], source_file_obs = obs$attrs$source_files[1], + source_file_exp = exp$coords$source_files[1], source_file_obs = obs$coords$source_files[1], region = region, loocv = loocv, ncores = ncores) # Modify data, lat and lon in the origina s2dv_cube, adding the downscaled data @@ -391,12 +392,14 @@ obs_cat <- Apply(obs_ref, target_dims = sdate_dim, function(x) { rep(NA,length(x)) } else { terc <- convert2prob(as.vector(x), prob = probs_cat) - apply(terc, 1, function(r) which (r == 1))}}, + as.integer(apply(terc, 1, function(r) which (r == 1)))}}, output_dims = sdate_dim, ncores = ncores)$output1 - res <- Apply(list(predictor, obs_cat), target_dims = list(target_dims_predictor, sdate_dim), fun = function(x, y) - .log_reg(x = x, y = y, loocv = loocv,probs_cat=probs_cat), output_dims = c(sdate_dim, "category"), ncores = ncores)$output1 + res <- Apply(list(predictor, obs_cat), target_dims = list(target_dims_predictor, sdate_dim), + fun = function(x, y) + .log_reg(x = x, y = y, loocv = loocv,probs_cat=probs_cat), + output_dims = c(sdate_dim, "category"), ncores = ncores)$output1 if (return_most_likely_cat) { res <- Apply(res, target_dims = c(sdate_dim, "category"), .most_likely_category, @@ -444,7 +447,7 @@ obs_cat <- Apply(obs_ref, target_dims = sdate_dim, function(x) { require(s2dv) # compute climatology - clim <- Apply(obj_ens, target_dims = c(member_dim, sdate_dim), mean, ncores = ncores)$output1 + clim <- Apply(obj_ens, target_dims = c(member_dim, sdate_dim), mean, ncores = ncores, na.rm = TRUE)$output1 # compute ensemble mean ens_mean <- Apply(obj_ens, target_dims = member_dim, mean, na.rm = TRUE, ncores = ncores)$output1 @@ -472,7 +475,7 @@ obs_cat <- Apply(obs_ref, target_dims = sdate_dim, function(x) { lm1 <- .train_lr(df = tmp_df, loocv = loocv) # prediction - res <- pred_lr(lm1 = lm1, df = tmp_df, loocv = loocv,probs_cat=probs_cat) + res <- pred_lr(lm1 = lm1, df = tmp_df, loocv = loocv, probs_cat=probs_cat) } return(res) } @@ -489,11 +492,11 @@ obs_cat <- Apply(obs_ref, target_dims = sdate_dim, function(x) { if (loocv) { - lm1 <- lapply(1:nrow(df), function(j) multinom(y ~ ., data = df[ -j, ])) + lm1 <- lapply(1:nrow(df), function(j) multinom(y ~ ., data = df[ -j, ], trace = FALSE)) } else { - lm1 <- list(multinom(y ~ ., data = df)) + lm1 <- list(multinom(y ~ ., data = df, trace = FALSE)) } @@ -503,7 +506,7 @@ obs_cat <- Apply(obs_ref, target_dims = sdate_dim, function(x) { #----------------------------------- # Function to apply the logistic regressions. 
#----------------------------------- -pred_lr <- function(df, lm1, loocv,probs_cat) { +pred_lr <- function(df, lm1, loocv, probs_cat) { require(plyr) @@ -514,35 +517,37 @@ pred_lr <- function(df, lm1, loocv,probs_cat) { pred_vals_ls <-list() for (j in 1:nrow(df)) { - pred_vals_ls[[j]] <- predict(lm1[[j]], df[j,], type = "probs") - } + if (all(is.na(df[j,]))) { + pred_vals_ls[[j]] <- rep(NA, length(probs_cat) + 1) + } else { + pred_vals_ls[[j]] <- predict(lm1[[j]], df[j,], type = "probs") + } + } - pred_vals <- laply(pred_vals_ls, .fun = as.array) + pred_vals <- laply(pred_vals_ls, .fun = as.array) - if( length(probs_cat)+1==2) - { + if( length(probs_cat)+1==2) { pred_vals_dum<-array(NA,dim=c(nrow(df),2)) pred_vals_dum[,2]<-pred_vals pred_vals_dum[,1]<-1-pred_vals pred_vals<-pred_vals_dum colnames(pred_vals)<-c(1,2) - } + } - } else { + } else { # type = class, probs #pred_vals_ls <- lapply(lm1, predict, data = df, type = "probs") #pred_vals <- unlist(pred_vals_ls) pred_vals <- predict(lm1[[1]], df, type = "probs") - if( length(probs_cat)+1==2) - { - pred_vals_dum<-array(NA,dim=c(nrow(df),2)) - pred_vals_dum[,2]<-pred_vals - pred_vals_dum[,1]<-1-pred_vals - pred_vals<-pred_vals_dum - colnames(pred_vals)<-c(1,2) - } + if( length(probs_cat)+1==2) { + pred_vals_dum<-array(NA,dim=c(nrow(df),2)) + pred_vals_dum[,2]<-pred_vals + pred_vals_dum[,1]<-1-pred_vals + pred_vals<-pred_vals_dum + colnames(pred_vals)<-c(1,2) + } } diff --git a/modules/Indices/R/compute_nino.R b/modules/Indices/R/compute_nino.R index 8fd2c9c8a90b2e947000d940684a659a68cb49af..915dc9cedd826e9733f0b8495dbb7f72ee8edcbb 100644 --- a/modules/Indices/R/compute_nino.R +++ b/modules/Indices/R/compute_nino.R @@ -208,6 +208,7 @@ compute_nino <- function(data, recipe, region, standardised = TRUE, } } if (plot_sp) { + ## TODO: Remove sourcing of plot robinson and viz module code source("modules/Visualization/R/tmp/PlotRobinson.R") source("modules/Indices/R/correlation_eno.R") source("modules/Visualization/R/get_proj_code.R") @@ -227,28 +228,36 @@ compute_nino <- function(data, recipe, region, standardised = TRUE, correl_hcst <- Apply(list(data$hcst$data, nino$hcst$data), target_dims = c('syear', 'ensemble'), fun = function(x, y) { - x <- apply(x, 1, mean, na.rm = TRUE) - y <- apply(y, 1, mean, na.rm = TRUE) - dim(y) <- c(syear = length(y)) - dim(x) <- c(syear = length(x)) - res <- .correlation_eno(x, y, - time_dim = 'syear', method = 'pearson', alpha = alpha, - test.type = 'two-sided', pval = FALSE)}, - ncores = recipe$Analysis$ncores) + x <- apply(x, 1, mean, na.rm = TRUE) + y <- apply(y, 1, mean, na.rm = TRUE) + dim(y) <- c(syear = length(y)) + dim(x) <- c(syear = length(x)) + res <- .correlation_eno(x, y, time_dim = 'syear', + method = 'pearson', + alpha = alpha, + test.type = 'two-sided', + pval = FALSE)}, + ncores = recipe$Analysis$ncores) correl_hcst_full <- Apply(list(data$hcst$data, nino$hcst$data), target_dims = c('syear', 'ensemble'), fun = function(x,y) { - dim(y) <- c(syear = length(y)) - dim(x) <- c(syear = length(x)) - res <- .correlation_eno(x, y, - time_dim = 'syear', method = 'pearson', alpha = alpha, - test.type = 'two-sided', pval = FALSE)}, + dim(y) <- c(syear = length(y)) + dim(x) <- c(syear = length(x)) + res <- .correlation_eno(x, y, + time_dim = 'syear', + method = 'pearson', + alpha = alpha, + test.type = 'two-sided', + pval = FALSE)}, ncores = recipe$Analysis$ncores) + months <- lubridate::month(Subset(data$hcst$attrs$Dates, "syear", indices = 1), + label = T, abb = F, locale = "en_GB") + for (tstep in 
1:dim(nino$obs$data)['time']) { map <- Subset(correl_obs$r, along = 'time', ind = tstep, drop = T) sig <- Subset(correl_obs$sig, along = 'time', ind = tstep, drop = T) if (tolower(recipe$Analysis$Horizon) == "seasonal") { - mes <- as.numeric(substr(recipe$Analysis$Time$sdate, 1,2)) + + mes <- as.numeric(substr(recipe$Analysis$Time$sdate, 1, 2)) + (tstep - 1) + (recipe$Analysis$Time$ftime_min - 1) mes <- ifelse(mes > 12, mes - 12, mes) fmonth <- sprintf("%02d", tstep - 1 + recipe$Analysis$Time$ftime_min) @@ -313,7 +322,7 @@ compute_nino <- function(data, recipe, region, standardised = TRUE, toptitle <- paste(recipe$Analysis$Datasets$System$name, "\n", "Ni\u00F1o", region_name, "SST Index -",var_name, "\n", "Correlation /", - month.abb[as.numeric(fmonth)], + month.abb[mes], "/", recipe$Analysis$Time$hcst_start, "-", recipe$Analysis$Time$hcst_end) plotfile <- paste0(recipe$Run$output_dir, "/plots/Indices/", @@ -367,7 +376,7 @@ compute_nino <- function(data, recipe, region, standardised = TRUE, toptitle <- paste(recipe$Analysis$Datasets$System$name, "\n", "Ni\u00F1o", region_name, "SST Index -",var_name, "\n", " Correlation /", - month.abb[as.numeric(fmonth)], + month.abb[mes], "/", recipe$Analysis$Time$hcst_start, "-", recipe$Analysis$Time$hcst_end) plotfile <- paste0(recipe$Run$output_dir, "/plots/Indices/", diff --git a/modules/Loading/Dev_Loading.R b/modules/Loading/Dev_Loading.R deleted file mode 100644 index fb456eb31aee1f0ba2b0eded1839ca2dd8d4c723..0000000000000000000000000000000000000000 --- a/modules/Loading/Dev_Loading.R +++ /dev/null @@ -1,501 +0,0 @@ -## TODO: remove paths to personal scratchs -source("/esarchive/scratch/vagudets/repos/csoperational/R/get_regrid_params.R") -# Load required libraries/funs -source("modules/Loading/R/dates2load.R") -source("modules/Loading/R/get_timeidx.R") -source("modules/Loading/R/check_latlon.R") -## TODO: Move to prepare_outputs.R -source("tools/libs.R") -## TODO: remove these two lines when new as.s2dv_cube() is in CSTools -source('https://earth.bsc.es/gitlab/external/cstools/-/raw/develop-new_s2dv_cube/R/as.s2dv_cube.R') -source('https://earth.bsc.es/gitlab/external/cstools/-/raw/develop-new_s2dv_cube/R/zzz.R') - -## TODO: Source new s2dv_cube version -## TODO: Eliminate dim_var dimension (merge_across_dims?) - -load_datasets <- function(recipe) { - - # ------------------------------------------- - # Set params ----------------------------------------- - - hcst.inityear <- recipe$Analysis$Time$hcst_start - hcst.endyear <- recipe$Analysis$Time$hcst_end - lats.min <- recipe$Analysis$Region$latmin - lats.max <- recipe$Analysis$Region$latmax - lons.min <- recipe$Analysis$Region$lonmin - lons.max <- recipe$Analysis$Region$lonmax - ref.name <- recipe$Analysis$Datasets$Reference$name - exp.name <- recipe$Analysis$Datasets$System$name - - variable <- strsplit(recipe$Analysis$Variables$name, ", | |,")[[1]][1] - vars <- strsplit(recipe$Analysis$Variables$name, ", | |,")[[1]] - store.freq <- recipe$Analysis$Variables$freq - - # get sdates array - ## LOGGER: Change dates2load to extract logger from recipe? 
- sdates <- dates2load(recipe, recipe$Run$logger) - - idxs <- NULL - idxs$hcst <- get_timeidx(sdates$hcst, - recipe$Analysis$Time$ftime_min, - recipe$Analysis$Time$ftime_max, - time_freq=store.freq) - - if (!(is.null(sdates$fcst))) { - idxs$fcst <- get_timeidx(sdates$fcst, - recipe$Analysis$Time$ftime_min, - recipe$Analysis$Time$ftime_max, - time_freq=store.freq) - } - - ## TODO: Examine this verifications part, verify if it's necessary - # stream <- verifications$stream - # sdates <- verifications$fcst.sdate - - ## TODO: define fcst.name - ##fcst.name <- recipe$Analysis$Datasets$System[[sys]]$name - - # get esarchive datasets dict: - ## TODO: Adapt to 'filesystem' option in recipe - archive <- read_yaml("conf/archive.yml")$esarchive - exp_descrip <- archive$System[[exp.name]] - - freq.hcst <- unlist(exp_descrip[[store.freq]][variable]) - reference_descrip <- archive$Reference[[ref.name]] - freq.obs <- unlist(reference_descrip[[store.freq]][variable]) - obs.dir <- reference_descrip$src - fcst.dir <- exp_descrip$src - hcst.dir <- exp_descrip$src - fcst.nmember <- exp_descrip$nmember$fcst - hcst.nmember <- exp_descrip$nmember$hcst - - ## TODO: it is necessary? - ##if ("accum" %in% names(reference_descrip)) { - ## accum <- unlist(reference_descrip$accum[store.freq][[1]]) - ##} else { - ## accum <- FALSE - ##} - - var_dir_obs <- reference_descrip[[store.freq]][vars] - var_dir_exp <- exp_descrip[[store.freq]][vars] - - # ----------- - obs.path <- paste0(archive$src, - obs.dir, store.freq, "/$var$", "$var_dir$", - "/$var$_$file_date$.nc") - - hcst.path <- paste0(archive$src, - hcst.dir, store.freq, "/$var$", "$var_dir$", - "$var$_$file_date$.nc") - - fcst.path <- paste0(archive$src, - hcst.dir, store.freq, "/$var$", "$var_dir$", - "/$var$_$file_date$.nc") - - # Define regrid parameters: - #------------------------------------------------------------------- - regrid_params <- get_regrid_params(recipe, archive) - - # Longitude circular sort and latitude check - #------------------------------------------------------------------- - circularsort <- check_latlon(lats.min, lats.max, lons.min, lons.max) - - if (recipe$Analysis$Variables$freq == "monthly_mean"){ - split_multiselected_dims = TRUE - } else { - split_multiselected_dims = FALSE - } - - # Load hindcast - #------------------------------------------------------------------- - hcst <- Start(dat = hcst.path, - var = vars, - var_dir = var_dir_exp, - file_date = sdates$hcst, - time = idxs$hcst, - var_dir_depends = 'var', - latitude = values(list(lats.min, lats.max)), - latitude_reorder = Sort(), - longitude = values(list(lons.min, lons.max)), - longitude_reorder = circularsort, - transform = regrid_params$fcst.transform, - transform_params = list(grid = regrid_params$fcst.gridtype, - method = regrid_params$fcst.gridmethod), - transform_vars = c('latitude', 'longitude'), - synonims = list(latitude = c('lat', 'latitude'), - longitude = c('lon', 'longitude'), - ensemble = c('member', 'ensemble')), - ensemble = indices(1:hcst.nmember), - metadata_dims = 'var', # change to just 'var'? 
- return_vars = list(latitude = 'dat', - longitude = 'dat', - time = 'file_date'), - split_multiselected_dims = split_multiselected_dims, - retrieve = TRUE) - - # Remove var_dir dimension - if ("var_dir" %in% names(dim(hcst))) { - hcst <- Subset(hcst, along = "var_dir", indices = 1, drop = "selected") - } - - if (recipe$Analysis$Variables$freq == "daily_mean") { - # Adjusts dims for daily case, could be removed if startR allows - # multidim split - names(dim(hcst))[which(names(dim(hcst)) == 'file_date')] <- "syear" - default_dims <- c(dat = 1, var = 1, sday = 1, - sweek = 1, syear = 1, time = 1, - latitude = 1, longitude = 1, ensemble = 1) - default_dims[names(dim(hcst))] <- dim(hcst) - dim(hcst) <- default_dims - # Change time attribute dimensions - default_time_dims <- c(sday = 1, sweek = 1, syear = 1, time = 1) - names(dim(attr(hcst, "Variables")$common$time))[which(names( - dim(attr(hcst, "Variables")$common$time)) == 'file_date')] <- "syear" - default_time_dims[names(dim(attr(hcst, "Variables")$common$time))] <- - dim(attr(hcst, "Variables")$common$time) - dim(attr(hcst, "Variables")$common$time) <- default_time_dims - } - - # Convert hcst to s2dv_cube object - ## TODO: Give correct dimensions to $Dates - ## (sday, sweek, syear instead of file_date) - hcst <- as.s2dv_cube(hcst) - # Adjust dates for models where the time stamp goes into the next month - if (recipe$Analysis$Variables$freq == "monthly_mean") { - hcst$attrs$Dates[] <- hcst$attrs$Dates - seconds(exp_descrip$time_stamp_lag) - } - - ## Combine tas and tos data into one variable: tas-tos - if(recipe$Analysis$Variables$name == 'tas tos'){ - #if(recipe$Analysis$Datasets$Reference$name == 'HadCRUT5' || recipe$Analysis$Datasets$Reference$name == 'BEST') { - source('/esarchive/scratch/nmilders/gitlab/git_clones/s2s-suite/modules/Loading/R/mask_tas_tos.R') - hcst <- mask_tas_tos(input_data = hcst, region = c(lons.min, lons.max,lats.min, lats.max), - grid = 'r360x181', - lon = hcst$coords$longitude, - lat = hcst$coords$latitude, - lon_dim = 'longitude', lat_dim = 'latitude', ncores = NULL) - - hcst$dims[['var']] <- dim(hcst$data)[['var']] - #} - } - - # Load forecast - #------------------------------------------------------------------- - if (!is.null(recipe$Analysis$Time$fcst_year)) { - # the call uses file_date instead of fcst_syear so that it can work - # with the daily case and the current version of startR not allowing - # multiple dims split - - fcst <- Start(dat = fcst.path, - var = vars, - var_dir = var_dir_exp, - var_dir_depends = 'var', - file_date = sdates$fcst, - time = idxs$fcst, - latitude = values(list(lats.min, lats.max)), - latitude_reorder = Sort(), - longitude = values(list(lons.min, lons.max)), - longitude_reorder = circularsort, - transform = regrid_params$fcst.transform, - transform_params = list(grid = regrid_params$fcst.gridtype, - method = regrid_params$fcst.gridmethod), - transform_vars = c('latitude', 'longitude'), - synonims = list(latitude = c('lat', 'latitude'), - longitude = c('lon', 'longitude'), - ensemble = c('member', 'ensemble')), - ensemble = indices(1:fcst.nmember), - metadata_dims = 'var', - return_vars = list(latitude = 'dat', - longitude = 'dat', - time = 'file_date'), - split_multiselected_dims = split_multiselected_dims, - retrieve = TRUE) - - if ("var_dir" %in% names(dim(fcst))) { - fcst <- Subset(fcst, along = "var_dir", indices = 1, drop = "selected") - } - - if (recipe$Analysis$Variables$freq == "daily_mean") { - # Adjusts dims for daily case, could be removed if startR allows - # 
multidim split - names(dim(fcst))[which(names(dim(fcst)) == 'file_date')] <- "syear" - default_dims <- c(dat = 1, var = 1, sday = 1, - sweek = 1, syear = 1, time = 1, - latitude = 1, longitude = 1, ensemble = 1) - default_dims[names(dim(fcst))] <- dim(fcst) - dim(fcst) <- default_dims - # Change time attribute dimensions - default_time_dims <- c(sday = 1, sweek = 1, syear = 1, time = 1) - names(dim(attr(fcst, "Variables")$common$time))[which(names( - dim(attr(fcst, "Variables")$common$time)) == 'file_date')] <- "syear" - default_time_dims[names(dim(attr(fcst, "Variables")$common$time))] <- - dim(attr(fcst, "Variables")$common$time) - dim(attr(fcst, "Variables")$common$time) <- default_time_dims - } - - # Convert fcst to s2dv_cube - fcst <- as.s2dv_cube(fcst) - # Adjust dates for models where the time stamp goes into the next month - if (recipe$Analysis$Variables$freq == "monthly_mean") { - fcst$attrs$Dates[] <- - fcst$attrs$Dates - seconds(exp_descrip$time_stamp_lag) - } - - } else { - fcst <- NULL - } - - # Load reference - #------------------------------------------------------------------- - - # Obtain dates and date dimensions from the loaded hcst data to make sure - # the corresponding observations are loaded correctly. - dates <- hcst$attrs$Dates - dim(dates) <- hcst$dims[c("sday", "sweek", "syear", "time")] - - # Separate Start() call for monthly vs daily data - if (store.freq == "monthly_mean") { - - dates_file <- format(as.Date(dates, '%Y%m%d'), "%Y%m") - dim(dates_file) <- dim(dates) - - ## tas tos mask - if (recipe$Analysis$Variables$name == 'tas tos'){ - if (recipe$Analysis$Datasets$Reference$name == 'HadCRUT5'){ - vars <- 'tasanomaly' - var_dir_obs <- reference_descrip[[store.freq]][vars] - } - } - - if (recipe$Analysis$Variables$name == 'tas tos'){ - if (recipe$Analysis$Datasets$Reference$name == 'BEST'){ - vars <- 'tas' - var_dir_obs <- reference_descrip[[store.freq]][vars] - } - } - - obs <- Start(dat = obs.path, - var = vars, - var_dir = var_dir_obs, - var_dir_depends = 'var', - file_date = dates_file, - latitude = values(list(lats.min, lats.max)), - latitude_reorder = Sort(), - longitude = values(list(lons.min, lons.max)), - longitude_reorder = circularsort, - transform = regrid_params$obs.transform, - transform_params = list(grid = regrid_params$obs.gridtype, - method = regrid_params$obs.gridmethod), - transform_vars = c('latitude', 'longitude'), - synonims = list(latitude = c('lat','latitude'), - longitude = c('lon','longitude')), - metadata_dims = 'var', - return_vars = list(latitude = 'dat', - longitude = 'dat', - time = 'file_date'), - split_multiselected_dims = TRUE, - retrieve = TRUE) - - } else if (store.freq == "daily_mean") { - - # Get year and month for file_date - dates_file <- sapply(dates, format, '%Y%m') - dim(dates_file) <- dim(dates) - # Set hour to 12:00 to ensure correct date retrieval for daily data - lubridate::hour(dates) <- 12 - lubridate::minute(dates) <- 00 - # Restore correct dimensions - dim(dates) <- dim(dates_file) - - obs <- Start(dat = obs.path, - var = vars, - var_dir = var_dir_obs, - var_dir_depends = 'var', - file_date = sort(unique(dates_file)), - time = dates, - time_var = 'time', - time_across = 'file_date', - merge_across_dims = TRUE, - merge_across_dims_narm = TRUE, - latitude = values(list(lats.min, lats.max)), - latitude_reorder = Sort(), - longitude = values(list(lons.min, lons.max)), - longitude_reorder = circularsort, - transform = regrid_params$obs.transform, - transform_params = list(grid = regrid_params$obs.gridtype, - method 
= regrid_params$obs.gridmethod), - transform_vars = c('latitude', 'longitude'), - synonims = list(latitude = c('lat','latitude'), - longitude = c('lon','longitude')), - metadata_dims = 'var', - return_vars = list(latitude = 'dat', - longitude = 'dat', - time = 'file_date'), - split_multiselected_dims = TRUE, - retrieve = TRUE) - } - - # Remove var_dir dimension - if ("var_dir" %in% names(dim(obs))) { - obs <- Subset(obs, along = "var_dir", indices = 1, drop = "selected") - } - # Adds ensemble dim to obs (for consistency with hcst/fcst) - default_dims <- c(dat = 1, var = 1, sday = 1, - sweek = 1, syear = 1, time = 1, - latitude = 1, longitude = 1, ensemble = 1) - default_dims[names(dim(obs))] <- dim(obs) - dim(obs) <- default_dims - - # Convert obs to s2dv_cube - obs <- as.s2dv_cube(obs) - - ## Combine tas and tos data into one variable: tas-tos - if(recipe$Analysis$Variables$name == 'tas tos'){ - if(recipe$Analysis$Datasets$Reference$name != 'HadCRUT5' & recipe$Analysis$Datasets$Reference$name != 'BEST'){ - source('/esarchive/scratch/nmilders/gitlab/git_clones/s2s-suite/modules/Loading/R/mask_tas_tos.R') - obs <- mask_tas_tos(input_data = obs, region = c(lons.min, lons.max,lats.min, lats.max), - grid = 'r360x181', - lon = obs$coords$longitude, - lat = obs$coords$latitude, - lon_dim = 'longitude', lat_dim = 'latitude', ncores = NULL) - - obs$dims[['var']] <- dim(obs$data)[['var']] - } - } - - # Check for consistency between hcst and obs grid - if (!(recipe$Analysis$Regrid$type == 'none')) { - if (!isTRUE(all.equal(as.vector(hcst$lat), as.vector(obs$lat)))) { - lat_error_msg <- paste("Latitude mismatch between hcst and obs.", - "Please check the original grids and the", - "regrid parameters in your recipe.") - error(recipe$Run$logger, lat_error_msg) - hcst_lat_msg <- paste0("First hcst lat: ", hcst$lat[1], - "; Last hcst lat: ", hcst$lat[length(hcst$lat)]) - info(recipe$Run$logger, hcst_lat_msg) - obs_lat_msg <- paste0("First obs lat: ", obs$lat[1], - "; Last obs lat: ", obs$lat[length(obs$lat)]) - info(recipe$Run$logger, obs_lat_msg) - stop("hcst and obs don't share the same latitudes.") - } - if (!isTRUE(all.equal(as.vector(hcst$lon), as.vector(obs$lon)))) { - lon_error_msg <- paste("Longitude mismatch between hcst and obs.", - "Please check the original grids and the", - "regrid parameters in your recipe.") - error(recipe$Run$logger, lon_error_msg) - hcst_lon_msg <- paste0("First hcst lon: ", hcst$lon[1], - "; Last hcst lon: ", hcst$lon[length(hcst$lon)]) - info(recipe$Run$logger, hcst_lon_msg) - obs_lon_msg <- paste0("First obs lon: ", obs$lon[1], - "; Last obs lon: ", obs$lon[length(obs$lon)]) - info(recipe$Run$logger, obs_lon_msg) - stop("hcst and obs don't share the same longitudes.") - - } - } - - # Remove negative values in accumulative variables - dictionary <- read_yaml("conf/variable-dictionary.yml") - for (var_idx in 1:length(vars)) { - var_name <- vars[var_idx] - if (dictionary$vars[[var_name]]$accum) { - info(recipe$Run$logger, - paste0("Accumulated variable ", var_name, - ": setting negative values to zero.")) - # obs$data[, var_idx, , , , , , , ] <- pmax(Subset(obs$data, - # along = "var", - # indices = var_idx, F), 0) - obs$data[, var_idx, , , , , , , ][obs$data[, var_idx, , , , , , , ] < 0] <- 0 - hcst$data[, var_idx, , , , , , , ][hcst$data[, var_idx, , , , , , , ] < 0] <- 0 - if (!is.null(fcst)) { - fcst$data[, var_idx, , , , , , , ][fcst$data[, var_idx, , , , , , , ] < 0] <- 0 - } - } - - # Convert prlr from m/s to mm/day - ## TODO: Make a unit conversion function - if 
(vars[[var_idx]] == "prlr") { - # Verify that the units are m/s and the same in obs and hcst - if (((obs$attrs$Variable$metadata[[var_name]]$units == "m s-1") || - (obs$attrs$Variable$metadata[[var_name]]$units == "m s**-1")) && - ((hcst$attrs$Variable$metadata[[var_name]]$units == "m s-1") || - (hcst$attrs$Variable$metadata[[var_name]]$units == "m s**-1"))) { - info(recipe$Run$logger, "Converting precipitation from m/s to mm/day.") - obs$data[, var_idx, , , , , , , ] <- - obs$data[, var_idx, , , , , , , ]*86400*1000 - obs$attrs$Variable$metadata[[var_name]]$units <- "mm/day" - hcst$data[, var_idx, , , , , , , ] <- - hcst$data[, var_idx, , , , , , , ]*86400*1000 - hcst$attrs$Variable$metadata[[var_name]]$units <- "mm/day" - if (!is.null(fcst)) { - fcst$data[, var_idx, , , , , , , ] <- - fcst$data[, var_idx, , , , , , , ]*86400*1000 - fcst$attrs$Variable$metadata[[var_name]]$units <- "mm/day" - } - } - } - } - # Compute anomalies if requested - # Print a summary of the loaded data for the user, for each object - if (recipe$Run$logger$threshold <= 2) { - data_summary(hcst, recipe) - data_summary(obs, recipe) - if (!is.null(fcst)) { - data_summary(fcst, recipe) - } - } - - info(recipe$Run$logger, - "##### DATA LOADING COMPLETED SUCCESSFULLY #####") - - ############################################################################ - # - # CHECKS ON MISSING FILES - # - ############################################################################ - - #obs.NA_dates.ind <- Apply(obs, - # fun=(function(x){ all(is.na(x))}), - # target_dims=c('time', 'latitude', 'longitude'))[[1]] - #obs.NA_dates <- dates_file[obs.NA_dates.ind] - #obs.NA_dates <- obs.NA_dates[order(obs.NA_dates)] - #obs.NA_files <- paste0(obs.dir, store.freq,"/",variable,"_", - # freq.obs,"obs.grid","/",variable,"_",obs.NA_dates,".nc") - # - #if (any(is.na(hcst))){ - # fatal(recipe$Run$logger, - # paste(" ERROR: MISSING HCST VALUES FOUND DURING LOADING # ", - # " ################################################# ", - # " ###### MISSING FILES #### ", - # " ################################################# ", - # "hcst files:", - # hcst.NA_files, - # " ################################################# ", - # " ################################################# ", - # sep="\n")) - # quit(status = 1) - #} - # - #if (any(is.na(obs)) && !identical(obs.NA_dates,character(0))){ - # fatal(recipe$logger, - # paste(" ERROR: MISSING OBS VALUES FOUND DURING LOADING # ", - # " ################################################# ", - # " ###### MISSING FILES #### ", - # " ################################################# ", - # "obs files:", - # obs.NA_files, - # " ################################################# ", - # " ################################################# ", - # sep="\n")) - # quit(status=1) - #} - # - #info(recipe$logger, - # "######### DATA LOADING COMPLETED SUCCESFULLY ##############") - - ############################################################################ - ############################################################################ - - return(list(hcst = hcst, fcst = fcst, obs = obs)) - -} diff --git a/modules/Loading/Loading.R b/modules/Loading/Loading.R index 43dea0651fc7eb02cb8cb69a1d44d580903a4666..63fee97bede51b72128b917af9b5171d83061d4f 100644 --- a/modules/Loading/Loading.R +++ b/modules/Loading/Loading.R @@ -8,6 +8,9 @@ Loading <- function(recipe) { if (tolower(recipe$Run$filesystem) == "mars") { source("modules/Loading/R/load_GRIB.R") data <- load_GRIB(recipe) + } else if (tolower(recipe$Run$filesystem) == 
"sample") { + source("modules/Loading/R/load_sample.R") + data <- load_sample(recipe) } else { # Case: esarchive time_horizon <- tolower(recipe$Analysis$Horizon) diff --git a/modules/Loading/R/get_regrid_params.R b/modules/Loading/R/get_regrid_params.R new file mode 100644 index 0000000000000000000000000000000000000000..ef08adcd11f609e08ccc484e0408bb87317eb9b3 --- /dev/null +++ b/modules/Loading/R/get_regrid_params.R @@ -0,0 +1,73 @@ +#'Read regrid parameters from recipe and returns a list for use with Start() +#' +#'The purpose of this function is to read the recipe and archive configuration +#'data for Auto-S2S workflows, retrieve the regridding parameters for hcst and +#'obs, and return an object that can be the input for 'transform' and +#''transform_params' when the data is loaded using Start(). +#'Requires CDORemapper. +#' +#'@param recipe Auto-S2S configuration recipe as returned by read_yaml() +#'@param archive Auto-S2S exp and obs archive as returned by read_yaml() +#' +#'@return A list containing regridding parameters for fcst and obs +#' +#'@import startR +#'@examples +#'setwd("/esarchive/scratch/vagudets/repos/auto-s2s/") +#'library(yaml) +#'library(startR) +#'recipe <- read_yaml("modules/data_load/recipe_1.yml") +#'archive <- read_yaml(paste0(recipe$Run$code_dir, "conf/archive.yml"))$archive +#'regrid_params <- get_regrid_params(recipe, archive) +#' +#'@export +get_regrid_params <- function(recipe, archive) { + + ## TODO: Multi-model case + ## If multi-model, use the first system grid? + ## TODO: 'NULL' entries had to be removed due to bug in Start(). Rewrite when + ## the bug is fixed. + exp.name <- recipe$Analysis$Datasets$System$name + ref.name <- recipe$Analysis$Datasets$Reference$name + exp_descrip <- archive$System[[exp.name]] + reference_descrip <- archive$Reference[[ref.name]] + + if (tolower(recipe$Analysis$Regrid$type) == 'to_reference') { + + regrid_params <- list(fcst.gridtype = reference_descrip$reference_grid, + fcst.gridmethod = recipe$Analysis$Regrid$method, + fcst.transform = CDORemapper, + obs.gridtype = NULL, + obs.gridmethod = NULL, + obs.transform = NULL) + + } else if (tolower(recipe$Analysis$Regrid$type) == 'to_system') { + + regrid_params <- list(fcst.gridtype = NULL, + fcst.gridmethod = NULL, + fcst.transform = NULL, + obs.gridtype = exp_descrip$reference_grid, + obs.gridmethod = recipe$Analysis$Regrid$method, + obs.transform = CDORemapper) + + } else if (tolower(recipe$Analysis$Regrid$type) == 'none') { + + regrid_params <- list(fcst.gridtype = NULL, + fcst.gridmethod = NULL, + fcst.transform = NULL, + obs.gridtype = NULL, + obs.gridmethod = NULL, + obs.transform = NULL) + + } else { + regrid_params <- list(fcst.gridtype = recipe$Analysis$Regrid$type, + fcst.gridmethod = recipe$Analysis$Regrid$method, + fcst.transform = CDORemapper, + obs.gridtype = recipe$Analysis$Regrid$type, + obs.gridmethod = recipe$Analysis$Regrid$method, + obs.transform = CDORemapper) + } + + return(regrid_params) +} + diff --git a/modules/Loading/R/load_decadal.R b/modules/Loading/R/load_decadal.R index 797d68724ca9d933e0916e595ea3212ae489f30f..9268b090fc70db506279c87cbc3a697dd763fe67 100644 --- a/modules/Loading/R/load_decadal.R +++ b/modules/Loading/R/load_decadal.R @@ -1,12 +1,5 @@ -# Loading module: -# 1. archive.yml -# 2. recipe.yml -# 3. 
Load_decadal.R (V) -#setwd('/esarchive/scratch/aho/git/auto-s2s/') - -## TODO: remove paths to personal scratchs -source("/esarchive/scratch/vagudets/repos/csoperational/R/get_regrid_params.R") # Load required libraries/funs +source("modules/Loading/R/get_regrid_params.R") source("modules/Loading/R/helper_loading_decadal.R") source("modules/Loading/R/dates2load.R") source("modules/Loading/R/check_latlon.R") @@ -20,7 +13,7 @@ source("modules/Loading/R/compare_exp_obs_grids.R") load_decadal <- function(recipe) { ## - archive <- read_yaml(paste0("conf/archive_decadal.yml"))$esarchive + archive <- read_yaml(paste0("conf/archive_decadal.yml"))[[recipe$Run$filesystem]] # Print Start() info or not DEBUG <- FALSE diff --git a/modules/Loading/R/load_sample.R b/modules/Loading/R/load_sample.R new file mode 100644 index 0000000000000000000000000000000000000000..e0d906d3dbc6beb4f59a3fe0eece2f6bd5fea8a4 --- /dev/null +++ b/modules/Loading/R/load_sample.R @@ -0,0 +1,49 @@ +## TODO: remove paths to personal scratchs +source("/esarchive/scratch/vagudets/repos/csoperational/R/get_regrid_params.R") +# Load required libraries/funs +source("tools/CST_ChangeDimName.R") +source("modules/Loading/R/compare_exp_obs_grids.R") + +load_sample <- function(recipe) { + # Hindcast: + # Change hcst dimension names + hcst <- CST_ChangeDimName(lonlat_temp_st$exp, + original_dimnames = c("dataset", "sdate", "ftime", + "lat", "lon", "member"), + final_dimnames = c("dat", "syear", "time", + "latitude", "longitude", + "ensemble")) + # Add sday and sweek dimensions + hcst <- CST_InsertDim(hcst, posdim = 3, lendim = 1, name = "sday", values = 1) + hcst <- CST_InsertDim(hcst, posdim = 4, lendim = 1, name = "sweek", values = 1) + dim(hcst$attrs$Dates) <- c(sday = 1, sweek = 1, + syear = dim(hcst$attrs$Dates)[['syear']], + time = dim(hcst$attrs$Dates)[['time']]) + # Observations: + # Change obs dimension names + obs <- CST_ChangeDimName(lonlat_temp_st$obs, + original_dimnames = c("dataset", "sdate", "ftime", + "lat", "lon"), + final_dimnames = c("dat", "syear", "time", + "latitude", "longitude")) + # Add sday and sweek dimensions + obs <- CST_InsertDim(obs, posdim = 3, lendim = 1, name = "sday", values = 1) + obs <- CST_InsertDim(obs, posdim = 4, lendim = 1, name = "sweek", values = 1) + dim(obs$attrs$Dates) <- c(sday = 1, sweek = 1, + syear = dim(obs$attrs$Dates)[['syear']], + time = dim(obs$attrs$Dates)[['time']]) + # Add ensemble dimension to obs + obs <- CST_InsertDim(obs, posdim = 7, lendim = 1, name = "ensemble", values = 1) + # Adjust name of 'load_parameters$date' attribute + obs$attrs$load_parameters$dat1$file_date <- obs$attrs$load_parameters$dat1$date + + # Sample fcst NULL + fcst <- NULL + # Check for consistency between hcst and obs grid + compare_exp_obs_grids(hcst, obs) + + info(recipe$Run$logger, + "##### DATA LOADING COMPLETED SUCCESSFULLY #####") + .log_memory_usage(recipe$Run$logger, when = "After loading (sample data)") + return(list(hcst = hcst, fcst = fcst, obs = obs)) +} diff --git a/modules/Loading/R/load_seasonal.R b/modules/Loading/R/load_seasonal.R index 14f34d1c68fe1e6b4ba78710ff36f7e0a2fc6ecd..2caa34a9ebe10315f94280cfbae44901045cd306 100644 --- a/modules/Loading/R/load_seasonal.R +++ b/modules/Loading/R/load_seasonal.R @@ -1,6 +1,5 @@ -## TODO: remove paths to personal scratchs -source("/esarchive/scratch/vagudets/repos/csoperational/R/get_regrid_params.R") # Load required libraries/funs +source("modules/Loading/R/get_regrid_params.R") source("modules/Loading/R/dates2load.R") 
source("modules/Loading/R/get_timeidx.R") source("modules/Loading/R/check_latlon.R") diff --git a/modules/Loading/R/load_tas_tos.R b/modules/Loading/R/load_tas_tos.R index 34f6220cbed8bbc3c8a296a16fa985890bf50b9c..ea231b56d8a00f3833dc7bf9eaa271bc9a97a097 100644 --- a/modules/Loading/R/load_tas_tos.R +++ b/modules/Loading/R/load_tas_tos.R @@ -1,5 +1,5 @@ -source("/esarchive/scratch/vagudets/repos/csoperational/R/get_regrid_params.R") # Load required libraries/funs +source("modules/Loading/R/get_regrid_params.R") source("modules/Loading/R/dates2load.R") source("modules/Loading/R/get_timeidx.R") source("modules/Loading/R/check_latlon.R") diff --git a/modules/Units/R/transform_units_precipitation.R b/modules/Units/R/transform_units_precipitation.R index 5cc1e812fd31a8aec507c3cae283707a30c273b8..d0dd7ffd50dfb6f04db35006a06cd61b1d8f43e9 100644 --- a/modules/Units/R/transform_units_precipitation.R +++ b/modules/Units/R/transform_units_precipitation.R @@ -96,22 +96,25 @@ transform_units_precipitation <- function(data, original_units, new_units, } -.days_in_month <- function(x, cal) { - if (cal %in% c('gregorian', 'standard', 'proleptic_gregorian')) { - N_DAYS_IN_MONTHS <- lubridate:::N_DAYS_IN_MONTHS - if (leap_year(year(x))) { - N_DAYS_IN_MONTHS[2] <- N_DAYS_IN_MONTHS[2] + 1 - } - } else if (cal %in% c('360', '360_day')) { - N_DAYS_IN_MONTHS <- rep(30, 12) - names(N_DAYS_IN_MONTHS) <- month.abb - } else if (cal %in% c('365', '365_day')) { - N_DAYS_IN_MONTHS <- lubridate:::N_DAYS_IN_MONTHS - } else { - stop("Unknown calendar") - } - month_x <- month(x, label = TRUE, locale = "C") - n_days <- N_DAYS_IN_MONTHS[month_x] - n_days[month_x == "Feb" & leap_year(x)] <- 29L +.days_in_month <- function(dates, cal) { + n_days <- array() + for (x in 1:length(dates)) { + if (cal %in% c('gregorian', 'standard', 'proleptic_gregorian')) { + N_DAYS_IN_MONTHS <- lubridate:::N_DAYS_IN_MONTHS + if (leap_year(year(dates[x]))) { + N_DAYS_IN_MONTHS[2] <- N_DAYS_IN_MONTHS[2] + 1 + } + } else if (cal %in% c('360', '360_day')) { + N_DAYS_IN_MONTHS <- rep(30, 12) + names(N_DAYS_IN_MONTHS) <- month.abb + } else if (cal %in% c('365', '365_day')) { + N_DAYS_IN_MONTHS <- lubridate:::N_DAYS_IN_MONTHS + } else { + stop("Unknown calendar") + } + month_x <- month(dates[x], label = TRUE, locale = "C") + n_days[x] <- N_DAYS_IN_MONTHS[month_x] + # n_days[month_x == "Feb" & leap_year(x)] <- 29L + } return(n_days) } diff --git a/modules/Visualization/Visualization.R b/modules/Visualization/Visualization.R index a6b6bd75ebb6864e686c7101eb713a5f82f5a773..ae94ed3d01526f8cf98932a2adf6f6bce6d7359d 100644 --- a/modules/Visualization/Visualization.R +++ b/modules/Visualization/Visualization.R @@ -46,7 +46,7 @@ Visualization <- function(recipe, "parameters of the requested plotting function, i.e. ", "PlotEquiMap, PlotRobinson or PlotLayout. 
There could be ", "plotting erros if the list is incomplete.")) - } else { + } else if (!is.null(output_conf)) { warning(paste("Parameter 'output_conf' should be a list.", "Using default configuration.")) output_conf <- NULL diff --git a/recipe_ecvs_seasonal_oper.yml b/recipe_ecvs_seasonal_oper.yml deleted file mode 100644 index 832f36d54b04c688019b75b066dc41360b302288..0000000000000000000000000000000000000000 --- a/recipe_ecvs_seasonal_oper.yml +++ /dev/null @@ -1,73 +0,0 @@ -Description: - Author: nperez - Info: ECVs Oper ESS ECMWF SEAS5 Seasonal Forecast recipe (monthly mean, tas) - -Analysis: - Horizon: seasonal # Mandatory, str: either subseasonal, seasonal, or decadal - Variables: - - {name: tas, freq: monthly_mean, units: C} - - {name: prlr, freq: monthly_mean, units: mm, flux: no} - Datasets: - System: - - {name: ECMWF-SEAS5.1} # system21_m1 system35c3s - Multimodel: no # Mandatory, bool: Either yes/true or no/false - Reference: - - {name: ERA5} # Mandatory, str: Reference codename. See docu. - Time: - sdate: '0801' ## MMDD - fcst_year: '2023' # Optional, int: Forecast year 'YYYY' - hcst_start: '1993' # Mandatory, int: Hindcast start year 'YYYY' - hcst_end: '2016' # Mandatory, int: Hindcast end year 'YYYY' - ftime_min: 1 # Mandatory, int: First leadtime time step in months - ftime_max: 6 # Mandatory, int: Last leadtime time step in months - Region: - - {name: "EU", latmin: 20, latmax: 80, lonmin: -20, lonmax: 40} - Regrid: - method: bilinear # Mandatory, str: Interpolation method. See docu. - type: "to_system" - #type: /esarchive/scratch/nmilders/gitlab/git_clones/auto-s2s/conf/grid_description.txt #'r360x180' # Mandatory, str: to_system, to_reference, or CDO-accepted grid. - Workflow: - Anomalies: - compute: no - cross_validation: no - save: none - Calibration: - method: evmos # Mandatory, str: Calibration method. See docu. - cross_validation: yes - save: none - Skill: - metric: mean_bias EnsCorr rpss crpss bss10 bss90 - save: 'all' - cross_validation: yes - Probabilities: - percentiles: [[1/3, 2/3], [1/10, 9/10]] # frac: Quantile thresholds. - save: 'all' - Indicators: - index: no - Visualization: - plots: skill_metrics forecast_ensemble_mean most_likely_terciles - multi_panel: no - dots: both - ncores: 4 # Optional, int: number of cores, defaults to 1 - remove_NAs: # Optional, bool: Whether NAs are removed, defaults to FALSE - Output_format: scorecards - logo: yes -Run: - Loglevel: INFO - Terminal: yes - filesystem: esarchive - output_dir: /esarchive/scratch/nperez/cs_oper/seasonal/ # replace with the directory where you want to save the outputs - code_dir: /esarchive/scratch/nperez/git/auto-s2s/ # replace with the directory where your code is - autosubmit: yes - # fill only if using autosubmit - auto_conf: - script: /esarchive/scratch/nperez/git/auto-s2s/exec_ecvs_seasonal_oper.R # replace with the path to your script - expid: a68v # replace with your EXPID - hpc_user: bsc32339 # replace with your hpc username - wallclock: 02:00 # hh:mm - processors_per_job: 4 - platform: nord3v2 - email_notifications: yes # enable/disable email notifications. Change it if you want to. 
- email_address: nuria.perez@bsc.es # replace with your email address - notify_completed: yes # notify me by email when a job finishes - notify_failed: yes # notify me by email when a job fails diff --git a/recipes/atomic_recipes/recipe_seasonal_provenance.yml b/recipes/atomic_recipes/recipe_seasonal_provenance.yml new file mode 100644 index 0000000000000000000000000000000000000000..196b49c92bcbd52582bbf4573ae51591e3096ebc --- /dev/null +++ b/recipes/atomic_recipes/recipe_seasonal_provenance.yml @@ -0,0 +1,57 @@ +Description: + Author: V. Agudetse + '': split version +Analysis: + Horizon: seasonal + Variables: + name: tas + freq: monthly_mean + Datasets: + System: + name: ECMWF-SEAS5.1 + Multimodel: no + Reference: + name: ERA5 + Time: + sdate: '1101' + fcst_year: + hcst_start: '2000' + hcst_end: '2006' + ftime_min: 1 + ftime_max: 3 + Region: + latmin: 27 + latmax: 48 + lonmin: 0 + lonmax: 359 + Regrid: + method: bilinear + type: to_system + Workflow: + Anomalies: + compute: no + cross_validation: + save: + Calibration: + method: bias + save: 'all' + Skill: + metric: RPSS + save: 'all' + Probabilities: + percentiles: [[1/3, 2/3]] + save: 'all' + Indicators: + index: FALSE + Visualization: + plots: skill_metrics, forecast_ensemble_mean, most_likely_terciles + ncores: # Optional, int: number of cores, defaults to 1 + remove_NAs: # Optional, bool: Whether NAs are removed, defaults to FALSE + Output_format: S2S4E +Run: + Loglevel: INFO + Terminal: yes + output_dir: /tmp/out-logs/ + code_dir: /home/kinow/Development/r/workspace/sunset/ + filesystem: sample + diff --git a/recipes/atomic_recipes/recipe_system7c3s-prlr.yml b/recipes/atomic_recipes/recipe_system7c3s-prlr.yml index 1cba3c97f76f05fddec5aefe294f02207e6459b9..590d4499d07b1761737f4c0d2f89bec2bb5e30c7 100644 --- a/recipes/atomic_recipes/recipe_system7c3s-prlr.yml +++ b/recipes/atomic_recipes/recipe_system7c3s-prlr.yml @@ -4,8 +4,9 @@ Description: Analysis: Horizon: Seasonal Variables: - name: prlr + name: prlr freq: monthly_mean + units: mm Datasets: System: name: Meteo-France-System7 @@ -36,7 +37,7 @@ Analysis: method: mse_min save: 'all' Skill: - metric: RPS RPSS CRPS CRPSS FRPSS BSS10 BSS90 EnsCorr Corr + metric: RPS RPSS CRPS CRPSS FRPSS BSS10 BSS90 EnsCorr save: 'all' Probabilities: percentiles: [[1/3, 2/3], [1/10, 9/10], [1/4, 2/4, 3/4]] diff --git a/recipes/recipe_decadal_split.yml b/recipes/examples/recipe_decadal_split.yml similarity index 100% rename from recipes/recipe_decadal_split.yml rename to recipes/examples/recipe_decadal_split.yml diff --git a/recipes/examples/recipe_ecvs_seasonal_oper.yml b/recipes/examples/recipe_ecvs_seasonal_oper.yml index d47fd1593749cb07c04dd8166a73075ac2058d76..832f36d54b04c688019b75b066dc41360b302288 100644 --- a/recipes/examples/recipe_ecvs_seasonal_oper.yml +++ b/recipes/examples/recipe_ecvs_seasonal_oper.yml @@ -5,8 +5,8 @@ Description: Analysis: Horizon: seasonal # Mandatory, str: either subseasonal, seasonal, or decadal Variables: - - {name: tas, freq: monthly_mean} - - {name: prlr, freq: monthly_mean} + - {name: tas, freq: monthly_mean, units: C} + - {name: prlr, freq: monthly_mean, units: mm, flux: no} Datasets: System: - {name: ECMWF-SEAS5.1} # system21_m1 system35c3s @@ -14,14 +14,14 @@ Analysis: Reference: - {name: ERA5} # Mandatory, str: Reference codename. See docu. 
Time: - sdate: '0701' ## MMDD + sdate: '0801' ## MMDD fcst_year: '2023' # Optional, int: Forecast year 'YYYY' hcst_start: '1993' # Mandatory, int: Hindcast start year 'YYYY' hcst_end: '2016' # Mandatory, int: Hindcast end year 'YYYY' ftime_min: 1 # Mandatory, int: First leadtime time step in months ftime_max: 6 # Mandatory, int: Last leadtime time step in months Region: - - {name: "UE", latmin: 20, latmax: 80, lonmin: -20, lonmax: 40} + - {name: "EU", latmin: 20, latmax: 80, lonmin: -20, lonmax: 40} Regrid: method: bilinear # Mandatory, str: Interpolation method. See docu. type: "to_system" @@ -47,7 +47,7 @@ Analysis: Visualization: plots: skill_metrics forecast_ensemble_mean most_likely_terciles multi_panel: no - projection: lambert_europe + dots: both ncores: 4 # Optional, int: number of cores, defaults to 1 remove_NAs: # Optional, bool: Whether NAs are removed, defaults to FALSE Output_format: scorecards diff --git a/recipes/recipe_scorecards.yml b/recipes/examples/recipe_scorecards.yml similarity index 100% rename from recipes/recipe_scorecards.yml rename to recipes/examples/recipe_scorecards.yml diff --git a/recipes/recipe_splitting_example.yml b/recipes/examples/recipe_splitting_example.yml similarity index 100% rename from recipes/recipe_splitting_example.yml rename to recipes/examples/recipe_splitting_example.yml diff --git a/split.R b/split.R index 9bd95e8652c77a59aa4a9fd1a655b9efc384a21d..0d443c7c84163934f1e01c33e6ad738d3fb782c6 100755 --- a/split.R +++ b/split.R @@ -29,9 +29,8 @@ arguments <- docopt(doc = doc) recipe <- prepare_outputs(recipe_file = arguments$recipe, uniqueID = !arguments$disable_unique_ID, restructure = FALSE) -# Split recipe into atomic recipes -## TODO: Add autosubmit yes/no to the parameters? +# Split recipe into atomic recipes run_parameters <- divide_recipe(recipe) if (!is.null(recipe$Run$autosubmit) && (recipe$Run$autosubmit)) { diff --git a/tools/CST_ChangeDimName.R b/tools/CST_ChangeDimName.R new file mode 100644 index 0000000000000000000000000000000000000000..1ccbbb3d327df21f6582326e2e54fdf58ea68f09 --- /dev/null +++ b/tools/CST_ChangeDimName.R @@ -0,0 +1,39 @@ +## TODO: Documentation + +CST_ChangeDimName <- function(data, original_dimnames, final_dimnames) { + if (!inherits(data, "s2dv_cube")) { + stop("Parameter 'data' must be an object of class 's2dv_cube'") + } + if (!(length(original_dimnames) == length(final_dimnames))) { + stop("The number of dimension names in 'final_dimnames' must be the same + as in 'original_dimnames'") + } + ## TODO: Add check to verify that all original_dimnames are present in the array + for (index in 1:length(original_dimnames)) { + original_name <- original_dimnames[index] + final_name <- final_dimnames[index] + # Step 1: Change dims and data + names(data$dims)[which(names(data$dims) == original_name)] <- final_name + dim(data$data) <- data$dims + # Step 2: Change coords + names(data$coords)[which(names(data$coords) == original_name)] <- final_name + # Step 3: Change attrs + # 3.1 - Dates + if (original_name %in% names(dim(data$attrs$Dates))) { + names(dim(data$attrs$Dates))[which(names(dim(data$attrs$Dates)) + == original_name)] <- final_name + } + # 3.2 - Variable metadata + if (original_name %in% names(data$attrs$Variable$metadata)) { + names(data$attrs$Variable$metadata)[which(names(data$attrs$Variable$metadata) + == original_name)] <- final_name + } + # 3.3 - Source files + if (original_name %in% names(dim(data$attrs$source_files))) { + names(dim(data$attrs$source_files))[which(names(dim(data$attrs$source_files)) + 
== original_name)] <- final_name
+    }
+  }
+  return(data)
+}
+
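+## Example usage, based on the call in modules/Loading/R/load_sample.R, which
+## renames the dimensions of the CSTools sample cube to the names used in SUNSET:
+## hcst <- CST_ChangeDimName(lonlat_temp_st$exp,
+##                           original_dimnames = c("dataset", "sdate", "ftime",
+##                                                 "lat", "lon", "member"),
+##                           final_dimnames = c("dat", "syear", "time",
+##                                              "latitude", "longitude",
+##                                              "ensemble"))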
diff --git a/modules/verifications.R b/tools/verifications.R
similarity index 100%
rename from modules/verifications.R
rename to tools/verifications.R
diff --git a/use_cases/ex0_1_sample_dataset/ex0_1-handson.md b/use_cases/ex0_1_sample_dataset/ex0_1-handson.md
new file mode 100644
index 0000000000000000000000000000000000000000..c6d824ef3fa61603c7100ecb95915843d7c8ac28
--- /dev/null
+++ b/use_cases/ex0_1_sample_dataset/ex0_1-handson.md
@@ -0,0 +1,240 @@
+# Use case 0.1: Loading a sample dataset
+
+Author: Victòria Agudetse Roures
+
+## Goal
+Load a sample dataset into SUNSET without the need to configure an archive of netCDF or GRIB files.
+
+## 0. SUNSET software stack
+
+The first step to use SUNSET is to create a copy of the code in your local environment. Open a terminal and `cd` to the directory where you would like to store your local copy of SUNSET. For example: `/esarchive/scratch/<username>/git/`. If the directory does not exist yet, you can create it with the `mkdir` shell command.
+
+```shell
+# Clone the GitLab repository to create a local copy of the code
+git clone https://earth.bsc.es/gitlab/es/sunset.git
+```
+
+If you are using BSC infrastructure, all of the software dependencies are already installed within our common environment. However, if you are running SUNSET outside of the BSC facilities, you can install the dependencies by creating a conda environment. A script is provided in the SUNSET repository, and you can install the environment by running the following line from the main folder of the SUNSET repository:
+
+```shell
+bash conda_installation/load_sunset.bash <path_to_conda>
+```
+
+To run the line above, you should replace `<path_to_conda>` with the path where you want the conda environment to be installed. For example, `/home/<username>/conda-sunset`.
+
+## 1. The recipe
+
+SUNSET uses YAML configuration files called 'recipes' to specify which data you want to load and the details of the different steps of the workflow. In this example, we are using a sample dataset which contains monthly-mean temperature-at-surface (tas) data, with ECMWF SEAS5 as the experiment and ERA5 as the reference dataset, for the initialization month of November.
+
+There is a template file for this hands-on tutorial, which you can open with a text editor:
+
+```shell
+# cd to the main SUNSET directory
+# Open the recipe with a text editor such as vim or emacs
+vim use_cases/ex0_1_sample_dataset/ex0_1-recipe.yml
+```
+
+Once you have opened your recipe, it is time to edit the contents. Fill in the blank slots according to your preference, based on the options given in the description.
+
+NOTE: In this particular example, the sample data is limited, so the system and reference names, variable, forecast times and region cannot be changed.
+
+```yaml
+Description:
+  Author: <___>
+  Description: Exercise 0.1: Skill assessment of a sample dataset
+Analysis:
+  Horizon: Seasonal
+  Variables:
+    name: tas
+    freq: monthly_mean
+    units: <___> # Choose your units: C or K
+  Datasets:
+    System:
+      name: ECMWF-SEAS5.1
+    Multimodel: no
+    Reference:
+      name: ERA5
+  Time:
+    sdate: '1101'
+    fcst_year:
+    hcst_start: '2000'
+    hcst_end: '2006'
+    ftime_min: 1
+    ftime_max: 3
+  Region:
+    latmin: 27
+    latmax: 48
+    lonmin: 0
+    lonmax: 359
+  Regrid:
+    method: bilinear
+    type: 'to_system'
+  Workflow:
+    Anomalies:
+      compute: yes
+      cross_validation: yes
+      save: 'none'
+    Calibration:
+      method: <___>
+      save: 'none'
+    Skill:
+      metric: RPSS BSS10 BSS90
+      cross_validation: yes
+      save: 'none'
+    Probabilities:
+      percentiles: [[1/3, 2/3]]
+      save: 'none'
+    Visualization:
+      plots: skill_metrics
+      multi_panel: no
+      projection: cylindrical_equidistant
+  ncores: 10
+  remove_NAs: yes
+  Output_format: S2S4E
+Run:
+  filesystem: sample # This parameter specifies that we want to load the sample data, rather than data from a filesystem
+  Loglevel: INFO
+  Terminal: yes
+  output_dir: <______> # Path to the directory where you want your outputs to be saved
+  code_dir: <______> # Path to the directory where your code is
+```
+
+## 2. Load the required SUNSET modules and read the recipe
+
+If you are running SUNSET within BSC infrastructure, source the MODULES file to load the environment modules needed to run SUNSET.
+
+```shell
+source MODULES
+```
+
+Otherwise, activate the conda environment:
+
+```shell
+conda activate <path_to_conda>
+```
+
+Open an R session by simply typing `R` in the terminal.
+
+To run SUNSET, we must run the R session from the directory where the code is. To check your working directory, you can run the shell command `pwd`. From the R session, you can use the commands `getwd()` and `setwd()` to see and change your working directory.
+
+```r
+# Load required modules
+source("modules/Loading/Loading.R")
+source("modules/Units/Units.R")
+source("modules/Calibration/Calibration.R")
+source("modules/Anomalies/Anomalies.R")
+source("modules/Skill/Skill.R")
+source("modules/Saving/Saving.R")
+source("modules/Visualization/Visualization.R")
+
+# Read recipe
+recipe_file <- "use_cases/ex0_1_sample_dataset/ex0_1-recipe.yml"
+recipe <- prepare_outputs(recipe_file)
+```
+
+The function `prepare_outputs()` creates a unique folder for the logs, data files and plots that result from the execution of your recipe, inside the directory you specified. It also runs a check over the recipe to detect any potential errors, misspellings or missing arguments. At the end of the check, a message is displayed indicating whether or not the recipe passed the check, along with the list of errors and warnings.
+
+**Questions**
+
+Read the logs!
+
+(1) Did your recipe pass the check? Did you get any warnings?
+
+(2) Where will your outputs be saved? Copy and paste this directory somewhere, so that you can check it later!
+
+*Tip*: The recipe is now stored as a `list` containing all the information of the original YAML file, plus some extra things! If you want to see any particular element of the recipe from the R session, you can simply access that element in the list. For example:
+
+```r
+# Checking the variable name
+recipe$Analysis$Variables$name
+# Checking the output directory
+recipe$Run$output_dir
+```
+
+## 3. Load the data and change the units
+
+The **Loading** module retrieves the information from the recipe and loads the data that has been requested in it. It loads the experiment data for the hindcast period, the reference data for the corresponding period, and the experiment forecast if a forecast year has been requested. In the case of the sample dataset, we have a hindcast and the corresponding observations, but no forecast.
+
+For certain variables like temperature, precipitation or sea level pressure, the user can request specific units to load the data in. The **Units** module will read the original units as stored in the netCDF files and perform any necessary unit conversions to match the request in the recipe. It also verifies that all of the loaded datasets share the same units, even if no specific unit has been requested. For this reason, users are strongly encouraged to run it even if they did not request any unit conversion.
+
+```r
+# Load datasets
+data <- Loading(recipe)
+# Change units
+data <- Units(recipe, data)
+```
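+
+As an illustration of the kind of conversion the module can perform, a precipitation flux stored in m s-1 is turned into mm/day by scaling with the number of seconds in a day and the millimetres per metre. The sketch below shows the arithmetic in plain R with made-up values; it is not the module's internal code:
+
+```r
+# Hypothetical precipitation flux values in m s-1
+prlr_flux <- c(1.5e-8, 3.0e-8)
+# 86400 seconds per day * 1000 mm per metre
+prlr_mmday <- prlr_flux * 86400 * 1000
+prlr_mmday  # 1.296 2.592 (in mm/day)
+```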
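+
+Both metrics follow the same generic skill-score construction: one minus the ratio between the score of the forecast and the score of a climatological reference. As a minimal sketch of how such a value is interpreted (toy numbers for a single grid point, not SUNSET's internal code):
+
+```r
+# Hypothetical mean RPS of the forecast and of the climatological
+# reference for one grid point (lower RPS is better)
+rps_forecast <- 0.15
+rps_climatology <- 0.22
+# 1 is a perfect score, 0 means no improvement over climatology,
+# and negative values mean the forecast performs worse than climatology
+rpss <- 1 - rps_forecast / rps_climatology
+round(rpss, 2)  # 0.32
+```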
+
+The `Probabilities()` function returns the probability values for each requested category for the hindcast (and forecast) data, as well as the hindcast percentile values corresponding to each threshold.
+```r
+# Compute skill metrics
+skill_metrics <- Skill(recipe, data)
+# Compute percentiles and probability bins
+probabilities <- Probabilities(recipe, data)
+```
+**Questions**
+
+(1) What is the structure of `skill_metrics`? Which metrics were computed? What dimensions do they have? *Tip*: use `str()` and `names()`.
+
+(2) What is the structure of `probabilities`? Can you identify the probability categories and the percentiles? *Tip*: use `str()`.
+
+## 6. Plotting the results
+
+Now, let's visualize the information that was computed!
+
+The **Visualization** module will generate the maps we requested in the recipe: skill maps that show the spatial distribution of the model's skill, one for each metric.
+
+With the significance option in the `Visualization()` function, you can choose whether or not to shade the grid points that are statistically significant in each skill metric plot.
+
+```r
+# Plot data
+Visualization(recipe, data,
+              skill_metrics = skill_metrics,
+              significance = TRUE)
+```
+
+Now, you can `cd` to the output directory and inspect the contents of the `plots/` subdirectory. The plots are PNG files that can be visualized with the `display` command. They have a descriptive name including the content of the plot, the date and the forecast time.
diff --git a/use_cases/ex0_1_sample_dataset/ex0_1-recipe.yml b/use_cases/ex0_1_sample_dataset/ex0_1-recipe.yml
new file mode 100644
index 0000000000000000000000000000000000000000..662e75cc77691b7e6670cf68f31af885c48225c5
--- /dev/null
+++ b/use_cases/ex0_1_sample_dataset/ex0_1-recipe.yml
@@ -0,0 +1,57 @@
+Description:
+  Author: V. Agudetse
+  '': split version
+Analysis:
+  Horizon: seasonal
+  Variables:
+    name: tas
+    freq: monthly_mean
+  Datasets:
+    System:
+      name: ECMWF-SEAS5.1
+    Multimodel: no
+    Reference:
+      name: ERA5
+  Time:
+    sdate: '1101'
+    fcst_year:
+    hcst_start: '2000'
+    hcst_end: '2006'
+    ftime_min: 1
+    ftime_max: 3
+  Region:
+    latmin: 27
+    latmax: 48
+    lonmin: 0
+    lonmax: 359
+  Regrid:
+    method: bilinear
+    type: to_system
+  Workflow:
+    Anomalies:
+      compute: yes
+      cross_validation: yes
+      save: 'all'
+    Calibration:
+      method: bias
+      save: 'none'
+    Skill:
+      metric: RPSS BSS10 BSS90
+      save: 'all'
+    Probabilities:
+      percentiles: [[1/3, 2/3]]
+      save: 'all'
+    Indicators:
+      index: FALSE
+    Visualization:
+      plots: skill_metrics
+  ncores: # Optional, int: number of cores, defaults to 1
+  remove_NAs: # Optional, bool: Whether NAs are removed, defaults to FALSE
+  Output_format: S2S4E
+Run:
+  Loglevel: INFO
+  Terminal: yes
+  output_dir: /tmp/out-logs/
+  code_dir: ./
+  filesystem: sample
+
diff --git a/use_cases/ex1_1_single_analysis_terminal/ex1_1-handson.md b/use_cases/ex1_1_single_analysis_terminal/ex1_1-handson.md
index 391d48566c47fb78f26d7264372ce1e22c0aaf41..ee86459a6f556b17c6776b86ef0021691a25478a 100644
--- a/use_cases/ex1_1_single_analysis_terminal/ex1_1-handson.md
+++ b/use_cases/ex1_1_single_analysis_terminal/ex1_1-handson.md
@@ -1,5 +1,7 @@
 # Hands-on 1.1: Single Verification Workflow on the Terminal
 
+Author: Victòria Agudetse Roures
+
 ## Goal
 Create a SUNSET recipe and use the functions in the suite to reproduce the verification workflow from the previous hands-on exercises.
@@ -16,7 +18,8 @@ git clone https://earth.bsc.es/gitlab/es/sunset.git SUNSET uses YAML configuration files called 'recipes' to specify which data you want to load and the details of the different steps of the workflow. In this example, we want to evaluate the temperature-at-surface (tas) monthly means, using MeteoFrance System 7 data as our experiment and ERA5 as our reference dataset, for the initialization month of November. -There is a template file +There is a template file for this hands-on tutorial, which you can open with a text editor: + ```shell # cd to the main SUNSET directory # Open the recipe with a text editor such as vim or emacs diff --git a/use_cases/ex1_2_autosubmit_scorecards/Figures/as_change_status.PNG b/use_cases/ex1_2_autosubmit_scorecards/Figures/as_change_status.PNG new file mode 100644 index 0000000000000000000000000000000000000000..5fc5cd8a3397a626824cbaa8fb9d3fecb090ce13 Binary files /dev/null and b/use_cases/ex1_2_autosubmit_scorecards/Figures/as_change_status.PNG differ diff --git a/use_cases/ex1_2_autosubmit_scorecards/Figures/as_tree.PNG b/use_cases/ex1_2_autosubmit_scorecards/Figures/as_tree.PNG new file mode 100644 index 0000000000000000000000000000000000000000..5194568e387d7251c478eed0cf071f7d7c7a4a60 Binary files /dev/null and b/use_cases/ex1_2_autosubmit_scorecards/Figures/as_tree.PNG differ diff --git a/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-1_ECMWF-SEAS5_ERA5_tas_1993-2003.png b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-1_ECMWF-SEAS5_ERA5_tas_1993-2003.png new file mode 100644 index 0000000000000000000000000000000000000000..631e3e2f80955aaf2b5dbd8abb2e577a73746373 Binary files /dev/null and b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-1_ECMWF-SEAS5_ERA5_tas_1993-2003.png differ diff --git a/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-2_ECMWF-SEAS5_ERA5_tas_1993-2003.png b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-2_ECMWF-SEAS5_ERA5_tas_1993-2003.png new file mode 100644 index 0000000000000000000000000000000000000000..e311079536dc83ccb94c2cd786cebc44b46694ed Binary files /dev/null and b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-2_ECMWF-SEAS5_ERA5_tas_1993-2003.png differ diff --git a/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-3_ECMWF-SEAS5_ERA5_tas_1993-2003.png b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-3_ECMWF-SEAS5_ERA5_tas_1993-2003.png new file mode 100644 index 0000000000000000000000000000000000000000..356243497a01f392ce21575d9df17c1a1d2b1796 Binary files /dev/null and b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-3_ECMWF-SEAS5_ERA5_tas_1993-2003.png differ diff --git a/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-4_ECMWF-SEAS5_ERA5_tas_1993-2003.png b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-4_ECMWF-SEAS5_ERA5_tas_1993-2003.png new file mode 100644 index 0000000000000000000000000000000000000000..54e1a5b1825b1a00672360dd86f3c8c15a86380c Binary files /dev/null and b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-4_ECMWF-SEAS5_ERA5_tas_1993-2003.png differ diff --git a/use_cases/ex1_2_autosubmit_scorecards/ex1_2-handson.md b/use_cases/ex1_2_autosubmit_scorecards/ex1_2-handson.md new file mode 100644 index 0000000000000000000000000000000000000000..c77fb2b164a631c51dac51dd2a0f2ad90479d6bc --- /dev/null +++ b/use_cases/ex1_2_autosubmit_scorecards/ex1_2-handson.md @@ -0,0 +1,174 @@ +# Hands-on 1.2: Computation of Scorecards with Autosubmit + +Author: An-Chi Ho + +## Goal +Compute some skill 
metrics and plot scorecards with SUNSET, using Autosubmit to dispatch jobs in parallel.
+In the recipe, we request 12 start dates (0101, 0201, ..., 1201). SUNSET will split the recipe into 12 atomic recipes, and Autosubmit will run 12 verification jobs, one per atomic recipe, in parallel.
+When all the verification jobs are finished, the scorecard job will be triggered and will produce the scorecards.
+
+We only use one variable, one model and one reference dataset in this example, but you can add more datasets and variables if needed, and SUNSET will split the recipes accordingly.
+
+Check the GitLab Wiki:
+- The Autosubmit page, for a full explanation of using SUNSET with Autosubmit: https://earth.bsc.es/gitlab/es/sunset/-/wikis/Autosubmit
+
+- The Scorecards module section of the home page, to learn more about scorecards: https://earth.bsc.es/gitlab/es/sunset/-/wikis/home#scorecards-module
+
+
+## 0. Cloning the SUNSET repository
+
+If you're completely new to SUNSET, the first step is to create a copy of the tool in your local environment.
+Open a terminal and `cd` to the directory where you would like to store your local copy of SUNSET. For example: `/esarchive/scratch/<username>/git/`. If the directory does not exist yet, you can create it with the `mkdir` shell command.
+
+```shell
+# Clone the GitLab repository to create a local copy of the code
+git clone https://earth.bsc.es/gitlab/es/sunset.git
+```
+You should see a git folder "sunset" under the current directory. Now you have all the code, recipes, and scripts for running SUNSET.
+
+
+## 1. Create Autosubmit experiment
+
+Since we're going to use Autosubmit to dispatch jobs, we need to have an Autosubmit experiment. Note that SUNSET uses Autosubmit >= 4.0.0.
+
+On the workstation or the Autosubmit machine, you can create an experiment with the following commands:
+
+```shell
+module load autosubmit/4.0.0b-foss-2015a-Python-3.7.3
+autosubmit expid -H nord3v2 -d "SUNSET use case 1_2"
+```
+You will see messages like the ones below:
+
+```shell
+Autosubmit is running with 4.0.0b
+The new experiment "a6pc" has been registered.
+Generating folder structure...
+Experiment folder: /esarchive/autosubmit/a6pc
+Generating config files...
+Experiment a6pc created
+```
+Write down the experiment ID (a6pc in the snippet above); we need it for the recipe later.
+
+
+## 2. Modifying the recipe
+
+The template recipe for this use case can be found in [ex1_2-recipe.yml](use_cases/ex1_2_autosubmit_scorecards/ex1_2-recipe.yml).
+You should at least edit some items in the "Run" section, as shown in the sketch after this list:
+- `output_dir`: The directory where you want to save the outputs and logs
+- `code_dir`: The directory where your SUNSET code is stored (i.e., the git folder)
+- `auto_conf$script`: The path to the script ex1_2-script.R
+- `auto_conf$expid`: The experiment "xxxx" you just created
+- `auto_conf$hpc_user`: Your user ID on Nord3, which should be bsc32xxx
+- `auto_conf$email_address`: Your email. You can also adjust the other email notification settings to your preference.
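+
+For reference, here is a minimal sketch of what the edited "Run" section could look like. It is modeled on the Run sections of other recipes in this repository; the paths, experiment ID and email below are illustrative placeholders, not values to copy as-is:
+
+```yaml
+Run:
+  Loglevel: INFO
+  Terminal: yes
+  filesystem: esarchive
+  output_dir: /esarchive/scratch/<username>/sunset-outputs/  # replace with your output directory
+  code_dir: /esarchive/scratch/<username>/git/sunset/  # replace with your code directory
+  autosubmit: yes
+  # fill only if using autosubmit
+  auto_conf:
+    script: /esarchive/scratch/<username>/git/sunset/use_cases/ex1_2_autosubmit_scorecards/ex1_2-script.R  # replace with the path to your script
+    expid: a6pc  # replace with your EXPID
+    hpc_user: bsc32xxx  # replace with your hpc username
+    wallclock: 02:00  # hh:mm
+    processors_per_job: 4
+    platform: nord3v2
+    email_notifications: yes  # enable/disable email notifications
+    email_address: <your_email>  # replace with your email address
+    notify_completed: yes  # notify me by email when a job finishes
+    notify_failed: yes  # notify me by email when a job fails
+```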
+
+In the recipe, we request anomaly computation after loading the data, then compute the skill metrics and save the results for the scorecards. In the Scorecards section, three regions are requested.
+
+Feel free to also modify other aspects according to your particular needs. You can read more about the parameters and the available modules in the SUNSET GitLab wiki.
+
+## 3. The user-defined script
+
+We need a script that defines the modules to use and the steps of the workflow. Note that this script only covers the data loading and verification parts; the Scorecards module does not need to be included in it.
+
+The prepare_outputs() function is already incorporated into the launcher script (see the next section for details about the launcher), so it does not need to be included in the user-defined script in this case.
+In its place, we will use the function read_atomic_recipe(). The recipe path is passed as an argument to the R script. The beginning of our script should look like this:
+
+```R
+# Load modules
+source("modules/Loading/Loading.R")
+source("modules/Anomalies/Anomalies.R")
+source("modules/Skill/Skill.R")
+source("modules/Saving/Saving.R")
+
+# Read recipe
+## (leave this part as-is! Autosubmit will automatically pass the atomic recipe as a parameter)
+args = commandArgs(trailingOnly = TRUE)
+recipe_file <- args[1]
+recipe <- read_atomic_recipe(recipe_file)
+```
+
+The rest of the user-defined script can be written in the same way as any other SUNSET script. We load the data, compute the anomalies, then compute the skill metrics and save the results as netCDF files for the Scorecards.
+
+```R
+# Load data
+data <- Loading(recipe)
+# Compute tas anomalies
+data <- Anomalies(recipe, data)
+# Compute skill metrics
+skill_metrics <- Skill(recipe, data)
+```
+Check the example script at [ex1_2-script.R](use_cases/ex1_2_autosubmit_scorecards/ex1_2-script.R).
+You can execute it as-is or copy and modify it according to your needs.
+
+
+## 4. Launching the jobs with Autosubmit
+
+We will start the jobs with the launcher. The SUNSET Launcher is a bash script named launch_SUNSET.sh that can be found in the main directory of the SUNSET repository. It runs in two steps:
+
+1. Run the recipe checks, split the recipe into atomic recipes and create the directory for the outputs.
+2. Modify the Autosubmit configuration of your experiment according to the parameters in the recipe.
+
+The bash script needs two inputs: (1) the [recipe](#2-modifying-the-recipe) and (2) the [R script](#3-the-user-defined-script).
+
+On your workstation or on Nord3, from the SUNSET code directory, run:
+
+```shell
+bash launch_SUNSET.sh use_cases/ex1_2_autosubmit_scorecards/ex1_2-recipe.yml use_cases/ex1_2_autosubmit_scorecards/ex1_2-script.R
+```
+You will see messages similar to the ones below:
+```shell
+[1] "Saving all outputs to:"
+[1] "/esarchive/scratch/aho/auto-s2s-outputs/ex1_2-recipe_20231129003740"
+INFO [2023-11-29 00:37:41] Checking recipe: use_cases/ex1_2_autosubmit_scorecards/ex1_2-recipe.yml
+WARN [2023-11-29 00:37:41] The element 'fcst_year' is not defined in the recipe. No forecast year will be used.
+INFO [2023-11-29 00:37:41] ##### RECIPE CHECK SUCCESSFULL #####
+INFO [2023-11-29 00:37:41] Splitting recipe in single verifications.
+INFO [2023-11-29 00:37:41] The main recipe has been divided into 12 atomic recipes.
+INFO [2023-11-29 00:37:41] Check output directory /esarchive/scratch/aho/auto-s2s-outputs//ex1_2-recipe_20231129003740/logs/recipes/ to see all the individual atomic recipes.
+INFO [2023-11-29 00:37:41] ##### AUTOSUBMIT CONFIGURATION WRITTEN FOR a6pc #####
+INFO [2023-11-29 00:37:41] You can check your experiment configuration at: /esarchive/autosubmit/a6pc/conf/
+INFO [2023-11-29 00:37:41] Please SSH into bscesautosubmit01 or bscesautosubmit02 and run the following commands:
+INFO [2023-11-29 00:37:41] module load autosubmit/4.0.0b-foss-2015a-Python-3.7.3
+INFO [2023-11-29 00:37:41] autosubmit create a6pc
+INFO [2023-11-29 00:37:41] autosubmit refresh a6pc
+INFO [2023-11-29 00:37:41] nohup autosubmit run a6pc & disown
+```
+The messages contain some useful information, such as the path to the atomic recipes and the Autosubmit configuration files. Most importantly, follow the commands in the last lines to launch your experiment:
+
+```shell
+ssh bscesautosubmit01.bsc.es
+# (now on the Autosubmit machine)
+module load autosubmit/4.0.0b-foss-2015a-Python-3.7.3
+autosubmit create a6pc
+autosubmit refresh a6pc
+nohup autosubmit run a6pc & disown
+```
+
+Then, you can go to the [Autosubmit GUI](https://earth.bsc.es/autosubmitapp/) to check the experiment status.
+
+As you can see in the GUI, the Scorecards job depends on the verification jobs. Once the 12 verification jobs are finished, the Scorecards job will start.
+
+## 5. Results and plots
+
+The scorecards are saved in `plots/Scorecards` under the output directory. There will be 4 files (_more explanation here_)
+
+
+## 6. Rerun Autosubmit
+
+If something goes wrong and some jobs fail, you can rerun only the failed jobs:
+
+1. Go to the Autosubmit GUI, select the failed job(s), and click "CHANGE STATUS".
+2. Set "Set status to:" to "WAITING". Copy the generated commands and run them on the Autosubmit machine or the workstation.
+3. Fix the problem in your local SUNSET git directory.
+4. Run `autosubmit refresh xxxx` and `nohup autosubmit run xxxx & disown`, as in the sketch below.
+
+If everything fails, you can also simply recreate the experiment by running `autosubmit create xxxx`, then `autosubmit refresh xxxx` and `nohup autosubmit run xxxx & disown`.
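+
+As a minimal sketch, assuming the hypothetical experiment ID a6pc used throughout this use case, the rerun commands on the Autosubmit machine would be:
+
+```shell
+module load autosubmit/4.0.0b-foss-2015a-Python-3.7.3
+
+# Rerun only the failed jobs (after setting them to WAITING in the GUI)
+autosubmit refresh a6pc
+nohup autosubmit run a6pc & disown
+
+# Or recreate the experiment from scratch
+autosubmit create a6pc
+autosubmit refresh a6pc
+nohup autosubmit run a6pc & disown
+```
+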
diff --git a/recipes/recipe_scorecards_vic.yml b/use_cases/ex1_2_autosubmit_scorecards/ex1_2-recipe.yml
similarity index 61%
rename from recipes/recipe_scorecards_vic.yml
rename to use_cases/ex1_2_autosubmit_scorecards/ex1_2-recipe.yml
index fbd2cb90e2cd162f90e517bcbd82fe634783b8e0..73f16311f93cf69aa439945b00715b7d29f80afa 100644
--- a/recipes/recipe_scorecards_vic.yml
+++ b/use_cases/ex1_2_autosubmit_scorecards/ex1_2-recipe.yml
@@ -1,25 +1,17 @@
-################################################################################
-## RECIPE DESCRIPTION
-################################################################################
-
 Description:
-  Author: V. Agudetse
-  Info: Test for recipe splitting
-
-################################################################################
-## ANALYSIS CONFIGURATION
-################################################################################
+  Author: An-Chi Ho
+  Info: Compute Skills and Plot Scorecards with Autosubmit
 
 Analysis:
-  Horizon: Seasonal
-  Variables: # ECVs and Indicators?
+  Horizon: seasonal
+  Variables:
     - {name: tas, freq: monthly_mean}
   Datasets:
     System: # multiple systems for single model, split if Multimodel = F
     - {name: ECMWF-SEAS5}
     Multimodel: False # single option
     Reference:
-    - {name: ERA5} # multiple references for single model?
+    - {name: ERA5}
     Time:
       sdate: # list, split
        - '0101'
@@ -42,7 +34,7 @@ Analysis:
   Region: # multiple lists, split? Add region name if length(Region) > 1
   - {name: "global", latmin: -90, latmax: 90, lonmin: 0, lonmax: 359.9}
   Regrid:
-    method: bilinear ## TODO: allow multiple methods?
+    method: bilinear
     type: to_system
   Workflow:
     Anomalies:
@@ -50,34 +42,30 @@ Analysis:
       cross_validation: no
       save: 'none'
     Calibration:
-      method: raw ## TODO: list, split?
+      method: raw
       save: 'none'
     Skill:
      metric: mean_bias EnsCorr rps rpss crps crpss EnsSprErr # list, don't split
       cross_validation: yes
       save: 'all'
     Probabilities:
-      percentiles: [[1/3, 2/3]] # list, don't split
+      percentiles: [[1/3, 2/3]]
       save: 'none'
-    # Visualization:
-    #   plots: skill_metrics
-    Indicators:
-      index: no # ?
     Scorecards:
       execute: yes # yes/no
       regions:
         Extra-tropical NH: {lon.min: 0, lon.max: 360, lat.min: 30, lat.max: 90}
         Tropics: {lon.min: 0, lon.max: 360, lat.min: -30, lat.max: 30}
         Extra-tropical SH : {lon.min: 0, lon.max: 360, lat.min: -30, lat.max: -90}
-      start_months: NULL
+      start_months: 'all'
       metric: mean_bias enscorr rpss crpss enssprerr
       metric_aggregation: 'score'
-      table_label: NULL
+      inf_to_na: TRUE # Optional, bool: set inf values in data to NA, default is FALSE
+      table_label: NULL
       fileout_label: NULL
       col1_width: NULL
       col2_width: NULL
       calculate_diff: FALSE
-  ncores: 14
+  ncores: 8
   remove_NAs: no # bool, don't split
   Output_format: Scorecards # string, don't split
@@ -87,20 +75,22 @@ Analysis:
 Run:
   Loglevel: INFO
   Terminal: yes
-  output_dir: /esarchive/scratch/vagudets/auto-s2s-outputs/
-  code_dir: /esarchive/scratch/vagudets/repos/auto-s2s/
+  filesystem: esarchive
+  output_dir: /esarchive/scratch/aho/auto-s2s-outputs/
+  code_dir: /esarchive/scratch/aho/git/auto-s2s/
   autosubmit: yes # fill only if using autosubmit
   auto_conf:
-    script: /esarchive/scratch/vagudets/repos/auto-s2s/example_scripts/test_scorecards_workflow.R # replace with the path to your script
-    expid: a6ae # replace with your EXPID
-    hpc_user: bsc32762 # replace with your hpc username
+    script: /esarchive/scratch/aho/git/auto-s2s/use_cases/ex1_2_autosubmit_scorecards/ex1_2-script.R # replace with the path to your script
+    expid: a6pc # replace with your EXPID
+    hpc_user: bsc32734 # replace with your hpc username
     wallclock: 03:00 # hh:mm
-    processors_per_job: 14
+    processors_per_job: 8
     platform: nord3v2
     custom_directives: ['#SBATCH --exclusive']
     email_notifications: yes # enable/disable email notifications. Change it if you want to.
-    email_address: victoria.agudetse@bsc.es # replace with your email address
+    email_address: an.ho@bsc.es # replace with your email address
     notify_completed: yes # notify me by email when a job finishes
-    notify_failed: no # notify me by email when a job fails
+    notify_failed: yes # notify me by email when a job fails
+
diff --git a/use_cases/ex1_2_autosubmit_scorecards/ex1_2-script.R b/use_cases/ex1_2_autosubmit_scorecards/ex1_2-script.R
new file mode 100644
index 0000000000000000000000000000000000000000..1f60798736ae021e55face318444f62149c2aec2
--- /dev/null
+++ b/use_cases/ex1_2_autosubmit_scorecards/ex1_2-script.R
@@ -0,0 +1,25 @@
+###############################################################################
+## Author: An-Chi Ho
+## Description: Computes some skill metrics and plots scorecards with Autosubmit
+## Instructions: Follow the steps described in:
+## use_cases/ex1_2_autosubmit_scorecards/ex1_2-handson.md
+## This script should be called by the bash script launch_SUNSET.sh.
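+## The launcher passes the path of each atomic recipe to this script as a
+## command-line argument, which is read below with commandArgs().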
+###############################################################################
+
+# Load modules
+source("modules/Loading/Loading.R")
+source("modules/Anomalies/Anomalies.R")
+source("modules/Skill/Skill.R")
+
+# Read recipe
+args = commandArgs(trailingOnly = TRUE)
+recipe_file <- args[1]
+recipe <- read_atomic_recipe(recipe_file)
+
+# Load data
+data <- Loading(recipe)
+# Compute tas anomalies
+data <- Anomalies(recipe, data)
+# Compute skill metrics
+skill_metrics <- Skill(recipe, data)
+
diff --git a/use_cases/ex1_3_nino_indices_comparison/ex1_3-handson.md b/use_cases/ex1_3_nino_indices_comparison/ex1_3-handson.md
new file mode 100644
index 0000000000000000000000000000000000000000..8417a4f08f367ab282a66932227a5f6cf1c3c90a
--- /dev/null
+++ b/use_cases/ex1_3_nino_indices_comparison/ex1_3-handson.md
@@ -0,0 +1,108 @@
+# Hands-on 1.3: Computation of El Niño indices for two seasonal models
+
+Authors: Victòria Agudetse Roures and Núria Pérez Zanón
+
+## Goal
+Create a SUNSET recipe to compute and evaluate the skill of several El Niño indices (Niño1+2, Niño3, Niño3.4 and Niño4) for two models: ECMWF-SEAS5 and MeteoFrance System 7. We include the information for both models in a single recipe, and the SUNSET Launcher will split the recipe into two 'atomic recipes': one for each model. The computation for each atomic recipe will be run in the cluster as a separate job.
+
+It is also possible to split a recipe along different Reference datasets, Variables and Start Dates.
+
+## 0. Cloning the SUNSET repository
+
+The first step to use SUNSET is to create a copy of the code in your local environment. Open a terminal and `cd` to the directory where you would like to store your local copy of SUNSET. For example: `/esarchive/scratch//git/`. If the directory does not exist yet, you can create it with the `mkdir` shell command.
+
+```shell
+# Clone the GitLab repository to create a local copy of the code
+git clone https://earth.bsc.es/gitlab/es/sunset.git
+```
+
+## 1. Modifying the recipe
+
+The template recipe for this use case can be found in `use_cases/ex1_3_nino_indices_comparison/ex1_3-recipe.yml`. You should open it with an editor such as emacs or vim:
+
+```shell
+# cd to the main SUNSET directory
+# Open the recipe with a text editor such as vim or emacs
+vim use_cases/ex1_3_nino_indices_comparison/ex1_3-recipe.yml
+```
+
+Then, under the 'Run' section of the recipe, you should edit the parameters `output_dir` and `code_dir` to point to your desired output directory and to the directory where your SUNSET code is stored, respectively (see the sketch below).
+
+Feel free to also modify other aspects of the recipe according to your particular needs. You can read more about the parameters and the available modules in the SUNSET wiki.
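+
+For illustration, a minimal sketch of the edited 'Run' section; the two paths are hypothetical placeholders that you should replace with your own directories:
+
+```yaml
+Run:
+  Loglevel: INFO
+  Terminal: yes
+  output_dir: /esarchive/scratch/<user>/sunset-outputs/ # your output directory
+  code_dir: /esarchive/scratch/<user>/git/sunset/       # where your SUNSET code is stored
+  autosubmit: no
+```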
+
+## 2. The user-defined script
+
+The SUNSET Launcher is a bash script named launch_SUNSET.sh that can be found in the main directory of the SUNSET repository. When working without Autosubmit, it runs in two steps:
+
+1. Running the recipe checks, splitting the recipe into atomic recipes and creating the directory for the outputs.
+2. Sending jobs to the cluster to run the user-defined script for each atomic recipe, using SLURM.
+
+The `prepare_outputs()` function is already incorporated into the first step. For that reason, it does not need to be included in the user-defined script in this particular case. In its place, we will use the function `read_atomic_recipe()`. The recipe path is passed as an argument to the R script. The beginning of our script should look like this:
+
+```R
+# Load the modules to be used
+source("modules/Loading/Loading.R")
+source("modules/Units/Units.R")
+source("modules/Anomalies/Anomalies.R")
+source("modules/Indices/Indices.R")
+source("modules/Skill/Skill.R")
+
+# Define the recipe path as the first argument from the command line
+args = commandArgs(trailingOnly = TRUE)
+recipe_file <- args[1]
+# Read the atomic recipe
+recipe <- read_atomic_recipe(recipe_file)
+```
+
+The rest of the user-defined script can be written in the same way as any other SUNSET script:
+
+```R
+# Load data
+data <- Loading(recipe)
+# Check units and transform if needed
+data <- Units(recipe, data)
+# Compute tos anomalies
+data <- Anomalies(recipe, data)
+# Compute Niño Indices
+nino_indices <- Indices(data = data, recipe = recipe)
+
+# We can compute the Skill metrics for each of the El Niño indices,
+# specifying that the data is spatially aggregated, with the parameter
+# agg = "region".
+for (index in nino_indices) {
+  nino_skill_metrics <- Skill(recipe = recipe, data = index, agg = "region")
+}
+```
+
+A complete, ready-to-use sample of this example script can be found in `use_cases/ex1_3_nino_indices_comparison/ex1_3-script.R`. You can execute it as-is or copy it and modify it according to your specific needs.
+
+## 3. Launching the jobs with the SUNSET Launcher
+
+The first step is to connect to the HPC machine through `ssh`. When working without Autosubmit, the SUNSET Launcher should be run directly from the HPC machine where the jobs will run (for example, Nord3v2). There is no need to request an interactive session; the launcher script can be called directly from the login node. You can obtain detailed usage information by running:
+
+```shell
+bash launch_SUNSET.sh --help
+```
+
+The mandatory arguments are the paths to the recipe and the script. We can also include other optional arguments to be used by SLURM, such as the number of CPUs to request (`--cpus`), the wallclock time for each job (`--wallclock`) and other extra directives (`--custom_directives`). You can refer to the [Nord3v2 user guide](https://www.bsc.es/user-support/nord3v2.php#jobdirectives) and the [SLURM sbatch documentation](https://slurm.schedmd.com/sbatch.html) for more information on the available options for the parameters.
+
+In this case, we are giving each job a wallclock time of 1 hour and requesting exclusive usage of all the cores in one node. The shell command to run SUNSET will look like this:
+
+```shell
+bash launch_SUNSET.sh use_cases/ex1_3_nino_indices_comparison/ex1_3-recipe.yml use_cases/ex1_3_nino_indices_comparison/ex1_3-script.R --wallclock=01:00:00 --custom_directives="--exclusive"
+```
+
+You can check the status of your running jobs with the `squeue` command, as shown in the sketch below. The SLURM logs will be inside your code directory, in a subfolder named 'out-logs'. It can be useful to check them in case of errors.
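+
+A minimal sketch for monitoring the jobs; `$USER` expands to your own user name, and the commands are run from the SUNSET code directory:
+
+```shell
+# List your queued and running jobs
+squeue -u $USER
+
+# Inspect the SLURM logs in case of errors
+ls out-logs/
+```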
+
+## 4. Results and plots
+
+The spatial pattern and time series plots that were requested are saved inside `plots/Indices/` in the output directory. There will be one set of plots for each El Niño index, with a descriptive file name providing information about the content of the plot, the system/reference datasets, the start date and the forecast time. Here are some examples of the results:
+
+Spatial correlation for the ensemble mean:
+
+![](./figures/nino34_correlation_tos_ensmean_ECMWF-SEAS5_s1101_ftime02.png)
+![](./figures/nino34_correlation_tos_ensmean_Meteo-France-System7_s1101_ftime02.png)
+
+Time series comparison between the model and the reference dataset (ERA5):
+![](./figures/nino34_ECMWF-SEAS5_ERA5_s1101_ftime02.png)
+![](./figures/nino34_Meteo-France-System7_ERA5_s1101_ftime02.png)
diff --git a/use_cases/ex1_3_nino_indices_comparison/ex1_3-recipe.yml b/use_cases/ex1_3_nino_indices_comparison/ex1_3-recipe.yml
new file mode 100644
index 0000000000000000000000000000000000000000..a4231e6a2403201b4e9b9d9963bd8ba99d6bd52c
--- /dev/null
+++ b/use_cases/ex1_3_nino_indices_comparison/ex1_3-recipe.yml
@@ -0,0 +1,56 @@
+Description:
+  Author: V. Agudetse
+  Info: Computing El Nino indices for ECMWF SEAS5 and MeteoFrance System7
+
+Analysis:
+  Horizon: seasonal
+  Variables:
+    - {name: tos, freq: monthly_mean, units: K}
+  Datasets:
+    System:
+      - {name: ECMWF-SEAS5}
+      - {name: Meteo-France-System7}
+    Multimodel: no
+    Reference:
+      - {name: ERA5}
+    Time:
+      sdate: '1101'
+      fcst_year:
+      hcst_start: '1993'
+      hcst_end: '2016'
+      ftime_min: 2
+      ftime_max: 4
+  Region:
+    latmin: -90
+    latmax: 90
+    lonmin: 0
+    lonmax: 359.9
+  Regrid:
+    method: bilinear
+    type: to_system
+  Workflow:
+    Calibration:
+      method: mse_min
+      save: none
+    Anomalies:
+      compute: yes
+      cross_validation: no
+      save: none
+    Indices:
+      Nino1+2: {save: all, plot_ts: yes, plot_sp: yes}
+      Nino3: {save: all, plot_ts: yes, plot_sp: yes}
+      Nino3.4: {save: all, plot_ts: yes, plot_sp: yes}
+      Nino4: {save: all, plot_ts: yes, plot_sp: yes}
+    Skill:
+      metric: mean_bias EnsCorr rps rpss crps crpss EnsSprErr
+      save: 'all'
+  ncores: 8
+  remove_NAs: yes
+  Output_format: S2S4E
+  logo: TRUE
+Run:
+  Loglevel: INFO
+  Terminal: yes
+  output_dir: /esarchive/scratch/vagudets/auto-s2s-outputs/ # ______
+  code_dir: /esarchive/scratch/vagudets/repos/auto-s2s/ # _____
+  autosubmit: no
diff --git a/use_cases/ex1_3_nino_indices_comparison/ex1_3-script.R b/use_cases/ex1_3_nino_indices_comparison/ex1_3-script.R
new file mode 100644
index 0000000000000000000000000000000000000000..c2b0ba341a015ef4440f653a8c2638cc0fe6619f
--- /dev/null
+++ b/use_cases/ex1_3_nino_indices_comparison/ex1_3-script.R
@@ -0,0 +1,37 @@
+###############################################################################
+## Author: Núria Pérez-Zanón and Victòria Agudetse Roures
+## Description: Computes the Niño1+2, Niño3, Niño3.4 and Niño4 indices and some
+## skill metrics for each index.
+## Instructions: To run it, follow the steps described in:
+## use_cases/ex1_3_nino_indices_comparison/ex1_3-handson.md
+###############################################################################
+
+# Load modules
+source("modules/Loading/Loading.R")
+source("modules/Units/Units.R")
+source("modules/Anomalies/Anomalies.R")
+source("modules/Indices/Indices.R")
+source("modules/Skill/Skill.R")
+
+# Read recipe
+args = commandArgs(trailingOnly = TRUE)
+recipe_file <- args[1]
+recipe <- read_atomic_recipe(recipe_file)
+
+# Load data
+data <- Loading(recipe)
+# Check units and transform if needed
+data <- Units(recipe, data)
+# Calibrate data
+# data <- Calibration(recipe, data)
+# Compute tos anomalies
+data <- Anomalies(recipe, data)
+# Compute Niño Indices
+nino_indices <- Indices(data = data, recipe = recipe)
+
+# We can compute the Skill metrics for each of the El Niño indices,
+# specifying that the data is spatially aggregated, with the parameter
+# agg = "region".
+for (index in nino_indices) {
+  skill_metrics <- Skill(recipe = recipe, data = index, agg = "region")
+}
diff --git a/use_cases/ex1_3_nino_indices_comparison/figures/nino34_ECMWF-SEAS5_ERA5_s1101_ftime02.png b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_ECMWF-SEAS5_ERA5_s1101_ftime02.png
new file mode 100644
index 0000000000000000000000000000000000000000..b14e72311586239292d447c238442a9b9f2957f5
Binary files /dev/null and b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_ECMWF-SEAS5_ERA5_s1101_ftime02.png differ
diff --git a/use_cases/ex1_3_nino_indices_comparison/figures/nino34_Meteo-France-System7_ERA5_s1101_ftime02.png b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_Meteo-France-System7_ERA5_s1101_ftime02.png
new file mode 100644
index 0000000000000000000000000000000000000000..696de30e0a7a883bce952897b47c82223cb0d186
Binary files /dev/null and b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_Meteo-France-System7_ERA5_s1101_ftime02.png differ
diff --git a/use_cases/ex1_3_nino_indices_comparison/figures/nino34_correlation_tos_ensmean_ECMWF-SEAS5_s1101_ftime02.png b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_correlation_tos_ensmean_ECMWF-SEAS5_s1101_ftime02.png
new file mode 100644
index 0000000000000000000000000000000000000000..5b8af9efa6878dc51e4fea01bab2a240bf1ec2b4
Binary files /dev/null and b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_correlation_tos_ensmean_ECMWF-SEAS5_s1101_ftime02.png differ
diff --git a/use_cases/ex1_3_nino_indices_comparison/figures/nino34_correlation_tos_ensmean_Meteo-France-System7_s1101_ftime02.png b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_correlation_tos_ensmean_Meteo-France-System7_s1101_ftime02.png
new file mode 100644
index 0000000000000000000000000000000000000000..39cb7dcde131f31f3169d5e25ed0f5bd972c2abf
Binary files /dev/null and b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_correlation_tos_ensmean_Meteo-France-System7_s1101_ftime02.png differ