diff --git a/MODULES b/MODULES index e2709d66a4f84df02c03d753c44c801c44e8c60e..6654c0920d879c620e6878f08d901170513dae2c 100644 --- a/MODULES +++ b/MODULES @@ -17,6 +17,16 @@ if [[ $BSC_MACHINE == "nord3v2" ]]; then module load PROJ/9.0.0-GCCcore-8.3.0 module load Phantomjs/2.1.1 +elif [[ $HOSTNAME == "bsceshub02.bsc.es" ]]; then + + module purge + module load CDO/1.9.8-foss-2021b + module load R/4.2.1-foss-2021b + module load GEOS/3.11.0-GCC-11.2.0 + module load GDAL/3.5.2-foss-2021b-Python-3.9.6 + module load PROJ/9.1.0-foss-2021b + module load Phantomjs/2.1.1 + else module purge diff --git a/NEWS.md b/NEWS.md index e8cda565873c88b36148465b22808a32c3478b4c..40f97c3adbb979911614fc9f44bbe7444c84cd17 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,21 +1,60 @@ +SUNSET v2.0.0 +============= + +Modules: Loading, Units, Anomalies, Calibration, Downscaling, Indices, Skill, Saving, Visualization, Scorecards + +New features: +- New module for unit conversion, available for temperature, precipitation and pressure variables. +- New Indices module with the following indices available: NAO, Niño1+2, Niño3, Niño3.4 and Niño4. +- New Scorecards module to create Scorecard visualizations of the computed skill metrics. +- New Downscaling module making use of the in-house CSDownscale package. +- Recipe splitting feature: A recipe can be split into 'atomic recipes' and the same workflow can be easily parallelized for multiple forecast systems, references, variables, start dates and regions. +- Option to load, process, save and plot multiple variables in one atomic recipe. +- Possibility to use Autosubmit 4 as a workflow manager to run recipes in parallel. +- New SUNSET launcher script to split a recipe and run it in parallel directly in the cluster with SLURM or with Autosubmit. +- Option to load the tas-tos blend by requesting the 'tas-tos' variable. +- For each module, there is the option to choose whether or not to save the forecast, hindcast and/or observations. +- New skill metrics MSSS and MSE available. +- New use cases with hands-on tutorials. +- GRIB file loading. + +Summary of bugfixes/improvements: + +- The names of the module functions have changed to be the same as the name of the module (e.g. load_datasets() has become Loading()). The old names will be deprecated in a future release. +- New datasets and variables added to the seasonal archive. +- The launcher script and the prepare_outputs() function have a new "disable_unique_ID" option, which removes the numerical identifier that is added to the name of the output folder when it is generated. +- The seasonal and decadal loading functions have been renamed and can now be sourced and called from the same Loading module. +- Bugfix in the recipe checker: checks are now completed even when the time horizon is not correct. +- The functions have been adapted to the new s2dv_cube structure in CSTools>5.0.0. +- The metric 'corr' (correlation for each ensemble member) has been renamed to 'corr_individual_members'. +- Datasets saved under 'daily' folders in esarchive can now be loaded along with 'daily_mean' datasets. +- Start date information has been added to the plot names to avoid overwriting plots from different start dates. +- In the Visualization section of the recipe, the user can specify which plots to generate, whether they should be in a single-panel or multi-panel layout, and choose between different projections. +- General improvements to the plots: color scheme, units, titles and subtitles, layout and colorbars.
+- Anomalies module can compute anomalies when hindcast and observations do not share the same grid. +- Added module configuration to run SUNSET in the BSC hub. +- Language specification has been added to lubridate functions to ensure they work well in all language locales. + ESS Verification Suite v1.1.0 +============================= Modules: Loading, Anomalies, Calibration, Skill, Saving, Visualization New features: -New module for anomaly computation. -New 'Scorecards' output format (different file names and paths from the default format). -New 'recipe checker' feature in prepare_outputs(): It runs a series of checks on the recipe to detect potential errors, typos, or missing information. +- New module for anomaly computation. +- New 'Scorecards' output format (different file names and paths from the default format). +- New 'recipe checker' feature in prepare_outputs(): It runs a series of checks on the recipe to detect potential errors, typos, or missing information. + Summary of fixes/improvements: -Changed the names of the seasonal systems from the names in /esarchive to the official names in the CDS. -Fixed a bug in the conversion of precipitation units. -Fixed a bug related to the consistency between experiment and observation dates for some systems. -Function parameters have been simplified and uniformized. -Improvements in the logging functionality for longer messages. -Improvements to the plots generated by the Visualization module. -compute_probabilities() now returns the fcst probabilities as well. +- Changed the names of the seasonal systems from the names in /esarchive to the official names in the CDS. +- Fixed a bug in the conversion of precipitation units. +- Fixed a bug related to the consistency between experiment and observation dates for some systems. +- Function parameters have been simplified and uniformized. +- Improvements in the logging functionality for longer messages. +- Improvements to the plots generated by the Visualization module. +- compute_probabilities() now returns the fcst probabilities as well. ESS Verification Suite v1.0.0 ============================= diff --git a/README.md b/README.md index 538fa029545cdf20c8043f9caef9be76c82990cc..2ae44e8e6d6619cbe8a08bff72c644f3c33c9163 100644 --- a/README.md +++ b/README.md @@ -1,19 +1,29 @@ -ESS Verification Suite +SUNSET: SUbseasoNal to decadal climate forecast post-processing and asSEssmenT suite ====================== -This is the Git project for the ESS Verification Suite, which will serve as a tool for research projects and operational workflows involving subseasonal to seasonal to decadal forecast verification. +This is the Git project for SUNSET, a collaborative in-house tool developed at BSC-ES for research projects and operational workflows involving subseasonal to seasonal to decadal forecast verification. -The main developers of the tool are Victòria Agudetse (@vagudets), An-Chi Ho (@aho), Lluís Palma (@lpalma) and Núria Pérez-Zanón (@nperez). +SUNSET is an R-based tool that aims to provide climate services for sub-seasonal, seasonal and decadal climate forecast time scales. The tool post-processes climate forecast outputs by applying state-of-the-art methodologies to tailor climate products for each application and sector (e.g. agriculture, energy, water management, or health). 
+ +Its modular design allows flexibility in defining the required post-processing steps, as well as in the definition of the products, by choosing the forecast system and reference datasets, variables, and forecast horizon, among others. The tool also allows for the creation and visualization of climate forecast products, such as maps of the most likely terciles, and performs the verification of the products, which can be visualized on maps and scorecards. The inclusion of a launcher script provides a user-friendly way to parallelize the computation on HPC machines. + +Victòria Agudetse (@vagudets) is the maintainer of the SUNSET repository. Resources --------- -You can access the documentation of the Verification Suite through the wiki: -[Auto-s2s Wiki](https://earth.bsc.es/gitlab/es/auto-s2s/-/wikis/home?target=_blank) +You can access the documentation through the wiki: +[SUNSET Wiki](https://earth.bsc.es/gitlab/es/sunset/-/wikis/home?target=_blank) + +Use cases and hands-on tutorials are available in the repository for you to follow: + +[SUNSET Use Cases](https://earth.bsc.es/gitlab/es/sunset/-/tree/master/use_cases/) You may also find useful information in the slides from past user meetings: +[R-tools Climate Forecast Analysis Training session 2023](https://earth.bsc.es/wiki/lib/exe/fetch.php?media=tools:day2_04_handson_r_tools.pdf) + [User meeting March 2023](https://docs.google.com/presentation/d/18VoqgJCzcZTmqNyXL3op_KecsPxsWRkf/edit#slide=id.p1?target=_blank) [User meeting September 2022](https://docs.google.com/presentation/d/14-qq__fblMt7xvJDaqS5UqfQMXWCf3Ju/edit#slide=id.p1?target=_blank) @@ -25,9 +35,9 @@ Branching strategy Branches containing developments that are to be merged into the tool must contain "dev-" at the beginning of the name, followed by a short, meaningful description of the development in question. E.g. "dev-loading-subseasonal" for the branch containing developments related to the loading of subseasonal datasets. -Users that wish to incorporate their own developments into the core of the tool are encouraged to create a personal fork of the Auto-S2S repository to work on their projects. Please contact Victòria Agudetse at victoria.agudetse@bsc.es to discuss the first steps. +Users that wish to incorporate their own developments into the core of the tool are encouraged to create a local copy of the repository to work on their projects and push their changes to a new branch. Please contact Victòria Agudetse (@vagudets) or Núria Pérez-Zanón (@nperez) to discuss the first steps. Mailing list ------------ -User meetings, internal releases and news are announced through the mailing list. You can send an email to victoria.agudetse@bsc.es or an.ho@bsc.es to request subscription. +User meetings, internal releases and news are announced through the mailing list. You can send an email to victoria.agudetse@bsc.es to request subscription. 
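As a rough illustration of the modular workflow described in the README above, one atomic recipe can be run module by module from an R session. The sketch below is only indicative: Loading(recipe) and Anomalies(recipe, data) follow the signatures visible in the module diffs further down, while prepare_outputs() is the recipe checker named in NEWS.md; the recipe path and the exact arguments of Skill() and Visualization() are assumptions for illustration, not the definitive interface.

# Minimal sketch of a single-recipe SUNSET run (assumed paths and signatures).
source("tools/prepare_outputs.R")             # assumed location of prepare_outputs()
source("modules/Loading/Loading.R")
source("modules/Anomalies/Anomalies.R")
source("modules/Skill/Skill.R")
source("modules/Visualization/Visualization.R")

recipe_file <- "recipes/recipe_example.yml"   # hypothetical atomic recipe
recipe <- prepare_outputs(recipe_file)        # checks the recipe and prepares the output directory

data <- Loading(recipe)                       # returns list(hcst = ..., fcst = ..., obs = ...) of s2dv_cubes
data <- Anomalies(recipe, data)               # anomalies w.r.t. the hindcast climatology
skill_metrics <- Skill(recipe, data)          # assumed signature: skill metrics requested in the recipe
Visualization(recipe, data, skill_metrics)    # assumed signature: maps and other plots

The same chain can be parallelized over atomic recipes with the launcher script (launch_SUNSET.sh) on SLURM or through Autosubmit, as described in the NEWS entry above.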
diff --git a/autosubmit/auto-scorecards.sh b/autosubmit/auto-scorecards.sh index 4b5273725bed84811e1267048d035a0e2f712a28..c30f643f3be53f216ead66675a9545a0e159198a 100644 --- a/autosubmit/auto-scorecards.sh +++ b/autosubmit/auto-scorecards.sh @@ -2,8 +2,8 @@ ############ AUTOSUBMIT INPUTS ############ proj_dir=%PROJDIR% -outdir=%OUTDIR% -recipe=%RECIPE% +outdir=%common.OUTDIR% +recipe=%common.RECIPE% ############################### cd $proj_dir diff --git a/conda_installation/environment-sunset.yml b/conda_installation/environment-sunset.yml new file mode 100644 index 0000000000000000000000000000000000000000..27dbd8b96bfb39955ba302c6712bc3a8cca0ef36 --- /dev/null +++ b/conda_installation/environment-sunset.yml @@ -0,0 +1,375 @@ +name: condaSUNSET +channels: + - r + - conda-forge + - defaults +dependencies: + - _libgcc_mutex=0.1=conda_forge + - _openmp_mutex=4.5=2_gnu + - _r-mutex=1.0.1=anacondar_1 + - atk-1.0=2.38.0=hd4edc92_1 + - binutils_impl_linux-64=2.40=hf600244_0 + - blosc=1.21.5=h0f2a231_0 + - boost-cpp=1.78.0=h5adbc97_2 + - bwidget=1.9.14=ha770c72_1 + - bzip2=1.0.8=h7f98852_4 + - c-ares=1.19.1=hd590300_0 + - ca-certificates=2023.7.22=hbcca054_0 + - cairo=1.16.0=ha61ee94_1014 + - cfitsio=4.1.0=hd9d235c_0 + - curl=8.1.2=h409715c_0 + - expat=2.5.0=hcb278e6_1 + - fftw=3.3.10=nompi_hc118613_108 + - font-ttf-dejavu-sans-mono=2.37=hab24e00_0 + - font-ttf-inconsolata=3.000=h77eed37_0 + - font-ttf-source-code-pro=2.038=h77eed37_0 + - font-ttf-ubuntu=0.83=hab24e00_0 + - fontconfig=2.14.2=h14ed4e7_0 + - fonts-conda-ecosystem=1=0 + - fonts-conda-forge=1=0 + - freeglut=3.2.2=h9c3ff4c_1 + - freetype=2.12.1=h267a509_2 + - freexl=1.0.6=h166bdaf_1 + - fribidi=1.0.10=h36c2ea0_0 + - gcc_impl_linux-64=13.2.0=h338b0a0_2 + - gdal=3.5.2=py311hd39052d_7 + - gdk-pixbuf=2.42.8=hff1cb4f_1 + - geos=3.11.0=h27087fc_0 + - geotiff=1.7.1=ha76d385_4 + - gettext=0.21.1=h27087fc_0 + - gfortran_impl_linux-64=13.2.0=h76e1118_2 + - ghostscript=10.02.0=h59595ed_0 + - giflib=5.2.1=h0b41bf4_3 + - gmp=6.2.1=h58526e2_0 + - graphite2=1.3.13=h58526e2_1001 + - graphviz=6.0.2=h99bc08f_0 + - gsl=2.7=he838d99_0 + - gtk2=2.24.33=h90689f9_2 + - gts=0.7.6=h977cf35_4 + - gxx_impl_linux-64=13.2.0=h338b0a0_2 + - harfbuzz=6.0.0=h8e241bc_0 + - hdf4=4.2.15=h9772cbc_5 + - hdf5=1.12.2=nompi_h4df4325_101 + - icu=70.1=h27087fc_0 + - imagemagick=7.1.0_55=pl5321h0d24a18_0 + - jasper=2.0.33=h0ff4b12_1 + - jbig=2.1=h7f98852_2003 + - jpeg=9e=h0b41bf4_3 + - json-c=0.16=hc379101_0 + - kealib=1.4.15=ha7026e8_1 + - kernel-headers_linux-64=2.6.32=he073ed8_16 + - keyutils=1.6.1=h166bdaf_0 + - krb5=1.20.1=h81ceb04_0 + - lcms2=2.14=h6ed2654_0 + - ld_impl_linux-64=2.40=h41732ed_0 + - lerc=4.0.0=h27087fc_0 + - libaec=1.0.6=hcb278e6_1 + - libblas=3.9.0=18_linux64_openblas + - libcblas=3.9.0=18_linux64_openblas + - libcurl=8.1.2=h409715c_0 + - libdap4=3.20.6=hd7c4107_2 + - libdeflate=1.14=h166bdaf_0 + - libedit=3.1.20191231=he28a2e2_2 + - libev=4.33=h516909a_1 + - libexpat=2.5.0=hcb278e6_1 + - libffi=3.4.2=h7f98852_5 + - libgcc-devel_linux-64=13.2.0=ha9c7c90_2 + - libgcc-ng=13.2.0=h807b86a_2 + - libgd=2.3.3=h18fbbfe_3 + - libgdal=3.5.2=h27ae5c1_7 + - libgfortran-ng=13.2.0=h69a702a_2 + - libgfortran5=13.2.0=ha4646dd_2 + - libglib=2.78.0=hebfc3b9_0 + - libglu=9.0.0=he1b5a44_1001 + - libgomp=13.2.0=h807b86a_2 + - libiconv=1.17=h166bdaf_0 + - libkml=1.3.0=h37653c0_1015 + - liblapack=3.9.0=18_linux64_openblas + - libnetcdf=4.8.1=nompi_h261ec11_106 + - libnghttp2=1.52.0=h61bc06f_0 + - libnsl=2.0.0=hd590300_1 + - libopenblas=0.3.24=pthreads_h413a1c8_0 + - 
libpng=1.6.39=h753d276_0 + - libpq=14.5=hb675445_5 + - librsvg=2.54.4=h7abd40a_0 + - librttopo=1.1.0=hf730bdb_11 + - libsanitizer=13.2.0=h7e041cc_2 + - libspatialite=5.0.1=hfbd986c_21 + - libsqlite=3.43.0=h2797004_0 + - libssh2=1.11.0=h0841786_0 + - libstdcxx-devel_linux-64=13.2.0=ha9c7c90_2 + - libstdcxx-ng=13.2.0=h7e041cc_2 + - libtiff=4.4.0=h82bc61c_5 + - libtool=2.4.7=h27087fc_0 + - libuuid=2.38.1=h0b41bf4_0 + - libwebp=1.2.4=h522a892_0 + - libwebp-base=1.2.4=h166bdaf_0 + - libxcb=1.13=h7f98852_1004 + - libxml2=2.10.3=hca2bb57_4 + - libzip=1.10.1=h2629f0a_3 + - libzlib=1.2.13=hd590300_5 + - lz4-c=1.9.4=hcb278e6_0 + - make=4.3=hd18ef5c_1 + - ncurses=6.4=hcb278e6_0 + - nspr=4.35=h27087fc_0 + - nss=3.92=h1d7d5a4_0 + - numpy=1.26.0=py311h64a7726_0 + - openjpeg=2.5.0=h7d73246_1 + - openssl=3.1.3=hd590300_0 + - pandoc=3.1.3=h32600fe_0 + - pango=1.50.14=hd33c08f_0 + - pcre=8.45=h9c3ff4c_0 + - pcre2=10.40=hc3806b6_0 + - perl=5.32.1=4_hd590300_perl5 + - phantomjs=2.1.1=ha770c72_1 + - pip=23.2.1=pyhd8ed1ab_0 + - pixman=0.42.2=h59595ed_0 + - pkg-config=0.29.2=h36c2ea0_1008 + - poppler=22.10.0=h92391eb_0 + - poppler-data=0.4.12=hd8ed1ab_0 + - postgresql=14.5=h3248436_5 + - proj=9.1.0=h93bde94_0 + - pthread-stubs=0.4=h36c2ea0_1001 + - python=3.11.5=hab00c5b_0_cpython + - python_abi=3.11=4_cp311 + - r-abind=1.4_5=r42hc72bb7e_1005 + - r-askpass=1.2.0=r42h57805ef_0 + - r-assertthat=0.2.1=r42hc72bb7e_4 + - r-backports=1.4.1=r42h57805ef_2 + - r-base=4.2.2=h6b4767f_2 + - r-base64enc=0.1_3=r42h57805ef_1006 + - r-bigmemory=4.6.1=r42ha503ecb_2 + - r-bigmemory.sri=0.1.6=r42hc72bb7e_1 + - r-brio=1.1.3=r42h57805ef_2 + - r-bslib=0.5.1=r42hc72bb7e_0 + - r-cachem=1.0.8=r42h57805ef_1 + - r-callr=3.7.3=r42hc72bb7e_1 + - r-class=7.3_22=r42h57805ef_1 + - r-classint=0.4_10=r42h61816a4_0 + - r-cli=3.6.1=r42ha503ecb_1 + - r-climdex.pcic=1.1_11=r42ha503ecb_2 + - r-climprojdiags=0.3.2=r42hc72bb7e_1 + - r-clock=0.7.0=r42ha503ecb_1 + - r-codetools=0.2_19=r42hc72bb7e_1 + - r-colorspace=2.1_0=r42h57805ef_1 + - r-commonmark=1.9.0=r42h57805ef_1 + - r-configr=0.3.5=r42hc72bb7e_1 + - r-cowplot=1.1.1=r42hc72bb7e_2 + - r-cpp11=0.4.6=r42hc72bb7e_0 + - r-crayon=1.5.2=r42hc72bb7e_2 + - r-curl=5.0.1=r42hf9611b0_0 + - r-dbi=1.1.3=r42hc72bb7e_2 + - r-desc=1.4.2=r42hc72bb7e_2 + - r-diffobj=0.3.5=r42h57805ef_2 + - r-digest=0.6.33=r42ha503ecb_0 + - r-docopt=0.7.1=r42hc72bb7e_3 + - r-doparallel=1.0.17=r42hc72bb7e_2 + - r-dotcall64=1.0_2=r42h61816a4_2 + - r-dplyr=1.1.2=r42ha503ecb_1 + - r-e1071=1.7_13=r42ha503ecb_1 + - r-easyncdf=0.1.2=r42hc72bb7e_1 + - r-easyverification=0.4.5=r42ha503ecb_0 + - r-ellipsis=0.3.2=r42h57805ef_2 + - r-evaluate=0.21=r42hc72bb7e_1 + - r-fansi=1.0.4=r42h57805ef_1 + - r-farver=2.1.1=r42ha503ecb_2 + - r-fastmap=1.1.1=r42ha503ecb_1 + - r-fields=15.2=r42h61816a4_0 + - r-fnn=1.1.3.2=r42ha503ecb_1 + - r-fontawesome=0.5.2=r42hc72bb7e_0 + - r-foreach=1.5.2=r42hc72bb7e_2 + - r-formattable=0.2.1=r42ha770c72_2 + - r-fs=1.6.3=r42ha503ecb_0 + - r-future=1.33.0=r42hc72bb7e_0 + - r-generics=0.1.3=r42hc72bb7e_2 + - r-geomap=2.5_0=r42h57805ef_2 + - r-geomapdata=2.0_2=r42hc72bb7e_0 + - r-ggplot2=3.4.3=r42hc72bb7e_0 + - r-globals=0.16.2=r42hc72bb7e_1 + - r-glue=1.6.2=r42h57805ef_2 + - r-gridextra=2.3=r42hc72bb7e_1005 + - r-gridgraphics=0.5_1=r42hc72bb7e_2 + - r-gtable=0.3.4=r42hc72bb7e_0 + - r-highr=0.10=r42hc72bb7e_1 + - r-htmltools=0.5.6=r42ha503ecb_0 + - r-htmlwidgets=1.6.2=r42hc72bb7e_1 + - r-httpuv=1.6.11=r42ha503ecb_1 + - r-httr=1.4.7=r42hc72bb7e_0 + - r-ini=0.3.1=r42hc72bb7e_1005 + - r-isoband=0.2.7=r42ha503ecb_2 + - 
r-iterators=1.0.14=r42hc72bb7e_2 + - r-jquerylib=0.1.4=r42hc72bb7e_2 + - r-jsonlite=1.8.7=r42h57805ef_0 + - r-kableextra=1.3.4=r42hc72bb7e_2 + - r-kernsmooth=2.23_22=r42h13b3f57_0 + - r-knitr=1.44=r42hc72bb7e_0 + - r-labeling=0.4.3=r42hc72bb7e_0 + - r-later=1.3.1=r42ha503ecb_1 + - r-lattice=0.21_8=r42h57805ef_1 + - r-lifecycle=1.0.3=r42hc72bb7e_2 + - r-listenv=0.9.0=r42hc72bb7e_1 + - r-lobstr=1.1.2=r42ha503ecb_3 + - r-log4r=0.4.3=r42h57805ef_1 + - r-lubridate=1.9.2=r42h57805ef_2 + - r-magick=2.7.3=r42h7525677_1 + - r-magrittr=2.0.3=r42h57805ef_2 + - r-mapproj=1.2.11=r42h57805ef_1 + - r-maps=3.4.1=r42h57805ef_2 + - r-mass=7.3_60=r42h57805ef_1 + - r-matrix=1.6_1.1=r42h316c678_0 + - r-mba=0.1_0=r42ha503ecb_1 + - r-memoise=2.0.1=r42hc72bb7e_2 + - r-memuse=4.2_3=r42h57805ef_1 + - r-mgcv=1.9_0=r42h316c678_0 + - r-mime=0.12=r42h57805ef_2 + - r-multiapply=2.1.4=r42hc72bb7e_1 + - r-munsell=0.5.0=r42hc72bb7e_1006 + - r-nbclust=3.0.1=r42hc72bb7e_2 + - r-ncdf4=1.21=r42h15f2bca_0 + - r-nlme=3.1_163=r42h61816a4_0 + - r-nnet=7.3_19=r42h57805ef_1 + - r-openssl=2.1.1=r42hb353fa6_0 + - r-parallelly=1.36.0=r42hc72bb7e_1 + - r-pbapply=1.7_2=r42hc72bb7e_0 + - r-pcict=0.5_4.4=r42h57805ef_1 + - r-pillar=1.9.0=r42hc72bb7e_1 + - r-pkgbuild=1.4.2=r42hc72bb7e_0 + - r-pkgconfig=2.0.3=r42hc72bb7e_3 + - r-pkgload=1.3.3=r42hc72bb7e_0 + - r-plyr=1.8.8=r42ha503ecb_1 + - r-praise=1.0.0=r42hc72bb7e_1007 + - r-prettyunits=1.2.0=r42hc72bb7e_0 + - r-processx=3.8.2=r42h57805ef_0 + - r-proj4=1.0_12=r42h4db2be8_0 + - r-promises=1.2.1=r42ha503ecb_0 + - r-proxy=0.4_27=r42h57805ef_2 + - r-pryr=0.1.6=r42ha503ecb_1 + - r-ps=1.7.5=r42h57805ef_1 + - r-r6=2.5.1=r42hc72bb7e_2 + - r-rappdirs=0.3.3=r42h57805ef_2 + - r-rcolorbrewer=1.1_3=r42h6115d3f_0 + - r-rcpp=1.0.11=r42h7df8631_0 + - r-rcpparmadillo=0.12.6.4.0=r42h08d816e_0 + - r-rcpptoml=0.2.2=r42ha503ecb_1 + - r-rematch2=2.1.2=r42hc72bb7e_3 + - r-rlang=1.1.1=r42ha503ecb_1 + - r-rmarkdown=2.25=r42hc72bb7e_0 + - r-rnaturalearth=0.1.0=r42hc72bb7e_1 + - r-rpmg=2.2_7=r42hc72bb7e_0 + - r-rprojroot=2.0.3=r42hc72bb7e_1 + - r-rstudioapi=0.15.0=r42hc72bb7e_0 + - r-rvest=1.0.3=r42hc72bb7e_2 + - r-s2=1.1.4=r42h5eac2b3_1 + - r-s2dv=1.4.1=r42hc72bb7e_1 + - r-s2dverification=2.10.3=r42hc72bb7e_2 + - r-sass=0.4.7=r42ha503ecb_0 + - r-scales=1.2.1=r42hc72bb7e_2 + - r-selectr=0.4_2=r42hc72bb7e_3 + - r-sf=1.0_7=r42h25da31b_5 + - r-shiny=1.7.5=r42h785f33e_0 + - r-sourcetools=0.1.7_1=r42ha503ecb_1 + - r-sp=2.0_0=r42h57805ef_0 + - r-spam=2.9_1=r42hd9ac46e_2 + - r-specsverification=0.5_3=r42h7525677_2 + - r-splancs=2.01_44=r42h61816a4_0 + - r-startr=2.3.0=r42hc72bb7e_0 + - r-stringi=1.7.12=r42h1ae9187_0 + - r-stringr=1.5.0=r42h785f33e_1 + - r-svglite=2.1.1=r42h329214f_1 + - r-sys=3.4.2=r42h57805ef_1 + - r-systemfonts=1.0.4=r42haf97adc_2 + - r-testthat=3.1.10=r42ha503ecb_0 + - r-tibble=3.2.1=r42h57805ef_2 + - r-timechange=0.2.0=r42ha503ecb_1 + - r-tinytex=0.46=r42hc72bb7e_0 + - r-tzdb=0.4.0=r42ha503ecb_1 + - r-units=0.8_4=r42ha503ecb_0 + - r-utf8=1.2.3=r42h57805ef_1 + - r-uuid=1.1_1=r42h57805ef_0 + - r-vctrs=0.6.3=r42ha503ecb_0 + - r-viridis=0.6.4=r42hc72bb7e_0 + - r-viridislite=0.4.2=r42hc72bb7e_1 + - r-waldo=0.5.1=r42hc72bb7e_1 + - r-webshot=0.5.5=r42hc72bb7e_0 + - r-withr=2.5.1=r42hc72bb7e_0 + - r-wk=0.8.0=r42ha503ecb_0 + - r-xfun=0.40=r42ha503ecb_0 + - r-xml2=1.3.3=r42h044e5c7_2 + - r-xtable=1.8_4=r42hc72bb7e_5 + - r-yaml=2.3.5=r42h06615bd_1 + - readline=8.2=h8228510_1 + - sed=4.8=he412f7d_0 + - setuptools=68.2.2=pyhd8ed1ab_0 + - snappy=1.1.10=h9fff704_0 + - sqlite=3.43.0=h2c6b66d_0 + - 
sysroot_linux-64=2.12=he073ed8_16 + - tiledb=2.11.3=h3f4058f_1 + - tk=8.6.13=h2797004_0 + - tktable=2.10=h0c5db8f_5 + - tzcode=2023c=h0b41bf4_0 + - tzdata=2023c=h71feb2d_0 + - udunits2=2.2.28=h40f5838_2 + - wheel=0.41.2=pyhd8ed1ab_0 + - xerces-c=3.2.4=h55805fa_1 + - xorg-fixesproto=5.0=h7f98852_1002 + - xorg-inputproto=2.3.2=h7f98852_1002 + - xorg-kbproto=1.0.7=h7f98852_1002 + - xorg-libice=1.0.10=h7f98852_0 + - xorg-libsm=1.2.3=hd9c2040_1000 + - xorg-libx11=1.8.4=h0b41bf4_0 + - xorg-libxau=1.0.11=hd590300_0 + - xorg-libxdmcp=1.1.3=h7f98852_0 + - xorg-libxext=1.3.4=h0b41bf4_2 + - xorg-libxfixes=5.0.3=h7f98852_1004 + - xorg-libxi=1.7.10=h7f98852_0 + - xorg-libxrender=0.9.10=h7f98852_1003 + - xorg-libxt=1.3.0=hd590300_0 + - xorg-renderproto=0.11.1=h7f98852_1002 + - xorg-xextproto=7.3.0=h0b41bf4_1003 + - xorg-xproto=7.0.31=h7f98852_1007 + - xz=5.2.6=h166bdaf_0 + - zlib=1.2.13=hd590300_5 + - zstd=1.5.5=hfc55251_0 + - pip: + - argparse==1.4.0 + - autosubmit==4.0.98 + - autosubmitconfigparser==1.0.49 + - bcrypt==4.0.1 + - bscearth-utils==0.5.2 + - cdo==1.6.0 + - certifi==2023.7.22 + - cffi==1.16.0 + - charset-normalizer==3.3.1 + - configobj==5.0.8 + - coverage==7.3.2 + - cryptography==41.0.5 + - cycler==0.12.1 + - cython==3.0.4 + - fonttools==4.43.1 + - idna==3.4 + - iniconfig==2.0.0 + - kiwisolver==1.4.5 + - matplotlib==3.5.3 + - mock==5.1.0 + - networkx==2.6.3 + - nose==1.3.7 + - packaging==23.2 + - paramiko==3.3.1 + - pillow==10.1.0 + - pluggy==1.3.0 + - portalocker==2.7.0 + - psutil==5.9.6 + - py3dotplus==1.1.0 + - pycparser==2.21 + - pygments==2.16.1 + - pynacl==1.5.0 + - pyparsing==3.1.1 + - pytest==7.4.3 + - python-dateutil==2.8.2 + - pythondialog==3.5.3 + - requests==2.31.0 + - ruamel-yaml==0.17.21 + - six==1.16.0 + - urllib3==2.0.7 + - xlib==0.21 +# prefix: /perm/cyce/conda/envs/condaCerise diff --git a/conda_installation/load_sunset.bash b/conda_installation/load_sunset.bash new file mode 100755 index 0000000000000000000000000000000000000000..add332b4a99210f3a890b14da77ee466f8b2ba8d --- /dev/null +++ b/conda_installation/load_sunset.bash @@ -0,0 +1,9 @@ +#!/bin/bash + +prefix=$1 + +conda env create --file environment-sunset.yml --prefix $prefix + +conda activate $prefix + +R -e "options(timeout = 600) ; install.packages('CSTools', repos='https://ftp.cixug.es/CRAN/')" diff --git a/conf/archive.yml b/conf/archive.yml index 0b643ae92d93bf55306b59c815c8daeddb8a3598..61f62be230b4ff05021a60a17e38e5e4d446cff9 100644 --- a/conf/archive.yml +++ b/conf/archive.yml @@ -21,13 +21,14 @@ esarchive: "ta850":"monthly_mean/ta850_f12h/", "g300":"monthly_mean/g300_f12h/", "g500":"monthly_mean/g500_f12h/", "g850":"monthly_mean/g500_f12h/", "tdps":"monthly_mean/tdps_f6h/", "psl":"monthly_mean/psl_f6h/", - "tos":"monthly_mean/tos_f6h/"} + "tos":"monthly_mean/tos_f6h/", "sic":"monthly_mean/sic_f24h/"} nmember: fcst: 51 hcst: 25 calendar: "proleptic_gregorian" time_stamp_lag: "0" reference_grid: "/esarchive/exp/ecmwf/system5c3s/monthly_mean/tas_f6h/tas_20180501.nc" + land_sea_mask: "/esarchive/exp/ecmwf/system5c3s/constant/lsm/lsm.nc" ECMWF-SEAS5.1: name: "ECMWF SEAS5 (v5.1)" institution: "European Centre for Medium-Range Weather Forecasts" @@ -53,7 +54,8 @@ esarchive: src: "exp/meteofrance/system7c3s/" monthly_mean: {"tas":"monthly_mean/tas_f6h/", "g500":"monthly_mean/g500_f12h/", "prlr":"monthly_mean/prlr_f24h/", "sfcWind": "monthly_mean/sfcWind_f6h/", - "tasmax":"monthly_mean/tasmax_f6h/", "tasmin": "monthly_mean/tasmin_f6h/"} + "tasmax":"monthly_mean/tasmax_f6h/", "tasmin": "monthly_mean/tasmin_f6h/", + 
"tos":"monthly_mean/tos_f6h/"} nmember: fcst: 51 hcst: 25 @@ -165,9 +167,11 @@ esarchive: "ta300":"montly_mean/ta300_f1h-r1440x721cds/", "ta500":"monthly_mean/ta500_f1h-r1440x721cds/", "ta850":"monthly_mean/ta850_f1h-r1440x721cds/", - "tos":"monthly_mean/tos_f1h-r1440x721cds/"} + "tos":"monthly_mean/tos_f1h-r1440x721cds/", + "sic":"monthly_mean/sic_f1h-r1440x721cds/"} calendar: "standard" reference_grid: "/esarchive/recon/ecmwf/era5/monthly_mean/tas_f1h-r1440x721cds/tas_201805.nc" + land_sea_mask: "/esarchive/recon/ecmwf/era5/constant/lsm-r1440x721cds/sftof.nc" ERA5-Land: name: "ERA5-Land" institution: "European Centre for Medium-Range Weather Forecasts" @@ -250,3 +254,26 @@ mars: monthly_mean: {"tas":""} calendar: "standard" reference_grid: "conf/grid_description/griddes_GRIB_system5_m1.txt" + +sample: + src: + System: + ECMWF-SEAS5.1: + name: "ECMWF SEAS5" + institution: "European Centre for Medium-Range Weather Forecasts" + src: + monthly_mean: {"tas":"", "prlr":""} + nmember: + fcst: 15 + hcst: 15 + calendar: "proleptic_gregorian" + time_stamp_lag: "0" + reference_grid: "conf/grid_description/griddes_GRIB_system51_m1.txt" + Reference: + ERA5: + name: "ERA5" + institution: "European Centre for Medium-Range Weather Forecasts" + src: "GRIB_era5_tas/" + monthly_mean: {"tas":"", "prlr":""} + calendar: "standard" + reference_grid: "conf/grid_description/griddes_GRIB_system5_m1.txt" diff --git a/conf/archive_decadal.yml b/conf/archive_decadal.yml index 2e0a1b296c14cdf1230e872ad94fd3d1e556829d..c25d8d3a86fb08670e1556db0b9281adf5e43d4c 100644 --- a/conf/archive_decadal.yml +++ b/conf/archive_decadal.yml @@ -95,8 +95,8 @@ esarchive: #version depends on member and variable version: {"tas":"v20200417", "psl":"v20200316", "pr":"v20200316", "ts":"v20200316", "tos":"v20200417"} daily_mean: - grid: {"tas":"gn"} - version: {"tasmin":"v20200101", "tasmax":"v20200101", "pr":"v20200417"} + grid: {"tasmin":"gn", "tasmax":"gn", "pr":"gn"} + version: {"tasmin":"v20200417", "tasmax":"v20200417", "pr":"v20200417"} calendar: "360-day" member: r1i1p1f2,r2i1p1f2,r3i1p1f2,r4i1p1f2,r5i1p1f2,r6i1p1f2,r7i1p1f2,r8i1p1f2,r9i1p1f2,r10i1p1f2 initial_month: 11 diff --git a/conf/variable-dictionary.yml b/conf/variable-dictionary.yml index 0bfbffe002c81d389207a54b8fb8687f991fd33e..c440eac1baa910973cf45e6918802135d2310bd4 100644 --- a/conf/variable-dictionary.yml +++ b/conf/variable-dictionary.yml @@ -204,13 +204,16 @@ vars: long_name: "Surface Upward Sensible Heat Flux" standard_name: "surface_upward_sensible_heat_flux" accum: no -## Adding new variable - tasanomaly: + tas-tos: units: "K" - long_name: "Near-Surface Air Temperature Anomaly" - standard_name: "air_temperature_anom" + long_name: "Blended air - sea temperature" + standard_name: "air_sea_temperature" + accum: no + sic: + units: "1" + long_name: "Sea Ice Concentration" + standard_name: "sea_ice_concentration" accum: no - # Coordinates diff --git a/launch_SUNSET.sh b/launch_SUNSET.sh index eb8fcf4638e3c96d422db3aad0bfd0af2740885c..153d64b3cee49c24066ad298464615f984a2ce35 100644 --- a/launch_SUNSET.sh +++ b/launch_SUNSET.sh @@ -118,7 +118,7 @@ if [ $run_method == "sbatch" ]; then scorecards=$( head -4 $tmpfile | tail -1) # Create directory for slurm output - logdir=${codedir}/out-logs/slurm_logs/ + logdir=${outdir}/logs/slurm/ mkdir -p $logdir echo "Slurm job logs will be stored in $logdir" diff --git a/modules/Anomalies/Anomalies.R b/modules/Anomalies/Anomalies.R index edb6a8beb96279ba166a2703605829c0757f0da3..2d54365a6e611291fc355a918de30c0e3e24a1b9 100644 --- 
a/modules/Anomalies/Anomalies.R +++ b/modules/Anomalies/Anomalies.R @@ -20,6 +20,11 @@ Anomalies <- function(recipe, data) { cross <- FALSE cross_msg <- "without" } + if (is.null(recipe$Analysis$remove_NAs)) { + na.rm <- FALSE + } else { + na.rm <- recipe$Analysis$remove_NAs + } original_dims <- data$hcst$dim # Save full fields @@ -57,12 +62,12 @@ Anomalies <- function(recipe, data) { clim_hcst <- Apply(data$hcst$data, target_dims = c('syear', 'ensemble'), mean, - na.rm = recipe$Analysis$remove_NAs, + na.rm = na.rm, ncores = recipe$Analysis$ncores)$output1 clim_obs <- Apply(data$obs$data, target_dims = c('syear', 'ensemble'), mean, - na.rm = recipe$Analysis$remove_NAs, + na.rm = na.rm, ncores = recipe$Anaysis$ncores)$output1 data$hcst$data <- Ano(data = data$hcst$data, clim = clim_hcst) data$obs$data <- Ano(data = data$obs$data, clim = clim_obs) diff --git a/modules/Indices/R/compute_nino.R b/modules/Indices/R/compute_nino.R index 8fd2c9c8a90b2e947000d940684a659a68cb49af..915dc9cedd826e9733f0b8495dbb7f72ee8edcbb 100644 --- a/modules/Indices/R/compute_nino.R +++ b/modules/Indices/R/compute_nino.R @@ -208,6 +208,7 @@ compute_nino <- function(data, recipe, region, standardised = TRUE, } } if (plot_sp) { + ## TODO: Remove sourcing of plot robinson and viz module code source("modules/Visualization/R/tmp/PlotRobinson.R") source("modules/Indices/R/correlation_eno.R") source("modules/Visualization/R/get_proj_code.R") @@ -227,28 +228,36 @@ compute_nino <- function(data, recipe, region, standardised = TRUE, correl_hcst <- Apply(list(data$hcst$data, nino$hcst$data), target_dims = c('syear', 'ensemble'), fun = function(x, y) { - x <- apply(x, 1, mean, na.rm = TRUE) - y <- apply(y, 1, mean, na.rm = TRUE) - dim(y) <- c(syear = length(y)) - dim(x) <- c(syear = length(x)) - res <- .correlation_eno(x, y, - time_dim = 'syear', method = 'pearson', alpha = alpha, - test.type = 'two-sided', pval = FALSE)}, - ncores = recipe$Analysis$ncores) + x <- apply(x, 1, mean, na.rm = TRUE) + y <- apply(y, 1, mean, na.rm = TRUE) + dim(y) <- c(syear = length(y)) + dim(x) <- c(syear = length(x)) + res <- .correlation_eno(x, y, time_dim = 'syear', + method = 'pearson', + alpha = alpha, + test.type = 'two-sided', + pval = FALSE)}, + ncores = recipe$Analysis$ncores) correl_hcst_full <- Apply(list(data$hcst$data, nino$hcst$data), target_dims = c('syear', 'ensemble'), fun = function(x,y) { - dim(y) <- c(syear = length(y)) - dim(x) <- c(syear = length(x)) - res <- .correlation_eno(x, y, - time_dim = 'syear', method = 'pearson', alpha = alpha, - test.type = 'two-sided', pval = FALSE)}, + dim(y) <- c(syear = length(y)) + dim(x) <- c(syear = length(x)) + res <- .correlation_eno(x, y, + time_dim = 'syear', + method = 'pearson', + alpha = alpha, + test.type = 'two-sided', + pval = FALSE)}, ncores = recipe$Analysis$ncores) + months <- lubridate::month(Subset(data$hcst$attrs$Dates, "syear", indices = 1), + label = T, abb = F, locale = "en_GB") + for (tstep in 1:dim(nino$obs$data)['time']) { map <- Subset(correl_obs$r, along = 'time', ind = tstep, drop = T) sig <- Subset(correl_obs$sig, along = 'time', ind = tstep, drop = T) if (tolower(recipe$Analysis$Horizon) == "seasonal") { - mes <- as.numeric(substr(recipe$Analysis$Time$sdate, 1,2)) + + mes <- as.numeric(substr(recipe$Analysis$Time$sdate, 1, 2)) + (tstep - 1) + (recipe$Analysis$Time$ftime_min - 1) mes <- ifelse(mes > 12, mes - 12, mes) fmonth <- sprintf("%02d", tstep - 1 + recipe$Analysis$Time$ftime_min) @@ -313,7 +322,7 @@ compute_nino <- function(data, recipe, region, standardised 
= TRUE, toptitle <- paste(recipe$Analysis$Datasets$System$name, "\n", "Ni\u00F1o", region_name, "SST Index -",var_name, "\n", "Correlation /", - month.abb[as.numeric(fmonth)], + month.abb[mes], "/", recipe$Analysis$Time$hcst_start, "-", recipe$Analysis$Time$hcst_end) plotfile <- paste0(recipe$Run$output_dir, "/plots/Indices/", @@ -367,7 +376,7 @@ compute_nino <- function(data, recipe, region, standardised = TRUE, toptitle <- paste(recipe$Analysis$Datasets$System$name, "\n", "Ni\u00F1o", region_name, "SST Index -",var_name, "\n", " Correlation /", - month.abb[as.numeric(fmonth)], + month.abb[mes], "/", recipe$Analysis$Time$hcst_start, "-", recipe$Analysis$Time$hcst_end) plotfile <- paste0(recipe$Run$output_dir, "/plots/Indices/", diff --git a/modules/Indices/R/plot_deterministic_forecast.R b/modules/Indices/R/plot_deterministic_forecast.R index da107e4732ec027f60729b7520a8d4264787a1a0..30f13b78db1654bcb35fa7a86b6f4ad93d7c6393 100644 --- a/modules/Indices/R/plot_deterministic_forecast.R +++ b/modules/Indices/R/plot_deterministic_forecast.R @@ -16,8 +16,13 @@ plot_deterministic_forecast <- function(obs, fcst, title = NULL, n_fcst <- as.numeric(dim(fcst)[time_dim]) if (is.null(ylims)) { - ylims <- c(-max(abs(fcst) + 0.1, abs(obs) + 0.1, na.rm = TRUE), - max(abs(fcst) + 0.1, abs(obs) + 0.1, na.rm = TRUE)) + if (all(is.na(fcst)) && all(is.na(obs))) { + # No data, return random ylims + ylims <- c(-1, 1) + } else { + ylims <- c(-max(abs(fcst) + 0.1, abs(obs) + 0.1, na.rm = TRUE), + max(abs(fcst) + 0.1, abs(obs) + 0.1, na.rm = TRUE)) + } } if (is.null(xlabs)) { xlabs <- 1:n_fcst diff --git a/modules/Loading/Dev_Loading.R b/modules/Loading/Dev_Loading.R deleted file mode 100644 index fb456eb31aee1f0ba2b0eded1839ca2dd8d4c723..0000000000000000000000000000000000000000 --- a/modules/Loading/Dev_Loading.R +++ /dev/null @@ -1,501 +0,0 @@ -## TODO: remove paths to personal scratchs -source("/esarchive/scratch/vagudets/repos/csoperational/R/get_regrid_params.R") -# Load required libraries/funs -source("modules/Loading/R/dates2load.R") -source("modules/Loading/R/get_timeidx.R") -source("modules/Loading/R/check_latlon.R") -## TODO: Move to prepare_outputs.R -source("tools/libs.R") -## TODO: remove these two lines when new as.s2dv_cube() is in CSTools -source('https://earth.bsc.es/gitlab/external/cstools/-/raw/develop-new_s2dv_cube/R/as.s2dv_cube.R') -source('https://earth.bsc.es/gitlab/external/cstools/-/raw/develop-new_s2dv_cube/R/zzz.R') - -## TODO: Source new s2dv_cube version -## TODO: Eliminate dim_var dimension (merge_across_dims?) - -load_datasets <- function(recipe) { - - # ------------------------------------------- - # Set params ----------------------------------------- - - hcst.inityear <- recipe$Analysis$Time$hcst_start - hcst.endyear <- recipe$Analysis$Time$hcst_end - lats.min <- recipe$Analysis$Region$latmin - lats.max <- recipe$Analysis$Region$latmax - lons.min <- recipe$Analysis$Region$lonmin - lons.max <- recipe$Analysis$Region$lonmax - ref.name <- recipe$Analysis$Datasets$Reference$name - exp.name <- recipe$Analysis$Datasets$System$name - - variable <- strsplit(recipe$Analysis$Variables$name, ", | |,")[[1]][1] - vars <- strsplit(recipe$Analysis$Variables$name, ", | |,")[[1]] - store.freq <- recipe$Analysis$Variables$freq - - # get sdates array - ## LOGGER: Change dates2load to extract logger from recipe? 
- sdates <- dates2load(recipe, recipe$Run$logger) - - idxs <- NULL - idxs$hcst <- get_timeidx(sdates$hcst, - recipe$Analysis$Time$ftime_min, - recipe$Analysis$Time$ftime_max, - time_freq=store.freq) - - if (!(is.null(sdates$fcst))) { - idxs$fcst <- get_timeidx(sdates$fcst, - recipe$Analysis$Time$ftime_min, - recipe$Analysis$Time$ftime_max, - time_freq=store.freq) - } - - ## TODO: Examine this verifications part, verify if it's necessary - # stream <- verifications$stream - # sdates <- verifications$fcst.sdate - - ## TODO: define fcst.name - ##fcst.name <- recipe$Analysis$Datasets$System[[sys]]$name - - # get esarchive datasets dict: - ## TODO: Adapt to 'filesystem' option in recipe - archive <- read_yaml("conf/archive.yml")$esarchive - exp_descrip <- archive$System[[exp.name]] - - freq.hcst <- unlist(exp_descrip[[store.freq]][variable]) - reference_descrip <- archive$Reference[[ref.name]] - freq.obs <- unlist(reference_descrip[[store.freq]][variable]) - obs.dir <- reference_descrip$src - fcst.dir <- exp_descrip$src - hcst.dir <- exp_descrip$src - fcst.nmember <- exp_descrip$nmember$fcst - hcst.nmember <- exp_descrip$nmember$hcst - - ## TODO: it is necessary? - ##if ("accum" %in% names(reference_descrip)) { - ## accum <- unlist(reference_descrip$accum[store.freq][[1]]) - ##} else { - ## accum <- FALSE - ##} - - var_dir_obs <- reference_descrip[[store.freq]][vars] - var_dir_exp <- exp_descrip[[store.freq]][vars] - - # ----------- - obs.path <- paste0(archive$src, - obs.dir, store.freq, "/$var$", "$var_dir$", - "/$var$_$file_date$.nc") - - hcst.path <- paste0(archive$src, - hcst.dir, store.freq, "/$var$", "$var_dir$", - "$var$_$file_date$.nc") - - fcst.path <- paste0(archive$src, - hcst.dir, store.freq, "/$var$", "$var_dir$", - "/$var$_$file_date$.nc") - - # Define regrid parameters: - #------------------------------------------------------------------- - regrid_params <- get_regrid_params(recipe, archive) - - # Longitude circular sort and latitude check - #------------------------------------------------------------------- - circularsort <- check_latlon(lats.min, lats.max, lons.min, lons.max) - - if (recipe$Analysis$Variables$freq == "monthly_mean"){ - split_multiselected_dims = TRUE - } else { - split_multiselected_dims = FALSE - } - - # Load hindcast - #------------------------------------------------------------------- - hcst <- Start(dat = hcst.path, - var = vars, - var_dir = var_dir_exp, - file_date = sdates$hcst, - time = idxs$hcst, - var_dir_depends = 'var', - latitude = values(list(lats.min, lats.max)), - latitude_reorder = Sort(), - longitude = values(list(lons.min, lons.max)), - longitude_reorder = circularsort, - transform = regrid_params$fcst.transform, - transform_params = list(grid = regrid_params$fcst.gridtype, - method = regrid_params$fcst.gridmethod), - transform_vars = c('latitude', 'longitude'), - synonims = list(latitude = c('lat', 'latitude'), - longitude = c('lon', 'longitude'), - ensemble = c('member', 'ensemble')), - ensemble = indices(1:hcst.nmember), - metadata_dims = 'var', # change to just 'var'? 
- return_vars = list(latitude = 'dat', - longitude = 'dat', - time = 'file_date'), - split_multiselected_dims = split_multiselected_dims, - retrieve = TRUE) - - # Remove var_dir dimension - if ("var_dir" %in% names(dim(hcst))) { - hcst <- Subset(hcst, along = "var_dir", indices = 1, drop = "selected") - } - - if (recipe$Analysis$Variables$freq == "daily_mean") { - # Adjusts dims for daily case, could be removed if startR allows - # multidim split - names(dim(hcst))[which(names(dim(hcst)) == 'file_date')] <- "syear" - default_dims <- c(dat = 1, var = 1, sday = 1, - sweek = 1, syear = 1, time = 1, - latitude = 1, longitude = 1, ensemble = 1) - default_dims[names(dim(hcst))] <- dim(hcst) - dim(hcst) <- default_dims - # Change time attribute dimensions - default_time_dims <- c(sday = 1, sweek = 1, syear = 1, time = 1) - names(dim(attr(hcst, "Variables")$common$time))[which(names( - dim(attr(hcst, "Variables")$common$time)) == 'file_date')] <- "syear" - default_time_dims[names(dim(attr(hcst, "Variables")$common$time))] <- - dim(attr(hcst, "Variables")$common$time) - dim(attr(hcst, "Variables")$common$time) <- default_time_dims - } - - # Convert hcst to s2dv_cube object - ## TODO: Give correct dimensions to $Dates - ## (sday, sweek, syear instead of file_date) - hcst <- as.s2dv_cube(hcst) - # Adjust dates for models where the time stamp goes into the next month - if (recipe$Analysis$Variables$freq == "monthly_mean") { - hcst$attrs$Dates[] <- hcst$attrs$Dates - seconds(exp_descrip$time_stamp_lag) - } - - ## Combine tas and tos data into one variable: tas-tos - if(recipe$Analysis$Variables$name == 'tas tos'){ - #if(recipe$Analysis$Datasets$Reference$name == 'HadCRUT5' || recipe$Analysis$Datasets$Reference$name == 'BEST') { - source('/esarchive/scratch/nmilders/gitlab/git_clones/s2s-suite/modules/Loading/R/mask_tas_tos.R') - hcst <- mask_tas_tos(input_data = hcst, region = c(lons.min, lons.max,lats.min, lats.max), - grid = 'r360x181', - lon = hcst$coords$longitude, - lat = hcst$coords$latitude, - lon_dim = 'longitude', lat_dim = 'latitude', ncores = NULL) - - hcst$dims[['var']] <- dim(hcst$data)[['var']] - #} - } - - # Load forecast - #------------------------------------------------------------------- - if (!is.null(recipe$Analysis$Time$fcst_year)) { - # the call uses file_date instead of fcst_syear so that it can work - # with the daily case and the current version of startR not allowing - # multiple dims split - - fcst <- Start(dat = fcst.path, - var = vars, - var_dir = var_dir_exp, - var_dir_depends = 'var', - file_date = sdates$fcst, - time = idxs$fcst, - latitude = values(list(lats.min, lats.max)), - latitude_reorder = Sort(), - longitude = values(list(lons.min, lons.max)), - longitude_reorder = circularsort, - transform = regrid_params$fcst.transform, - transform_params = list(grid = regrid_params$fcst.gridtype, - method = regrid_params$fcst.gridmethod), - transform_vars = c('latitude', 'longitude'), - synonims = list(latitude = c('lat', 'latitude'), - longitude = c('lon', 'longitude'), - ensemble = c('member', 'ensemble')), - ensemble = indices(1:fcst.nmember), - metadata_dims = 'var', - return_vars = list(latitude = 'dat', - longitude = 'dat', - time = 'file_date'), - split_multiselected_dims = split_multiselected_dims, - retrieve = TRUE) - - if ("var_dir" %in% names(dim(fcst))) { - fcst <- Subset(fcst, along = "var_dir", indices = 1, drop = "selected") - } - - if (recipe$Analysis$Variables$freq == "daily_mean") { - # Adjusts dims for daily case, could be removed if startR allows - # 
multidim split - names(dim(fcst))[which(names(dim(fcst)) == 'file_date')] <- "syear" - default_dims <- c(dat = 1, var = 1, sday = 1, - sweek = 1, syear = 1, time = 1, - latitude = 1, longitude = 1, ensemble = 1) - default_dims[names(dim(fcst))] <- dim(fcst) - dim(fcst) <- default_dims - # Change time attribute dimensions - default_time_dims <- c(sday = 1, sweek = 1, syear = 1, time = 1) - names(dim(attr(fcst, "Variables")$common$time))[which(names( - dim(attr(fcst, "Variables")$common$time)) == 'file_date')] <- "syear" - default_time_dims[names(dim(attr(fcst, "Variables")$common$time))] <- - dim(attr(fcst, "Variables")$common$time) - dim(attr(fcst, "Variables")$common$time) <- default_time_dims - } - - # Convert fcst to s2dv_cube - fcst <- as.s2dv_cube(fcst) - # Adjust dates for models where the time stamp goes into the next month - if (recipe$Analysis$Variables$freq == "monthly_mean") { - fcst$attrs$Dates[] <- - fcst$attrs$Dates - seconds(exp_descrip$time_stamp_lag) - } - - } else { - fcst <- NULL - } - - # Load reference - #------------------------------------------------------------------- - - # Obtain dates and date dimensions from the loaded hcst data to make sure - # the corresponding observations are loaded correctly. - dates <- hcst$attrs$Dates - dim(dates) <- hcst$dims[c("sday", "sweek", "syear", "time")] - - # Separate Start() call for monthly vs daily data - if (store.freq == "monthly_mean") { - - dates_file <- format(as.Date(dates, '%Y%m%d'), "%Y%m") - dim(dates_file) <- dim(dates) - - ## tas tos mask - if (recipe$Analysis$Variables$name == 'tas tos'){ - if (recipe$Analysis$Datasets$Reference$name == 'HadCRUT5'){ - vars <- 'tasanomaly' - var_dir_obs <- reference_descrip[[store.freq]][vars] - } - } - - if (recipe$Analysis$Variables$name == 'tas tos'){ - if (recipe$Analysis$Datasets$Reference$name == 'BEST'){ - vars <- 'tas' - var_dir_obs <- reference_descrip[[store.freq]][vars] - } - } - - obs <- Start(dat = obs.path, - var = vars, - var_dir = var_dir_obs, - var_dir_depends = 'var', - file_date = dates_file, - latitude = values(list(lats.min, lats.max)), - latitude_reorder = Sort(), - longitude = values(list(lons.min, lons.max)), - longitude_reorder = circularsort, - transform = regrid_params$obs.transform, - transform_params = list(grid = regrid_params$obs.gridtype, - method = regrid_params$obs.gridmethod), - transform_vars = c('latitude', 'longitude'), - synonims = list(latitude = c('lat','latitude'), - longitude = c('lon','longitude')), - metadata_dims = 'var', - return_vars = list(latitude = 'dat', - longitude = 'dat', - time = 'file_date'), - split_multiselected_dims = TRUE, - retrieve = TRUE) - - } else if (store.freq == "daily_mean") { - - # Get year and month for file_date - dates_file <- sapply(dates, format, '%Y%m') - dim(dates_file) <- dim(dates) - # Set hour to 12:00 to ensure correct date retrieval for daily data - lubridate::hour(dates) <- 12 - lubridate::minute(dates) <- 00 - # Restore correct dimensions - dim(dates) <- dim(dates_file) - - obs <- Start(dat = obs.path, - var = vars, - var_dir = var_dir_obs, - var_dir_depends = 'var', - file_date = sort(unique(dates_file)), - time = dates, - time_var = 'time', - time_across = 'file_date', - merge_across_dims = TRUE, - merge_across_dims_narm = TRUE, - latitude = values(list(lats.min, lats.max)), - latitude_reorder = Sort(), - longitude = values(list(lons.min, lons.max)), - longitude_reorder = circularsort, - transform = regrid_params$obs.transform, - transform_params = list(grid = regrid_params$obs.gridtype, - method 
= regrid_params$obs.gridmethod), - transform_vars = c('latitude', 'longitude'), - synonims = list(latitude = c('lat','latitude'), - longitude = c('lon','longitude')), - metadata_dims = 'var', - return_vars = list(latitude = 'dat', - longitude = 'dat', - time = 'file_date'), - split_multiselected_dims = TRUE, - retrieve = TRUE) - } - - # Remove var_dir dimension - if ("var_dir" %in% names(dim(obs))) { - obs <- Subset(obs, along = "var_dir", indices = 1, drop = "selected") - } - # Adds ensemble dim to obs (for consistency with hcst/fcst) - default_dims <- c(dat = 1, var = 1, sday = 1, - sweek = 1, syear = 1, time = 1, - latitude = 1, longitude = 1, ensemble = 1) - default_dims[names(dim(obs))] <- dim(obs) - dim(obs) <- default_dims - - # Convert obs to s2dv_cube - obs <- as.s2dv_cube(obs) - - ## Combine tas and tos data into one variable: tas-tos - if(recipe$Analysis$Variables$name == 'tas tos'){ - if(recipe$Analysis$Datasets$Reference$name != 'HadCRUT5' & recipe$Analysis$Datasets$Reference$name != 'BEST'){ - source('/esarchive/scratch/nmilders/gitlab/git_clones/s2s-suite/modules/Loading/R/mask_tas_tos.R') - obs <- mask_tas_tos(input_data = obs, region = c(lons.min, lons.max,lats.min, lats.max), - grid = 'r360x181', - lon = obs$coords$longitude, - lat = obs$coords$latitude, - lon_dim = 'longitude', lat_dim = 'latitude', ncores = NULL) - - obs$dims[['var']] <- dim(obs$data)[['var']] - } - } - - # Check for consistency between hcst and obs grid - if (!(recipe$Analysis$Regrid$type == 'none')) { - if (!isTRUE(all.equal(as.vector(hcst$lat), as.vector(obs$lat)))) { - lat_error_msg <- paste("Latitude mismatch between hcst and obs.", - "Please check the original grids and the", - "regrid parameters in your recipe.") - error(recipe$Run$logger, lat_error_msg) - hcst_lat_msg <- paste0("First hcst lat: ", hcst$lat[1], - "; Last hcst lat: ", hcst$lat[length(hcst$lat)]) - info(recipe$Run$logger, hcst_lat_msg) - obs_lat_msg <- paste0("First obs lat: ", obs$lat[1], - "; Last obs lat: ", obs$lat[length(obs$lat)]) - info(recipe$Run$logger, obs_lat_msg) - stop("hcst and obs don't share the same latitudes.") - } - if (!isTRUE(all.equal(as.vector(hcst$lon), as.vector(obs$lon)))) { - lon_error_msg <- paste("Longitude mismatch between hcst and obs.", - "Please check the original grids and the", - "regrid parameters in your recipe.") - error(recipe$Run$logger, lon_error_msg) - hcst_lon_msg <- paste0("First hcst lon: ", hcst$lon[1], - "; Last hcst lon: ", hcst$lon[length(hcst$lon)]) - info(recipe$Run$logger, hcst_lon_msg) - obs_lon_msg <- paste0("First obs lon: ", obs$lon[1], - "; Last obs lon: ", obs$lon[length(obs$lon)]) - info(recipe$Run$logger, obs_lon_msg) - stop("hcst and obs don't share the same longitudes.") - - } - } - - # Remove negative values in accumulative variables - dictionary <- read_yaml("conf/variable-dictionary.yml") - for (var_idx in 1:length(vars)) { - var_name <- vars[var_idx] - if (dictionary$vars[[var_name]]$accum) { - info(recipe$Run$logger, - paste0("Accumulated variable ", var_name, - ": setting negative values to zero.")) - # obs$data[, var_idx, , , , , , , ] <- pmax(Subset(obs$data, - # along = "var", - # indices = var_idx, F), 0) - obs$data[, var_idx, , , , , , , ][obs$data[, var_idx, , , , , , , ] < 0] <- 0 - hcst$data[, var_idx, , , , , , , ][hcst$data[, var_idx, , , , , , , ] < 0] <- 0 - if (!is.null(fcst)) { - fcst$data[, var_idx, , , , , , , ][fcst$data[, var_idx, , , , , , , ] < 0] <- 0 - } - } - - # Convert prlr from m/s to mm/day - ## TODO: Make a unit conversion function - if 
(vars[[var_idx]] == "prlr") { - # Verify that the units are m/s and the same in obs and hcst - if (((obs$attrs$Variable$metadata[[var_name]]$units == "m s-1") || - (obs$attrs$Variable$metadata[[var_name]]$units == "m s**-1")) && - ((hcst$attrs$Variable$metadata[[var_name]]$units == "m s-1") || - (hcst$attrs$Variable$metadata[[var_name]]$units == "m s**-1"))) { - info(recipe$Run$logger, "Converting precipitation from m/s to mm/day.") - obs$data[, var_idx, , , , , , , ] <- - obs$data[, var_idx, , , , , , , ]*86400*1000 - obs$attrs$Variable$metadata[[var_name]]$units <- "mm/day" - hcst$data[, var_idx, , , , , , , ] <- - hcst$data[, var_idx, , , , , , , ]*86400*1000 - hcst$attrs$Variable$metadata[[var_name]]$units <- "mm/day" - if (!is.null(fcst)) { - fcst$data[, var_idx, , , , , , , ] <- - fcst$data[, var_idx, , , , , , , ]*86400*1000 - fcst$attrs$Variable$metadata[[var_name]]$units <- "mm/day" - } - } - } - } - # Compute anomalies if requested - # Print a summary of the loaded data for the user, for each object - if (recipe$Run$logger$threshold <= 2) { - data_summary(hcst, recipe) - data_summary(obs, recipe) - if (!is.null(fcst)) { - data_summary(fcst, recipe) - } - } - - info(recipe$Run$logger, - "##### DATA LOADING COMPLETED SUCCESSFULLY #####") - - ############################################################################ - # - # CHECKS ON MISSING FILES - # - ############################################################################ - - #obs.NA_dates.ind <- Apply(obs, - # fun=(function(x){ all(is.na(x))}), - # target_dims=c('time', 'latitude', 'longitude'))[[1]] - #obs.NA_dates <- dates_file[obs.NA_dates.ind] - #obs.NA_dates <- obs.NA_dates[order(obs.NA_dates)] - #obs.NA_files <- paste0(obs.dir, store.freq,"/",variable,"_", - # freq.obs,"obs.grid","/",variable,"_",obs.NA_dates,".nc") - # - #if (any(is.na(hcst))){ - # fatal(recipe$Run$logger, - # paste(" ERROR: MISSING HCST VALUES FOUND DURING LOADING # ", - # " ################################################# ", - # " ###### MISSING FILES #### ", - # " ################################################# ", - # "hcst files:", - # hcst.NA_files, - # " ################################################# ", - # " ################################################# ", - # sep="\n")) - # quit(status = 1) - #} - # - #if (any(is.na(obs)) && !identical(obs.NA_dates,character(0))){ - # fatal(recipe$logger, - # paste(" ERROR: MISSING OBS VALUES FOUND DURING LOADING # ", - # " ################################################# ", - # " ###### MISSING FILES #### ", - # " ################################################# ", - # "obs files:", - # obs.NA_files, - # " ################################################# ", - # " ################################################# ", - # sep="\n")) - # quit(status=1) - #} - # - #info(recipe$logger, - # "######### DATA LOADING COMPLETED SUCCESFULLY ##############") - - ############################################################################ - ############################################################################ - - return(list(hcst = hcst, fcst = fcst, obs = obs)) - -} diff --git a/modules/Loading/Loading.R b/modules/Loading/Loading.R index 315ef78dae696035ac94e09db5baaf391cb62a79..63fee97bede51b72128b917af9b5171d83061d4f 100644 --- a/modules/Loading/Loading.R +++ b/modules/Loading/Loading.R @@ -8,12 +8,20 @@ Loading <- function(recipe) { if (tolower(recipe$Run$filesystem) == "mars") { source("modules/Loading/R/load_GRIB.R") data <- load_GRIB(recipe) + } else if (tolower(recipe$Run$filesystem) == 
"sample") { + source("modules/Loading/R/load_sample.R") + data <- load_sample(recipe) } else { # Case: esarchive time_horizon <- tolower(recipe$Analysis$Horizon) if (time_horizon == "seasonal") { - source("modules/Loading/R/load_seasonal.R") - data <- load_seasonal(recipe) + if(recipe$Analysis$Variables$name == 'tas-tos') { + source("modules/Loading/R/load_tas_tos.R") + data <- load_tas_tos(recipe) + } else { + source("modules/Loading/R/load_seasonal.R") + data <- load_seasonal(recipe) + } } else if (time_horizon == "decadal") { source("modules/Loading/R/load_decadal.R") data <- load_decadal(recipe) diff --git a/modules/Loading/R/compare_exp_obs_grids.R b/modules/Loading/R/compare_exp_obs_grids.R new file mode 100644 index 0000000000000000000000000000000000000000..0270cd63a1193e8f444cbd19a9a05f6f78749bab --- /dev/null +++ b/modules/Loading/R/compare_exp_obs_grids.R @@ -0,0 +1,35 @@ +compare_exp_obs_grids <- function(exp, obs) { + # Check for consistency between exp and obs grid + if (!isTRUE(all.equal(as.vector(exp$coords$latitude), + as.vector(obs$coords$latitude)))) { + lat_error_msg <- paste("Latitude mismatch between exp and obs.", + "Please check the original grids and the", + "regrid parameters in your recipe.") + error(recipe$Run$logger, lat_error_msg) + exp_lat_msg <- paste0("First exp lat: ", exp$coords$latitude[1], + "; Last exp lat: ", + exp$coords$latitude[length(exp$coords$latitude)]) + info(recipe$Run$logger, exp_lat_msg) + obs_lat_msg <- paste0("First obs lat: ", obs$coords$latitude[1], + "; Last obs lat: ", + obs$coords$latitude[length(obs$coords$latitude)]) + info(recipe$Run$logger, obs_lat_msg) + stop("exp and obs don't share the same latitudes.") + } + if (!isTRUE(all.equal(as.vector(exp$coords$longitude), + as.vector(obs$coords$longitude)))) { + lon_error_msg <- paste("Longitude mismatch between exp and obs.", + "Please check the original grids and the", + "regrid parameters in your recipe.") + error(recipe$Run$logger, lon_error_msg) + exp_lon_msg <- paste0("First exp lon: ", exp$coords$longitude[1], + "; Last exp lon: ", + exp$coords$longitude[length(exp$coords$longitude)]) + info(recipe$Run$logger, exp_lon_msg) + obs_lon_msg <- paste0("First obs lon: ", obs$coords$longitude[1], + "; Last obs lon: ", + obs$coords$longitude[length(obs$coords$longitude)]) + info(recipe$Run$logger, obs_lon_msg) + stop("exp and obs don't share the same longitudes.") + } +} diff --git a/modules/Loading/R/get_regrid_params.R b/modules/Loading/R/get_regrid_params.R new file mode 100644 index 0000000000000000000000000000000000000000..ef08adcd11f609e08ccc484e0408bb87317eb9b3 --- /dev/null +++ b/modules/Loading/R/get_regrid_params.R @@ -0,0 +1,73 @@ +#'Read regrid parameters from recipe and returns a list for use with Start() +#' +#'The purpose of this function is to read the recipe and archive configuration +#'data for Auto-S2S workflows, retrieve the regridding parameters for hcst and +#'obs, and return an object that can be the input for 'transform' and +#''transform_params' when the data is loaded using Start(). +#'Requires CDORemapper. 
+#' +#'@param recipe Auto-S2S configuration recipe as returned by read_yaml() +#'@param archive Auto-S2S exp and obs archive as returned by read_yaml() +#' +#'@return A list containing regridding parameters for fcst and obs +#' +#'@import startR +#'@examples +#'setwd("/esarchive/scratch/vagudets/repos/auto-s2s/") +#'library(yaml) +#'library(startR) +#'recipe <- read_yaml("modules/data_load/recipe_1.yml") +#'archive <- read_yaml(paste0(recipe$Run$code_dir, "conf/archive.yml"))$archive +#'regrid_params <- get_regrid_params(recipe, archive) +#' +#'@export +get_regrid_params <- function(recipe, archive) { + + ## TODO: Multi-model case + ## If multi-model, use the first system grid? + ## TODO: 'NULL' entries had to be removed due to bug in Start(). Rewrite when + ## the bug is fixed. + exp.name <- recipe$Analysis$Datasets$System$name + ref.name <- recipe$Analysis$Datasets$Reference$name + exp_descrip <- archive$System[[exp.name]] + reference_descrip <- archive$Reference[[ref.name]] + + if (tolower(recipe$Analysis$Regrid$type) == 'to_reference') { + + regrid_params <- list(fcst.gridtype = reference_descrip$reference_grid, + fcst.gridmethod = recipe$Analysis$Regrid$method, + fcst.transform = CDORemapper, + obs.gridtype = NULL, + obs.gridmethod = NULL, + obs.transform = NULL) + + } else if (tolower(recipe$Analysis$Regrid$type) == 'to_system') { + + regrid_params <- list(fcst.gridtype = NULL, + fcst.gridmethod = NULL, + fcst.transform = NULL, + obs.gridtype = exp_descrip$reference_grid, + obs.gridmethod = recipe$Analysis$Regrid$method, + obs.transform = CDORemapper) + + } else if (tolower(recipe$Analysis$Regrid$type) == 'none') { + + regrid_params <- list(fcst.gridtype = NULL, + fcst.gridmethod = NULL, + fcst.transform = NULL, + obs.gridtype = NULL, + obs.gridmethod = NULL, + obs.transform = NULL) + + } else { + regrid_params <- list(fcst.gridtype = recipe$Analysis$Regrid$type, + fcst.gridmethod = recipe$Analysis$Regrid$method, + fcst.transform = CDORemapper, + obs.gridtype = recipe$Analysis$Regrid$type, + obs.gridmethod = recipe$Analysis$Regrid$method, + obs.transform = CDORemapper) + } + + return(regrid_params) +} + diff --git a/modules/Loading/R/load_decadal.R b/modules/Loading/R/load_decadal.R index a0e85e15a970d4cdd9610033fbb25b544021058b..9268b090fc70db506279c87cbc3a697dd763fe67 100644 --- a/modules/Loading/R/load_decadal.R +++ b/modules/Loading/R/load_decadal.R @@ -1,16 +1,10 @@ -# Loading module: -# 1. archive.yml -# 2. recipe.yml -# 3. Load_decadal.R (V) -#setwd('/esarchive/scratch/aho/git/auto-s2s/') - -## TODO: remove paths to personal scratchs -source("/esarchive/scratch/vagudets/repos/csoperational/R/get_regrid_params.R") # Load required libraries/funs +source("modules/Loading/R/get_regrid_params.R") source("modules/Loading/R/helper_loading_decadal.R") source("modules/Loading/R/dates2load.R") source("modules/Loading/R/check_latlon.R") source("modules/Loading/R/get_timeidx.R") +source("modules/Loading/R/compare_exp_obs_grids.R") #==================================================================== @@ -19,7 +13,7 @@ source("modules/Loading/R/get_timeidx.R") load_decadal <- function(recipe) { ## - archive <- read_yaml(paste0("conf/archive_decadal.yml"))$esarchive + archive <- read_yaml(paste0("conf/archive_decadal.yml"))[[recipe$Run$filesystem]] # Print Start() info or not DEBUG <- FALSE @@ -83,7 +77,7 @@ load_decadal <- function(recipe) { regrid_params <- get_regrid_params(recipe, archive) # Only if the time length in each chunk may differ that we need largest_dims_length to be TRUE. 
Otherwise, set FALSE to increase efficiency. - need_largest_dims_length <- ifelse(exp.name == 'EC-Earth3-i2', TRUE, FALSE) + need_largest_dims_length <- ifelse(exp.name %in% c('HadGEM3-GC31-MM', 'EC-Earth3-i2'), TRUE, FALSE) #------------------------------------------- @@ -454,35 +448,8 @@ load_decadal <- function(recipe) { # lat and lon attributes if (!(recipe$Analysis$Regrid$type == 'none')) { - if (!identical(as.vector(hcst$lat), as.vector(fcst$lat))) { - lat_error_msg <- paste("Latitude mismatch between hcst and fcst.", - "Please check the original grids and the", - "regrid parameters in your recipe.") - error(recipe$Run$logger, lat_error_msg) - hcst_lat_msg <- paste0("First hcst lat: ", hcst$lat[1], - "; Last hcst lat: ", hcst$lat[length(hcst$lat)]) - info(recipe$Run$logger, hcst_lat_msg) - fcst_lat_msg <- paste0("First fcst lat: ", fcst$lat[1], - "; Last fcst lat: ", fcst$lat[length(fcst$lat)]) - info(recipe$Run$logger, fcst_lat_msg) - stop("hcst and fcst don't share the same latitudes.") - } - - if (!identical(as.vector(hcst$lon), as.vector(fcst$lon))) { - lon_error_msg <- paste("Longitude mismatch between hcst and fcst.", - "Please check the original grids and the", - "regrid parameters in your recipe.") - error(recipe$Run$logger, lon_error_msg) - hcst_lon_msg <- paste0("First hcst lon: ", hcst$lon[1], - "; Last hcst lon: ", hcst$lon[length(hcst$lon)]) - info(recipe$Run$logger, hcst_lon_msg) - fcst_lon_msg <- paste0("First fcst lon: ", fcst$lon[1], - "; Last fcst lon: ", fcst$lon[length(fcst$lon)]) - info(recipe$Run$logger, fcst_lon_msg) - stop("hcst and fcst don't share the same longitudes.") - } + compare_exp_obs_grids(hcst, obs) } - } #------------------------------------------- diff --git a/modules/Loading/R/load_sample.R b/modules/Loading/R/load_sample.R new file mode 100644 index 0000000000000000000000000000000000000000..e0d906d3dbc6beb4f59a3fe0eece2f6bd5fea8a4 --- /dev/null +++ b/modules/Loading/R/load_sample.R @@ -0,0 +1,49 @@ +## TODO: remove paths to personal scratchs +source("/esarchive/scratch/vagudets/repos/csoperational/R/get_regrid_params.R") +# Load required libraries/funs +source("tools/CST_ChangeDimName.R") +source("modules/Loading/R/compare_exp_obs_grids.R") + +load_sample <- function(recipe) { + # Hindcast: + # Change hcst dimension names + hcst <- CST_ChangeDimName(lonlat_temp_st$exp, + original_dimnames = c("dataset", "sdate", "ftime", + "lat", "lon", "member"), + final_dimnames = c("dat", "syear", "time", + "latitude", "longitude", + "ensemble")) + # Add sday and sweek dimensions + hcst <- CST_InsertDim(hcst, posdim = 3, lendim = 1, name = "sday", values = 1) + hcst <- CST_InsertDim(hcst, posdim = 4, lendim = 1, name = "sweek", values = 1) + dim(hcst$attrs$Dates) <- c(sday = 1, sweek = 1, + syear = dim(hcst$attrs$Dates)[['syear']], + time = dim(hcst$attrs$Dates)[['time']]) + # Observations: + # Change obs dimension names + obs <- CST_ChangeDimName(lonlat_temp_st$obs, + original_dimnames = c("dataset", "sdate", "ftime", + "lat", "lon"), + final_dimnames = c("dat", "syear", "time", + "latitude", "longitude")) + # Add sday and sweek dimensions + obs <- CST_InsertDim(obs, posdim = 3, lendim = 1, name = "sday", values = 1) + obs <- CST_InsertDim(obs, posdim = 4, lendim = 1, name = "sweek", values = 1) + dim(obs$attrs$Dates) <- c(sday = 1, sweek = 1, + syear = dim(obs$attrs$Dates)[['syear']], + time = dim(obs$attrs$Dates)[['time']]) + # Add ensemble dimension to obs + obs <- CST_InsertDim(obs, posdim = 7, lendim = 1, name = "ensemble", values = 1) + # Adjust name 
of 'load_parameters$date' attribute + obs$attrs$load_parameters$dat1$file_date <- obs$attrs$load_parameters$dat1$date + + # Sample fcst NULL + fcst <- NULL + # Check for consistency between hcst and obs grid + compare_exp_obs_grids(hcst, obs) + + info(recipe$Run$logger, + "##### DATA LOADING COMPLETED SUCCESSFULLY #####") + .log_memory_usage(recipe$Run$logger, when = "After loading (sample data)") + return(list(hcst = hcst, fcst = fcst, obs = obs)) +} diff --git a/modules/Loading/R/load_seasonal.R b/modules/Loading/R/load_seasonal.R index a77f239ebb624c65f7c861617aeb8c8937c2950b..2caa34a9ebe10315f94280cfbae44901045cd306 100644 --- a/modules/Loading/R/load_seasonal.R +++ b/modules/Loading/R/load_seasonal.R @@ -1,9 +1,9 @@ -## TODO: remove paths to personal scratchs -source("/esarchive/scratch/vagudets/repos/csoperational/R/get_regrid_params.R") # Load required libraries/funs +source("modules/Loading/R/get_regrid_params.R") source("modules/Loading/R/dates2load.R") source("modules/Loading/R/get_timeidx.R") source("modules/Loading/R/check_latlon.R") +source("modules/Loading/R/compare_exp_obs_grids.R") load_seasonal <- function(recipe) { @@ -313,36 +313,10 @@ load_seasonal <- function(recipe) { # Convert obs to s2dv_cube obs <- as.s2dv_cube(obs) - + # Check for consistency between hcst and obs grid if (!(recipe$Analysis$Regrid$type == 'none')) { - if (!isTRUE(all.equal(as.vector(hcst$lat), as.vector(obs$lat)))) { - lat_error_msg <- paste("Latitude mismatch between hcst and obs.", - "Please check the original grids and the", - "regrid parameters in your recipe.") - error(recipe$Run$logger, lat_error_msg) - hcst_lat_msg <- paste0("First hcst lat: ", hcst$lat[1], - "; Last hcst lat: ", hcst$lat[length(hcst$lat)]) - info(recipe$Run$logger, hcst_lat_msg) - obs_lat_msg <- paste0("First obs lat: ", obs$lat[1], - "; Last obs lat: ", obs$lat[length(obs$lat)]) - info(recipe$Run$logger, obs_lat_msg) - stop("hcst and obs don't share the same latitudes.") - } - if (!isTRUE(all.equal(as.vector(hcst$lon), as.vector(obs$lon)))) { - lon_error_msg <- paste("Longitude mismatch between hcst and obs.", - "Please check the original grids and the", - "regrid parameters in your recipe.") - error(recipe$Run$logger, lon_error_msg) - hcst_lon_msg <- paste0("First hcst lon: ", hcst$lon[1], - "; Last hcst lon: ", hcst$lon[length(hcst$lon)]) - info(recipe$Run$logger, hcst_lon_msg) - obs_lon_msg <- paste0("First obs lon: ", obs$lon[1], - "; Last obs lon: ", obs$lon[length(obs$lon)]) - info(recipe$Run$logger, obs_lon_msg) - stop("hcst and obs don't share the same longitudes.") - - } + compare_exp_obs_grids(hcst, obs) } # Remove negative values in accumulative variables diff --git a/modules/Loading/R/load_tas_tos.R b/modules/Loading/R/load_tas_tos.R new file mode 100644 index 0000000000000000000000000000000000000000..ea231b56d8a00f3833dc7bf9eaa271bc9a97a097 --- /dev/null +++ b/modules/Loading/R/load_tas_tos.R @@ -0,0 +1,487 @@ +# Load required libraries/funs +source("modules/Loading/R/get_regrid_params.R") +source("modules/Loading/R/dates2load.R") +source("modules/Loading/R/get_timeidx.R") +source("modules/Loading/R/check_latlon.R") +source('modules/Loading/R/mask_tas_tos.R') +source("modules/Loading/R/compare_exp_obs_grids.R") + +load_tas_tos <- function(recipe) { + + # ------------------------------------------- + # Set params ----------------------------------------- + + hcst.inityear <- recipe$Analysis$Time$hcst_start + hcst.endyear <- recipe$Analysis$Time$hcst_end + lats.min <- recipe$Analysis$Region$latmin + lats.max <- 
recipe$Analysis$Region$latmax + lons.min <- recipe$Analysis$Region$lonmin + lons.max <- recipe$Analysis$Region$lonmax + ref.name <- recipe$Analysis$Datasets$Reference$name + exp.name <- recipe$Analysis$Datasets$System$name + + variable <- c("tas", "tos", "sic") + store.freq <- recipe$Analysis$Variables$freq + + if(is.null(recipe$Analysis$Variables$sic_threshold)){ + sic.threshold = 0.15 + } else { + sic.threshold <- recipe$Analysis$Variables$sic_threshold + } + + data_order <- c('dat', 'var', 'sday', 'sweek', 'syear', 'time', 'latitude', 'longitude', 'ensemble') + + + # get sdates array + ## LOGGER: Change dates2load to extract logger from recipe? + sdates <- dates2load(recipe, recipe$Run$logger) + + idxs <- NULL + idxs$hcst <- get_timeidx(sdates$hcst, + recipe$Analysis$Time$ftime_min, + recipe$Analysis$Time$ftime_max, + time_freq=store.freq) + + if (!(is.null(sdates$fcst))) { + idxs$fcst <- get_timeidx(sdates$fcst, + recipe$Analysis$Time$ftime_min, + recipe$Analysis$Time$ftime_max, + time_freq=store.freq) + } + + # get esarchive datasets dict: + archive <- read_yaml("conf/archive.yml")[[recipe$Run$filesystem]] + exp_descrip <- archive$System[[exp.name]] + + freq.hcst <- unlist(exp_descrip[[store.freq]][variable[1]]) + reference_descrip <- archive$Reference[[ref.name]] + freq.obs <- unlist(reference_descrip[[store.freq]][variable[1]]) + obs.dir <- reference_descrip$src + fcst.dir <- exp_descrip$src + hcst.dir <- exp_descrip$src + fcst.nmember <- exp_descrip$nmember$fcst + hcst.nmember <- exp_descrip$nmember$hcst + + var_dir_obs <- reference_descrip[[store.freq]][variable] + var_dir_exp <- exp_descrip[[store.freq]][variable] + + # ----------- + obs.path <- paste0(archive$src, obs.dir, "$var_dir$", + "$var$_$file_date$.nc") + + hcst.path <- paste0(archive$src, hcst.dir, "$var_dir$", + "$var$_$file_date$.nc") + + fcst.path <- paste0(archive$src, hcst.dir, "$var_dir$", + "/$var$_$file_date$.nc") + + # Define regrid parameters: + #------------------------------------------------------------------- + regrid_params <- get_regrid_params(recipe, archive) + + # Longitude circular sort and latitude check + #------------------------------------------------------------------- + circularsort <- check_latlon(lats.min, lats.max, lons.min, lons.max) + + if (recipe$Analysis$Variables$freq == "monthly_mean"){ + split_multiselected_dims = TRUE + } else { + split_multiselected_dims = FALSE + } + + # Load hindcast data without regrid + #------------------------------------------------------------------- + hcst <- Start(dat = hcst.path, + var = variable, + var_dir = var_dir_exp, + file_date = sdates$hcst, + time = idxs$hcst, + var_dir_depends = 'var', + latitude = values(list(lats.min, lats.max)), + latitude_reorder = Sort(decreasing = TRUE), + longitude = values(list(lons.min, lons.max)), + longitude_reorder = circularsort, + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude'), + ensemble = c('member', 'ensemble')), + ensemble = indices(1:hcst.nmember), + metadata_dims = 'var', # change to just 'var'? 
+ return_vars = list(latitude = 'dat', + longitude = 'dat', + time = 'file_date'), + split_multiselected_dims = split_multiselected_dims, + retrieve = TRUE) + + + # Remove var_dir dimension + if ("var_dir" %in% names(dim(hcst))) { + hcst <- Subset(hcst, along = "var_dir", indices = 1, drop = "selected") + } + + if (recipe$Analysis$Variables$freq == "daily_mean") { + # Adjusts dims for daily case, could be removed if startR allows + # multidim split + names(dim(hcst))[which(names(dim(hcst)) == 'file_date')] <- "syear" + default_dims <- c(dat = 1, var = 1, sday = 1, + sweek = 1, syear = 1, time = 1, + latitude = 1, longitude = 1, ensemble = 1) + default_dims[names(dim(hcst))] <- dim(hcst) + dim(hcst) <- default_dims + # Change time attribute dimensions + default_time_dims <- c(sday = 1, sweek = 1, syear = 1, time = 1) + names(dim(attr(hcst, "Variables")$common$time))[which(names( + dim(attr(hcst, "Variables")$common$time)) == 'file_date')] <- "syear" + default_time_dims[names(dim(attr(hcst, "Variables")$common$time))] <- + dim(attr(hcst, "Variables")$common$time) + dim(attr(hcst, "Variables")$common$time) <- default_time_dims + } + + # Define sea-ice grid points based of sea-ice concentration threshold + ice_hcst <- hcst[,3,,,,,,,] >= sic.threshold + + # Replace Tos with Tas for data points with sea ice + hcst[,2,,,,,,,][ice_hcst] <- hcst[,1,,,,,,,][ice_hcst] + + + # Convert hcst to s2dv_cube object + ## TODO: Give correct dimensions to $Dates + ## (sday, sweek, syear instead of file_date) + hcst <- as.s2dv_cube(hcst) + # Adjust dates for models where the time stamp goes into the next month + if (recipe$Analysis$Variables$freq == "monthly_mean") { + hcst$attrs$Dates[] <- hcst$attrs$Dates - seconds(exp_descrip$time_stamp_lag) + } + + # Combine hcst tas and tos data + #------------------------------------------------------------------- + + hcst <- mask_tas_tos(input_data = hcst, region = c(lons.min, lons.max,lats.min, lats.max), + mask_path = archive$System[[exp.name]]$land_sea_mask, lsm_var_name = 'lsm', + lon = hcst$coords$longitude, lat = hcst$coords$latitude, + lon_dim = 'longitude', lat_dim = 'latitude', ncores = NULL) + + hcst$dims[['var']] <- dim(hcst$data)[['var']] + hcst$attrs$Variable$varName <- 'tas-tos' + + hcst$data <- Reorder(hcst$data, data_order) + + # Load forecast data without regrid + #------------------------------------------------------------------- + if (!is.null(recipe$Analysis$Time$fcst_year)) { + # the call uses file_date instead of fcst_syear so that it can work + # with the daily case and the current version of startR not allowing + # multiple dims split + + fcst <- Start(dat = fcst.path, + var = variable, + var_dir = var_dir_exp, + var_dir_depends = 'var', + file_date = sdates$fcst, + time = idxs$fcst, + latitude = values(list(lats.min, lats.max)), + latitude_reorder = Sort(decreasing = TRUE), + longitude = values(list(lons.min, lons.max)), + longitude_reorder = circularsort, + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude'), + ensemble = c('member', 'ensemble')), + ensemble = indices(1:fcst.nmember), + metadata_dims = 'var', + return_vars = list(latitude = 'dat', + longitude = 'dat', + time = 'file_date'), + split_multiselected_dims = split_multiselected_dims, + retrieve = TRUE) + + if ("var_dir" %in% names(dim(fcst))) { + fcst <- Subset(fcst, along = "var_dir", indices = 1, drop = "selected") + } + + if (recipe$Analysis$Variables$freq == "daily_mean") { + # Adjusts dims for daily case, could be removed if startR allows + # 
multidim split + names(dim(fcst))[which(names(dim(fcst)) == 'file_date')] <- "syear" + default_dims <- c(dat = 1, var = 1, sday = 1, + sweek = 1, syear = 1, time = 1, + latitude = 1, longitude = 1, ensemble = 1) + default_dims[names(dim(fcst))] <- dim(fcst) + dim(fcst) <- default_dims + # Change time attribute dimensions + default_time_dims <- c(sday = 1, sweek = 1, syear = 1, time = 1) + names(dim(attr(fcst, "Variables")$common$time))[which(names( + dim(attr(fcst, "Variables")$common$time)) == 'file_date')] <- "syear" + default_time_dims[names(dim(attr(fcst, "Variables")$common$time))] <- + dim(attr(fcst, "Variables")$common$time) + dim(attr(fcst, "Variables")$common$time) <- default_time_dims + } + + # Define sea-ice grid points based of sea-ice concentration threshold + ice_fcst <- fcst[,3,,,,,,,] >= sic.threshold + + # Replace Tos with Tas for datapoints with sea ice + fcst[,2,,,,,,,][ice_fcst] <- fcst[,1,,,,,,,][ice_fcst] + + + # Convert fcst to s2dv_cube + fcst <- as.s2dv_cube(fcst) + # Adjust dates for models where the time stamp goes into the next month + if (recipe$Analysis$Variables$freq == "monthly_mean") { + fcst$attrs$Dates[] <- + fcst$attrs$Dates - seconds(exp_descrip$time_stamp_lag) + } + + # Combine fcst tas and tos data + #------------------------------------------------------------------- + + fcst <- mask_tas_tos(input_data = fcst, region = c(lons.min, lons.max,lats.min, lats.max), + mask_path = archive$System[[exp.name]]$land_sea_mask, lsm_var_name = 'lsm', + lon = fcst$coords$longitude, lat = fcst$coords$latitude, + lon_dim = 'longitude', lat_dim = 'latitude', ncores = NULL) + + fcst$dims[['var']] <- dim(fcst$data)[['var']] + fcst$attrs$Variable$varName <- 'tas-tos' + + fcst$data <- Reorder(fcst$data, data_order) + + } else { + fcst <- NULL + } + + # Load obs data without regrid + #------------------------------------------------------------------- + + # Obtain dates and date dimensions from the loaded hcst data to make sure + # the corresponding observations are loaded correctly. 
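Both the hindcast and forecast blocks above apply the same sea-ice rule before the land-sea blending: wherever the third variable ('sic') meets or exceeds the threshold (recipe$Analysis$Variables$sic_threshold, defaulting to 0.15), the 'tos' values are overwritten with 'tas'. The toy sketch below illustrates that replacement on made-up numbers; the array shape and values are hypothetical and only mimic the dat x var x grid-point layout of the loaded data.

# Hypothetical array: 1 dataset x 3 variables (tas, tos, sic) x 2 grid points
x <- array(c(272, 274, 0.05,    # point 1: tas, tos, sic (open water)
             271, 270, 0.60),   # point 2: tas, tos, sic (sea ice)
           dim = c(dat = 1, var = 3, point = 2))
sic.threshold <- 0.15
ice <- x[, 3, ] >= sic.threshold    # TRUE where sea ice is present
x[, 2, ][ice] <- x[, 1, ][ice]      # tos takes the tas value over sea ice
x[, 2, ]                            # 274 271: only the icy point changed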
+ dates <- hcst$attrs$Dates + dim(dates) <- hcst$dims[c("sday", "sweek", "syear", "time")] + + # Separate Start() call for monthly vs daily data + if (store.freq == "monthly_mean") { + + dates_file <- format(as.Date(dates, '%Y%m%d'), "%Y%m") + dim(dates_file) <- dim(dates) + + + # Define variables for blended tas-tos datasets + if (recipe$Analysis$Datasets$Reference$name == 'BEST'){ + variable <- 'tas' + var_dir_obs <- reference_descrip[[store.freq]][variable] + } + + obs <- Start(dat = obs.path, + var = variable, + var_dir = var_dir_obs, + var_dir_depends = 'var', + file_date = dates_file, + latitude = values(list(lats.min, lats.max)), + latitude_reorder = Sort(decreasing = TRUE), + longitude = values(list(lons.min, lons.max)), + longitude_reorder = circularsort, + synonims = list(latitude = c('lat','latitude'), + longitude = c('lon','longitude')), + metadata_dims = 'var', + return_vars = list(latitude = 'dat', + longitude = 'dat', + time = 'file_date'), + split_multiselected_dims = TRUE, + retrieve = TRUE) + + + } else if (store.freq == "daily_mean") { + + # Get year and month for file_date + dates_file <- sapply(dates, format, '%Y%m') + dim(dates_file) <- dim(dates) + # Set hour to 12:00 to ensure correct date retrieval for daily data + lubridate::hour(dates) <- 12 + lubridate::minute(dates) <- 00 + # Restore correct dimensions + dim(dates) <- dim(dates_file) + + obs <- Start(dat = obs.path, + var = variable, + var_dir = var_dir_obs, + var_dir_depends = 'var', + file_date = sort(unique(dates_file)), + time = dates, + time_var = 'time', + time_across = 'file_date', + merge_across_dims = TRUE, + merge_across_dims_narm = TRUE, + latitude = values(list(lats.min, lats.max)), + latitude_reorder = Sort(decreasing = TRUE), + longitude = values(list(lons.min, lons.max)), + longitude_reorder = circularsort, + synonims = list(latitude = c('lat','latitude'), + longitude = c('lon','longitude')), + metadata_dims = 'var', + return_vars = list(latitude = 'dat', + longitude = 'dat', + time = 'file_date'), + split_multiselected_dims = TRUE, + retrieve = TRUE) + + } + + # Remove var_dir dimension + if ("var_dir" %in% names(dim(obs))) { + obs <- Subset(obs, along = "var_dir", indices = 1, drop = "selected") + } + # Adds ensemble dim to obs (for consistency with hcst/fcst) + default_dims <- c(dat = 1, var = 1, sday = 1, + sweek = 1, syear = 1, time = 1, + latitude = 1, longitude = 1, ensemble = 1) + default_dims[names(dim(obs))] <- dim(obs) + dim(obs) <- default_dims + + if(!recipe$Analysis$Datasets$Reference$name %in% c('HadCRUT4','HadCRUT5','BEST','GISTEMPv4')){ + + # Define sea-ice grid points based of sea-ice concentration threshold + ice_obs <- (obs[,3,,,,,,,]) >= sic.threshold + + # Replace NA values with False + ice_obs[is.na(ice_obs)] <- FALSE + + # Replace Tos with Tas for datapoints with sea ice + obs[,2,,,,,,,][ice_obs] <- obs[,1,,,,,,,][ice_obs] + } + + # Convert obs to s2dv_cube + obs <- as.s2dv_cube(obs) + + + # Combine obs tas and tos data + #------------------------------------------------------------------- + ## TODO: Ask about this list + if(!recipe$Analysis$Datasets$Reference$name %in% c('HadCRUT4','HadCRUT5','BEST','GISTEMPv4')){ + + obs <- mask_tas_tos(input_data = obs, region = c(lons.min, lons.max,lats.min, lats.max), + mask_path = archive$Reference[[ref.name]]$land_sea_mask, lsm_var_name = 'sftof', + lon = obs$coords$longitude, lat = obs$coords$latitude, + lon_dim = 'longitude', lat_dim = 'latitude', ncores = NULL) + + obs$dims[['var']] <- dim(obs$data)[['var']] + 
obs$attrs$Variable$varName <- 'tas-tos' + + } ## close if on reference name + + obs$data <- Reorder(obs$data, data_order) + + + # Regrid data + #------------------------------------------------------------------- + + # Regrid reference to system grid: + if(recipe$Analysis$Regrid$type == 'to_system'){ + + aux <- CDORemap(data_array = obs$data, ## Not regridding to desired grid when latitudes are ordered descending + lons = obs$coords$longitude, + lats = obs$coords$latitude, + grid = regrid_params$obs.gridtype, + method = recipe$Analysis$Regrid$method, + avoid_writes = TRUE, + crop = c(lons.min, lons.max,lats.min, lats.max), + force_remap = TRUE) + + obs$data <- aux$data_array + obs$coords$longitude <- aux$lons + obs$coords$latitude <- aux$lats + obs$dims['longitude'] <- dim(aux$data_array)['longitude'] + obs$dims['latitude'] <- dim(aux$data_array)['latitude'] + rm(aux) + } + + # Regrid system to reference grid: + if(recipe$Analysis$Regrid$type == 'to_reference'){ + + aux <- CDORemap(data_array = hcst$data, + lons = hcst$coords$longitude, lats = hcst$coords$latitude, + grid = regrid_params$fcst.gridtype, method = recipe$Analysis$Regrid$method, + avoid_writes = TRUE, crop = TRUE, + force_remap = TRUE) + + hcst$data <- aux$data_array + hcst$coords$longitude <- aux$lons + hcst$coords$latitude <- aux$lats + hcst$dims['longitude'] <- dim(aux$data_array)['longitude'] + hcst$dims['latitude'] <- dim(aux$data_array)['latitude'] + rm(aux) + + if (!is.null(recipe$Analysis$Time$fcst_year)) { + aux <- CDORemap(data_array = fcst$data, + lons = fcst$coords$longitude, + lats = fcst$coords$latitude, + grid = regrid_params$fcst.gridtype, + method = recipe$Analysis$Regrid$method, + avoid_writes = TRUE, crop = TRUE, + force_remap = TRUE) + + fcst$data <- aux$data_array + fcst$coords$longitude <- aux$lons + fcst$coords$latitude <- aux$lats + fcst$dims['longitude'] <- dim(aux$data_array)['longitude'] + fcst$dims['latitude'] <- dim(aux$data_array)['latitude'] + rm(aux) + } + } + + # Regrid all data to user defined grid: + if(!recipe$Analysis$Regrid$type %in% c('to_system','to_reference')){ + + aux <- CDORemap(data_array = hcst$data, + lons = hcst$coords$longitude, lats = hcst$coords$latitude, + grid = regrid_params$fcst.gridtype, method = recipe$Analysis$Regrid$method, + avoid_writes = TRUE, crop = TRUE, + force_remap = TRUE) + + hcst$data <- aux$data_array + hcst$coords$longitude <- aux$lons + hcst$coords$latitude <- aux$lats + hcst$dims['longitude'] <- dim(aux$data_array)['longitude'] + hcst$dims['latitude'] <- dim(aux$data_array)['latitude'] + rm(aux) + + if (!is.null(recipe$Analysis$Time$fcst_year)) { + aux <- CDORemap(data_array = fcst$data, + lons = fcst$coords$longitude, + lats = fcst$coords$latitude, + grid = regrid_params$fcst.gridtype, + method = recipe$Analysis$Regrid$method, + avoid_writes = TRUE, crop = TRUE, + force_remap = TRUE) + + fcst$data <- aux$data_array + fcst$coords$longitude <- aux$lons + fcst$coords$latitude <- aux$lats + fcst$dims['longitude'] <- dim(aux$data_array)['longitude'] + fcst$dims['latitude'] <- dim(aux$data_array)['latitude'] + rm(aux) + } + + aux <- CDORemap(data_array = obs$data, + lons = obs$coords$longitude, + lats = obs$coords$latitude, + grid = regrid_params$obs.gridtype, + method = recipe$Analysis$Regrid$method, + avoid_writes = TRUE, crop = TRUE, + force_remap = TRUE) + + obs$data <- aux$data_array + obs$coords$longitude <- aux$lons + obs$coords$latitude <- aux$lats + obs$dims['longitude'] <- dim(aux$data_array)['longitude'] + obs$dims['latitude'] <- 
dim(aux$data_array)['latitude'] + rm(aux) + } + + + # Check for consistency between hcst and obs grid + if (!(recipe$Analysis$Regrid$type == 'none')) { + compare_exp_obs_grids(exp = hcst, obs = obs) + } + + info(recipe$Run$logger, + "##### DATA LOADING COMPLETED SUCCESSFULLY #####") + + return(list(hcst = hcst, fcst = fcst, obs = obs)) +} diff --git a/modules/Loading/R/mask_tas_tos.R b/modules/Loading/R/mask_tas_tos.R index a2eeb0b610e3cdbd5173ef6770442913dfab03e0..cc6937957a4aa88993b95b905765622e852f9560 100644 --- a/modules/Loading/R/mask_tas_tos.R +++ b/modules/Loading/R/mask_tas_tos.R @@ -2,29 +2,28 @@ library(multiApply) library(startR) library(s2dv) -mask_tas_tos <- function(input_data, grid, lon, lat, region = region , - lon_dim = 'lon', lat_dim = 'lat', ncores = NULL){ +mask_tas_tos <- function(input_data, mask_path, lsm_var_name = lsm_var_name, lon, lat, region = region, + lon_dim = 'lon', lat_dim = 'lat', + ncores = NULL){ - - mask <- .load_mask(grid = grid, lon_dim = lon_dim, lat_dim = lat_dim, + + mask <- .load_mask(mask_path = mask_path, lsm_var_name = lsm_var_name, lon_dim = lon_dim, lat_dim = lat_dim, sea_value = 1, land_value = 0, region = region) - ## TO DO: improve the check and correct lats stopifnot(all(lon == mask$lon)) stopifnot(max(abs(as.numeric(round(lat,2) - round(mask$lat,2)))) < 0.1) # stopifnot(all(lat == mask$lat)) - + tas <- Subset(input_data$data, along = 'var', indices = 1) tos <- Subset(input_data$data, along = 'var', indices = 2) - tas_tos <- multiApply::Apply(data = list(tas, tos), - target_dims = c(lon_dim, lat_dim), - fun = .mask_tas_tos, - mask = mask$mask, - sea_value = 1, - ncores = ncores)$output1 - input_data$data <- tas_tos - + input_data$data <- multiApply::Apply(data = list(tas, tos), + target_dims = c(lon_dim, lat_dim), + fun = .mask_tas_tos, + mask = mask$mask, + sea_value = 1, + ncores = ncores)$output1 + return(input_data) } @@ -33,52 +32,32 @@ mask_tas_tos <- function(input_data, grid, lon, lat, region = region , return(data_tas) } -.load_mask <- function(grid, mask_path = NULL, land_value = 0, sea_value = 1, +.load_mask <- function(mask_path = mask_path, lsm_var_name = lsm_var_name, land_value = 0, sea_value = 1, lon_dim = 'lon', lat_dim = 'lat', region = region){ - if (is.null(mask_path)){ - mask_sea_land_path <- '/esarchive/exp/ecmwf/system5c3s/constant/lsm/lsm.nc' ## /esarchive/recon/ecmwf/era5land/constant/lsm-r3600x1801cds/lsm.nc' - } else if (is.character(mask_path)){ - mask_sea_land_path <- mask_path - } else { - stop("mask_path must be NULL (to use the default mask and interpolate it to - the specified grid) or a string with the mask's path you want to load") - } - lons.min <- region[1] lons.max <- region[2] lats.min <- region[3] lats.max <- region[4] - - ## TO DO: - ## Fix region filter for lat and lon - ## Fix 'number' parameter for mask - - data <- startR::Start(dat = mask_sea_land_path, - var = 'lsm', - lon = 'all', - lat = 'all', - number = 1, ## needed to add for ensemble member dimension of lsm.nc - # lon = values(list(lons.min, lons.max)), - # lat = values(list(lats.min, lats.max)), - transform = CDORemapper, transform_extra_cells = 2, - transform_params = list(grid = grid, method = 'con', crop = region), - transform_vars = c('lat','lon'), + data <- startR::Start(dat = mask_path, + var = lsm_var_name, + lon = values(list(lons.min, lons.max)), + lat = values(list(lats.min, lats.max)), return_vars = list(lat = NULL, lon = NULL), synonims = list(lon = c('lon','longitude'), lat = c('lat','latitude')), - lat_reorder = 
Sort(decreasing = FALSE), - lon_reorder = CircularSort(0,359.9), + lat_reorder = Sort(decreasing = TRUE), + lon_reorder = CircularSort(0,360), num_procs = 1, retrieve = TRUE) - + mask <- list(mask = drop(data), lon = as.numeric(attr(data,'Variables')$common$lon), lat = as.numeric(attr(data,'Variables')$common$lat)) mask$mask[data <= 0.5] <- sea_value mask$mask[data > 0.5] <- land_value - names(dim(mask$mask)) <- c(lon_dim, lat_dim) + names(dim(mask$mask)) <- c(lon_dim, lat_dim) return(mask) } diff --git a/modules/Saving/R/get_dir.R b/modules/Saving/R/get_dir.R index a2cbc79f21620eef6b983ce3d17b8321ed3fc044..b81450e746c94b14dc5b90496a74417b6bd528e1 100644 --- a/modules/Saving/R/get_dir.R +++ b/modules/Saving/R/get_dir.R @@ -10,12 +10,14 @@ get_dir <- function(recipe, variable, agg = "global") { # variable <- strsplit(recipe$Analysis$Variables$name, ", | |,")[[1]] outdir <- recipe$Run$output_dir system <- gsub('.','', recipe$Analysis$Datasets$System$name, fixed = T) + reference <- gsub('.','', recipe$Analysis$Datasets$Reference$name, fixed = T) + calib.method <- tolower(recipe$Analysis$Workflow$Calibration$method) if (tolower(recipe$Analysis$Output_format) == 'scorecards') { # Define output dir name according to Scorecards format dict <- read_yaml("conf/output_dictionaries/scorecards.yml") # system <- dict$System[[recipe$Analysis$Datasets$System$name]]$short_name - dir <- paste0(outdir, "/", system, "/", variable, "/") + dir <- paste0(outdir, system, "/", reference, "/", calib.method, "/", variable, "/") } else { # Default generic output format based on FOCUS # Get startdate or hindcast period @@ -37,19 +39,16 @@ get_dir <- function(recipe, variable, agg = "global") { fcst.sdate <- paste0("hcst-", recipe$Analysis$Time$sdate) } } - ## TODO: Remove calibration method from output directory?
- calib.method <- tolower(recipe$Analysis$Workflow$Calibration$method) store.freq <- recipe$Analysis$Variables$freq - ## TODO: Change "_country" if (!is.null(recipe$Analysis$Region$name)) { outdir <- paste0(outdir, "/", recipe$Analysis$Region$name) } switch(tolower(agg), - "country" = {dir <- paste0(outdir, "/", system, "/", calib.method, - "-", store.freq, "/", variable, - "_country/", fcst.sdate, "/")}, + "region" = {dir <- paste0(outdir, "/", system, "/", calib.method, + store.freq, "/", variable, + "_region/", fcst.sdate, "/")}, "global" = {dir <- paste0(outdir, "/", system, "/", calib.method, - "-", store.freq, "/", variable, "/", + store.freq, "/", variable, "/", fcst.sdate, "/")}) } diff --git a/modules/Saving/R/get_filename.R b/modules/Saving/R/get_filename.R index 1c92565179ab74ee0968323258b3efa18e396f77..eeff367af8afc01aa120c9eb69b68da30ad6c6d3 100644 --- a/modules/Saving/R/get_filename.R +++ b/modules/Saving/R/get_filename.R @@ -16,7 +16,7 @@ get_filename <- function(dir, recipe, var, date, agg, file.type) { } switch(tolower(agg), - "country" = {gg <- "-country"}, + "region" = {gg <- "-region"}, "global" = {gg <- ""}) system <- gsub('.','', recipe$Analysis$Datasets$System$name, fixed = T) @@ -36,11 +36,19 @@ get_filename <- function(dir, recipe, var, date, agg, file.type) { "obs" = {type_info <- paste0("-obs_", date, "_")}, "percentiles" = {type_info <- "-percentiles_"}, "probs" = {type_info <- paste0("-probs_", date, "_")}, - "bias" = {type_info <- paste0("-bias_", date, "_")}) + "bias" = {type_info <- paste0("-bias_", date, "_")}, + # new + "rps_syear" = {type_info <- paste0("rps_syear")}, + "rps_clim_syear" = {type_info <- paste0("rps_clim_syear")}, + "crps_syear" = {type_info <- paste0("crps_syear")}, + "crps_clim_syear" = {type_info <- paste0("crps_clim_syear")}, + "crps" = {type_info <- paste0("crps")}, + "mean_bias" = {type_info <- paste0("mean_bias")}, + {type_info <- paste0(file.type)}) # Build file name file <- paste0("scorecards_", system, "_", reference, "_", - var, type_info, hcst_start, "-", hcst_end, "_s", shortdate) + var, "_",type_info, "_", hcst_start, "-", hcst_end, "_s", shortdate) } else { switch(file.type, "skill" = {file <- paste0(var, gg, "-skill_", dd, shortdate)}, diff --git a/modules/Saving/R/save_metrics_scorecards.R b/modules/Saving/R/save_metrics_scorecards.R new file mode 100644 index 0000000000000000000000000000000000000000..9c13339583e587c1fac9647097718813ad8ec565 --- /dev/null +++ b/modules/Saving/R/save_metrics_scorecards.R @@ -0,0 +1,79 @@ +save_metrics_scorecards <- function(recipe, + skill, + data_cube, + agg = "global", + outdir = NULL) { + # Time indices and metadata + fcst.horizon <- tolower(recipe$Analysis$Horizon) + store.freq <- recipe$Analysis$Variables$freq + + # archive <- get_archive(recipe) + # global_attributes <- .get_global_attributes(recipe, archive) + # ## TODO: Sort out the logic once default behavior is decided + # if ((!is.null(recipe$Analysis$Workflow$Anomalies$compute)) && + # (recipe$Analysis$Workflow$Anomalies$compute)) { + # global_attributes <- c(list(from_anomalies = "Yes"), + # global_attributes) + # } else { + # global_attributes <- c(list(from_anomalies = "No"), + # global_attributes) + # } + + # Select start date + # If a fcst is provided, use that as the ref. year. Otherwise use 1970. 
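Combined, the get_dir() and get_filename() changes above place the per-metric Scorecards files under a system/reference/calibration-method/variable tree and insert the metric name into the file name. The sketch below reproduces that naming with entirely hypothetical recipe values (system, reference, calibration method, hindcast period and start date are made up, and 'shortdate' is assumed to hold the start date string):

# Hypothetical values, only to illustrate the new Scorecards paths
outdir       <- "/path/to/output/"     # recipe$Run$output_dir
system       <- "ECMWF-SEAS5"          # after gsub('.', '', ..., fixed = TRUE)
reference    <- "ERA5"
calib.method <- "raw"
variable     <- "tas"
dir <- paste0(outdir, system, "/", reference, "/", calib.method, "/", variable, "/")
# "/path/to/output/ECMWF-SEAS5/ERA5/raw/tas/"
file <- paste0("scorecards_", system, "_", reference, "_", variable, "_",
               "rps_syear", "_", "1993", "-", "2016", "_s", "0501")
# "scorecards_ECMWF-SEAS5_ERA5_tas_rps_syear_1993-2016_s0501"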
+ if (fcst.horizon == 'decadal') { + if (!is.null(recipe$Analysis$Time$fcst_year)) { + #PROBLEM: May be more than one fcst_year + fcst.sdate <- paste0(recipe$Analysis$Time$fcst_year[1], + sprintf('%02d', init_month), '01') + } else { + fcst.sdate <- paste0("1970", sprintf('%02d', init_month), '01') + } + } else { + if (!is.null(recipe$Analysis$Time$fcst_year)) { + fcst.sdate <- paste0(recipe$Analysis$Time$fcst_year, + recipe$Analysis$Time$sdate) + } else { + fcst.sdate <- paste0("1970", recipe$Analysis$Time$sdate) + } + } + + # This needs to be developed: + coords <- c(data_cube$coords['longitude'], + data_cube$coords['latitude']) + # Loop over variable dimension + for (var in 1:data_cube$dims[['var']]) { + # Subset skill arrays + subset_skill <- lapply(skill, function(x) { + ClimProjDiags::Subset(x, along = 'var', + indices = var, + drop = 'selected')}) + variable <- data_cube$attrs$Variable$varName[[var]] + outdir <- get_dir(recipe = recipe, variable = variable) + if (!dir.exists(outdir)) { + dir.create(outdir, recursive = T) + } + + for (i in 1:length(subset_skill)) { + if (any('syear' %in% names(dim(subset_skill[[i]])))) { + sdate_dim_save = 'syear' + dates <- data_cube$attrs$Dates + } else { + sdate_dim_save = NULL + dates <- Subset(data_cube$attrs$Dates, along = 'syear', indices = 1) + } + extra_string <- get_filename(NULL, recipe, variable, + fcst.sdate, agg, names(subset_skill)[[i]]) + SaveExp(data = subset_skill[[i]], destination = outdir, + Dates = dates, + coords = coords, + varname = names(subset_skill)[[i]], + metadata = data_cube$attrs$Variable$metadata, Datasets = NULL, + startdates = NULL, dat_dim = NULL, sdate_dim = sdate_dim_save, + ftime_dim = 'time', var_dim = NULL, memb_dim = NULL, + drop_dims = NULL, single_file = TRUE, + extra_string = extra_string) + } + } + info(recipe$Run$logger, "##### SKILL METRICS SAVED TO NETCDF FILE #####") +} \ No newline at end of file diff --git a/modules/Saving/R/tmp/CST_SaveExp.R b/modules/Saving/R/tmp/CST_SaveExp.R new file mode 100644 index 0000000000000000000000000000000000000000..2ffd8fa8cd6babc1ac5b31c18c5a50dea5a3c420 --- /dev/null +++ b/modules/Saving/R/tmp/CST_SaveExp.R @@ -0,0 +1,915 @@ +#'Save objects of class 's2dv_cube' to data in NetCDF format +#' +#'@author Perez-Zanon Nuria, \email{nuria.perez@bsc.es} +#' +#'@description This function allows to divide and save a object of class +#''s2dv_cube' into a NetCDF file, allowing to reload the saved data using +#'\code{CST_Start} or \code{CST_Load} functions. It also allows to save any +#''s2dv_cube' object that follows the NetCDF attributes conventions. +#' +#'@param data An object of class \code{s2dv_cube}. +#'@param destination A character string containing the directory name in which +#' to save the data. NetCDF file for each starting date are saved into the +#' folder tree: 'destination/Dataset/variable/'. By default the function +#' saves the data into the working directory. +#'@param sdate_dim A character string indicating the name of the start date +#' dimension. By default, it is set to 'sdate'. It can be NULL if there is no +#' start date dimension. +#'@param ftime_dim A character string indicating the name of the forecast time +#' dimension. If 'Dates' are used, it can't be NULL. If there is no forecast +#' time dimension, 'Dates' will be set to NULL and will not be used. By +#' default, it is set to 'time'. +#'@param dat_dim A character string indicating the name of dataset dimension. +#' It can be NULL if there is no dataset dimension. 
By default, it is set to +#' 'dataset'. +#'@param var_dim A character string indicating the name of variable dimension. +#' It can be NULL if there is no variable dimension. By default, it is set to +#' 'var'. +#'@param memb_dim A character string indicating the name of the member +#' dimension. It can be NULL if there is no member dimension. By default, it is +#' set to 'member'. +#'@param startdates A vector of dates that will be used for the filenames +#' when saving the data in multiple files (single_file = FALSE). It must be a +#' vector of the same length as the start date dimension of data. It must be a +#' vector of class \code{Dates}, \code{'POSIXct'} or character with lenghts +#' between 1 and 10. If it is NULL, the coordinate corresponding the the start +#' date dimension or the first Date of each time step will be used as the name +#' of the files. It is NULL by default. +#'@param single_file A logical value indicating if all object is saved in a +#' single file (TRUE) or in multiple files (FALSE). When it is FALSE, +#' the array is separated for datasets, variable and start date. When there are +#' no specified time dimensions, the data will be saved in a single file by +#' default. The output file name when 'single_file' is TRUE is a character +#' string containing: '__.nc'; when it is FALSE, +#' it is '_.nc'. It is FALSE by default. +#'@param drop_dims (optional) A vector of character strings indicating the +#' dimension names of length 1 that need to be dropped in order that they don't +#' appear in the netCDF file. Only is allowed to drop dimensions that are not +#' used in the computation. The dimensions used in the computation are the ones +#' specified in: sdate_dim, ftime_dim, dat_dim, var_dim and memb_dim. It is +#' NULL by default. +#'@param extra_string (Optional) A character string to be included as part of +#' the file name, for instance, to identify member or realization. When +#' single_file is TRUE, the 'extra_string' will substitute all the default +#' file name; when single_file is FALSE, the 'extra_string' will be added +#' in the file name as: '__.nc'. It is NULL by +#' default. +#'@param units_hours_since (Optional) A logical value only available for the +#' case: 'Dates' have forecast time and start date dimension, 'single_file' is +#' TRUE and 'time_bounds' are not used. When it is TRUE, it saves the forecast +#' time with units of 'hours since'; if it is FALSE, the time units will be a +#' number of time steps with its corresponding frequency (e.g. n days, n months +#' or n hours). It is FALSE by default. +#'@param global_attrs (Optional) A list with elements containing the global +#' attributes to be saved in the NetCDF. +#' +#'@return Multiple or single NetCDF files containing the data array.\cr +#'\item{\code{single_file is TRUE}}{ +#' All data is saved in a single file located in the specified destination +#' path with the following name (by default): +#' '__.nc'. Multiple variables +#' are saved separately in the same file. The forecast time units +#' are calculated from each start date (if sdate_dim is not NULL) or from +#' the time step. If 'units_hours_since' is TRUE, the forecast time units +#' will be 'hours since '. If 'units_hours_since' is FALSE, +#' the forecast time units are extracted from the frequency of the time steps +#' (hours, days, months); if no frequency is found, the units will be ’hours +#' since’. When the time units are 'hours since' the time ateps are assumed to +#' be equally spaced. 
+#'} +#'\item{\code{single_file is FALSE}}{ +#' The data array is subset and stored into multiple files. Each file +#' contains the data subset for each start date, variable and dataset. Files +#' with different variables and datasets are stored in separated directories +#' within the following directory tree: 'destination/Dataset/variable/'. +#' The name of each file will be by default: '_.nc'. +#' The forecast time units are calculated from each start date (if sdate_dim +#' is not NULL) or from the time step. The forecast time units will be 'hours +#' since '. +#'} +#' +#'@seealso \code{\link[startR]{Start}}, \code{\link{as.s2dv_cube}} and +#'\code{\link{s2dv_cube}} +#' +#'@examples +#'data <- lonlat_temp_st$exp +#'CST_SaveExp(data = data, ftime_dim = 'ftime', var_dim = 'var', +#' dat_dim = 'dataset', sdate_dim = 'sdate') +#' +#'@export +CST_SaveExp <- function(data, destination = "./", startdates = NULL, + sdate_dim = 'sdate', ftime_dim = 'time', + memb_dim = 'member', dat_dim = 'dataset', + var_dim = 'var', drop_dims = NULL, + single_file = FALSE, extra_string = NULL, + global_attrs = NULL, units_hours_since = FALSE) { + # Check 's2dv_cube' + if (!inherits(data, 's2dv_cube')) { + stop("Parameter 'data' must be of the class 's2dv_cube'.") + } + # Check object structure + if (!all(c('data', 'attrs') %in% names(data))) { + stop("Parameter 'data' must have at least 'data' and 'attrs' elements ", + "within the 's2dv_cube' structure.") + } + if (!inherits(data$attrs, 'list')) { + stop("Level 'attrs' must be a list with at least 'Dates' element.") + } + # metadata + if (!is.null(data$attrs$Variable$metadata)) { + if (!inherits(data$attrs$Variable$metadata, 'list')) { + stop("Element metadata from Variable element in attrs must be a list.") + } + } + # Dates + if (is.null(data$attrs$Dates)) { + stop("Element 'Dates' from 'attrs' level cannot be NULL.") + } + if (is.null(dim(data$attrs$Dates))) { + stop("Element 'Dates' from 'attrs' level must have time dimensions.") + } + # sdate_dim + if (!is.null(sdate_dim)) { + if (!is.character(sdate_dim)) { + stop("Parameter 'sdate_dim' must be a character string.") + } + } + # startdates + if (is.null(startdates)) { + if (is.character(data$coords[[sdate_dim]])) { + startdates <- data$coords[[sdate_dim]] + } + } + + SaveExp(data = data$data, + destination = destination, + coords = data$coords, + Dates = data$attrs$Dates, + time_bounds = data$attrs$time_bounds, + startdates = startdates, + varname = data$attrs$Variable$varName, + metadata = data$attrs$Variable$metadata, + Datasets = data$attrs$Datasets, + sdate_dim = sdate_dim, ftime_dim = ftime_dim, + memb_dim = memb_dim, + dat_dim = dat_dim, var_dim = var_dim, + drop_dims = drop_dims, + single_file = single_file, + extra_string = extra_string, + global_attrs = global_attrs, + units_hours_since = units_hours_since) +} +#'Save a multidimensional array with metadata to data in NetCDF format +#'@description This function allows to save a data array with metadata into a +#'NetCDF file, allowing to reload the saved data using \code{Start} function +#'from StartR package. If the original 's2dv_cube' object has been created from +#'\code{CST_Load()}, then it can be reloaded with \code{Load()}. +#' +#'@author Perez-Zanon Nuria, \email{nuria.perez@bsc.es} +#' +#'@param data A multi-dimensional array with named dimensions. +#'@param destination A character string indicating the path where to store the +#' NetCDF files. 
+#'@param coords A named list with elements of the coordinates corresponding to +#' the dimensions of the data parameter. The names and length of each element +#' must correspond to the names of the dimensions. If any coordinate is not +#' provided, it is set as an index vector with the values from 1 to the length +#' of the corresponding dimension. +#'@param Dates A named array of dates with the corresponding sdate and forecast +#' time dimension. If there is no sdate_dim, you can set it to NULL. +#' It must have ftime_dim dimension. +#'@param time_bounds (Optional) A list of two arrays of dates containing +#' the lower (first array) and the upper (second array) time bounds +#' corresponding to Dates. Each array must have the same dimensions as Dates. +#' If 'Dates' parameter is NULL, 'time_bounds' are not used. It is NULL by +#' default. +#'@param startdates A vector of dates that will be used for the filenames +#' when saving the data in multiple files (single_file = FALSE). It must be a +#' vector of the same length as the start date dimension of data. It must be a +#' vector of class \code{Dates}, \code{'POSIXct'} or character with lenghts +#' between 1 and 10. If it is NULL, the coordinate corresponding the the start +#' date dimension or the first Date of each time step will be used as the name +#' of the files. It is NULL by default. +#'@param varname A character string indicating the name of the variable to be +#' saved. +#'@param metadata A named list where each element is a variable containing the +#' corresponding information. The information must be contained in a list of +#' lists for each variable. +#'@param Datasets A vector of character string indicating the names of the +#' datasets. +#'@param sdate_dim A character string indicating the name of the start date +#' dimension. By default, it is set to 'sdate'. It can be NULL if there is no +#' start date dimension. +#'@param ftime_dim A character string indicating the name of the forecast time +#' dimension. By default, it is set to 'time'. It can be NULL if there is no +#' forecast time dimension. +#'@param dat_dim A character string indicating the name of dataset dimension. +#' By default, it is set to 'dataset'. It can be NULL if there is no dataset +#' dimension. +#'@param var_dim A character string indicating the name of variable dimension. +#' By default, it is set to 'var'. It can be NULL if there is no variable +#' dimension. +#'@param memb_dim A character string indicating the name of the member +#' dimension. By default, it is set to 'member'. It can be NULL if there is no +#' member dimension. +#'@param drop_dims (optional) A vector of character strings indicating the +#' dimension names of length 1 that need to be dropped in order that they don't +#' appear in the netCDF file. Only is allowed to drop dimensions that are not +#' used in the computation. The dimensions used in the computation are the ones +#' specified in: sdate_dim, ftime_dim, dat_dim, var_dim and memb_dim. It is +#' NULL by default. +#'@param single_file A logical value indicating if all object is saved in a +#' single file (TRUE) or in multiple files (FALSE). When it is FALSE, +#' the array is separated for datasets, variable and start date. When there are +#' no specified time dimensions, the data will be saved in a single file by +#' default. The output file name when 'single_file' is TRUE is a character +#' string containing: '__.nc'; when it is FALSE, +#' it is '_.nc'. It is FALSE by default. 
+#'@param extra_string (Optional) A character string to be included as part of +#' the file name, for instance, to identify member or realization. When +#' single_file is TRUE, the 'extra_string' will substitute all the default +#' file name; when single_file is FALSE, the 'extra_string' will be added +#' in the file name as: '__.nc'. It is NULL by +#' default. +#'@param global_attrs (Optional) A list with elements containing the global +#' attributes to be saved in the NetCDF. +#'@param units_hours_since (Optional) A logical value only available for the +#' case: Dates have forecast time and start date dimension, single_file is +#' TRUE and 'time_bounds' is NULL. When it is TRUE, it saves the forecast time +#' with units of 'hours since'; if it is FALSE, the time units will be a number +#' of time steps with its corresponding frequency (e.g. n days, n months or n +#' hours). It is FALSE by default. +#' +#'@return Multiple or single NetCDF files containing the data array.\cr +#'\item{\code{single_file is TRUE}}{ +#' All data is saved in a single file located in the specified destination +#' path with the following name (by default): +#' '__.nc'. Multiple variables +#' are saved separately in the same file. The forecast time units +#' are calculated from each start date (if sdate_dim is not NULL) or from +#' the time step. If 'units_hours_since' is TRUE, the forecast time units +#' will be 'hours since '. If 'units_hours_since' is FALSE, +#' the forecast time units are extracted from the frequency of the time steps +#' (hours, days, months); if no frequency is found, the units will be ’hours +#' since’. When the time units are 'hours since' the time ateps are assumed to +#' be equally spaced. +#'} +#'\item{\code{single_file is FALSE}}{ +#' The data array is subset and stored into multiple files. Each file +#' contains the data subset for each start date, variable and dataset. Files +#' with different variables and datasets are stored in separated directories +#' within the following directory tree: 'destination/Dataset/variable/'. +#' The name of each file will be by default: '_.nc'. +#' The forecast time units are calculated from each start date (if sdate_dim +#' is not NULL) or from the time step. The forecast time units will be 'hours +#' since '. 
+#'} +#' +#'@examples +#'data <- lonlat_temp_st$exp$data +#'lon <- lonlat_temp_st$exp$coords$lon +#'lat <- lonlat_temp_st$exp$coords$lat +#'coords <- list(lon = lon, lat = lat) +#'Datasets <- lonlat_temp_st$exp$attrs$Datasets +#'varname <- 'tas' +#'Dates <- lonlat_temp_st$exp$attrs$Dates +#'metadata <- lonlat_temp_st$exp$attrs$Variable$metadata +#'SaveExp(data = data, coords = coords, Datasets = Datasets, varname = varname, +#' Dates = Dates, metadata = metadata, single_file = TRUE, +#' ftime_dim = 'ftime', var_dim = 'var', dat_dim = 'dataset') +#' +#'@import easyNCDF +#'@importFrom s2dv Reorder +#'@import multiApply +#'@importFrom ClimProjDiags Subset +#'@export +SaveExp <- function(data, destination = "./", coords = NULL, + Dates = NULL, time_bounds = NULL, startdates = NULL, + varname = NULL, metadata = NULL, Datasets = NULL, + sdate_dim = 'sdate', ftime_dim = 'time', + memb_dim = 'member', dat_dim = 'dataset', var_dim = 'var', + drop_dims = NULL, single_file = FALSE, extra_string = NULL, + global_attrs = NULL, units_hours_since = FALSE) { + ## Initial checks + # data + if (is.null(data)) { + stop("Parameter 'data' cannot be NULL.") + } + dimnames <- names(dim(data)) + if (is.null(dimnames)) { + stop("Parameter 'data' must be an array with named dimensions.") + } + if (!is.null(attributes(data)$dimensions)) { + attributes(data)$dimensions <- NULL + } + # destination + if (!is.character(destination) | length(destination) > 1) { + stop("Parameter 'destination' must be a character string of one element ", + "indicating the name of the file (including the folder if needed) ", + "where the data will be saved.") + } + # drop_dims + if (!is.null(drop_dims)) { + if (!is.character(drop_dims) | any(!drop_dims %in% names(dim(data)))) { + warning("Parameter 'drop_dims' must be character string containing ", + "the data dimension names to be dropped. It will not be used.") + } else if (!all(dim(data)[drop_dims] %in% 1)) { + warning("Parameter 'drop_dims' can only contain dimension names ", + "that are of length 1. It will not be used.") + } else if (any(drop_dims %in% c(ftime_dim, sdate_dim, dat_dim, memb_dim, var_dim))) { + warning("Parameter 'drop_dims' contains dimensions used in the computation. ", + "It will not be used.") + drop_dims <- NULL + } else { + data <- Subset(x = data, along = drop_dims, + indices = lapply(1:length(drop_dims), function(x) 1), + drop = 'selected') + dimnames <- names(dim(data)) + } + } + # coords + if (!is.null(coords)) { + if (!inherits(coords, 'list')) { + stop("Parameter 'coords' must be a named list of coordinates.") + } + if (is.null(names(coords))) { + stop("Parameter 'coords' must have names corresponding to coordinates.") + } + } else { + coords <- sapply(dimnames, function(x) 1:dim(data)[x]) + } + # varname + if (is.null(varname)) { + varname <- 'X' + } else if (length(varname) > 1) { + multiple_vars <- TRUE + } else { + multiple_vars <- FALSE + } + if (!all(sapply(varname, is.character))) { + stop("Parameter 'varname' must be a character string with the ", + "variable names.") + } + # single_file + if (!inherits(single_file, 'logical')) { + warning("Parameter 'single_file' must be a logical value. 
It will be ", + "set as FALSE.") + single_file <- FALSE + } + # extra_string + if (!is.null(extra_string)) { + if (!is.character(extra_string)) { + stop("Parameter 'extra_string' must be a character string.") + } + } + # global_attrs + if (!is.null(global_attrs)) { + if (!inherits(global_attrs, 'list')) { + stop("Parameter 'global_attrs' must be a list.") + } + } + + ## Dimensions checks + # Spatial coordinates + if (!any(dimnames %in% .KnownLonNames()) | + !any(dimnames %in% .KnownLatNames())) { + lon_dim <- NULL + lat_dim <- NULL + } else { + lon_dim <- dimnames[which(dimnames %in% .KnownLonNames())] + lat_dim <- dimnames[which(dimnames %in% .KnownLatNames())] + } + # ftime_dim + if (!is.null(ftime_dim)) { + if (!is.character(ftime_dim)) { + stop("Parameter 'ftime_dim' must be a character string.") + } + if (!all(ftime_dim %in% dimnames)) { + stop("Parameter 'ftime_dim' is not found in 'data' dimension. Set it ", + "as NULL if there is no forecast time dimension.") + } + } + # sdate_dim + if (!is.null(sdate_dim)) { + if (!is.character(sdate_dim)) { + stop("Parameter 'sdate_dim' must be a character string.") + } + if (!all(sdate_dim %in% dimnames)) { + stop("Parameter 'sdate_dim' is not found in 'data' dimension.") + } + } + # memb_dim + if (!is.null(memb_dim)) { + if (!is.character(memb_dim)) { + stop("Parameter 'memb_dim' must be a character string.") + } + if (!all(memb_dim %in% dimnames)) { + stop("Parameter 'memb_dim' is not found in 'data' dimension. Set it ", + "as NULL if there is no member dimension.") + } + } + # dat_dim + if (!is.null(dat_dim)) { + if (!is.character(dat_dim)) { + stop("Parameter 'dat_dim' must be a character string.") + } + if (!all(dat_dim %in% dimnames)) { + stop("Parameter 'dat_dim' is not found in 'data' dimension. Set it ", + "as NULL if there is no Datasets dimension.") + } + n_datasets <- dim(data)[dat_dim] + } else { + n_datasets <- 1 + } + # var_dim + if (!is.null(var_dim)) { + if (!is.character(var_dim)) { + stop("Parameter 'var_dim' must be a character string.") + } + if (!all(var_dim %in% dimnames)) { + stop("Parameter 'var_dim' is not found in 'data' dimension. Set it ", + "as NULL if there is no variable dimension.") + } + n_vars <- dim(data)[var_dim] + } else { + n_vars <- 1 + } + # minimum dimensions + if (all(dimnames %in% c(var_dim, dat_dim))) { + if (!single_file) { + warning("Parameter data has only ", + paste(c(var_dim, dat_dim), collapse = ' and '), " dimensions ", + "and it cannot be splitted in multiple files. All data will ", + "be saved in a single file.") + single_file <- TRUE + } + } + # Dates (1): initial checks + if (!is.null(Dates)) { + if (!any(inherits(Dates, "POSIXct"), inherits(Dates, "Date"))) { + stop("Parameter 'Dates' must be of 'POSIXct' or 'Dates' class.") + } + if (is.null(dim(Dates))) { + stop("Parameter 'Dates' must have dimension names.") + } + if (all(is.null(ftime_dim), is.null(sdate_dim))) { + warning("Parameters 'ftime_dim' and 'sdate_dim' can't both be NULL ", + "if 'Dates' are used. 'Dates' will not be used.") + Dates <- NULL + } + # sdate_dim in Dates + if (!is.null(sdate_dim)) { + if (!sdate_dim %in% names(dim(Dates))) { + warning("Parameter 'sdate_dim' is not found in 'Dates' dimension. ", + "Dates will not be used.") + Dates <- NULL + } + } + # ftime_dim in Dates + if (!is.null(ftime_dim)) { + if (!ftime_dim %in% names(dim(Dates))) { + warning("Parameter 'ftime_dim' is not found in 'Dates' dimension. 
", + "Dates will not be used.") + Dates <- NULL + } + } + } + # time_bounds + if (!is.null(time_bounds)) { + if (!inherits(time_bounds, 'list')) { + stop("Parameter 'time_bounds' must be a list with two dates arrays.") + } + time_bounds_dims <- lapply(time_bounds, function(x) dim(x)) + if (!identical(time_bounds_dims[[1]], time_bounds_dims[[2]])) { + stop("Parameter 'time_bounds' must have 2 arrays with same dimensions.") + } + if (is.null(Dates)) { + time_bounds <- NULL + } else { + name_tb <- sort(names(time_bounds_dims[[1]])) + name_dt <- sort(names(dim(Dates))) + if (!identical(dim(Dates)[name_dt], time_bounds_dims[[1]][name_tb])) { + stop(paste0("Parameter 'Dates' and 'time_bounds' must have same length ", + "of all dimensions.")) + } + } + } + # Dates (2): Check dimensions + if (!is.null(Dates)) { + if (any(dim(Dates)[!names(dim(Dates)) %in% c(ftime_dim, sdate_dim)] != 1)) { + stop("Parameter 'Dates' can have only 'sdate_dim' and 'ftime_dim' ", + "dimensions of length greater than 1.") + } + # drop dimensions of length 1 different from sdate_dim and ftime_dim + dim(Dates) <- dim(Dates)[names(dim(Dates)) %in% c(ftime_dim, sdate_dim)] + + # add ftime if needed + if (is.null(ftime_dim)) { + warning("A 'time' dimension of length 1 will be added to 'Dates'.") + dim(Dates) <- c(time = 1, dim(Dates)) + dim(data) <- c(time = 1, dim(data)) + dimnames <- names(dim(data)) + ftime_dim <- 'time' + if (!is.null(time_bounds)) { + time_bounds <- lapply(time_bounds, function(x) { + dim(x) <- c(time = 1, dim(x)) + return(x) + }) + } + units_hours_since <- TRUE + } + # add sdate if needed + if (is.null(sdate_dim)) { + if (!single_file) { + dim(Dates) <- c(dim(Dates), sdate = 1) + dim(data) <- c(dim(data), sdate = 1) + dimnames <- names(dim(data)) + sdate_dim <- 'sdate' + if (!is.null(time_bounds)) { + time_bounds <- lapply(time_bounds, function(x) { + dim(x) <- c(dim(x), sdate = 1) + return(x) + }) + } + if (!is.null(startdates)) { + if (length(startdates) != 1) { + warning("Parameter 'startdates' must be of length 1 if 'sdate_dim' is NULL.", + "They won't be used.") + startdates <- NULL + } + } + } + units_hours_since <- TRUE + } + } + # startdates + if (!is.null(Dates)) { + # check startdates + if (is.null(startdates)) { + startdates <- Subset(Dates, along = ftime_dim, 1, drop = 'selected') + } else if (any(nchar(startdates) > 10, nchar(startdates) < 1)) { + warning("Parameter 'startdates' should be a character string containing ", + "the start dates in the format 'yyyy-mm-dd', 'yyyymmdd', 'yyyymm', ", + "'POSIXct' or 'Dates' class. Files will be named with Dates instead.") + startdates <- Subset(Dates, along = ftime_dim, 1, drop = 'selected') + } + } else if (!single_file) { + warning("Dates must be provided if 'data' must be saved in separated files. 
", + "All data will be saved in a single file.") + single_file <- TRUE + } + # startdates + if (is.null(startdates)) { + if (is.null(sdate_dim)) { + startdates <- 'XXX' + } else { + startdates <- rep('XXX', dim(data)[sdate_dim]) + } + } else { + if (any(inherits(startdates, "POSIXct"), inherits(startdates, "Date"))) { + startdates <- format(startdates, "%Y%m%d") + } + if (!is.null(sdate_dim)) { + if (dim(data)[sdate_dim] != length(startdates)) { + warning(paste0("Parameter 'startdates' doesn't have the same length ", + "as dimension '", sdate_dim,"', it will not be used.")) + startdates <- Subset(Dates, along = ftime_dim, 1, drop = 'selected') + startdates <- format(startdates, "%Y%m%d") + } + } + } + + # Datasets + if (is.null(Datasets)) { + Datasets <- rep('XXX', n_datasets ) + } + if (inherits(Datasets, 'list')) { + Datasets <- names(Datasets) + } + if (n_datasets > length(Datasets)) { + warning("Dimension 'Datasets' in 'data' is greater than those listed in ", + "element 'Datasets' and the first element will be reused.") + Datasets <- c(Datasets, rep(Datasets[1], n_datasets - length(Datasets))) + } else if (n_datasets < length(Datasets)) { + warning("Dimension 'Datasets' in 'data' is smaller than those listed in ", + "element 'Datasets' and only the firsts elements will be used.") + Datasets <- Datasets[1:n_datasets] + } + + ## NetCDF dimensions definition + excluded_dims <- var_dim + if (!is.null(Dates)) { + excluded_dims <- c(excluded_dims, sdate_dim, ftime_dim) + } + if (!single_file) { + excluded_dims <- c(excluded_dims, dat_dim) + } + + ## Unknown dimensions check + alldims <- c(dat_dim, var_dim, sdate_dim, lon_dim, lat_dim, ftime_dim, memb_dim) + if (!all(dimnames %in% alldims)) { + unknown_dims <- dimnames[which(!dimnames %in% alldims)] + memb_dim <- c(memb_dim, unknown_dims) + } + + filedims <- c(dat_dim, var_dim, sdate_dim, lon_dim, lat_dim, ftime_dim, memb_dim) + filedims <- filedims[which(!filedims %in% excluded_dims)] + + # Delete unneded coords + coords[c(names(coords)[!names(coords) %in% filedims])] <- NULL + out_coords <- NULL + for (i_coord in filedims) { + # vals + if (i_coord %in% names(coords)) { + if (length(coords[[i_coord]]) != dim(data)[i_coord]) { + warning(paste0("Coordinate '", i_coord, "' has different lenght as ", + "its dimension and it will not be used.")) + out_coords[[i_coord]] <- 1:dim(data)[i_coord] + } else if (is.numeric(coords[[i_coord]])) { + out_coords[[i_coord]] <- as.vector(coords[[i_coord]]) + } else { + out_coords[[i_coord]] <- 1:dim(data)[i_coord] + } + } else { + out_coords[[i_coord]] <- 1:dim(data)[i_coord] + } + dim(out_coords[[i_coord]]) <- dim(data)[i_coord] + + ## metadata + if (i_coord %in% names(metadata)) { + if ('variables' %in% names(attributes(metadata[[i_coord]]))) { + # from Start: 'lon' or 'lat' + attrs <- attributes(metadata[[i_coord]])[['variables']] + attrs[[i_coord]]$dim <- NULL + attr(out_coords[[i_coord]], 'variables') <- attrs + } else if (inherits(metadata[[i_coord]], 'list')) { + # from Start and Load: main var + attr(out_coords[[i_coord]], 'variables') <- list(metadata[[i_coord]]) + names(attributes(out_coords[[i_coord]])$variables) <- i_coord + } else if (!is.null(attributes(metadata[[i_coord]]))) { + # from Load + attrs <- attributes(metadata[[i_coord]]) + # We remove because some attributes can't be saved + attrs <- NULL + attr(out_coords[[i_coord]], 'variables') <- list(attrs) + names(attributes(out_coords[[i_coord]])$variables) <- i_coord + } + } + } + + if (!single_file) { + for (i in 1:n_datasets) { + path <- 
file.path(destination, Datasets[i], varname) + for (j in 1:n_vars) { + if (!dir.exists(path[j])) { + dir.create(path[j], recursive = TRUE) + } + startdates <- gsub("-", "", startdates) + dim(startdates) <- c(length(startdates)) + names(dim(startdates)) <- sdate_dim + if (is.null(dat_dim) & is.null(var_dim)) { + data_subset <- data + } else if (is.null(dat_dim)) { + data_subset <- Subset(data, c(var_dim), list(j), drop = 'selected') + } else if (is.null(var_dim)) { + data_subset <- Subset(data, along = c(dat_dim), list(i), drop = 'selected') + } else { + data_subset <- Subset(data, c(dat_dim, var_dim), list(i, j), drop = 'selected') + } + target <- names(dim(data_subset))[which(!names(dim(data_subset)) %in% c(sdate_dim, ftime_dim))] + target_dims_data <- c(target, ftime_dim) + if (is.null(Dates)) { + input_data <- list(data_subset, startdates) + target_dims <- list(target_dims_data, NULL) + } else if (!is.null(time_bounds)) { + input_data <- list(data_subset, startdates, Dates, + time_bounds[[1]], time_bounds[[2]]) + target_dims = list(target_dims_data, NULL, + ftime_dim, ftime_dim, ftime_dim) + } else { + input_data <- list(data_subset, startdates, Dates) + target_dims = list(target_dims_data, NULL, ftime_dim) + } + Apply(data = input_data, + target_dims = target_dims, + fun = .saveexp, + destination = path[j], + coords = out_coords, + ftime_dim = ftime_dim, + varname = varname[j], + metadata_var = metadata[[varname[j]]], + extra_string = extra_string, + global_attrs = global_attrs) + } + } + } else { + # time_bnds + if (!is.null(time_bounds)) { + time_bnds <- c(time_bounds[[1]], time_bounds[[2]]) + } + # Dates + remove_metadata_dim <- TRUE + if (!is.null(Dates)) { + if (is.null(sdate_dim)) { + sdates <- Dates[1] + # ftime definition + leadtimes <- as.numeric(difftime(Dates, sdates, units = "hours")) + } else { + # sdate definition + sdates <- Subset(Dates, along = ftime_dim, 1, drop = 'selected') + differ <- as.numeric(difftime(sdates, sdates[1], units = "hours")) + dim(differ) <- dim(data)[sdate_dim] + differ <- list(differ) + names(differ) <- sdate_dim + out_coords <- c(differ, out_coords) + attrs <- list(units = paste('hours since', sdates[1]), + calendar = 'proleptic_gregorian', longname = sdate_dim) + attr(out_coords[[sdate_dim]], 'variables')[[sdate_dim]] <- attrs + # ftime definition + Dates <- Reorder(Dates, c(ftime_dim, sdate_dim)) + differ_ftime <- array(dim = dim(Dates)) + for (i in 1:length(sdates)) { + differ_ftime[, i] <- as.numeric(difftime(Dates[, i], Dates[1, i], + units = "hours")) + } + dim(differ_ftime) <- dim(Dates) + leadtimes <- Subset(differ_ftime, along = sdate_dim, 1, drop = 'selected') + if (!all(apply(differ_ftime, 1, function(x){length(unique(x)) == 1}))) { + warning("Time steps are not equal for all start dates. 
Only ", + "forecast time values for the first start date will be saved ", + "correctly.") + } + } + if (all(!units_hours_since, is.null(time_bounds))) { + if (all(diff(leadtimes/24) == 1)) { + # daily values + units <- 'days' + leadtimes_vals <- round(leadtimes/24) + 1 + } else if (all(diff(leadtimes/24) %in% c(28, 29, 30, 31))) { + # monthly values + units <- 'months' + leadtimes_vals <- round(leadtimes/(30.437*24)) + 1 + } else { + # other frequency + units <- 'hours' + leadtimes_vals <- leadtimes + 1 + } + } else { + units <- paste('hours since', paste(sdates, collapse = ', ')) + leadtimes_vals <- leadtimes + } + + # Add time_bnds + if (!is.null(time_bounds)) { + if (is.null(sdate_dim)) { + sdates <- Dates[1] + time_bnds <- c(time_bounds[[1]], time_bounds[[2]]) + leadtimes_bnds <- as.numeric(difftime(time_bnds, sdates, units = "hours")) + dim(leadtimes_bnds) <- c(dim(Dates), bnds = 2) + } else { + # assuming they have sdate and ftime + time_bnds <- lapply(time_bounds, function(x) { + x <- Reorder(x, c(ftime_dim, sdate_dim)) + return(x) + }) + time_bnds <- c(time_bounds[[1]], time_bounds[[2]]) + dim(time_bnds) <- c(dim(Dates), bnds = 2) + differ_bnds <- array(dim = c(dim(time_bnds))) + for (i in 1:length(sdates)) { + differ_bnds[, i, ] <- as.numeric(difftime(time_bnds[, i, ], Dates[1, i], + units = "hours")) + } + # NOTE (TODO): Add a warning when they are not equally spaced? + leadtimes_bnds <- Subset(differ_bnds, along = sdate_dim, 1, drop = 'selected') + } + # Add time_bnds + leadtimes_bnds <- Reorder(leadtimes_bnds, c('bnds', ftime_dim)) + leadtimes_bnds <- list(leadtimes_bnds) + names(leadtimes_bnds) <- 'time_bnds' + out_coords <- c(leadtimes_bnds, out_coords) + attrs <- list(units = paste('hours since', paste(sdates, collapse = ', ')), + calendar = 'proleptic_gregorian', + long_name = 'time bounds', unlim = FALSE) + attr(out_coords[['time_bnds']], 'variables')$time_bnds <- attrs + } + # Add ftime var + dim(leadtimes_vals) <- dim(data)[ftime_dim] + leadtimes_vals <- list(leadtimes_vals) + names(leadtimes_vals) <- ftime_dim + out_coords <- c(leadtimes_vals, out_coords) + attrs <- list(units = units, calendar = 'proleptic_gregorian', + longname = ftime_dim, + dim = list(list(name = ftime_dim, unlim = TRUE))) + if (!is.null(time_bounds)) { + attrs$bounds = 'time_bnds' + } + attr(out_coords[[ftime_dim]], 'variables')[[ftime_dim]] <- attrs + for (j in 1:n_vars) { + remove_metadata_dim <- FALSE + metadata[[varname[j]]]$dim <- list(list(name = ftime_dim, unlim = TRUE)) + } + # Reorder ftime_dim to last + if (length(dim(data)) != which(names(dim(data)) == ftime_dim)) { + order <- c(names(dim(data))[which(!names(dim(data)) %in% c(ftime_dim))], ftime_dim) + data <- Reorder(data, order) + } + } + # var definition + extra_info_var <- NULL + for (j in 1:n_vars) { + varname_j <- varname[j] + metadata_j <- metadata[[varname_j]] + if (is.null(var_dim)) { + out_coords[[varname_j]] <- data + } else { + out_coords[[varname_j]] <- Subset(data, var_dim, j, drop = 'selected') + } + if (!is.null(metadata_j)) { + if (remove_metadata_dim) metadata_j$dim <- NULL + attr(out_coords[[varname_j]], 'variables') <- list(metadata_j) + names(attributes(out_coords[[varname_j]])$variables) <- varname_j + } + # Add global attributes + if (!is.null(global_attrs)) { + attributes(out_coords[[varname_j]])$global_attrs <- global_attrs + } + } + if (is.null(extra_string)) { + first_sdate <- startdates[1] + last_sdate <- startdates[length(startdates)] + gsub("-", "", first_sdate) + file_name <- paste0(paste(c(varname, + 
gsub("-", "", first_sdate), + gsub("-", "", last_sdate)), + collapse = '_'), ".nc") + } else { + nc <- substr(extra_string, nchar(extra_string)-2, nchar(extra_string)) + if (nc == ".nc") { + file_name <- extra_string + } else { + file_name <- paste0(extra_string, ".nc") + } + } + full_filename <- file.path(destination, file_name) + ArrayToNc(out_coords, full_filename) + } +} + +.saveexp <- function(data, coords, destination = "./", + startdates = NULL, dates = NULL, + time_bnds1 = NULL, time_bnds2 = NULL, + ftime_dim = 'time', varname = 'var', + metadata_var = NULL, extra_string = NULL, + global_attrs = NULL) { + remove_metadata_dim <- TRUE + if (!is.null(dates)) { + if (!any(is.null(time_bnds1), is.null(time_bnds2))) { + time_bnds <- c(time_bnds1, time_bnds2) + time_bnds <- as.numeric(difftime(time_bnds, dates[1], units = "hours")) + dim(time_bnds) <- c(dim(data)[ftime_dim], bnds = 2) + time_bnds <- Reorder(time_bnds, c('bnds', ftime_dim)) + time_bnds <- list(time_bnds) + names(time_bnds) <- 'time_bnds' + coords <- c(time_bnds, coords) + attrs <- list(units = paste('hours since', dates[1]), + calendar = 'proleptic_gregorian', + longname = 'time bounds') + attr(coords[['time_bnds']], 'variables')$time_bnds <- attrs + } + # Add ftime_dim + differ <- as.numeric(difftime(dates, dates[1], units = "hours")) + dim(differ) <- dim(data)[ftime_dim] + differ <- list(differ) + names(differ) <- ftime_dim + coords <- c(differ, coords) + attrs <- list(units = paste('hours since', dates[1]), + calendar = 'proleptic_gregorian', + longname = ftime_dim, + dim = list(list(name = ftime_dim, unlim = TRUE))) + if (!is.null(time_bnds1)) { + attrs$bounds = 'time_bnds' + } + attr(coords[[ftime_dim]], 'variables')[[ftime_dim]] <- attrs + metadata_var$dim <- list(list(name = ftime_dim, unlim = TRUE)) + remove_metadata_dim <- FALSE + } + # Add data + coords[[varname]] <- data + if (!is.null(metadata_var)) { + if (remove_metadata_dim) metadata_var$dim <- NULL + attr(coords[[varname]], 'variables') <- list(metadata_var) + names(attributes(coords[[varname]])$variables) <- varname + } + # Add global attributes + if (!is.null(global_attrs)) { + attributes(coords[[varname]])$global_attrs <- global_attrs + } + + if (is.null(extra_string)) { + file_name <- paste0(varname, "_", startdates, ".nc") + } else { + file_name <- paste0(varname, "_", extra_string, "_", startdates, ".nc") + } + full_filename <- file.path(destination, file_name) + ArrayToNc(coords, full_filename) +} \ No newline at end of file diff --git a/modules/Saving/Saving.R b/modules/Saving/Saving.R index fc9fe4eebd0d51f1b33a9a397a566d424215d99c..60e886b96d9e952fa4a9b151818f7ce5f30698c8 100644 --- a/modules/Saving/Saving.R +++ b/modules/Saving/Saving.R @@ -14,6 +14,8 @@ source("modules/Saving/R/get_time.R") source("modules/Saving/R/get_latlon.R") source("modules/Saving/R/get_global_attributes.R") source("modules/Saving/R/drop_dims.R") +source("modules/Saving/R/save_metrics_scorecards.R") +source("modules/Saving/R/tmp/CST_SaveExp.R") Saving <- function(recipe, data, skill_metrics = NULL, diff --git a/modules/Scorecards/R/tmp/ClimPalette.R b/modules/Scorecards/R/tmp/ClimPalette.R new file mode 100644 index 0000000000000000000000000000000000000000..eed3c802cca93b87d3cf6b55fcc04a53051bb56d --- /dev/null +++ b/modules/Scorecards/R/tmp/ClimPalette.R @@ -0,0 +1,78 @@ +#'Generate Climate Color Palettes +#' +#'Generates a colorblind friendly color palette with color ranges useful in +#'climate temperature variable plotting. 
+#' +#'@param palette A character string of palette. The current choices: +#' \itemize{ +#' \item{'bluered': from blue through white to red.} +#' \item{'redblue': from red through white to blue.} +#' \item{'yellowred': from yellow through orange to red.} +#' \item{'redyellow': from red through orange to yellow.} +#' \item{'purpleorange': from purple through white to orange.} +#' \item{'orangepurple': from orange through white to purple.} +#' } +#'@param n A number indicating how many colors to generate. +#' +#'@return +#'ClimPalette() returns the function that generates the color palette and the +#'attribute 'na_color'.\cr +#'ClimColors() returns a vector of the colors. +#' +#'@examples +#'lims <- seq(-1, 1, length.out = 21) +#' +#'cb <- ColorBarContinuous(lims, color_fun = ClimPalette('redyellow'), plot = FALSE) +#' +#'cols <- ClimColors(20) +#'cb <- ColorBarContinuous(lims, cols, plot = FALSE) +#' +#'@importFrom grDevices colorRampPalette +#'@export +ClimPalette <- function(palette = "bluered") { + if (palette == "bluered") { + colorbar <- colorRampPalette(rev(c("#67001f", "#b2182b", "#d6604d", + "#f4a582", "#fddbc7", "#f7f7f7", + "#d1e5f0", "#92c5de", "#4393c3", + "#2166ac", "#053061"))) + attr(colorbar, 'na_color') <- 'pink' + } else if (palette == "redblue") { + colorbar <- colorRampPalette(c("#67001f", "#b2182b", "#d6604d", + "#f4a582", "#fddbc7", "#f7f7f7", + "#d1e5f0", "#92c5de", "#4393c3", + "#2166ac", "#053061")) + attr(colorbar, 'na_color') <- 'pink' + } else if (palette == "yellowred") { + colorbar <- colorRampPalette(c("#ffffcc", "#ffeda0", "#fed976", + "#feb24c", "#fd8d3c", "#fc4e2a", + "#e31a1c", "#bd0026", "#800026")) + attr(colorbar, 'na_color') <- 'pink' + } else if (palette == "redyellow") { + colorbar <- colorRampPalette(rev(c("#ffffcc", "#ffeda0", "#fed976", + "#feb24c", "#fd8d3c", "#fc4e2a", + "#e31a1c", "#bd0026", "#800026"))) + attr(colorbar, 'na_color') <- 'pink' + } else if (palette == "purpleorange") { + colorbar <- colorRampPalette(c("#2d004b", "#542789", "#8073ac", + "#b2abd2", "#d8daeb", "#f7f7f7", + "#fee0b6", "#fdb863", "#e08214", + "#b35806", "#7f3b08")) + attr(colorbar, 'na_color') <- 'pink' + } else if (palette == "orangepurple") { + colorbar <- colorRampPalette(rev(c("#2d004b", "#542789", "#8073ac", + "#b2abd2", "#d8daeb", "#f7f7f7", + "#fee0b6", "#fdb863", "#e08214", + "#b35806", "#7f3b08"))) + attr(colorbar, 'na_color') <- 'pink' + } else { + stop("Parameter 'palette' must be one of 'bluered', 'redblue', 'yellowred'", + "'redyellow', 'purpleorange' or 'orangepurple'.") + } + colorbar +} + +#'@rdname ClimPalette +#'@export +ClimColors <- function(n, palette = "bluered") { + ClimPalette(palette)(n) +} diff --git a/modules/Scorecards/R/tmp/ColorBarContinuous.R b/modules/Scorecards/R/tmp/ColorBarContinuous.R new file mode 100644 index 0000000000000000000000000000000000000000..a4ef933fe032cf60a22e992e1bcae175b79d4961 --- /dev/null +++ b/modules/Scorecards/R/tmp/ColorBarContinuous.R @@ -0,0 +1,594 @@ +#'Draws a Continuous Color Bar +#' +#'Generates a color bar to use as colouring function for map plots and +#'optionally draws it (horizontally or vertically) to be added to map +#'multipanels or plots. It is possible to draw triangles at the ends of the +#'colour bar to represent values that go beyond the range of interest. A +#'number of options is provided to adjust the colours and the position and +#'size of the components. 
The drawn colour bar spans a whole figure region +#'and is compatible with figure layouts.\cr\cr +#'The generated colour bar consists of a set of breaks that define the +#'length(brks) - 1 intervals to classify each of the values in each of the +#'grid cells of a two-dimensional field. The corresponding grid cell of a +#'given value of the field will be coloured in function of the interval it +#'belongs to.\cr\cr +#'The only mandatory parameters are 'var_limits' or 'brks' (in its second +#'format, see below). +#' +#'@param brks Can be provided in two formats: +#'\itemize{ +#' \item{A single value with the number of breaks to be generated +#' automatically, between the minimum and maximum specified in 'var_limits' +#' (both inclusive). Hence the parameter 'var_limits' is mandatory if 'brks' +#' is provided with this format. If 'bar_limits' is additionally provided, +#' values only between 'bar_limits' will be generated. The higher the value +#' of 'brks', the smoother the plot will look.} +#' \item{A vector with the actual values of the desired breaks. Values will +#' be reordered by force to ascending order. If provided in this format, no +#' other parameters are required to generate/plot the colour bar.} +#'} +#' This parameter is optional if 'var_limits' is specified. If 'brks' not +#' specified but 'cols' is specified, it will take as value length(cols) + 1. +#' If 'cols' is not specified either, 'brks' will take 21 as value. +#'@param cols Vector of length(brks) - 1 valid colour identifiers, for each +#' interval defined by the breaks. This parameter is optional and will be +#' filled in with a vector of length(brks) - 1 colours generated with the +#' function provided in 'color_fun' (\code{clim.colors} by default).\cr 'cols' +#' can have one additional colour at the beginning and/or at the end with the +#' aim to colour field values beyond the range of interest represented in the +#' colour bar. If any of these extra colours is provided, parameter +#' 'triangle_ends' becomes mandatory in order to disambiguate which of the +#' ends the colours have been provided for. +#'@param vertical TRUE/FALSE for vertical/horizontal colour bar +#' (disregarded if plot = FALSE). +#'@param subsampleg The first of each subsampleg breaks will be ticked on the +#' colorbar. Takes by default an approximation of a value that yields a +#' readable tick arrangement (extreme breaks always ticked). If set to 0 or +#' lower, no labels are drawn. See the code of the function for details or +#' use 'extra_labels' for customized tick arrangements. +#'@param bar_limits Vector of two numeric values with the extremes of the +#' range of values represented in the colour bar. If 'var_limits' go beyond +#' this interval, the drawing of triangle extremes is triggered at the +#' corresponding sides, painted in 'col_inf' and 'col_sup'. Either of them +#' can be set as NA and will then take as value the corresponding extreme in +#' 'var_limits' (hence a triangle end won't be triggered for these sides). +#' Takes as default the extremes of 'brks' if available, else the same values +#' as 'var_limits'. +#'@param var_limits Vector of two numeric values with the minimum and maximum +#' values of the field to represent. These are used to know whether to draw +#' triangle ends at the extremes of the colour bar and what colour to fill +#' them in with. If not specified, take the same value as the extremes of +#' 'brks'. Hence the parameter 'brks' is mandatory if 'var_limits' is not +#' specified. 
+#'@param triangle_ends Vector of two logical elements, indicating whether to +#' force the drawing of triangle ends at each of the extremes of the colour +#' bar. This choice is automatically made from the provided 'brks', +#' 'bar_limits', 'var_limits', 'col_inf' and 'col_sup', but the behaviour +#' can be manually forced to draw or not to draw the triangle ends with this +#' parameter. If 'cols' is provided, 'col_inf' and 'col_sup' will take +#' priority over 'triangle_ends' when deciding whether to draw the triangle +#' ends or not. +#'@param col_inf Colour to fill the inferior triangle end with. Useful if +#' specifying colours manually with parameter 'cols', to specify the colour +#' and to trigger the drawing of the lower extreme triangle, or if 'cols' is +#' not specified, to replace the colour automatically generated by ColorBar(). +#'@param col_sup Colour to fill the superior triangle end with. Useful if +#' specifying colours manually with parameter 'cols', to specify the colour +#' and to trigger the drawing of the upper extreme triangle, or if 'cols' is +#' not specified, to replace the colour automatically generated by ColorBar(). +#'@param color_fun Function to generate the colours of the color bar. Must +#' take an integer and must return as many colours. The returned colour vector +#' can have the attribute 'na_color', with a colour to draw NA values. This +#' parameter is set by default to ClimPalette(). +#'@param plot Logical value indicating whether to only compute its breaks and +#' colours (FALSE) or to also draw it on the current device (TRUE). +#'@param draw_ticks Whether to draw ticks for the labels along the colour bar +#' (TRUE) or not (FALSE). TRUE by default. Disregarded if 'plot = FALSE'. +#'@param draw_separators Whether to draw black lines in the borders of each of +#' the colour rectancles of the colour bar (TRUE) or not (FALSE). FALSE by +#' default. Disregarded if 'plot = FALSE'. +#'@param triangle_ends_scale Scale factor for the drawn triangle ends of the +#' colour bar, if drawn at all. Takes 1 by default (rectangle triangle +#' proportional to the thickness of the colour bar). Disregarded if +#' 'plot = FALSE'. +#'@param extra_labels Numeric vector of extra labels to draw along axis of +#' the colour bar. The number of provided decimals will be conserved. +#' Disregarded if 'plot = FALSE'. +#'@param title Title to draw on top of the colour bar, most commonly with the +#' units of the represented field in the neighbour figures. Empty by default. +#'@param title_scale Scale factor for the 'title' of the colour bar. +#' Takes 1 by default. +#'@param label_scale Scale factor for the labels of the colour bar. +#' Takes 1 by default. +#'@param tick_scale Scale factor for the length of the ticks of the labels +#' along the colour bar. Takes 1 by default. +#'@param extra_margin Extra margins to be added around the colour bar, +#' in the format c(y1, x1, y2, x2). The units are margin lines. Takes +#' rep(0, 4) by default. +#'@param label_digits Number of significant digits to be displayed in the +#' labels of the colour bar, usually to avoid too many decimal digits +#' overflowing the figure region. This does not have effect over the labels +#' provided in 'extra_labels'. Takes 4 by default. +#'@param ... Arguments to be passed to the method. 
Only accepts the following +#' graphical parameters:\cr adj ann ask bg bty cex.lab cex.main cex.sub cin +#' col.axis col.lab col.main col.sub cra crt csi cxy err family fg fig fin +#' font font.axis font.lab font.main font.sub lend lheight ljoin lmitre lty +#' lwd mai mex mfcol mfrow mfg mkh oma omd omi page pch pin plt pty smo srt +#' tck tcl usr xaxp xaxs xaxt xlog xpd yaxp yaxs yaxt ylbias ylog.\cr For more +#' information about the parameters see `par`. +#' +#'@return +#'\item{brks}{ +#' Breaks used for splitting the range in intervals. +#'} +#'\item{cols}{ +#' Colours generated for each of the length(brks) - 1 intervals. +#' Always of length length(brks) - 1. +#'} +#'\item{col_inf}{ +#' Colour used to draw the lower triangle end in the colour +#' bar (NULL if not drawn at all). +#'} +#'\item{col_sup}{ +#' Colour used to draw the upper triangle end in the colour +#' bar (NULL if not drawn at all). +#'} +#' +#'@examples +#'cols <- c("dodgerblue4", "dodgerblue1", "forestgreen", "yellowgreen", "white", +#' "white", "yellow", "orange", "red", "saddlebrown") +#'lims <- seq(-1, 1, 0.2) +#'cb <- ColorBarContinuous(lims, cols, plot = FALSE) +#' +#'@importFrom grDevices col2rgb rgb +#'@import utils +#'@export +ColorBarContinuous <- function(brks = NULL, cols = NULL, vertical = TRUE, + subsampleg = NULL, bar_limits = NULL, var_limits = NULL, + triangle_ends = NULL, col_inf = NULL, col_sup = NULL, + color_fun = ClimPalette(), plot = TRUE, + draw_ticks = TRUE, draw_separators = FALSE, + triangle_ends_scale = 1, extra_labels = NULL, + title = NULL, title_scale = 1, + label_scale = 1, tick_scale = 1, + extra_margin = rep(0, 4), label_digits = 4, ...) { + # Required checks + if ((is.null(brks) || length(brks) < 2) && is.null(bar_limits) && is.null(var_limits)) { + stop("At least one of 'brks' with the desired breaks, 'bar_limits' or ", + "'var_limits' must be provided to generate the colour bar.") + } + + # Check brks + if (!is.null(brks)) { + if (!is.numeric(brks)) { + stop("Parameter 'brks' must be numeric if specified.") + } else if (length(brks) > 1) { + reorder <- sort(brks, index.return = TRUE) + if (!is.null(cols)) { + cols <- cols[reorder$ix[which(reorder$ix <= length(cols))]] + } + brks <- reorder$x + } + } + + # Check bar_limits + if (!is.null(bar_limits)) { + if (!(all(is.na(bar_limits) | is.numeric(bar_limits)) && (length(bar_limits) == 2))) { + stop("Parameter 'bar_limits' must be a vector of two numeric elements or NAs.") + } + } + + # Check var_limits + if (!is.null(var_limits)) { + if (!(is.numeric(var_limits) && (length(var_limits) == 2))) { + stop("Parameter 'var_limits' must be a numeric vector of length 2.") + } else if (anyNA(var_limits)) { + stop("Parameter 'var_limits' must not contain NA values.") + } else if (any(is.infinite(var_limits))) { + stop("Parameter 'var_limits' must not contain infinite values.") + } + } + + # Check cols + if (!is.null(cols)) { + if (!is.character(cols)) { + stop("Parameter 'cols' must be a vector of character strings.") + } else if (any(!sapply(cols, .IsColor))) { + stop("Parameter 'cols' must contain valid colour identifiers.") + } + } + + # Check color_fun + if (!is.function(color_fun)) { + stop("Parameter 'color_fun' must be a colour-generator function.") + } + + # Check integrity among brks, bar_limits and var_limits + if (is.null(brks) || (length(brks) < 2)) { + if (is.null(brks)) { + if (is.null(cols)) { + brks <- 21 + } else { + brks <- length(cols) + 1 + } + } + if (is.null(bar_limits) || anyNA(bar_limits)) { + # var_limits is defined + if 
(is.null(bar_limits)) { + bar_limits <- c(NA, NA) + } + half_width <- 0.5 * (var_limits[2] - var_limits[1]) / (brks - 1) + bar_limits[which(is.na(bar_limits))] <- c(var_limits[1] - half_width, var_limits[2] + half_width)[which(is.na(bar_limits))] + brks <- seq(bar_limits[1], bar_limits[2], length.out = brks) + } else if (is.null(var_limits)) { + # bar_limits is defined + var_limits <- bar_limits + half_width <- 0.5 * (var_limits[2] - var_limits[1]) / (brks - 1) + brks <- seq(bar_limits[1], bar_limits[2], length.out = brks) + var_limits[1] <- var_limits[1] + half_width / 50 + } else { + # both bar_limits and var_limits are defined + brks <- seq(bar_limits[1], bar_limits[2], length.out = brks) + } + } else if (is.null(bar_limits)) { + if (is.null(var_limits)) { + # brks is defined + bar_limits <- c(head(brks, 1), tail(brks, 1)) + var_limits <- bar_limits + half_width <- 0.5 * (var_limits[2] - var_limits[1]) / (length(brks) - 1) + var_limits[1] <- var_limits[1] + half_width / 50 + } else { + # brks and var_limits are defined + bar_limits <- c(head(brks, 1), tail(brks, 1)) + } + } else { + # brks and bar_limits are defined + # or + # brks, bar_limits and var_limits are defined + if (head(brks, 1) != bar_limits[1] || tail(brks, 1) != bar_limits[2]) { + stop("Parameters 'brks' and 'bar_limits' are inconsistent.") + } + } + + # Check col_inf + if (!is.null(col_inf)) { + if (!.IsColor(col_inf)) { + stop("Parameter 'col_inf' must be a valid colour identifier.") + } + } + + # Check col_sup + if (!is.null(col_sup)) { + if (!.IsColor(col_sup)) { + stop("Parameter 'col_sup' must be a valid colour identifier.") + } + } + + # Check triangle_ends + if (!is.null(triangle_ends) && (!is.logical(triangle_ends) || length(triangle_ends) != 2)) { + stop("Parameter 'triangle_ends' must be a logical vector with two elements.") + } + teflc <- triangle_ends_from_limit_cols <- c(!is.null(col_inf), !is.null(col_sup)) + if (is.null(triangle_ends) && is.null(col_inf) && is.null(col_sup)) { + triangle_ends <- c(FALSE, FALSE) + if (bar_limits[1] >= var_limits[1]) { + triangle_ends[1] <- TRUE + } + if (bar_limits[2] < var_limits[2]) { + triangle_ends[2] <- TRUE + } + } else if (!is.null(triangle_ends) && is.null(col_inf) && is.null(col_sup)) { + triangle_ends <- triangle_ends + } else if (is.null(triangle_ends) && (!is.null(col_inf) || !is.null(col_sup))) { + triangle_ends <- teflc + } else if (any(teflc != triangle_ends)) { + if (!is.null(brks) && length(brks) > 1 && !is.null(cols) && length(cols) >= length(brks)) { + triangle_ends <- teflc + } else if (!is.null(cols)) { + triangle_ends <- teflc + } else { + triangle_ends <- triangle_ends + } + } + if (plot && !is.null(var_limits)) { + if ((bar_limits[1] >= var_limits[1]) && !triangle_ends[1]) { + warning("There are variable values smaller or equal to the lower limit ", + "of the colour bar and the lower triangle end has been ", + "disabled. These will be painted in the colour for NA values.") + } + if ((bar_limits[2] < var_limits[2]) && !triangle_ends[2]) { + warning("There are variable values greater than the higher limit ", + "of the colour bar and the higher triangle end has been ", + "disabled. 
These will be painted in the colour for NA values.") + } + } + + # Generate colours if needed + if (is.null(cols)) { + cols <- color_fun(length(brks) - 1 + sum(triangle_ends)) + attr_bk <- attributes(cols) + if (triangle_ends[1]) { + if (is.null(col_inf)) col_inf <- head(cols, 1) + cols <- cols[-1] + } + if (triangle_ends[2]) { + if (is.null(col_sup)) col_sup <- tail(cols, 1) + cols <- cols[-length(cols)] + } + attributes(cols) <- attr_bk + } else if ((length(cols) != (length(brks) - 1))) { + stop("Incorrect number of 'brks' and 'cols'. There must be one more break than the number of colours.") + } + + # Check vertical + if (!is.logical(vertical)) { + stop("Parameter 'vertical' must be TRUE or FALSE.") + } + + # Check extra_labels + if (is.null(extra_labels)) { + extra_labels <- numeric(0) + } + if (!is.numeric(extra_labels)) { + stop("Parameter 'extra_labels' must be numeric.") + } else { + if (any(extra_labels > bar_limits[2]) || any(extra_labels < bar_limits[1])) { + stop("Parameter 'extra_labels' must not contain ticks beyond the color bar limits.") + } + } + extra_labels <- sort(extra_labels) + + # Check subsampleg + primes <- function(x) { + # Courtesy of Chase. See http://stackoverflow.com/questions/6424856/r-function-for-returning-all-factors + x <- as.integer(x) + div <- seq_len(abs(x)) + factors <- div[x %% div == 0L] + factors <- list(neg = -factors, pos = factors) + return(factors) + } + remove_final_tick <- FALSE + added_final_tick <- TRUE + if (is.null(subsampleg)) { + subsampleg <- 1 + while (length(brks) / subsampleg > 15 - 1) { + next_factor <- primes((length(brks) - 1) / subsampleg)$pos + next_factor <- next_factor[length(next_factor) - ifelse(length(next_factor) > 2, 1, 0)] + subsampleg <- subsampleg * next_factor + } + if (subsampleg > (length(brks) - 1) / 4) { + subsampleg <- max(1, round(length(brks) / 4)) + extra_labels <- c(extra_labels, bar_limits[2]) + added_final_tick <- TRUE + if ((length(brks) - 1) %% subsampleg < (length(brks) - 1) / 4 / 2) { + remove_final_tick <- TRUE + } + } + } else if (!is.numeric(subsampleg)) { + stop("Parameter 'subsampleg' must be numeric.") + } + subsampleg <- round(subsampleg) + draw_labels <- TRUE + if ((subsampleg) < 1) { + draw_labels <- FALSE + } + + # Check plot + if (!is.logical(plot)) { + stop("Parameter 'plot' must be logical.") + } + + # Check draw_separators + if (!is.logical(draw_separators)) { + stop("Parameter 'draw_separators' must be logical.") + } + + # Check triangle_ends_scale + if (!is.numeric(triangle_ends_scale)) { + stop("Parameter 'triangle_ends_scale' must be numeric.") + } + + # Check draw_ticks + if (!is.logical(draw_ticks)) { + stop("Parameter 'draw_ticks' must be logical.") + } + + # Check title + if (is.null(title)) { + title <- '' + } + if (!is.character(title)) { + stop("Parameter 'title' must be a character string.") + } + + # Check title_scale + if (!is.numeric(title_scale)) { + stop("Parameter 'title_scale' must be numeric.") + } + + # Check label_scale + if (!is.numeric(label_scale)) { + stop("Parameter 'label_scale' must be numeric.") + } + + # Check tick_scale + if (!is.numeric(tick_scale)) { + stop("Parameter 'tick_scale' must be numeric.") + } + + # Check extra_margin + if (!is.numeric(extra_margin) || length(extra_margin) != 4) { + stop("Parameter 'extra_margin' must be a numeric vector of length 4.") + } + + # Check label_digits + if (!is.numeric(label_digits)) { + stop("Parameter 'label_digits' must be numeric.") + } + label_digits <- round(label_digits) + + # Process the user graphical 
parameters that may be passed in the call + ## Graphical parameters to exclude + excludedArgs <- c("cex", "cex.axis", "col", "lab", "las", "mar", "mgp", "new", "ps") + userArgs <- .FilterUserGraphicArgs(excludedArgs, ...) + + # + # Plotting colorbar + # ~~~~~~~~~~~~~~~~~~~ + # + if (plot) { + pars_to_save <- c('mar', 'cex', names(userArgs), 'mai', 'mgp', 'las', 'xpd') + saved_pars <- par(pars_to_save) + par(mar = c(0, 0, 0, 0), cex = 1) + image(1, 1, t(t(1)), col = rgb(0, 0, 0, 0), axes = FALSE, xlab = '', ylab = '') + # Get the availale space + figure_size <- par('fin') + cs <- par('csi') + # This allows us to assume we always want to plot horizontally + if (vertical) { + figure_size <- rev(figure_size) + } + # pannel_to_redraw <- par('mfg') + # .SwitchToFigure(pannel_to_redraw[1], pannel_to_redraw[2]) + # Load the user parameters + par(new = TRUE) + par(userArgs) + # Set up color bar plot region + margins <- c(0.0, 0, 0.0, 0) + cex_title <- 1 * title_scale + cex_labels <- 0.9 * label_scale + cex_ticks <- -0.3 * tick_scale + spaceticklab <- max(-cex_ticks, 0) + if (vertical) { + margins[1] <- margins[1] + (1.2 * cex_labels * 3 + spaceticklab) * cs + margins <- margins + extra_margin[c(4, 1:3)] * cs + } else { + margins[1] <- margins[1] + (1.2 * cex_labels * 1 + spaceticklab) * cs + margins <- margins + extra_margin * cs + } + if (title != '') { + margins[3] <- margins[3] + (1.0 * cex_title) * cs + } + margins[3] <- margins[3] + sqrt(figure_size[2] / (margins[1] + margins[3])) * + figure_size[2] / 6 * ifelse(title != '', 0.5, 0.8) + # Set side margins + margins[2] <- margins[2] + figure_size[1] / 16 + margins[4] <- margins[4] + figure_size[1] / 16 + triangle_ends_prop <- 1 / 32 * triangle_ends_scale + triangle_ends_cex <- triangle_ends_prop * figure_size[2] + if (triangle_ends[1]) { + margins[2] <- margins[2] + triangle_ends_cex + } + if (triangle_ends[2]) { + margins[4] <- margins[4] + triangle_ends_cex + } + ncols <- length(cols) + # Set up the points of triangles + # Compute the proportion of horiz. 
space occupied by one plot unit + prop_unit <- (1 - (margins[2] + margins[4]) / figure_size[1]) / ncols + # Convert triangle height to plot inits + triangle_height <- triangle_ends_prop / prop_unit + left_triangle <- list(x = c(1, 1 - triangle_height, 1) - 0.5, + y = c(1.4, 1, 0.6)) + right_triangle <- list(x = c(ncols, ncols + triangle_height, ncols) + 0.5, + y = c(1.4, 1, 0.6)) + # Draw the color squares and title + if (vertical) { + par(mai = c(margins[2:4], margins[1]), + mgp = c(0, spaceticklab + 0.2, 0), las = 1) + d <- 4 + image(1, 1:ncols, t(1:ncols), axes = FALSE, col = cols, + xlab = '', ylab = '') + title(ylab = title, line = cex_title * (0.2 + 0.1), cex.lab = cex_title) + # Draw top and bottom border lines + lines(c(0.6, 0.6), c(1 - 0.5, ncols + 0.5)) + lines(c(1.4, 1.4), c(1 - 0.5, ncols + 0.5)) + # Rotate triangles + names(left_triangle) <- rev(names(left_triangle)) + names(right_triangle) <- rev(names(right_triangle)) + } else { + # The term - cex_labels / 4 * (3 / cex_labels - 1) was found by + # try and error + par(mai = margins, + mgp = c(0, cex_labels / 2 + spaceticklab + - cex_labels / 4 * (3 / cex_labels - 1), 0), + las = 1) + d <- 1 + image(1:ncols, 1, t(t(1:ncols)), axes = FALSE, col = cols, + xlab = '', ylab = '') + title(title, line = cex_title * (0.2 + 0.1), cex.main = cex_title) + # Draw top and bottom border lines + lines(c(1 - 0.5, ncols + 0.5), c(0.6, 0.6)) + lines(c(1 - 0.5, ncols + 0.5), c(1.4, 1.4)) + tick_length <- -0.4 + } + # Draw the triangles + par(xpd = TRUE) + if (triangle_ends[1]) { + # Draw left triangle + polygon(left_triangle$x, left_triangle$y, col = col_inf, border = NA) + lines(left_triangle$x, left_triangle$y) + } + if (triangle_ends[2]) { + # Draw right triangle + polygon(right_triangle$x, right_triangle$y, col = col_sup, border = NA) + lines(right_triangle$x, right_triangle$y) + } + par(xpd = FALSE) + + # Put the separators + if (vertical) { + if (draw_separators) { + for (i in 1:(ncols - 1)) { + lines(c(0.6, 1.4), c(i, i) + 0.5) + } + } + if (draw_separators || is.null(col_inf)) { + lines(c(0.6, 1.4), c(0.5, 0.5)) + } + if (draw_separators || is.null(col_sup)) { + lines(c(0.6, 1.4), c(ncols + 0.5, ncols + 0.5)) + } + } else { + if (draw_separators) { + for (i in 1:(ncols - 1)) { + lines(c(i, i) + 0.5, c(0.6, 1.4)) + } + } + if (draw_separators || is.null(col_inf)) { + lines(c(0.5, 0.5), c(0.6, 1.4)) + } + if (draw_separators || is.null(col_sup)) { + lines(c(ncols + 0.5, ncols + 0.5), c(0.6, 1.4)) + } + } + # Put the ticks + plot_range <- length(brks) - 1 + var_range <- tail(brks, 1) - head(brks, 1) + extra_labels_at <- ((extra_labels - head(brks, 1)) / var_range) * plot_range + 0.5 + at <- seq(1, length(brks), subsampleg) + labels <- brks[at] + # Getting rid of next-to-last tick if too close to last one + if (remove_final_tick) { + at <- at[-length(at)] + labels <- labels[-length(labels)] + } + labels <- signif(labels, label_digits) + if (added_final_tick) { + extra_labels[length(extra_labels)] <- signif(tail(extra_labels, 1), label_digits) + } + at <- at - 0.5 + at <- c(at, extra_labels_at) + labels <- c(labels, extra_labels) + tick_reorder <- sort(at, index.return = TRUE) + at <- tick_reorder$x + if (draw_labels) { + labels <- labels[tick_reorder$ix] + } else { + labels <- FALSE + } + axis(d, at = at, tick = draw_ticks, labels = labels, cex.axis = cex_labels, tcl = cex_ticks) + par(saved_pars) + } + invisible(list(brks = brks, cols = cols, col_inf = col_inf, col_sup = col_sup)) +} diff --git a/modules/Scorecards/R/tmp/LoadMetrics.R 
b/modules/Scorecards/R/tmp/LoadMetrics.R index e5e154213377be7ec82100507a414d93e091d0b6..4f803c5a5455e4be8d1c353f5b0e84c24cf91c4c 100644 --- a/modules/Scorecards/R/tmp/LoadMetrics.R +++ b/modules/Scorecards/R/tmp/LoadMetrics.R @@ -12,11 +12,10 @@ #'@param var A character string following the format from #' variable-dictionary.yml from verification suite (TO DO: multiple variables). #' The accepted names are: 'psl', 'tas', 'sfcWind', 'prlr'. -#'@param start.year A numeric indicating the start year of the reference period -#'@param end.year A numeric indicating the end year of the reference period -#'@param start.months A vector indicating the numbers of the start months -#'@param forecast.months A vector indicating the numbers of the forecast months -#'@param input.path A character string indicating the path where metrics output +#'@param period A character string indicating the start and end years of the +#' reference period (e.g. '1993-2016') +#'@param start_months A vector indicating the numbers of the start months +#'@param input_path A character string indicating the path where metrics output #' files from verification suite are saved (or any other compatible files) #' #'@return A is a list by system and reference containing an array of with @@ -28,20 +27,29 @@ #'loaded_metrics <- LoadMetrics(system = c('ECMWF-SEAS5','DWD-GFCS2.1'), #' reference = 'ERA5', #' var = 'tas', -#' start.year = 1993, -#' end.year = 2016, +#' period = '1993-2016', #' metrics = c('mean_bias', 'enscorr', 'rpss', 'crpss', 'enssprerr'), -#' start.months = sprintf("%02d", 1:12), -#' forecast.months = 1:6, -#' input.path = '/esarchive/scratch/nmilders/scorecards_data/input_data') +#' start_months = sprintf("%02d", 1:12), +#' calib_method = 'raw', +#' input_path = '/esarchive/scratch/nmilders/scorecards_data/input_data') +#'} +#'@import easyNCDF +#'@import multiApply +#'@export -LoadMetrics <- function(system, reference, var, start.year, end.year, - metrics, start.months, forecast.months, - inf_to_na = FALSE, - input.path) { + +system <- 'ECMWF-SEAS5' +reference <- 'ERA5' +var <- 'tas' +period <- '1993-2016' +metrics <- 'rps_syear' +start_months <- 1:2 +input_path <- '/esarchive/scratch/nmilders/scorecards_data/syear/testing/Skill/' +calib_method <- 'raw' +syear <- TRUE + +LoadMetrics <- function(input_path, system, reference, var, period, + metrics, start_months, calib_method = NULL, + inf_to_na = FALSE) { # Initial checks ## system @@ -59,51 +67,38 @@ LoadMetrics <- function(system, reference, var, start.year, end.year, "names.") } if (length(var) > 1) { - warning("Parameter 'var' must be of length one. Only the first value ", + warning("Parameter 'var' must be of length one. 
Only the first value ", "will be used.") var <- var[1] } - ## start.year - if (!is.numeric(start.year)) { - stop("Parameter 'start.year' must be a numeric value.") - } - ## end.year - if (!is.numeric(end.year)) { - stop("Parameter 'end.year' must be a numeric value.") - } ## metrics if (!is.character(metrics)) { stop("Parameter 'metrics' cannot be NULL.") } - ## start.months - if (is.character(start.months)) { - warning("Parameter 'start.months' must be a numeric vector indicating ", + ## start_months + if (is.character(start_months)) { + warning("Parameter 'start_months' must be a numeric vector indicating ", "the starting months.") - start.months <- as.numeric(start.months) + start_months <- as.numeric(start_months) } - if (!is.numeric(start.months)) { - stop("Parameter 'start.months' must be a numeric vector indicating ", + if (!is.numeric(start_months)) { + stop("Parameter 'start_months' must be a numeric vector indicating ", "the starting months.") } - start.months <- sprintf("%02d", start.months) + start_months <- sprintf("%02d", start_months) ## Check if sdates are continuous or discrete - if (all(diff(as.numeric(start.months)) == 1)) { - consecutive_start.months <- TRUE + if (all(diff(as.numeric(start_months)) == 1)) { + consecutive_start_months <- TRUE } else { - consecutive_start.months <- FALSE + consecutive_start_months <- FALSE } - ## forecast.months - if (!is.numeric(forecast.months)) { - stop("Parameter 'forecast.months' must be a numeric vector indicating ", - "the starting months.") - } - ## input.path - if (!is.character(input.path)) { - stop("Parameter 'input.path must be a character string.") + ## input_path + if (!is.character(input_path)) { + stop("Parameter 'input_path must be a character string.") } - if (length(input.path) > 1) { - input.path <- input.path[1] - warning("Parameter 'input.path' has length greater than 1 and only the ", + if (length(input_path) > 1) { + input_path <- input_path[1] + warning("Parameter 'input_path' has length greater than 1 and only the ", "first element will be used.") } @@ -111,9 +106,6 @@ LoadMetrics <- function(system, reference, var, start.year, end.year, system <- gsub('.','', system, fixed = T) reference <- gsub('.','', reference, fixed = T) - period <- paste0(start.year, "-", end.year) - - ## Define empty list to saved data all_metrics <- sapply(system, function(x) NULL) ## Load data for each system for (sys in 1:length(system)) { @@ -122,42 +114,55 @@ LoadMetrics <- function(system, reference, var, start.year, end.year, ## Load data for each reference for (ref in 1:length(reference)) { ## Call function to load metrics data - met <- .Loadmetrics(input.path = input.path, # recipe$Run$output, - system = system[sys], - reference = reference[ref], - var = var, - period = period, - start.months = start.months, - forecast.months = forecast.months, - metrics = metrics) + met_by_smonth <- NULL + for (met in metrics) { + result <- .loadmetrics(input_path = input_path, + system = system[sys], + reference = reference[ref], + var = var, + period = period, + start_months = start_months, + calib_method = calib_method, + metric = met) + + result_attr <- attributes(result) + met_by_smonth <- abind::abind(met_by_smonth, result, along = length(dim(result)) + 1) + } + attributes(met_by_smonth) <- result_attr[-1] + # names(dim(met_by_smonth)) <- c(names(result_attr$dim), 'metric') + + dim(met_by_smonth) <- c(dim(result), metric = length(metrics)) ## Save metric data as array in reference list - by_reference[[reference[ref]]] <- met + 
by_reference[[reference[ref]]] <- met_by_smonth ## Remove -Inf from crpss data if variable is precipitation if (inf_to_na) { - by_reference[[reference]][by_reference[[reference]]==-Inf] <- NA + by_reference[[reference]][by_reference[[reference]]==-Inf] <- NA } } ## close loop on reference ## Save reference data in list of system all_metrics[[system[sys]]] <- by_reference } ## close loop on system + attributes(all_metrics)$metrics <- metrics + attributes(all_metrics)$start_months <- start_months + return(all_metrics) } ## close function -############################################################ +########################################################### -.Loadmetrics <- function(input.path, system, reference, - var, period, start.months, - forecast.months, metrics) { +.loadmetrics <- function(input_path, system, reference, + var, period, start_months, + calib_method, metric) { ## Load data for each start date - allfiles <- sapply(start.months, function(m) { - paste0(input.path, "/", system, "/", var, - "/scorecards_", system, "_", reference, "_", - var, "-skill_", period, "_s", m, # mod.pressure, + allfiles <- sapply(start_months, function(m) { + paste0(input_path, "/", system, "/", reference, "/", calib_method, "/", + var, "/scorecards_", system, "_", reference, "_", + var, "_", metric, "_", period, "_s", m, # mod.pressure, ".nc")}) allfiles_exist <- sapply(allfiles, file.exists) - + # Check dims files_exist_by_month <- seq(1:length(allfiles))[allfiles_exist] allfiledims <- sapply(allfiles[allfiles_exist], easyNCDF::NcReadDims) @@ -168,7 +173,7 @@ LoadMetrics <- function(system, reference, var, start.year, end.year, num_dims <- numeric(dim(allfiledims)[1]) for (i in 1:dim(allfiledims)[1]) { if (length(unique(allfiledims[i,])) > 1) { - warning(paste0("Dimensions of system ", system," with var ", var, + warning(paste0("Dimensions of system ", system," with var ", var, " don't match.")) } num_dims[i] <- max(allfiledims[i,]) # We take the largest dimension @@ -181,7 +186,7 @@ LoadMetrics <- function(system, reference, var, start.year, end.year, array_met_by_sdate <- Apply(data = allfiles, target_dims = 'dat', fun = function(x) { if (file.exists(x)) { - res <- easyNCDF::NcToArray(x, vars_to_read = metrics, unlist = T, + res <- easyNCDF::NcToArray(x, vars_to_read = metric, unlist = T, drop_var_dim = T) names(dim(res)) <- NULL } else { @@ -190,26 +195,25 @@ LoadMetrics <- function(system, reference, var, start.year, end.year, } res})$output1 - dim(array_met_by_sdate) <- c(metric = length(metrics), allfiledims[-1,1], + dim(array_met_by_sdate) <- c(allfiledims[-1,1], sdate = length(allfiles)) - # Attributes # Read attributes from the first existing file if ("region" %in% rownames(allfiledims)) { file_for_att <- ncdf4::nc_open(allfiles[allfiles_exist[1]]) region <- ncdf4::ncatt_get(file_for_att, 'region') - ncdf4::nc_close(file_for_att) + ncdf4::nc_close(file_for_att) attributes(array_met_by_sdate)$region <- region } else { - lon <- easyNCDF::NcToArray(allfiles[allfiles_exist][1], vars_to_read = 'longitude', + lon <- easyNCDF::NcToArray(allfiles[allfiles_exist][1], vars_to_read = 'longitude', unlist = T, drop_var_dim = T) - lat <- easyNCDF::NcToArray(allfiles[allfiles_exist][1], vars_to_read = 'latitude', + lat <- easyNCDF::NcToArray(allfiles[allfiles_exist][1], vars_to_read = 'latitude', unlist = T, drop_var_dim = T) attributes(array_met_by_sdate)$lon <- lon attributes(array_met_by_sdate)$lat <- lat } - attributes(array_met_by_sdate)$metrics <- metrics - 
attributes(array_met_by_sdate)$start.months <- start.months - attributes(array_met_by_sdate)$forecast.months <- forecast.months + return(array_met_by_sdate) } + + \ No newline at end of file diff --git a/modules/Scorecards/R/tmp/SCPlotScorecard.R b/modules/Scorecards/R/tmp/SCPlotScorecard.R deleted file mode 100644 index 4373057b6e0d3901abcb3fc27c09006f5156131d..0000000000000000000000000000000000000000 --- a/modules/Scorecards/R/tmp/SCPlotScorecard.R +++ /dev/null @@ -1,444 +0,0 @@ -#'Scorecards function create simple scorecards by region (types 1 & 3) -#' -#'@description This function creates a scorecard for a single system and -#'reference combination, showing data by region and forecast month. -#' -#'@param data A multidimensional array containing spatially aggregated metrics -#' data with dimensions: metric, region, sdate and ftime. -#'@param row.dim A character string indicating the dimension name to show in the -#' rows of the plot. -#'@param subrow.dim A character string indicating the dimension name to show in -#' the sub-rows of the plot. -#'@param col.dim A character string indicating the dimension name to show in the -#' columns of the plot. -#'@param subcol.dim A character string indicating the dimension name to show in -#' the sub-columns of the plot. -#'@param legend.dim A character string indicating the dimension name to use for -#' the legend. -#'@param row.names A vector of character strings with row display names. -#'@param subrow.names A vector of character strings with sub-row display names. -#'@param col.names A vector of character strings with column display names. -#'@param subcol.names A vector of character strings with sub-column display -#' names. -#'@param row.title A character string for the title of the row names. -#'@param subrow.title A character string for the title of the sub-row names. -#'@param table.title A character string for the title of the plot. -#'@param table.subtitle A character string for the sub-title of the plot. -#'@param legend.breaks A vector of numerics or a list of vectors of numerics, -#' containing the breaks for the legends. If a vector is given as input, then -#' these breaks will be repeated for each legend.dim. A list of vectors can be -#' given as input if the legend.dims require different breaks. This parameter -#' is required even if the legend is not plotted, to define the colors in the -#' scorecard table. -#'@param plot.legend A logical value to determine if the legend is plotted. -#'@param legend.width A numeric value to define the width of the legend bars. -#'@param legend.height A numeric value to define the height of the legend bars. -#'@param palette A vector of character strings or a list of vectors of -#' character strings containing the colors to use in the legends. If a vector -#' is given as input, then these colors will be used for each legend.dim. A -#' list of vectors can be given as input if different colors are desired for -#' the legend.dims. This parameter must be included even if the the legend is -#' not plotted, to define the colors in the scorecard table. -#'@param colorunder A character string or of vector of character strings -#' defining the colors to use for data values with are inferior to the lowest -#' breaks value. This parameter will also plot a inferior triangle in the -#' legend bar. The parameter can be set to NULL if there are no inferior values. -#' If a character string is given this color will be applied to all legend.dims. 
-#'@param colorsup A character string or of vector of character strings -#' defining the colors to use for data values with are superior to the highest -#' breaks value. This parameter will also plot a inferior triangle in the -#' legend bar. The parameter can be set to NULL if there are no superior values. -#' If a character string is given this color will be applied to all legend.dims. -#'@param round.decimal A numeric indicating to which decimal point the data -#' is to be displayed in the scorecard table. -#'@param font.size A numeric indicating the font size on the scorecard table. -#'@param fileout A path of the location to save the scorecard plots. -#' -#'@return An image file containing the scorecard. -#'@example -#'data <- array(rnorm(1000), dim = c('sdate' = 12, 'metric' = 4, 'region' = 3, -#' 'time' = 6)) -#'row.names <- c('Tropics', 'Extra-tropical NH', 'Extra-tropical SH') -#'col.names <- c('Mean bias (K)', 'Correlation', 'RPSS','CRPSS') -#'SCPlotScorecard(data = data, row.names = row.names, col.names = col.names, -#' subcol.names = month.abb[as.numeric(1:12)], -#' row.title = 'Region', subrow.title = 'Forecast Month', -#' col.title = 'Start date', -#' table.title = "Temperature of ECMWF System 5", -#' table.subtitle = "(Ref: ERA5 1994-2016)", -#' fileout = 'test.png') -#' -#'@import kableExtra -#'@import s2dv -#'@import ClimProjDiags -#'@export -SCPlotScorecard <- function(data, row.dim = 'region', subrow.dim = 'time', - col.dim = 'metric', subcol.dim = 'sdate', - legend.dim = 'metric', row.names = NULL, - subrow.names = NULL, col.names = NULL, - subcol.names = NULL, row.title = NULL, - subrow.title = NULL, col.title = NULL, - table.title = NULL, table.subtitle = NULL, - legend.breaks = NULL, plot.legend = TRUE, - label.scale = NULL, legend.width = NULL, - legend.height = NULL, palette = NULL, - colorunder = NULL, colorsup = NULL, - round.decimal = 2, font.size = 1.1, - legend.white.space = NULL, - col1.width = NULL, col2.width = NULL, - fileout = './scorecard.png') { - # Input parameter checks - ## Check data - if (!is.array(data)) { - stop("Parameter 'data' must be a numeric array.") - } - ## Check row.dim - if (!is.character(row.dim)) { - stop("Parameter 'row.dim' must be a character string.") - } - if (!row.dim %in% names(dim(data))) { - stop("Parameter 'row.dim' is not found in 'data' dimensions.") - } - ## Check row.names - if (!is.null(row.names)) { - if (length(row.names) != as.numeric(dim(data)[row.dim])) { - stop("Parameter 'row.names' must have the same length of dimension 'row.dims'.") - } - } else { - row.names <- as.character(1:dim(data)[row.dim]) - } - ## Check subrow.dim - if (!is.character(subrow.dim)) { - stop("Parameter 'subrow.dim' must be a character string.") - } - if (!subrow.dim %in% names(dim(data))) { - stop("Parameter 'subrow.dim' is not found in 'data' dimensions.") - } - ## Check subrow.names - if (!is.null(subrow.names)) { - if (length(subrow.names) != as.numeric(dim(data)[subrow.dim])) { - stop("Parameter 'subrow.names' must have the same length of dimension 'subrow.dims'.") - } - } else { - subrow.names <- as.character(1:dim(data)[subrow.dim]) - } - ## Check col.dim - if (!is.character(col.dim)) { - stop("Parameter 'col.dim' must be a character string.") - } - if (!col.dim %in% names(dim(data))) { - stop("Parameter 'col.dim' is not found in 'data' dimensions.") - } - ## Check col.names - if (!is.null(col.names)) { - if (length(col.names) != as.numeric(dim(data)[col.dim])) { - stop("Parameter 'col.names' must have the same length of dimension 
'col.dims'.") - } - } else { - col.names <- as.character(1:dim(data)[col.dim]) - } - ## Check subcol.dim - if (!is.character(subcol.dim)) { - stop("Parameter 'subcol.dim' must be a character string.") - } - if (!subcol.dim %in% names(dim(data))) { - stop("Parameter 'subcol.dim' is not found in 'data' dimensions.") - } - ## Check subcol.names - if (!is.null(subcol.names)) { - if (length(subcol.names) != as.numeric(dim(data)[subcol.dim])) { - stop("Parameter 'subcol.names' must have the same length of dimension 'subcol.dims'.") - } - } else { - subcol.names <- as.character(1:dim(data)[subcol.dim]) - } - ## Check legend.dim - if (!is.character(legend.dim)) { - stop("Parameter 'legend.dim' must be a character string.") - } - if (!legend.dim %in% names(dim(data))) { - stop("Parameter 'legend.dim' is not found in 'data' dimensions.") - } - ## Check row.title inputs - if (!is.null(row.title)) { - if (!is.character(row.title)) { - stop("Parameter 'row.title must be a character string.") - } - } else { - row.title <- "" - } - ## Check subrow.title - if (!is.null(subrow.title)) { - if (!is.character(subrow.title)) { - stop("Parameter 'subrow.title must be a character string.") - } - } else { - subrow.title <- "" - } - ## Check col.title - if (!is.null(col.title)) { - if (!is.character(col.title)) { - stop("Parameter 'col.title must be a character string.") - } - } else { - col.title <- "" - } - ## Check table.title - if (!is.null(table.title)) { - if (!is.character(table.title)) { - stop("Parameter 'table.title' must be a character string.") - } - } else { - table.title <- "" - } - ## Check table.subtitle - if (!is.null(table.subtitle)) { - if (!is.character(table.subtitle)) { - stop("Parameter 'table.subtitle' must be a character string.") - } - } else { - table.subtitle <- "" - } - # Check legend.breaks - if (is.vector(legend.breaks) && is.numeric(legend.breaks)) { - legend.breaks <- rep(list(legend.breaks), as.numeric(dim(data)[legend.dim])) - } else if (is.null(legend.breaks)) { - legend.breaks <- rep(list(seq(-1, 1, 0.2)), as.numeric(dim(data)[legend.dim])) - } else if (inherits(legend.breaks, 'list')) { - stopifnot(length(legend.breaks) == as.numeric(dim(data)[legend.dim])) - } else { - stop("Parameter 'legend.breaks' must be a numeric vector, a list or NULL.") - } - ## Check plot.legend - if (!inherits(plot.legend, 'logical')) { - stop("Parameter 'plot.legend' must be a logical value.") - } - ## Check label.scale - if (is.null(label.scale)) { - label.scale <- 1.4 - } else { - if (!is.numeric(label.scale) | length(label.scale) != 1) { - stop("Parameter 'label.scale' must be a numeric value of length 1.") - } - } - ## Check legend.width - if (is.null(legend.width)) { - legend.width <- length(subcol.names) * 46.5 - } else { - if (!is.numeric(legend.width) | length(legend.width) != 1) { - stop("Parameter 'legend.width' must be a numeric value of length 1.") - } - } - if (is.null(legend.height)) { - legend.height <- 50 - } else { - if (!is.numeric(legend.height) | length(legend.height) != 1) { - stop("Parameter 'legend.height' must be a numeric value of length 1.") - } - } - ## Check colour palette input - if (is.vector(palette)) { - palette <- rep(list(palette), as.numeric(dim(data)[legend.dim])) - } else if (is.null(palette)) { - palette <- rep(list(c('#2D004B', '#542789', '#8073AC', '#B2ABD2', '#D8DAEB', - '#FEE0B6', '#FDB863', '#E08214', '#B35806', '#7F3B08')), - as.numeric(dim(data)[legend.dim])) - } else if (inherits(palette, 'list')) { - stopifnot(length(palette) == 
as.numeric(dim(data)[legend.dim])) - } else { - stop("Parameter 'palette' must be a numeric vector, a list or NULL.") - } - ## Check colorunder - if (is.null(colorunder)) { - colorunder <- rep("#04040E",as.numeric(dim(data)[legend.dim])) - } else if (is.character(colorunder) && length(colorunder) == 1) { - colorunder <- rep(colorunder, as.numeric(dim(data)[legend.dim])) - } else if (is.character(colorunder) && - length(colorunder) != as.numeric(dim(data)[legend.dim])) { - stop("Parameter 'colorunder' must be a numeric vector, a list or NULL.") - } - ## Check colorsup - if (is.null(colorsup)) { - colorsup <- rep("#730C04", as.numeric(dim(data)[legend.dim])) - } else if (is.character(colorsup) && length(colorsup) == 1) { - colorsup <- rep(colorsup,as.numeric(dim(data)[legend.dim])) - } else if (is.character(colorsup) && - length(colorsup) != as.numeric(dim(data)[legend.dim])) { - stop("Parameter 'colorsup' must be a numeric vector, a list or NULL.") - } - ## Check round.decimal - if (is.null(round.decimal)) { - round.decimal <- 2 - } else if (!is.numeric(round.decimal) | length(round.decimal) != 1) { - stop("Parameter 'round.decimal' must be a numeric value of length 1.") - } - ## Check font.size - if (is.null(font.size)) { - font.size <- 1 - } else if (!is.numeric(font.size) | length(font.size) != 1) { - stop("Parameter 'font.size' must be a numeric value of length 1.") - } - ## Check legend white space - if (is.null(legend.white.space)){ - legend.white.space <- 6 - } else { - legend.white.space <- legend.white.space - } - ## Check col1.width - if (is.null(col1.width)) { - if (max(nchar(row.names)) == 1 ) { - col1.width <- max(nchar(row.names)) - } else { - col1.width <- max(nchar(row.names))/4 - } - } else if (!is.numeric(col1.width)) { - stop("Parameter 'col1.width' must be a numeric value of length 1.") - } - ## Check col2.width - if (is.null(col2.width)) { - if (max(nchar(subrow.names)) == 1 ) { - col2.width <- max(nchar(subrow.names)) - } else { - col2.width <- max(nchar(subrow.names))/4 - } - } else if (!is.numeric(col2.width)) { - stop("Parameter 'col2.width' must be a numeric value of length 1.") - } - - - # Get dimensions of inputs - n.col.names <- length(col.names) - n.subcol.names <- length(subcol.names) - n.row.names <- length(row.names) - n.subrow.names <- length(subrow.names) - - # Define table size - n.rows <- n.row.names * n.subrow.names - n.columns <- 2 + (n.col.names * n.subcol.names) - - # Column names - row.names.table <- rep("", n.rows) - for (row in 1:n.row.names) { - row.names.table[floor(n.subrow.names/2) + (row - 1) * n.subrow.names] <- row.names[row] - } - - # Define scorecard table titles - column.titles <- c(row.title, subrow.title, rep(c(subcol.names), n.col.names)) - - # Round data - data <- round(data, round.decimal) - - # Define data inside the scorecards table - for (row in 1:n.row.names) { - table_temp <- data.frame(table_column_2 = as.character(subrow.names)) - for (col in 1:n.col.names) { - table_temp <- data.frame(table_temp, - Reorder(data = Subset(x = data, along = c(col.dim, row.dim), - indices = list(col, row), drop = 'selected'), - order = c(subrow.dim, subcol.dim))) - } - if (row == 1) { - table_data <- table_temp - } else { - table_data <- rbind(table_data, table_temp) - } - } - - # All data for plotting in table - table <- data.frame(table_column_1 = row.names.table, table_data) - table_temp <- array(unlist(table[3:n.columns]), dim = c(n.rows, n.columns - 2)) - # Define colors to show in table - table_colors <- .SCTableColors(table = table_temp, 
n.col = n.col.names, - n.subcol = n.subcol.names, n.row = n.row.names, - n.subrow = n.subrow.names, legend.breaks = legend.breaks, - palette = palette, colorunder = colorunder, - colorsup = colorsup) - metric.color <- table_colors$metric.color - metric.text.color <- table_colors$metric.text.color - # metric.text.bold <- table_colors$metric.text.bold - - options(stringsAsFactors = FALSE) - title <- data.frame(c1 = table.title, c2 = n.columns) - subtitle <- data.frame(c1 = table.subtitle, c2 = n.columns) - header.names <- as.data.frame(data.frame(c1 = c("", col.names), - c2 = c(2, rep(n.subcol.names, n.col.names)))) - header.names2 <- as.data.frame(data.frame(c1 = c("", paste0(rep(col.title, n.col.names))), - c2 = c(2, rep(n.subcol.names, n.col.names)))) - title.space <- data.frame(c1 = "\n", c2 = n.columns) - - # Hide NA values in table - options(knitr.kable.NA = '') - - # Create HTML table - table.html.part <- list() - table.html.part[[1]] <- kbl(table, escape = F, col.names = column.titles, align = rep("c", n.columns)) %>% - kable_paper("hover", full_width = F, font_size = 14 * font.size) %>% - add_header_above(header = header.names2, font_size = 16 * font.size) %>% - add_header_above(header = title.space, font_size = 10 * font.size) %>% - add_header_above(header = header.names, font_size = 20 * font.size) %>% - add_header_above(header = title.space, font_size = 10 * font.size) %>% - add_header_above(header = subtitle, font_size = 16 * font.size, align = "left") %>% - add_header_above(header = title.space, font_size = 10 * font.size) %>% - add_header_above(header = title, font_size = 22 * font.size, align = "left") - - for (i in 1:n.col.names) { - for (j in 1:n.subcol.names) { - my.background <- metric.color[, (i - 1) * n.subcol.names + j] - my.text.color <- metric.text.color[, (i - 1) * n.subcol.names + j] - # my.bold <- metric.text.bold[(i - 1) * n.subcol.names + j] - - table.html.part[[(i - 1) * n.subcol.names + j + 1]] <- - column_spec(table.html.part[[(i - 1) * n.subcol.names + j]], - 2 + n.subcol.names * (i - 1) + j, - background = my.background[1:n.rows], - color = my.text.color[1:n.rows], - bold = T) ## strsplit(toString(bold), ', ')[[1]] - } - } - - # Define position of table borders - column.borders <- NULL - for (i in 1:n.col.names) { - column.spacing <- (n.subcol.names * i) + 2 - column.borders <- c(column.borders, column.spacing) - } - - n.last.list <- n.col.names * n.subcol.names + 1 - - table.html <- column_spec(table.html.part[[n.last.list]], 1, bold = TRUE, width_min = paste0(col1.width, 'cm')) %>% - column_spec(2, bold = TRUE, width_min = paste0(col2.width, 'cm')) %>% - column_spec(3:n.columns, width_min = "1.2cm") %>% - column_spec(c(1, 2, column.borders), border_right = "2px solid black") %>% - column_spec(1, border_left = "2px solid black") %>% - column_spec(n.columns, border_right = "2px solid black") %>% - row_spec(seq(from = 0, to = n.subrow.names * n.row.names, by = n.subrow.names), - extra_css = "border-bottom: 2px solid black", hline_after = TRUE) - - if (plot.legend == TRUE) { - # Save the scorecard (without legend) - save_kable(table.html, file = paste0(fileout, '_tmpScorecard.png'), vheight = 1) - - # White space for legend - legend.white.space <- 37.8 * legend.white.space ## converting pixels to cm - - # Create and save color bar legend - scorecard_legend <- .SCLegend(legend.breaks = legend.breaks, - palette = palette, - colorunder = colorunder, - colorsup = colorsup, - label.scale = label.scale, - legend.width = legend.width, - legend.height = 
legend.height, - legend.white.space = legend.white.space, - fileout = fileout) - - # Add the legends below the scorecard table - system(paste0('convert -append ', fileout, '_tmpScorecard.png ', fileout, - '_tmpScorecardLegend.png ', fileout)) - # Remove temporary scorecard table - unlink(paste0(fileout, '_tmpScorecard*.png')) - } - if (plot.legend == FALSE) { - save_kable(table.html, file = fileout) - } -} diff --git a/modules/Scorecards/R/tmp/ScorecardsMulti.R b/modules/Scorecards/R/tmp/ScorecardsMulti.R index 89f1df44d8302759caa6d3529e32a6fbda26dbca..be3aab7123f0e7a09748ed46e1ac8629c2db4eb8 100644 --- a/modules/Scorecards/R/tmp/ScorecardsMulti.R +++ b/modules/Scorecards/R/tmp/ScorecardsMulti.R @@ -2,8 +2,10 @@ #' #'@description Scorecards function to create scorecard tables for multiple systems #' and references (types 9 to 12). -#'@param input_data is an array of spatially aggregated metrics containing the +#'@param data is an array of spatially aggregated metrics containing the #' following dimensions; system, reference, metric, time, sdate, region. +#'@param sign is an array with the same dimensions as data indicting the +#' significance of the metrics, with either true, false or null. #'@param system a vector of character strings defining the systems following the #' archive.yml format from verification suite #'@param reference a vector of character strings defining the references @@ -21,6 +23,28 @@ #' include in the scorecard title #'@param fileout.label a character string containing additional information to #' include in the output png file when saving the scorecard. +#'@param plot.legend A logical value to determine if the legend is plotted. +#'@param legend.breaks A vector of numerics or a list of vectors of numerics, +#' containing the breaks for the legends. If a vector is given as input, then +#' these breaks will be repeated for each legend.dim. A list of vectors can be +#' given as input if the legend.dims require different breaks. This parameter +#' is required even if the legend is not plotted, to define the colors in the +#' scorecard table. +#'@param legend.white.space A numeric value defining the initial starting +#' position of the legend bars, the white space infront of the legend is +#' calculated from the left most point of the table as a distance in cm. +#'@param legend.width A numeric value to define the width of the legend bars. +#'@param legend.height A numeric value to define the height of the legend bars. +#'@param label.scale A numeric value to define the size of the legend labels. +#'@param col1.width A numeric value defining the width of the first table column +#' in cm. +#'@param col2.width A numeric value defining the width of the second table +#' column in cm. +#'@param columns.width A numeric value defining the width all columns within the +#' table in cm (excluding the first and second columns containing the titles). +#'@param font.size A numeric indicating the font size on the scorecard table. +#'@param round.decimal A numeric indicating to which decimal point the data +#' is to be displayed in the scorecard table. Default is 2. #'@param output.path a path of the location to save the scorecard plots. 
#' #'@return @@ -44,19 +68,16 @@ #' ) -ScorecardsMulti <- function(data, - system, - reference, - var, - start.year, - end.year, - start.months, - forecast.months, - region.names, - metrics, - table.label, - fileout.label, - output.path){ +ScorecardsMulti <- function(data, sign, system, reference, var, start.year, + end.year, start.months, forecast.months, + region.names, metrics, plot.legend = TRUE, + legend.breaks = NULL, legend.white.space = NULL, + legend.width = 555, legend.height = 50, + table.label = NULL, fileout.label = NULL, + label.scale = 1.4, font.size = 1.1, + col1.width = NULL, col2.width = NULL, + columns.width = NULL, + round.decimal = 2, output.path){ ## Checks to apply: # first dimension in aggregated_metrics is system and second dimension is reference @@ -70,29 +91,66 @@ ScorecardsMulti <- function(data, fileout.label <- "" } - ## Make sure input_data is in correct order for using in functions: + ## Make sure data is in correct order for using in functions: data_order <- c('system','reference','metric','time','sdate','region') data <- Reorder(data, data_order) - + ## Identify metrics loaded metrics_loaded <- attributes(data)$metrics ## Select only the metrics to visualize from data - input_data <- Subset(data, along = 'metric', indices = match(metrics, metrics_loaded)) - attributes(input_data)$metrics <- metrics + data <- Subset(data, along = 'metric', indices = match(metrics, metrics_loaded)) + attributes(data)$metrics <- metrics + + if(!is.null(sign)){ + sign <- Reorder(sign, data_order) + sign <- Subset(sign, along = 'metric', indices = match(metrics, metrics_loaded)) + attributes(sign)$metrics <- metrics + } + ## Transform data for scorecards by forecast month (types 11 & 12) - transformed_data <- SCTransform(data = input_data, - sdate_dim = 'sdate', - ftime_dim = 'time') + if(length(start.months) >= length(forecast.months)){ + + transformed_data <- SCTransform(data = data, + sdate_dim = 'sdate', + ftime_dim = 'time') + + if(!is.null(sign)){ + transformed_sign <- SCTransform(data = sign, + sdate_dim = 'sdate', + ftime_dim = 'time') + } else { + transformed_sign <- NULL + } + } ## Load configuration files - sys_dict <- read_yaml("/esarchive/scratch/nmilders/gitlab/git_clones/s2s-suite/conf/archive.yml")$esarchive - var_dict <- read_yaml("/esarchive/scratch/nmilders/gitlab/git_clones/csscorecards/inst/config/variable-dictionary.yml")$vars + if (is.null(recipe$Run$filesystem)) { + filesystem <- 'esarchive' + } else { + filesystem <- recipe$Run$filesystem + } + sys_dict <- read_yaml("conf/archive.yml")[[filesystem]] + var_dict <- read_yaml("conf/variable-dictionary.yml")$vars ## Get scorecards table display names from configuration files var.name <- var_dict[[var]]$long_name - var.units <- var_dict[[var]]$units + + if ('name' %in% names(recipe$Analysis$Variables)){ + if (recipe$Analysis$Variables$name == var) { + var.units <- recipe$Analysis$Variables$units + } + } else { + for (i in 1:length(recipe$Analysis$Variables)) { + if (recipe$Analysis$Variables[[i]]$name == var) { + var.units <- recipe$Analysis$Variables[[i]]$units + } + } + } + if (is.null(var.units)) { + var.units <- var_dict[[var]]$units + } system.name <- NULL reference.name <- NULL @@ -106,6 +164,10 @@ ScorecardsMulti <- function(data, reference.name <- c(reference.name, reference.name1) } + if("Multimodel" %in% system ){ + system.name <- c(system.name, "Multimodel") + } + ## Get metric long names metric.names.list <- .met_names(metrics, var.units) @@ -147,18 +209,6 @@ ScorecardsMulti <- function(data, ## 
Legend upper limit color legend.col.sup <- .legend_col_sup(metrics, colorsup) legend.col.sup <- legend.col.sup[metrics] - - ## Legend inputs - plot.legend = TRUE - label.scale = 1.4 - legend.width = 555 - legend.height = 50 - - ## Data display inputs - round.decimal = 2 - font.size = 1.1 - - legend.white.space <- col1.width <- col2.width <- NULL ## Use default values of function ## Loop over region for(reg in 1:length(region.names)){ @@ -189,40 +239,54 @@ ScorecardsMulti <- function(data, start.year = start.year, end.year = end.year, scorecard.type = 9, region = sub(" ", "-", region.names[reg]), fileout.label = fileout.label, output.path = output.path) + if(model == 'system'){ - data_sc_9 <- Subset(input_data, c('reference','region'), list(1, reg), drop = 'selected') + data_sc_9 <- Subset(data, c('reference','region'), list(1, reg), drop = 'selected') + if(!is.null(sign)){ + sign_sc_9 <- Subset(sign, c('reference','region'), list(1, reg), drop = 'selected') + } else { + sign_sc_9 <- NULL + } } else if(model == 'reference'){ - data_sc_9 <- Subset(input_data, c('system','region'), list(1, reg), drop = 'selected') + data_sc_9 <- Subset(data, c('system','region'), list(1, reg), drop = 'selected') + if(!is.null(sign)){ + sign_sc_9 <- Subset(sign, c('system','region'), list(1, reg), drop = 'selected') + } else { + sign_sc_9 <- NULL + } } - SCPlotScorecard(data = data_sc_9, - row.dim = model, - subrow.dim = 'time', - col.dim = 'metric', - subcol.dim = 'sdate', - legend.dim = 'metric', - row.names = model.name, - subrow.names = forecast.months, - col.names = metric.names, - subcol.names = month.abb[as.numeric(start.months)], - table.title = table.title, - table.subtitle = table.subtitle, - row.title = table.model.name, - subrow.title = 'Forecast Month', - col.title = 'Start date', - legend.breaks = legend.breaks, - plot.legend = plot.legend, - label.scale = label.scale, - legend.width = legend.width, - legend.height = legend.height, - palette = palette, - colorunder = legend.col.inf, - colorsup = legend.col.sup, - round.decimal = round.decimal, - font.size = font.size, - legend.white.space = legend.white.space, - col1.width = 4, - col2.width = col2.width, - fileout = fileout) + + VizScorecard(data = data_sc_9, + sign = sign_sc_9, + row_dim = model, + subrow_dim = 'time', + col_dim = 'metric', + subcol_dim = 'sdate', + legend_dim = 'metric', + row_names = model.name, + subrow_names = forecast.months, + col_names = metric.names, + subcol_names = month.abb[as.numeric(start.months)], + table_title = table.title, + table_subtitle = table.subtitle, + row_title = table.model.name, + subrow_title = 'Forecast Month', + col_title = 'Start date', + legend_breaks = legend.breaks, + plot_legend = plot.legend, + label_scale = label.scale, + legend_width = legend.width, + legend_height = legend.height, + palette = palette, + colorunder = legend.col.inf, + colorsup = legend.col.sup, + round_decimal = round.decimal, + font_size = font.size, + legend_white_space = legend.white.space, + col1_width = 4, + col2_width = col2.width, + columns_width = columns.width, + fileout = fileout) #### Scorecard_type 10 #### @@ -231,126 +295,174 @@ ScorecardsMulti <- function(data, start.year = start.year, end.year = end.year, scorecard.type = 10, region = sub(" ", "-", region.names[reg]), fileout.label = fileout.label, output.path = output.path) + new_order <- c('system', 'reference', 'metric', 'region','sdate', 'time') + if(model == 'system'){ - data_sc_10 <- Subset(Reorder(input_data, new_order), c('reference','region'), list(1, 
reg), drop = 'selected') + data_sc_10 <- Subset(Reorder(data, new_order), c('reference','region'), list(1, reg), drop = 'selected') + if(!is.null(sign)){ + sign_sc_10 <- Subset(Reorder(sign, new_order), c('reference','region'), list(1, reg), drop = 'selected') + } else { + sign_sc_10 <- NULL + } } else if(model == 'reference'){ - data_sc_10 <- Subset(Reorder(input_data, new_order), c('system','region'), list(1, reg), drop = 'selected') + data_sc_10 <- Subset(Reorder(data, new_order), c('system','region'), list(1, reg), drop = 'selected') + if(!is.null(sign)){ + sign_sc_10 <- Subset(Reorder(sign, new_order), c('system','region'), list(1, reg), drop = 'selected') + } else { + sign_sc_10 <- NULL + } } - SCPlotScorecard(data = data_sc_10, - row.dim = 'time', - subrow.dim = model, - col.dim = 'metric', - subcol.dim = 'sdate', - legend.dim = 'metric', - row.names = forecast.months, - subrow.names = model.name, - col.names = metric.names, - subcol.names = month.abb[as.numeric(start.months)], - table.title = table.title, - table.subtitle = table.subtitle, - row.title = 'Forecast month', - subrow.title = table.model.name, - col.title = 'Start date', - legend.breaks = legend.breaks, - plot.legend = plot.legend, - label.scale = label.scale, - legend.width = legend.width, - legend.height = legend.height, - palette = palette, - colorunder = legend.col.inf, - colorsup = legend.col.sup, - round.decimal = round.decimal, - font.size = font.size, - legend.white.space = legend.white.space, - col1.width = col1.width, - col2.width = 4, - fileout = fileout) + VizScorecard(data = data_sc_10, + sign = sign_sc_10, + row_dim = 'time', + subrow_dim = model, + col_dim = 'metric', + subcol_dim = 'sdate', + legend_dim = 'metric', + row_names = forecast.months, + subrow_names = model.name, + col_names = metric.names, + subcol_names = month.abb[as.numeric(start.months)], + table_title = table.title, + table_subtitle = table.subtitle, + row_title = 'Forecast month', + subrow_title = table.model.name, + col_title = 'Start date', + legend_breaks = legend.breaks, + plot_legend = plot.legend, + label_scale = label.scale, + legend_width = legend.width, + legend_height = legend.height, + palette = palette, + colorunder = legend.col.inf, + colorsup = legend.col.sup, + round_decimal = round.decimal, + font_size = font.size, + legend_white_space = legend.white.space, + col1_width = col1.width, + col2_width = 4, + columns_width = columns.width, + fileout = fileout) #### Scorecard_type 11 #### ## (transformation only) - fileout <- .Filename(model = model, eval.name = eval.filename, var = var, - start.year = start.year, end.year = end.year, scorecard.type = 11, - region = sub(" ", "-", region.names[reg]), - fileout.label = fileout.label, output.path = output.path) - if(model == 'system'){ - data_sc_11 <- Subset(transformed_data, c('reference','region'), list(1, reg), drop = 'selected') - } else if(model == 'reference'){ - data_sc_11 <- Subset(transformed_data, c('system','region'), list(1, reg), drop = 'selected') - } - SCPlotScorecard(data = data_sc_11, - row.dim = model, - subrow.dim = 'time', - col.dim = 'metric', - subcol.dim = 'sdate', - legend.dim = 'metric', - row.names = model.name, - subrow.names = forecast.months, - col.names = metric.names, - subcol.names = month.abb[as.numeric(start.months)], - table.title = table.title, - table.subtitle = table.subtitle, - row.title = table.model.name, - subrow.title = 'Forecast Month', - col.title = 'Target month', - legend.breaks = legend.breaks, - plot.legend = plot.legend, - 
label.scale = label.scale, - legend.width = legend.width, - legend.height = legend.height, - palette = palette, - colorunder = legend.col.inf, - colorsup = legend.col.sup, - round.decimal = round.decimal, - font.size = font.size, - legend.white.space = legend.white.space, - col1.width = 4, - col2.width = col2.width, - fileout = fileout) + if(length(start.months) >= length(forecast.months)){ + fileout <- .Filename(model = model, eval.name = eval.filename, var = var, + start.year = start.year, end.year = end.year, scorecard.type = 11, + region = sub(" ", "-", region.names[reg]), + fileout.label = fileout.label, output.path = output.path) + + if(model == 'system'){ + data_sc_11 <- Subset(transformed_data, c('reference','region'), list(1, reg), drop = 'selected') + if(!is.null(sign)){ + sign_sc_11 <- Subset(transformed_sign, c('reference','region'), list(1, reg), drop = 'selected') + } else { + sign_sc_11 <- NULL + } + } else if(model == 'reference'){ + data_sc_11 <- Subset(transformed_data, c('system','region'), list(1, reg), drop = 'selected') + if(!is.null(sign)){ + sign_sc_11 <- Subset(transformed_sign, c('system','region'), list(1, reg), drop = 'selected') + } else { + sign_sc_11 <- NULL + } + } + + VizScorecard(data = data_sc_11, + sign = sign_sc_11, + row_dim = model, + subrow_dim = 'time', + col_dim = 'metric', + subcol_dim = 'sdate', + legend_dim = 'metric', + row_names = model.name, + subrow_names = forecast.months, + col_names = metric.names, + subcol_names = month.abb[as.numeric(start.months)], + table_title = table.title, + table_subtitle = table.subtitle, + row_title = table.model.name, + subrow_title = 'Forecast Month', + col_title = 'Target month', + legend_breaks = legend.breaks, + plot_legend = plot.legend, + label_scale = label.scale, + legend_width = legend.width, + legend_height = legend.height, + palette = palette, + colorunder = legend.col.inf, + colorsup = legend.col.sup, + round_decimal = round.decimal, + font_size = font.size, + legend_white_space = legend.white.space, + col1_width = 4, + col2_width = col2.width, + columns_width = columns.width, + fileout = fileout) + } #### Scorecard_type 12 #### ## (transformation and reorder) - fileout <- .Filename(model = model, eval.name = eval.filename, var = var, - start.year = start.year, end.year = end.year, scorecard.type = 12, - region = sub(" ", "-", region.names[reg]), - fileout.label = fileout.label, output.path = output.path) - new_order <- c('system', 'reference', 'metric', 'region','sdate', 'time') - if(model == 'system'){ - data_sc_12 <- Subset(Reorder(transformed_data, new_order), c('reference','region'), list(1, reg), drop = 'selected') - } else if(model == 'reference'){ - data_sc_12 <- Subset(Reorder(transformed_data, new_order), c('system','region'), list(1, reg), drop = 'selected') + if(length(start.months) >= length(forecast.months)){ + + fileout <- .Filename(model = model, eval.name = eval.filename, var = var, + start.year = start.year, end.year = end.year, scorecard.type = 12, + region = sub(" ", "-", region.names[reg]), + fileout.label = fileout.label, output.path = output.path) + + new_order <- c('system', 'reference', 'metric', 'region','sdate', 'time') + + if(model == 'system'){ + data_sc_12 <- Subset(Reorder(transformed_data, new_order), c('reference','region'), list(1, reg), drop = 'selected') + if(!is.null(sign)){ + sign_sc_12 <- Subset(Reorder(transformed_sign, new_order), c('reference','region'), list(1, reg), drop = 'selected') + } else { + sign_sc_12 <- NULL + } + } else if(model == 'reference'){ + 
data_sc_12 <- Subset(Reorder(transformed_data, new_order), c('system','region'), list(1, reg), drop = 'selected') + if(!is.null(sign)){ + sign_sc_12 <- Subset(Reorder(transformed_sign, new_order), c('system','region'), list(1, reg), drop = 'selected') + } else { + sign_sc_12 <- NULL + } + } + + VizScorecard(data = data_sc_12, + sign = sign_sc_12, + row_dim = 'time', + subrow_dim = model, + col_dim = 'metric', + subcol_dim = 'sdate', + legend_dim = 'metric', + row_names = forecast.months, + subrow_names = model.name, + col_names = metric.names, + subcol_names = month.abb[as.numeric(start.months)], + table_title = table.title, + table_subtitle = table.subtitle, + row_title = 'Forecast Month', + subrow_title = table.model.name, + col_title = 'Target month', + legend_breaks = legend.breaks, + plot_legend = plot.legend, + label_scale = label.scale, + legend_width = legend.width, + legend_height = legend.height, + palette = palette, + colorunder = legend.col.inf, + colorsup = legend.col.sup, + round_decimal = round.decimal, + font_size = font.size, + legend_white_space = legend.white.space, + col1_width = col1.width, + col2_width = 4, + columns_width = columns.width, + fileout = fileout) } - SCPlotScorecard(data = data_sc_12, - row.dim = 'time', - subrow.dim = model, - col.dim = 'metric', - subcol.dim = 'sdate', - legend.dim = 'metric', - row.names = forecast.months, - subrow.names = model.name, - col.names = metric.names, - subcol.names = month.abb[as.numeric(start.months)], - table.title = table.title, - table.subtitle = table.subtitle, - row.title = 'Forecast Month', - subrow.title = table.model.name, - col.title = 'Target month', - legend.breaks = legend.breaks, - plot.legend = plot.legend, - label.scale = label.scale, - legend.width = legend.width, - legend.height = legend.height, - palette = palette, - colorunder = legend.col.inf, - colorsup = legend.col.sup, - round.decimal = round.decimal, - font.size = font.size, - legend.white.space = legend.white.space, - col1.width = col1.width, - col2.width = 4, - fileout = fileout) } ## close loop on region diff --git a/modules/Scorecards/R/tmp/ScorecardsSingle.R b/modules/Scorecards/R/tmp/ScorecardsSingle.R index 56f08204ad5443b94bbc281a31f3c75cf5cb7614..7c76c92c60e625d054acbb0483caa61169d28b57 100644 --- a/modules/Scorecards/R/tmp/ScorecardsSingle.R +++ b/modules/Scorecards/R/tmp/ScorecardsSingle.R @@ -2,8 +2,10 @@ #' #'@description Scorecards function to create scorecard tables for one system and #' one reference combination (types 1 to 4). -#'@param input_data is an array of spatially aggregated metrics containing the +#'@param data is an array of spatially aggregated metrics containing the #' following dimensions; system, reference, metric, time, sdate, region. +#'@param sign is an array with the same dimensions as data indicting the +#' significance of the metrics, with either true, false or null. #'@param system a vector of character strings defining the systems following the #' archive.yml format from verification suite #'@param reference a vector of character strings defining the references @@ -21,7 +23,29 @@ #' include in the scorecard title #'@param fileout.label a character string containing additional information to #' include in the output png file when saving the scorecard. -#'@param output.path a path of the location to save the scorecard plots +#'@param plot.legend A logical value to determine if the legend is plotted. 
+#'@param legend.breaks A vector of numerics or a list of vectors of numerics, +#' containing the breaks for the legends. If a vector is given as input, then +#' these breaks will be repeated for each legend.dim. A list of vectors can be +#' given as input if the legend.dims require different breaks. This parameter +#' is required even if the legend is not plotted, to define the colors in the +#' scorecard table. +#'@param legend.white.space A numeric value defining the initial starting +#' position of the legend bars, the white space infront of the legend is +#' calculated from the left most point of the table as a distance in cm. +#'@param legend.width A numeric value to define the width of the legend bars. +#'@param legend.height A numeric value to define the height of the legend bars. +#'@param label.scale A numeric value to define the size of the legend labels. +#'@param col1.width A numeric value defining the width of the first table column +#' in cm. +#'@param col2.width A numeric value defining the width of the second table +#' column in cm. +#'@param columns.width A numeric value defining the width all columns within the +#' table in cm (excluding the first and second columns containing the titles). +#'@param font.size A numeric indicating the font size on the scorecard table. +#'@param round.decimal A numeric indicating to which decimal point the data +#' is to be displayed in the scorecard table. Default is 2. +#'@param output.path A path of the location to save the scorecard plots #' #'@return #' This function returns 4 scorecards images, saved in the directory output.path @@ -41,15 +65,18 @@ #' output.path = '/esarchive/scratch/nmilders/scorecards_images/test' #' ) #'@export -ScorecardsSingle <- function(data, system, reference, var, start.year, end.year, - start.months, forecast.months, region.names, - metrics, legend.breaks = NULL, +ScorecardsSingle <- function(data, sign, system, reference, var, start.year, + end.year, start.months, forecast.months, + region.names, metrics, plot.legend = TRUE, + legend.breaks = NULL, legend.white.space = NULL, + legend.width = 550, legend.height = 50, table.label = NULL, fileout.label = NULL, - legend.white.space = NULL, - col1.width = NULL, col2.width = NULL, - output.path){ - - ## Checks to apply: + label.scale = 1.4, font.size = 1.1, + col1.width = NULL, col2.width = NULL, + columns.width = 1.2, + round.decimal = 2, output.path){ + + ## Checks to apply: # First dimension in aggregated_metrics is system and second dimension is reference # To allow 1 region - if region = 1 --> only scorecards 1 & 3 need to be plotted # If any dimension of input dat is 1, make sure dimension is still present in array @@ -77,29 +104,65 @@ ScorecardsSingle <- function(data, system, reference, var, start.year, end.year, fileout.label <- "" } - ## Make sure input_data is in correct order for using in functions: + ## Make sure data is in correct order for using in functions: data_order <- c('system', 'reference', 'metric', 'time', 'sdate', 'region') data <- Reorder(data, data_order) ## Identify metrics loaded metrics_loaded <- attributes(data)$metrics - + ## Select only the metrics to visualize from data - input_data <- Subset(data, along = 'metric', indices = match(metrics, metrics_loaded)) - attributes(input_data)$metrics <- metrics + data <- Subset(data, along = 'metric', indices = match(metrics, metrics_loaded)) + attributes(data)$metrics <- metrics + + if(!is.null(sign)){ + sign <- Reorder(sign, data_order) + sign <- Subset(sign, along = 'metric', indices = 
match(metrics, metrics_loaded)) + attributes(sign)$metrics <- metrics + } ## Transform data for scorecards by forecast month (types 3 & 4) - transformed_data <- SCTransform(data = input_data, - sdate_dim = 'sdate', - ftime_dim = 'time') + if(length(start.months) >= length(forecast.months)){ + + transformed_data <- SCTransform(data = data, + sdate_dim = 'sdate', + ftime_dim = 'time') + + if(!is.null(sign)){ + transformed_sign <- SCTransform(data = sign, + sdate_dim = 'sdate', + ftime_dim = 'time') + } else { + transformed_sign <- NULL + } + } ## Load configuration files - sys_dict <- read_yaml("/esarchive/scratch/nmilders/gitlab/git_clones/s2s-suite/conf/archive.yml")$esarchive - var_dict <- read_yaml("/esarchive/scratch/nmilders/gitlab/git_clones/csscorecards/inst/config/variable-dictionary.yml")$vars + if (is.null(recipe$Run$filesystem)) { + filesystem <- 'esarchive' + } else { + filesystem <- recipe$Run$filesystem + } + sys_dict <- read_yaml("conf/archive.yml")[[filesystem]] + var_dict <- read_yaml("conf/variable-dictionary.yml")$vars ## Get scorecards table display names from configuration files var.name <- var_dict[[var]]$long_name - var.units <- var_dict[[var]]$units + + if ('name' %in% names(recipe$Analysis$Variables)){ + if (recipe$Analysis$Variables$name == var) { + var.units <- recipe$Analysis$Variables$units + } + } else { + for (i in 1:length(recipe$Analysis$Variables)) { + if (recipe$Analysis$Variables[[i]]$name == var) { + var.units <- recipe$Analysis$Variables[[i]]$units + } + } + } + if (is.null(var.units)) { + var.units <- var_dict[[var]]$units + } ## Get metric long names metric.names.list <- .met_names(metrics, var.units) @@ -126,31 +189,21 @@ ScorecardsSingle <- function(data, system, reference, var, start.year, end.year, legend.col.sup <- .legend_col_sup(metrics, colorsup) legend.col.sup <- legend.col.sup[metrics] - ## Legend inputs - plot.legend = TRUE - label.scale = 1.4 - legend.width = 555 - legend.height = 50 - - ## Data display inputs - round.decimal = 2 - font.size = 1.1 - ## Loop over system and reference for each scorecard plot - for (sys in 1:dim(input_data)['system']) { - for (ref in 1:dim(input_data)['reference']) { + for (sys in 1:dim(data)['system']) { + for (ref in 1:dim(data)['reference']) { ## TO DO: Apply check to each scorecard function ## check dimension 'metric' exists: - if (!("metric" %in% names(dim(input_data)))) { - dim(input_data) <- c(metric = 1, dim(input_data)) + if (!("metric" %in% names(dim(data)))) { + dim(data) <- c(metric = 1, dim(data)) } ## Find position of mean bias metric to calculate breaks breaks_bias <- NULL if ('mean_bias' %in% metrics){ - stopifnot(identical(names(dim(Subset(input_data, c('system', 'reference'), list(sys, ref), drop = 'selected'))), c('metric','time','sdate','region'))) - temp_data <- Subset(input_data, c('system', 'reference'), list(sys, ref), drop = 'selected') + stopifnot(identical(names(dim(Subset(data, c('system', 'reference'), list(sys, ref), drop = 'selected'))), c('metric','time','sdate','region'))) + temp_data <- Subset(data, c('system', 'reference'), list(sys, ref), drop = 'selected') pos_bias <- which(metrics == 'mean_bias') if(var == 'psl'){ temp_data[pos_bias,,,] <- temp_data[pos_bias,,,]/100 @@ -180,156 +233,202 @@ ScorecardsSingle <- function(data, system, reference, var, start.year, end.year, fileout <- .Filename(system = system[sys], reference = reference[ref], var = var, start.year = start.year, end.year = end.year, scorecard.type = 1, fileout.label = fileout.label, output.path = output.path) 
- data_sc_1 <- Subset(input_data, c('system', 'reference'), list(sys, ref), drop = 'selected') - SCPlotScorecard(data = data_sc_1, - row.dim = 'region', - subrow.dim = 'time', - col.dim = 'metric', - subcol.dim = 'sdate', - legend.dim = 'metric', - row.names = region.names, - subrow.names = forecast.months, - col.names = metric.names, - subcol.names = month.abb[as.numeric(start.months)], - table.title = table.title, - table.subtitle = table.subtitle, - row.title = 'Region', - subrow.title = 'Forecast Month', - col.title = 'Start date', - legend.breaks = legend.breaks, - plot.legend = plot.legend, - label.scale = label.scale, - legend.width = legend.width, - legend.height = legend.height, - palette = palette, - colorunder = legend.col.inf, - colorsup = legend.col.sup, - round.decimal = round.decimal, - font.size = font.size, - legend.white.space = legend.white.space, - col1.width = col1.width, - col2.width = col2.width, - fileout = fileout) + + data_sc_1 <- Subset(data, c('system', 'reference'), list(sys, ref), drop = 'selected') + + if(!is.null(sign)){ + sign_sc_1 <- Subset(sign, c('system', 'reference'), list(sys, ref), drop = 'selected') + } else { + sign_sc_1 <- NULL + } + + VizScorecard(data = data_sc_1, + sign = sign_sc_1, + row_dim = 'region', + subrow_dim = 'time', + col_dim = 'metric', + subcol_dim = 'sdate', + legend_dim = 'metric', + row_names = region.names, + subrow_names = forecast.months, + col_names = metric.names, + subcol_names = month.abb[as.numeric(start.months)], + table_title = table.title, + table_subtitle = table.subtitle, + row_title = 'Region', + subrow_title = 'Forecast Month', + col_title = 'Start date', + legend_breaks = legend.breaks, + plot_legend = plot.legend, + label_scale = label.scale, + legend_width = legend.width, + legend_height = legend.height, + palette = palette, + colorunder = legend.col.inf, + colorsup = legend.col.sup, + round_decimal = round.decimal, + font_size = font.size, + legend_white_space = legend.white.space, + col1_width = col1.width, + col2_width = col2.width, + columns_width = columns.width, + fileout = fileout) #### Scorecard_type 2 #### ## (reorder only) ## Scorecard type 2 is same as type 1 for only one region, therefore is ## only plotted if more that one region is requested - if(dim(input_data)['region'] > 1) { + if(dim(data)['region'] > 1) { + fileout <- .Filename(system = system[sys], reference = reference[ref], var = var, start.year = start.year, end.year = end.year, scorecard.type = 2, fileout.label = fileout.label, output.path = output.path) + new_order <- c('metric', 'region', 'sdate', 'time') - data_sc_2 <- Reorder(Subset(input_data, c('system', 'reference'), list(sys, ref), drop = 'selected'), new_order) - SCPlotScorecard(data = data_sc_2, - row.dim = 'time', - subrow.dim = 'region', - col.dim = 'metric', - subcol.dim = 'sdate', - legend.dim = 'metric', - row.names = forecast.months, - subrow.names = region.names, - col.names = metric.names, - subcol.names = month.abb[as.numeric(start.months)], - table.title = table.title, - table.subtitle = table.subtitle, - row.title = 'Forecast Month', - subrow.title = 'Region', - col.title = 'Start date', - legend.breaks = legend.breaks, - plot.legend = plot.legend, - label.scale = label.scale, - legend.width = legend.width, - legend.height = legend.height, - palette = palette, - colorunder = legend.col.inf, - colorsup = legend.col.sup, - round.decimal = round.decimal, - font.size = font.size, - legend.white.space = legend.white.space, - col1.width = col1.width, - col2.width = 
col2.width, - fileout = fileout) + + data_sc_2 <- Reorder(Subset(data, c('system', 'reference'), list(sys, ref), drop = 'selected'), new_order) + + if(!is.null(sign)){ + sign_sc_2 <- Reorder(Subset(sign, c('system', 'reference'), list(sys, ref), drop = 'selected'), new_order) + } else { + sign_sc_2 <- NULL + } + + VizScorecard(data = data_sc_2, + sign = sign_sc_2, + row_dim = 'time', + subrow_dim = 'region', + col_dim = 'metric', + subcol_dim = 'sdate', + legend_dim = 'metric', + row_names = forecast.months, + subrow_names = region.names, + col_names = metric.names, + subcol_names = month.abb[as.numeric(start.months)], + table_title = table.title, + table_subtitle = table.subtitle, + row_title = 'Forecast Month', + subrow_title = 'Region', + col_title = 'Start date', + legend_breaks = legend.breaks, + plot_legend = plot.legend, + label_scale = label.scale, + legend_width = legend.width, + legend_height = legend.height, + palette = palette, + colorunder = legend.col.inf, + colorsup = legend.col.sup, + round_decimal = round.decimal, + font_size = font.size, + legend_white_space = legend.white.space, + col1_width = col1.width, + col2_width = col2.width, + columns_width = columns.width, + fileout = fileout) } ## close if #### Scorecard_type 3 #### ## (transformation only) - fileout <- .Filename(system = system[sys], reference = reference[ref], var = var, - start.year = start.year, end.year = end.year, scorecard.type = 3, - fileout.label = fileout.label, output.path = output.path) - data_sc_3 <- Subset(transformed_data, c('system', 'reference'), list(sys, ref), drop = 'selected') - SCPlotScorecard(data = data_sc_3, - row.dim = 'region', - subrow.dim = 'time', - col.dim = 'metric', - subcol.dim = 'sdate', - legend.dim = 'metric', - row.names = region.names, - subrow.names = forecast.months, - col.names = metric.names, - subcol.names = month.abb[as.numeric(start.months)], - table.title = table.title, - table.subtitle = table.subtitle, - row.title = 'Region', - subrow.title = 'Forecast Month', - col.title = 'Target month', - legend.breaks = legend.breaks, - plot.legend = plot.legend, - label.scale = label.scale, - legend.width = legend.width, - legend.height = legend.height, - palette = palette, - colorunder = legend.col.inf, - colorsup = legend.col.sup, - round.decimal = round.decimal, - font.size = font.size, - legend.white.space = legend.white.space, - col1.width = col1.width, - col2.width = col2.width, - fileout = fileout) - + + if(length(start.months) >= length(forecast.months)){ + fileout <- .Filename(system = system[sys], reference = reference[ref], var = var, + start.year = start.year, end.year = end.year, scorecard.type = 3, + fileout.label = fileout.label, output.path = output.path) + + data_sc_3 <- Subset(transformed_data, c('system', 'reference'), list(sys, ref), drop = 'selected') + + if(!is.null(sign)){ + sign_sc_3 <- Subset(transformed_sign, c('system', 'reference'), list(sys, ref), drop = 'selected') + } else { + sign_sc_3 <- NULL + } + + VizScorecard(data = data_sc_3, + sign = sign_sc_3, + row_dim = 'region', + subrow_dim = 'time', + col_dim = 'metric', + subcol_dim = 'sdate', + legend_dim = 'metric', + row_names = region.names, + subrow_names = forecast.months, + col_names = metric.names, + subcol_names = month.abb[as.numeric(start.months)], + table_title = table.title, + table_subtitle = table.subtitle, + row_title = 'Region', + subrow_title = 'Forecast Month', + col_title = 'Target month', + legend_breaks = legend.breaks, + plot_legend = plot.legend, + label_scale = 
label.scale, +                     legend_width = legend.width, +                     legend_height = legend.height, +                     palette = palette, +                     colorunder = legend.col.inf, +                     colorsup = legend.col.sup, +                     round_decimal = round.decimal, +                     font_size = font.size, +                     legend_white_space = legend.white.space, +                     col1_width = col1.width, +                     col2_width = col2.width, +                     columns_width = columns.width, +                     fileout = fileout) +      }        #### Scorecard_type 4 ####      ## (transformation and reorder)        ## Scorecard type 4 is same as type 3 for only one region, therefore is      ## only plotted if more that one region is requested -      if(dim(input_data)['region'] > 1) { +      if(dim(data)['region'] > 1 & length(start.months) >= length(forecast.months)){ +        fileout <- .Filename(system = system[sys], reference = reference[ref], var = var,                              start.year = start.year, end.year = end.year, scorecard.type = 4,                              fileout.label = fileout.label, output.path = output.path) +        new_order <- c('metric', 'region', 'sdate', 'time') +        data_sc_4 <- Reorder(Subset(transformed_data, c('system', 'reference'), list(sys, ref), drop = 'selected'), new_order) -        SCPlotScorecard(data = data_sc_4, -                        row.dim = 'time', -                        subrow.dim = 'region', -                        col.dim = 'metric', -                        subcol.dim = 'sdate', -                        legend.dim = 'metric', -                        row.names = forecast.months, -                        subrow.names = region.names, -                        col.names = metric.names, -                        subcol.names = month.abb[as.numeric(start.months)], -                        table.title = table.title, -                        table.subtitle = table.subtitle, -                        row.title = 'Forecast Month', -                        subrow.title = 'Region', -                        col.title = 'Target month', -                        legend.breaks = legend.breaks, -                        plot.legend = plot.legend, -                        label.scale = label.scale, -                        legend.width = legend.width, -                        legend.height = legend.height, -                        palette = palette, -                        colorunder = legend.col.inf, -                        colorsup = legend.col.sup, -                        round.decimal = round.decimal, -                        font.size = font.size, -                        legend.white.space = legend.white.space, -                        col1.width = col1.width, -                        col2.width = col2.width, -                        fileout = fileout) +        +        if(!is.null(sign)){ +          sign_sc_4 <- Reorder(Subset(transformed_sign, c('system', 'reference'), list(sys, ref), drop = 'selected'), new_order) +        } else { +          sign_sc_4 <- NULL +        } +        +        VizScorecard(data = data_sc_4, +                     sign = sign_sc_4, +                     row_dim = 'time', +                     subrow_dim = 'region', +                     col_dim = 'metric', +                     subcol_dim = 'sdate', +                     legend_dim = 'metric', +                     row_names = forecast.months, +                     subrow_names = region.names, +                     col_names = metric.names, +                     subcol_names = month.abb[as.numeric(start.months)], +                     table_title = table.title, +                     table_subtitle = table.subtitle, +                     row_title = 'Forecast Month', +                     subrow_title = 'Region', +                     col_title = 'Target month', +                     legend_breaks = legend.breaks, +                     plot_legend = plot.legend, +                     label_scale = label.scale, +                     legend_width = legend.width, +                     legend_height = legend.height, +                     palette = palette, +                     colorunder = legend.col.inf, +                     colorsup = legend.col.sup, +                     round_decimal = round.decimal, +                     font_size = font.size, +                     legend_white_space = legend.white.space, +                     col1_width = col1.width, +                     col2_width = col2.width, +                     columns_width = columns.width, +                     fileout = fileout) } ## close if } ## close loop on ref diff --git a/modules/Scorecards/R/tmp/Utils.R b/modules/Scorecards/R/tmp/Utils.R index 6ba49e8c5c887c938902e30d200e49bff1af142e..caee98e497012b5e9dac36583545837489708265 --- a/modules/Scorecards/R/tmp/Utils.R +++ b/modules/Scorecards/R/tmp/Utils.R @@ -201,8 +201,6 @@ } - - ## Output file name to save scorecard .Filename <- function(system = NULL, reference = NULL, model = NULL, eval.name = NULL, var = NULL, start.year = NULL, end.year = NULL, scorecard.type = NULL, @@ -230,96 +228,6 @@ 
return(scorecard_save_path) } -# Scorecards function to assign background color of table cells, -# color of text in table and to bold the text. -# -# It will return a list with 2 arrays: -# (1) metric.color, A 2-dimensional array with character strings containing the -# color codes for each cell background. -# (2) metric.text.color, A 2-dimensional array with character strings -# containing the color codes for each cell text. -.SCTableColors <- function(table, n.col, n.subcol, n.row, n.subrow, - legend.breaks, palette, colorunder, colorsup) { - # Define rows and columns - n.rows <- n.row * n.subrow - n.columns <- n.col * n.subcol - - ## Set table background colors - metric.color <- array(colorunder, c(n.row * n.subrow, n.columns)) - metric.text.color <- array("#2A2A2A", c(n.row * n.subrow , n.columns)) - # metric.text.bold <- array(TRUE, c(n.row * n.subrow , n.columns - 2)) ## Setting all values to bold - - ## Define cell and text colors to show in table - for (i in 1:n.col) { - metric.int <- legend.breaks[[i]] - for (rr in 1:n.rows) { - for (j in 1:n.subcol) { - for (pp in 1:(length(metric.int) - 1)) { - if (is.nan(table[rr,((i - 1) * n.subcol + j)])) { - metric.color[rr,((i - 1) * n.subcol + j)] <- "gray" - } else { - if (table[rr,((i - 1) * n.subcol + j)] >= - metric.int[pp] && table[rr,((i - 1) * n.subcol + j)] <= - metric.int[pp+1]) { - metric.color[rr,((i - 1) * n.subcol + j)] <- palette[[i]][pp] #palette[pp] - } - if (table[rr,((i - 1) * n.subcol + j)] < metric.int[1]) { - metric.color[rr,((i - 1) * n.subcol + j)] <- colorunder[i] - } - if (table[rr,((i - 1) * n.subcol + j)] >= - metric.int[length(metric.int)]) { - metric.color[rr,((i - 1) * n.subcol + j)] <- colorsup[i] - } - } - ## color text in white and bold if background is white or dark blue or dark red: - if (is.nan(table[rr,((i - 1) * n.subcol + j)]) || - (!is.nan(table[rr,((i - 1) * n.subcol + j)]) && pp == 1 && - table[rr,((i - 1) * n.subcol + j)] < metric.int[2]) || - (!is.nan(table[rr,((i - 1) * n.subcol + j)]) && pp == 2 && - table[rr,((i - 1) * n.subcol + j)] < metric.int[3]) || - (!is.nan(table[rr,((i - 1) * n.subcol + j)]) && pp == (length(metric.int) - 1) && - table[rr,((i - 1) * n.subcol + j)] >= metric.int[length(metric.int) - 1]) || - (!is.nan(table[rr,((i - 1) * n.subcol + j)]) && pp == (length(metric.int) - 2) && - table[rr,((i - 1) * n.subcol + j)] >= metric.int[length(metric.int) - 2])) { - metric.text.color[rr,((i - 1) * n.subcol + j)] <- "white" - #metric.text.bold[rr,((i - 1) * n.subcol + j)] <- TRUE - } - } - } - } - } - - return(list(metric.color = metric.color, - metric.text.color = metric.text.color)) - -} - -# Scorecards function to create the color bar legends for the required metrics -# and paste them below the scorecard table -.SCLegend <- function(legend.breaks, palette, colorunder, colorsup, - label.scale, legend.width, legend.height, - legend.white.space, fileout) { - - ## Create color bar legends for each metric - for (i in 1:length(palette)) { - png(filename = paste0(fileout, '_tmpLegend', i, '.png'), width = legend.width, - height = legend.height) - ColorBar(brks = legend.breaks[[i]], cols = palette[[i]], vertical = FALSE, - label_scale = label.scale, col_inf = colorunder[[i]], - col_sup = colorsup[[i]]) - dev.off() - if (i == 1) { - ## Add white space to the left of the first color bar legend - system(paste0('convert ', fileout, '_tmpLegend1.png -background white -splice ', - legend.white.space, 'x0 ', fileout, '_tmpScorecardLegend.png')) - } else { - system(paste0('convert +append ', 
fileout, '_tmpScorecardLegend.png ', - fileout, '_tmpLegend', i, '.png ', fileout, - '_tmpScorecardLegend.png')) - } - } - unlink(paste0(fileout,'_tmpLegend*.png')) -} # Function to calculate color bar breaks for bias metric .SCBiasBreaks <- function(data){ diff --git a/modules/Scorecards/R/tmp/VizScorecard.R b/modules/Scorecards/R/tmp/VizScorecard.R new file mode 100644 index 0000000000000000000000000000000000000000..425b799fd1b3b83f3574a959a837b96aeba24fcf --- /dev/null +++ b/modules/Scorecards/R/tmp/VizScorecard.R @@ -0,0 +1,627 @@ +#'Function to plot Scorecard tables +#' +#'This function renders a scorecard table from a multidimensional array +#'in HTML style. The structure of the table is based on the assignment of each +#'dimension of the array as a structure element: row, subrow, column or +#'subcolumn. It is useful to present tabular results with colors in a nice way. +#' +#'Note: Module PhantomJS is required. +#' +#'@param data A multidimensional array containing the data to be plotted with +#' at least four dimensions. Each dimension will have assigned a structure +#' element: row, subrow, column and subcolumn. +#'@param sign A multidimensional boolean array with the same dimensions as +#' 'data', indicting which values to be highlighted. If set to NULL no values +#' will be highlighted. +#'@param row_dim A character string indicating the dimension name to show in the +#' rows of the plot. It is set as 'region' by default. +#'@param subrow_dim A character string indicating the dimension name to show in +#' the sub-rows of the plot. It is set as 'time' by default. +#'@param col_dim A character string indicating the dimension name to show in the +#' columns of the plot. It is set as 'metric' by default. +#'@param subcol_dim A character string indicating the dimension name to show in +#' the sub-columns of the plot. It is set as 'sdate' by default. +#'@param legend_dim A character string indicating the dimension name to use for +#' the legend. It is set as 'metric' by default. +#'@param row_names A vector of character strings with row display names. It +#' is set as NULL by default. +#'@param subrow_names A vector of character strings with sub-row display names. +#' It is set as NULL by default. +#'@param col_names A vector of character strings with column display names. It +#' is set as NULL by default. +#'@param subcol_names A vector of character strings with sub-column display +#' names. It is set as NULL by default. +#'@param row_title A character string for the title of the row names. It is set +#' as NULL by default. +#'@param subrow_title A character string for the title of the sub-row names. It +#' is set as NULL by default. +#'@param table_title A character string for the title of the plot. It is set as +#' NULL by default. +#'@param table_subtitle A character string for the sub-title of the plot. It is +#' set as NULL by default. +#'@param legend_breaks A vector of numerics or a list of vectors of numerics, +#' containing the breaks for the legends. If a vector is given as input, then +#' these breaks will be repeated for each 'legend_dim'. A list of vectors can +#' be given as input if the 'legend_dims' require different breaks. This +#' parameter is required even if the legend is not plotted, to define the +#' colors in the scorecard table. It is set as NULL by default. +#'@param plot_legend A logical value to determine if the legend is plotted. It +#' is set as TRUE by default. +#'@param label_scale A numeric value to define the size of the legend labels. 
+#' It is set as 1.4 by default.
+#'@param legend_width A numeric value to define the width of the legend bars. By
+#' default it is set to NULL and calculated internally from the table width.
+#'@param legend_height A numeric value to define the height of the legend bars.
+#' It is set as 50 by default.
+#'@param palette A vector of character strings or a list of vectors of
+#' character strings containing the colors to use in the legends. If a vector
+#' is given as input, then these colors will be used for each legend_dim. A
+#' list of vectors can be given as input if different colors are desired for
+#' the legend_dims. This parameter must be included even if the legend is
+#' not plotted, to define the colors in the scorecard table.
+#'@param colorunder A character string, a vector of character strings or a
+#' list with single character string elements defining the colors to use for
+#' data values which are inferior to the lowest breaks value. This parameter
+#' will also plot an inferior triangle in the legend bar. The parameter can be
+#' set to NULL if there are no inferior values. If a character string is given
+#' this color will be applied to all 'legend_dims'. It is set as NULL by
+#' default.
+#'@param colorsup A character string, a vector of character strings or a
+#' list with single character string elements defining the colors to use for
+#' data values which are superior to the highest breaks value. This parameter
+#' will also plot a superior triangle in the legend bar. The parameter can be
+#' set to NULL if there are no superior values. If a character string is given
+#' this color will be applied to all legend_dims. It is set as NULL by default.
+#'@param round_decimal A numeric indicating to which decimal point the data
+#' is to be displayed in the scorecard table. It is set as 2 by default.
+#'@param font_size A numeric indicating the font size on the scorecard table.
+#' Default is 1.1.
+#'@param legend_white_space A numeric value defining the initial starting
+#' position of the legend bars; the white space in front of the legend is
+#' calculated from the leftmost point of the table as a distance in cm. The
+#' default value is 6.
+#'@param columns_width A numeric value defining the width of all columns within
+#' the table in cm (excluding the first and second columns containing the
+#' titles).
+#'@param col1_width A numeric value defining the width of the first table column
+#' in cm. It is set as NULL by default.
+#'@param col2_width A numeric value defining the width of the second table
+#' column in cm. It is set as NULL by default.
+#'@param fileout A path of the location to save the scorecard plots. By default
+#' the plots will be saved to the working directory.
+#'
+#'@return An image file containing the scorecard. 
+#'@examples +#'data <- array(rnorm(1000), dim = c('sdate' = 12, 'metric' = 4, 'region' = 3, +#' 'time' = 6)) +#'row_names <- c('Tropics', 'Extra-tropical NH', 'Extra-tropical SH') +#'col_names <- c('Mean bias (K)', 'Correlation', 'RPSS','CRPSS') +#'VizScorecard(data = data, row_names = row_names, col_names = col_names, +#' subcol_names = month.abb[as.numeric(1:12)], +#' row_title = 'Region', subrow_title = 'Forecast Month', +#' col_title = 'Start date', +#' table_title = "Temperature of ECMWF System 5", +#' table_subtitle = "(Ref: ERA5 1994-2016)", +#' fileout = 'test.png') +#' +#'@import kableExtra +#'@importFrom RColorBrewer brewer.pal +#'@importFrom s2dv Reorder +#'@importFrom ClimProjDiags Subset +#'@importFrom CSTools MergeDims +#'@export +VizScorecard <- function(data, sign = NULL, row_dim = 'region', + subrow_dim = 'time', col_dim = 'metric', + subcol_dim = 'sdate', legend_dim = 'metric', + row_names = NULL, subrow_names = NULL, + col_names = NULL, subcol_names = NULL, + row_title = NULL, subrow_title = NULL, + col_title = NULL, table_title = NULL, + table_subtitle = NULL, legend_breaks = NULL, + plot_legend = TRUE, label_scale = 1.4, + legend_width = NULL, legend_height = 50, + palette = NULL, colorunder = NULL, colorsup = NULL, + round_decimal = 2, font_size = 1.1, + legend_white_space = 6, columns_width = 1.2, + col1_width = NULL, col2_width = NULL, + fileout = './scorecard.png') { + + # Input parameter checks + # Check data + if (!is.array(data)) { + stop("Parameter 'data' must be a numeric array.") + } + if (length(dim(data)) != 4) { + stop("Parameter 'data' must have four dimensions.") + } + dimnames <- names(dim(data)) + # Check sign + if (is.null(sign)) { + sign <- array(FALSE, dim = dim(data)) + } else { + if (!is.array(sign)) { + stop("Parameter 'sign' must be a boolean array or NULL.") + } + if (any(sort(names(dim(sign))) != sort(dimnames))) { + stop("Parameter 'sign' must have same dimensions as 'data'.") + } + if (typeof(sign) != 'logical') { + stop("Parameter 'sign' must be an array with logical values.") + } + } + # Check row_dim + if (!is.character(row_dim)) { + stop("Parameter 'row_dim' must be a character string.") + } + if (!row_dim %in% names(dim(data))) { + stop("Parameter 'row_dim' is not found in 'data' dimensions.") + } + # Check row_names + if (is.null(row_names)) { + row_names <- as.character(1:dim(data)[row_dim]) + } + if (length(row_names) != as.numeric(dim(data)[row_dim])) { + stop("Parameter 'row_names' must have the same length of dimension ", + "'row_dim'.") + } + # Check subrow_dim + if (!is.character(subrow_dim)) { + stop("Parameter 'subrow_dim' must be a character string.") + } + if (!subrow_dim %in% names(dim(data))) { + stop("Parameter 'subrow_dim' is not found in 'data' dimensions.") + } + # Check subrow_names + if (is.null(subrow_names)) { + subrow_names <- as.character(1:dim(data)[subrow_dim]) + } + if (length(subrow_names) != as.numeric(dim(data)[subrow_dim])) { + stop("Parameter 'subrow_names' must have the same length of dimension ", + "'subrow_dim'.") + } + # Check col_dim + if (!is.character(col_dim)) { + stop("Parameter 'col_dim' must be a character string.") + } + if (!col_dim %in% names(dim(data))) { + stop("Parameter 'col_dim' is not found in 'data' dimensions.") + } + # Check col_names + if (is.null(col_names)) { + col_names <- as.character(1:dim(data)[col_dim]) + } + if (length(col_names) != as.numeric(dim(data)[col_dim])) { + stop("Parameter 'col_names' must have the same length of dimension ", + "'col_dim'.") + } + # Check 
subcol_dim + if (!is.character(subcol_dim)) { + stop("Parameter 'subcol_dim' must be a character string.") + } + if (!subcol_dim %in% names(dim(data))) { + stop("Parameter 'subcol_dim' is not found in 'data' dimensions.") + } + # Check subcol_names + if (is.null(subcol_names)) { + subcol_names <- as.character(1:dim(data)[subcol_dim]) + } + if (length(subcol_names) != as.numeric(dim(data)[subcol_dim])) { + stop("Parameter 'subcol_names' must have the same length of dimension ", + "'subcol_dim'.") + } + # Check legend_dim + if (!is.character(legend_dim)) { + stop("Parameter 'legend_dim' must be a character string.") + } + if (!legend_dim %in% names(dim(data))) { + stop("Parameter 'legend_dim' is not found in 'data' dimensions.") + } + # Check row_title + if (is.null(row_title)) { + row_title <- "" + } else { + if (!is.character(row_title)) { + stop("Parameter 'row_title' must be a character string.") + } + } + # Check subrow_title + if (is.null(subrow_title)) { + subrow_title <- "" + } else { + if (!is.character(subrow_title)) { + stop("Parameter 'subrow_title' must be a character string.") + } + } + # Check col_title + if (is.null(col_title)) { + col_title <- "" + } else { + if (!is.character(col_title)) { + stop("Parameter 'col_title' must be a character string.") + } + } + # Check table_title + if (is.null(table_title)) { + table_title <- "" + } else { + if (!is.character(table_title)) { + stop("Parameter 'table_title' must be a character string.") + } + } + # Check table_subtitle + if (is.null(table_subtitle)) { + table_subtitle <- "" + } else { + if (!is.character(table_subtitle)) { + stop("Parameter 'table_subtitle' must be a character string.") + } + } + # Check legend_breaks + if (inherits(legend_breaks, 'list')) { + if (!(length(legend_breaks) == as.numeric(dim(data)[legend_dim]))) { + stop("Parameter 'legend_breaks' must be a list with the same number of ", + "elements as the length of the 'legend_dim' dimension in data.") + } + } else if (is.numeric(legend_breaks)) { + legend_breaks <- rep(list(legend_breaks), as.numeric(dim(data)[legend_dim])) + } else if (is.null(legend_breaks)) { + legend_breaks <- rep(list(seq(-1, 1, 0.2)), as.numeric(dim(data)[legend_dim])) + } else { + stop("Parameter 'legend_breaks' must be a numeric vector, a list or NULL.") + } + # Check plot_legend + if (!inherits(plot_legend, 'logical')) { + stop("Parameter 'plot_legend' must be a logical value.") + } + # Check label_scale + if (any(!is.numeric(label_scale), length(label_scale) != 1)) { + stop("Parameter 'label_scale' must be a numeric value of length 1.") + } + # Check legend_width + if (is.null(legend_width)) { + legend_width <- length(subcol_names) * 46.5 + } else if (any(!is.numeric(legend_width), length(legend_width) != 1)) { + stop("Parameter 'legend_width' must be a numeric value of length 1.") + } + # Check legend_height + if (any(!is.numeric(legend_height), length(legend_height) != 1)) { + stop("Parameter 'legend_height' must be a numeric value of length 1.") + } + # Check colour palette input + if (inherits(palette, 'list')) { + if (!(length(palette) == as.numeric(dim(data)[legend_dim]))) { + stop("Parameter 'palette' must be a list with the same number of ", + "elements as the length of the 'legend_dim' dimension in data.") + } + if (!all(sapply(palette, is.character))) { + stop("Parameter 'palette' must be a list of character vectors.") + } + } else if (is.character(palette)) { + palette <- rep(list(palette), as.numeric(dim(data)[legend_dim])) + } else if (is.null(palette)) { + n <- 
length(legend_breaks[[1]]) + if (n == 1) { + stop("Parameter 'legend_breaks' can't be of length 1.") + } else if (n == 2) { + colors <- c('#B35806') + } else if (n == 3) { + colors <- c('#8073AC', '#E08214') + } else if (n == 11) { + colors <- c('#2D004B', '#542789', '#8073AC', '#B2ABD2', '#D8DAEB', + '#FEE0B6', '#FDB863', '#E08214', '#B35806', '#7F3B08') + } else if (n > 11) { + stop("Parameter 'palette' must be provided when 'legend_breaks' ", + "exceed the length of 11.") + } else { + colors <- rev(brewer.pal(n-1, "PuOr")) + } + palette <- rep(list(colors), as.numeric(dim(data)[legend_dim])) + } else { + stop("Parameter 'palette' must be a character vector, a list or NULL.") + } + # Check colorunder + if (is.null(colorunder)) { + colorunder <- rep("#04040E", as.numeric(dim(data)[legend_dim])) + } + if (length(colorunder) == 1) { + colorunder <- rep(colorunder, as.numeric(dim(data)[legend_dim])) + } + if (length(colorunder) != as.numeric(dim(data)[legend_dim])) { + stop("Parameter 'colorunder' must be a character string vector or a list ", + "with the same number of elements as the length of the 'legend_dim' ", + "dimension in data.") + } + if (!is.character(unlist(colorunder))) { + stop("Parameter 'colorunder' must be a character string vector ", + "or a list of character string elements.") + } + # Check colorsup + if (is.null(colorsup)) { + colorsup <- rep("#730C04", as.numeric(dim(data)[legend_dim])) + } + if (length(colorsup) == 1) { + colorsup <- rep(colorsup, as.numeric(dim(data)[legend_dim])) + } + if (length(colorsup) != as.numeric(dim(data)[legend_dim])) { + stop("Parameter 'colorsup' must be a character string vector or a list ", + "with the same number of elements as the length of the 'legend_dim' ", + "dimension in data.") + } + if (!is.character(unlist(colorsup))) { + stop("Parameter 'colorsup' must be a character string vector ", + "or a list of character string elements.") + } + # Check round_decimal + if (!is.numeric(round_decimal)) { + stop("Parameter 'round_decimal' must be a numeric value of length 1.") + } + # Check font_size + if (!is.numeric(font_size)) { + stop("Parameter 'font_size' must be a numeric value of length 1.") + } + # Check legend white space + if (!is.numeric(legend_white_space)) { + stop("Parameter 'legend_white_space' must be a numeric value of length 1.") + } + # columns_width + if (!is.numeric(columns_width)) { + stop("Parameter 'columns_width' must be a numeric value.") + } + # Check col1_width + if (is.null(col1_width)) { + if (max(nchar(row_names)) == 1) { + col1_width <- max(nchar(row_names)) + } else { + col1_width <- max(nchar(row_names))/4 + } + } else if (!is.numeric(col1_width)) { + stop("Parameter 'col1_width' must be a numeric value of length 1.") + } + # Check col2_width + if (is.null(col2_width)) { + if (max(nchar(subrow_names)) == 1 ) { + col2_width <- max(nchar(subrow_names)) + } else { + col2_width <- max(nchar(subrow_names))/4 + } + } else if (!is.numeric(col2_width)) { + stop("Parameter 'col2_width' must be a numeric value of length 1.") + } + + # Get dimensions of inputs + n_col_names <- length(col_names) + n_subcol_names <- length(subcol_names) + n_row_names <- length(row_names) + n_subrow_names <- length(subrow_names) + + # Define table size + n_rows <- n_row_names * n_subrow_names + n_columns <- 2 + (n_col_names * n_subcol_names) + + # Column names + row_names_table <- rep("", n_rows) + for (row in 1:n_row_names) { + row_names_table[floor(n_subrow_names/2) + (row - 1) * n_subrow_names] <- row_names[row] + } + + # Define 
scorecard table titles + column_titles <- c(row_title, subrow_title, rep(c(subcol_names), n_col_names)) + + # Round data + data <- round(data, round_decimal) + + # Define data inside the scorecards table + for (row in 1:n_row_names) { + table_temp <- data.frame(table_column_2 = as.character(subrow_names)) + for (col in 1:n_col_names) { + table_temp <- data.frame(table_temp, + Reorder(data = Subset(x = data, along = c(col_dim, row_dim), + indices = list(col, row), drop = 'selected'), + order = c(subrow_dim, subcol_dim))) + } + if (row == 1) { + table_data <- table_temp + } else { + table_data <- rbind(table_data, table_temp) + } + } + + # All data for plotting in table + table <- data.frame(table_column_1 = row_names_table, table_data) + table_temp <- array(unlist(table[3:n_columns]), dim = c(n_rows, n_columns - 2)) + + # Define colors to show in table + table_colors <- .ScorecardColors(table = table_temp, n_col = n_col_names, + n_subcol = n_subcol_names, n_row = n_row_names, + n_subrow = n_subrow_names, legend_breaks = legend_breaks, + palette = palette, colorunder = colorunder, + colorsup = colorsup) + metric_color <- table_colors$metric_color + metric_text_color <- table_colors$metric_text_color + # metric_text_bold <- table_colors$metric_text_bold + + # Remove temporary table + rm(table_temp) + + # Format values to underline in table + metric_underline <- MergeDims(sign, c(subcol_dim, col_dim), + rename_dim = 'col', na.rm = FALSE) + metric_underline <- MergeDims(metric_underline, c(subrow_dim, row_dim), + rename_dim = 'row', na.rm = FALSE) + metric_underline <- Reorder(metric_underline, c('row', 'col')) + + options(stringsAsFactors = FALSE) + title <- data.frame(c1 = table_title, c2 = n_columns) + subtitle <- data.frame(c1 = table_subtitle, c2 = n_columns) + header_names <- as.data.frame(data.frame(c1 = c("", col_names), + c2 = c(2, rep(n_subcol_names, n_col_names)))) + header_names2 <- as.data.frame(data.frame(c1 = c("", paste0(rep(col_title, n_col_names))), + c2 = c(2, rep(n_subcol_names, n_col_names)))) + title_space <- data.frame(c1 = "\n", c2 = n_columns) + + # Hide NA values in table + options(knitr.kable.NA = '') + + # Create HTML table + table_html_part <- list() + table_html_part[[1]] <- kbl(table, escape = F, col.names = column_titles, align = rep("c", n_columns)) %>% + kable_paper("hover", full_width = FALSE, font_size = 14 * font_size) %>% + add_header_above(header = header_names2, font_size = 16 * font_size) %>% + add_header_above(header = title_space, font_size = 10 * font_size) %>% + add_header_above(header = header_names, font_size = 20 * font_size) %>% + add_header_above(header = title_space, font_size = 10 * font_size) %>% + add_header_above(header = subtitle, font_size = 16 * font_size, align = "left") %>% + add_header_above(header = title_space, font_size = 10 * font_size) %>% + add_header_above(header = title, font_size = 22 * font_size, align = "left") + + for (i in 1:n_col_names) { + for (j in 1:n_subcol_names) { + my_background <- metric_color[, (i - 1) * n_subcol_names + j] + my_text_color <- metric_text_color[, (i - 1) * n_subcol_names + j] + my_underline <- metric_underline[, (i - 1) * n_subcol_names + j] + # my_bold <- metric_text_bold[(i - 1) * n_subcol_names + j] + + table_html_part[[(i - 1) * n_subcol_names + j + 1]] <- + column_spec(table_html_part[[(i - 1) * n_subcol_names + j]], + 2 + n_subcol_names * (i - 1) + j, + background = my_background[1:n_rows], + color = my_text_color[1:n_rows], + underline = my_underline[1:n_rows], + bold = T) # 
strsplit(toString(bold), ', ')[[1]] + } + } + + # Define position of table borders + column_borders <- NULL + for (i in 1:n_col_names) { + column_spacing <- (n_subcol_names * i) + 2 + column_borders <- c(column_borders, column_spacing) + } + + n_last_list <- n_col_names * n_subcol_names + 1 + + table_html <- column_spec(table_html_part[[n_last_list]], 1, bold = TRUE, + width_min = paste0(col1_width, 'cm')) %>% + column_spec(2, bold = TRUE, width_min = paste0(col2_width, 'cm')) %>% + column_spec(3:n_columns, width_min = paste0(columns_width, 'cm')) %>% + column_spec(c(1, 2, column_borders), border_right = "2px solid black") %>% + column_spec(1, border_left = "2px solid black") %>% + column_spec(n_columns, border_right = "2px solid black") %>% + row_spec(seq(from = 0, to = n_subrow_names * n_row_names, by = n_subrow_names), + extra_css = "border-bottom: 2px solid black", hline_after = TRUE) + if (plot_legend == TRUE) { + # Save the scorecard (without legend) + save_kable(table_html, file = paste0(fileout, '_tmpScorecard.png'), vheight = 1) + + # White space for legend + legend_white_space <- 37.8 * legend_white_space # converting pixels to cm + + # Create and save color bar legend + .ScorecardLegend(legend_breaks = legend_breaks, + palette = palette, + colorunder = colorunder, + colorsup = colorsup, + label_scale = label_scale, + legend_width = legend_width, + legend_height = legend_height, + legend_white_space = legend_white_space, + fileout = fileout) + + # Add the legends below the scorecard table + system(paste0('convert -append ', fileout, '_tmpScorecard.png ', fileout, + '_tmpScorecardLegend.png ', fileout)) + # Remove temporary scorecard table + unlink(paste0(fileout, '_tmpScorecard*.png')) + } + if (plot_legend == FALSE) { + save_kable(table_html, file = fileout) + } +} + +# Scorecards function to assign background color of table cells, +# color of text in table and to bold the text. +# +# It will return a list with 2 arrays: +# (1) metric_color, A 2-dimensional array with character strings containing the +# color codes for each cell background. +# (2) metric_text_color, A 2-dimensional array with character strings +# containing the color codes for each cell text. 
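+#
+# Values below the lowest break are filled with 'colorunder', values above the
+# highest break with 'colorsup', and NA cells are shown in gray. Cell text is
+# switched to white when the value falls in the two lowest or two highest
+# color classes (or is NA), so that it remains readable on dark backgrounds.
+# (The bold-text array is currently commented out.)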
+.ScorecardColors <- function(table, n_col, n_subcol, n_row, n_subrow, + legend_breaks, palette, colorunder, colorsup) { + # Define rows and columns + n_rows <- n_row * n_subrow + n_columns <- n_col * n_subcol + + # Set table background colors + metric_color <- array(colorunder, c(n_row * n_subrow, n_columns)) + metric_text_color <- array("#2A2A2A", c(n_row * n_subrow , n_columns)) + # metric_text_bold <- array(TRUE, c(n_row * n_subrow , n_columns - 2)) # Setting all values to bold + + # Define cell and text colors to show in table + for (i in 1:n_col) { + metric_int <- legend_breaks[[i]] + for (rr in 1:n_rows) { + for (j in 1:n_subcol) { + for (pp in 1:(length(metric_int) - 1)) { + if (is.na(table[rr, ((i - 1) * n_subcol + j)])) { + metric_color[rr, ((i - 1) * n_subcol + j)] <- "gray" + } else { + if (table[rr, ((i - 1) * n_subcol + j)] >= + metric_int[pp] && table[rr, ((i - 1) * n_subcol + j)] <= + metric_int[pp + 1]) { + metric_color[rr, ((i - 1) * n_subcol + j)] <- palette[[i]][pp] # palette[pp] + } + if (table[rr, ((i - 1) * n_subcol + j)] < metric_int[1]) { + metric_color[rr, ((i - 1) * n_subcol + j)] <- colorunder[i] + } + if (table[rr,((i - 1) * n_subcol + j)] >= + metric_int[length(metric_int)]) { + metric_color[rr, ((i - 1) * n_subcol + j)] <- colorsup[i] + } + } + # color text in white and bold if background is white or dark blue or dark red: + if (is.na(table[rr, ((i - 1) * n_subcol + j)]) || + (!is.na(table[rr, ((i - 1) * n_subcol + j)]) && pp == 1 && + table[rr, ((i - 1) * n_subcol + j)] < metric_int[2]) || + (!is.na(table[rr, ((i - 1) * n_subcol + j)]) && pp == 2 && + table[rr, ((i - 1) * n_subcol + j)] < metric_int[3]) || + (!is.na(table[rr, ((i - 1) * n_subcol + j)]) && pp == (length(metric_int) - 1) && + table[rr, ((i - 1) * n_subcol + j)] >= metric_int[length(metric_int) - 1]) || + (!is.na(table[rr, ((i - 1) * n_subcol + j)]) && pp == (length(metric_int) - 2) && + table[rr, ((i - 1) * n_subcol + j)] >= metric_int[length(metric_int) - 2])) { + metric_text_color[rr, ((i - 1) * n_subcol + j)] <- "white" + # metric_text_bold[rr,((i - 1) * n_subcol + j)] <- TRUE + } + } + } + } + } + return(list(metric_color = metric_color, + metric_text_color = metric_text_color)) +} + +# Scorecards function to create the color bar legends for the required metrics +# and paste them below the scorecard table +.ScorecardLegend <- function(legend_breaks, palette, colorunder, colorsup, + label_scale, legend_width, legend_height, + legend_white_space, fileout) { + + # Create color bar legends for each metric + for (i in 1:length(palette)) { + png(filename = paste0(fileout, '_tmpLegend', i, '.png'), width = legend_width, + height = legend_height) + ColorBarContinuous(brks = legend_breaks[[i]], cols = palette[[i]], vertical = FALSE, + label_scale = label_scale, col_inf = colorunder[[i]], + col_sup = colorsup[[i]]) + dev.off() + if (i == 1) { + # Add white space to the left of the first color bar legend + system(paste0('convert ', fileout, '_tmpLegend1.png -background white -splice ', + legend_white_space, 'x0 ', fileout, '_tmpScorecardLegend.png')) + } else { + system(paste0('convert +append ', fileout, '_tmpScorecardLegend.png ', + fileout, '_tmpLegend', i, '.png ', fileout, + '_tmpScorecardLegend.png')) + } + } + unlink(paste0(fileout,'_tmpLegend*.png')) +} diff --git a/modules/Scorecards/R/tmp/WeightedMetrics.R b/modules/Scorecards/R/tmp/WeightedMetrics.R index aea23c566851f523cc8ee19ae2336861e329d0c0..9d1630c419ebc967fb6dc94ba59c3a4f0d5b1461 100644 --- 
a/modules/Scorecards/R/tmp/WeightedMetrics.R +++ b/modules/Scorecards/R/tmp/WeightedMetrics.R @@ -27,8 +27,8 @@ #'@importFrom ClimProjDiags WeightedMean #'@importFrom s2dv Reorder #'@export -WeightedMetrics <- function(loaded_metrics, regions, metric.aggregation, - ncores = NULL, na.rm = TRUE) { +WeightedMetrics <- function(loaded_metrics, regions, forecast.months, + metric.aggregation, ncores = NULL, na.rm = TRUE) { ## Initial checks # loaded_metrics if (any(sapply(loaded_metrics, function(x) { @@ -53,9 +53,8 @@ WeightedMetrics <- function(loaded_metrics, regions, metric.aggregation, ## Get metric names ## TO DO: check all metric are in the same order for all sys - metrics <- attributes(loaded_metrics[[1]][[1]])$metrics - forecast.months <- attributes(loaded_metrics[[1]][[1]])$forecast.months - start.months <- attributes(loaded_metrics[[1]][[1]])$start.months + metrics <- attributes(loaded_metrics)$metrics + start.months <- attributes(loaded_metrics)$start_months all_metric_means <- array(dim = c(metric = length(metrics), time = length(forecast.months), @@ -83,47 +82,20 @@ WeightedMetrics <- function(loaded_metrics, regions, metric.aggregation, latdim = lat_dim_name, na.rm = na.rm, ncores = ncores) + all_metric_means[, , , reg, ref, sys] <- weighted.mean + } ## close loop on region } ## close loop on reference } ## close loop on system - - ## skill aggregation: - if (metric.aggregation == 'score') { - if (all(c("rps", "rps_clim") %in% metrics)) { - ## Calculate RPSS from aggregated RPS and RPS_clim - all_metric_means <- multiApply::Apply(data = all_metric_means, - target_dims = 'metric', - fun = function(x, met) { - res <- 1 - x[which(met == 'rps')] / x[which(met == 'rps_clim')] - c(x, res)}, met = metrics, - output_dims = 'metric', - ncores = ncores)$output1 - ## Define name of newly calculated RPSS metric - metrics <- c(metrics, "rpss_score_aggr") - } - if (all(c("crps", "crps_clim") %in% metrics)) { - ## Calculate CRPSS from aggragated CRPS and CRPS_clim - all_metric_means <- multiApply::Apply(data = all_metric_means, - target_dims = 'metric', - fun = function(x, met) { - res <- 1 - x[which(met == 'crps')] / x[which(met == 'crps_clim')] - c(x, res)}, - met = metrics, - output_dims = 'metric', - ncores = ncores)$output1 - ## Define name of newly calculated CRPSS metric - metrics <- c(metrics, "crpss_score_aggr") - } - ## Add warning in case metric.aggregation == 'score' but 1 of the metrics from each pair is missing - } + ## reorder dimensions in array all_metric_means <- s2dv::Reorder(all_metric_means, c('system','reference','metric','time','sdate','region')) ## Add attributes attributes(all_metric_means)$metrics <- metrics - attributes(all_metric_means)$start.months <- attributes(loaded_metrics[[1]][[1]])$start.months - attributes(all_metric_means)$forecast.months <- attributes(loaded_metrics[[1]][[1]])$forecast.months + attributes(all_metric_means)$start.months <- start.months + attributes(all_metric_means)$forecast.months <- forecast.months attributes(all_metric_means)$regions <- regions attributes(all_metric_means)$system.name <- names(loaded_metrics) attributes(all_metric_means)$reference.name <- names(loaded_metrics[[1]]) diff --git a/modules/Scorecards/Scorecards.R b/modules/Scorecards/Scorecards.R index 0dbcd9210c9227200c872204526ea4e6df05adcb..37aa421c978d8aad8479b9b972684222207ddd5f 100644 --- a/modules/Scorecards/Scorecards.R +++ b/modules/Scorecards/Scorecards.R @@ -10,127 +10,525 @@ source('modules/Scorecards/R/tmp/SCTransform.R') 
source('modules/Scorecards/R/tmp/ScorecardsSingle.R') source('modules/Scorecards/R/tmp/ScorecardsMulti.R') source('modules/Scorecards/R/tmp/ScorecardsSystemDiff.R') -source('modules/Scorecards/R/tmp/SCPlotScorecard.R') +source('modules/Scorecards/R/tmp/VizScorecard.R') +## Temporary for new ESviz function +source('modules/Scorecards/R/tmp/ColorBarContinuous.R') +source('modules/Scorecards/R/tmp/ClimPalette.R') +.IsColor <- s2dv:::.IsColor +.FilterUserGraphicArgs <- s2dv:::.FilterUserGraphicArgs -## TODO: Change function name to 'Scorecards'? ## Define function Scorecards <- function(recipe) { - ## set parameters - input.path <- paste0(recipe$Run$output_dir, "/outputs/Skill/") + ## Parameters for loading data files + skill.input.path <- paste0(recipe$Run$output_dir, "/outputs/Skill/") + stats.input.path <- paste0(recipe$Run$output_dir, "/outputs/Statistics/") output.path <- paste0(recipe$Run$output_dir, "/plots/Scorecards/") dir.create(output.path, recursive = T, showWarnings = F) - system <- recipe$Analysis$Datasets$System$name reference <- recipe$Analysis$Datasets$Reference$name var <- recipe$Analysis$Variables$name start.year <- as.numeric(recipe$Analysis$Time$hcst_start) end.year <- as.numeric(recipe$Analysis$Time$hcst_end) forecast.months <- recipe$Analysis$Time$ftime_min : recipe$Analysis$Time$ftime_max - - if (recipe$Analysis$Workflow$Scorecards$start_months == 'all') { - start.months <- 1:12 + calib.method <- tolower(recipe$Analysis$Workflow$Calibration$method) + + if (recipe$Analysis$Workflow$Scorecards$start_months == 'all' || is.null(recipe$Analysis$Workflow$Scorecards$start_months)) { + start.months <- as.numeric(substr(recipe$Analysis$Time$sdate, 1,2)) } else { start.months <- as.numeric(strsplit(recipe$Analysis$Workflow$Scorecards$start_months, split = ", | |,")[[1]]) + if(!any(as.numeric(substr(recipe$Analysis$Time$sdate, 1,2))) %in% start.months){ + error(recipe$Run$logger,"Requested start dates for scorecards must be loaded") + } } + start.months <- sprintf("%02d", start.months) + period <- paste0(start.year, "-", end.year) + + ## Parameters for data aggregation regions <- recipe$Analysis$Workflow$Scorecards$regions for (i in names(regions)){regions[[i]] <- unlist(regions[[i]])} metric.aggregation <- recipe$Analysis$Workflow$Scorecards$metric_aggregation metrics.load <- unlist(strsplit(tolower(recipe$Analysis$Workflow$Skill$metric), ", | |,")) + metrics.visualize <- unlist(strsplit(tolower(recipe$Analysis$Workflow$Scorecards$metric), ", | |,")) + ncores <- 1 # recipe$Analysis$ncores - ## Define skill scores in score aggregation has been requested - - if(metric.aggregation == 'score'){ - if('rps' %in% metrics.load){ - metrics.load <- c(metrics.load, 'rps_clim') - } - if('crps' %in% metrics.load){ - metrics.load <- c(metrics.load, 'crps_clim') - } + if(is.null(recipe$Analysis$Workflow$Scorecards$signif_alpha)){ + alpha <- 0.05 + } else { + alpha <- recipe$Analysis$Workflow$Scorecards$signif_alpha } - metrics.visualize <- unlist(strsplit(tolower(recipe$Analysis$Workflow$Scorecards$metric), ", | |,")) - - ## Define skill scores in score aggregation has been requested + if (is.null(recipe$Analysis$Workflow$Scorecards$inf_to_na)){ + inf.to.na <- FALSE + } else { + inf.to.na <- recipe$Analysis$Workflow$Scorecards$inf_to_na + } - if(metric.aggregation == 'score'){ - if('rpss' %in% metrics.visualize){ - metrics.visualize[metrics.visualize == 'rpss'] <- 'rpss_score_aggr' - } - if('crpss' %in% metrics.visualize){ - metrics.visualize[metrics.visualize == 'crpss'] <- 'crpss_score_aggr' 
- } + if(is.null(recipe$Analysis$remove_NAs)){ + na.rm <- FALSE + } else { + na.rm <- recipe$Analysis$remove_NAs } - inf.to.na <- recipe$Analysis$Workflow$Scorecards$inf_to_na + ## Parameters for scorecard layout table.label <- recipe$Analysis$Workflow$Scorecards$table_label fileout.label <- recipe$Analysis$Workflow$Scorecards$fileout_label - legend.white.space <- recipe$Analysis$Workflow$Scorecards$legend_white_space col1.width <- recipe$Analysis$Workflow$Scorecards$col1_width col2.width <- recipe$Analysis$Workflow$Scorecards$col2_width - calculate.diff <- recipe$Analysis$Workflow$Scorecards$calculate_diff - ncores <- 1 # recipe$Analysis$ncores + legend.breaks <- recipe$Analysis$Workflow$Scorecards$legend_breaks + legend.width <- recipe$Analysis$Workflow$Scorecards$legend_width - ## Load data files - loaded_metrics <- LoadMetrics(system = system, - reference = reference, - var = var, - start.year = start.year, - end.year = end.year, - metrics = metrics.load, - start.months = start.months, - forecast.months = forecast.months, - inf_to_na = inf.to.na, - input.path = input.path) + if (is.null(recipe$Analysis$Workflow$Scorecards$plot_legend)){ + plot.legend <- TRUE + } else { + plot.legend <- recipe$Analysis$Workflow$Scorecards$plot_legend + } + if(is.null(recipe$Analysis$Workflow$Scorecards$columns_width)){ + columns.width <- 1.2 + } else { + columns.width <- recipe$Analysis$Workflow$Scorecards$columns_width + } - if('region' %in% names(dim(loaded_metrics[[1]][[1]]))){ - - ### Convert loaded metrics to array for allready aggregated data - metrics.dim <- attributes(loaded_metrics[[1]][[1]])$metrics - forecast.months.dim <- attributes(loaded_metrics[[1]][[1]])$forecast.months - start.months.dim <- attributes(loaded_metrics[[1]][[1]])$start.months - regions.dim <- regions #list('NAO' = c(lon.min = -80, lon.max = 40, lat.min = 20, lat.max = 80)) - - aggregated_metrics <- array(dim = c(system = length(loaded_metrics), - reference = length(loaded_metrics[[1]]), - metric = length(metrics.dim), - time = length(forecast.months.dim), - sdate = length(start.months.dim), - region = length(regions.dim))) + if(is.null(recipe$Analysis$Workflow$Scorecards$legend_white_space)){ + legend.white.space <- 6 + } else { + legend.white.space <- recipe$Analysis$Workflow$Scorecards$legend_white_space + } + + if(is.null(recipe$Analysis$Workflow$Scorecards$legend_height)){ + legend.height <- 50 + } else { + legend.height <- recipe$Analysis$Workflow$Scorecards$legend_height + } + + if(is.null(recipe$Analysis$Workflow$Scorecards$label_scale)){ + label.scale <- 1.4 + } else { + label.scale <- recipe$Analysis$Workflow$Scorecards$label_scale + } + + if(is.null(recipe$Analysis$Workflow$Scorecards$round_decimal)){ + round.decimal <- 2 + } else { + round.decimal <- recipe$Analysis$Workflow$Scorecards$round_decimal + } + + if(is.null(recipe$Analysis$Workflow$Scorecards$font_size)){ + font.size <- 1.1 + } else { + font.size <- recipe$Analysis$Workflow$Scorecards$font_size + } + + ## Define if difference scorecard is to be plotted + if (is.null(recipe$Analysis$Workflow$Scorecards$calculate_diff)){ + calculate.diff <- FALSE + } else { + calculate.diff <- recipe$Analysis$Workflow$Scorecards$calculate_diff + } + + ####### SKILL AGGREGATION ####### + if(metric.aggregation == 'skill'){ + + ## Load data files + loaded_metrics <- LoadMetrics(input_path = skill.input.path, + system = system, + reference = reference, + var = var, + metrics = metrics.visualize, + period = period, + start_months = start.months, + calib_method = 
calib.method, + inf_to_na = inf.to.na + ) + + ## Spatial Aggregation of metrics + if('region' %in% names(dim(loaded_metrics[[1]][[1]]))){ + + ### Convert loaded metrics to array for already aggregated data + metrics.dim <- attributes(loaded_metrics)$metrics + forecast.months.dim <- forecast.months + start.months.dim <- attributes(loaded_metrics)$start_months + regions.dim <- regions #list('NAO' = c(lon.min = -80, lon.max = 40, lat.min = 20, lat.max = 80)) + + aggregated_metrics <- array(dim = c(system = length(loaded_metrics), + reference = length(loaded_metrics[[1]]), + metric = length(metrics.dim), + time = length(forecast.months.dim), + sdate = length(start.months.dim), + region = length(regions.dim))) + + + for (sys in 1:length(names(loaded_metrics))){ + for (ref in 1:length(names(loaded_metrics[[sys]]))){ + aggregated_metrics[sys, ref, , , , ] <- s2dv::Reorder(data = loaded_metrics[[sys]][[ref]], order = c('metric','time','sdate','region')) + } + } + + ## Add attributes + attributes(aggregated_metrics)$metrics <- metrics.load + attributes(aggregated_metrics)$start.months <- attributes(loaded_metrics)$start_months + attributes(aggregated_metrics)$forecast.months <- forecast.months + attributes(aggregated_metrics)$regions <- regions + attributes(aggregated_metrics)$system.name <- names(loaded_metrics) + attributes(aggregated_metrics)$reference.name <- names(loaded_metrics[[1]]) + + + } else { + ## Calculate weighted mean of spatial aggregation + aggregated_metrics <- WeightedMetrics(loaded_metrics, + regions = regions, + forecast.months = forecast.months, + metric.aggregation = metric.aggregation, + ncores = ncores) + } ## close if on region + metrics_significance <- NULL + } ## close if on skill + + ###### SCORE AGGREGATION ###### + if(metric.aggregation == 'score'){ + + lon_dim <- 'longitude' + lat_dim <- 'latitude' + time_dim <- 'syear' + memb_dim <- 'ensemble' + + ## Define arrays to filled with data + aggregated_metrics <- array(data = NA, + dim = c(system = length(system), + reference = length(reference), + time = length(forecast.months), + sdate = length(start.months), + region = length(regions), + metric = length(metrics.visualize))) - for (sys in 1:length(names(loaded_metrics))){ - for (ref in 1:length(names(loaded_metrics[[sys]]))){ - aggregated_metrics[sys, ref, , , , ] <- s2dv::Reorder(data = loaded_metrics[[sys]][[ref]], order = c('metric','time','sdate','region')) - } - } + metrics_significance <- array(data = NA, + dim = c(system = length(system), + reference = length(reference), + time = length(forecast.months), + sdate = length(start.months), + region = length(regions), + metric = length(metrics.visualize))) - ## Add attributes - attributes(aggregated_metrics)$metrics <- metrics.load - attributes(aggregated_metrics)$start.months <- attributes(loaded_metrics[[1]][[1]])$start.months - attributes(aggregated_metrics)$forecast.months <- attributes(loaded_metrics[[1]][[1]])$forecast.months - attributes(aggregated_metrics)$regions <- regions - attributes(aggregated_metrics)$system.name <- names(loaded_metrics) - attributes(aggregated_metrics)$reference.name <- names(loaded_metrics[[1]]) + ## Load and aggregated data for each metric + for (sys in 1:length(system)){ + for (ref in 1:length(reference)){ + for (met in metrics.visualize) { + + if(met == 'rpss'){ + ## Load data from saved files + rps_syear <- .loadmetrics(input_path = skill.input.path, system = system[sys], + reference = reference[ref], var = var, + period = period, start_months = start.months, + calib_method = 
calib.method, metric = 'rps_syear') + + rps_clim_syear <- .loadmetrics(input_path = skill.input.path, system = system[sys], + reference = reference[ref], var = var, + period = period, start_months = start.months, + calib_method = calib.method, metric = 'rps_clim_syear') + + ## Spatially aggregate data + rps_syear <- sapply(X = 1:length(regions), + FUN = function(X) { + WeightedMean(data = rps_syear, + region = regions[[X]], + lon = as.vector(attributes(rps_syear)$lon), + lat = as.vector(attributes(rps_syear)$lat), + londim = lon_dim, + latdim = lat_dim, + na.rm = F) + }, simplify = 'array') + + rps_clim_syear <- sapply(X = 1:length(regions), + FUN = function(X) { + WeightedMean(data = rps_clim_syear, + region = regions[[X]], + lon = as.vector(attributes(rps_clim_syear)$lon), + lat = as.vector(attributes(rps_clim_syear)$lat), + londim = lon_dim, + latdim = lat_dim, + na.rm = F) + }, simplify = 'array') + + ## Include name of region dimension + names(dim(rps_syear))[length(dim(rps_syear))] <- 'region' + names(dim(rps_clim_syear))[length(dim(rps_clim_syear))] <- 'region' + + ## Calculate significance + sign_rpss <- RandomWalkTest(rps_syear, rps_clim_syear, + time_dim = time_dim, test.type = 'two.sided', + alpha = alpha, pval = FALSE, sign = TRUE, + ncores = NULL)$sign + + ## Temporally aggregate data + rps_syear <- Apply(data = rps_syear, + target_dims = time_dim, + fun = 'mean', ncores = ncores)$output1 + + rps_clim_syear <- Apply(data = rps_clim_syear, + target_dims = time_dim, + fun = 'mean', ncores = ncores)$output1 + + ## Calculate RPSS from aggregated RPS and RPS_clim + rpss <- 1 - rps_syear / rps_clim_syear + + ## Save metric result in arrays + aggregated_metrics[sys, ref, , , ,which(metrics.visualize == met)] <- s2dv::Reorder(data = rpss, order = c('time', 'sdate','region')) + metrics_significance[sys, ref, , , , which(metrics.visualize == met)] <- s2dv::Reorder(data = sign_rpss, order = c('time', 'sdate','region')) + + } ## close if on rpss + + if(met == 'crpss'){ + + ## Load data from saved files + crps_syear <- .loadmetrics(input_path = skill.input.path, system = system[sys], + reference = reference[ref], var = var, + period = period, start_months = start.months, + calib_method = calib.method, metric = 'crps_syear') + + crps_clim_syear <- .loadmetrics(input_path = skill.input.path, system = system[sys], + reference = reference[ref], var = var, + period = period, start_months = start.months, + calib_method = calib.method, metric = 'crps_clim_syear') + + ## Spatially aggregate data + crps_syear <- sapply(X = 1:length(regions), + FUN = function(X) { + WeightedMean(data = crps_syear, + region = regions[[X]], + lon = as.vector(attributes(crps_syear)$lon), + lat = as.vector(attributes(crps_syear)$lat), + londim = lon_dim, + latdim = lat_dim, + na.rm = na.rm) + }, simplify = 'array') + + crps_clim_syear <- sapply(X = 1:length(regions), + FUN = function(X) { + WeightedMean(data = crps_clim_syear, + region = regions[[X]], + lon = as.vector(attributes(crps_clim_syear)$lon), + lat = as.vector(attributes(crps_clim_syear)$lat), + londim = lon_dim, + latdim = lat_dim, + na.rm = na.rm) + }, simplify = 'array') + + ## Include name of region dimension + names(dim(crps_syear))[length(dim(crps_syear))] <- 'region' + names(dim(crps_clim_syear))[length(dim(crps_clim_syear))] <- 'region' + + ## Calculate significance + sign_crpss <- RandomWalkTest(crps_syear, crps_clim_syear, + time_dim = time_dim, test.type = 'two.sided', + alpha = alpha, pval = FALSE, sign = TRUE, + ncores = NULL)$sign + + ## 
Temporally aggregate data + crps_syear <- Apply(data = crps_syear, + target_dims = time_dim, + fun = 'mean', ncores = ncores)$output1 + + crps_clim_syear <- Apply(data = crps_clim_syear, + target_dims = time_dim, + fun = 'mean', ncores = ncores)$output1 + + ## Calculate CRPSS from aggregated CRPS and CRPS_clim + crpss <- 1 - crps_syear / crps_clim_syear + + ## Save metric result in arrays + aggregated_metrics[sys, ref, , , , which(metrics.visualize == met)] <- s2dv::Reorder(data = crpss, order = c('time', 'sdate','region')) + metrics_significance[sys, ref, , , , which(metrics.visualize == met)] <- s2dv::Reorder(data = sign_crpss, order = c('time', 'sdate','region')) + + } ## close if on crpss + + if(met == 'enscorr'){ + ## Load data from saved files + cov <- .loadmetrics(input_path = stats.input.path, system = system[sys], + reference = reference[ref], var = var, + period = period, start_months = start.months, + calib_method = calib.method, metric = 'cov') + + std_hcst <- .loadmetrics(input_path = stats.input.path, system = system[sys], + reference = reference[ref], var = var, + period = period, start_months = start.months, + calib_method = calib.method, metric = 'std_hcst') + + std_obs <- .loadmetrics(input_path = stats.input.path, system = system[sys], + reference = reference[ref], var = var, + period = period, start_months = start.months, + calib_method = calib.method, metric = 'std_obs') + + + n_eff <- .loadmetrics(input_path = stats.input.path, system = system[sys], + reference = reference[ref], var = var, + period = period, start_months = start.months, + calib_method = calib.method, metric = 'n_eff') + + ## Calculate spatial aggregation + cov <- sapply(X = 1:length(regions), + FUN = function(X) { + WeightedMean(data = cov, + region = regions[[X]], + lon = as.vector(attributes(cov)$lon), + lat = as.vector(attributes(cov)$lat), + londim = lon_dim, + latdim = lat_dim, + na.rm = na.rm) + }, simplify = 'array') + + ## Include name of region dimension + names(dim(cov))[length(dim(cov))] <- 'region' + + std_hcst <- sapply(X = 1:length(regions), + FUN = function(X) { + WeightedMean(data = std_hcst, + region = regions[[X]], + lon = as.vector(attributes(std_hcst)$lon), + lat = as.vector(attributes(std_hcst)$lat), + londim = lon_dim, + latdim = lat_dim, + na.rm = na.rm) + }, simplify = 'array') + + names(dim(std_hcst))[length(dim(std_hcst))] <- 'region' + + std_obs <- sapply(X = 1:length(regions), + FUN = function(X) { + WeightedMean(data = std_obs, + region = regions[[X]], + lon = as.vector(attributes(std_obs)$lon), + lat = as.vector(attributes(std_obs)$lat), + londim = lon_dim, + latdim = lat_dim, + na.rm = na.rm) + }, simplify = 'array') + + names(dim(std_obs))[length(dim(std_obs))] <- 'region' + + n_eff <- sapply(X = 1:length(regions), + FUN = function(X) { + WeightedMean(data = n_eff, + region = regions[[X]], + lon = as.vector(attributes(n_eff)$lon), + lat = as.vector(attributes(n_eff)$lat), + londim = lon_dim, + latdim = lat_dim, + na.rm = na.rm) + }, simplify = 'array') + + names(dim(n_eff))[length(dim(n_eff))] <- 'region' + + ## Calculate correlation + enscorr <- cov / (std_hcst * std_obs) + + ## Calculate significance of corr + t_alpha2_n2 <- qt(p = alpha/2, df = n_eff-2, lower.tail = FALSE) + t <- abs(enscorr) * sqrt(n_eff-2) / sqrt(1-enscorr^2) + + sign_corr<- array(data = NA, + dim = c(time = length(forecast.months), + sdate = length(start.months), + region = length(regions))) + + for (time in 1:dim(sign_corr)[['time']]){ + for (mon in 1:dim(sign_corr)[['sdate']]){ + for (reg in 
1:dim(sign_corr)[['region']]){ + + if (anyNA(c(t[time, mon, reg], t_alpha2_n2[time, mon, reg])) == FALSE + && t[time, mon, reg] >= t_alpha2_n2[time, mon, reg]){ + sign_corr[time, mon, reg] <- TRUE + } else { + sign_corr[time, mon, reg] <- FALSE + } + } + } + } - } else { - ## Calculate weighted mean of spatial aggregation - aggregated_metrics <- WeightedMetrics(loaded_metrics, - regions = regions, - metric.aggregation = metric.aggregation, - ncores = ncores) - }## close if + ## Save metric result in arrays + aggregated_metrics[sys, ref, , , , which(metrics.visualize == met)] <- s2dv::Reorder(data = enscorr, order = c('time', 'sdate','region')) + metrics_significance[sys, ref, , , , which(metrics.visualize == met)] <- s2dv::Reorder(data = sign_corr, order = c('time', 'sdate','region')) + + } ## close if on enscorr + + if(met == 'mean_bias'){ + + mean_bias <- .loadmetrics(input_path = skill.input.path, system = system[sys], + reference = reference[ref], var = var, + period = period, start_months = start.months, + calib_method = calib.method, metric = 'mean_bias') + + ## Calculate spatial aggregation + mean_bias <- sapply(X = 1:length(regions), + FUN = function(X) { + WeightedMean(data = mean_bias, + region = regions[[X]], + lon = as.vector(attributes(mean_bias)$lon), + lat = as.vector(attributes(mean_bias)$lat), + londim = lon_dim, + latdim = lat_dim, + na.rm = na.rm) + }, simplify = 'array') + + names(dim(mean_bias))[length(dim(mean_bias))] <- 'region' + + ## Save metric result in array + aggregated_metrics[sys, ref, , , , which(metrics.visualize == met)] <- s2dv::Reorder(data = mean_bias, order = c('time', 'sdate','region')) + + } ## close on mean_bias + + if(met == 'enssprerr'){ + + enssprerr <- .loadmetrics(input_path = skill.input.path, system = system[sys], + reference = reference[ref], var = var, + period = period, start_months = start.months, + calib_method = calib.method, metric = 'enssprerr') + + ## Calculate spatial aggregation + enssprerr <- sapply(X = 1:length(regions), + FUN = function(X) { + WeightedMean(data = enssprerr, + region = regions[[X]], + lon = as.vector(attributes(enssprerr)$lon), + lat = as.vector(attributes(enssprerr)$lat), + londim = lon_dim, + latdim = lat_dim, + na.rm = na.rm) + }, simplify = 'array') + + names(dim(enssprerr))[length(dim(enssprerr))] <- 'region' + + ## Save metric result in array + aggregated_metrics[sys, ref, , , , which(metrics.visualize == met)] <- s2dv::Reorder(data = enssprerr, order = c('time', 'sdate','region')) + + } ## close on enssprerr + + } ## close loop on metric + } ## close if on reference + } ## close if on system + + ## Include metric attributes + attributes(aggregated_metrics)$metrics <- metrics.visualize + + ## Set NAs to False + metrics_significance[is.na(metrics_significance)] <- FALSE + + } ## close if on score + + + ####### PLOT SCORECARDS ########## ## Create simple scorecard tables ## (one system only) ## Metrics input must be in the same order as function SC_spatial_aggregation scorecard_single <- ScorecardsSingle(data = aggregated_metrics, + sign = metrics_significance, system = system, reference = reference, var = var, @@ -142,9 +540,17 @@ Scorecards <- function(recipe) { metrics = metrics.visualize, table.label = table.label, fileout.label = fileout.label, + plot.legend = plot.legend, + legend.breaks = legend.breaks, legend.white.space = legend.white.space, + legend.width = legend.width, + legend.height = legend.height, + label.scale = label.scale, col1.width = col1.width, col2.width = col2.width, + columns.width = 
columns.width, + font.size = font.size, + round.decimal = round.decimal, output.path = output.path) ## Create multi system/reference scorecard tables @@ -152,6 +558,7 @@ Scorecards <- function(recipe) { ## Metrics input must be in the same order as function SC_spatial_aggregation if(length(system) > 1 || length(reference) > 1){ scorecard_multi <- ScorecardsMulti(data = aggregated_metrics, + sign = metrics_significance, system = system, reference = reference, var = var, @@ -159,10 +566,21 @@ Scorecards <- function(recipe) { end.year = end.year, start.months = start.months, forecast.months = forecast.months, - region.names = attributes(regions)$names, + region.names = names(regions), metrics = metrics.visualize, table.label = table.label, fileout.label = fileout.label, + plot.legend = plot.legend, + legend.breaks = legend.breaks, + legend.white.space = legend.white.space, + legend.width = legend.width, + legend.height = legend.height, + label.scale = label.scale, + col1.width = col1.width, + col2.width = col2.width, + columns.width = columns.width, + font.size = font.size, + round.decimal = round.decimal, output.path = output.path) } ## close if diff --git a/modules/Skill/R/RPS_clim.R b/modules/Skill/R/RPS_clim.R index 4a079cd4855fea557f67b6f99e878e90a83398b3..adde5fe239d69962520e5ad5c33601716e417fe6 100644 --- a/modules/Skill/R/RPS_clim.R +++ b/modules/Skill/R/RPS_clim.R @@ -7,6 +7,7 @@ RPS_clim <- function(obs, indices_for_clim = NULL, prob_thresholds = c(1/3, 2/3) obs_probs <- .GetProbs(data = obs, indices_for_quantiles = indices_for_clim, ## temporarily removed s2dv::: prob_thresholds = prob_thresholds, weights = NULL, cross.val = cross.val) + # clim_probs: [bin, sdate] clim_probs <- c(prob_thresholds[1], diff(prob_thresholds), 1 - prob_thresholds[length(prob_thresholds)]) clim_probs <- array(clim_probs, dim = dim(obs_probs)) diff --git a/modules/Skill/R/tmp/CRPS.R b/modules/Skill/R/tmp/CRPS.R index c08375c4c310d1a135a25a7ea98bb50a4e08ca59..9f91be34b82aa876a2ea9c9e61f3099d581298c9 100644 --- a/modules/Skill/R/tmp/CRPS.R +++ b/modules/Skill/R/tmp/CRPS.R @@ -126,7 +126,7 @@ CRPS <- function(exp, obs, time_dim = 'sdate', memb_dim = 'member', dat_dim = NU Fair = Fair, ncores = ncores)$output1 - if (return_mean == TRUE) { + if (isTRUE(return_mean)) { crps <- MeanDims(crps, time_dim, na.rm = FALSE) } else { crps <- crps diff --git a/modules/Skill/R/tmp/RPS.R b/modules/Skill/R/tmp/RPS.R index 54ec8440e440702ba9a51abd3690f9ff4ac29458..e15a1754b96c51e109754edf486394ce2b386f7e 100644 --- a/modules/Skill/R/tmp/RPS.R +++ b/modules/Skill/R/tmp/RPS.R @@ -252,7 +252,7 @@ RPS <- function(exp, obs, time_dim = 'sdate', memb_dim = 'member', cat_dim = NUL weights = weights, cross.val = cross.val, na.rm = na.rm, ncores = ncores)$output1 - if (return_mean == TRUE) { + if (isTRUE(return_mean)) { rps <- MeanDims(rps, time_dim, na.rm = TRUE) } else { rps <- rps diff --git a/modules/Skill/Skill.R b/modules/Skill/Skill.R index d3f60d4e214d6c112179f24bb7bec8cfce9c4fac..aad449512e8f4e93851750f5b73a7614e3d2b1f0 100644 --- a/modules/Skill/Skill.R +++ b/modules/Skill/Skill.R @@ -67,7 +67,7 @@ Skill <- function(recipe, data, agg = 'global') { cross.val <- recipe$Analysis$Workflow$Skill$cross_validation } skill_metrics <- list() - for (metric in strsplit(metrics, ", | |,")[[1]]) { + for (metric in strsplit(metrics, ", | |,")[[1]]) { # Whether the fair version of the metric is to be computed if (metric %in% c('frps', 'frpss', 'bss10', 'bss90', 'fcrps', 'fcrpss')) { @@ -93,7 +93,7 @@ Skill <- function(recipe, data, agg = 
'global') { skill <- .drop_dims(skill) skill_metrics[[ metric ]] <- skill # RPS_clim - } else if (metric %in% c('rps_clim')) { + } else if (metric == 'rps_clim') { skill <- Apply(list(data$obs$data), target_dims = c(time_dim, memb_dim), cross.val = cross.val, @@ -111,16 +111,16 @@ Skill <- function(recipe, data, agg = 'global') { return_mean = FALSE, ncores = ncores) skill <- .drop_dims(skill) - skill_metrics[[metric]] <- skill + skill_metrics[[ metric ]] <- skill ## temp # RPS_clim_syear - } else if (metric %in% c('rps_clim_syear')) { ## not returning syear dimension name + } else if (metric == 'rps_clim_syear') { skill <- Apply(list(data$obs$data), target_dims = c(time_dim, memb_dim), cross.val = cross.val, fun = RPS_clim, return_mean = FALSE, output_dims = 'syear')$output1 skill <- .drop_dims(skill) - skill_metrics[[ metric ]] <- skill + skill_metrics[[ metric ]] <- skill ## temp # Ranked Probability Skill Score and Fair version } else if (metric %in% c('rpss', 'frpss')) { skill <- RPSS(data$hcst$data, data$obs$data, @@ -366,8 +366,13 @@ Skill <- function(recipe, data, agg = 'global') { if (recipe$Analysis$Workflow$Skill$save == 'all') { corr_metric_names <- grep("^corr_individual_members", names(skill_metrics)) if (length(corr_metric_names) == 0) { - save_metrics(recipe = recipe, skill = skill_metrics, - data_cube = data$hcst, agg = agg) + if (tolower(recipe$Analysis$Output_format) == 'scorecards') { + save_metrics_scorecards(recipe = recipe, skill = skill_metrics, + data_cube = data$hcst, agg = agg) + } else { + save_metrics(recipe = recipe, skill = skill_metrics, + data_cube = data$hcst, agg = agg) + } } else { # Save corr if (length(skill_metrics[corr_metric_names]) > 0) { @@ -376,8 +381,13 @@ Skill <- function(recipe, data, agg = 'global') { } # Save other skill metrics if (length(skill_metrics[-corr_metric_names]) > 0) { - save_metrics(recipe = recipe, skill = skill_metrics[-corr_metric_names], - data_cube = data$hcst, agg = agg) + if (tolower(recipe$Analysis$Output_format) == 'scorecards') { + save_metrics_scorecards(recipe = recipe, skill = skill_metrics[-corr_metric_names], + data_cube = data$hcst, agg = agg) + } else { + save_metrics(recipe = recipe, skill = skill_metrics[-corr_metric_names], + data_cube = data$hcst, agg = agg) + } } } } diff --git a/modules/Statistics/Statistics.R b/modules/Statistics/Statistics.R index 40f1889b3a5eb536eb6899041b484d859cd58475..26f52de9412fa699350e5729190d489e7b0d8797 100644 --- a/modules/Statistics/Statistics.R +++ b/modules/Statistics/Statistics.R @@ -1,6 +1,6 @@ -compute_statistics <- function(recipe, data, agg = 'global'){ +Statistics <- function(recipe, data, agg = 'global'){ # data$hcst: s2dv_cube containing the hindcast @@ -8,63 +8,76 @@ compute_statistics <- function(recipe, data, agg = 'global'){ # recipe: auto-s2s recipe as provided by read_yaml time_dim <- 'syear' - memb_dim <- 'ensemble' - - - ## Duplicate obs along hcst ensemble dimension - obs_data <- adrop(data$obs$data, drop = 9) - obs_data <- InsertDim(data = obs_data, pos = 9, lendim = 25, name = 'ensemble') - + ncores <- recipe$Analysis$ncores + + ## Calculate ensemble mean + hcst_data <- Apply(data$hcst$data, target_dims = 'ensemble', fun = 'mean')$output1 + obs_data <- Apply(data$obs$data, target_dims = 'ensemble', fun = 'mean')$output1 + ## Remove unwanted dimensions + hcst_data <- Subset(hcst_data, along = c('dat', 'sday', 'sweek'), indices = list(1,1,1) , drop = 'selected') + obs_data <- Subset(obs_data, along = c('dat', 'sday', 'sweek'), indices = list(1,1,1) , drop = 
'selected') + statistics_list <- tolower(recipe$Analysis$Workflow$Statistics$metric) statistics <- list() for (stat in strsplit(statistics_list, ", | |,")[[1]]) { - # Whether the fair version of the metric is to be computed + if (stat %in% c('cov', 'covariance')) { - covariance <- Apply(data = list(x= obs_data, y=data$hcst$data), - target_dims = c(time_dim, memb_dim), + ## Calculate covariance + covariance <- Apply(data = list(x = obs_data, y = hcst_data), + target_dims = time_dim, fun = function(x,y){cov(as.vector(x),as.vector(y), use = "everything", method = "pearson")})$output1 - statistics[[ stat ]] <- covariance + statistics[[ stat ]] <- covariance - } ## close if on cov + } ## close if on covariance if (stat %in% c('std', 'standard_deviation')) { - ## Calculate standard deviation - std_hcst <- Apply(data = data$hcst$data, - target_dims = c(time_dim, memb_dim), + ## Calculate standard deviation + std_hcst <- Apply(data = hcst_data, + target_dims = c(time_dim), fun = 'sd')$output1 - std_obs <- Apply(data = data$obs$data, - target_dims = c(time_dim, memb_dim), + std_obs <- Apply(data = obs_data, + target_dims = c(time_dim), fun = 'sd')$output1 - statistics[[ stat ]] <- list('std_hcst' = std_hcst, 'std_obs' = std_obs) + statistics[['std_hcst']] <- std_hcst + statistics[['std_obs']] <- std_obs } ## close if on std if (stat %in% c('var', 'variance')) { - ## Calculate standard deviation - var_hcst <- (Apply(data = data$hcst$data, - target_dims = c(time_dim, memb_dim), + ## Calculate variance + var_hcst <- (Apply(data = hcst_data, + target_dims = c(time_dim), fun = 'sd')$output1)^2 - var_obs <- (Apply(data = data$obs$data, - target_dims = c(time_dim, memb_dim), + var_obs <- (Apply(data = obs_data, + target_dims = c(time_dim), fun = 'sd')$output1)^2 - statistics[[ stat ]] <- list('var_hcst' = var_hcst, 'var_obs' = var_obs) + statistics[['var_hcst']] <- var_hcst + statistics[['var_obs']] <- var_obs - } ## close if on var + } ## close if on variance + + if (stat == 'n_eff') { + + ## Calculate degrees of freedom + n_eff <- s2dv::Eno(data = obs_data, time_dim = time_dim, na.action = na.pass, ncores = ncores) + statistics[['n_eff']] <- n_eff + + } ## close on n_eff } info(recipe$Run$logger, "##### STATISTICS COMPUTATION COMPLETE #####") @@ -79,8 +92,14 @@ compute_statistics <- function(recipe, data, agg = 'global'){ if (recipe$Analysis$Workflow$Statistics$save == 'all') { # Save all statistics - save_metrics(recipe = recipe, skill = statistics, ## Not able to save data with these dimensions - data_cube = data$hcst, agg = agg) ## The length of parameter 'order' should be the same with the dimension length of parameter 'data'. 
+ if (tolower(recipe$Analysis$Output_format) == 'scorecards') { + save_metrics_scorecards(recipe = recipe, skill = statistics, + data_cube = data$hcst, agg = agg) + } else { + save_metrics(recipe = recipe, skill = statistics, + data_cube = data$hcst, agg = agg) + } + } # Return results diff --git a/modules/Units/R/transform_units_precipitation.R b/modules/Units/R/transform_units_precipitation.R index 5cc1e812fd31a8aec507c3cae283707a30c273b8..d0dd7ffd50dfb6f04db35006a06cd61b1d8f43e9 100644 --- a/modules/Units/R/transform_units_precipitation.R +++ b/modules/Units/R/transform_units_precipitation.R @@ -96,22 +96,25 @@ transform_units_precipitation <- function(data, original_units, new_units, } -.days_in_month <- function(x, cal) { - if (cal %in% c('gregorian', 'standard', 'proleptic_gregorian')) { - N_DAYS_IN_MONTHS <- lubridate:::N_DAYS_IN_MONTHS - if (leap_year(year(x))) { - N_DAYS_IN_MONTHS[2] <- N_DAYS_IN_MONTHS[2] + 1 - } - } else if (cal %in% c('360', '360_day')) { - N_DAYS_IN_MONTHS <- rep(30, 12) - names(N_DAYS_IN_MONTHS) <- month.abb - } else if (cal %in% c('365', '365_day')) { - N_DAYS_IN_MONTHS <- lubridate:::N_DAYS_IN_MONTHS - } else { - stop("Unknown calendar") - } - month_x <- month(x, label = TRUE, locale = "C") - n_days <- N_DAYS_IN_MONTHS[month_x] - n_days[month_x == "Feb" & leap_year(x)] <- 29L +.days_in_month <- function(dates, cal) { + n_days <- array() + for (x in 1:length(dates)) { + if (cal %in% c('gregorian', 'standard', 'proleptic_gregorian')) { + N_DAYS_IN_MONTHS <- lubridate:::N_DAYS_IN_MONTHS + if (leap_year(year(dates[x]))) { + N_DAYS_IN_MONTHS[2] <- N_DAYS_IN_MONTHS[2] + 1 + } + } else if (cal %in% c('360', '360_day')) { + N_DAYS_IN_MONTHS <- rep(30, 12) + names(N_DAYS_IN_MONTHS) <- month.abb + } else if (cal %in% c('365', '365_day')) { + N_DAYS_IN_MONTHS <- lubridate:::N_DAYS_IN_MONTHS + } else { + stop("Unknown calendar") + } + month_x <- month(dates[x], label = TRUE, locale = "C") + n_days[x] <- N_DAYS_IN_MONTHS[month_x] + # n_days[month_x == "Feb" & leap_year(x)] <- 29L + } return(n_days) } diff --git a/modules/Visualization/R/plot_most_likely_terciles_map.R b/modules/Visualization/R/plot_most_likely_terciles_map.R index abd08f7b4a636898fd4d3c87d154c6eb6d80f83c..5d60f8c1747c8ec8b5435902a3c44107d9a4873f 100644 --- a/modules/Visualization/R/plot_most_likely_terciles_map.R +++ b/modules/Visualization/R/plot_most_likely_terciles_map.R @@ -109,7 +109,7 @@ plot_most_likely_terciles <- function(recipe, "\n", "Most Likely Tercile / Initialization: ", i_syear) months <- lubridate::month(fcst$attrs$Dates[1, 1, which(start_date == i_syear), ], - label = T, abb = F) + label = T, abb = F,locale = "en_GB") years <- lubridate::year(fcst$attrs$Dates[1, 1, which(start_date == i_syear), ]) if (recipe$Analysis$Workflow$Visualization$multi_panel) { ## TODO: Ensure this works for daily and sub-daily cases diff --git a/modules/Visualization/R/plot_skill_metrics.R b/modules/Visualization/R/plot_skill_metrics.R index 2698d499b13f58dff2eabacbaacb7614eb214298..26fb4a738269c522d30f82027199e20b8e3e6dc7 100644 --- a/modules/Visualization/R/plot_skill_metrics.R +++ b/modules/Visualization/R/plot_skill_metrics.R @@ -38,7 +38,7 @@ plot_skill_metrics <- function(recipe, data_cube, skill_metrics, # Get months months <- lubridate::month(Subset(data_cube$attrs$Dates, "syear", indices = 1), - label = T, abb = F) + label = T, abb = F,locale = "en_GB") if (!is.null(recipe$Analysis$Workflow$Visualization$projection)) { projection <- tolower(recipe$Analysis$Workflow$Visualization$projection) } else { 
@@ -58,13 +58,16 @@ plot_skill_metrics <- function(recipe, data_cube, skill_metrics, "enscorr", "rpss_specs", "bss90_specs", "bss10_specs", "enscorr_specs", "rmsss", "msss") scores <- c("rps", "frps", "crps", "frps_specs", "mse") + statistics <- c("cov", "std_hcst", "std_obs", "var_hcst", "var_obs", "n_eff") + # Loop over variables and assign colorbar and plot parameters to each metric for (var in 1:data_cube$dims[['var']]) { + var_name <- data_cube$attrs$Variable$varName[[var]] ## Need to include for statistics plotting to work var_skill <- lapply(skill_metrics, function(x) { ClimProjDiags::Subset(x, along = 'var', indices = var, drop = 'selected')}) - for (name in c(skill_scores, scores, "mean_bias", "enssprerr")) { + for (name in c(skill_scores, scores, statistics, "mean_bias", "enssprerr")) { if (name %in% names(skill_metrics)) { units <- NULL # Define plot characteristics and metric name to display in plot @@ -120,7 +123,42 @@ plot_skill_metrics <- function(recipe, data_cube, skill_metrics, col_inf <- colorbar[1] col_sup <- colorbar[length(colorbar)] units <- data_cube$attrs$Variable$metadata[[var_name]]$units + } else if (name %in% "cov") { + skill <- var_skill[[name]] + display_name <- "Covariance" + max_value <- max(abs(quantile(skill, 0.02, na.rm = T)), + abs(quantile(skill, 0.98, na.rm = T))) + brks <- max_value * seq(-1, 1, by = 0.2) + colorbar <- clim.colors(length(brks) + 1, diverging_palette) + cols <- colorbar[2:(length(colorbar) - 1)] + col_inf <- colorbar[1] + col_sup <- colorbar[length(colorbar)] + units <- paste0(data_cube$attrs$Variable$metadata[[var_name]]$units, "²") + } else if (name %in% "std_hcst") { + skill <- var_skill[[name]] + display_name <- "Hindcast Standard Deviation" + max_value <- max(abs(quantile(skill, 0.02, na.rm = T)), + abs(quantile(skill, 0.98, na.rm = T))) + brks <- max_value * seq(-1, 1, by = 0.2) + colorbar <- clim.colors(length(brks) + 1, diverging_palette) + cols <- colorbar[2:(length(colorbar) - 1)] + col_inf <- colorbar[1] + col_sup <- colorbar[length(colorbar)] + units <- data_cube$attrs$Variable$metadata[[var_name]]$units + } else if (name %in% "std_obs") { + skill <- var_skill[[name]] + display_name <- "Observation Standard Deviation" + max_value <- max(abs(quantile(skill, 0.02, na.rm = T)), + abs(quantile(skill, 0.98, na.rm = T))) + brks <- max_value * seq(-1, 1, by = 0.2) + colorbar <- clim.colors(length(brks) + 1, diverging_palette) + cols <- colorbar[2:(length(colorbar) - 1)] + col_inf <- colorbar[1] + col_sup <- colorbar[length(colorbar)] + units <- data_cube$attrs$Variable$metadata[[var_name]]$units } + + # Reorder dimensions skill <- Reorder(skill, c("time", "longitude", "latitude")) # If the significance has been requested and the variable has it, @@ -214,6 +252,7 @@ plot_skill_metrics <- function(recipe, data_cube, skill_metrics, for (i in 1:dim(skill)[['time']]) { # Get forecast time label forecast_time <- match(months[i], month.name) - init_month + 1 + if (forecast_time < 1) { forecast_time <- forecast_time + 12 } diff --git a/modules/Visualization/R/tmp/PlotRobinson.R b/modules/Visualization/R/tmp/PlotRobinson.R index d8210258275d481c9e2724e0e860281fc2e6e048..bd427448fad9bdc9482c3e13b161f05e2fd6c1a7 100644 --- a/modules/Visualization/R/tmp/PlotRobinson.R +++ b/modules/Visualization/R/tmp/PlotRobinson.R @@ -251,7 +251,26 @@ PlotRobinson <- function(data, lon, lat, lon_dim = NULL, lat_dim = NULL, # Color bar ## Check: brks, cols, bar_limits, color_fun, bar_extra_margin, units ## Build: brks, cols, bar_limits, col_inf, col_sup - 
var_limits <- c(min(data, na.rm = TRUE), max(data, na.rm = TRUE)) + if (!all(is.na(data))) { + var_limits <- c(min(data[!is.infinite(data)], na.rm = TRUE), + max(data[!is.infinite(data)], na.rm = TRUE)) + } else { + warning("All the data are NAs. The map will be filled with colNA.") + if (!is.null(brks) && length(brks) > 1) { + #NOTE: var_limits be like this to avoid warnings from ColorBar + var_limits <- c(min(brks, na.rm = TRUE) + diff(brks)[1], + max(brks, na.rm = TRUE)) + } else if (!is.null(bar_limits)) { + var_limits <- c(bar_limits[1] + 0.01, bar_limits[2]) + } else { + var_limits <- c(-0.5, 0.5) # random range since colorbar is not going to be plotted + if (!is.null(legend)) { + legend <- NULL + warning("All data are NAs. Color bar won't be drawn. If you want to have ", + "color bar still, define parameter 'brks' or 'bar_limits'.") + } + } + } colorbar <- ColorBar(brks = brks, cols = cols, vertical = vertical, subsampleg = NULL, bar_limits = bar_limits, var_limits = var_limits, triangle_ends = triangle_ends, col_inf = col_inf, col_sup = col_sup, color_fun = color_fun, @@ -317,12 +336,12 @@ PlotRobinson <- function(data, lon, lat, lon_dim = NULL, lat_dim = NULL, # Add triangles to brks brks_ggplot <- brks - if (max(data, na.rm = T) > tail(brks, 1)) { + if (var_limits[2] > tail(brks, 1)) { brks_ggplot <- c(brks_ggplot, max(data, na.rm = T)) } else { brks_ggplot <- c(brks_ggplot, tail(brks, 1) + diff(tail(brks, 2))) } - if (min(data, na.rm = T) < brks[1]) { + if (var_limits[1] < brks[1]) { brks_ggplot <- c(min(data, na.rm = T), brks_ggplot) } else { brks_ggplot <- c(brks[1] - diff(brks[1:2]), brks_ggplot) diff --git a/modules/Visualization/Visualization.R b/modules/Visualization/Visualization.R index a6b6bd75ebb6864e686c7101eb713a5f82f5a773..0ce32bae6b4e2c8f6ec0c76a38215bf0cdcc9da3 100644 --- a/modules/Visualization/Visualization.R +++ b/modules/Visualization/Visualization.R @@ -46,7 +46,7 @@ Visualization <- function(recipe, "parameters of the requested plotting function, i.e. ", "PlotEquiMap, PlotRobinson or PlotLayout. 
There could be ", "plotting errors if the list is incomplete.")) - } else { + } else if (!is.null(output_conf)) { warning(paste("Parameter 'output_conf' should be a list.", "Using default configuration.")) output_conf <- NULL @@ -83,6 +83,18 @@ Visualization <- function(recipe, "parameter 'skill_metrics' is NULL")) } } + + # Plot statistics + if ("statistics" %in% plots) { + if (!is.null(statistics)) { + plot_skill_metrics(recipe, data$hcst, statistics, outdir, + significance, output_conf = output_conf) + } else { + error(recipe$Run$logger, + paste0("The statistics plots have been requested, but the ", + "parameter 'statistics' is NULL")) + } + } # Plot forecast ensemble mean if ("forecast_ensemble_mean" %in% plots) { diff --git a/recipe_ecvs_seasonal_oper.yml b/recipe_ecvs_seasonal_oper.yml deleted file mode 100644 index 832f36d54b04c688019b75b066dc41360b302288..0000000000000000000000000000000000000000 --- a/recipe_ecvs_seasonal_oper.yml +++ /dev/null @@ -1,73 +0,0 @@ -Description: - Author: nperez - Info: ECVs Oper ESS ECMWF SEAS5 Seasonal Forecast recipe (monthly mean, tas) - -Analysis: - Horizon: seasonal # Mandatory, str: either subseasonal, seasonal, or decadal - Variables: - - {name: tas, freq: monthly_mean, units: C} - - {name: prlr, freq: monthly_mean, units: mm, flux: no} - Datasets: - System: - - {name: ECMWF-SEAS5.1} # system21_m1 system35c3s - Multimodel: no # Mandatory, bool: Either yes/true or no/false - Reference: - - {name: ERA5} # Mandatory, str: Reference codename. See docu. - Time: - sdate: '0801' ## MMDD - fcst_year: '2023' # Optional, int: Forecast year 'YYYY' - hcst_start: '1993' # Mandatory, int: Hindcast start year 'YYYY' - hcst_end: '2016' # Mandatory, int: Hindcast end year 'YYYY' - ftime_min: 1 # Mandatory, int: First leadtime time step in months - ftime_max: 6 # Mandatory, int: Last leadtime time step in months - Region: - - {name: "EU", latmin: 20, latmax: 80, lonmin: -20, lonmax: 40} - Regrid: - method: bilinear # Mandatory, str: Interpolation method. See docu. - type: "to_system" - #type: /esarchive/scratch/nmilders/gitlab/git_clones/auto-s2s/conf/grid_description.txt #'r360x180' # Mandatory, str: to_system, to_reference, or CDO-accepted grid. - Workflow: - Anomalies: - compute: no - cross_validation: no - save: none - Calibration: - method: evmos # Mandatory, str: Calibration method. See docu. - cross_validation: yes - save: none - Skill: - metric: mean_bias EnsCorr rpss crpss bss10 bss90 - save: 'all' - cross_validation: yes - Probabilities: - percentiles: [[1/3, 2/3], [1/10, 9/10]] # frac: Quantile thresholds.
- save: 'all' - Indicators: - index: no - Visualization: - plots: skill_metrics forecast_ensemble_mean most_likely_terciles - multi_panel: no - dots: both - ncores: 4 # Optional, int: number of cores, defaults to 1 - remove_NAs: # Optional, bool: Whether NAs are removed, defaults to FALSE - Output_format: scorecards - logo: yes -Run: - Loglevel: INFO - Terminal: yes - filesystem: esarchive - output_dir: /esarchive/scratch/nperez/cs_oper/seasonal/ # replace with the directory where you want to save the outputs - code_dir: /esarchive/scratch/nperez/git/auto-s2s/ # replace with the directory where your code is - autosubmit: yes - # fill only if using autosubmit - auto_conf: - script: /esarchive/scratch/nperez/git/auto-s2s/exec_ecvs_seasonal_oper.R # replace with the path to your script - expid: a68v # replace with your EXPID - hpc_user: bsc32339 # replace with your hpc username - wallclock: 02:00 # hh:mm - processors_per_job: 4 - platform: nord3v2 - email_notifications: yes # enable/disable email notifications. Change it if you want to. - email_address: nuria.perez@bsc.es # replace with your email address - notify_completed: yes # notify me by email when a job finishes - notify_failed: yes # notify me by email when a job fails diff --git a/recipes/atomic_recipes/recipe_decadal.yml b/recipes/atomic_recipes/recipe_decadal.yml index 2dca96c06666efb1cb20a4f3b4af5d31f0863d21..26312b34d1127af7585fb51c9348763e47643a79 100644 --- a/recipes/atomic_recipes/recipe_decadal.yml +++ b/recipes/atomic_recipes/recipe_decadal.yml @@ -8,8 +8,8 @@ Analysis: freq: monthly_mean Datasets: System: - name: EC-Earth3-i4 #CanESM5 - member: r1i4p1f1,r2i4p1f1,r3i4p1f1 #'all' + name: HadGEM3-GC31-MM #EC-Earth3-i4 #CanESM5 + member: r1i1p1f2,r2i1p1f2,r3i1p1f2 #'all' Multimodel: no Reference: name: ERA5 #JRA-55 @@ -19,7 +19,7 @@ Analysis: hcst_end: 1993 # season: 'Annual' ftime_min: 2 - ftime_max: 14 + ftime_max: 24 Region: latmin: 10 #-90 latmax: 20 #90 diff --git a/recipes/atomic_recipes/recipe_seasonal_provenance.yml b/recipes/atomic_recipes/recipe_seasonal_provenance.yml new file mode 100644 index 0000000000000000000000000000000000000000..196b49c92bcbd52582bbf4573ae51591e3096ebc --- /dev/null +++ b/recipes/atomic_recipes/recipe_seasonal_provenance.yml @@ -0,0 +1,57 @@ +Description: + Author: V. 
Agudetse + '': split version +Analysis: + Horizon: seasonal + Variables: + name: tas + freq: monthly_mean + Datasets: + System: + name: ECMWF-SEAS5.1 + Multimodel: no + Reference: + name: ERA5 + Time: + sdate: '1101' + fcst_year: + hcst_start: '2000' + hcst_end: '2006' + ftime_min: 1 + ftime_max: 3 + Region: + latmin: 27 + latmax: 48 + lonmin: 0 + lonmax: 359 + Regrid: + method: bilinear + type: to_system + Workflow: + Anomalies: + compute: no + cross_validation: + save: + Calibration: + method: bias + save: 'all' + Skill: + metric: RPSS + save: 'all' + Probabilities: + percentiles: [[1/3, 2/3]] + save: 'all' + Indicators: + index: FALSE + Visualization: + plots: skill_metrics, forecast_ensemble_mean, most_likely_terciles + ncores: # Optional, int: number of cores, defaults to 1 + remove_NAs: # Optional, bool: Whether NAs are removed, defaults to FALSE + Output_format: S2S4E +Run: + Loglevel: INFO + Terminal: yes + output_dir: /tmp/out-logs/ + code_dir: /home/kinow/Development/r/workspace/sunset/ + filesystem: sample + diff --git a/recipes/atomic_recipes/recipe_system7c3s-prlr.yml b/recipes/atomic_recipes/recipe_system7c3s-prlr.yml index 1cba3c97f76f05fddec5aefe294f02207e6459b9..590d4499d07b1761737f4c0d2f89bec2bb5e30c7 100644 --- a/recipes/atomic_recipes/recipe_system7c3s-prlr.yml +++ b/recipes/atomic_recipes/recipe_system7c3s-prlr.yml @@ -4,8 +4,9 @@ Description: Analysis: Horizon: Seasonal Variables: - name: prlr + name: prlr freq: monthly_mean + units: mm Datasets: System: name: Meteo-France-System7 @@ -36,7 +37,7 @@ Analysis: method: mse_min save: 'all' Skill: - metric: RPS RPSS CRPS CRPSS FRPSS BSS10 BSS90 EnsCorr Corr + metric: RPS RPSS CRPS CRPSS FRPSS BSS10 BSS90 EnsCorr save: 'all' Probabilities: percentiles: [[1/3, 2/3], [1/10, 9/10], [1/4, 2/4, 3/4]] diff --git a/recipes/atomic_recipes/recipe_tas-tos_nadia.yml b/recipes/atomic_recipes/recipe_tas-tos_nadia.yml new file mode 100644 index 0000000000000000000000000000000000000000..466659a228e2a881f8a7418db8bbae099fefb724 --- /dev/null +++ b/recipes/atomic_recipes/recipe_tas-tos_nadia.yml @@ -0,0 +1,55 @@ +Description: + Author: V. 
Agudetse + +Analysis: + Horizon: Seasonal + Variables: + name: tas-tos + sic_threshold: 0.15 ## sea ice threshold for tas-tos blending, default = 0.15 + freq: monthly_mean + Datasets: + System: + name: ECMWF-SEAS5 + Multimodel: False + Reference: + name: BEST + Time: + sdate: '0101' + #fcst_year: + hcst_start: '2014' + hcst_end: '2016' + ftime_min: 1 + ftime_max: 1 + Region: + latmin: -90 + latmax: 90 + lonmin: 0 + lonmax: 359.9 + Regrid: + method: bilinear + type: to_system + Workflow: + Calibration: + method: raw + save: 'none' + Anomalies: + compute: yes + cross_validation: no + save: 'none' + Skill: + metric: mean_bias EnsCorr RPS RPSS CRPS CRPSS enssprerr + cross_validation: no + save: 'all' + Probabilities: + percentiles: [[1/3, 2/3]] + save: 'none' + Indicators: + index: no + ncores: 15 + remove_NAs: yes + Output_format: scorecards +Run: + Loglevel: INFO + Terminal: yes + output_dir: /esarchive/scratch/nmilders/scorecards_data/test/ + code_dir: /esarchive/scratch/nmilders/gitlab/git_clones/s2s-suite_tas-tos/ diff --git a/recipes/recipe_decadal_split.yml b/recipes/examples/recipe_decadal_split.yml similarity index 100% rename from recipes/recipe_decadal_split.yml rename to recipes/examples/recipe_decadal_split.yml diff --git a/recipes/examples/recipe_ecvs_seasonal_oper.yml b/recipes/examples/recipe_ecvs_seasonal_oper.yml index d47fd1593749cb07c04dd8166a73075ac2058d76..832f36d54b04c688019b75b066dc41360b302288 100644 --- a/recipes/examples/recipe_ecvs_seasonal_oper.yml +++ b/recipes/examples/recipe_ecvs_seasonal_oper.yml @@ -5,8 +5,8 @@ Description: Analysis: Horizon: seasonal # Mandatory, str: either subseasonal, seasonal, or decadal Variables: - - {name: tas, freq: monthly_mean} - - {name: prlr, freq: monthly_mean} + - {name: tas, freq: monthly_mean, units: C} + - {name: prlr, freq: monthly_mean, units: mm, flux: no} Datasets: System: - {name: ECMWF-SEAS5.1} # system21_m1 system35c3s @@ -14,14 +14,14 @@ Analysis: Reference: - {name: ERA5} # Mandatory, str: Reference codename. See docu. Time: - sdate: '0701' ## MMDD + sdate: '0801' ## MMDD fcst_year: '2023' # Optional, int: Forecast year 'YYYY' hcst_start: '1993' # Mandatory, int: Hindcast start year 'YYYY' hcst_end: '2016' # Mandatory, int: Hindcast end year 'YYYY' ftime_min: 1 # Mandatory, int: First leadtime time step in months ftime_max: 6 # Mandatory, int: Last leadtime time step in months Region: - - {name: "UE", latmin: 20, latmax: 80, lonmin: -20, lonmax: 40} + - {name: "EU", latmin: 20, latmax: 80, lonmin: -20, lonmax: 40} Regrid: method: bilinear # Mandatory, str: Interpolation method. See docu. 
type: "to_system" @@ -47,7 +47,7 @@ Analysis: Visualization: plots: skill_metrics forecast_ensemble_mean most_likely_terciles multi_panel: no - projection: lambert_europe + dots: both ncores: 4 # Optional, int: number of cores, defaults to 1 remove_NAs: # Optional, bool: Whether NAs are removed, defaults to FALSE Output_format: scorecards diff --git a/recipes/recipe_scorecards.yml b/recipes/examples/recipe_scorecards.yml similarity index 61% rename from recipes/recipe_scorecards.yml rename to recipes/examples/recipe_scorecards.yml index 434426d02d499db78c6884c6f6a9322390935526..a75ad1d26bd368be33b288e0fb530faa62f6a833 100644 --- a/recipes/recipe_scorecards.yml +++ b/recipes/examples/recipe_scorecards.yml @@ -56,6 +56,9 @@ Analysis: metric: mean_bias EnsCorr rps rpss crps crpss EnsSprErr # list, don't split cross_validation: yes save: 'all' + Statistics: + metric: cov std n_eff + save: 'all' Probabilities: percentiles: [[1/3, 2/3]] # list, don't split save: 'none' @@ -64,19 +67,30 @@ Analysis: Indicators: index: no # ? Scorecards: - execute: yes # yes/no - regions: + execute: yes # Mandatory, yes/no + regions: # Mandatory, define regions over which to aggregate data Extra-tropical NH: {lon.min: 0, lon.max: 360, lat.min: 30, lat.max: 90} Tropics: {lon.min: 0, lon.max: 360, lat.min: -30, lat.max: 30} - Extra-tropical SH : {lon.min: 0, lon.max: 360, lat.min: -30, lat.max: -90} - start_months: NULL - metric: mean_bias enscorr rpss crpss enssprerr - metric_aggregation: 'score' - table_label: NULL - fileout_label: NULL - col1_width: NULL - col2_width: NULL - calculate_diff: FALSE + Extra-tropical SH : {lon.min: 0, lon.max: 360, lat.min: -90, lat.max: -30} + start_months: NULL + metric: mean_bias enscorr rpss crpss enssprerr # Mandatory, define metrics to visualize in scorecard. + metric_aggregation: 'score' # Mandatory, defines the aggregation level of the metrics. + signif_alpha: 0.05 # Optional, to set alpha for significance calculation, default is 0.05. + table_label: NULL # Optional, to add extra information to the table title. + fileout_label: NULL # Optional, to add extra information to the output filename. + col1_width: NULL # Optional, to set the width (cm) of the first table column, default is calculated depending on row names. + col2_width: NULL # Optional, to set the width (cm) of the second table column, default is calculated depending on subrow names. + columns_width: NULL # Optional, to set the width (cm) of all other columns within the table, default is 1.2. + plot_legend: TRUE ## Optional, to define whether the legend is included in the scorecards image, default is TRUE. + legend_breaks: NULL # Optional, default uses the legend breaks from modules/Scorecards/R/tmp/Utils.R. + legend_white_space: NULL # Optional, default is automatically calculated depending on column sizes. + legend_width: NULL # Optional, to set the width of the legend bars, default is 550. + legend_height: NULL # Optional, to set the height of the legend bars, default is 50. + label_scale: NULL # Optional, to set the scale of the legend bar labels, default is 1.4. + round_decimal: NULL # Optional, to round the data shown in the scorecard, default is 2 (decimals). + inf_to_na: TRUE # Optional, to set infinite values to NA, default is FALSE. + font_size: NULL # Optional, to set the font size of the scorecard table values, default is 1.2. + calculate_diff: FALSE # Optional, to calculate difference between two systems or two references, default is FALSE.
ncores: 7 remove_NAs: no # bool, don't split Output_format: Scorecards # string, don't split diff --git a/recipes/recipe_splitting_example.yml b/recipes/examples/recipe_splitting_example.yml similarity index 100% rename from recipes/recipe_splitting_example.yml rename to recipes/examples/recipe_splitting_example.yml diff --git a/split.R b/split.R index 8328e460c16c14147201790a5b9c4ceb0e111234..0d443c7c84163934f1e01c33e6ad738d3fb782c6 100755 --- a/split.R +++ b/split.R @@ -29,12 +29,11 @@ arguments <- docopt(doc = doc) recipe <- prepare_outputs(recipe_file = arguments$recipe, uniqueID = !arguments$disable_unique_ID, restructure = FALSE) -# Split recipe into atomic recipes -## TODO: Add autosubmit yes/no to the parameters? +# Split recipe into atomic recipes run_parameters <- divide_recipe(recipe) -if (recipe$Run$autosubmit) { +if (!is.null(recipe$Run$autosubmit) && (recipe$Run$autosubmit)) { write_autosubmit_conf(recipe, run_parameters$n_atomic_recipes) sink(arguments$tmpfile, append = FALSE) # Run with... @@ -56,4 +55,7 @@ if (recipe$Run$autosubmit) { cat("TRUE") } sink() +} else { + stop("Autosubmit is not selected but no temp file was provided by the + launcher. Please open an issue to report the problem.") } diff --git a/tests/recipes/recipe-seasonal_monthly_1_statistics.yml b/tests/recipes/recipe-seasonal_monthly_1_statistics.yml new file mode 100644 index 0000000000000000000000000000000000000000..ccfd3cf42bf4893354306a39ff408cce4766546d --- /dev/null +++ b/tests/recipes/recipe-seasonal_monthly_1_statistics.yml @@ -0,0 +1,58 @@ +Description: + Author: V. Agudetse + +Analysis: + Horizon: Seasonal + Variables: + name: tas + freq: monthly_mean + Datasets: + System: + name: Meteo-France-System7 + Multimodel: False + Reference: + name: ERA5 + Time: + sdate: '1101' + fcst_year: '2020' + hcst_start: '1993' + hcst_end: '1996' + ftime_min: 1 + ftime_max: 3 + Region: + latmin: 17 + latmax: 20 + lonmin: 12 + lonmax: 15 + Regrid: + method: bilinear + type: to_system + Workflow: + # Anomalies: + # compute: no + # cross_validation: + # save: 'none' + Calibration: + method: mse_min + save: 'all' + Skill: + metric: RPSS CRPSS EnsCorr Corr_individual_members Enscorr_specs + save: 'all' + Statistics: + metric: cov std var n_eff + save: 'all' + Probabilities: + percentiles: [[1/3, 2/3], [1/10, 9/10]] + save: 'all' + Indicators: + index: no + Visualization: + plots: statistics + multi_panel: yes + projection: cylindrical_equidistant + Output_format: scorecards +Run: + Loglevel: INFO + Terminal: yes + output_dir: ./tests/out-logs/ + code_dir: /esarchive/scratch/nmilders/gitlab/git_clones/auto-s2s/ #/esarchive/scratch/vagudets/repos/auto-s2s/ diff --git a/tests/recipes/recipe-seasonal_monthly_1_tas-tos.yml b/tests/recipes/recipe-seasonal_monthly_1_tas-tos.yml new file mode 100644 index 0000000000000000000000000000000000000000..c1404e7d5ec79d129dc1130b3af15d4eb2372030 --- /dev/null +++ b/tests/recipes/recipe-seasonal_monthly_1_tas-tos.yml @@ -0,0 +1,56 @@ +Description: + Author: N. 
Milders + +Analysis: + Horizon: Seasonal + Variables: + name: tas-tos + sic_threshold: 0.15 + freq: monthly_mean + Datasets: + System: + name: ECMWF-SEAS5 + Multimodel: False + Reference: + name: ERA5 + Time: + sdate: '0101' + fcst_year: '2018' + hcst_start: '1993' + hcst_end: '1996' + ftime_min: 1 + ftime_max: 3 + Region: + latmin: 17 + latmax: 20 + lonmin: 12 + lonmax: 15 + Regrid: + method: bilinear + type: to_system + Workflow: + Anomalies: + compute: no + cross_validation: + save: 'none' + Calibration: + method: mse_min + save: 'all' + Skill: + metric: RPSS CRPSS EnsCorr Corr_individual_members Enscorr_specs + save: 'all' + Probabilities: + percentiles: [[1/3, 2/3], [1/10, 9/10]] + save: 'all' + Indicators: + index: no + Visualization: + plots: skill_metrics most_likely_terciles forecast_ensemble_mean + multi_panel: yes + projection: cylindrical_equidistant + Output_format: S2S4E +Run: + Loglevel: INFO + Terminal: yes + output_dir: ./tests/out-logs/ + code_dir: /esarchive/scratch/vagudets/repos/auto-s2s/ diff --git a/tests/testthat/test-decadal_monthly_2.R b/tests/testthat/test-decadal_monthly_2.R index 9adc16b65deb3b84f9a825d93c175504f6304f2b..6ee681bb9a2fbe01216a41441a9a8c55a0c8106e 100644 --- a/tests/testthat/test-decadal_monthly_2.R +++ b/tests/testthat/test-decadal_monthly_2.R @@ -164,7 +164,7 @@ TRUE ) expect_equal( names(skill_metrics), -c("rpss_specs", "enscorr_specs", "frps_specs", "frpss_specs", "bss10_specs", "frps", "frps_clim") +c("rpss_specs", "enscorr_specs", "frps_specs", "frpss_specs", "bss10_specs", "frps") ) expect_equal( class(skill_metrics$rpss_specs), diff --git a/tests/testthat/test-seasonal_NAO.R b/tests/testthat/test-seasonal_NAO.R index 7a9c9de88fc9ea6d2f2fde5cef617241071f2532..4bbfd3b3e4019354e3517c159da4f793f7b49c5c 100644 --- a/tests/testthat/test-seasonal_NAO.R +++ b/tests/testthat/test-seasonal_NAO.R @@ -89,20 +89,20 @@ c(98909.3, 103299.8), tolerance = 0.0001 ) expect_equal( -(data$hcst$attrs$Dates)[1], -as.POSIXct("1993-04-30 18:00:00", tz = 'UTC') +month((data$hcst$attrs$Dates)[1]), +month(as.POSIXct("1993-04-30 18:00:00", tz = 'UTC')) ) expect_equal( -(data$hcst$attrs$Dates)[2], -as.POSIXct("1994-04-30 18:00:00", tz = 'UTC') +month((data$hcst$attrs$Dates)[2]), +month(as.POSIXct("1994-04-30 18:00:00", tz = 'UTC')) ) expect_equal( -(data$hcst$attrs$Dates)[5], -as.POSIXct("1997-04-30 18:00:00", tz = 'UTC') +month((data$hcst$attrs$Dates)[5]), +month(as.POSIXct("1997-04-30 18:00:00", tz = 'UTC')) ) expect_equal( -(data$obs$attrs$Dates)[8], -as.POSIXct("2000-04-16", tz = 'UTC') +month((data$obs$attrs$Dates)[8]), +month(as.POSIXct("2000-04-16", tz = 'UTC')) ) }) @@ -214,8 +214,8 @@ TRUE expect_equal( names(skill_metrics), c("mean_bias", "enscorr", - "enscorr_significance", "rps", "rps_clim", "rpss", "rpss_significance", - "crps", "crps_clim", "crpss", "crpss_significance", "enssprerr") + "enscorr_significance", "rps", "rpss", "rpss_significance", + "crps", "crpss", "crpss_significance", "enssprerr") ) expect_equal( class(skill_metrics$rpss), @@ -247,12 +247,22 @@ expect_equal( all(list.files(outputs, recursive = T) %in% c(paste0("Indices/ECMWF-SEAS5/nao/", paste0("nao_", 1993:2000, "0301.nc")), paste0("Indices/ERA5/nao/", paste0("nao_", 1993:2000, "0301.nc")), - "Skill/ECMWF-SEAS5/nao/scorecards_ECMWF-SEAS5_ERA5_nao-skill_1993-2000_s03.nc") + "Skill/ECMWF-SEAS5/ERA5/raw/nao/scorecards_ECMWF-SEAS5_ERA5_nao_crps_1993-2000_s03.nc", + "Skill/ECMWF-SEAS5/ERA5/raw/nao/scorecards_ECMWF-SEAS5_ERA5_nao_crpss_1993-2000_s03.nc", + 
"Skill/ECMWF-SEAS5/ERA5/raw/nao/scorecards_ECMWF-SEAS5_ERA5_nao_crpss_significance_1993-2000_s03.nc", + "Skill/ECMWF-SEAS5/ERA5/raw/nao/scorecards_ECMWF-SEAS5_ERA5_nao_enscorr_1993-2000_s03.nc", + "Skill/ECMWF-SEAS5/ERA5/raw/nao/scorecards_ECMWF-SEAS5_ERA5_nao_enscorr_significance_1993-2000_s03.nc", + "Skill/ECMWF-SEAS5/ERA5/raw/nao/scorecards_ECMWF-SEAS5_ERA5_nao_enssprerr_1993-2000_s03.nc", + "Skill/ECMWF-SEAS5/ERA5/raw/nao/scorecards_ECMWF-SEAS5_ERA5_nao_mean_bias_1993-2000_s03.nc", + "Skill/ECMWF-SEAS5/ERA5/raw/nao/scorecards_ECMWF-SEAS5_ERA5_nao_rps_1993-2000_s03.nc", + "Skill/ECMWF-SEAS5/ERA5/raw/nao/scorecards_ECMWF-SEAS5_ERA5_nao_rpss_1993-2000_s03.nc", + "Skill/ECMWF-SEAS5/ERA5/raw/nao/scorecards_ECMWF-SEAS5_ERA5_nao_rpss_significance_1993-2000_s03.nc" + ) ), TRUE) expect_equal( length(list.files(outputs, recursive = T)), -17 +26 ) }) diff --git a/tests/testthat/test-seasonal_monthly_statistics.R b/tests/testthat/test-seasonal_monthly_statistics.R new file mode 100644 index 0000000000000000000000000000000000000000..2db006dce540e3371baca788a63c333c9d0aa7a8 --- /dev/null +++ b/tests/testthat/test-seasonal_monthly_statistics.R @@ -0,0 +1,189 @@ +context("Seasonal monthly data") + +source("modules/Loading/Loading.R") +source("modules/Statistics/Statistics.R") +source("modules/Saving/Saving.R") +source("modules/Visualization/Visualization.R") + +recipe_file <- "tests/recipes/recipe-seasonal_monthly_1_statistics.yml" +recipe <- prepare_outputs(recipe_file, disable_checks = F) +archive <- read_yaml(paste0(recipe$Run$code_dir, "conf/archive.yml"))$archive + +# Load datasets +suppressWarnings({invisible(capture.output( +data <- Loading(recipe) +))}) + +# Compute statistics +suppressWarnings({invisible(capture.output( +statistics <- Statistics(recipe, data) +))}) + +# Saving +suppressWarnings({invisible(capture.output( +Saving(recipe = recipe, data = data, + skill_metrics = statistics) +))}) + +# Plotting +suppressWarnings({invisible(capture.output( +Visualization(recipe = recipe, data = data, + skill_metrics = statistics, + significance = T) +))}) +outdir <- get_dir(recipe = recipe, variable = data$hcst$attrs$Variable$varName) + +# ------- TESTS -------- + +test_that("1. 
Loading", { + +expect_equal( +is.list(data), +TRUE +) +expect_equal( +names(data), +c("hcst", "fcst", "obs") +) +expect_equal( +class(data$hcst), +"s2dv_cube" +) +expect_equal( +class(data$fcst), +"s2dv_cube" +) +expect_equal( +class(data$obs), +"s2dv_cube" +) +expect_equal( +names(data$hcst), +c("data", "dims", "coords", "attrs") +) +expect_equal( +names(data$hcst), +names(data$fcst) +) +expect_equal( +names(data$hcst), +names(data$obs) +) +expect_equal( +dim(data$hcst$data), +c(dat = 1, var = 1, sday = 1, sweek = 1, syear = 4, time = 3, latitude = 3, longitude = 3, ensemble = 25) +) +expect_equal( +dim(data$fcst$data), +c(dat = 1, var = 1, sday = 1, sweek = 1, syear = 1, time = 3, latitude = 3, longitude = 3, ensemble = 51) +) +expect_equal( +dim(data$hcst$attrs$Dates), +c(sday = 1, sweek = 1, syear = 4, time = 3) +) +expect_equal( +as.vector(drop(data$hcst$data)[1:2,1:2,1,2,3]), +c(293.9651, 295.9690, 290.6771, 290.7957), +tolerance = 0.0001 +) +expect_equal( +mean(data$hcst$data), +290.8758, +tolerance = 0.0001 +) +expect_equal( +range(data$hcst$data), +c(284.7413, 299.6219), +tolerance = 0.0001 +) +expect_equal( +(data$hcst$attrs$Dates)[1], +as.POSIXct("1993-11-30 23:59:59", tz = 'UTC') +) +expect_equal( +(data$hcst$attrs$Dates)[2], +as.POSIXct("1994-11-30 23:59:59", tz = 'UTC') +) +expect_equal( +(data$hcst$attrs$Dates)[5], +as.POSIXct("1993-12-31 23:59:59", tz = 'UTC') +) +expect_equal( +(data$obs$attrs$Dates)[10], +as.POSIXct("1995-01-15 12:00:00", tz = 'UTC') +) + +}) + +#====================================== +test_that("2. Statistics", { + +expect_equal( +is.list(statistics), +TRUE +) +expect_equal( +names(statistics), +c("cov", "std_hcst", "std_obs", "var_hcst", "var_obs", "n_eff") +) +expect_equal( +class(statistics$cov), +"array" +) +expect_equal( +dim(statistics$cov), +c(var = 1, time = 3, latitude = 3, longitude = 3) +) +expect_equal( +dim(statistics$cov), +dim(statistics$var_hcst) +) +expect_equal( +as.vector(statistics$cov[, , 2, 3]), +c(1.14846389, 0.05694802, 0.02346492), +tolerance = 0.0001 +) +expect_equal( +as.vector(statistics$var_hcst[, , 2, 3]), +c(0.74897676, 0.14698283, 0.04864656), +tolerance = 0.0001 +) + +}) + +test_that("3. Saving", { +outputs <- paste0(recipe$Run$output_dir, "/outputs/") +expect_equal( +all(basename(list.files(outputs, recursive = T)) %in% +c("scorecards_Meteo-France-System7_ERA5_tas_cov_1993-1996_s11.nc", + "scorecards_Meteo-France-System7_ERA5_tas_n_eff_1993-1996_s11.nc", + "scorecards_Meteo-France-System7_ERA5_tas_std_hcst_1993-1996_s11.nc", + "scorecards_Meteo-France-System7_ERA5_tas_std_obs_1993-1996_s11.nc", + "scorecards_Meteo-France-System7_ERA5_tas_var_hcst_1993-1996_s11.nc", + "scorecards_Meteo-France-System7_ERA5_tas_var_obs_1993-1996_s11.nc")), +TRUE +) +expect_equal( +length(list.files(outputs, recursive = T)), +6 +) + +}) + +test_that("4. 
Visualization", { +plots <- paste0(recipe$Run$output_dir, "/plots/") +expect_equal( +all(basename(list.files(plots, recursive = T)) %in% +c("cov-november.png", "n_eff-november.png", "std_hcst-november.png", + "std_obs-november.png", "var_hcst-november.png", "var_obs-november.png" )), +TRUE +) +expect_equal( +length(list.files(plots, recursive = T)), +6 +) + +}) + +# Delete files +unlink(recipe$Run$output_dir, recursive = T) diff --git a/tests/testthat/test-seasonal_monthly_tas-tos.R b/tests/testthat/test-seasonal_monthly_tas-tos.R new file mode 100644 index 0000000000000000000000000000000000000000..1345f63dfb1190988413905221cbda99c2f37c74 --- /dev/null +++ b/tests/testthat/test-seasonal_monthly_tas-tos.R @@ -0,0 +1,98 @@ +context("Seasonal monthly data") + +source("./modules/Loading/Loading.R") + +recipe_file <- "tests/recipes/recipe-seasonal_monthly_1_tas-tos.yml" +recipe <- prepare_outputs(recipe_file, disable_checks = F) +archive <- read_yaml(paste0(recipe$Run$code_dir, "conf/archive.yml"))$archive + +# Load datasets +suppressWarnings({invisible(capture.output( +data <- Loading(recipe) +))}) + + +# ------- TESTS -------- + +test_that("1. Loading", { + +expect_equal( +is.list(data), +TRUE +) +expect_equal( +names(data), +c("hcst", "fcst", "obs") +) +expect_equal( +class(data$hcst), +"s2dv_cube" +) +expect_equal( +class(data$fcst), +"s2dv_cube" +) +expect_equal( +class(data$obs), +"s2dv_cube" +) +expect_equal( +names(data$hcst), +c("data", "dims", "coords", "attrs") +) +expect_equal( +names(data$hcst), +names(data$fcst) +) +expect_equal( +names(data$hcst), +names(data$obs) +) +expect_equal( +dim(data$hcst$data), +c(dat = 1, var = 1, sday = 1, sweek = 1, syear = 4, time = 3, latitude = 4, longitude = 4, ensemble = 25) +) +expect_equal( +dim(data$fcst$data), +c(dat = 1, var = 1, sday = 1, sweek = 1, syear = 1, time = 3, latitude = 4, longitude = 4, ensemble = 51) +) +expect_equal( +dim(data$hcst$attrs$Dates), +c(sday = 1, sweek = 1, syear = 4, time = 3) +) +expect_equal( +as.vector(drop(data$hcst$data)[1:2,1:2,1,2,3]), +c(285.5869, 287.8836, 285.9362, 289.0483), +tolerance = 0.0001 +) +expect_equal( +mean(data$hcst$data), +290.1099, +tolerance = 0.0001 +) +expect_equal( +range(data$hcst$data), +c(283.2845, 299.7845), +tolerance = 0.0001 +) +expect_equal( +(data$hcst$attrs$Dates)[1], +as.POSIXct("1993-01-31 18:00:00", tz = 'UTC') +) +expect_equal( +(data$hcst$attrs$Dates)[2], +as.POSIXct("1994-01-31 18:00:00", tz = 'UTC') +) +expect_equal( +(data$hcst$attrs$Dates)[5], +as.POSIXct("1993-02-28 18:00:00", tz = 'UTC') +) +expect_equal( +(data$obs$attrs$Dates)[10], +as.POSIXct("1994-03-15 12:00:00", tz = 'UTC') +) + +}) + +# Delete files +unlink(recipe$Run$output_dir, recursive = T) diff --git a/tools/CST_ChangeDimName.R b/tools/CST_ChangeDimName.R new file mode 100644 index 0000000000000000000000000000000000000000..1ccbbb3d327df21f6582326e2e54fdf58ea68f09 --- /dev/null +++ b/tools/CST_ChangeDimName.R @@ -0,0 +1,39 @@ +## TODO: Documentation + +CST_ChangeDimName <- function(data, original_dimnames, final_dimnames) { + if (!inherits(data, "s2dv_cube")) { + stop("Parameter 'data' must be an object of class 's2dv_cube'") + } + if (!(length(original_dimnames) == length(final_dimnames))) { + stop("The number of dimension names in 'final_dimnames' must be the same + as in 'original_dimnames'") + } + ## TODO: Add check to verify that all original_dimnames are present in the array + for (index in 1:length(original_dimnames)) { + original_name <- original_dimnames[index] + final_name <- 
final_dimnames[index] + # Step 1: Change dims and data + names(data$dims)[which(names(data$dims) == original_name)] <- final_name + dim(data$data) <- data$dims + # Step 2: Change coords + names(data$coords)[which(names(data$coords) == original_name)] <- final_name + # Step 3: Change attrs + # 3.1 - Dates + if (original_name %in% names(dim(data$attrs$Dates))) { + names(dim(data$attrs$Dates))[which(names(dim(data$attrs$Dates)) + == original_name)] <- final_name + } + # 3.2 - Variable metadata + if (original_name %in% names(data$attrs$Variable$metadata)) { + names(data$attrs$Variable$metadata)[which(names(data$attrs$Variable$metadata) + == original_name)] <- final_name + } + # 3.3 - Source files + if (original_name %in% names(dim(data$attrs$source_files))) { + names(dim(data$attrs$source_files))[which(names(dim(data$attrs$source_files)) + == original_name)] <- final_name + } + } + return(data) +} + diff --git a/tools/check_recipe.R b/tools/check_recipe.R index e68f0b901c514b2ba644d87eae42bbc3f69e4c5a..d9f796036a979943151d570d9088984a1741d5d8 100644 --- a/tools/check_recipe.R +++ b/tools/check_recipe.R @@ -2,11 +2,11 @@ check_recipe <- function(recipe) { # recipe: yaml recipe already read it ## TODO: set up logger-less case info(recipe$Run$logger, paste("Checking recipe:", recipe$recipe_path)) - + # --------------------------------------------------------------------- # ANALYSIS CHECKS # --------------------------------------------------------------------- - + TIME_SETTINGS_SEASONAL <- c("sdate", "ftime_min", "ftime_max", "hcst_start", "hcst_end") TIME_SETTINGS_DECADAL <- c("ftime_min", "ftime_max", "hcst_start", "hcst_end") @@ -15,28 +15,28 @@ check_recipe <- function(recipe) { HORIZONS <- c("subseasonal", "seasonal", "decadal") ARCHIVE_SEASONAL <- "conf/archive.yml" ARCHIVE_DECADAL <- "conf/archive_decadal.yml" - + # Define error status variable error_status <- F - + # Check basic elements in recipe:Analysis: if (!("Analysis" %in% names(recipe))) { error(recipe$Run$logger, "The recipe must contain an element called 'Analysis'.") error_status <- T } - + if (!all(PARAMS %in% names(recipe$Analysis))) { error(recipe$Run$logger, paste0("The element 'Analysis' in the recipe must contain all of ", "the following: ", paste(PARAMS, collapse = ", "), ".")) error_status <- T } - + if (!any(HORIZONS %in% tolower(recipe$Analysis$Horizon))) { error(recipe$Run$logger, paste0("The element 'Horizon' in the recipe must be one of the ", - "following: ", paste(HORIZONS, collapse = ", "), ".")) + "following: ", paste(HORIZONS, collapse = ", "), ".")) error_status <- T } # Check time settings @@ -62,6 +62,23 @@ check_recipe <- function(recipe) { } else { archive <- NULL } + # Check variable parameters + if ("name" %in% names(recipe$Analysis$Variables)) { + recipe_variables <- recipe$Analysis$Variables$name + } else { + recipe_variables <- sapply(recipe$Analysis$Variables, get, x = "name") + } + recipe_variables <- unlist(strsplit(recipe_variables, ", | |,")) + # Sea-ice threshold check + if (("tas-tos" %in% recipe_variables) && + (!is.null(recipe$Analysis$Variables$sic_threshold))) { + if (!is.numeric(recipe$Analysis$Variables$sic_threshold) || + dplyr::between(recipe$Analysis$Variables$sic_threshold, 0, 1)) { + error(recipe$Run$logger, + paste("The element Analysis:Variables:sic_threshold must be a", + "numeric value between 0 and 1.")) + } + } # Check system names if (!is.null(archive)) { if (!all(recipe$Analysis$Datasets$System$name %in% names(archive$System))) { @@ -71,7 +88,7 @@ check_recipe <- 
function(recipe) { } # Check reference names if (!all(recipe$Analysis$Datasets$Reference$name %in% - names(archive$Reference))) { + names(archive$Reference))) { error(recipe$Run$logger, "The specified Reference name was not found in the archive.") error_status <- T @@ -121,7 +138,7 @@ check_recipe <- function(recipe) { } else { stream <- "fcst" } - + ## TODO: To be implemented in the future # if (length(recipe$Analysis$Time$sdate$fcst_day) > 1 && # tolower(recipe$Analysis$Horizon) != "subseasonal") { @@ -135,7 +152,7 @@ check_recipe <- function(recipe) { # error(recipe$Run$logger, # paste("The element 'fcst_sday' in the recipe should be defined.")) # } - + if (is.null(recipe$Analysis$Time$fcst_year)) { warn(recipe$Run$logger, paste("The element 'fcst_year' is not defined in the recipe.", @@ -151,7 +168,7 @@ check_recipe <- function(recipe) { # } # } # fcst.sdate <- list(stream = stream, fcst.sdate = fcst.sdate) - + # Regrid checks: if (length(recipe$Analysis$Regrid) != 2) { error(recipe$Run$logger, @@ -166,7 +183,7 @@ check_recipe <- function(recipe) { "Only one single Horizon can be specified in the recipe") error_status <- T } - + ## TODO: Refine this # nvar <- length(recipe$Analysis$Variables) # if (nvar > 2) { @@ -174,19 +191,19 @@ check_recipe <- function(recipe) { # "Only two type of Variables can be listed: ECVs and Indicators.") # stop("EXECUTION FAILED") # } - + # remove NULL or None Indicators or ECVs from the recipe: if (!is.null(recipe$Analysis$Variables$Indicators) && !is.list(recipe$Analysis$Variables$Indicators)) { recipe$Analysis$Variables <- recipe$Analysis$Variables[ - -which(names(recipe$Analysis$Variables) == 'Indicators')] + -which(names(recipe$Analysis$Variables) == 'Indicators')] } if (!is.null(recipe$Analysis$Variables$ECVs) && !is.list(recipe$Analysis$Variables$ECVs)) { recipe$Analysis$Variables <- recipe$Analysis$Variables[ - -which(names(recipe$Analysis$Variables) == 'ECVs')] + -which(names(recipe$Analysis$Variables) == 'ECVs')] } - + # Region checks: LIMITS <- c('latmin', 'latmax', 'lonmin', 'lonmax') # Ordinary recipe @@ -195,7 +212,7 @@ check_recipe <- function(recipe) { if (!all(LIMITS %in% names(region))) { error(recipe$Run$logger, paste0("There must be 4 elements in 'Region': ", - paste(LIMITS, collapse = ", "), ".")) + paste(LIMITS, collapse = ", "), ".")) error_status <- T } } @@ -208,35 +225,35 @@ check_recipe <- function(recipe) { } } } - # Atomic recipe + # Atomic recipe } else if (!all(LIMITS %in% names(recipe$Analysis$Region))) { error(recipe$Run$logger, paste0("There must be 4 elements in 'Region': ", paste(LIMITS, collapse = ", "), ".")) error_status <- T } - ## TODO: Implement multiple regions - # nregions <- length(recipe$Analysis$Region) - # for (i in 1:length(recipe$Analysis$Region)) { - # if (!all(limits %in% names(recipe$Analysis$Region[[i]]))) { - # limits <- paste(limits, collapse = " ") - # error(recipe$Run$logger, - # paste0("Each region defined in element 'Region' ", - # "should have 4 elements: ", - # paste(limits, collapse = ", "), ".")) - # error_status <- T - # } - # if (length(recipe$Analysis$Region) > 1) { - # if (!("name" %in% names(recipe$Analysis$Region[[i]]))) { - # error(recipe$Run$logger, - # paste("If multiple regions are requested, each region must", - # "have a 'name'".) - # # are numeric? 
class list mode list - # } + ## TODO: Implement multiple regions + # nregions <- length(recipe$Analysis$Region) + # for (i in 1:length(recipe$Analysis$Region)) { + # if (!all(limits %in% names(recipe$Analysis$Region[[i]]))) { + # limits <- paste(limits, collapse = " ") + # error(recipe$Run$logger, + # paste0("Each region defined in element 'Region' ", + # "should have 4 elements: ", + # paste(limits, collapse = ", "), ".")) + # error_status <- T + # } + # if (length(recipe$Analysis$Region) > 1) { + # if (!("name" %in% names(recipe$Analysis$Region[[i]]))) { + # error(recipe$Run$logger, + # paste("If multiple regions are requested, each region must", + # "have a 'name'".) + # # are numeric? class list mode list + # } # --------------------------------------------------------------------- # WORKFLOW CHECKS # --------------------------------------------------------------------- - + # Calibration # If 'method' is FALSE/no/'none' or NULL, set to 'raw' ## TODO: Review this check @@ -296,7 +313,7 @@ check_recipe <- function(recipe) { } } } - + # Downscaling if ("Downscaling" %in% names(recipe$Analysis$Workflow)) { downscal_params <- lapply(recipe$Analysis$Workflow$Downscaling, tolower) @@ -312,27 +329,27 @@ check_recipe <- function(recipe) { downscal_params$type <- "none" warn(recipe$Run$logger, paste("Downscaling 'type' is empty in the recipe, setting it to", - "'none'.")) + "'none'.")) } if (!(downscal_params$type %in% DOWNSCAL_TYPES)) { error(recipe$Run$logger, paste0("The type of Downscaling request in the recipe is not ", - "available. It must be one of the following: ", - paste(DOWNSCAL_TYPES, collapse = ", "), ".")) + "available. It must be one of the following: ", + paste(DOWNSCAL_TYPES, collapse = ", "), ".")) error_status <- T } if ((downscal_params$type %in% c("int", "intbc", "intlr", "logreg")) && (is.null(downscal_params$target_grid))) { error(recipe$Run$logger, paste("A target grid is required for the downscaling method", - "requested in the recipe.")) + "requested in the recipe.")) error_status <- T } if (downscal_params$type == "int") { if (is.null(downscal_params$int_method)) { error(recipe$Run$logger, paste("Downscaling type 'int' was requested, but no", - "interpolation method is provided in the recipe.")) + "interpolation method is provided in the recipe.")) error_status <- T } } else if (downscal_params$type %in% @@ -341,63 +358,63 @@ check_recipe <- function(recipe) { error(recipe$Run$logger, paste("Downscaling type", downscal_params$type, "was requested in the recipe, but no", - "interpolation method is provided.")) + "interpolation method is provided.")) error_status <- T } } else if (downscal_params$type == "intbc") { if (is.null(downscal_params$bc_method)) { error(recipe$Run$logger, paste("Downscaling type 'intbc' was requested in the recipe, but", - "no bias correction method is provided.")) + "no bias correction method is provided.")) error_status <- T } else if (!(downscal_params$bc_method %in% BC_METHODS)) { error(recipe$Run$logger, paste0("The accepted Bias Correction methods for the downscaling", - " module are: ", paste(BC_METHODS, collapse = ", "), ".")) + " module are: ", paste(BC_METHODS, collapse = ", "), ".")) error_status <- T } } else if (downscal_params$type == "intlr") { if (length(downscal_params$lr_method) == 0) { error(recipe$Run$logger, paste("Downscaling type 'intlr' was requested in the recipe, but", - "no linear regression method was provided.")) + "no linear regression method was provided.")) error_status <- T } else if (!(downscal_params$lr_method 
%in% LR_METHODS)) { error(recipe$Run$logger, paste0("The accepted linear regression methods for the", - " downscaling module are: ", - paste(LR_METHODS, collapse = ", "), ".")) + " downscaling module are: ", + paste(LR_METHODS, collapse = ", "), ".")) error_status <- T } } else if (downscal_params$type == "analogs") { if (is.null(downscal_params$nanalogs)) { warn(recipe$Run$logger, paste("Downscaling type is 'analogs, but the number of analogs", - "has not been provided in the recipe. The default is 3.")) + "has not been provided in the recipe. The default is 3.")) } } else if (downscal_params$type == "logreg") { if (is.null(downscal_params$int_method)) { error(recipe$Run$logger, paste("Downscaling type 'logreg' was requested in the recipe, but", - "no interpolation method was provided.")) + "no interpolation method was provided.")) error_status <- T } if (is.null(downscal_params$log_reg_method)) { error(recipe$Run$logger, paste("Downscaling type 'logreg' was requested in the recipe,", - "but no logistic regression method is provided.")) + "but no logistic regression method is provided.")) error_status <- T } else if (!(downscal_params$log_reg_method %in% LOGREG_METHODS)) { error(recipe$Run$logger, paste0("The accepted logistic regression methods for the ", - "downscaling module are: ", - paste(LOGREG_METHODS, collapse = ", "), ".")) + "downscaling module are: ", + paste(LOGREG_METHODS, collapse = ", "), ".")) error_status <- T } } } } - + # Indices if ("Indices" %in% names(recipe$Analysis$Workflow)) { nino_indices <- paste0("nino", c("1+2", "3", "3.4", "4")) @@ -408,9 +425,9 @@ check_recipe <- function(recipe) { "in the recipe.")) error_status <- T } else if (!(recipe$Analysis$Workflow$Anomalies$compute)) { - error(recipe$Run$logger, - paste0("Indices uses Anomalies as input, but the parameter", - "'Anomalies:compute' is set as no/False.")) + error(recipe$Run$logger, + paste0("Indices uses Anomalies as input, but the parameter", + "'Anomalies:compute' is set as no/False.")) error_status <- T } recipe_indices <- tolower(names(recipe$Analysis$Workflow$Indices)) @@ -422,7 +439,6 @@ check_recipe <- function(recipe) { error_status <- T } # Check that variables correspond with indices requested - recipe_variables <- strsplit(recipe$Analysis$Variables$name, ", | |,")[[1]] if (("nao" %in% recipe_indices) && (!all(recipe_variables %in% c("psl", "z500")))) { error(recipe$Run$logger, @@ -440,7 +456,7 @@ check_recipe <- function(recipe) { error_status <- T } } - + # Skill AVAILABLE_METRICS <- c("enscorr", "corr_individual_members", "rps", "rps_syear", "rpss", "frps", "frpss", "crps", "crps_syear", @@ -451,9 +467,9 @@ check_recipe <- function(recipe) { "frpss_specs", "bss10_specs", "bss90_specs") if ("Skill" %in% names(recipe$Analysis$Workflow)) { if (is.null(recipe$Analysis$Workflow$Skill$metric)) { - error(recipe$Run$logger, - "Parameter 'metric' must be defined under 'Skill'.") - error_status <- T + error(recipe$Run$logger, + "Parameter 'metric' must be defined under 'Skill'.") + error_status <- T } else { requested_metrics <- strsplit(recipe$Analysis$Workflow$Skill$metric, ", | |,")[[1]] @@ -464,6 +480,14 @@ check_recipe <- function(recipe) { "full list of accepted skill metrics.")) error_status <- T } + if (tolower(recipe$Analysis$Output_format) != 'scorecards') { + if (any(grepl('_syear', requested_metrics))) { + recipe$Analysis$Output_format <- 'scorecards' + warn(recipe$Run$logger, + paste0("'_syear' metrics can only be saved as 'scorecards' ", + "output format. 
The output format is now 'scorecards'.")) + } + } } # Saving checks SAVING_OPTIONS_SKILL <- c("all", "none") @@ -476,7 +500,7 @@ check_recipe <- function(recipe) { error_status <- T } } - + # Probabilities if ("Probabilities" %in% names(recipe$Analysis$Workflow)) { if (is.null(recipe$Analysis$Workflow$Probabilities$percentiles)) { @@ -501,11 +525,11 @@ check_recipe <- function(recipe) { error_status <- T } } - + # Visualization if ("Visualization" %in% names(recipe$Analysis$Workflow)) { PLOT_OPTIONS <- c("skill_metrics", "forecast_ensemble_mean", - "most_likely_terciles") + "most_likely_terciles", "statistics") # Separate plots parameter and check if all elements are in PLOT_OPTIONS if (is.null(recipe$Analysis$Workflow$Visualization$plots)) { error(recipe$Run$logger, @@ -566,51 +590,56 @@ check_recipe <- function(recipe) { } # Scorecards if ("Scorecards" %in% names(recipe$Analysis$Workflow)) { - if (is.null(recipe$Analysis$Workflow$Scorecards$metric)) { - error(recipe$Run$logger, - "Parameter 'metric' must be defined under 'Scorecards'.") - error_status <- T - } else { - sc_metrics <- strsplit(recipe$Analysis$Workflow$Scorecards$metric, - ", | |,")[[1]] - if (recipe$Analysis$Workflow$Scorecards$metric_aggregation == 'score') { - if ('rpss' %in% tolower(sc_metrics)) { - if (!('rps_clim_syear' %in% requested_metrics)) { - requested_metrics <- c(requested_metrics, 'rps_clim_syear') - } - if (!('rps_syear' %in% requested_metrics)) { - requested_metrics <- c(requested_metrics, 'rps_syear') + if(recipe$Analysis$Workflow$Scorecards$execute == TRUE){ + if (is.null(recipe$Analysis$Workflow$Scorecards$metric)) { + error(recipe$Run$logger, + "Parameter 'metric' must be defined under 'Scorecards'.") + error_status <- T + } else { + sc_metrics <- strsplit(recipe$Analysis$Workflow$Scorecards$metric, + ", | |,")[[1]] + if (recipe$Analysis$Workflow$Scorecards$metric_aggregation == 'score') { + if ('rpss' %in% tolower(sc_metrics)) { + if (!('rps_clim_syear' %in% requested_metrics)) { + requested_metrics <- c(requested_metrics, 'rps_clim_syear') + } + if (!('rps_syear' %in% requested_metrics)) { + requested_metrics <- c(requested_metrics, 'rps_syear') + } } - } - if ('crpss' %in% tolower(sc_metrics)) { - if (!('crps_clim_syear' %in% requested_metrics)) { - requested_metrics <- c(requested_metrics, 'crps_clim_syear') + if ('crpss' %in% tolower(sc_metrics)) { + if (!('crps_clim_syear' %in% requested_metrics)) { + requested_metrics <- c(requested_metrics, 'crps_clim_syear') + } + if (!('crps_syear' %in% requested_metrics)) { + requested_metrics <- c(requested_metrics, 'crps_syear') + } } - if (!('crps_syear' %in% requested_metrics)) { - requested_metrics <- c(requested_metrics, 'crps_syear') + if ('enscorr' %in% tolower(sc_metrics)) { + recipe$Analysis$Workflow$Statistics <- c('std', 'cov', 'n_eff') } + recipe$Analysis$Workflow$Skill$metric <- requested_metrics } - if ('enscorr' %in% tolower(sc_metrics)) { - recipe$Analysis$Workflow$Statistics <- c('standard_deviation', 'covariance') + if (tolower(recipe$Analysis$Output_format) != 'scorecards') { + recipe$Analysis$Output_format <- 'scorecards' + } + if (!all(tolower(sc_metrics) %in% tolower(requested_metrics))) { + error(recipe$Run$logger, + paste0("All of the metrics requested under 'Scorecards' must ", + "be requested in the 'Skill' section.")) + error_status <- T } - recipe$Analysis$Workflow$Skill$metric <- requested_metrics - } - if (!all(tolower(sc_metrics) %in% tolower(requested_metrics))) { - error(recipe$Run$logger, - paste0("All of the metrics 
requested under 'Scorecards' must ", - "be requested in the 'Skill' section.")) - error_status <- T } - } + } } # --------------------------------------------------------------------- # RUN CHECKS # --------------------------------------------------------------------- - + ## TODO: These checks should probably go first RUN_FIELDS = c("Loglevel", "Terminal", "output_dir", "code_dir") LOG_LEVELS = c("INFO", "DEBUG", "WARN", "ERROR", "FATAL") - + if (!("Run" %in% names(recipe))) { stop("The recipe must contain an element named 'Run'.") } @@ -650,11 +679,11 @@ check_recipe <- function(recipe) { paste0(LOG_LEVELS, collapse='/'))) error_status <- T } - + # --------------------------------------------------------------------- # AUTOSUBMIT CHECKS # --------------------------------------------------------------------- - + AUTO_PARAMS <- c("script", "expid", "hpc_user", "wallclock", "processors_per_job", "platform", "email_notifications", "email_address", "notify_completed", "notify_failed") @@ -723,7 +752,7 @@ check_recipe <- function(recipe) { error_status <- T } } - + # --------------------------------------------------------------------- # WORKFLOW CHECKS # --------------------------------------------------------------------- @@ -733,7 +762,7 @@ check_recipe <- function(recipe) { #nverifications <- check_number_of_dependent_verifications(recipe) # info(recipe$Run$logger, paste("Start Dates:", # paste(fcst.sdate, collapse = " "))) - + # Return error if any check has failed if (error_status) { error(recipe$Run$logger, "RECIPE CHECK FAILED.") diff --git a/tools/divide_recipe.R b/tools/divide_recipe.R index cf9cecd8bdd6b7e42005e834e30457aafb786e44..b22274cc5c0e7e5081a3f48f492cdcbfe8fba026 100644 --- a/tools/divide_recipe.R +++ b/tools/divide_recipe.R @@ -5,8 +5,8 @@ divide_recipe <- function(recipe) { info(recipe$Run$logger, "Splitting recipe in single verifications.") beta_recipe <- list(Description = append(recipe$Description, list(Origin = paste("Atomic recipe,", - "split from:", - recipe$name))), + "split from:", + recipe$name))), Analysis = list(Horizon = recipe$Analysis$Horizon, Variables = NULL, Datasets = NULL, @@ -14,18 +14,25 @@ divide_recipe <- function(recipe) { Region = NULL, Regrid = recipe$Analysis$Regrid, Workflow = recipe$Analysis$Workflow, - ncores = recipe$Analysis$ncores, - remove_NAs = recipe$Analysis$remove_NAs, + ncores = recipe$Analysis$ncores, + remove_NAs = recipe$Analysis$remove_NAs, Output_format = - recipe$Analysis$Output_format), - Run = recipe$Run[c("Loglevel", "output_dir", "Terminal", - "code_dir", "logfile", "filesystem")]) + recipe$Analysis$Output_format), + Run = recipe$Run[c("Loglevel", "output_dir", "Terminal", + "code_dir", "logfile", "filesystem")]) - # duplicate recipe by independent variables: + # duplicate recipe by independent variables:ç + # If a single variable is not given inside a list, rebuild structure + if (any(c("name", "freq", "units") %in% names(recipe$Analysis$Variables))) { + variables <- recipe$Analysis$Variables + recipe$Analysis$Variables <- NULL + recipe$Analysis$Variables[[1]] <- variables + } all_recipes <- rep(list(beta_recipe), length(recipe$Analysis$Variables)) for (var in 1:length(recipe$Analysis$Variables)) { all_recipes[[var]]$Analysis$Variables <- recipe$Analysis$Variables[[var]] } + # for (dep in verifications$dependent) { # all_recipes[[i]]$Analysis$Variables <- dep # i = i + 1 @@ -35,7 +42,7 @@ divide_recipe <- function(recipe) { # duplicate recipe by Datasets: # check Systems - + # If a single system is not given inside a list, 
rebuild structure if (any(c("name", "member") %in% names(recipe$Analysis$Datasets$System))) { system <- recipe$Analysis$Datasets$System recipe$Analysis$Datasets$System <- NULL @@ -66,7 +73,14 @@ divide_recipe <- function(recipe) { all_recipes <- recipes rm(list = 'recipes') } - # check References + # Check references + # If a single reference is not given inside a list, rebuild structure + if (c("name") %in% names(recipe$Analysis$Datasets$Reference)) { + reference <- recipe$Analysis$Datasets$Reference + recipe$Analysis$Datasets$Reference <- NULL + recipe$Analysis$Datasets$Reference[[1]] <- reference + } + for (ref in 1:length(recipe$Analysis$Datasets$Reference)) { for (reci in 1:length(all_recipes)) { all_recipes[[reci]]$Analysis$Datasets$Reference <- @@ -79,6 +93,8 @@ divide_recipe <- function(recipe) { } } all_recipes <- recipes + rm(list = 'recipes') + # Duplicate recipe by Region recipes <- list() if (any(c("latmin", "latmax", "lonmin", "lonmax") %in% @@ -105,12 +121,12 @@ divide_recipe <- function(recipe) { for (sdate in 1:length(recipe$Analysis$Time$sdate)) { for (reci in 1:length(all_recipes)) { all_recipes[[reci]]$Analysis$Time <- - list(sdate = recipe$Analysis$Time$sdate[[sdate]], - fcst_year = recipe$Analysis$Time$fcst_year, - hcst_start = recipe$Analysis$Time$hcst_start, - hcst_end = recipe$Analysis$Time$hcst_end, - ftime_min = recipe$Analysis$Time$ftime_min, - ftime_max = recipe$Analysis$Time$ftime_max) + list(sdate = recipe$Analysis$Time$sdate[[sdate]], + fcst_year = recipe$Analysis$Time$fcst_year, + hcst_start = recipe$Analysis$Time$hcst_start, + hcst_end = recipe$Analysis$Time$hcst_end, + ftime_min = recipe$Analysis$Time$ftime_min, + ftime_max = recipe$Analysis$Time$ftime_max) } if (sdate == 1) { recipes <- all_recipes @@ -139,15 +155,15 @@ divide_recipe <- function(recipe) { } write_yaml(all_recipes[[reci]], paste0(recipe$Run$output_dir, "/logs/recipes/atomic_recipe_", - recipe_number, ".yml")) + recipe_number, ".yml")) } info(recipe$Run$logger, paste("The main recipe has been divided into", length(all_recipes), - "atomic recipes.")) + "atomic recipes.")) text <- paste0("Check output directory ", recipe$Run$output_dir, - "/logs/recipes/ to see all the individual atomic recipes.") + "/logs/recipes/ to see all the individual atomic recipes.") info(recipe$Run$logger, text) ## TODO: Change returns? return(list(n_atomic_recipes = length(all_recipes), - outdir = recipe$Run$output_dir)) + outdir = recipe$Run$output_dir)) } diff --git a/tools/libs.R b/tools/libs.R index 3b855a77f9691a060b125826cf90af9cbbe68411..401467860ba602c0bd459439e973e3637adb7a4f 100644 --- a/tools/libs.R +++ b/tools/libs.R @@ -41,3 +41,5 @@ source("tools/restructure_recipe.R") # Settings options(bitmapType = 'cairo') pdf(NULL) +Sys.setlocale("LC_ALL", 'en_GB.UTF-8') + diff --git a/tools/prepare_outputs.R b/tools/prepare_outputs.R index c7f3e3e60c6f9c1d843ef8a14790a9438c1dc0af..aeb9e2f8bd2f685db73fcbec1f94e3934c5e91ec 100644 --- a/tools/prepare_outputs.R +++ b/tools/prepare_outputs.R @@ -89,6 +89,10 @@ prepare_outputs <- function(recipe_file, warn(recipe$Run$logger, "Filesystem not specified in the recipe. 
Setting it to 'esarchive'.") } + # Restructure the recipe to make the atomic recipe more readable + if (restructure) { + recipe <- restructure_recipe(recipe) + } # Run recipe checker if (disable_checks) { warn(recipe$Run$logger, @@ -96,9 +100,5 @@ prepare_outputs <- function(recipe_file, } else { recipe <- check_recipe(recipe) } - # Restructure the recipe to make the atomic recipe more readable - if (restructure) { - recipe <- restructure_recipe(recipe) - } return(recipe) } diff --git a/use_cases/ex0_1_sample_dataset/ex0_1-handson.md b/use_cases/ex0_1_sample_dataset/ex0_1-handson.md new file mode 100644 index 0000000000000000000000000000000000000000..c1dfe48a1d9dcf613db094a1c265331d4abf4a2e --- /dev/null +++ b/use_cases/ex0_1_sample_dataset/ex0_1-handson.md @@ -0,0 +1,238 @@ +# Use case 0.1: Loading a sample dataset + +## Goal +Load a sample dataset into SUNSET without the need to configure an archive of netCDF or GRIB files. + +## 0. SUNSET software stack + +The first step to use SUNSET is to create a copy of the code in your local environment. Open a terminal and `cd` to the directory where you would like to store your local copy of SUNSET. For example: `/esarchive/scratch//git/`. If the directory does not exist yet, you can create it with the `mkdir` shell command. + +```shell +# Clone the GitLab repository to create a local copy of the code +git clone https://earth.bsc.es/gitlab/es/sunset.git +``` + +If you are using BSC infrastructure, all of the software dependencies are already installed within our common environment. However, if you are running SUNSET outside of the BSC facilities, you can install the dependencies by creating a conda environment. A script is provided in the SUNSET repository, and you can install the environment by running the following line from the main folder of the SUNSET repository: + +```shell +bash conda_installation/load_sunset.bash +``` + +To run the line above, you should replace `` with the path where you want the conda environment to be installed. For example, `/home//conda-sunset`. + +## 1. The recipe + +SUNSET uses YAML configuration files called 'recipes' to specify which data you want to load and the details of the different steps of the workflow. In this example, we are using a sample dataset which contains data of temperature-at-surface (tas) monthly means, from ECMWF SEAS5 as the experiment and ERA5 as the reference dataset, for the initialization month of November. + +There is a template file for this hands-on tutorial, which you can open with a text editor: + +```shell +# cd to the main SUNSET directory +# Open the recipe with a text editor such as vim or emacs +vim use_cases/ex0_1_sample_dataset/ex0_1-recipe.yml +``` + +Once you have opened your recipe, it is time to edit the contents. Fill in the blank slots according to your preference, based on the options given in the description. + +NOTE: In this particular example, the sample data is limited, so the system and reference names, variable, forecast times and region cannot be changed.
+ +```yaml +Description: + Author: <___> + Description: Exercise 0.1: Skill assessment of a sample dataset +Analysis: + Horizon: Seasonal + Variables: + name: tas + freq: monthly_mean + units: <___> # Choose your units: C or K + Datasets: + System: + name: ECMWF-SEAS5.1 + Multimodel: no + Reference: + name: ERA5 + Time: + sdate: '1101' + fcst_year: + hcst_start: '2000' + hcst_end: '2006' + ftime_min: 1 + ftime_max: 3 + Region: + latmin: 27 + latmax: 48 + lonmin: 0 + lonmax: 359 + Regrid: + method: bilinear + type: 'to_system' + Workflow: + Anomalies: + compute: yes + cross_validation: yes + save: 'none' + Calibration: + method: <___> + save: 'none' + Skill: + metric: RPSS BSS10 BSS90 + cross_validation: yes + save: 'none' + Probabilities: + percentiles: [[1/3, 2/3]] + save: 'none' + Visualization: + plots: skill_metrics + multi_panel: no + projection: cylindrical_equidistant + ncores: 10 + remove_NAs: yes + Output_format: S2S4E +Run: + filesystem: sample # This parameter specifies that we want to load the sample data, rather than data from a filesystem + Loglevel: INFO + Terminal: yes + output_dir: <______> # Path to the directory where you want your outputs to be saved + code_dir: <______> # Path to the directory where your code is +``` + +## 2. Load the required SUNSET modules and read the recipe + +If you are running SUNSET within BSC infrastructure, source the MODULES file to load the environment modules needed to run SUNSET. + +```shell +source MODULES +``` + +Otherwise, activate the conda environment: + +```shell +conda activate +``` + +Open an R session by simply typing `R` on the terminal. + +To run SUNSET, we must run the R session from the directory where the code is. To check your working directory, you can run the shell command `pwd`. From the R session, you can use the commands `getwd()` and `setwd()` to see and change your working directory. + +```r +# Load required modules +source("modules/Loading/Loading.R") +source("modules/Units/Units.R") +source("modules/Calibration/Calibration.R") +source("modules/Anomalies/Anomalies.R") +source("modules/Skill/Skill.R") +source("modules/Saving/Saving.R") +source("modules/Visualization/Visualization.R") + +# Read recipe +recipe_file <- "use_cases/ex0_1_sample_dataset/ex0_1-recipe.yml" +recipe <- prepare_outputs(recipe_file) +``` + +The function `prepare_outputs()` creates a unique folder for the logs, data files and plots that result from the execution of your recipe, inside the directory you specified. It also runs a check over the recipe to detect any potential errors, misspellings or missing arguments. At the end of the check, a message is displayed indicating whether or not the recipe passed the check, along with the list of errors and warnings. + +**Questions** + +Read the logs! + +(1) Did your recipe pass the check? Did you get any warnings? + +(2) Where will your outputs be saved? Copy and paste this directory somewhere, so that you can check it later! + +*Tip*: The recipe is now stored as a `list` containing all the information of the original YAML file, plus some extra things! If you want to see any particular element of the recipe from the R session, you can simply access that element in the list. For example: + +```r +# Checking the variable name +recipe$Analysis$Variables$name +# Checking the output directory +recipe$Run$output_dir +``` + +## 3. Load the data and change the units + +The **Loading** module uses the information in the recipe to load the data that has been requested. It loads the experiment data for the hindcast period, the reference data for the corresponding period, and the experiment forecast if a forecast year has been requested. In the case of the sample dataset, we have a hindcast and the corresponding observations, but no forecast. + +For certain variables like temperature, precipitation or sea level pressure, the user can request the specific units in which to load the data. The **Units** module will read the original units as stored in the netCDF files and perform any necessary unit conversions to match the request in the recipe. It also verifies that all of the loaded datasets share the same units, even if no specific unit has been requested. For this reason, users are strongly encouraged to run it even if they did not request any unit conversion. + +```r +# Load datasets +data <- Loading(recipe) +# Change units +data <- Units(recipe, data) +``` + +**Questions** + +(1) What is the structure of `data`? What is the class of the objects in `data`? *Tip*: you can use functions like `class()`, `names()` or `str()` to gain information about the structure of the object and its contents. + +```r +class(data) +names(data) +str(data, max.level = 2) +# You can access any of the three objects with the `$` operator: +class(data$hcst) +``` + +(2) Pay attention to the log messages: Did your units get converted? Are the new units what you expect? You can check the metadata of any of the objects in data. SUNSET also provides the `data_summary()` function, which lets you have a quick look at your objects: + +```r +# Check the new units and data of the hindcast (hcst) and/or observations (obs). Are they the same? +data$hcst$attrs$Variable$metadata$tas$units +data_summary(data$hcst, recipe) +data_summary(data$obs, recipe) +``` +(3) What are the dimensions of the datasets? Are they consistent with what is requested in the recipe? *Tip*: Check the data summary! + +## 4. Calibrate the data and compute the anomalies + +SUNSET has a few modules to perform post-processing on the experimental and the reference datasets. The **Calibration** module performs the bias correction method indicated in the recipe, using the `CSTools::CST_Calibration()` function. + +The **Anomalies** module removes the climatologies using functions like `CSTools::CST_Anomaly()` and `s2dv::Clim()`, and also returns the full fields in case they are needed for any future computations. + +```r +# Calibrate the data +data <- Calibration(recipe, data) +# Compute anomalies +data <- Anomalies(recipe, data) +``` +**Questions** + +(1) Verify that you now have anomaly values instead of the original full field. *Tip*: Use `data_summary()` like in the previous example and pay attention to the new values. + +## 5. Evaluate the model skill and compute the probability thresholds + +The **Skill** module returns a list of all the evaluation metrics requested in the recipe, in the shape of multi-dimensional arrays. In this case, we will compute three metrics: + +- **RPSS (Ranked Probability Skill Score)**: This skill score measures how well a forecast predicts the probability of the tercile categories (below-normal, normal and above-normal), compared to the climatology. +- **BSS10 and BSS90 (Brier Skill Score):** This skill score measures how well a forecast predicts the probability of the 10th percentile and 90th percentile extreme events, compared to the climatology. + +The `Probabilities()` function returns the probability values for each requested category for the hindcast (and forecast) data, as well as the hindcast percentile values corresponding to each threshold. +```r +# Compute skill metrics +skill_metrics <- Skill(recipe, data) +# Compute percentiles and probability bins +probabilities <- Probabilities(recipe, data) +``` +**Questions** + +(1) What is the structure of `skill_metrics`? Which metrics were computed? What dimensions do they have? *Tip*: use `str()` and `names()`. + +(2) What is the structure of `probabilities`? Can you identify the probability categories and the percentiles? *Tip*: use `str()`. + +## 6. Plotting the results + +Now, let's visualize the information that was computed! + +The **Visualization** module will generate the maps we requested in the recipe: Skill maps to visualize the skill distribution of the model for each metric. + +With the significance option in the `Visualization()` function, you can choose whether or not to shade the grid points that are statistically significant in each skill metric plot. + +```r +# Plot data +Visualization(recipe, data, + skill_metrics = skill_metrics, + significance = TRUE) +``` + +Now, you can `cd` to the output directory and inspect the contents of the `plots/` subdirectory. The plots are PNG files that can be visualized with the `display` command. They have a descriptive name including the content of the plot, the date and the forecast time. diff --git a/use_cases/ex0_1_sample_dataset/ex0_1-recipe.yml b/use_cases/ex0_1_sample_dataset/ex0_1-recipe.yml new file mode 100644 index 0000000000000000000000000000000000000000..662e75cc77691b7e6670cf68f31af885c48225c5 --- /dev/null +++ b/use_cases/ex0_1_sample_dataset/ex0_1-recipe.yml @@ -0,0 +1,57 @@ +Description: + Author: V. Agudetse + '': split version +Analysis: + Horizon: seasonal + Variables: + name: tas + freq: monthly_mean + Datasets: + System: + name: ECMWF-SEAS5.1 + Multimodel: no + Reference: + name: ERA5 + Time: + sdate: '1101' + fcst_year: + hcst_start: '2000' + hcst_end: '2006' + ftime_min: 1 + ftime_max: 3 + Region: + latmin: 27 + latmax: 48 + lonmin: 0 + lonmax: 359 + Regrid: + method: bilinear + type: to_system + Workflow: + Anomalies: + compute: yes + cross_validation: yes + save: 'all' + Calibration: + method: bias + save: 'none' + Skill: + metric: RPSS BSS10 BSS90 + save: 'all' + Probabilities: + percentiles: [[1/3, 2/3]] + save: 'all' + Indicators: + index: FALSE + Visualization: + plots: skill_metrics + ncores: # Optional, int: number of cores, defaults to 1 + remove_NAs: # Optional, bool: Whether NAs are removed, defaults to FALSE + Output_format: S2S4E +Run: + Loglevel: INFO + Terminal: yes + output_dir: /tmp/out-logs/ + code_dir: ./ + filesystem: sample + diff --git a/use_cases/ex1_1_single_analysis_terminal/ex1_1-handson.md b/use_cases/ex1_1_single_analysis_terminal/ex1_1-handson.md new file mode 100644 index 0000000000000000000000000000000000000000..6315cd864087b2573dd4fb7ce8d27bdf27f04e54 --- /dev/null +++ b/use_cases/ex1_1_single_analysis_terminal/ex1_1-handson.md @@ -0,0 +1,235 @@ +# Hands-on 1.1: Single Verification Workflow on the Terminal + +## Goal +Create a SUNSET recipe and use the functions in the suite to reproduce the verification workflow from the previous hands-on exercises. + +## 0. Cloning the SUNSET repository + +The first step to use SUNSET is to create a copy of the code in your local environment.
Open a terminal and `cd` to the directory where you would like to store your local copy of SUNSET. For example: `/esarchive/scratch//git/`. If a directory does not exist yet, you can create it with the `mkdir` shell command. + +```shell +# Clone the GitLab repository to create a local copy of the code +git clone https://earth.bsc.es/gitlab/es/sunset.git +``` + +## 1. Modifying the recipe + +SUNSET uses YAML configuration files called 'recipes' to specify which data you want to load and the details of the different steps of the workflow. In this example, we want to evaluate the temperature-at-surface (tas) monthly means, using MeteoFrance System 7 data as our experiment and ERA5 as our reference dataset, for the initialization month of November. + +There is a template file for this hands-on tutorial, which you can open with a text editor: + +```shell +# cd to the main SUNSET directory +# Open the recipe with a text editor such as vim or emacs +vim use_cases/ex1_1_single_analysis_terminal/ex1_1-recipe.yml +``` + +Once you have opened your recipe, it is time to edit the contents. Fill in the blank slots according to your preference, based on the options given in the description. + +```yaml +Description: + Author: <___> + Description: Exercise 1.1: Calibration and skill assessment of MeteoFrance System 7 surface temperature +Analysis: + Horizon: Seasonal + Variables: + name: tas + freq: monthly_mean + units: <___> # Choose your units: C or K + Datasets: + System: + name: Meteo-France-System7 + Multimodel: no + Reference: + name: ERA5 + Time: + sdate: '1101' + fcst_year: '2020' + hcst_start: '1993' + hcst_end: '2016' + ftime_min: <___> # Choose the first time step! A number from 1 to 6 + ftime_max: <___> # Choose the last time step! A number from 1 to 6 + Region: + name: "EU" + latmin: 20 + latmax: 80 + lonmin: -20 + lonmax: 40 + Regrid: + method: bilinear + type: 'r360x181' # options: to_system, to_reference, self-defined grid + Workflow: + Anomalies: + compute: yes + cross_validation: yes + save: 'none' + Calibration: + method: <___> + save: 'none' + Skill: + metric: RPSS, BSS10, BSS90 + cross_validation: yes + save: 'none' + Probabilities: + percentiles: [[1/3, 2/3], [1/10, 9/10]] + save: 'none' + Visualization: + plots: skill_metrics, forecast_ensemble_mean, most_likely_terciles + multi_panel: no + projection: cylindrical_equidistant + ncores: 10 + remove_NAs: yes + Output_format: S2S4E +Run: + Loglevel: INFO + Terminal: yes + output_dir: <______> # Path to the directory where you want your outputs to be saved + code_dir: <______> # Path to the directory where your code is +``` + +## 2. Load the required SUNSET modules and read the recipe + +First of all, source the MODULES file to load the environment modules needed to run SUNSET. + +```shell +source MODULES +``` + +Open an R session, by simply typing `R` on the terminal. + +To run SUNSET, we must run the R session from the directory where the code is. To check your working directory, you can run the shell command `pwd`. From the R session, you can use the commands `getwd()` and `setwd()` to see and change your working directory. 
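For example, you could check where you are and then move into your local copy of SUNSET before loading the modules (a minimal sketch; the path below is a placeholder for your own clone):

```r
# Check the current working directory
getwd()
# Move to the main folder of your local SUNSET copy (placeholder path)
setwd("/esarchive/scratch/<username>/git/sunset/")
```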
+ +```r +# Load required modules +source("modules/Loading/Loading.R") +source("modules/Units/Units.R") +source("modules/Calibration/Calibration.R") +source("modules/Anomalies/Anomalies.R") +source("modules/Skill/Skill.R") +source("modules/Saving/Saving.R") +source("modules/Visualization/Visualization.R") + +# Read recipe +recipe_file <- "use_cases/ex1_1_single_analysis_terminal/ex1_1-recipe.yml" +recipe <- prepare_outputs(recipe_file) +``` + +The function `prepare_outputs()` creates a unique folder for the logs, data files and plots that result from the execution of your recipe, inside the directory you specified. It also runs a check over the recipe to detect any potential errors, misspellings or missing arguments. At the end of the check, a message is displayed indicating whether or not the recipe passed the check, along with the list of errors and warnings. + +**Questions** + +Read the logs! + +(1) Did your recipe pass the check? Did you get any warnings? + +(2) Where will your outputs be saved? Copy and paste this directory somewhere, so that you can check it later! + +*Tip*: The recipe is now stored as a `list` containing all the information of the original YAML file, plus some extra things! If you want to see any particular element of the recipe from the R session, you can simply access that element in the list. For example: + +```r +# Checking the variable name +recipe$Analysis$Variables$name +# Checking the output directory +recipe$Run$output_dir +``` + +## 3. Load the data and change the units + +The **Loading** module uses the information in the recipe to load the data that has been requested. It loads the experiment data for the hindcast period, the reference data for the corresponding period, and the experiment forecast if a forecast year has been requested. + +For certain variables like temperature, precipitation or sea level pressure, the user can request the specific units in which to load the data. The **Units** module will read the original units as stored in the netCDF files and perform any necessary unit conversions to match the request in the recipe. It also verifies that all of the loaded datasets share the same units, even if no specific unit has been requested. For this reason, users are strongly encouraged to run it even if they did not request any unit conversion. + +```r +# Load datasets +data <- Loading(recipe) +# Change units +data <- Units(recipe, data) +``` + +**Questions** + +(1) What is the structure of `data`? What is the class of the objects in `data`? *Tip*: you can use functions like `class()`, `names()` or `str()` to gain information about the structure of the object and its contents. + +```r +class(data) +names(data) +str(data, max.level = 2) +# You can access any of the three objects with the `$` operator: +class(data$hcst) +``` + +(2) Pay attention to the log messages: Did your units get converted? Are the new units what you expect? You can check the metadata of any of the objects in data. SUNSET also provides the `data_summary()` function, which lets you have a quick look at your objects: + +```r +# Check the new units and data of hcst, fcst and/or obs. Are they all the same? +data$hcst$attrs$Variable$metadata$tas$units +data_summary(data$hcst, recipe) +data_summary(data$fcst, recipe) +data_summary(data$obs, recipe) +``` +(3) What are the dimensions of the datasets? Are they consistent with what is requested in the recipe? *Tip*: Check the data summary! + +## 4. Calibrate the data and compute the anomalies + +SUNSET has a few modules to perform post-processing on the experimental and the reference datasets. The **Calibration** module performs the bias correction method indicated in the recipe, using the `CSTools::CST_Calibration()` function. + +The **Anomalies** module removes the climatologies using functions like `CSTools::CST_Anomaly()` and `s2dv::Clim()`, and also returns the full fields in case they are needed for any future computations. + +```r +# Calibrate the data +data <- Calibration(recipe, data) +# Compute anomalies +data <- Anomalies(recipe, data) +``` +**Questions** + +(1) Verify that you now have anomaly values instead of the original full field. *Tip*: Use `data_summary()` like in the previous example and pay attention to the new values. + +## 5. Evaluate the model skill and compute the probability thresholds + +The **Skill** module returns a list of all the evaluation metrics requested in the recipe, in the shape of multi-dimensional arrays. In this case, we will compute three metrics: + +- **RPSS (Ranked Probability Skill Score)**: This skill score measures how well a forecast predicts the probability of the tercile categories (below-normal, normal and above-normal), compared to the climatology. +- **BSS10 and BSS90 (Brier Skill Score):** This skill score measures how well a forecast predicts the probability of the 10th percentile and 90th percentile extreme events, compared to the climatology. + +The `Probabilities()` function returns the probability values for each requested category for the hindcast and forecast data, as well as the hindcast percentile values corresponding to each threshold. +```r +# Compute skill metrics +skill_metrics <- Skill(recipe, data) +# Compute percentiles and probability bins +probabilities <- Probabilities(recipe, data) +``` +**Questions** + +(1) What is the structure of `skill_metrics`? Which metrics were computed? What dimensions do they have? *Tip*: use `str()` and `names()`. + +(2) What is the structure of `probabilities`? Can you identify the probability categories and the percentiles? *Tip*: use `str()`. + +## 6. Plotting the results + +Now, let's visualize the information that was computed! + +The **Visualization** module will generate the three types of maps we requested in the recipe: +- Skill maps to visualize the skill distribution of the model, for each metric. +- The ensemble mean of the calibrated forecast anomalies. +- A map showing the most likely tercile category for each point in the grid. + +With the significance option in the `Visualization()` function, you can choose whether or not to shade the grid points that are statistically significant in each skill metric plot. + +```r +# Plot data +Visualization(recipe, data, + skill_metrics = skill_metrics, + probabilities = probabilities, + significance = TRUE) +``` + +Now, you can `cd` to the output directory and inspect the contents of the `plots/` subdirectory. The plots are PNG files that can be visualized with the `display` command. They have a descriptive name including the content of the plot, the date and the forecast time. + +**Questions** + +(1) Let's take a look at the forecast ensemble mean. What is the sign of the anomalies over Spain? In what regions are the anomalous temperatures strongest? + +(2) Let's take a look at the skill metrics RPSS, BSS10 and BSS90. In what regions and for which metrics is the forecast most skillful?
*Tip*: Positive values indicate that the model is a better predictor than the climatology, with 1 being the perfect score. + +(3) Let's take a look at the Most Likely Terciles plots. This plot indicates the probability of the temperature being below normal, near-normal or above normal. What is the most likely category for Spain? diff --git a/use_cases/ex1_1_single_analysis_terminal/ex1_1-recipe.yml b/use_cases/ex1_1_single_analysis_terminal/ex1_1-recipe.yml new file mode 100644 index 0000000000000000000000000000000000000000..7b35e9666b599ea7a41113acf1b14ef5a0415c63 --- /dev/null +++ b/use_cases/ex1_1_single_analysis_terminal/ex1_1-recipe.yml @@ -0,0 +1,61 @@ +Description: + Author: V. Agudetse + Description: Analysis of MF System 7 with temperature +Analysis: + Horizon: Seasonal + Variables: + name: tas + freq: monthly_mean + units: K + Datasets: + System: + name: Meteo-France-System7 + Multimodel: no + Reference: + name: ERA5 + Time: + sdate: '1101' + fcst_year: '2020' + hcst_start: '1993' + hcst_end: '2016' + ftime_min: 1 + ftime_max: 2 + Region: + name: "EU" + latmin: 20 + latmax: 80 + lonmin: -20 + lonmax: 40 + Regrid: + method: bilinear + type: 'r360x181' + Workflow: + Anomalies: + compute: yes + cross_validation: yes + save: 'none' + Calibration: + method: mse_min + save: 'none' + Skill: + metric: RPSS, BSS10, BSS90 + cross_validation: yes + save: 'all' + Probabilities: + percentiles: [[1/3, 2/3], [1/10, 9/10]] + save: 'none' + Visualization: + plots: skill_metrics, forecast_ensemble_mean, most_likely_terciles + multi_panel: no + projection: cylindrical_equidistant + mask_terciles: no # CHECK + dots: no # CHECK + ncores: 10 + remove_NAs: yes + Output_format: S2S4E +Run: + filesystem: esarchive + Loglevel: INFO + Terminal: yes + output_dir: /esarchive/scratch/vagudets/auto-s2s-outputs/ + code_dir: /esarchive/scratch/vagudets/repos/auto-s2s/ diff --git a/use_cases/ex1_2_autosubmit_scorecards/Figures/as_change_status.PNG b/use_cases/ex1_2_autosubmit_scorecards/Figures/as_change_status.PNG new file mode 100644 index 0000000000000000000000000000000000000000..5fc5cd8a3397a626824cbaa8fb9d3fecb090ce13 Binary files /dev/null and b/use_cases/ex1_2_autosubmit_scorecards/Figures/as_change_status.PNG differ diff --git a/use_cases/ex1_2_autosubmit_scorecards/Figures/as_tree.PNG b/use_cases/ex1_2_autosubmit_scorecards/Figures/as_tree.PNG new file mode 100644 index 0000000000000000000000000000000000000000..5194568e387d7251c478eed0cf071f7d7c7a4a60 Binary files /dev/null and b/use_cases/ex1_2_autosubmit_scorecards/Figures/as_tree.PNG differ diff --git a/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-1_ECMWF-SEAS5_ERA5_tas_1993-2003.png b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-1_ECMWF-SEAS5_ERA5_tas_1993-2003.png new file mode 100644 index 0000000000000000000000000000000000000000..631e3e2f80955aaf2b5dbd8abb2e577a73746373 Binary files /dev/null and b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-1_ECMWF-SEAS5_ERA5_tas_1993-2003.png differ diff --git a/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-2_ECMWF-SEAS5_ERA5_tas_1993-2003.png b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-2_ECMWF-SEAS5_ERA5_tas_1993-2003.png new file mode 100644 index 0000000000000000000000000000000000000000..e311079536dc83ccb94c2cd786cebc44b46694ed Binary files /dev/null and b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-2_ECMWF-SEAS5_ERA5_tas_1993-2003.png differ diff --git 
a/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-3_ECMWF-SEAS5_ERA5_tas_1993-2003.png b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-3_ECMWF-SEAS5_ERA5_tas_1993-2003.png new file mode 100644 index 0000000000000000000000000000000000000000..356243497a01f392ce21575d9df17c1a1d2b1796 Binary files /dev/null and b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-3_ECMWF-SEAS5_ERA5_tas_1993-2003.png differ diff --git a/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-4_ECMWF-SEAS5_ERA5_tas_1993-2003.png b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-4_ECMWF-SEAS5_ERA5_tas_1993-2003.png new file mode 100644 index 0000000000000000000000000000000000000000..54e1a5b1825b1a00672360dd86f3c8c15a86380c Binary files /dev/null and b/use_cases/ex1_2_autosubmit_scorecards/Figures/scorecard-4_ECMWF-SEAS5_ERA5_tas_1993-2003.png differ diff --git a/use_cases/ex1_2_autosubmit_scorecards/ex1_2-handson.md b/use_cases/ex1_2_autosubmit_scorecards/ex1_2-handson.md new file mode 100644 index 0000000000000000000000000000000000000000..999e02f31794d6fcd1e6e49ead1357dd6b791f4e --- /dev/null +++ b/use_cases/ex1_2_autosubmit_scorecards/ex1_2-handson.md @@ -0,0 +1,172 @@ +# Hands-on 1.2: Computation of Scorecards with Autosubmit + +## Goal +Compute some skill metrics and plot scorecards with SUNSET, using Autosubmit to dispatch jobs in parallel. +In the recipe, we request 12 start dates (0101, 0201, ..., 1201). SUNSET will split the recipe into 12 atomic recipes, and Autosubmit will run 12 verification jobs, one for each atomic recipe, in parallel. +When all the verification jobs are finished, the scorecard job is triggered and produces the scorecards. + +We only use one variable, one model and one reference dataset in this example, but you can add more datasets and variables if needed, and SUNSET will split the recipes accordingly. + +Check the GitLab Wiki: +- Autosubmit page for a full explanation of how to use SUNSET with Autosubmit: https://earth.bsc.es/gitlab/es/sunset/-/wikis/Autosubmit + +- Scorecards module section on the home page to learn more about scorecards: https://earth.bsc.es/gitlab/es/sunset/-/wikis/home#scorecards-module + + +## 0. Cloning the SUNSET repository + +If you're completely new to SUNSET, the first step is to create a copy of the tool in your local environment. +Open a terminal and `cd` to the directory where you would like to store your local copy of SUNSET. For example: `/esarchive/scratch//git/`. If the directory does not exist yet, you can create it with the `mkdir` shell command. + +```shell +# Clone the GitLab repository to create a local copy of the code +git clone https://earth.bsc.es/gitlab/es/sunset.git +``` +You should see a git folder "sunset" under the current directory. Now you have all the code, recipes, and scripts for running SUNSET. + + +## 1. Create an Autosubmit experiment + +Since we're going to use Autosubmit to dispatch jobs, we need to have an Autosubmit experiment. Note that SUNSET uses Autosubmit >= 4.0.0. + +On the workstation or the Autosubmit machine, you can create an experiment with the following commands: + +```shell +module load autosubmit/4.0.0b-foss-2015a-Python-3.7.3 +autosubmit expid -H nord3v2 -d "SUNSET use case 1_2" +``` +You will see messages like the ones below: + +```shell +Autosubmit is running with 4.0.0b +The new experiment "a6pc" has been registered. +Generating folder structure... +Experiment folder: /esarchive/autosubmit/a6pc +Generating config files... +Experiment a6pc created +``` +Note down the experiment ID (in the snippet above, a6pc). We will need it for the recipe later. + + +## 2. Modifying the recipe + +The template recipe for this use case can be found in [ex1_2-recipe.yml](use_cases/ex1_2_autosubmit_scorecards/ex1_2-recipe.yml). +You should at least edit some items in the "Run" section: +- `output_dir`: The directory where you want to save the outputs and logs +- `code_dir`: The directory where your SUNSET code is stored (i.e., the git folder) +- `auto_conf$script`: The path to the script ex1_2-script.R +- `auto_conf$expid`: The experiment ID you just created ("a6pc" in the example above) +- `auto_conf$hpc_user`: Your user ID on Nord3, which should be bsc32xxx +- `auto_conf$email_address`: Your email address. You can also adjust the other email notification settings according to your preference. + +In the recipe, we request the anomaly computation after loading the data, then calculate the skill scores and save the results for the scorecards. In the Scorecards section, three regions are requested. + +Feel free to also modify other aspects according to your particular needs. You can read more about the parameters and the available modules in the SUNSET GitLab wiki. + +## 3. The user-defined script + +We need a script that defines the modules to use and the steps of the workflow. Note that the script only covers the data loading and verification parts; the Scorecards module does not need to be included in it. + +The prepare_outputs() function is already incorporated into the launcher script (see the next section for details about the launcher), so it does not need to be included in the user-defined script in this case. +In its place, we will use the function read_atomic_recipe(). The recipe path is passed as an argument to the R script. The beginning of our script should look like this: + +```R +# Load modules +source("modules/Loading/Loading.R") +source("modules/Anomalies/Anomalies.R") +source("modules/Skill/Skill.R") +source("modules/Saving/Saving.R") + +# Read recipe +## (leave this part as-is! Autosubmit will automatically pass the atomic recipe as a parameter) +args = commandArgs(trailingOnly = TRUE) +recipe_file <- args[1] +recipe <- read_atomic_recipe(recipe_file) +``` + +The rest of the user-defined script can be written in the same way as any other SUNSET script. We load the data, calculate the anomalies, then compute the skill scores and save the results as netCDF files for the Scorecards. + +```R +# Load data +data <- Loading(recipe) +# Compute tos anomalies +data <- Anomalies(recipe, data) +# Compute skill metrics +skill_metrics <- Skill(recipe, data) +``` +Check the example script at [ex1_2-script.R](use_cases/ex1_2_autosubmit_scorecards/ex1_2-script.R). +You can execute it as-is or copy it and modify it according to your needs. + + +## 4. Launch the jobs and use Autosubmit + +We will start the jobs with the launcher. The SUNSET Launcher is a bash script named launch_SUNSET.sh that can be found in the main directory of the SUNSET repository. It runs in two steps: + +1. Run the recipe checks, split the recipe into atomic recipes and create the directory for the outputs. +2. Modify the Autosubmit configuration of your experiment according to the parameters in the recipe. + +The bash script needs two inputs: (1) the [recipe](#2-modifying-the-recipe) and (2) the [R script](#3-the-user-defined-script). + +On your workstation or on Nord3, from the SUNSET code directory, run: + +```shell +bash launch_SUNSET.sh use_cases/ex1_2_autosubmit_scorecards/ex1_2-recipe.yml use_cases/ex1_2_autosubmit_scorecards/ex1_2-script.R +``` +You will see messages similar to the ones below: +```shell +[1] "Saving all outputs to:" +[1] "/esarchive/scratch/aho/auto-s2s-outputs/ex1_2-recipe_20231129003740" +INFO [2023-11-29 00:37:41] Checking recipe: use_cases/ex1_2_autosubmit_scorecards/ex1_2-recipe.yml +WARN [2023-11-29 00:37:41] The element 'fcst_year' is not defined in the recipe. No forecast year will be used. +INFO [2023-11-29 00:37:41] ##### RECIPE CHECK SUCCESSFULL ##### +INFO [2023-11-29 00:37:41] Splitting recipe in single verifications. +INFO [2023-11-29 00:37:41] The main recipe has been divided into 12 atomic recipes. +INFO [2023-11-29 00:37:41] Check output directory /esarchive/scratch/aho/auto-s2s-outputs//ex1_2-recipe_20231129003740/logs/recipes/ to see all the individual atomic recipes. +INFO [2023-11-29 00:37:41] ##### AUTOSUBMIT CONFIGURATION WRITTEN FOR a6pc ##### +INFO [2023-11-29 00:37:41] You can check your experiment configuration at: /esarchive/autosubmit/a6pc/conf/ +INFO [2023-11-29 00:37:41] Please SSH into bscesautosubmit01 or bscesautosubmit02 and run the following commands: +INFO [2023-11-29 00:37:41] module load autosubmit/4.0.0b-foss-2015a-Python-3.7.3 +INFO [2023-11-29 00:37:41] autosubmit create a6pc +INFO [2023-11-29 00:37:41] autosubmit refresh a6pc +INFO [2023-11-29 00:37:41] nohup autosubmit run a6pc & disown +``` +The output contains some useful information, such as the path to the atomic recipes and the Autosubmit configuration files. Most importantly, follow the last lines to launch your experiment: + +```shell +ssh bscesautosubmit01.bsc.es +(enter Autosubmit machine) +module load autosubmit/4.0.0b-foss-2015a-Python-3.7.3 +autosubmit create a6pc +autosubmit refresh a6pc +nohup autosubmit run a6pc & disown +``` + +Then, you can go to the [Autosubmit GUI](https://earth.bsc.es/autosubmitapp/) to check the experiment status. + + + +As you can see, the Scorecards job depends on the Verification jobs. Once the 12 verification jobs are finished, the Scorecards job will start. + +## 5. Results and plots + +The scorecards are saved under `plots/Scorecards` in the output directory. There will be 4 files (_more explanation here_) + + + + + + + +## 6. Rerun Autosubmit + +If something goes wrong and makes the jobs fail, you can rerun only the failed jobs: + +1. Go to the Autosubmit GUI, select the failed job(s) and click "CHANGE STATUS". +2. Set "Set status to:" to "WAITING". Copy the suggested lines and run them on the Autosubmit machine or your workstation. +3. Fix the problem in your local SUNSET git directory. +4. Run `autosubmit refresh xxxx` and `nohup autosubmit run xxxx & disown`. + + + +If everything fails, you can also simply recreate the experiment by running `autosubmit create xxxx` --> `autosubmit refresh xxxx` --> `nohup autosubmit run xxxx & disown`.
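Put together, the full sequence to recreate and relaunch the experiment from the Autosubmit machine looks like this (replace `xxxx` with your own experiment ID, e.g. a6pc):

```shell
# Load Autosubmit, then recreate, refresh and run the experiment
module load autosubmit/4.0.0b-foss-2015a-Python-3.7.3
autosubmit create xxxx
autosubmit refresh xxxx
nohup autosubmit run xxxx & disown
```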
+ diff --git a/recipes/recipe_scorecards_vic.yml b/use_cases/ex1_2_autosubmit_scorecards/ex1_2-recipe.yml similarity index 61% rename from recipes/recipe_scorecards_vic.yml rename to use_cases/ex1_2_autosubmit_scorecards/ex1_2-recipe.yml index fbd2cb90e2cd162f90e517bcbd82fe634783b8e0..73f16311f93cf69aa439945b00715b7d29f80afa 100644 --- a/recipes/recipe_scorecards_vic.yml +++ b/use_cases/ex1_2_autosubmit_scorecards/ex1_2-recipe.yml @@ -1,25 +1,17 @@ -################################################################################ -## RECIPE DESCRIPTION -################################################################################ - Description: - Author: V. Agudetse - Info: Test for recipe splitting - -################################################################################ -## ANALYSIS CONFIGURATION -################################################################################ + Author: An-Chi Ho + Info: Compute Skills and Plot Scorecards with Autosubmit Analysis: - Horizon: Seasonal - Variables: # ECVs and Indicators? + Horizon: seasonal + Variables: - {name: tas, freq: monthly_mean} Datasets: System: # multiple systems for single model, split if Multimodel = F - {name: ECMWF-SEAS5} Multimodel: False # single option Reference: - - {name: ERA5} # multiple references for single model? + - {name: ERA5} Time: sdate: # list, split - '0101' @@ -42,7 +34,7 @@ Analysis: Region: # multiple lists, split? Add region name if length(Region) > 1 - {name: "global", latmin: -90, latmax: 90, lonmin: 0, lonmax: 359.9} Regrid: - method: bilinear ## TODO: allow multiple methods? + method: bilinear type: to_system Workflow: Anomalies: @@ -50,34 +42,30 @@ Analysis: cross_validation: no save: 'none' Calibration: - method: raw ## TODO: list, split? + method: raw save: 'none' Skill: metric: mean_bias EnsCorr rps rpss crps crpss EnsSprErr # list, don't split cross_validation: yes save: 'all' Probabilities: - percentiles: [[1/3, 2/3]] # list, don't split + percentiles: [[1/3, 2/3]] save: 'none' - # Visualization: - # plots: skill_metrics - Indicators: - index: no # ? 
Scorecards: execute: yes # yes/no regions: Extra-tropical NH: {lon.min: 0, lon.max: 360, lat.min: 30, lat.max: 90} Tropics: {lon.min: 0, lon.max: 360, lat.min: -30, lat.max: 30} Extra-tropical SH : {lon.min: 0, lon.max: 360, lat.min: -30, lat.max: -90} - start_months: NULL + start_months: 'all' metric: mean_bias enscorr rpss crpss enssprerr metric_aggregation: 'score' - table_label: NULL + inf_to_na: TRUE # Optional, bool: set inf values in data to NA, default is FALSE table_label: NULL fileout_label: NULL col1_width: NULL col2_width: NULL calculate_diff: FALSE - ncores: 14 + ncores: 8 remove_NAs: no # bool, don't split Output_format: Scorecards # string, don't split @@ -87,20 +75,22 @@ Analysis: Run: Loglevel: INFO Terminal: yes - output_dir: /esarchive/scratch/vagudets/auto-s2s-outputs/ - code_dir: /esarchive/scratch/vagudets/repos/auto-s2s/ + filesystem: esarchive + output_dir: /esarchive/scratch/aho/auto-s2s-outputs/ + code_dir: /esarchive/scratch/aho/git/auto-s2s/ autosubmit: yes # fill only if using autosubmit auto_conf: - script: /esarchive/scratch/vagudets/repos/auto-s2s/example_scripts/test_scorecards_workflow.R # replace with the path to your script - expid: a6ae # replace with your EXPID - hpc_user: bsc32762 # replace with your hpc username + script: /esarchive/scratch/aho/git/auto-s2s/use_cases/ex1_2_autosubmit_scorecards/ex1_2-script.R # replace with the path to your script + expid: a6pc # replace with your EXPID + hpc_user: bsc32734 # replace with your hpc username wallclock: 03:00 # hh:mm - processors_per_job: 14 + processors_per_job: 8 platform: nord3v2 custom_directives: ['#SBATCH --exclusive'] email_notifications: yes # enable/disable email notifications. Change it if you want to. - email_address: victoria.agudetse@bsc.es # replace with your email address + email_address: an.ho@bsc.es # replace with your email address notify_completed: yes # notify me by email when a job finishes - notify_failed: no # notify me by email when a job fails + notify_failed: yes # notify me by email when a job fails + diff --git a/use_cases/ex1_2_autosubmit_scorecards/ex1_2-script.R b/use_cases/ex1_2_autosubmit_scorecards/ex1_2-script.R new file mode 100644 index 0000000000000000000000000000000000000000..1f60798736ae021e55face318444f62149c2aec2 --- /dev/null +++ b/use_cases/ex1_2_autosubmit_scorecards/ex1_2-script.R @@ -0,0 +1,25 @@ +############################################################################### +## Author: An-Chi Ho +## Description: Computes some skill metrics and plots scorecards with Autosubmit +## Instructions: Follow the steps described in: +## use_cases/ex1_2_autosubmit_scorecards/ex1_2-handson.md +## This script should be called by bash script launch_SUNSET.sh. 
+############################################################################### + +# Load modules +source("modules/Loading/Loading.R") +source("modules/Anomalies/Anomalies.R") +source("modules/Skill/Skill.R") + +# Read recipe +args = commandArgs(trailingOnly = TRUE) +recipe_file <- args[1] +recipe <- read_atomic_recipe(recipe_file) + +# Load data +data <- Loading(recipe) +# Compute tos anomalies +data <- Anomalies(recipe, data) +# Compute skill metrics +skill_metrics <- Skill(recipe, data) + diff --git a/use_cases/ex1_3_nino_indices_comparison/ex1_3-handson.md b/use_cases/ex1_3_nino_indices_comparison/ex1_3-handson.md new file mode 100644 index 0000000000000000000000000000000000000000..3696ffe66ce848a5bef9d90f658c87f3d5f62930 --- /dev/null +++ b/use_cases/ex1_3_nino_indices_comparison/ex1_3-handson.md @@ -0,0 +1,106 @@ +# Hands-on 1.3: Computation of El Niño indices for two seasonal models + +## Goal +Create a SUNSET recipe to compute and evaluate the skill of several El Niño indices (Niño1+2, Niño3, Niño3.4 and Niño4) for two models: ECMWF-SEAS5 and MeteoFrance System 7. We include the information for both of the models in a single recipe, and the SUNSET Launcher will split the recipe into two 'atomic recipes': one for each model. The computation for each atomic recipe will be run in the cluster as a separate job. + +It is also possible to split a recipe along different Reference datasets, Variables and Start Dates. + +## 0. Cloning the SUNSET repository + +The first step to use SUNSET is to create a copy of the code in your local environment. Open a terminal and `cd` to the directory where you would like to store your local copy of SUNSET. For example: `/esarchive/scratch//git/`. If the directory does not exist yet, you can create it with the `mkdir` shell command. + +```shell +# Clone the GitLab repository to create a local copy of the code +git clone https://earth.bsc.es/gitlab/es/sunset.git +``` + +## 1. Modifying the recipe + +The template recipe for this use case can be found in `use_cases/ex1_3_nino_indices_comparison/ex1_3-recipe.yml`. You should open it with an editor such as emacs or vim: + +```shell +# cd to the main SUNSET directory +# Open the recipe with a text editor such as vim or emacs +vim use_cases/ex1_3_nino_indices_comparison/ex1_3-recipe.yml +``` + +Then, under the 'Run' section of the recipe, you should edit the parameters `output_dir` and `code_dir` to point to your desired output directory and to the directory where your SUNSET code is stored, respectively. + +Feel free to also modify other aspects of the recipe according to your particular needs. You can read more about the parameters and the available modules in the SUNSET wiki. + +## 2. The user-defined script + +The SUNSET Launcher is a bash script named launch_SUNSET.sh that can be found in the main directory of the SUNSET repository. When working without Autosubmit, it runs in two steps: + +1. Running the recipe checks, splitting the recipe into atomic recipes and creating the directory for the outputs. +2. Sending jobs to the cluster to run the user-defined script for each atomic recipe, using SLURM. + +The prepare_outputs() function is already incorporated into the first step. For that reason, it does not need to be included in the user-defined script in this particular case. In its place, we will use the function read_atomic_recipe(). The recipe path is passed as an argument to the R script.
The beginning of our script should look like this: + +```R +# Load the modules to be used +source("modules/Loading/Loading.R") +source("modules/Units/Units.R") +source("modules/Anomalies/Anomalies.R") +source("modules/Indices/Indices.R") +source("modules/Skill/Skill.R") + +# Define the recipe path as the first argument from the command line +args = commandArgs(trailingOnly = TRUE) +recipe_file <- args[1] +# Read the atomic recipe +recipe <- read_atomic_recipe(recipe_file) +``` + +The rest of the user-defined script can be written in the same way as any other SUNSET script: + +```R +# Load data +data <- Loading(recipe) +# Check units and transform if needed +data <- Units(recipe, data) +# Compute tos anomalies +data <- Anomalies(recipe, data) +# Compute Niño Indices +nino_indices <- Indices(data = data, recipe = recipe) + +# We can compute the Skill metrics for each of the El Niño indices, +# specifying that the data is spatially aggregated, with the parameter +# agg = "region". +for (index in nino_indices) { + nino_skill_metrics <- Skill(recipe = recipe, data = index, agg = "region") +} +``` + +A complete, ready-to-use sample of this example script can be found in `use_cases/ex1_3_nino_indices_comparison/ex1_3-script.R`. You can execute it as-is or copy it and modify it according to your specific needs. + +## 3. Launching the jobs with the SUNSET Launcher + +The first step is to connect to the HPC machine through `ssh`. When working without Autosubmit, the SUNSET Launcher should be run directly from the HPC machine where the jobs will run (for example, Nord3v2). There is no need to request an interactive session; the launcher script can be called directly from the login node. You can obtain detailed usage information by running: + +```shell +bash launch_SUNSET.sh --help +``` + +The mandatory arguments are the paths to the recipe and the script. We can also include other optional arguments to be used by SLURM, such as the number of CPUs to request (--cpus), the wallclock time for each job (--wallclock) and other extra directives (--custom_directives). You can refer to the [Nord3v2 user guide](https://www.bsc.es/user-support/nord3v2.php#jobdirectives) and the [SLURM sbatch documentation](https://slurm.schedmd.com/sbatch.html) for more information on the available options for the parameters. + +In this case, we are giving each job a wallclock time of 1 hour and requesting exclusive usage of all the cores in one node. The shell command to run SUNSET will look like this: + +```shell +bash launch_SUNSET.sh use_cases/ex1_3_nino_indices_comparison/ex1_3-recipe.yml use_cases/ex1_3_nino_indices_comparison/ex1_3-script.R --wallclock=01:00:00 --custom_directives="--exclusive" +``` + +You can check the status of your running jobs with the `squeue` command. The SLURM logs will be inside of your code directory, in a subfolder named 'out-logs'. It can be useful to check them in case of errors. + +## 4. Results and plots + +The spatial pattern and time series plots that were requested are saved inside `plots/Indices/` in the output directory. There will be one set of plots for each El Niño index, with a descriptive file name providing information about the content of the plot, the system/reference datasets, the start date and the forecast time. 
Here are some examples of the results: + +Spatial correlation for the ensemble mean: + +![](./figures/nino34_correlation_tos_ensmean_ECMWF-SEAS5_s1101_ftime02.png) +![](./figures/nino34_correlation_tos_ensmean_Meteo-France-System7_s1101_ftime02.png) + +Time series comparison between the model and the reference dataset (ERA5): +![](./figures/nino34_ECMWF-SEAS5_ERA5_s1101_ftime02.png) +![](./figures/nino34_Meteo-France-System7_ERA5_s1101_ftime02.png) diff --git a/use_cases/ex1_3_nino_indices_comparison/ex1_3-recipe.yml b/use_cases/ex1_3_nino_indices_comparison/ex1_3-recipe.yml new file mode 100644 index 0000000000000000000000000000000000000000..a4231e6a2403201b4e9b9d9963bd8ba99d6bd52c --- /dev/null +++ b/use_cases/ex1_3_nino_indices_comparison/ex1_3-recipe.yml @@ -0,0 +1,56 @@ +Description: + Author: V. Agudetse + Info: Computing El Nino indices for ECMWF SEAS5 and MeteoFrance System7 + +Analysis: + Horizon: seasonal + Variables: + - {name: tos, freq: monthly_mean, units: K} + Datasets: + System: + - {name: ECMWF-SEAS5} + - {name: Meteo-France-System7} + Multimodel: no + Reference: + - {name: ERA5} + Time: + sdate: '1101' + fcst_year: + hcst_start: '1993' + hcst_end: '2016' + ftime_min: 2 + ftime_max: 4 + Region: + latmin: -90 + latmax: 90 + lonmin: 0 + lonmax: 359.9 + Regrid: + method: bilinear + type: to_system + Workflow: + Calibration: + method: mse_min + save: none + Anomalies: + compute: yes + cross_validation: no + save: none + Indices: + Nino1+2: {save: all, plot_ts: yes, plot_sp: yes} + Nino3: {save: all, plot_ts: yes, plot_sp: yes} + Nino3.4: {save: all, plot_ts: yes, plot_sp: yes} + Nino4: {save: all, plot_ts: yes, plot_sp: yes} + Skill: + metric: mean_bias EnsCorr rps rpss crps crpss EnsSprErr + save: 'all' + ncores: 8 + remove_NAs: yes + Output_format: S2S4E + logo: TRUE +Run: + Loglevel: INFO + Terminal: yes + output_dir: /esarchive/scratch/vagudets/auto-s2s-outputs/ # ______ + code_dir: /esarchive/scratch/vagudets/repos/auto-s2s/ # _____ + autosubmit: no diff --git a/use_cases/ex1_3_nino_indices_comparison/ex1_3-script.R b/use_cases/ex1_3_nino_indices_comparison/ex1_3-script.R new file mode 100644 index 0000000000000000000000000000000000000000..c2b0ba341a015ef4440f653a8c2638cc0fe6619f --- /dev/null +++ b/use_cases/ex1_3_nino_indices_comparison/ex1_3-script.R @@ -0,0 +1,37 @@ +############################################################################### +## Author: Núria Pérez-Zanón and Victòria Agudetse Roures +## Description: Computes the Niño1+2, Niño3, Niño3.4 and Niño4 indices and some +## skill metrics for each index. 
+## Instructions: To run it, follow the steps described in: +## use_cases/ex1_3_nino_indices_comparison/ex1_3-handson.md +############################################################################### + +# Load modules +source("modules/Loading/Loading.R") +source("modules/Units/Units.R") +source("modules/Anomalies/Anomalies.R") +source("modules/Indices/Indices.R") +source("modules/Skill/Skill.R") + +# Read recipe +args = commandArgs(trailingOnly = TRUE) +recipe_file <- args[1] +recipe <- read_atomic_recipe(recipe_file) + +# Load data +data <- Loading(recipe) +# Check units and transform if needed +data <- Units(recipe, data) +# Calibrate data +# data <- Calibration(recipe, data) +# Compute tos anomalies +data <- Anomalies(recipe, data) +# Compute Niño Indices +nino_indices <- Indices(data = data, recipe = recipe) + +# We can compute the Skill metrics for each of the El Niño indices, +# specifying that the data is spatially aggregated, with the parameter +# agg = "region". +for (index in nino_indices) { + skill_metrics <- Skill(recipe = recipe, data = index, agg = "region") +} diff --git a/use_cases/ex1_3_nino_indices_comparison/figures/nino34_ECMWF-SEAS5_ERA5_s1101_ftime02.png b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_ECMWF-SEAS5_ERA5_s1101_ftime02.png new file mode 100644 index 0000000000000000000000000000000000000000..b14e72311586239292d447c238442a9b9f2957f5 Binary files /dev/null and b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_ECMWF-SEAS5_ERA5_s1101_ftime02.png differ diff --git a/use_cases/ex1_3_nino_indices_comparison/figures/nino34_Meteo-France-System7_ERA5_s1101_ftime02.png b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_Meteo-France-System7_ERA5_s1101_ftime02.png new file mode 100644 index 0000000000000000000000000000000000000000..696de30e0a7a883bce952897b47c82223cb0d186 Binary files /dev/null and b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_Meteo-France-System7_ERA5_s1101_ftime02.png differ diff --git a/use_cases/ex1_3_nino_indices_comparison/figures/nino34_correlation_tos_ensmean_ECMWF-SEAS5_s1101_ftime02.png b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_correlation_tos_ensmean_ECMWF-SEAS5_s1101_ftime02.png new file mode 100644 index 0000000000000000000000000000000000000000..5b8af9efa6878dc51e4fea01bab2a240bf1ec2b4 Binary files /dev/null and b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_correlation_tos_ensmean_ECMWF-SEAS5_s1101_ftime02.png differ diff --git a/use_cases/ex1_3_nino_indices_comparison/figures/nino34_correlation_tos_ensmean_Meteo-France-System7_s1101_ftime02.png b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_correlation_tos_ensmean_Meteo-France-System7_s1101_ftime02.png new file mode 100644 index 0000000000000000000000000000000000000000..39cb7dcde131f31f3169d5e25ed0f5bd972c2abf Binary files /dev/null and b/use_cases/ex1_3_nino_indices_comparison/figures/nino34_correlation_tos_ensmean_Meteo-France-System7_s1101_ftime02.png differ