diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml index c7deb1af5aac18cff5d07efec0fc1b0949cabf83..200b32db2ff68c46e0aeb1c22f07d820d0caf4bf 100644 --- a/.gitlab-ci.yml +++ b/.gitlab-ci.yml @@ -1,5 +1,14 @@ stages: - build + +#workflow: +# rules: +# - if: $CI_COMMIT_TITLE =~ /-draft$/ +# when: never +# - when: always +# - if: $CI_PIPELINE_SOURCE == "merge_request_event" +# - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH + build: stage: build script: diff --git a/inst/doc/tutorial/PATC2023/griddes_system7c3s.txt b/inst/doc/tutorial/PATC2023/griddes_system7c3s.txt new file mode 100644 index 0000000000000000000000000000000000000000..b6f18478e416d212a75c004ae02c27d795bc0495 --- /dev/null +++ b/inst/doc/tutorial/PATC2023/griddes_system7c3s.txt @@ -0,0 +1,19 @@ +# Grid description file for Meteofrance System 7 (C3S) +# Serves as reference_grid for archive.ym +# +# gridID 2 +# +gridtype = lonlat +gridsize = 64800 +xsize = 360 +ysize = 180 +xname = longitude +xlongname = "longitude" +xunits = "degrees_east" +yname = latitude +ylongname = "latitude" +yunits = "degrees_north" +xfirst = 0.5 +xinc = 1 +yfirst = 89.5 +yinc = -1 diff --git a/inst/doc/tutorial/PATC2023/handson_1-data-loading.md b/inst/doc/tutorial/PATC2023/handson_1-data-loading.md new file mode 100644 index 0000000000000000000000000000000000000000..1a61862021d3e37ff4dec5427233f1cbb4b43322 --- /dev/null +++ b/inst/doc/tutorial/PATC2023/handson_1-data-loading.md @@ -0,0 +1,246 @@ +# Hands-on 1: Load data by startR + +## Goal +Use startR to load the data and learn how to adjust data structure while loading data. + +## 0. Load required packages + +```r +# Clean the session +rm(list = ls()) +# Load package +library(startR) +``` + +**Data description**: +We will use two datasets in the hands-on. The experiment data are Meteo-France System 7 from ECMWF, and the observation ones are ERA5 from ECMWF. The data have been first processed into monthly mean data and stored in our data archive (esarchive). 
+ +We're going to analyze the near-surface temperature (short name: tas) for seasonal forecast. We will focus on the Europe region (roughly 20W-40E, 20N-80N). The hindcast years are 1993 to 2016, and the forecast year is 2020. The initial month is November. To speed up the practice, we will only load the first two forecast time steps, but all the ensemble members are used to give a less biased result. + +## 1. Load experimental data from data repository + +### 1.a Hindcast data + +Understand the following script, run it, and check the result. + +```r + # Use this one if on workstation or nord3 (have access to /esarchive) + path_exp <- "/esarchive/exp/meteofrance/system7c3s/monthly_mean/$var$_f6h/$var$_$syear$.nc" + #---------------------------------------------------------------------- + # Run these two lines if you're on Marenostrum4 and log in with training account + prefix <- '/gpfs/scratch/bsc32/bsc32734/bsc_training_2023/R_handson/' + path_exp <- paste0(prefix, path_exp) + #---------------------------------------------------------------------- + + sdate_hcst <- paste0(1993:2016, '1101') + + hcst <- Start(dat = path_exp, + var = 'tas', + syear = sdate_hcst, + ensemble = 'all', + time = 1:2, + latitude = values(list(20, 80)), + latitude_reorder = Sort(), + longitude = values(list(-20, 40)), + longitude_reorder = CircularSort(-180, 180), + transform = CDORemapper, + transform_params = list(grid = 'r360x181', method = 'bilinear'), + transform_vars = c('latitude', 'longitude'), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = 'syear', + longitude = NULL, latitude = NULL), + retrieve = TRUE) +``` + +**Questions** + +(1) What are the dimensions of `hcst`? Use `dim()` to check. + +```r +dim(____) +``` + +(2) What is the structure of `hcst`? Use `str()` to check. +```r +str(hcst, max.level = _____) # try 1, 2, 3 +``` + +(3) The metadata variables are stored in `attr(hcst, 'Variables')`. 
What variables do we have? Use `str()` to check the structure first, then try to access the variable values. +```r +metadata_attr <- attr(hcst, 'Variables') +str(metadata_attr) +names(metadata_attr$common) + +hcst_time <- metadata_attr$common$time +hcst_lat <- __________ +hcst_lon <- __________ +``` + +### 1.b Forecast data + +The forecast data are from the same dataset as hindcast, but with different years. +Therefore, they share the same data path and structure. +Try to take the Start() call above and modify it to load the forecast data (hint: the start year is 2020). + +```r + sdate_fcst <- ____________ + + fcst <- Start(dat = path_exp, + var = _____, + syear = sdate_fcst, + ensemble = 'all', + time = _____, + latitude = values(list(____, ____)), + latitude_reorder = Sort(), + longitude = values(list(____, ____)), + longitude_reorder = CircularSort(-180, 180), + transform = CDORemapper, + transform_params = list(grid = _____, method = 'bilinear'), + transform_vars = c('latitude', 'longitude'), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = _____, + longitude = NULL, latitude = NULL), + retrieve = TRUE) +``` + +**Questions** + +Check the forecast data with the same methods used for the hindcast data. + +(1) What are the dimensions of `fcst`? Use `dim()` to check. + +```r +dim(____) +``` + +(2) What is the structure of `fcst`? Use `str()` to check. +```r +str(fcst, max.level = _____) # try 1, 2, 3 +``` + +(3) The metadata variables are stored in `attr(fcst, 'Variables')`. What variables do we have? Use `str()` to check the structure first, then try to access the variable values. +```r +metadata_attr <- attr(_____, 'Variables') +str(metadata_attr) +names(metadata_attr$common) + +fcst_time <- __________ +fcst_lat <- __________ +fcst_lon <- __________ +``` + +### 1.c Observational data + +We need the corresponding observational data to compare with the experimental data. 
+So, the observational data should be loaded with the same dimensions as the experimental ones. +To achieve this, we can use the metadata of the experimental data as the selectors for the observational data. But be careful with the usage! We must verify the correctness and applicability first. + +**Get the time values from hindcast data** + +Check the time attributes of `hcst`: Are they correct? + +```r +dim(attributes(hcst)$Variables$common$time) +str(attributes(hcst)$Variables$common$time) +``` + +The values are not correct since they should start from November, not December. +But the array has the correct dimensions and we can take advantage of it. +What we're going to do here is to shift the values back by one day so that each date falls in the correct month. +(p.s. `lubridate` is a useful R package for time value manipulation!) + +```r +attributes(hcst)$Variables$common$time <- attributes(hcst)$Variables$common$time - lubridate::days(1) +date_string <- format(attributes(hcst)$Variables$common$time, '%Y%m') +sdate_obs <- array(date_string, dim = c(syear = 24, time = 2)) +print(sdate_obs) +``` + +Now that we have the correct date values, we can use them as the selectors of `syear` in the Start() call. In addition, we will use the reshaping feature in startR to get the desired dimensions. + +If the selector is an array, the parameter `split_multiselected_dims` of Start() splits the array by dimensions and we will get those dimensions in the output. +For example, we will use `sdate_obs` as the selector of the "syear" dimension below. +`sdate_obs` has two dimensions, "syear" and "time"; +so, with `split_multiselected_dims`, the output `obs` will have these two dimensions, +even though "time" is not explicitly specified in the Start() call. 
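The date adjustment and the array selector described above can be tried without access to the data archive. The following is a minimal sketch in base R only, assuming a hypothetical stand-in (`time_wrong`, three start years instead of 24) for the time metadata that Start() returns; it is not the tutorial's actual data.

```r
# Hypothetical stand-in for the hindcast time metadata: two forecast months
# for three start years, each stamped on the first day of the following month.
years <- 1993:1995
time_wrong <- as.POSIXct(c(paste0(years, "-12-01"),
                           paste0(years + 1, "-01-01")), tz = "UTC")

# Step back one day so that every date falls inside the month it describes.
time_fixed <- time_wrong - as.difftime(1, units = "days")

# Keep only year and month, then restore the (syear, time) layout.
date_string <- format(time_fixed, "%Y%m", tz = "UTC")
sdate_obs <- array(date_string, dim = c(syear = 3, time = 2))

print(sdate_obs)
print(dim(sdate_obs))
```

An array built this way carries both dimension names, which is what allows `split_multiselected_dims = TRUE` to return a "time" dimension that is never named in the Start() call.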
+ +```r + path_obs <- '/esarchive/recon/ecmwf/era5/monthly_mean/$var$_f1h-r1440x721cds/$var$_$syear$.nc' + #---------------------------------------------------------------------- + # Run these two lines if you're on Marenostrum4 and log in with training account + prefix <- '/gpfs/scratch/bsc32/bsc32734/bsc_training_2023/R_handson/' + path_obs <- paste0(prefix, path_obs) + #---------------------------------------------------------------------- + + obs <- Start(dat = path_obs, + var = _____, + syear = sdate_obs, + split_multiselected_dims = TRUE, + latitude = values(list(_____, _____)), + latitude_reorder = Sort(), + longitude = values(list(_____, _____)), + longitude_reorder = CircularSort(-180, 180), + transform = CDORemapper, + transform_params = list(grid = ______, method = 'bilinear'), + transform_vars = c('latitude', 'longitude'), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = ______, + longitude = NULL, latitude = NULL), + retrieve = TRUE) +``` + +**Questions** + +Check the observational data with the same methods as above. + +(1) What are the dimensions of `obs`? Use `dim()` to check. + +```r +dim(____) +``` + +(2) What is the structure of `obs`? Use `str()` to check. +```r +str(obs, max.level = ____) # try 1, 2, 3 +``` + +(3) The metadata variables are stored in `attr(obs, 'Variables')`. What variables do we have? Use `str()` to check the structure first, then try to access the variable values. +```r +metadata_attr <- attr(____, 'Variables') +str(metadata_attr) +names(metadata_attr$common) + +obs_time <- __________ +obs_lat <- __________ +obs_lon <- __________ +``` + + +## 2. Check if the datasets are consistent + +Wrong data, wrong everything afterward. It is important to examine the data and metadata after we load them. + +(1) Compare the dimensions of the three datasets with `dim()`. +```r + +``` +(2) Check the summary of the data by `summary()`. 
+```r +summary(hcst) +summary(fcst) +summary(obs) +``` + +(3) Compare metadata. We have saved the latitude, longitude, and time attributes above after loading each data. +Use `identical()` or `all.equal()` to check if the values are consistent. +```r +# lat and lon +identical(____, ____) +all.equal(____, ____) + +# time: only compare year and month +identical(format(hcst_time, '%Y%m'), format(obs_time, '%Y%m')) +``` diff --git a/inst/doc/tutorial/PATC2023/handson_1-data-loading_ans.md b/inst/doc/tutorial/PATC2023/handson_1-data-loading_ans.md new file mode 100644 index 0000000000000000000000000000000000000000..d0d4b07697449908e4f18c49dc5da1b8df059d52 --- /dev/null +++ b/inst/doc/tutorial/PATC2023/handson_1-data-loading_ans.md @@ -0,0 +1,269 @@ +# Hands-on 1: Load data by startR + +## Goal +Use startR to load the data and learn how to adjust data structure while loading data. + +## 0. Load required packages + +```r +# Clean the session +rm(list = ls()) +# Load package +library(startR) +``` + +**Data description**: +We will use two datasets in the hands-on. The experiment data are Meteo-France System 7 from ECMWF, and the observation ones are ERA5 from ECMWF. The data have been first processed into monthly mean data and stored in our data archive (esarchive). + +We're going to analyze the near-surface temperature (short name: tas) for seasonal forecast. We will focus on the Europe region (roughly 20W-40E, 20N-80N). The hindcast years are 1993 to 2016, and the forecast year is 2020. The initial month is November. To speed up the practice, we will only load the first two forecast time steps, but all the ensemble members are used to give a less biased result. + +## 1. Load experimental data from data repository + +### 1.a Hindcast data + +Understand the following script, run it, and check the result. 
+ +```r + # Use this one if on workstation or nord3 (have access to /esarchive) + path_exp <- "/esarchive/exp/meteofrance/system7c3s/monthly_mean/$var$_f6h/$var$_$syear$.nc" + #---------------------------------------------------------------------- + # Run these two lines if you're on Marenostrum4 and log in with training account + prefix <- '/gpfs/scratch/bsc32/bsc32734/bsc_training_2023/R_handson/' + path_exp <- paste0(prefix, path_exp) + #---------------------------------------------------------------------- + + sdate_hcst <- paste0(1993:2016, '1101') + + hcst <- Start(dat = path_exp, + var = 'tas', + syear = sdate_hcst, + ensemble = 'all', + time = 1:2, + latitude = values(list(20, 80)), + latitude_reorder = Sort(), + longitude = values(list(-20, 40)), + longitude_reorder = CircularSort(-180, 180), + transform = CDORemapper, + transform_params = list(grid = 'r360x181', method = 'bilinear'), + transform_vars = c('latitude', 'longitude'), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = 'syear', + longitude = NULL, latitude = NULL), + retrieve = TRUE) +``` + +**Questions** + +(1) What are the dimensions of `hcst`? Use `dim()` to check. + +```r +dim(hcst) +# dat var syear ensemble time latitude longitude +# 1 1 24 25 2 61 61 +``` + +(2) What is the structure of `hcst`? Use `str()` to check. +```r +str(hcst, max.level = 1) +str(hcst, max.level = 2) +str(hcst, max.level = 3) +``` + +(3) The metadata variables are stored in `attr(hcst, 'Variables')`. What variables do we have? Use `str()` to check the structure first, then try to access the variable values. +```r +metadata_attr <- attr(hcst, 'Variables') +str(metadata_attr) +names(metadata_attr$common) + +hcst_time <- metadata_attr$common$time +hcst_lat <- metadata_attr$common$latitude +hcst_lon <- metadata_attr$common$longitude +``` + +### 1.b Forecast data + +The forecast data are from the same dataset as hindcast, but with different years. 
+Therefore, they share the same data path and structure. +Try to take the Start() call above and modify it to load the forecast data (hint: the start year is 2020). + +```r + sdate_fcst <- '20201101' + + fcst <- Start(dat = path_exp, + var = 'tas', + syear = sdate_fcst, + ensemble = 'all', + time = 1:2, + latitude = values(list(20, 80)), + latitude_reorder = Sort(), + longitude = values(list(-20, 40)), + longitude_reorder = CircularSort(-180, 180), + transform = CDORemapper, + transform_params = list(grid = 'r360x181', method = 'bilinear'), + transform_vars = c('latitude', 'longitude'), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = 'syear', + longitude = NULL, latitude = NULL), + retrieve = TRUE) +``` + +**Questions** + +Check the forecast data with the same methods used for the hindcast data. + +(1) What are the dimensions of `fcst`? Use `dim()` to check. + +```r +dim(fcst) +# dat var syear ensemble time latitude longitude +# 1 1 1 51 2 61 61 +``` + +(2) What is the structure of `fcst`? Use `str()` to check. +```r +str(fcst, max.level = 1) +str(fcst, max.level = 2) +str(fcst, max.level = 3) +``` + +(3) The metadata variables are stored in `attr(fcst, 'Variables')`. What variables do we have? Use `str()` to check the structure first, then try to access the variable values. +```r +metadata_attr <- attr(fcst, 'Variables') +str(metadata_attr) +names(metadata_attr$common) + +fcst_time <- metadata_attr$common$time +fcst_lat <- metadata_attr$common$latitude +fcst_lon <- metadata_attr$common$longitude +``` + +### 1.c Observational data + +We need the corresponding observational data to compare with the experimental data. +So, the observational data should be loaded with the same dimensions as the experimental ones. +To achieve this, we can use the metadata of the experimental data as the selectors for the observational data. But be careful with the usage! We must verify the correctness and applicability first. 
+ +**Get the time values from hindcast data** + +Check the time attributes of `hcst`: Are they correct? + +```r +dim(attributes(hcst)$Variables$common$time) +#syear time +# 24 2 + +str(attributes(hcst)$Variables$common$time) +# POSIXct[1:48], format: "1993-12-01" "1994-12-01" "1995-12-01" "1996-12-01" "1997-12-01" ... +``` + +The values are not correct since they should start from November, not December. +But the array has the correct dimensions and we can take advantage of it. +What we're going to do here is to shift the values back by one day so that each date falls in the correct month. +(p.s. `lubridate` is a useful R package for time value manipulation!) + +```r +attributes(hcst)$Variables$common$time <- attributes(hcst)$Variables$common$time - lubridate::days(1) +date_string <- format(attributes(hcst)$Variables$common$time, '%Y%m') +sdate_obs <- array(date_string, dim = c(syear = 24, time = 2)) +print(sdate_obs) +``` + +Now that we have the correct date values, we can use them as the selectors of `syear` in the Start() call. In addition, we will use the reshaping feature in startR to get the desired dimensions. + +If the selector is an array, the parameter `split_multiselected_dims` of Start() splits the array by dimensions and we will get those dimensions in the output. +For example, we will use `sdate_obs` as the selector of the "syear" dimension below. +`sdate_obs` has two dimensions, "syear" and "time"; +so, with `split_multiselected_dims`, the output `obs` will have these two dimensions, +even though "time" is not explicitly specified in the Start() call. 
+ +```r + path_obs <- '/esarchive/recon/ecmwf/era5/monthly_mean/$var$_f1h-r1440x721cds/$var$_$syear$.nc' + #---------------------------------------------------------------------- + # Run these two lines if you're on Marenostrum4 and log in with training account + prefix <- '/gpfs/scratch/bsc32/bsc32734/bsc_training_2023/R_handson/' + path_obs <- paste0(prefix, path_obs) + #---------------------------------------------------------------------- + + obs <- Start(dat = path_obs, + var = 'tas', + syear = sdate_obs, + split_multiselected_dims = TRUE, + latitude = values(list(20, 80)), + latitude_reorder = Sort(), + longitude = values(list(-20, 40)), + longitude_reorder = CircularSort(-180, 180), + transform = CDORemapper, + transform_params = list(grid = 'r360x181', method = 'bilinear'), + transform_vars = c('latitude', 'longitude'), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = 'syear', + longitude = NULL, latitude = NULL), + retrieve = TRUE) +``` + +**Questions** + +Check the observational data with the same methods as above. + +(1) What are the dimensions of `obs`? Use `dim()` to check. + +```r +dim(obs) +# dat var syear time latitude longitude +# 1 1 24 2 61 61 +``` + +(2) What is the structure of `obs`? Use `str()` to check. +```r +str(obs, max.level = 1) +str(obs, max.level = 2) +str(obs, max.level = 3) +``` + +(3) The metadata variables are stored in `attr(obs, 'Variables')`. What variables do we have? Use `str()` to check the structure first, then try to access the variable values. +```r +metadata_attr <- attr(obs, 'Variables') +str(metadata_attr) +names(metadata_attr$common) + +obs_time <- metadata_attr$common$time +obs_lat <- metadata_attr$common$latitude +obs_lon <- metadata_attr$common$longitude +``` + + +## 2. Check if the datasets are consistent + +Wrong data, wrong everything afterward. It is important to examine the data and metadata after we load them. 
+ +(1) Compare the dimensions of the three datasets with `dim()`. +```r +dim(hcst) +dim(fcst) +dim(obs) +``` +(2) Check the summary of the data by `summary()`. +```r +summary(hcst) +summary(fcst) +summary(obs) +``` + +(3) Compare metadata. We have saved the latitude, longitude, and time attributes above after loading each dataset. +Use `identical()` or `all.equal()` to check if the values are consistent. +```r +identical(obs_lat, hcst_lat) +# [1] TRUE +identical(obs_lon, hcst_lon) +# [1] TRUE +identical(fcst_lat, hcst_lat) +# [1] TRUE +identical(fcst_lon, hcst_lon) +# [1] TRUE + +identical(format(hcst_time, '%Y%m'), format(obs_time, '%Y%m')) +# [1] TRUE +```
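The two comparison functions named in step (3) behave differently when coordinates differ only by floating-point round-off. The sketch below illustrates this with hypothetical latitude vectors (`lat_a`, `lat_b`) that are not taken from the tutorial data.

```r
# Hypothetical coordinate vectors, for illustration only.
lat_a <- seq(20, 80, by = 1)
lat_b <- lat_a + 1e-12   # tiny round-off difference

# identical() requires exactly the same values.
exact_match <- identical(lat_a, lat_b)

# all.equal() compares within a numerical tolerance. It returns TRUE or a
# character description of the difference, so wrap it in isTRUE().
close_match <- isTRUE(all.equal(lat_a, lat_b))

print(exact_match)
print(close_match)
```

If `identical()` reports a mismatch on coordinates that look the same when printed, checking with `all.equal()` is a reasonable next step before concluding that the grids differ.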