diff --git a/.Rbuildignore b/.Rbuildignore index 90018c782d3ac1075895bbfa778c1248ae5259dd..2ef8ba9063900318c7c0be04cc2ac48a636842e4 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -6,16 +6,16 @@ ^README\.md$ #\..*\.RData$ #^vignettes$ -#^tests$ ^inst/doc$ -#^inst/doc/*$ -#^inst/doc/figures/$ -#^inst/doc/usecase/$ -#^inst/PlotProfiling\.R$ -.gitlab-ci.yml +^\.gitlab-ci\.yml$ +## unit tests should be ignored when building the package for CRAN +^tests$ +^inst/PlotProfiling\.R$ + # Suggested by http://r-pkgs.had.co.nz/package.html ^.*\.Rproj$ # Automatically added by RStudio, ^\.Rproj\.user$ # used for temporary files. ^README\.Rmd$ # An Rmarkdown file used to generate README.md ^cran-comments\.md$ # Comments for CRAN submission -^NEWS\.md$ # A news file written in Markdown +#^NEWS\.md$ # A news file written in Markdown diff --git a/inst/doc/faq.md b/inst/doc/faq.md index 3a065e0612b82b3c92b8de2d79e357067b22cc57..6c6a11959b6c69afe38a5260a935e62281635239 100644 --- a/inst/doc/faq.md +++ b/inst/doc/faq.md @@ -29,6 +29,7 @@ This document intends to be the first reference for any doubts that you may have 23. [The best practice of using vector and list for selectors](#23-the-best-practice-of-using-vector-and-list-for-selectors) 24. [Do both interpolation and chunking on spatial dimensions](#24-do-both-interpolation-and-chunking-on-spatial-dimensions) 25. [What to do if your function has too many target dimensions](#25-what-to-do-if-your-function-has-too-many-target-dimensions) + 26. [Use merge_across_dims_narm to remove NAs](#26-use-merge-across-dims-narm-to-remove-nas) 2. **Something goes wrong...** @@ -83,12 +84,12 @@ all the possibilities and make the output data structure reasonable all the time Therefore, it is recommended to understand the way Start() rolls first, then you know what you should expect from the output and will not get confused with what it returns to you. -If you want to connet xxx along yyy, the parameter 'merge_across_dims' and 'merge_across_dims_narm' can help you achieve it. -See Example 1. If 'merge_across_dims = TRUE', the chunk dimension will disappear. -'merge_across_dims' simply attaches data one after another, so the NA values (if exist) will be the same places as the unmerged one (see Example 2). +Now that we understand the cross relationship between dimensions, we can talk about how to merge them: use the parameters `merge_across_dims` and `merge_across_dims_narm`. +See Example 1. If `merge_across_dims = TRUE`, the chunk dimension will disappear. +`merge_across_dims` simply attaches data one after another, so the NA values (if they exist) will be in the same places as in the unmerged data (see Example 2). -If you want to remove those additional NAs, you can use 'merge_across_dims_narm = TRUE', -then the NAs will be removed when merging into one dimension. (see Example 2). +If you want to remove those additional NAs, you can use `merge_across_dims_narm = TRUE`, +then the NAs will be removed when merging into one dimension (see Example 2). To know more about `merge_across_dims_narm`, check [How-to-26](#26-use-merge-across-dims-narm-to-remove-nas). You can find more use cases at [ex1_2_exp_obs_attr.R](inst/doc/usecase/ex1_2_exp_obs_attr.R) and [ex1_3_attr_loadin.R](inst/doc/usecase/ex1_3_attr_loadin.R). @@ -464,7 +465,7 @@ data <- Start(dat = repos, If you want to interpolate data by s2dv::CDORemap in function, you need to tell the machine which CDO module to use. Therefore, `CDO_module = 'CDO/1.9.5-foss-2018b'` should be -added in Compute() cluster list.
See the example in usecase [ex2_3_cdo.R](inst/doc/usecase/ex2_3_cdo.R). +added in Compute() cluster list. See the example in usecase [ex2_3](inst/doc/usecase/ex2_3_cdo.R). ### 10. The number of members depends on the start date @@ -476,9 +477,9 @@ When trying to load both start dates at once using Start(), the order in which t - `sdates = c('19991101', '19990901')`, the member dimension will be of length 51, showing missing values for the members 26 to 51 in the second start date; - `sdates = c('19990901', '19991101')`, the member dimension will be of length 25, any member will be missing. -To ensure that all the members are retrieved, we can use parameter 'largest_dims_length'. See [FAQ 21](https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/faq.md#21-retrieve-the-complete-data-when-the-dimension-length-varies-among-files) for details. +To ensure that all the members are retrieved, we can use parameter `largest_dims_length`. See [FAQ 21](https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/faq.md#21-retrieve-the-complete-data-when-the-dimension-length-varies-among-files) for details. -The code to reproduce this behaviour could be found in the Use Cases section, [example 1.4](/inst/doc/usecase/ex1_4_variable_nmember.R). +The code to reproduce this behaviour could be found in the usecase [ex1_4](/inst/doc/usecase/ex1_4_variable_nmember.R). ### 11. Select the longitude/latitude region @@ -851,18 +852,18 @@ same. For example, the member number in one experiment is 25 in the early years increase to 51 later. If you assign `member = 'all'` in Start() call, the returned member dimension length will be 25 only. -The parameter 'largest_dims_length' is for this case. Its default value is `FALSE`, meaning +The parameter `largest_dims_length` is for this case. Its default value is `FALSE`, meaning that Start() can only use the first valid file to decide the dimensions. If it is changed to `TRUE`, Start() will examine all the required files to find the largest length for all the inner dimensions. It is time- and resource-consuming, but useful when you are not sure how the dimensions in all the files look like. -If you know the expected dimension length, it is recommended to assign 'largest_dims_length' +If you know the expected dimension length, it is recommended to assign `largest_dims_length` by a named integer vector, for example, `largest_dims_length = c(member = 51)`. Start() will adopt the provided ones and use the first valid file to decide the rest of dimensions. By this means, the efficiency can be similar to `largest_dims_length = FALSE`. -Find example in [use case ex1_4](/inst/doc/usecase/ex1_4_variable_nmember.R). +Find example in use case [ex1_4](/inst/doc/usecase/ex1_4_variable_nmember.R). ### 22. Define the selector when the indices in the files are not aligned @@ -978,6 +979,20 @@ We provide some [use cases](inst/doc/usecase/ex2_12_transform_and_chunk.R) showi Unfortunately, we don't have a perfect solution now before we have multiple steps feature. Talk to maintainers to see how to generate a workaround for your case. +### 26. Use merge_across_dims_narm to remove NAs +The Start() parameter `merge_across_dims_narm` can be useful when you want to merge two dimensions together (e.g., time across chunk.) If you're not familiar with the usage of `xxx_across = yyy` and `merge_across_dims` yet, check [How-to-2](#2-indicate-dependent-dimension-and-use-merge-parameters-in-start) first. 
+ +The first thing to notice is that `merge_across_dims_narm` can only remove **the NAs that are created by Start()** during the reshaping process. +It doesn't remove the NAs in the original data. For example, in Example 2 in How-to-2, the NAs are removed because those NAs are added by Start(). + +Second, if the files don't share the same length of the merged dimension, you need to use `largest_dims_length = TRUE` as well. +This parameter tells Start() that it needs to look into each file to know the dimension length. By doing this, Start() knows that the NAs in the files with a shorter dimension are added by Start() itself, so `merge_across_dims_narm = TRUE` can remove those NAs correctly. +A typical example is reading daily data and merging the time dimension across files. The 30-day months will have one NA at the end of the time dimension if `merge_across_dims_narm = TRUE` and `largest_dims_length = TRUE` are not used. +Check usecase [ex1_16](/inst/doc/usecase/ex1_16_files_different_time_dim_length.R) for the example script. +See [How-to-21](#21-retrieve-the-complete-data-when-the-dimension-length-varies-among-files) for more details of `largest_dims_length`. + + + # Something goes wrong... ### 1. No space left on device diff --git a/inst/doc/tutorial/PATC2022/Figures/.gitkeep b/inst/doc/tutorial/PATC2022/Figures/.gitkeep new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/inst/doc/tutorial/PATC2022/Figures/handson_4_fig_rmsss.png b/inst/doc/tutorial/PATC2022/Figures/handson_4_fig_rmsss.png new file mode 100644 index 0000000000000000000000000000000000000000..1633a1c01aee042a5fb82e3277250338d81e4893 Binary files /dev/null and b/inst/doc/tutorial/PATC2022/Figures/handson_4_fig_rmsss.png differ diff --git a/inst/doc/tutorial/PATC2022/Figures/handson_4_fig_rpss.png b/inst/doc/tutorial/PATC2022/Figures/handson_4_fig_rpss.png new file mode 100644 index 0000000000000000000000000000000000000000..6277483b3b0ece5a2d54db69ecd897b90e1e45aa Binary files /dev/null and b/inst/doc/tutorial/PATC2022/Figures/handson_4_fig_rpss.png differ diff --git a/inst/doc/tutorial/PATC2022/Figures/handson_5_fig1.png b/inst/doc/tutorial/PATC2022/Figures/handson_5_fig1.png new file mode 100644 index 0000000000000000000000000000000000000000..30be4766de563f11be0a786400997907abe5c97b Binary files /dev/null and b/inst/doc/tutorial/PATC2022/Figures/handson_5_fig1.png differ diff --git a/inst/doc/tutorial/PATC2022/Figures/handson_5_fig2.png b/inst/doc/tutorial/PATC2022/Figures/handson_5_fig2.png new file mode 100644 index 0000000000000000000000000000000000000000..1db8c76efb51d65fe0e1002a41bbbf792e1f4860 Binary files /dev/null and b/inst/doc/tutorial/PATC2022/Figures/handson_5_fig2.png differ diff --git a/inst/doc/tutorial/PATC2022/Figures/handson_5_fig3.png b/inst/doc/tutorial/PATC2022/Figures/handson_5_fig3.png new file mode 100644 index 0000000000000000000000000000000000000000..1d1aba9fd14f6f52b9a73c674360506d1d930079 Binary files /dev/null and b/inst/doc/tutorial/PATC2022/Figures/handson_5_fig3.png differ diff --git a/inst/doc/tutorial/PATC2022/Figures/handson_5_fig4.png b/inst/doc/tutorial/PATC2022/Figures/handson_5_fig4.png new file mode 100644 index 0000000000000000000000000000000000000000..00e1730deaa35a0e137dfe0ce91dcd9882c47e6e Binary files /dev/null and b/inst/doc/tutorial/PATC2022/Figures/handson_5_fig4.png differ diff --git a/inst/doc/tutorial/PATC2022/handson_3-rainfarm.md b/inst/doc/tutorial/PATC2022/handson_3-rainfarm.md new file mode 100644 index
0000000000000000000000000000000000000000..a7098b5e291be30dab46329ca4be50b2f9e01e92 --- /dev/null +++ b/inst/doc/tutorial/PATC2022/handson_3-rainfarm.md @@ -0,0 +1,131 @@ +# Hands-on 3: Load data by startR + +## Goal +Use startR to load the data used in CSTools [RainFARM vignette](https://earth.bsc.es/gitlab/external/cstools/-/blob/master/vignettes/RainFARM_vignette.Rmd). Learn how to adjust the data while loading it. + +## 0. Load required packages + +```r +# Clean the session +rm(list = ls()) + +library(startR) +library(CSTools) +``` + +## 1. Load data from data repository (esarchive/) + +**Data description**: +This sample data set contains a small cutout of gridded seasonal precipitation +forecast data from the Copernicus Climate Change ECMWF-System 5 forecast system. +Specifically, for the 'prlr' (precipitation) variable, for the first 6 forecast +ensemble members, daily values, for all 31 days in March following the forecast +starting dates in November of years 2010 to 2012, for a small 4x4 pixel cutout in +a region in the North-Western Italian Alps (44N-47N, 6E-9E). The data resolution is 1 degree. + +Use the above information to define the variable, start dates, longitude and latitude. + +```r + # Use this one if on workstation or nord3 (have access to /esarchive) + repos <- "/esarchive/exp/ecmwf/system5c3s/daily_mean/$var$_s0-24h/$var$_$sdate$.nc" + # Use this one if on Marenostrum4 and log in with PATC2021 account + repos <- paste0('/gpfs/scratch/nct01/nct01127/d3_R_handson/esarchive/', + 'exp/ecmwf/system5c3s/daily_mean/', + '$var$_s0-24h/$var$_$sdate$.nc') + + var <- ______ + sdate <- paste0(______, '1101') + lon.min <- ______ + lon.max <- ______ + lat.min <- ______ + lat.max <- ______ +``` + +Calculate the forecast time steps with the package "lubridate". Desired days: the whole March (leap year ignored). +Check the usage of the following functions by typing `?` or checking the official website https://lubridate.tidyverse.org/reference/. + +```r + library(lubridate) + ## Check which year is leap year + leap_year(2011:2013) + ## Find the first time step we want + ftime_s <- as.integer(ymd("2011-03-01", tz = "UTC") - ymd("2010-11-01", tz = "UTC")) + 1 + ftime <- ftime_s:(ftime_s + 30) + print(ftime) +``` + +Use Start() to load the data. + +```r + data <- Start(dat = repos, + var = var, + sdate = sdate, + ensemble = indices(______), + time = ftime, + latitude = values(list(lat.min, lat.max)), + latitude_reorder = Sort(), + longitude = values(list(lon.min, lon.max)), + longitude_reorder = CircularSort(0, 360), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = 'sdate', + longitude = NULL, latitude = NULL), + retrieve = TRUE) +``` + + +## 2. Compare with CSTools lonlat_prec + +Load CSTools data `lonlat_prec`. + +```r + stored_data <- CSTools::lonlat_prec +``` + +### Dimensions + +```r + dim(data) + dim(stored_data$data) +``` + +### Dates + +```r + data_time_attr <- attr(data, 'Variables')$common$time + dim(data_time_attr) + dim(stored_data$Dates$start) + str(stored_data$Dates$start) +``` + +### Latitude and Longitude + +```r + data_lat_attr <- attr(data, 'Variables')$common$latitude + as.vector(data_lat_attr) + as.vector(stored_data$lat) + + data_lon_attr <- attr(data, 'Variables')$common$longitude + as.vector(data_lon_attr) + as.vector(stored_data$lon) +```
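+ +A quick extra check (a minimal sketch): compare an order-insensitive statistic of the two objects, since the overall means do not depend on the dimension order or on the latitude sorting. If the loaded values match, the two numbers should agree. + +```r + mean(data, na.rm = TRUE) + mean(stored_data$data, na.rm = TRUE) +```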
+ +## 3. Adjust Start() call to get a more consistent result with lonlat_prec + +You should already see that the output of Start() `data` does have the same values +as `lonlat_prec`, but there are some differences. For example, the "member" and "time" +dimensions have different names; the latitude vector order is different. + +1. We can use the parameter `synonims` to change the dimension names freely. +In Start(), change `ensemble` to `member` and `time` to `ftime`, and define their synonims. + +2. Sort() in startR is a wrapper of the base function sort(). You can find it in the Start() +call above, `latitude_reorder = Sort()`. Type `?sort` in the R console to find out how to change +latitude to descending order. + +3. To switch the `member` and `sdate` dimension order, simply switch their order in the Start() call. + +```r +# Write a new Start() call based on the previous script + data <- Start(...) +``` diff --git a/inst/doc/tutorial/PATC2022/handson_3-rainfarm_ans.md b/inst/doc/tutorial/PATC2022/handson_3-rainfarm_ans.md new file mode 100644 index 0000000000000000000000000000000000000000..23c3ea804a550e49b35d2e1a87f14faebc0f4462 --- /dev/null +++ b/inst/doc/tutorial/PATC2022/handson_3-rainfarm_ans.md @@ -0,0 +1,176 @@ +# Hands-on 3: Load data by startR + +## Goal +Use startR to load the data used in CSTools [RainFARM vignette](https://earth.bsc.es/gitlab/external/cstools/-/blob/master/vignettes/RainFARM_vignette.Rmd). Learn how to adjust the data while loading it. + +## 0. Load required packages + +```r +# Clean the session +rm(list = ls()) + +library(startR) +library(CSTools) +``` + +## 1. Load data from data repository (esarchive/) + +**Data description**: +This sample data set contains a small cutout of gridded seasonal precipitation +forecast data from the Copernicus Climate Change ECMWF-System 5 forecast system. +Specifically, for the 'prlr' (precipitation) variable, for the first 6 forecast +ensemble members, daily values, for all 31 days in March following the forecast +starting dates in November of years 2010 to 2012, for a small 4x4 pixel cutout in +a region in the North-Western Italian Alps (44N-47N, 6E-9E). The data resolution is 1 degree. + +Use the above information to define the variable, start dates, longitude and latitude. + +```r + # Use this one if on workstation or nord3 (have access to /esarchive) + repos <- "/esarchive/exp/ecmwf/system5c3s/daily_mean/$var$_s0-24h/$var$_$sdate$.nc" + # Use this one if on Marenostrum4 and log in with PATC2021 account + repos <- paste0('/gpfs/scratch/nct01/nct01127/d3_R_handson/esarchive/', + 'exp/ecmwf/system5c3s/daily_mean/', + '$var$_s0-24h/$var$_$sdate$.nc') + + var <- 'prlr' + sdate <- paste0(2010:2012, '1101') + lon.min <- 6 + lon.max <- 9 + lat.min <- 44 + lat.max <- 47 +``` + +Calculate the forecast time steps with the package "lubridate". Desired days: the whole March (leap year ignored). +Check the usage of the following functions by typing `?` or checking the official website https://lubridate.tidyverse.org/reference/. + +```r + library(lubridate) + ## Check which year is leap year + leap_year(2011:2013) + ## Find the first time step we want + ftime_s <- as.integer(ymd("2011-03-01", tz = "UTC") - ymd("2010-11-01", tz = "UTC")) + 1 + ftime <- ftime_s:(ftime_s + 30) + print(ftime) +# [1] 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 +#[20] 140 141 142 143 144 145 146 147 148 149 150 151 + +``` + +Use Start() to load the data.
+ +```r + data <- Start(dat = repos, + var = var, + sdate = sdate, + ensemble = indices(1:6), + time = ftime, + latitude = values(list(lat.min, lat.max)), + latitude_reorder = Sort(), + longitude = values(list(lon.min, lon.max)), + longitude_reorder = CircularSort(0, 360), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = 'sdate', + longitude = NULL, latitude = NULL), + retrieve = TRUE) +``` + + +## 2. Compare with CSTools lonlat_prec + +Load CSTools data `lonlat_prec`. + +```r + stored_data <- CSTools::lonlat_prec +``` + +### Dimensions + +```r + dim(data) +# dat var sdate ensemble time latitude longitude +# 1 1 3 6 31 4 4 + + dim(stored_data$data) +#dataset member sdate ftime lat lon +# 1 6 3 31 4 4 +``` + +### Dates + +```r + data_time_attr <- attr(data, 'Variables')$common$time + dim(data_time_attr) +#sdate time +# 3 31 + + dim(stored_data$Dates$start) +#NULL + + str(stored_data$Dates$start) +# POSIXct[1:93], format: "2011-03-01" "2011-03-02" "2011-03-03" "2011-03-04" "2011-03-05" ... + +``` + +### Latitude and Longitude + +```r + data_lat_attr <- attr(data, 'Variables')$common$latitude + as.vector(data_lat_attr) +#[1] 44 45 46 47 + + as.vector(stored_data$lat) +#[1] 47 46 45 44 + + data_lon_attr <- attr(data, 'Variables')$common$longitude + as.vector(data_lon_attr) +#[1] 6 7 8 9 + + as.vector(stored_data$lon) +#[1] 6 7 8 9 + +``` + +## 3. Adjust Start() call to get a more consistent result with lonlat_prec + +You should already see that the output of Start() `data` does have the same values +as `lonlat_prec`, but there are some differences. For example, the "member" and "time" +dimensions have different names; the dimension orders are different; the latitude vector order is different. + +1. We can use the parameter `synonims` to change the dimension names freely. +In Start(), change `ensemble` to `member` and `time` to `ftime`, and define their synonims. + +2. Sort() in startR is a wrapper of the base function sort(). You can find it in the Start() +call above, `latitude_reorder = Sort()`. Type `?sort` in the R console to find out how to change +latitude to descending order. + +3. To switch the `member` and `sdate` dimension order, simply switch their order in the Start() call.
+ +```r + data <- Start(dat = repos, + var = var, + member = indices(1:6), + sdate = sdate, + ftime = ftime, + latitude = values(list(lat.min, lat.max)), + latitude_reorder = Sort(decreasing = TRUE), + longitude = values(list(lon.min, lon.max)), + longitude_reorder = CircularSort(0, 360), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude'), + member = c('member', 'ensemble'), + ftime = c('ftime', 'time')), + return_vars = list(ftime = 'sdate', + longitude = NULL, latitude = NULL), + retrieve = TRUE) + + dim(data) +# dat var member sdate ftime latitude longitude +# 1 1 6 3 31 4 4 + + data_lat_attr <- attr(data, 'Variables')$common$latitude + as.vector(data_lat_attr) +#[1] 47 46 45 44 + +``` diff --git a/inst/doc/tutorial/PATC2022/handson_4-skill_workflow.md b/inst/doc/tutorial/PATC2022/handson_4-skill_workflow.md new file mode 100644 index 0000000000000000000000000000000000000000..80a6c54507a5afe58d824184e84b32d74dff5147 --- /dev/null +++ b/inst/doc/tutorial/PATC2022/handson_4-skill_workflow.md @@ -0,0 +1,192 @@ +# Hands-on 4: Seasonal forecast verification + +## Goal +Learn how to use the startR workflow to finish a piece of analysis, including defining and pre-processing the desired data, +defining the operation, building the workflow, and executing the operation. + +In this use case, the ranked probability skill score (RPSS) and the root mean square error skill score (RMSSS) are computed to verify the forecast. +To make the process faster, the required data size is small here so we can run the execution on a workstation. + +## 0. Load required packages + +```r +# Clean the session +rm(list = ls()) + +library(startR) +library(s2dv) +``` + +## 1. Define data +The first step of the analysis is to use Start() to identify the data needed. + +Hints: +(1) The data paths enclose each wildcard between two '$', and the wildcards can be given any names you like (see the sketch below). +(2) The variable we want to use is 'tas'. +(3) The required time (sdate) period is November 1991 to 2011. +(4) Read global data (i.e., full spatial grids.) +(5) Take all the ensemble members. +(6) Because we are going to use the whole startR workflow, we only need to create a pointer to the data repository rather than retrieving it to the workstation.
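+ +As a quick illustration of hint (1), with made-up names (a schematic sketch, not part of the exercise): the name between two '$' in the path template is free, but it must match the name of the corresponding parameter in the Start() call. + +```r +# '$variable$' and '$file_date$' are illustrative names chosen here; they become +# the 'variable' and 'file_date' arguments of Start(). +path <- '/some/repo/$variable$_$file_date$.nc' +# Start(dat = path, variable = 'tas', file_date = '19911101', ..., retrieve = FALSE) +```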
+ +```r +# exp data + # Use this one if on workstation or nord3 (have access to /esarchive) + repos_exp <- paste0('/esarchive/exp/ecmwf/system5c3s/monthly_mean/', + '$var$_f6h/$var$_$sdate$.nc') + # Use this one if on Marenostrum4 and log in with PATC2021 account + repos_exp <- paste0('/gpfs/scratch/nct01/nct01127/d3_R_handson/esarchive/', + 'exp/ecmwf/system5c3s/monthly_mean/', + '$var$_f6h/$var$_$sdate$.nc') + + sdates_exp <- sapply(1991:2011, function(x) paste0(x, '1101')) + + exp <- Start(dat = repos_exp, + var = ______, + sdate = sdates_exp, + time = indices(1), + ensemble = ______, + latitude = ______, + latitude_reorder = Sort(), + longitude = ______, + longitude_reorder = CircularSort(0, 360), + synonims = list(longitude = c('lon', 'longitude'), + latitude = c('lat', 'latitude')), + return_vars = list(latitude = NULL, longitude = NULL, time = 'sdate'), + retrieve = FALSE) + +# obs data + # Use this one if on workstation or nord3 (have access to /esarchive) + repos_obs <- paste0('/esarchive/recon/ecmwf/era5/monthly_mean/$var$_f1h-r360x180/', + '$var$_$sdate$.nc') + # Use this one if on Marenostrum4 and log in with PATC2022 account + repos_obs <- paste0('/gpfs/scratch/nct01/nct01127/d3_R_handson/esarchive/', + 'recon/ecmwf/era5/monthly_mean/', + '$var$_f1h-r360x180/$var$_$sdate$.nc') + + sdates_obs <- sapply(1991:2011, function(x) paste0(x, '11')) + + obs <- Start(dat = repos_obs, + var = ______, + sdate = sdates_obs, + time = indices(1), + latitude = ______, + latitude_reorder = Sort(), + longitude = ______, + longitude_reorder = CircularSort(0, 360), + synonims = list(longitude = c('lon', 'longitude'), + latitude = c('lat', 'latitude')), + return_vars = list(latitude = NULL, longitude = NULL, time = 'sdate'), + retrieve = FALSE) +``` + +Question: + +1. What are the dimensions of these two data? + +2. What is the size of these two data? + +3. Check the latitude, longitude, and time values of the two data. Are they consistent? + + +## 2. Define operation and workflow +It is recommended to define the function and write Step() together, because the latter helps you clarify the input and output dimensions of the function. + +In the self-defined function, we want to use the two functions, 's2dv::RPSS' and 's2dv::RMSSS', to calculate the skill scores. +You can check the functions by typing `?s2dv::RPSS` and `?s2dv::RMSSS`, or on [s2dv GitLab](https://earth.bsc.es/gitlab/es/s2dv/-/blob/master/R/). + +Hint: +(1) The self-defined function is for exp and obs data. Therefore, target_dims in Step() should be a list with two vector elements. +(2) The essential dimensions are "ensemble" and "sdate" for `exp`, and "sdate" for `obs`. These dimensions should be the 'target_dims' in Step(). +(3) To return the two skill scores as well as the significance test results, put them in a list of 4 elements. +(4) What are the dimensions of the outputs? These dimensions should be 'output_dims' in Step(). +(5) The first input of AddStep() should also be a list containing exp and obs.
+ +```r + # self-defined function + skill_func <- function(exp, obs) { + # exp: [ensemble, sdate] + # obs: [sdate] + + # Calculate RPSS + rpss <- s2dv::RPSS(exp, obs, memb_dim = ______) + + # Calculate RMSSS + ## ensemble mean + exp_ens <- s2dv::MeanDims(exp, ______, na.rm = T) + rmsss <- s2dv::RMSSS(exp_ens, obs, time_dim = ______, dat_dim = NULL) + + return(list(rpss = rpss$rpss, rpss_sign = rpss$sign, + rmsss = rmsss$rmsss, rmsss_pval = rmsss$p.val)) + } + step <- Step(skill_func, target_dims = list(exp = ______, obs = ______), + output_dims = list(rpss = NULL, rpss_sign = NULL, rmsss = NULL, rmsss_pval = NULL)) + wf <- AddStep(list(exp, obs), step) +``` + +Question: +1. Which dimensions are used in the operation? Which dimensions are free? + +2. Can we use more dimensions as target_dims? What are the disadvantages? + + +## 3. Execute locally +To avoid potential technical problems in the connection and configuration, +we choose to run the execution locally. +Note that it is recommended to submit jobs to HPCs if the data size is big or +the operation is heavy. + +Hint: +(1) Use the free dimensions (i.e., those that are not 'target_dims' in Step()) to chunk. +(2) It is safe to divide the data into pieces whose individual size is around 1/2-1/3 of the RAM. In this case, +the data size is only around 270Mb, so you can play with it without worrying about crashing the terminal. + +```r + res <- Compute(wf$rpss, + chunks = list(______), + threads_load = 2, + threads_compute = 4) +``` + +## 4. Check the results + +1. What is the Compute summary printed on the screen? Do you see the total time, time for load, time for compute? + +2. Check the list structure using the function `str()`. What are the items in the list? + +3. What are the dimensions of each item? + +4. Plot the maps + +Use `s2dv::PlotEquiMap` or other visualization tools to plot the maps. Check the usage of PlotEquiMap() on s2dv GitLab or type `?s2dv::PlotEquiMap`. + +```r + # Get latitude and longitude from the attributes + lat <- as.vector(attr(exp, 'Variables')$common$latitude) + lon <- as.vector(attr(exp, 'Variables')$common$longitude) + + # Set the color bars: check data range to decide the appropriate color bar + print(range(res$rpss, na.rm = TRUE)) + brks_rpss <- seq(______, ______, by = ______) + print(range(res$rmsss, na.rm = TRUE)) + brks_rmsss <- seq(______, ______, by = ______) + + # Plot RPSS + ##NOTE: Remove 'fileout' to view the figure in pop-up window + PlotEquiMap(______, lon = lon, lat = lat, + dots = ______, dot_size = 1.5, dot_symbol = 3, + brks = brks_rpss, + filled.continents = FALSE, triangle_ends = c(TRUE, TRUE), + toptitle = 'ECMWF system5 monthly mean tas RPSS 1991-2011', title_scale = 0.6, + fileout = '~/handson_4_fig_rpss.png') + + # Plot RMSSS + PlotEquiMap(______, lon = lon, lat = lat, + color_fun = clim.palette('yellowred'), + brks = brks_rmsss, + filled.continents = FALSE, triangle_ends = c(TRUE, TRUE), + toptitle = 'ECMWF system5 monthly mean tas RMSSS 1991-2011', title_scale = 0.6, + fileout = '~/handson_4_fig_rmsss.png') +``` + + +**BONUS**: Try to use only one thread for loading and computing (i.e., `threads_load = 1`, `threads_compute = 1`.) Is it slower than using more threads?
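+ +A sketch of the single-thread run to compare against (the chunking shown here is just one workable choice; fill in your own from the exercise above): + +```r + res_single <- Compute(wf$rpss, + chunks = list(latitude = 2, longitude = 2), + threads_load = 1, + threads_compute = 1) +```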
diff --git a/inst/doc/tutorial/PATC2022/handson_4-skill_workflow_ans.md b/inst/doc/tutorial/PATC2022/handson_4-skill_workflow_ans.md new file mode 100644 index 0000000000000000000000000000000000000000..0a19c23c64872f9577d031af3b4fad5d6cda8414 --- /dev/null +++ b/inst/doc/tutorial/PATC2022/handson_4-skill_workflow_ans.md @@ -0,0 +1,314 @@ +# Hands-on 4: Seasonal forecast verification + +## Goal +Learn how to use the startR workflow to finish a piece of analysis, including defining and pre-processing the desired data, +defining the operation, building the workflow, and executing the operation. + +In this use case, the ranked probability skill score (RPSS) and the root mean square error skill score (RMSSS) are computed to verify the forecast. +To make the process faster, the required data size is small here so we can run the execution on a workstation. + +## 0. Load required packages + +```r +# Clean the session +rm(list = ls()) + +library(startR) +library(s2dv) +``` + +## 1. Define data +The first step of the analysis is to use Start() to identify the data needed. + +Hints: +(1) The data paths enclose each wildcard between two '$', and the wildcards can be given any names you like. +(2) The variable we want to use is 'tas'. +(3) The required time (sdate) period is November 1991 to 2011. +(4) Read global data (i.e., full spatial grids.) +(5) Take all the ensemble members. +(6) Because we are going to use the whole startR workflow, we only need to create a pointer to the data repository rather than retrieving it to the workstation. + +```r +# exp data + # Use this one if on workstation or nord3 (have access to /esarchive) + repos_exp <- paste0('/esarchive/exp/ecmwf/system5c3s/monthly_mean/', + '$var$_f6h/$var$_$sdate$.nc') + # Use this one if on Marenostrum4 and log in with PATC2021 account + repos_exp <- paste0('/gpfs/scratch/nct01/nct01127/d3_R_handson/esarchive/', + 'exp/ecmwf/system5c3s/monthly_mean/', + '$var$_f6h/$var$_$sdate$.nc') + + sdates_exp <- sapply(1991:2011, function(x) paste0(x, '1101')) + + exp <- Start(dat = repos_exp, + var = 'tas', + sdate = sdates_exp, + time = indices(1), + ensemble = 'all', + latitude = 'all', + latitude_reorder = Sort(), + longitude = 'all', + longitude_reorder = CircularSort(0, 360), + synonims = list(longitude = c('lon', 'longitude'), + latitude = c('lat', 'latitude')), + return_vars = list(latitude = NULL, longitude = NULL, time = 'sdate'), + retrieve = FALSE) + +# obs data + # Use this one if on workstation or nord3 (have access to /esarchive) + repos_obs <- paste0('/esarchive/recon/ecmwf/era5/monthly_mean/$var$_f1h-r360x180/', + '$var$_$sdate$.nc') + # Use this one if on Marenostrum4 and log in with PATC2022 account + repos_obs <- paste0('/gpfs/scratch/nct01/nct01127/d3_R_handson/esarchive/', + 'recon/ecmwf/era5/monthly_mean/', + '$var$_f1h-r360x180/$var$_$sdate$.nc') + + sdates_obs <- sapply(1991:2011, function(x) paste0(x, '11')) + + obs <- Start(dat = repos_obs, + var = 'tas', + sdate = sdates_obs, + time = indices(1), + latitude = 'all', + latitude_reorder = Sort(), + longitude = 'all', + longitude_reorder = CircularSort(0, 360), + synonims = list(longitude = c('lon', 'longitude'), + latitude = c('lat', 'latitude')), + return_vars = list(latitude = NULL, longitude = NULL, time = 'sdate'), + retrieve = FALSE) +``` + +Question: + +1. What are the dimensions of these two data? +```r +attr(exp, 'Dimensions') + dat var sdate time ensemble latitude longitude + 1 1 21 1 25 181 360 + +attr(obs, 'Dimensions') + dat var sdate time latitude longitude + 1 1 21 1 181 360 +``` + +2.
What is the size of these two data? +exp: 261Mb; obs: 10.4Mb, as shown on screen by the Start() call. + +3. Check the latitude, longitude, and time values of the two data. Are they consistent? + +```r +# latitude +as.vector(attr(exp, 'Variables')$common$lat) +as.vector(attr(obs, 'Variables')$common$lat) +# longitude +as.vector(attr(exp, 'Variables')$common$lon) +as.vector(attr(obs, 'Variables')$common$lon) +# time +attr(exp, 'Variables')$common$time +attr(obs, 'Variables')$common$time +``` + +## 2. Define operation and workflow +It is recommended to define the function and write Step() together, because the latter helps you clarify the input and output dimensions of the function. + +In the self-defined function, we want to use the two functions, 's2dv::RPSS' and 's2dv::RMSSS', to calculate the skill scores. +You can check the functions by typing `?s2dv::RPSS` and `?s2dv::RMSSS`, or on [s2dv GitLab](https://earth.bsc.es/gitlab/es/s2dv/-/blob/master/R/). + +Hint: +(1) The self-defined function is for exp and obs data. Therefore, target_dims in Step() should be a list with two vector elements. +(2) The essential dimensions are "ensemble" and "sdate" for `exp`, and "sdate" for `obs`. These dimensions should be the 'target_dims' in Step(). +(3) To return the two skill scores as well as the significance test results, put them in a list of 4 elements. +(4) What are the dimensions of the outputs? These dimensions should be 'output_dims' in Step(). +(5) The first input of AddStep() should also be a list containing exp and obs. + +```r + # self-defined function + skill_func <- function(exp, obs) { + # exp: [ensemble, sdate] + # obs: [sdate] + + # Calculate RPSS + rpss <- s2dv::RPSS(exp, obs, memb_dim = 'ensemble') + + # Calculate RMSSS + ## ensemble mean + exp_ens <- s2dv::MeanDims(exp, 'ensemble', na.rm = T) + rmsss <- s2dv::RMSSS(exp_ens, obs, time_dim = 'sdate', dat_dim = NULL) + + return(list(rpss = rpss$rpss, rpss_sign = rpss$sign, + rmsss = rmsss$rmsss, rmsss_pval = rmsss$p.val)) + } + step <- Step(skill_func, target_dims = list(exp = c('ensemble', 'sdate'), obs = c('sdate')), + output_dims = list(rpss = NULL, rpss_sign = NULL, rmsss = NULL, rmsss_pval = NULL)) + wf <- AddStep(list(exp, obs), step) +``` + +Question: +1. Which dimensions are used in the operation? Which dimensions are free? +'ensemble' and 'sdate' are used in the self-defined function. The other dimensions (dat, var, time, latitude, and longitude) +are not necessary for the function. These free dimensions will be looped over by multiApply during the execution. + +2. Can we use more dimensions as target_dims? What are the disadvantages? +It still works if we put more dimensions in target_dims, but we will lose the choices for chunking in the next step. Also, the lighter the function is, the quicker the operation runs. + + +## 3. Execute locally +To avoid potential technical problems in the connection and configuration, +we choose to run the execution locally. +Note that it is recommended to submit jobs to HPCs if the data size is big or +the operation is heavy. + +Hint: +(1) Use the free dimensions (i.e., those that are not 'target_dims' in Step()) to chunk. +(2) It is safe to divide the data into pieces whose individual size is around 1/2-1/3 of the RAM. In this case, +the data size is only around 270Mb, so you can play with it without worrying about crashing the terminal. + +```r + res <- Compute(wf$rpss, + chunks = list(latitude = 2, longitude = 2), + threads_load = 2, + threads_compute = 4) +``` + +## 4. Check the results + +1.
What is the Compute summary printed on the screen? Do you see the total time, time for load, time for compute? + +```r +* Computation ended successfully. +* Number of chunks: 4 +* Max. number of concurrent chunks (jobs): 1 +* Requested cores per job: NA +* Load threads per chunk: 2 +* Compute threads per chunk: 4 +* Total time (s): 331.454513549805 +* Chunking setup: 0.00129866600036621 +* Data upload to cluster: 0 +* All chunks: 328.215093135834 +* Transfer results from cluster: 0 +* Merge: 3.23812174797058 +* Each chunk: +* queue: +* mean: 0 +* min: 0 +* max: 0 +* job setup: +* mean: 0 +* min: 0 +* max: 0 +* load: +* mean: 6.95802348852158 +* min: 6.30401253700256 +* max: 8.70350694656372 +* compute: +* mean: 75.0939234495163 +* min: 74.6092278957367 +* max: 75.3746023178101 +``` + +2. Check the list structure using the function `str()`. What are the items in the list? +```r +str(res) +List of 4 + $ rpss : num [1, 1, 1, 1:181, 1:360] 0.1014 -0.0406 0.1024 0.2097 0.1854 ... + $ rpss_sign : logi [1, 1, 1, 1:181, 1:360] FALSE FALSE FALSE FALSE FALSE FALSE ... + $ rmsss : num [1, 1, 1, 1:181, 1:360] 0.994 0.993 0.993 0.993 0.993 ... + $ rmsss_pval: num [1, 1, 1, 1:181, 1:360] 0 0 0 0 0 0 0 0 0 0 ... + .. +``` + +3. What are the dimensions of each item? +```r +dim(res$rpss) + dat var time latitude longitude + 1 1 1 181 360 +dim(res$rpss_sign) + dat var time latitude longitude + 1 1 1 181 360 +dim(res$rmsss) + dat var time latitude longitude + 1 1 1 181 360 +dim(res$rmsss_pval) + dat var time latitude longitude + 1 1 1 181 360 +``` + +4. Plot the maps + +Use `s2dv::PlotEquiMap` or other visualization tools to plot the maps. Check the usage of PlotEquiMap() on s2dv GitLab or type `?s2dv::PlotEquiMap`. + +```r + # Get latitude and longitude from the attributes + lat <- as.vector(attr(exp, 'Variables')$common$latitude) + lon <- as.vector(attr(exp, 'Variables')$common$longitude) + + # Set the color bars: check data range to decide the appropriate color bar + print(range(res$rpss, na.rm = TRUE)) +#[1] -1.0729143 0.9982857 + brks_rpss <- seq(-1, 0.9, by = 0.1) + print(range(res$rmsss, na.rm = TRUE)) +#[1] 0.9513331 0.9997396 + brks_rmsss <- seq(0.95, 0.99, by = 0.01) + + # Plot RPSS + ##NOTE: Remove 'fileout' to view the figure in pop-up window + PlotEquiMap(res$rpss[1, 1, 1, , ], lon = lon, lat = lat, + dots = res$rpss_sign[1, 1, 1, , ], dot_size = 1.5, dot_symbol = 3, + brks = brks_rpss, + filled.continents = FALSE, triangle_ends = c(TRUE, TRUE), + toptitle = 'ECMWF system5 monthly mean tas RPSS 1991-2011', title_scale = 0.6, + fileout = '~/handson_4_fig_rpss.png') + + # Plot RMSSS + PlotEquiMap(res$rmsss[1, 1, 1, , ], lon = lon, lat = lat, + color_fun = clim.palette('yellowred'), + brks = brks_rmsss, + filled.continents = FALSE, triangle_ends = c(TRUE, TRUE), + toptitle = 'ECMWF system5 monthly mean tas RMSSS 1991-2011', title_scale = 0.6, + fileout = '~/handson_4_fig_rmsss.png') +``` + +![RPSS map](./Figures/handson_4_fig_rpss.png) + +![RMSSS map](./Figures/handson_4_fig_rmsss.png) + + + +**BONUS**: Try to use only one thread for loading and computing (i.e., `threads_load = 1`, `threads_compute = 1`.) Is it slower than using more threads? +It is slower with only one thread. For example, the total running time is 1214 seconds with one thread (see below) +but 331 seconds with multiple threads (see 4.1 above). Both the loading and computing times are longer with one thread. +But the efficiency also depends on how busy the machine is.
If the machine is heavily loaded, running with multiple threads can be slow as well. + +```r +* Computation ended successfully. +* Number of chunks: 4 +* Max. number of concurrent chunks (jobs): 1 +* Requested cores per job: NA +* Load threads per chunk: 1 +* Compute threads per chunk: 1 +* Total time (s): 1214.47402715683 +* Chunking setup: 0.000868797302246094 +* Data upload to cluster: 0 +* All chunks: 1211.30233049393 +* Transfer results from cluster: 0 +* Merge: 3.17082786560059 +* Each chunk: +* queue: +* mean: 0 +* min: 0 +* max: 0 +* job setup: +* mean: 0 +* min: 0 +* max: 0 +* load: +* mean: 13.2360381484032 +* min: 12.5131008625031 +* max: 14.2744646072388 +* compute: +* mean: 289.587656855583 +* min: 286.497741699219 +* max: 294.805396080017 +``` + diff --git a/inst/doc/tutorial/PATC2022/handson_5-interpolation.md b/inst/doc/tutorial/PATC2022/handson_5-interpolation.md new file mode 100644 index 0000000000000000000000000000000000000000..63b9ea6a7455fd40d8709e875a71a257c830c837 --- /dev/null +++ b/inst/doc/tutorial/PATC2022/handson_5-interpolation.md @@ -0,0 +1,134 @@ +# Hands-on 5: Spatial grid interpolation + +## Goal +To learn how to use Start() to load the data and do the interpolation. +In this use case, we only use Start() and load the data in the local memory. +The default transformation function is startR::CDORemapper, a wrapper function of s2dv::CDORemap that uses CDO inside. + +In detail, we are going to: +(1) Learn different ways to assign longitude and latitude in Start(). +(2) Use parameter '_reorder' to specify the sorted order. +(3) Use transform-related parameters. + +## 0. Load required modules and packages + +We need to have CDO module loaded for interpolation. If you did not load it, quit the +R console and load it. As for R packages, we need startR for data loading and s2dv for plotting. + +```r +# Clean the session +rm(list = ls()) + +library(startR) +library(s2dv) +``` + +## 1. Define longitude and latitude +```r + # Use this one if on workstation or nord3 + repos <- "/esarchive/exp/ecmwf/system5_m1/monthly_mean/$var$_f6h/$var$_$sdate$.nc" + # Use this one if on Marenostrum4 and log in with PATC2022 account + repos <- paste0('/gpfs/scratch/nct01/nct01127/d3_R_handson/esarchive/', + 'exp/ecmwf/system5_m1/monthly_mean/', + '$var$_f6h/$var$_$sdate$.nc') + var <- 'tas' + sdate <- c('20170101') + lon.min <- 0 + lon.max <- 359.9 + lat.min <- -90 + lat.max <- 90 + + data <- Start(dat = repos, + var = var, + sdate = sdate, + ensemble = indices(1), + time = 'all', + latitude = values(list(lat.min, lat.max)), + latitude_reorder = Sort(), + longitude = values(list(lon.min, lon.max)), + longitude_reorder = CircularSort(0, 360), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = 'sdate', + longitude = 'dat', + latitude = 'dat'), + retrieve = TRUE + ) +``` + +1. Run the above script. Check the dimensions, the warning messages, and the values of +longitude and latitude. What is the range of longitude and latitude? + +2. Why is 'lon.max' 359.9 but not 360? What will happen if it is 360? + +3. Now, change + - `latitude_reorder = Sort()` to `latitude_reorder = Sort(decreasing = TRUE)` + - `longitude_reorder = CircularSort(0, 360)` to `longitude_reorder = CircularSort(-180, 180)` + - Set `lon.min <- -180` and `lon.max <- 179.9` + +Check the values of longitude and latitude again. Are they different from the original script? + + +## 2. Interpolation + +Now, let us add the transformation parameters.
To understand the usage of the `transform` parameters, check FAQ on GitLab https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/faq.md#5-do-interpolation-in-start-using-parameter-transform + +```r + data_1deg <- Start(dat = repos, + var = var, + sdate = sdate, + ensemble = indices(1), + time = 'all', + latitude = values(list(lat.min, lat.max)), + latitude_reorder = Sort(), + longitude = values(list(lon.min, lon.max)), + longitude_reorder = CircularSort(0, 360), + ## transformation + transform = CDORemapper, + transform_extra_cells = 2, + transform_params = list(grid = 'r360x181', + method = 'conservative'), + transform_vars = c('latitude', 'longitude'), + ## + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = 'sdate', + longitude = NULL, + latitude = NULL), + retrieve = TRUE) +``` + +1. Run the above script. Check the dimensions and the values of longitude and latitude. + +2. Plot maps +Plot the two data and see the difference. + +```r +# original data +lon_ori <- attr(data, 'Variables')$dat1$longitude +lat_ori <- attr(data, 'Variables')$dat1$latitude +PlotEquiMap(drop(data)[1, , ], lon = lon_ori, lat = lat_ori, + filled.continent = FALSE, + toptitle = "Original data: 640x1296, method: con", + fileout = "~/handson_5_fig1.png") + +# regridded data +lon_1deg <- attr(data_1deg, 'Variables')$dat1$longitude +lat_1deg <- attr(data_1deg, 'Variables')$dat1$latitude +PlotEquiMap(drop(data_1deg)[1, , ], lon = lon_1deg, lat = lat_1deg, + filled.continent = FALSE, + toptitle = "Regridded data: 181x360, method: con", + fileout = "~/handson_5_fig2.png") +``` + +3. Try different grids and interpolation methods + +- Change `method` to "bil", "bic", or other available options (check s2dv::CDORemap for details.) + +- Change `grid` to "r100x50", "t106grid", or other available options. + +- Plot the maps to see the difference. + + + diff --git a/inst/doc/tutorial/PATC2022/handson_5-interpolation_ans.md b/inst/doc/tutorial/PATC2022/handson_5-interpolation_ans.md new file mode 100644 index 0000000000000000000000000000000000000000..522c513bbeeaa3def64e9bc23b4a9195813ef0c2 --- /dev/null +++ b/inst/doc/tutorial/PATC2022/handson_5-interpolation_ans.md @@ -0,0 +1,203 @@ +# Hands-on 5: Spatial grid interpolation + +## Goal +To learn how to use Start() to load the data and do the interpolation. +In this use case, we only use Start() and load the data in the local memory. +The default transformation function is startR::CDORemapper, a wrapper function of s2dv::CDORemap that uses CDO inside. + +In detail, we are going to: +(1) Learn different ways to assign longitude and latitude in Start(). +(2) Use parameter '_reorder' to specify the sorted order. +(3) Use transform-related parameters. + +## 0. Load required modules and packages + +We need to have CDO module loaded for interpolation. If you did not load it, quit the +R console and load it. As for R packages, we need startR for data loading and s2dv for plotting. + +```r +# Clean the session +rm(list = ls()) + +library(startR) +library(s2dv) +``` + +## 1. 
Define longitude and latitude +```r + # Use this one if on workstation or nord3 + repos <- "/esarchive/exp/ecmwf/system5_m1/monthly_mean/$var$_f6h/$var$_$sdate$.nc" + # Use this one if on Marenostrum4 and log in with PATC2022 account + repos <- paste0('/gpfs/scratch/nct01/nct01127/d3_R_handson/esarchive/', + 'exp/ecmwf/system5_m1/monthly_mean/', + '$var$_f6h/$var$_$sdate$.nc') + + var <- 'tas' + sdate <- c('20170101') + lon.min <- 0 + lon.max <- 359.9 + lat.min <- -90 + lat.max <- 90 + + data <- Start(dat = repos, + var = var, + sdate = sdate, + ensemble = indices(1), + time = 'all', + latitude = values(list(lat.min, lat.max)), + latitude_reorder = Sort(), + longitude = values(list(lon.min, lon.max)), + longitude_reorder = CircularSort(0, 360), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = 'sdate', + longitude = 'dat', + latitude = 'dat'), + retrieve = TRUE + ) +``` + +1. Run the above script. Check the dimensions, the warning messages, and the values of +longitude and latitude. What is the range of longitude and latitude? +```r +dim(data) + dat var sdate ensemble time latitude longitude + 1 1 1 1 7 640 1296 + +longitude <- attr(data, 'Variables')$dat1$longitude +range(longitude) +[1] 0.0000 359.7222 + +latitude <- attr(data, 'Variables')$dat1$latitude +latitude[1] +[1] -89.78488 +latitude[640] +[1] 89.78488 +``` + +2. Why is 'lon.max' 359.9 but not 360? What will happen if it is 360? +If it is 360, you only get one point of longitude. Because of `longitude_reorder = CircularSort(0, 360)`, Start() regards 0 and 360 as the same point. Therefore, we need to set a number slightly smaller than 360 but bigger than the maximum value in the original data, so we can get the whole range. + +3. Now, change + - `latitude_reorder = Sort()` to `latitude_reorder = Sort(decreasing = TRUE)` + - `longitude_reorder = CircularSort(0, 360)` to `longitude_reorder = CircularSort(-180, 180)` + - Set `lon.min <- -180` and `lon.max <- 179.9` + +Check the values of longitude and latitude again. Are they different from the original script? +```r + lon.min <- -180 + lon.max <- 179.9 + + data <- Start(dat = repos, + var = var, + sdate = sdate, + ensemble = indices(1), + time = 'all', + latitude = values(list(lat.min, lat.max)), + latitude_reorder = Sort(decreasing = TRUE), + longitude = values(list(lon.min, lon.max)), + longitude_reorder = CircularSort(-180, 180), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = 'sdate', + longitude = 'dat', + latitude = 'dat'), + retrieve = TRUE) + +dim(data) + dat var sdate ensemble time latitude longitude + 1 1 1 1 7 640 1296 + +longitude <- attr(data, 'Variables')$dat1$longitude +range(longitude) +[1] -180.0000 179.7222 + +latitude <- attr(data, 'Variables')$dat1$latitude +latitude[1] +[1] 89.78488 +latitude[640] +[1] -89.78488 + +``` +The dimensions are the same. The longitude range changes to [-180, 180], and the latitude is sorted from 90 to -90. + + +## 2. Interpolation +Now, let us add the transformation parameters.
To understand the usage of the `transform` parameters, check FAQ on GitLab https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/faq.md#5-do-interpolation-in-start-using-parameter-transform + +```r + data_1deg <- Start(dat = repos, + var = var, + sdate = sdate, + ensemble = indices(1), + time = 'all', + latitude = values(list(lat.min, lat.max)), + latitude_reorder = Sort(), + longitude = values(list(lon.min, lon.max)), + longitude_reorder = CircularSort(0, 360), + ## transformation + transform = CDORemapper, + transform_extra_cells = 2, + transform_params = list(grid = 'r360x181', + method = 'conservative'), + transform_vars = c('latitude', 'longitude'), + ## + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = 'sdate', + longitude = 'dat', + latitude = 'dat'), + retrieve = TRUE) +``` + +1. Run the above script. Check the dimensions and the values of longitude and latitude. +```r +dim(data_1deg) + dat var sdate ensemble time latitude longitude + 1 1 1 1 7 181 360 + +longitude <- attr(data_1deg, 'Variables')$dat1$longitude +range(longitude) +[1] 0 359 + +latitude <- attr(data_1deg, 'Variables')$dat1$latitude +range(latitude) +[1] -90 90 +``` + +2. Plot maps +Plot the two data and see the difference. + +```r +# original data +lon_ori <- attr(data, 'Variables')$dat1$longitude +lat_ori <- attr(data, 'Variables')$dat1$latitude +PlotEquiMap(drop(data)[1, , ], lon = lon_ori, lat = lat_ori, + filled.continent = FALSE, + toptitle = "Original data: 640x1296, method: con", + fileout = "~/handson_5_fig1.png") + +# regridded data +lon_1deg <- attr(data_1deg, 'Variables')$dat1$longitude +lat_1deg <- attr(data_1deg, 'Variables')$dat1$latitude +PlotEquiMap(drop(data_1deg)[1, , ], lon = lon_1deg, lat = lat_1deg, + filled.continent = FALSE, + toptitle = "Regridded data: 181x360, method: con", + fileout = "~/handson_5_fig2.png") +``` +![original grid, method = 'con'](./Figures/handson_5_fig1.png) + +![regrid to 1 degree, method = 'con'](./Figures/handson_5_fig2.png) + + +3. Try different grids and interpolation methods + +- Change `method` to "bil", "bic", or other available options (check s2dv::CDORemap for details.) + +- Change `grid` to "r100x50", "t106grid", or other available options. + +- Plot the map to see the difference. 
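+ +For example, a minimal sketch that would produce the first map below: the same Start() call as above, changing only the `grid` and `method` in `transform_params` (the object name `data_bil` is just illustrative): + +```r + data_bil <- Start(dat = repos, + var = var, + sdate = sdate, + ensemble = indices(1), + time = 'all', + latitude = values(list(lat.min, lat.max)), + latitude_reorder = Sort(), + longitude = values(list(lon.min, lon.max)), + longitude_reorder = CircularSort(0, 360), + transform = CDORemapper, + transform_extra_cells = 2, + transform_params = list(grid = 'r100x50', + method = 'bil'), + transform_vars = c('latitude', 'longitude'), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = 'sdate', + longitude = 'dat', + latitude = 'dat'), + retrieve = TRUE) +```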
+ +![regrid to r100x50, method = 'bil'](./Figures/handson_5_fig3.png) + +![regrid to t106grid, method = 'bic'](./Figures/handson_5_fig4.png) diff --git a/inst/doc/tutorial/PATC2022/nord3_demo.R b/inst/doc/tutorial/PATC2022/nord3_demo.R new file mode 100644 index 0000000000000000000000000000000000000000..ffa4dec1f236696a27fce3ebf715e72c8e2d6af2 --- /dev/null +++ b/inst/doc/tutorial/PATC2022/nord3_demo.R @@ -0,0 +1,76 @@ +library(startR) + + repos <- "/esarchive/exp/ecmwf/system5_m1/monthly_mean/$var$_f6h/$var$_$sdate$.nc" + var <- 'tas' + sdate <- c('20170101', '20170201') + lon.min <- 0 + lon.max <- 359.9 + lat.min <- -90 + lat.max <- 90 + + data <- Start(dat = repos, + var = var, + sdate = sdate, + ensemble = 'all', + time = 'all', + latitude = values(list(lat.min, lat.max)), + latitude_reorder = Sort(), + longitude = values(list(lon.min, lon.max)), + longitude_reorder = CircularSort(0, 360), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude')), + return_vars = list(time = 'sdate', + longitude = NULL, latitude = NULL), + retrieve = FALSE + ) + + func <- function(x, conf, pval) { + # x: [ensemble, time] + # ensemble mean + ens_mean <- apply(x, 2, mean) + # temporal trend + trend <- s2dv::Trend(ens_mean, conf = conf, pval = pval)$trend + + return(trend) + } + + step <- Step(func, target_dims = c('ensemble', 'time'), + output_dims = list(trend = 'stats'), + use_libraries = c('s2dv')) + + wf <- AddStep(data, step, conf = FALSE, pval = FALSE) + +#-------------------user-defined--------------------- + queue_host <- 'nord4' # short name in .ssh/config + temp_dir <- '/gpfs/scratch/bsc32/bsc32734/startR_hpc/' + ecflow_suite_dir <- '/home/Earth/aho/startR_local/' +#---------------------------------------------------- + + # Nord3-v2 + res <- Compute(wf, + chunks = list(latitude = 2, + longitude = 2), + threads_load = 2, + threads_compute = 4, + cluster = list( + queue_host = queue_host, # defined in the user-defined block above + queue_type = 'slurm', + temp_dir = temp_dir, + cores_per_job = 16, + job_wallclock = '01:00:00', + max_jobs = 4, +# extra_queue_params = list('#SBATCH --constraint=medmem'), + bidirectional = FALSE, + polling_period = 10 + ), + ecflow_suite_dir = ecflow_suite_dir, + wait = TRUE + ) + +# Save the result if needed +saveRDS(res, file = '~/nord3_demo_result.Rds') + +# Check the result +str(res) +dim(res$trend) + diff --git a/inst/doc/usecase/ex1_16_files_different_time_dim_length.R b/inst/doc/usecase/ex1_16_files_different_time_dim_length.R new file mode 100644 index 0000000000000000000000000000000000000000..8b54bbce1b78afe962a76a02efd8be6d91872872 --- /dev/null +++ b/inst/doc/usecase/ex1_16_files_different_time_dim_length.R @@ -0,0 +1,52 @@ +#---------------------------------------------------------------------------- +# Author: An-Chi Ho +# Date: 11th November 2022 +# +# This script shows how to use Start() to load files with different time dimension lengths +# and reshape the array without undesired NAs. To know more about the usage of the parameters +# "merge_across_dims_narm" and "largest_dims_length", check faq.md How-to-26 (https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/faq.md#26-use-merge-across-dims-narm-to-remove-nas) +# +# The data used here is daily, so the time dimension length is the number of days in each month. +# "largest_dims_length = TRUE" needs to be used to tell Start() the different time dimension lengths +# in each file, and "merge_across_dims_narm = TRUE" can remove the NAs that are added by Start() +# during the reshaping process.
+#---------------------------------------------------------------------------- + +library(startR) + +# exp +repos <- paste0('/esarchive/recon/ecmwf/era5/daily_mean/$var$_f1h/$var$_$sdate$.nc') +lons.min <- -10 +lons.max <- 30 +lats.min <- 30 +lats.max <- 60 +sdates <- clock::date_format(clock::date_build(2019, 1:12, 1), format = "%Y%m") + +data <- Start(dat = repos, + var = 'tas', + sdate = sdates, + time = 'all', + time_across = 'sdate', + merge_across_dims = TRUE, # merge 'time' dim along 'sdate' + merge_across_dims_narm = TRUE, # remove additional NAs + largest_dims_length = TRUE, # to tell Start() that each file has different dim length + lat = values(list(lats.min, lats.max)), + lat_reorder = Sort(), + lon = values(list(lons.min, lons.max)), + lon_reorder = CircularSort(-180, 180), + synonims = list(lat = c('lat', 'latitude'), lon = c('lon', 'longitude')), + return_vars = list(lon = NULL, lat = NULL, time = 'sdate'), + num_procs = 16, + retrieve = TRUE) + +dim(data) +# dat var time lat lon +# 1 1 365 107 142 + +dim(attr(data, 'Variables')$common$time) +#time +# 365 + +#NOTE: If merge_across_dims_narm = FALSE or largest_dims_length = FALSE, time dimension will +# be 372 (= 31 days * 12 months) and NAs will be at the end of each small month, i.e., +# 60:62 (29-31 Feb), 124 (31 Apr), 186 (31 Jun), 279 (31 Sep), 341 (31 Nov).
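+ +# A couple of quick sanity checks (a minimal sketch; 'data' is the array retrieved +# above, and we assume the original ERA5 files contain no native missing values): +sum(is.na(data)) # expect 0: the NAs added by Start() during reshaping are removed +range(attr(data, 'Variables')$common$time) # expect the 365 daily dates of 2019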