diff --git a/inst/doc/faq.md b/inst/doc/faq.md index f92b798a94570169ac7a7f1c6bfdf00d3ebb90d1..1508742096218a5ca6ee05840b347855afbaaf03 100644 --- a/inst/doc/faq.md +++ b/inst/doc/faq.md @@ -30,6 +30,7 @@ This document intends to be the first reference for any doubts that you may have 24. [Do both interpolation and chunking on spatial dimensions](#24-do-both-interpolation-and-chunking-on-spatial-dimensions) 25. [What to do if your function has too many target dimensions](#25-what-to-do-if-your-function-has-too-many-target-dimensions) 26. [Use merge_across_dims_narm to remove NAs](#26-use-merge_across_dims_narm-to-remove-nas) + 27. [Utilize chunk number in the function](#27-utilize-chunk-number-in-the-function) 2. **Something goes wrong...** @@ -721,11 +722,12 @@ obs <- Start(dat = path.obs, retrieve = T) ``` -### 18. Use glob expression '*' to define the file path -The standard way to define the file path for Start() is using tags (i.e., $TAG_NAME$). -The glob expression, or wildcard, '*', can also be used in the path definition, while the rule is different from the common usage. -Please note that **'*' can only be used to replace the common part of all the files**. For example, if all the required files have the folder 'EC-Earth-Consortium/' in their path, then this part can be substituted with '*/'. +### 18. Use glob expression '*' to define the file path +The standard way to define the file path for Start() is using tags (i.e., $TAG_NAME$). +The glob expression, or wildcard, '*', can also be used in the path definition, while the rule is different from the common usage. + +Please note that **'*' can only be used to replace the common part of all the files**. For example, if all the required files have the folder 'EC-Earth-Consortium/' in their path, then this part can be substituted with '*/'. It can save some effort to define the long and uncritical path, and also make the script cleaner. 
However, if the part replaced by '\*' is not the same among all the files, Start() will use **the first pattern it finds in the first file to substitute '*'**. @@ -976,7 +978,10 @@ We provide some [use cases](inst/doc/usecase/ex2_12_transform_and_chunk.R) showi ### 25. What to do if your function has too many target dimensions -Unfortunately, we don't have a perfect solution now before we have multiple steps feature. Talk to maintainers to see how to generate a workaround for your case. +Ideally, a startR workflow uses only the dimensions required for the key computations as target dimensions and uses the rest to chunk the data into pieces. +If we have a complex analysis that requires all the dimensions in the computation in one single step, we don't have any free (i.e., margin) dimension left to chunk the data. +Unfortunately, we don't have a perfect solution for this until the multiple-steps feature is available. +You may check [How-to-27](#27-utilize-chunk-number-in-the-function) to see if that solution applies to your case. If not, talk to the maintainers to see how to generate a workaround for your case. ### 26. Use merge_across_dims_narm to remove NAs @@ -990,9 +995,23 @@ This parameter tells Start() that it needs to look into each file to know the di A typical example is reading daily data and merging time dimension together. The 30-day months will have one NA at the end of time dimension, if `merge_across_dims_narm = T` and `largest_dims_length = T` are not used. Check usecase [ex1_16](/inst/doc/usecase/ex1_16_files_different_time_dim_length.R) for the example script. See [How-to-21](#21-retrieve-the-complete-data-when-the-dimension-length-varies-among-files) for more details of `largest_dims_length`. - +### 27. Utilize chunk number in the function +In the self-defined function in a startR workflow, the dimensions required for the computations are used as target dimensions, +and the rest can be used to chunk the data into pieces.
There is one situation in which some information about a dimension is needed in the function, but the computation itself does not depend on that dimension. +In this case, we may be able to chunk along this dimension while still using it in the function. This can save the day if you have a complex case with no margin dimension left (see [How-to-25](#25-what-to-do-if-your-function-has-too-many-target-dimensions)). +You just need to define a parameter 'nchunks = chunk_indices' in your function and use it inside the function. + +The use case [RainFARM precipitation downscaling](https://earth.bsc.es/gitlab/es/startR/-/blob/develop-RainFARMCase/inst/doc/usecase/ex2_5_rainFARM.R) demonstrates an example in which the start date dimension is used as the chunking dimension, +but its chunk number is used to obtain the start date value of each chunk. +The first part of the function performs the downscaling method, which requires the longitude and latitude dimensions, so these two dimensions must be the target dimensions in the workflow. +After that, the results are saved as netCDF files following the esarchive convention. We need the start date value here to decide the file name. +As you can see, the sdate dimension is not required for the computation, so it does not need to be a target dimension. We can just use 'chunk_indices' to get the chunk number and therefore the corresponding start date value for the file name. + +There are many other possible applications of this parameter. Please share with us other use cases you may create. + + # Something goes wrong... ### 1.
No space left on device diff --git a/inst/doc/figures/Rotated_Coordinates.png b/inst/doc/figures/Rotated_Coordinates.png index c5bc9cf55905972921a816e1bf10a1b8c1e9d8fb..e8a6e2ec30437a9a3978e578159ce288cf4f17aa 100644 Binary files a/inst/doc/figures/Rotated_Coordinates.png and b/inst/doc/figures/Rotated_Coordinates.png differ diff --git a/inst/doc/usecase.md b/inst/doc/usecase.md index c7748f6f2a8b57d82f2b723f18aeebfedc5b6ff0..8224bbe28ac775da3ecd5c6c32072fe547d6e31a 100644 --- a/inst/doc/usecase.md +++ b/inst/doc/usecase.md @@ -37,51 +37,51 @@ for more explanation. The problem may occur when the dimension number of the splitted selector is more than two. If you are not familiar with the usage of these parameters, please see usecases ex1_2 and ex1_3 first, which are less complicated. You can also go to FAQ How-to-#17 for more explanation. 8. [Loading tas and tos from Decadal Predictions performed with the EC-Earth model](inst/doc/usecase/ex1_8_tasandtos.R) - Some climate indices needs to be computed loading 'tas' (air temperature at 2m) over land and 'tos' (ocean surface temperature) over sea. Using **startR**, you can load these data in a unique **Start** call or with multiple calls separately for each variable. + Some climate indices need to be computed loading 'tas' (air temperature at 2m) over land and 'tos' (ocean surface temperature) over sea. Using **startR**, you can load these data in a single **Start** call or with multiple calls separately for each variable. 9. [Use glob expression * to define the path](inst/doc/usecase/ex1_9_path_glob_permissive.R) - This script shows you how to use glob expression '*' and the parameter 'path_glob_permissive' of Start(). -You can also find information in [FAQ How-to-18](inst/doc/faq.md#18-use-glob-expression-to-define-the-file-path). + This script shows you how to use glob expression '*' and the parameter 'path_glob_permissive' of Start().
You can also find information in [FAQ How-to-18](inst/doc/faq.md#18-use-glob-expression-to-define-the-file-path). 10. [Use 'metadata_dims' to retrieve complete variable metadata](inst/doc/usecase/ex1_10_metadata_dims.R) - This script tells you how to use the parameter 'metadata_dims' in Start() to get the complete variable metadata. -You will see four difference cases and learn the rules. -You can find more explanation in FAQ [How-to-20](inst/doc/faq.md#20-use-metadata_dims-to-retrieve-variable-metadata). + This script tells you how to use the parameter 'metadata_dims' in Start() to get the complete variable metadata. You will see four different cases and learn the rules. You can find more explanation in FAQ [How-to-20](inst/doc/faq.md#20-use-metadata_dims-to-retrieve-variable-metadata). 11. [Three methods to load experimental files with different member and version](inst/doc/usecase/ex1_11_expid_member_version.R) This script shows three ways to load the data with different expid - member - version combination. It is useful for climate prediction of multiple experiments. - 12. [Load and plot data in rotated coordintaes](inst/doc/usecase/ex1_12_rotated_coordinates.R) - This script shows how to load and plot data in rotated coordinates using **Monarch-dust** simulations. + 12. [Load and plot data in rotated coordinates](inst/doc/usecase/ex1_12_rotated_coordinates.R) + This script shows how to load and plot data in rotated coordinates using **Monarch-dust** simulations. + - 13. [Use value array as selector to express dependency](inst/doc/usecase/ex1_13_implicit_dependency.R) - This script shows how to use a value array as the inner dimension selector to express -dependency on a file dimension. By this means, we do not need to specify the *_across -parameter and Start() can recognize this dependecy relationship. + 13.
[Use value array as selector to express dependency](inst/doc/usecase/ex1_13_implicit_dependency.R) + This script shows how to use a value array as the inner dimension selector to express dependency on a file dimension. By this means, we do not need to specify the *_across parameter and Start() can recognize this dependency relationship. - 14. [Specify the dependency between file dimensions](inst/doc/usecase/ex1_14_file_dependency.R) - This script shows how to define the dependency between file dimensions. Note that ex1_13 is for -the dependency between one inner dimension and one file dimension (i.e., the usage of *_across), while -this use case is for two file dimensions (i.e., the usage of *_depends). + 14. [Specify the dependency between file dimensions](inst/doc/usecase/ex1_14_file_dependency.R) + This script shows how to define the dependency between file dimensions. Note that ex1_13 is for the dependency between one inner dimension and one file dimension (i.e., the usage of *_across), while this use case is for two file dimensions (i.e., the usage of *_depends). - 15. [Load irregular grid data](inst/doc/usecase/ex1_15_irregular_grid_CDORemap.R) + 15. [Load irregular grid data](inst/doc/usecase/ex1_15_irregular_grid_CDORemap.R) This script shows how to use Start() to load irregular grid data , then regrid it by s2dv::CDORemap. - 16. [Merge files with different time dimension length](inst/doc/usecase/ex1_16_files_different_time_dim_length.R) - This script shows how to use Start() to load files with different time dimension length -and reshape the array without undesired NAs. + 16. [Merge files with different time dimension length](inst/doc/usecase/ex1_16_files_different_time_dim_length.R) + This script shows how to use Start() to load files with different time dimension length and reshape the array without undesired NAs. 2. **Execute computation (use `Compute()`)** 1. [Function working on time dimension](inst/doc/usecase/ex2_1_timedim.R) + 2.
[Function using attributes of the data](inst/doc/usecase/ex2_2_attr.R) Using attributes is only available in startR_v0.1.3 or above. + 3. [Use function CDORemap for interpolation](inst/doc/usecase/ex2_3_cdo.R) Using parameter `CDO_module` is only available in startR_v0.1.3 or above. Interpolate data by using `s2dv::CDORemap` in the workflow. + 4. [Use two functions in workflow](inst/doc/usecase/ex2_4_two_func.R) - 5. + + 5. [RainFARM precipitation downscaling](https://earth.bsc.es/gitlab/es/startR/-/blob/develop-RainFARMCase/inst/doc/usecase/ex2_5_rainFARM.R) + This example shows how to apply a statistical downscaling function with startR and simultaneously save the data by chunks (e.g., chunking the dimensions that are not required for downscaling) in the esarchive format, keeping the memory size in mind if unnecessary dimensions are included. It is not recommended to save big outputs; consider performing some analysis and then retrieving the result instead of saving the data. This is a simplified example of RainFARM; for more information, visit: https://www.medscope-project.eu/products/data/. +Find more explanation of this use case in FAQ [How-to-27](inst/doc/faq.md#27-utilize-chunk-number-in-the-function). + 6. [Use external parameters in atomic function](inst/doc/usecase/ex2_6_ext_param_func.R) 7. [Calculate the ensemble-adjusted Continuous Ranked Probability Score (CRPS)](inst/doc/usecase/ex2_7_seasonal_forecast_crps.R) @@ -94,17 +94,13 @@ and reshape the array without undesired NAs. If you need to apply your analysis in a few gridpoints, you may want to consider use case 1.6, but if you need to load a lot of grid points, maybe this a better solution. 10. [Apply an existing mask on data](inst/doc/usecase/ex2_10_existing_mask.R) - This use case shows you how to apply the existing mask file on your data. -If you need to create the mask file on your own, go to ex2_9_mask.R. + This use case shows you how to apply the existing mask file on your data.
If you need to create the mask file on your own, go to ex2_9_mask.R. - 11. [Two datasets with different length of target dimensions](inst/doc/usecase/ex2_11_two_dat_inconsistent_target_dim.R) - This use case uses experimental and the corresponding observational data to calculate -the temporal mean and spatial weighted mean. Notice that the spatial resolutions of the two -datasets are different, but it still works because lat and lon are target dimensions. + 11. [Two datasets with different length of target dimensions](inst/doc/usecase/ex2_11_two_dat_inconsistent_target_dim.R) + This use case uses experimental and the corresponding observational data to calculate the temporal mean and spatial weighted mean. Notice that the spatial resolutions of the two datasets are different, but it still works because lat and lon are target dimensions. - 12. [Transform and chunk spatial dimensions](inst/doc/usecase/ex2_12_transform_and_chunk.R) - This use case provides an example of transforming and chunking latitude and longitude dimensions. + 12. [Transform and chunk spatial dimensions](inst/doc/usecase/ex2_12_transform_and_chunk.R) + This use case provides an example of transforming and chunking latitude and longitude dimensions. - 13. [Load irregular grid data and interpolate it in the workflow](inst/doc/usecase/ex2_13_irregular_regrid.R) - This script shows how to load irregular grid data by Start(), then regrid it -by s2dv::CDORemap in the workflow. It is a solution before Start() can deal with irregular regridding directly. + 13. [Load irregular grid data and interpolate it in the workflow](inst/doc/usecase/ex2_13_irregular_regrid.R) + This script shows how to load irregular grid data by Start(), then regrid it by s2dv::CDORemap in the workflow. It is a solution before Start() can deal with irregular regridding directly. 
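The `chunk_indices` mechanism used by use case 2.5 above can be sketched in a few lines. This is a hedged illustration, not part of the patch: the `data` object, dimension names, and the computation are hypothetical placeholders; only the `nchunks = chunk_indices` parameter itself is the actual startR feature.

```r
library(startR)

# Minimal sketch, assuming 'data' is a Start() call with retrieve = FALSE
# that has an 'sdate' file dimension, and 'sdates' is the vector of start
# dates used in that call. 'nchunks' receives a named vector with the index
# of the current chunk along each chunked dimension, e.g. c(sdate = 2).
func <- function(x, startdates, nchunks = chunk_indices) {
  # value of the chunked dimension for the current chunk
  current_sdate <- startdates[nchunks['sdate']]
  # ... use current_sdate, e.g., to build an output file name ...
  apply(x, 2, mean)  # the actual computation over the target dimensions
}

step <- Step(func, target_dims = c('time', 'member'), output_dims = 'member')
workflow <- AddStep(data, step, startdates = sdates)
res <- Compute(workflow, chunks = list(sdate = 4))
```

Because 'sdate' is only chunked, never a target dimension, the function still learns which start date it is working on without loading that dimension into the computation.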
diff --git a/inst/doc/usecase/ex2_5_rainFARM.R b/inst/doc/usecase/ex2_5_rainFARM.R new file mode 100644 index 0000000000000000000000000000000000000000..8d315a03c1f47588f881b39513dbf6b76cf43b7e --- /dev/null +++ b/inst/doc/usecase/ex2_5_rainFARM.R @@ -0,0 +1,109 @@ +# ------------------------------------------------------------------------------ +# Downscaling precipitation using RainFARM +# ------------------------------------------------------------------------------ +# Note 1: The data could be first transformed with QuantileMapping from CSTools +# Note 2: Extra parameters could be used to downscale the data: weights, slope... +# See more information in: +# https://cran.r-project.org/web/packages/CSTools/vignettes/RainFARM_vignette.html +# ------------------------------------------------------------------------------ +# MOST IMPORTANT NOTE: +# startR aims to return a result that fits in your local memory. This aim is +# the opposite of downscaling, which increases the output size. Therefore, this +# example saves the data to NetCDF files and returns only the mean, which has a +# reduced size. +# Warning! Use this example with caution to avoid saving undesired data.
+#------------------------------------------------------------------------------- + +# Load libraries and functions: +library(startR) +library(CSTools) +library(ncdf4) +library(s2dv) + +# Define the data: +sdates <- paste0(1993:1996, '1101') # starting dates +path <- "/esarchive/exp/ecmwf/system5c3s/daily/$var$_s0-24h/$var$_$sdate$.nc" +data <- Start(dataset = path, + var = 'prlr', + sdate = sdates, + member = 1:3, + longitude = values(list(-10, 29)), + longitude_reorder = CircularSort(-180, 180), + latitude = values(list(18, 57)), + time = 'all', + return_vars = list(latitude = 'dataset', longitude = 'dataset', + time = NULL), + synonims = list(latitude = c('lat', 'latitude'), + longitude = c('lon', 'longitude'), + member = c('member', 'ensemble')), + retrieve = FALSE) + +# Define the function: +Chunk_RF <- function(x, nf, destination, startdates, nchunks = chunk_indices) { + lon <- as.numeric(attributes(x)$Variables$dat1$longitude) + lat <- as.numeric(attributes(x)$Variables$dat1$latitude) + down_data <- RainFARM(x, lon = lon, lat = lat, drop_realization_dim = TRUE, + nf, lon_dim = 'longitude', lat_dim = 'latitude', time_dim = 'time') + # detect the dates of forecast time for different start dates + time <- attributes(x)$Variables$common$time + dates <- lapply(startdates, function(sd) {seq(as.Date(sd, format = "%Y-%m-%d"), + sd + length(time) - 1, 'day')}) + dimname <- names(dim(down_data$data)) + var_dims <- list(ncdim_def(name = 'lon', units = 'degrees', + vals = as.vector(down_data$lon), longname = 'longitude'), + ncdim_def(name = 'lat', units = 'degrees', + vals = as.vector(down_data$lat), longname = 'latitude'), + ncdim_def(name = 'ensemble', units = 'adim', + vals = 1 : dim(down_data$data)[which(dimname == 'member')], + longname = 'ensemble', create_dimvar = TRUE)) + metadata_var <- list(units = 'm s-1', + cdo_grid_name = paste0('r', length(lon), 'x', length(lat)), + projection = 'none') + # startdates and dates depend on the chunk:
CSTools:::.saveExp(down_data$data, startdates = startdates[nchunks['sdate']], + dates = dates[[nchunks['sdate']]], defined_dims = var_dims, + varname = 'prlr', metadata_var = metadata_var, + destination = destination) + down_data_mean <- s2dv::MeanDims(down_data$data, c('member', 'longitude', 'latitude')) + return(down_data_mean) +} + +step <- Step(Chunk_RF, + target_dims = c('member', 'longitude', 'latitude', 'time'), + output_dims = 'time', + use_libraries = c('CSTools', 'ncdf4'), + use_attributes = list(data = "Variables")) + +workflow <- AddStep(data, step, nf = 4, + destination = "/esarchive/scratch/nperez/git/Flor/cstools/test_RF_start/", + startdates = as.Date(sdates, format = "%Y%m%d")) + +res <- Compute(workflow, + chunks = list(sdate = 4), + threads_load = 2, + threads_compute = 4) + +#-----------modify according to your personal info--------- + queue_host = 'nord3' # your own host name for nord3v2 + temp_dir = '/gpfs/scratch/bsc32/bsc32339/startR_hpc/' + ecflow_suite_dir = '/home/Earth/nperez/startR_local/' # your own local directory +#------------------------------------------------------------ + +res <- Compute(workflow, + chunks = list(sdate = 4), + threads_load = 1, + threads_compute = 1, + cluster = list(queue_host = queue_host, + queue_type = 'slurm', + cores_per_job = 16, + r_module = 'R/4.1.2-foss-2019b', + temp_dir = temp_dir, + polling_period = 10, + job_wallclock = '01:00:00', + max_jobs = 4, + bidirectional = FALSE), + ecflow_suite_dir = ecflow_suite_dir, + wait = TRUE) + +# Visualize the temporal evolution of the result simultaneously for all sdates: +matplot(1:215, res$output1[, 1, 1, ], type = "l")
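As an optional sanity check after running the script above, one can open one of the NetCDF files written by `Chunk_RF` and inspect its dimensions. This is a hedged sketch, not part of the patch: it assumes `destination` still points at the output folder and that at least one `.nc` file was written there; no file names are hard-coded.

```r
# Illustrative check: list the files written under 'destination' and open the
# first one to confirm the dimensions defined in Chunk_RF are present.
library(ncdf4)
files <- list.files(destination, pattern = '\\.nc$', recursive = TRUE,
                    full.names = TRUE)
nc <- nc_open(files[1])
print(names(nc$dim))   # the dimensions defined via ncdim_def, plus time
nc_close(nc)
```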