diff --git a/inst/doc/deployment.md b/inst/doc/deployment.md index 47325db52180433bd49da4920be06a53780cb43f..1e098bba417d816d6cb234e18cdcd532ff74754c 100644 --- a/inst/doc/deployment.md +++ b/inst/doc/deployment.md @@ -17,7 +17,7 @@ devtools::install_git('https://earth.bsc.es/gitlab/es/startR') ``` - Among others, the bigmemory package will be installed. - If loading and processing NetCDF files (only file format supported by now), install the easyNCDF package. - - If planning to interpolate the data with CDO (either by using the `transform` parameter in `startR::Start`, or by using `s2dverification::CDORemap` in the workflow specified to `startR::Compute`), install s2dverification (>= 2.8.4) and CDO (version 1.6.3 tested). CDO is not available for Windows. + - If planning to interpolate the data with CDO (either by using the `transform` parameter in `startR::Start`, or by using `s2dv::CDORemap` in the workflow specified to `startR::Compute`), install s2dv and CDO (version 1.6.3 tested). CDO is not available for Windows. A local or remote file system or THREDDS/OPeNDAP server providing the data to be retrieved must be accessible. @@ -50,7 +50,7 @@ All machines must be UNIX-based, with the "hostname", "date", "touch" and "sed" - netCDF-4 is installed, if loading and processing NetCDF files (only supported format by now) - R (>= 2.14.1) is installed as a Linux Environment Module - the startR package is installed - - if using CDO interpolation, the s2dverification package and CDO 1.6.3 are installed + - if using CDO interpolation, the s2dv package and CDO 1.6.3 are installed - any other R packages required by the `startR::Compute` workflow are installed - any other Environment Modules used by the `startR::Compute` workflow are installed - a shared file system (with a unified access point) or THREDDS/OPeNDAP server is accessible across HPC nodes and HPC login node, where the necessary data can be uploaded from your workstation. 
A file system shared between your workstation and the HPC is also supported and advantageous. Use of a data transfer service between the workstation and the HPC is also supported under specific configurations. diff --git a/inst/doc/faq.md b/inst/doc/faq.md index 9aa3889a85eedbbf258bf6da8ac1dff16147da4e..ea84377aa7425394a66bef20794a39072c9451c5 100644 --- a/inst/doc/faq.md +++ b/inst/doc/faq.md @@ -277,7 +277,7 @@ data <- Start(..., retrieve = FALSE) func <- function(x) { - y <- s2dverification::Season(x, posdim = 'time') #specify package name + y <- s2dv::Season(x, posdim = 'time') #specify package name return(y) } @@ -295,7 +295,7 @@ wf <- AddStep(data, step) cluster = list(queue_host = 'p1', #your alias for power9 queue_type = 'slurm', temp_dir = '/gpfs/scratch/bsc32/bsc32734/startR_hpc/', - lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/', #s2dverification is involved here, so the machine can find Season() + lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/', #s2dv is involved here, so the machine can find Season() r_module = 'startR/0.1.2-foss-2018b-R-3.5.0', job_wallclock = '00:10:00', cores_per_job = 4, @@ -313,7 +313,7 @@ wf <- AddStep(data, step) If you want to do the interpolation within Start(), you can use the following four parameters: -1. **`transform`**: Assign the interpolation function. It is recommended to use `startR::CDORemapper`, the wrapper function of s2dverification::CDORemap(). +1. **`transform`**: Assign the interpolation function. It is recommended to use `startR::CDORemapper`, the wrapper function of s2dv::CDORemap(). 2. **`transform_params`**: A list of the required inputs for `transform`. Take `transform = CDORemapper` as an example, the common items are: - `grid`: A character string specifying either a name of a target grid (recognized by CDO, e.g., 'r256x128', 't106grid') or a path to another NetCDF file with the target grid (a single grid must be defined in such file). 
- `method`: A character string specifying an interpolation method (recognized by CDO, e.g., 'con', 'bil', 'bic', 'dis'). The following long names are also supported: 'conservative', 'bilinear', 'bicubic', and 'distance-weighted'. @@ -323,10 +323,10 @@ If you want to do the interpolation within Start(), you can use the following fo The parameter ’crop’ also accepts a numeric vector of custom borders: c(western border, eastern border, southern border, northern border). 3. **`transform_vars`**: A character vector of the inner dimensions to be transformed. E.g., c('latitude', 'longitude'). -4. **`transform_extra_cells`**: A numeric indicating the number of grid cell to extend from the borders if the interpolating region is a subset of the whole region. 2 as default, which is consistent with the method in s2dverification::Load(). +4. **`transform_extra_cells`**: A numeric value indicating the number of grid cells to extend beyond the borders when the interpolated region is a subset of the whole region. The default is 2, which is consistent with the method in s2dv::Load(). You can find an example script here [ex1_1_tranform.R](/inst/doc/usecase/ex1_1_tranform.R) -You can see more information in s2dverification::CDORemap documentation [here](https://earth.bsc.es/gitlab/es/s2dverification/blob/master/man/CDORemap.Rd). +You can see more information in the s2dv::CDORemap documentation [here](https://earth.bsc.es/gitlab/es/s2dv/blob/master/man/CDORemap.Rd). ### 6. Get data attributes without retrieving data to workstation @@ -461,7 +461,7 @@ data <- Start(dat = repos, ### 9. Use CDORemap() in function -If you want to interpolate data by s2dverification::CDORemap in function, you need to tell the +If you want to interpolate data with s2dv::CDORemap inside a function, you need to tell the machine which CDO module to use. Therefore, `CDO_module = 'CDO/1.9.5-foss-2018b'` should be added in Compute() cluster list. See the example in usecase [ex2_3_cdo.R](inst/doc/usecase/ex2_3_cdo.R). 
@@ -1031,7 +1031,7 @@ some problem. ### 5. Errors related to wrong file formatting -Several errors could be return when the files are not correctly formatted. If you see one of this errors, review the coordinates in your files: +Several errors could be returned when the files are not correctly formatted. If you see one of these errors, review the coordinates in your files: ``` Error in Rsx_nc4_put_vara_double: NetCDF: Numeric conversion not representable @@ -1045,7 +1045,7 @@ Error in dim(x$x) <- dim_bk : ``` ``` -Error in s2dverification::CDORemap(data_array, lons, lats, ...) : +Error in s2dv::CDORemap(data_array, lons, lats, ...) : Found invalid values in 'lons'. ``` @@ -1053,7 +1053,7 @@ Error in s2dverification::CDORemap(data_array, lons, lats, ...) : ERROR: invalid cell Aborting in file clipping.c, line 1295 ... -Error in s2dverification::CDORemap(data_array, lons, lats, ...) : +Error in s2dv::CDORemap(data_array, lons, lats, ...) : CDO remap failed. ``` diff --git a/inst/doc/practical_guide.md b/inst/doc/practical_guide.md index d476652420df126e1893af3191ea63de2137bab8..378f6486bd7a5cb60fadb35918a15bd056f55379 100644 --- a/inst/doc/practical_guide.md +++ b/inst/doc/practical_guide.md @@ -297,7 +297,7 @@ If you are interested in actually loading the entire data set in your machine yo - evaluating the object returned by `Start()`: `data_load <- eval(data)` See the section on "How to choose the number of chunks, jobs and cores" for indications on working out the maximum amount of data that can be retrieved with a `Start()` call on a specific machine. -You may realize that this functionality is similar to the `Load()` function in the s2dverification package. In fact, `Start()` is more advanced and flexible, although `Load()` is more mature and consistent for loading typical seasonal to decadal forecasting data. `Load()` will be adapted in the future to use `Start()` internally. 
+You may realize that this functionality is similar to the `Load()` function in the s2dv package. In fact, `Start()` is more advanced and flexible, although `Load()` is more mature and consistent for loading typical seasonal to decadal forecasting data. `Load()` will be adapted in the future to use `Start()` internally. There are no constrains for the number or names of the outer or inner dimensions used in a `Start()` call. In other words, `Start()` will handle NetCDF files with any number of dimensions with any name, as well as files distributed across folders in complex ways, since you can use customized wildcards in the path pattern. diff --git a/inst/doc/usecase.md b/inst/doc/usecase.md index e0cf5e17d6063ec8903da29a6581a2bf0efcc598..9d1713901b754e36661493ac165d9bd7dc213466 100644 --- a/inst/doc/usecase.md +++ b/inst/doc/usecase.md @@ -87,5 +87,10 @@ If you need to create the mask file on your own, go to ex2_9_mask.R. This use case uses experimental and the corresponding observational data to calculate the temporal mean and spatial weighted mean. Notice that the spatial resolutions of the two datasets are different, but it still works because lat and lon are target dimensions. - - + 12. [Transform and chunk over spatial dimensions](inst/doc/usecase/ex2_12_transform_and_chunk.R) + This use case shows how to transform and chunk the +latitude and longitude dimensions. When all other dimensions are used as target dimensions in the operation, +chunking over the spatial dimensions is a useful option. + 13. [Interpolate irregular grid in the workflow](inst/doc/usecase/ex2_13_irregular_regrid.R) + This script shows how to load irregular grid data with Start() and then regrid it +with s2dv::CDORemap in the workflow. It is a workaround until Start() can handle irregular regridding directly. 
diff --git a/inst/doc/usecase/ex2_13_irregular_regrid.R b/inst/doc/usecase/ex2_13_irregular_regrid.R new file mode 100644 index 0000000000000000000000000000000000000000..df5e21f03aaeb3b47073d931f838e299334a1e69 --- /dev/null +++ b/inst/doc/usecase/ex2_13_irregular_regrid.R @@ -0,0 +1,68 @@ +#---------------------------------------------------------------------------- +# Author: An-Chi Ho +# Date: 8th Oct 2021 +# +# This script shows how to load irregular grid data with Start() and then regrid +# it with s2dv::CDORemap in the workflow. It is a workaround until Start() can +# handle irregular regridding directly. +#---------------------------------------------------------------------------- + +library(startR) + +path <- paste0('/esarchive/exp/CMIP6/dcppA-hindcast/cmcc-cm2-sr5/cmip6-dcppA-hindcast_i1p1/', + 'DCPP/CMCC/CMCC-CM2-SR5/dcppA-hindcast/$member$/Omon/$var$/gn/v20210312/', + '$var$_*_s$sdate$-$member$_gn_$aux$.nc') + +data <- Start(dataset = path, + var = 'tos', + sdate = c('1960', '1961'), + aux = 'all', + aux_depends = 'sdate', + j = indices(2:361), # remove two indices to avoid white strips + i = indices(2:291), # remove two indices to avoid white strips + time = indices(1:12), + member = 'r1i1p1f1', + return_vars = list(j = NULL, i = NULL, + latitude = NULL, longitude = NULL), + retrieve = FALSE) + +func_regrid <- function(data) { + lons <- attr(data, 'Variables')$common$longitude + lats <- attr(data, 'Variables')$common$latitude + data <- s2dv::CDORemap(data, lons, lats, grid = 'r360x180', + method = 'bil', crop = FALSE) + lons_reg <- data[['lons']] + lats_reg <- data[['lats']] + return(list(data = data[[1]], lats = lats_reg, lons = lons_reg)) +} + +step <- Step(fun = func_regrid, + target_dims = list(data = c('j', 'i')), + output_dims = list(data = c('lon', 'lat'), + lats = 'lat', lons = 'lon'), + use_attributes = list(data = "Variables")) +wf <- AddStep(data, step) + +res <- Compute(workflow = wf$data, + chunks = list(sdate = 2, time = 2)) + +names(res) +#[1] 
"data" "lats" "lons" +dim(res$data) +# lon lat dataset var sdate aux time member +# 360 180 1 1 2 1 12 1 +dim(res$lons) +# lon dataset var sdate aux time member +# 360 1 1 2 1 12 1 +dim(res$lats) +# lat dataset var sdate aux time member +# 180 1 1 2 1 12 1 + +library(s2dv) +PlotEquiMap(drop(res$data)[ , , 1, 1], lon = drop(res$lons)[, 1, 1], + lat = drop(res$lats)[, 1, 1]) + +# Plot Layout for sdate = 1 all the time steps +var <- Reorder(drop(res$data)[, , 1, ], c(3, 1, 2)) +PlotLayout(PlotEquiMap, c('lon', 'lat'), var = var, + lon = drop(res$lons)[, 1, 1], lat = drop(res$lats)[, 1, 1])