startR issues
https://earth.bsc.es/gitlab/es/startR/-/issues/57
Use as mask in startR workflow for gridpoint analysis
2020-05-08T11:32:10+02:00, Nuria Pérez-Zanón

As reported by @bsolaraj, a user may need to compute different indices/skill or other metrics depending on the gridpoint. Depending on the type of analysis, there are two options:
A) latitude and longitude dimensions can be used as 'target_dims' because other dimensions can be used for chunking (e.g.: 'member').
B) latitude and longitude dimensions cannot be used as 'target_dims' because all other dimensions are being used for chunking.
For case A), the 'use_attributes' parameter of the Step() function could be used to decide, at each gridpoint, which metric/skill/index to apply (or to make other decisions). Case B) is more restrictive: only latitude and longitude can be used for chunking, and at the same time a criterion that depends on the spatial dimensions has to be applied.
The **proposed solution** is to use a mask (it may work for case A too). The mask can be logical (TRUE/FALSE) or take other values, such as 1 to 3 to distinguish sea, land and ice pixels.
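As a minimal sketch of the multi-valued case, the function below picks a different statistic depending on the mask value at a gridpoint. The name `MetricByMask`, the coding 1 = sea, 2 = land, 3 = ice, and the statistic chosen for each class are assumptions made for illustration only. It could presumably be passed to Step() in the same way as a function for a 0/1 mask.

```r
# Hypothetical helper: choose a statistic according to the mask value.
# Assumed coding (illustration only): 1 = sea, 2 = land, 3 = ice.
MetricByMask <- function(x, mask) {
  code <- as.integer(mask)[1]
  if (is.na(code)) {
    return(NA_real_)
  }
  switch(as.character(code),
         '1' = mean(x, na.rm = TRUE),  # sea: mean
         '2' = max(x, na.rm = TRUE),   # land: maximum
         '3' = min(x, na.rm = TRUE),   # ice: minimum
         NA_real_)                     # any other value: missing
}

# Toy values for a single gridpoint
x <- c(271.3, 272.8, 270.1)
MetricByMask(x, 1)
MetricByMask(x, 2)
MetricByMask(x, 3)
MetricByMask(x, 0)
```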
The following toy example works with a mask of ones and zeros:
```
library(startR)

# Read data (retrieve = FALSE only declares the data; nothing is loaded yet):
repos <- '/esarchive/exp/ecmwf/system5_m1/monthly_mean/$var$_f6h/$var$_$sdate$.nc'
data <- Start(dat = repos,
              var = 'tas',
              sdate = c('20170101', '20180101'),
              ensemble = indices(1:20),
              time = 'all',
              latitude = indices(300:341),
              longitude = indices(1:40),
              return_vars = list(latitude = 'dat',
                                 longitude = 'dat',
                                 time = 'sdate'),
              retrieve = FALSE)

# Read mask:
path <- '/esarchive/scratch/nperez/$var$_system5_m1_harvestmonth.nc'
mask <- Start(dat = path,
              var = 'mask',
              latitude = 'all',
              longitude = 'all',
              return_vars = list(latitude = 'dat',
                                 longitude = 'dat'),
              retrieve = FALSE)

# The function returns the mean if the mask is 1, and NA otherwise
MeanMask <- function(x, mask) {
  if (mask == 1) {
    ind <- mean(x)
  } else {
    ind <- NA
  }
  return(ind)
}

# At least one target dimension must be specified for mask,
# which is why the 'dat' dimension is used:
stepMask <- Step(fun = MeanMask,
                 target_dims = list(x = c('dat', 'ensemble', 'sdate', 'time'),
                                    mask = c('dat')),
                 output_dims = NULL)

# Now, there are two data inputs:
wf_mask <- AddStep(list(data, mask), stepMask)

# Chunk over the spatial dimensions only
res <- Compute(workflow = wf_mask,
               chunks = list(latitude = 2,
                             longitude = 2))
```
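For case A), where latitude and longitude can be target dimensions, the mask could instead be applied to a whole latitude-longitude slice at once. The sketch below is only an assumption about how such a function might look: `ApplyMask` is a hypothetical name, the toy arrays are made up, and the Step() call in the comments is untested.

```r
# Hypothetical function for case A): x and mask are both
# latitude x longitude slices with the same dimensions.
ApplyMask <- function(x, mask) {
  stopifnot(all(dim(x) == dim(mask)))
  x[is.na(mask) | mask != 1] <- NA  # keep values only where mask is 1
  return(x)
}

# Toy check on a 2 x 3 slice:
x <- array(1:6 + 0.5, dim = c(latitude = 2, longitude = 3))
mask <- array(c(1, 0, 1, 1, 0, NA), dim = c(latitude = 2, longitude = 3))
res_toy <- ApplyMask(x, mask)
print(res_toy)

# Untested sketch of the corresponding step (chunking over another dimension):
# stepMaskA <- Step(fun = ApplyMask,
#                   target_dims = list(x = c('latitude', 'longitude'),
#                                      mask = c('latitude', 'longitude')),
#                   output_dims = c('latitude', 'longitude'))
```

Treating a missing mask value as "masked out" is a design choice of this sketch; a different convention may suit other masks.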
Extra code lines to verify the output can be found in '/esarchive/scratch/nperez/git/Flor/startR/Bala_workflow.R'.
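The referenced script is not reproduced here. As a rough illustration of what such a check could involve, the sketch below builds small synthetic arrays (all dimensions and values are made up), applies MeanMask gridpoint by gridpoint in plain R, and confirms that masked-out points are NA while the others hold the mean over the remaining dimensions. With real data, an array built this way could be compared against the output of Compute().

```r
# Same function as in the toy example above
MeanMask <- function(x, mask) {
  if (mask == 1) mean(x) else NA
}

set.seed(1)
# Synthetic data: ensemble x sdate x time x latitude x longitude
data <- array(rnorm(3 * 2 * 4 * 2 * 2),
              dim = c(ensemble = 3, sdate = 2, time = 4,
                      latitude = 2, longitude = 2))
# Synthetic mask: latitude x longitude
mask <- array(c(1, 0, 0, 1), dim = c(latitude = 2, longitude = 2))

# Apply the function at each gridpoint, one gridpoint at a time
expected <- array(NA_real_, dim = dim(mask))
for (i in seq_len(dim(mask)[1])) {
  for (j in seq_len(dim(mask)[2])) {
    expected[i, j] <- MeanMask(data[, , , i, j], mask[i, j])
  }
}

# Masked-out gridpoints must be NA, the rest must hold the plain mean
stopifnot(all(is.na(expected[mask == 0])))
stopifnot(all.equal(expected[mask == 1],
                    apply(data, c(4, 5), mean)[mask == 1]))
```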
To create your own mask, you can use the example in '/esarchive/scratch/nperez/git/Flor/startR/Bala_createMask.R'.
We have already discussed this issue offline, but I think it is generic enough to be shared with other users.
Cheers,
Núria