startR issues (https://earth.bsc.es/gitlab/es/startR/-/issues)

Issue #8: Problem when Step receives NULL as target dims
(Llorenç Lledó, 2020-04-02, https://earth.bsc.es/gitlab/es/startR/-/issues/8)

Using develop-chunking, I encounter this problem. See:
```r
repos <- paste0('/esnas/exp/ecmwf/system4_m1/6hourly/',
                '$var$/$var$_$sdate$.nc')
system4 <- Start(dat = repos,
                 var = 'sfcWind',
                 #sdate = paste0(1981:2015, '1101'),
                 sdate = paste0(1981:1984, '1101'),
                 #time = indices((30*4+1):(120*4)),
                 time = indices((30*4+1):(30*4+4)),
                 ensemble = 'all',
                 #ensemble = indices(1:6),
                 #latitude = 'all',
                 latitude = indices(1:10),
                 #longitude = 'all',
                 longitude = indices(1:10),
                 return_vars = list(latitude = NULL,
                                    longitude = NULL,
                                    time = c('sdate')))
repos <- paste0('/esnas/recon/ecmwf/erainterim/6hourly/',
                '$var$/$var$_$file_date$.nc')
dates <- attr(system4, 'Variables')$common$time
dates_file <- sort(unique(gsub('-', '', sapply(as.character(dates), substr, 1, 7))))
erai <- Start(dat = repos,
              var = 'sfcWind',
              file_date = dates_file,
              time = values(dates),
              #latitude = 'all',
              latitude = indices(1:10),
              #longitude = 'all',
              longitude = indices(1:10),
              time_var = 'time',
              time_tolerance = as.difftime(1, units = 'hours'),
              time_across = 'file_date',
              return_vars = list(latitude = NULL,
                                 longitude = NULL,
                                 time = 'file_date'),
              merge_across_dims = TRUE,
              split_multiselected_dims = TRUE)
ratio <- Start(dat = '/esnas/scratch/llledo/GWA_correction/GWA_$var$_file.nc',
               var = 'ratio',
               # latitude = 'all',
               # longitude = 'all',
               latitude = indices(1:10),
               longitude = indices(1:10),
               return_vars = list(latitude = NULL,
                                  longitude = NULL))

####################
# Prepare and launch by_chunk
####################
step <- Step(eqmcv_atomic,
             list(a = c('ensemble', 'sdate', 'time'),
                  b = c('sdate', 'time'),
                  c = NULL),
             # c = c('dat') fixes the bug
             list(d = c('ensemble', 'pc')))
res <- Compute(step, list(system4, erai, ratio),
               pc_list, wind2CF,
               chunks = list(latitude = 5,
                             longitude = 5),
               ncores = 4,
               #cluster = list(queue_host = 'bsceslogin01.bsc.es',
               #               max_jobs = 4,
               #               cores_per_job = 2),
               #shared_dir = '/esnas/scratch/llledo/test_bychunk',
               wait = FALSE)
```

Assignee: Nicolau Manubens Gil

Issue #16: Compute(): execution should be forbidden if providing indices() for any file dimension and a data_dir is specified
(Nicolau Manubens Gil, 2019-02-23, https://earth.bsc.es/gitlab/es/startR/-/issues/16)

startR can't assume the files in data_dir are the same as in the local data repository, and hence the index references can't be guaranteed.

Issue #48: Compute(): Limitation of variable naming in Start() when sending jobs to Power9
(Carlos Delgado Torres, 2020-02-27, https://earth.bsc.es/gitlab/es/startR/-/issues/48)

Hi @aho,
As we have been seeing, there is an error in Compute() when the names of the variables in Start() contain underscores. This only happens when sending the job to Power9 (it works fine on the workstation).
This is an example of code that returns an error:
```r
library(startR)
library(s2dverification)
data_exp <- Start(dataset = '/esarchive/exp/ecearth/a1ua/cmorfiles/DCPP/EC-Earth-Consortium/EC-Earth3/dcppA-hindcast/$member$/Amon/$var$/gr/v20190713/$var$_Amon_*_s$sdate$-$member$_gr_$leadyear$.nc',
                  var = 'tas',
                  sdate = paste0(2000:2004),
                  month = 'all',
                  lat = values(list(0, 14)),
                  lon = values(list(0, 28)),
                  lead_year = 'all',
                  member = 1:3,
                  lead_year_depends = 'sdate',
                  month_across = 'lead_year',
                  synonims = list(month = c('month', 'time'), lon = c('lon', 'longitude'), lat = c('lat', 'latitude')),
                  return_vars = list(lat = 'dataset', lon = 'dataset'),
                  num_procs = 1, retrieve = FALSE)
fun <- function(x) {
  y <- apply(x, 2, mean)
  return(y)
}
step <- Step(fun = fun,
             target_dims = c('month', 'member'),
             output_dims = c('member'))
wf <- AddStep(inputs = data_exp, step_fun = step)
res <- Compute(workflow = wf,
               chunks = list(lat = 2, lon = 2),
               threads_load = 2,
               threads_compute = 4,
               cluster = list(queue_host = 'power',
                              queue_type = 'slurm',
                              temp_dir = '/gpfs/scratch/bsc32/bsc32924/startR_hpc/',
                              # lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/',
                              r_module = 'R/3.5.0-foss-2018b',
                              CDO_module = 'CDO/1.9.5-foss-2018b',
                              cores_per_job = 4,
                              job_wallclock = '00:30:00',
                              max_jobs = 4,
                              extra_queue_params = list('#SBATCH --mem-per-cpu=3000'),
                              bidirectional = FALSE,
                              polling_period = 20),
               ecflow_suite_dir = '/home/Earth/cdelgado/Desktop/startR_local/',
               wait = TRUE)
```
The error doesn't occur when the variable "lead_year" is changed to "leadyear":
```r
data_exp <- Start(dataset = '/esarchive/exp/ecearth/a1ua/cmorfiles/DCPP/EC-Earth-Consortium/EC-Earth3/dcppA-hindcast/$member$/Amon/$var$/gr/v20190713/$var$_Amon_*_s$sdate$-$member$_gr_$leadyear$.nc',
                  var = 'tas',
                  sdate = paste0(2000:2004),
                  month = 'all',
                  lat = values(list(0, 14)),
                  lon = values(list(0, 28)),
                  leadyear = 'all',
                  member = 1:3,
                  leadyear_depends = 'sdate',
                  month_across = 'leadyear',
                  synonims = list(month = c('month', 'time'), lon = c('lon', 'longitude'), lat = c('lat', 'latitude')),
                  return_vars = list(lat = 'dataset', lon = 'dataset'),
                  num_procs = 1, retrieve = FALSE)
```
Cheers,
Carlos

Issue #72: Retrieving multiple models for decadal predictions
(Nuria Pérez-Zanón, 2021-01-14, https://earth.bsc.es/gitlab/es/startR/-/issues/72)

Hi @cdelgado,
Thanks for sharing the [list](https://docs.google.com/spreadsheets/d/1By_m3EMTISS9iHnMy17_NVhYtAulArlsMIst_7SCDLM/edit#gid=0) and characteristics of decadal predictions. The aim of this issue is to check that Start() can retrieve multiple models in a single call. We know it is possible with other forecasts but given the complexity of decadal prediction storage, this exercise is needed.
Given the number of `models`, `variables` and `frequencies` defined in the table, we may need to set priorities for testing. Do you think, @cdelgado, we can discuss this off-line and report here the outcome?
I provide code to verify that two models can be loaded simultaneously for `daily` resolution and the `tasmin` variable.
Given the number of differences between `version`, `grid` and `member`, the output will be filled with NA values. In order to adjust to user needs, @cdelgado, we can consider the [FAQ #8: Define a path with multiple dependencies](https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/faq.md#8-define-a-path-with-multiple-dependencies) and the current open issue #61 about the same topic.
```r
path_list <- list(list(name = 'EC-Earth',
                       path = '/esarchive/exp/ecearth/a1ua/cmorfiles/DCPP/EC-Earth-Consortium/EC-Earth3/dcppA-hindcast/$member$/day/$var$/$grid$/$version$/$var$_day_EC-Earth3_dcppA-hindcast_s$sdate$-$member$_$grid$_$fyear$.nc'),
                  list(name = 'HadGEM3',
                       path = '/esarchive/exp/CMIP6/dcppA-hindcast/hadgem3-gc31-mm/cmip6-dcppA-hindcast_i1p1/DCPP/MOHC/HadGEM3-GC31-MM/dcppA-hindcast/$member$/day/$var$/$grid$/$version$/$var$_day_HadGEM3-GC31-MM_dcppA-hidcast_s$sdate$_$member$_$grid$_$fyear$.nc'))
data <- Start(dataset = path_list,
              var = 'tasmin',
              grid = c('gr', 'gn'),
              version = c('v20190713', 'v20200101'),
              sdate = paste0(2018),
              fmonth = 'all',
              lat = values(list(0, 14)),
              lon = values(list(0, 28)),
              fyear = indices(1:2),
              member = c('r1i1p1f1', 'r1i1p1f2'),
              fyear_depends = 'sdate',
              fmonth_across = 'fyear',
              merge_across_dims = TRUE,
              synonims = list(fmonth = c('fmonth', 'time'),
                              lon = c('lon', 'longitude'), lat = c('lat', 'latitude')),
              transform = CDORemapper,
              transform_extra_cells = 2,
              transform_params = list(grid = 'r200x100', method = 'conservative', crop = c(0, 28, 0, 14)),
              transform_vars = c('lat', 'lon'),
              return_vars = list(lat = 'dataset', lon = 'dataset'),
              lat_reorder = Sort(),
              num_procs = 1, retrieve = FALSE)
attributes(data)$ExpectedFiles
```
Cheers,
Núria
FYI @pabretonniere, this issue is our next step in the data convention.

Issue #83: Start(): The usage and problems of parameter *_var
(aho, 2021-01-25, https://earth.bsc.es/gitlab/es/startR/-/issues/83)

Hi @nperez,
Sorry for my insistence, but since I kept coming across the code relevant to the parameter *_var in Start(), I want to clarify its usage (and its problematic parts). I summarize what I've found so far here.
**1. The usage of \*_var**
The documentation says:
> The name of the associated coordinate variable must be a character string
with the name of an associated coordinate variable to be found in the data files
(in all* of them). For this to work, a ’file_var_reader’ function must be specified
when calling Start() (see parameter ’file_var_reader’). The coordinate variable
must also be requested in the parameter ’return_vars’ (see its section for details).
This feature only works for inner dimensions.
Take one file for example: `/esarchive/exp/ecmwf/system5_m1/monthly_mean/tas_f6h/tas_20080301.nc`. Using ncdump, we can see the file looks like this:
```
netcdf tas_20080301 {
dimensions:
ensemble = 25 ;
latitude = 640 ;
longitude = 1296 ;
time = UNLIMITED ; // (7 currently)
variables:
int realization(ensemble) ;
double latitude(latitude) ;
latitude:standard_name = "latitude" ;
latitude:long_name = "latitude" ;
latitude:units = "degrees_north" ;
latitude:axis = "Y" ;
double longitude(longitude) ;
longitude:standard_name = "longitude" ;
longitude:long_name = "longitude" ;
longitude:units = "degrees_east" ;
longitude:axis = "X" ;
float tas(time, ensemble, latitude, longitude) ;
tas:long_name = "2 metre temperature" ;
tas:code = 167 ;
tas:table = 128 ;
tas:grid_type = "gaussian" ;
tas:units = "K" ;
double time(time) ;
time:standard_name = "time" ;
time:units = "hours since 2008-03-01 00:00:00" ;
time:calendar = "proleptic_gregorian" ;
...
```
The coordinate variables include realization, latitude, longitude, and time. As we know, most of the time we don't need to worry about this parameter, because the dimension name is the same as the name of the coordinate variable. Therefore, if needed, Start() will automatically add `time_var = 'time'` or `latitude_var = 'latitude'` when running and return a warning like: `Warning: Found specified values for dimension 'time' but no 'time_var' requested. "time_var = 'time'" has been automatically added to the Start call.` *(ex3)*
However, the dimension name 'ensemble' is different from its corresponding coordinate variable 'realization'. If the selector is assigned with **values**, then we need to use `ensemble_var = 'realization'` *(ex2)*.
So, what if the selector type is not 'values' but 'indices' or a character string like 'all'? Start() may or may not return an error *(ex4, 5)*. It depends on how you define the parameter 'return_vars' (see point 2 below). But the takeaway is that ***_var is not needed if the selector type is not 'values'**.
**2. The cooperation with parameter 'return_vars'**
From the documentation above, we know that *_var has a certain relation with 'return_vars'. I list some points here:
- *_var will be automatically added to the return_vars list even if you don't add it yourself. Start() will set the value to NULL and return a warning like `Warning: All '*_var' params must associate a dimension to one of the requested variables in 'return_vars'. The following variables have been added to 'return_vars': 'time'` *(ex2, 3)*.
- The following situation leads to an error: *time selector type is not 'values' + time_var = 'time' + time's value in return_vars is not NULL (e.g., time = 'sdate')*. The error is: ` Provided selectors for the dimension 'time' must have as many file dimensions as the variable the dimension is defined along, 'time', with the exceptions of the file pattern dimension ('dat') and any depended file dimension (if specified as depended dimension in parameter 'inner_dims_across_files' and the depending file dimension is present in the provided selector array).` *(ex5)*. I don't quite understand this message.
- We can fix the above situation by adding a dimension name to the time selector *(ex6)*. By this means, the selector type is still indices, but the selector carries the dimension name 'sdate', which is also the value in return_vars.
However, from my understanding the above situation is not legitimate usage; this fix is just a workaround without real meaning.
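For reference, the trick in ex6 relies only on base R semantics: an indices selector can be an array whose dimension names Start() then matches against the file dimensions. A minimal standalone sketch (the names `time` and `sdate` mirror ex6):

```r
# An indices selector that carries dimension names:
# 4 time steps per start date, for 2 start dates.
time_sel <- array(rep(1:4, 2), dim = c(time = 4, sdate = 2))
names(dim(time_sel))  # "time" "sdate"
```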
**3. Problem and proposed solution**
From the code, I gather that *_var should only be used when the selector type is values. However, the current code doesn't return a warning or error when the selector type is indices and *_var is assigned. It either leads to an irrelevant error (like the one in point 2 above) or happens to run well because return_vars coincidentally cooperates.
Since I haven't fully understood the error message, I don't want to change it. To prevent confusion, we can: 1. document that *_var is only used when the selector type is values; 2. remove the assigned *_var if the selector type is indices and show a warning.
**Examples**
```r
library(startR)
# Get time values for later use
repos <- '/esarchive/exp/ecmwf/system5_m1/monthly_mean/$var$_f6h/$var$_$sdate$.nc'
data <- Start(dat = repos,
var = 'tas',
sdate = c('20170101', '20180101'),
ensemble = indices(1),
time = indices(1:4),
latitude = indices(1), longitude = indices(1),
return_vars = list(time = 'sdate'),
retrieve = F)
time_val <- attr(data, 'Variables')$common$time
# The arguments which won't change in the tests
basic_list <- list(
dat = '/esarchive/exp/ecmwf/system5_m1/monthly_mean/$var$_f6h/$var$_$sdate$.nc',
var = 'tas',
sdate = c('20170101', '20180101'),
latitude = indices(1:3),
longitude = indices(1:2),
retrieve = F
)
# The tests with different arguments
test_batteries <- list(
# 1: ensemble and time are indices. no *_var assigned
c(basic_list, list(ensemble = c(1, 3)), list(time = indices(1:4))),
# 2: ensemble is values. ensemble_var assigned.
c(basic_list, list(ensemble = values(c(1, 3))), list(time = indices(1:4)),
ensemble_var = 'realization'),
# 3: ensemble and time are values. ensemble_var assigned. time_var will be added automatically.
c(basic_list, list(ensemble = values(c(1, 3))), list(time = time_val),
ensemble_var = 'realization'),
# 4: same as 2 but time_var is assigned, and return_vars = list(time = NULL).
c(basic_list, list(ensemble = values(c(1, 3))), list(time = indices(1:4)),
ensemble_var = 'realization', time_var = 'time',
list(return_vars = list(time = NULL))),
# 5: same as 4 but return_var = list(time = 'sdate'). ERROR!!
c(basic_list, list(ensemble = values(c(1, 3))), list(time = indices(1:4)),
ensemble_var = 'realization', time_var = 'time',
list(return_vars = list(time = 'sdate'))),
# 6: same as 5 but time with dim.
c(basic_list, list(ensemble = values(c(1, 3))), list(time = array(1:4, dim = c(time = 4, sdate = 2))),
ensemble_var = 'realization', time_var = 'time',
list(return_vars = list(time = 'sdate')))
)
# Run the tests
for (battery_ind in 1:length(test_batteries)) {
  cat(paste0("Test ", battery_ind, "...\n"))
  data <- do.call(Start, test_batteries[[battery_ind]])
}
warnings()
```
Cheers,
An-Chi

Issue #100: Unclear error message when using sdate in the file wildcard, and sdate is also a dimension of the provided time values
(Llorenç Lledó, 2022-04-08, https://earth.bsc.es/gitlab/es/startR/-/issues/100)

Hi, when running a case very similar to example 2 (https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/usecase/ex1_2_exp_obs_attr.R), I used `sdate` instead of `date` in the observations wildcard (and the subsequent startR call), and I got an error:
```
Error in Start(sdate = unique(format(vdates, "%Y%m")), time = values(vdates), :
Size of selector file dimensions must mach size of requested variable dimensions.
```
The error is unclear, and probably the problem comes from the fact that `sdate` is already a dimension name of the `vdates` array that is used to define times to select in `time=values(vdates)`.
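To illustrate the clash with a hypothetical stand-in for `vdates`: the array of requested time values already carries a dimension named `sdate`, so declaring a file dimension `sdate` in the same Start() call collides with it.

```r
# Stand-in for the verification-dates array: one row per start date.
vdates <- array(seq_len(6), dim = c(sdate = 2, time = 3))
names(dim(vdates))  # "sdate" "time": the name 'sdate' is already taken
```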
Maybe we should document that the wildcard for the observations cannot be the same as the one used for the model in this example.

Issue #103: StartR and s2dv return different values
(acarreri, 2021-06-16, https://earth.bsc.es/gitlab/es/startR/-/issues/103)

Hi @aho, @nperez,
I tried two different ways of loading data, one with startR and one with s2dv. I think I ask for a similar interpolation, but they don't return the same values. I use R/3.6.1-foss-2015a-bare on the WS.
My codes:
```r
library(startR)
rm(list = ls())
gc()
vari <- 'tos'
path_obs <- paste0('/esarchive/recon/ecmwf/era5/monthly_mean/', vari, '_f1h-r1440x721cds/$var$_$year$$month$.nc')
startdates_obs <- paste0(c(1980:1982))
lat_min <- -90
lat_max <- 90
lon_min <- 0
lon_max <- 360
data_obs <- startR::Start(dat = path_obs,
                          var = vari,
                          year = startdates_obs,
                          time = 'all',
                          lat = values(list(lat_min, lat_max)),
                          lon = values(list(lon_min, lon_max)),
                          month = 'all',
                          month_depends = 'year', # what happens if I put 'time' here?
                          time_across = 'month',
                          merge_across_dims = TRUE,
                          merge_across_dims_narm = TRUE,
                          transform = startR::CDORemapper,
                          transform_extra_cells = 2,
                          transform_params = list(grid = 'r360x180',
                                                  method = 'conservative',
                                                  crop = c(lon_min, lon_max,
                                                           lat_min, lat_max)),
                          transform_vars = c('lat', 'lon'),
                          synonims = list(lon = c('lon', 'longitude'),
                                          lat = c('lat', 'latitude')),
                          return_vars = list(lat = 'dat',
                                             lon = 'dat',
                                             time = c('year', 'month')),
                          #lon_reorder = CircularSort(-180, 180),
                          lat_reorder = Sort(decreasing = TRUE),
                          retrieve = TRUE)
```
and
```r
library(maps)
library(s2dv)
# Clean memory
rm(list = ls())
gc()
vari <- 'tos'
startdates_obs <- paste0(c(1980:1982), '0101')
obs_era <- list(name = 'era5',
                path = file.path('/esarchive/recon/ecmwf/era5/monthly_mean',
                                 '$VAR_NAME$_f1h-r1440x721cds/$VAR_NAME$_$YEAR$$MONTH$.nc'))
data_obs <- s2dv::Load(var = vari,
                       exp = NULL,
                       obs = list(obs_era),
                       sdates = startdates_obs,
                       nmember = 1,
                       leadtimemin = 1,
                       leadtimemax = 12,
                       output = 'lonlat',
                       grid = 'r360x180',
                       method = 'conservative',
                       storefreq = 'monthly',
                       nprocs = 1)
```
I checked that the retrieved lon and lat of both data are the same.
But I got different results:
- with StartR:
![image](/uploads/b65f50dc6f46cbc5f570e225407d227e/image.png)
- with s2dv:
![image](/uploads/e43f3234df3d02dd7b586d2028074f13/image.png)
Maybe I'm doing something wrong, but I can't see where.
Many thanks!
Aude

Issue #104: Three dependencies on file pattern
(Nuria Pérez-Zanón, 2021-07-08, assignee: aho, https://earth.bsc.es/gitlab/es/startR/-/issues/104)

Hi @aho,
There is one case in which we need to load files from different versions for different start dates. Given that they are DCPP files, they also have a chunk label.
I have been exploring different possibilities but I haven't been able to load the data. I guess it is not possible because three 'depends' would be needed; FAQ #8 says that two dependencies is the maximum.
I have also explored `path_glob_permissive` to try to avoid the 'chunk_depends', but it only takes the first pattern found (as far as I understood from the documentation).
In this case, sdates 1962 and 1981 should be loaded from v20200731 and sdate 1990 from v20200101.
A script (with many commented lines, apologies) is here: `/esarchive/scratch/nperez/git/Flor/startR/Roberto_issueDepends.R`
I have also unsuccessfully tried to use multiStart. In this case, I had to add a line to the multiStart code, `.warning <- startR:::.warning`, since when sourcing the files it complained that there was no function .warning.
After that, I defined two paths:
```r
# Note: the second list() wrapper in the original was misplaced; both datasets
# should be name/path lists at the same level. paste0() takes no 'sep' argument,
# so it has been dropped.
repos <- list(list(name = 'v20200731',
                   path = paste0('/esarchive/exp/cesm1-1-cam5-cmip5/cmip6-dcppA-hindcast_i1p1/original_files/cmorfiles/DCPP/NCAR/CESM1-1-CAM5-CMIP5/dcppA-hindcast/$memb$/',
                                 mod, '/$var$/', grid, '/v20200731/$var$_', mod,
                                 '_CESM1-1-CAM5-CMIP5_dcppA-hindcast_s$sdate$-$memb$_', grid, '_$chunk$.nc')),
              list(name = 'v20200101',
                   path = paste0('/esarchive/exp/cesm1-1-cam5-cmip5/cmip6-dcppA-hindcast_i1p1/original_files/cmorfiles/DCPP/NCAR/CESM1-1-CAM5-CMIP5/dcppA-hindcast/$memb$/', mod,
                                 '/$var$/', grid, '/v20200101/$var$_', mod,
                                 '_CESM1-1-CAM5-CMIP5_dcppA-hindcast_s$sdate$-$memb$_', grid, '_$chunk$.nc')))
```
and two sdate selectors:
```r
sdate = list(list(name = 'v20200731', sdate = indices(c(1,8))),
list(name = 'v20200101', sdate = indices(c(1)))),
```
I think it would be good if you could take a look: first, to verify that I am understanding the issue correctly, and second, to discuss later whether Start() should deal with three dependencies or whether multiStart is a good fit for this case.
I hope to talk to you soon.
Cheers,
Núria

Issue #109: AddStep(): Return a list with the same necessary info for each output
(aho, 2021-07-09, https://earth.bsc.es/gitlab/es/startR/-/issues/109)

When the self-defined function returns more than one output, the workflow built by AddStep() is a list instead of an object of class "startR_workflow". This doesn't cause a problem because in Compute() we can use an arbitrary item from the list, e.g., `workflow$output1`, which has the class "startR_workflow". However, this 'trick' is neither intuitive nor documented anywhere.
Take [ex2_11](inst/doc/usecase/ex2_11_two_dat_inconsistent_target_dim.R) for example. The function returns two outputs, `ind_exp` and `ind_obs`, which have the same dimensions. The workflow is a list of 2, and both items are of class "startR_workflow".
```r
class(workflow)
[1] "list"
class(workflow$ind_exp)
[1] "startR_workflow"
class(workflow$ind_obs)
[1] "startR_workflow"
```
If we put `workflow` as the input of Compute(), we'll get an error:
> Parameter 'workflow' must be an object of class 'startR_cube' as returned by Start or of class 'startR_workflow' as returned by AddStep.
We can use either `workflow$ind_exp` or `workflow$ind_obs` instead and the result will be correct.
In fact, `workflow$ind_exp` and `workflow$ind_obs` are identical. Even if the two outputs don't share the same dimensions, the information from `workflow$ind_exp` and `workflow$ind_obs` that Compute() needs is still the same. For example, if I change the dimension of `ind_obs` to `[asd = 2]` (while `ind_exp` has `[sdate = 4]`), the only difference between `workflow$ind_exp` and `workflow$ind_obs` is `attributes(workflow$ind_obs)$Dimensions`. But this is not used in Compute() at all, so the difference has no impact.
```r
attributes(workflow$ind_obs)$Dimensions
asd dat var
NA 1 1
attributes(workflow$ind_exp)$Dimensions
sdate dat var
NA 1 1
```
I guess there must be a reason why startR creates a workflow for each output (though the workflows are almost the same), but for now, I cannot find any potential problem if AddStep() only returns the first output as the representative. The relevant code is here: https://earth.bsc.es/gitlab/es/startR/-/blob/master/R/AddStep.R#L122-142.
To avoid the error message above, we can add an additional check in Compute(), like:
```r
if (!any(c('startR_cube', 'startR_workflow') %in% class(workflow))) {
  # sapply() (not lapply()) so the class check reduces to a logical vector
  if (all(sapply(workflow, inherits, what = c('startR_cube', 'startR_workflow')))) {
    workflow <- workflow[[1]]
    .warning("Parameter 'workflow' is a list but it contains multiple items of class 'startR_workflow' or 'startR_cube'. Use the first item in the list as the workflow.")
  }
}
```
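A quick sanity check of that condition, using mock objects in place of real startR workflows (the classes are all the check looks at):

```r
# Mock: two outputs of class 'startR_workflow' inside a plain list
w <- structure(list(), class = 'startR_workflow')
workflow <- list(ind_exp = w, ind_obs = w)
# The outer object is not a workflow itself...
any(c('startR_cube', 'startR_workflow') %in% class(workflow))               # FALSE
# ...but every element is, so the fallback would pick workflow[[1]]
all(sapply(workflow, inherits, what = c('startR_cube', 'startR_workflow'))) # TRUE
```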
But for now, I keep the function as it is and see if we have any new findings in the future.
An-Chi

Issue #112: Start() returns an error when transforming to a coarser grid and the transformed grid number is 1
(aho, 2021-08-23, https://earth.bsc.es/gitlab/es/startR/-/issues/112)

The error happens when transforming to a coarser grid and the selected range has only 1 value after transformation. The error comes from selector_checker()**, which expects more than one transformed grid value.
The example script is as below. The longitude grid number becomes 1 after transformation.
```r
lons.min <- 350
lons.max <- 355
lats.min <- 20
lats.max <- 40
exp <- Start(dat = '/esarchive/exp/ecmwf/system5_m1/monthly_mean/$var$_f6h/$var$_$sdate$.nc',
             var = 'tas',
             sdate = '20000101',
             ensemble = indices(1),
             time = indices(1),
             latitude = values(list(lats.min, lats.max)),
             latitude_reorder = Sort(),
             longitude = values(list(lons.min, lons.max)),
             longitude_reorder = CircularSort(0, 360),
             transform = CDORemapper,
             transform_params = list(grid = 'r100x50',
                                     method = 'con',
                                     crop = c(lons.min, lons.max, lats.min, lats.max)),
             transform_vars = c('latitude', 'longitude'),
             transform_extra_cells = 2,
             synonims = list(latitude = c('lat', 'latitude'),
                             longitude = c('longitude', 'lon')),
             return_vars = list(latitude = NULL,
                                longitude = NULL,
                                time = 'sdate'),
             retrieve = TRUE)
```
> Error in if (var[1] < var[2]) { : missing value where TRUE/FALSE needed
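The failure can be reproduced in plain base R, independent of startR: when only one grid value remains, indexing the second element yields NA, and `if()` aborts on the NA comparison (a minimal sketch of the failing pattern, not the actual Start() internals):

```r
var <- c(350.5, 353.9)
var[1] < var[2]  # TRUE: with two grid values, the direction check works
var <- c(350.5)  # only one transformed grid value
# var[2] is NA, so the comparison is NA and if() throws
# "missing value where TRUE/FALSE needed", the error seen above.
try(if (var[1] < var[2]) cat('ascending\n'))
```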
**The Start() code where the error happens is:
```r
sub_array_of_sri <- selector_checker(sub_array_of_selectors, transformed_subset_var,
tolerance = if (aiat) {
tolerance_params[[inner_dim]]
} else {
NULL
})
```

Issue #115: Interpolation of decadal original files into a regular grid avoiding vertical empty lines and artifacts
(Nuria Pérez-Zanón, 2021-09-30, https://earth.bsc.es/gitlab/es/startR/-/issues/115)

Hi @rfernand,
Finally, we are able to load and regrid the IPSL files avoiding artifacts with the code:
```r
library(startR)
library(s2dverification)
# Path to original cmorfiles:
path <- '/esarchive/exp/ipsl-cm6a-lr/cmip6-dcppA-hindcast_i1p1/original_files/cmorfiles/DCPP/IPSL/IPSL-CM6A-LR/dcppA-hindcast/r1i1p1f1/SImon/siconc/gn/v20200101/$var$_SImon_IPSL-CM6A-LR_dcppA-hindcast_s2016-r1i1p1f1_gn_201701-202612.nc'
# Regrid:
data <- Start(dat = path,
              var = 'siconc',
              x = indices(2:361),
              y = indices(2:331),
              time = 'all',
              return_vars = list(nav_lat = NULL, nav_lon = NULL),
              retrieve = TRUE)
nav_lon <- attributes(data)$Variables$common$nav_lon[2:361, 2:331]
nav_lat <- attributes(data)$Variables$common$nav_lat[2:361, 2:331]
res <- CDORemap(data[1, 1, , , 5],
                nav_lon,
                nav_lat,
                grid = 'r360x180',
                method = 'bicubic', crop = FALSE)
dev.new()
image(res$lons, res$lats, res$data_array, main = "Regridded without the 1st and last indices")
```
![Captura_de_pantalla_2021-09-22_a_las_12.25.56](/uploads/e954e633b5ab213b92b11dd8440b9011/Captura_de_pantalla_2021-09-22_a_las_12.25.56.png)
The figure shows the original data (top-left), the artifacts (top-right) and the results excluding some indices before the interpolation with 'bicubic' or 'bilinear' methods.
This issue is related to https://earth.bsc.es/gitlab/es/s2dverification/-/issues/259 but at that time the files were processed by @pabretonniere and Marga. It may also be interesting for @cdelgado.
Please, Roberto, let me know about the case of the CMCC data so we can report it here.
With @aho, I would like to discuss in a chat the possibility of creating a FAQ, and of using the 'transform' parameters in Start() instead of CDORemap.
Please don't hesitate to add here any information you may find important.
Thanks, especially to @rfernand for his patience.
Núria

Issue #122: Start(): Error when $var$ is not used in path
(aho, 2021-12-01, https://earth.bsc.es/gitlab/es/startR/-/issues/122)

The wildcard `$var$` must be used in the path. It is a special dimension that serves as both a file dim and an inner dim. If it is not used, Start() may return errors (but not in all cases). See more explanation in the [practical guide](https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/practical_guide.md#start).
Here is an example script that returns the error:
> Error in FUN(X[[i]], ...) :
Could not find dimension 'time' (or its synonims if specified) in the file /esarchive/exp/ecmwf/system5c3s/monthly_mean/tas_f6h/tas_19960501.nc
```r
hcst.path <- "/esarchive/exp/ecmwf/system5c3s/monthly_mean/tas_f6h/tas_$syear$.nc" # changing tas to $var$ solves the problem
variable <- "tas"
hcst.sdates <- paste0(1993:2016, "0501")
dim(hcst.sdates) <- c(syear = length(hcst.sdates)) # named vector, not a list
longitude_reorder <- CircularSort(0, 361)
latitude_reorder <- Sort(decreasing = TRUE)
data <- Start(dat = hcst.path,
              var = variable,
              syear = hcst.sdates,
              time = indices(1:6),
              latitude = values(list(-40, 10)),
              latitude_reorder = latitude_reorder,
              longitude = values(list(0, 60)),
              longitude_reorder = longitude_reorder,
              member = indices(1:25),
              synonims = list(latitude = c('lat', 'latitude'),
                              longitude = c('lon', 'longitude'),
                              member = c('ensemble')),
              return_vars = list(latitude = 'dat',
                                 longitude = 'dat',
                                 time = c('syear')),
              split_multiselected_dims = FALSE,
              retrieve = TRUE)
```
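For context, `$var$` is the wildcard that Start() substitutes with each requested variable name, which is also how the inner variable gets tied to a file. A rough illustration of the substitution alone (not the actual Start() internals):

```r
path_tpl <- "/esarchive/exp/ecmwf/system5c3s/monthly_mean/$var$_f6h/$var$_$syear$.nc"
# Substitute the requested variable into the pattern:
gsub('$var$', 'tas', path_tpl, fixed = TRUE)
# "/esarchive/exp/ecmwf/system5c3s/monthly_mean/tas_f6h/tas_$syear$.nc"
```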
issue #80 was also caused by the missing `$var$`, while the error message is different.
**TODO**:
(1) Test different cases to see if the problematic parts differ.
(2) Add a check for the existence of $var$; if it is missing, return an error.

Issue #130: Help using startR to run Climpact script
(pdeluca, 2022-02-09, https://earth.bsc.es/gitlab/es/startR/-/issues/130)

Hello @aho,
I am very new to `startR` and I would really appreciate any help.
I would like to use `startR` to run, in the most efficient way on Nord3v2, a Climpact function from the `climdex.pcic.ncdf` R package.
Below is my script so far, which however is not yet correct. I am using only two models for one SSP scenario, and I would like to compute all the Climpact indices available on the original models' grids.
```
library(startR)
###1 Load data (does not return error but I am not sure if original horizontal resolution is kept)
data_load<- Start(dat = "/esarchive/scratch/pdeluca/landmarc/def/tasmax_tasmin_pr_$model$_ssp126_1949_2100.nc",
var = c('tasmax', 'tasmin', 'pr'),
model = c('CanESM5', 'INM-CM4-8'),
#ssp = 'all',
lat = 'all',
lon = 'all',
transform = NULL,
synonims = list(lon = c('lon', 'longitude'),
lat = c('lat', 'latitude')),
return_vars = list(time = NULL,
lat = 'dat',
lon = 'dat'),
retrieve = FALSE)
###2 Define function (to adjust)
library(climdex.pcic.ncdf)
library(stringr)
setwd("/esarchive/scratch/pdeluca/scripts/climpact-master")
create.indices.from.files(input.files= !!these are the datasets loaded in the previous step!!,
out.dir="/esarchive/scratch/pdeluca/landmarc/out/",
output.filename.template=paste("all_daily_cmip6_ssp126_",
!!here the models names in loaded in the previous step!!, "_1949-2100.nc", sep = ""),
author.data=list(institution="My University", institution_id="MU"),
climdex.vars.subset = NULL,
climdex.time.resolution = "all",
variable.name.map = c(tmax = "tasmax",
tmin = "tasmin",
prec = "pr"),
axis.to.split.on = "Y",
fclimdex.compatible = FALSE,
base.range = c(1981,2010),
parallel = 16,
verbose = TRUE,
thresholds.files = NULL,
thresholds.name.map = c(
tx05thresh= "tx05thresh",
tx10thresh ="tx10thresh",
tx50thresh = "tx50thresh",
tn05thresh = "tn05thresh",
tn10thresh = "tn10thresh",
tn50thresh = "tn50thresh",
tx90thresh = "tx90thresh",
tx95thresh = "tx95thresh",
tn90thresh = "tn90thresh",
tn95thresh = "tn95thresh",
tx90thresh_15days = "tx90thresh_15days",
tn90thresh_15days = "tn90thresh_15days",
tavg90thresh_15days = "tavg90thresh_15days",
tavg05thresh = "tavg05thresh",
tavg95thresh = "tavg95thresh",
r95thresh = "r95thresh",
r99thresh = "r99thresh",
txraw = "txraw",
tnraw = "tnraw",
precraw = "precraw"),
max.vals.millions = 30, cluster.type = "SOCK")
###3 Define the workflow (to adjust)
step <- Step(fun = climdex_ind,
target_dims = list(tx = c('lat', 'time'),
tn = c('lat', 'time'),
pr = c('lat', 'time')),
output_dims = list(fd = c('lat', 'time'),
cdd = c('lat','time')))
wf <- AddStep(inputs = list(hist_tx, hist_tn, hist_pr), step,
times = time, lats = lat)
###4 Define user and submit jobs (to adjust)
queue_host = 'nord3' #your own host name for power9
temp_dir = '/gpfs/scratch/bsc32/bsc32339/startR_hpc/'
ecflow_suite_dir = '/home/Earth/nperez/startR_local/'
res <- Compute(wf$fd,
chunks = list(lon = 2),
threads_load = 1,
threads_compute = 4,
cluster = list(queue_host = queue_host,
queue_type = 'lsf',
extra_queue_params = list('#BSUB -q bsc_es'),
cores_per_job = 4,
temp_dir = temp_dir,
polling_period = 10,
job_wallclock = '01:00',
max_jobs = 2,
bidirectional = FALSE),
ecflow_suite_dir = ecflow_suite_dir,
wait = TRUE)
```
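As a side note on part 3 of the script: Step() wraps a function that consumes whole `target_dims` and emits `output_dims`, which is conceptually like base R's apply() over the remaining margins. A self-contained sketch (base R only, not startR internals):

```r
# Sketch: a 'step' function consumes the target dim (here 'time'), while the
# remaining margins (lat, lon) are looped over, as Compute() does per chunk.
x <- array(1:24, dim = c(time = 4, lat = 2, lon = 3))
step_fun <- function(v) mean(v)     # collapses the target dim to a scalar
out <- apply(x, c(2, 3), step_fun)  # keep the lat and lon margins
dim(out)   # 2 3
out[1, 1]  # mean of x[, 1, 1] = mean(1:4) = 2.5
```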
Thanks
paolo

https://earth.bsc.es/gitlab/es/startR/-/issues/142
splitting multiple dimensions in Start Call (lpalma, 2022-04-05)

Hi @aho,
I have this Start call that loads daily data. My initial code used indices to load the correct times, but I prefer to pass dates, and to do so I need to add a multidimensional array with the corresponding dates for each sdate. The problem I'm facing is that I also need to use a multidimensional array to define the sdates in the subseasonal case. You already explained to me that the current version of startR is only capable of splitting one dimension, so I am creating this issue to explore, in the future, the capability of splitting multiple dimensions.
Below you can see a use case to test this:
```r
library(startR)
library(lubridate)  # for ymd(), days(), years()
circularsort <- CircularSort(0, 360)  # reorder object referenced in the Start() call below
##############################################################################################
## Time and sdate are defined using multidimensional arrays
## DOES NOT WORK
##############################################################################################
sdates <- array(c("20010501","20020501","20030501"),
dim = c(sday = 1, sdate = 3))
times <- array(ymd("20010501") + days(0:30) + rep(years(0:2), each = 31),
dim = c(time = 31, sdate = 3, sday = 1))
##############################################################################################
## Only Time is defined using multidimensional arrays
## DOES WORK
##############################################################################################
sdates <- c("20010501","20020501","20030501")
times <- array(ymd("20010501") + days(0:30) + rep(years(0:2), each = 31),
dim = c(time = 31, file_date = 3))
times <- as.POSIXct(times*86400, tz = 'UTC',
origin = '1970-01-01')
hcst <- Start(dat = "/esarchive/exp/ecmwf/system5c3s/daily_mean/$var$_f6h/$var$_$file_date$.nc",
var = "tas",
file_date = sdates,
time = times,
latitude = values(list(40, 48)),
latitude_reorder = Sort(),
longitude = values(list(10, 20)),
longitude_reorder = circularsort,
ensemble = indices(1:10),
return_vars = list(latitude = 'dat',
longitude = 'dat',
time = 'file_date'),
split_multiselected_dims = TRUE,
retrieve = TRUE)
```
Thanks in advance,

https://earth.bsc.es/gitlab/es/startR/-/issues/151
Make returning metadata an open option (aho, 2022-05-06)

In the new version 2.2.0-1, the development of metadata reshaping makes the metadata correct but also makes the function slower. Since a certain startR version, to ensure metadata are correctly returned, Start() automatically adds items to `return_vars`, or changes the values of `return_vars`, under several conditions, including when the dimension is: (1) assigned by values, not indices; (2) reordered; (3) dependent on other dims; (4) reshaped.
To improve efficiency, it would be better to remove these obligations. If metadata is needed, users should write the Start call so as to obtain it correctly; if users want better performance, they can check the data beforehand and not fetch metadata along with the data in the computation.

https://earth.bsc.es/gitlab/es/startR/-/issues/152
Compute jobs stuck with time limits (acarreri, 2022-05-25)

Hi @aho,
When a script is launched with Compute() and each chunk takes longer than the job_wallclock, the script stays stuck: no failure, no error message in the terminal. In the ecFlow interface, the job stays green.
Is there a way to receive any message or indicator to let us know that something's wrong in that case?
Thanks
Aude

https://earth.bsc.es/gitlab/es/startR/-/issues/153
Collect: Can't collect and return the output when all the chunks are finished (aho, 2023-12-20)

Sometimes, with `Compute(wait = FALSE)` and Collect() of the results later on, Collect() keeps running because it thinks that the computation is still in progress, even though on the ecFlow UI all the chunks are finished. The pattern hasn't been identified yet; it is not clear which step causes the mistake.
The problematic part is the variable values at [line 238](https://earth.bsc.es/gitlab/es/startR/-/blob/master/R/Collect.R#L238), `if (sum_received_chunks / num_outputs == prod(unlist(chunks)))`:
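The condition reduces to plain arithmetic; with the values printed below (0 chunks received, 3 outputs, 1 x 1 x 10 x 10 chunks) it evaluates to FALSE, which is why Collect() never returns. A sketch of the check:

```r
# The completion test in Collect(), reproduced with the observed values.
sum_received_chunks <- 0
num_outputs <- 3
chunks <- c(dat = 1, var = 1, latitude = 10, longitude = 10)
sum_received_chunks / num_outputs == prod(chunks)  # FALSE: 0 != 100
# A run that had really received all chunks (3 outputs x 100 chunks) would give:
300 / num_outputs == prod(chunks)                  # TRUE
```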
```r
[1] "sum_received_chunks"
[1] 0
[1] "num_outputs"
[1] 3
[1] "unlist(chunks)"
dat var latitude longitude
1 1 10 10
```
Since the calculation doesn't fulfill the if statement, Collect() doesn't collect the data and keep waiting for computation to be finished. If I tune the code and make Collect() return the output anyway, the returned output is correct. So the problem is the incorrect variables.https://earth.bsc.es/gitlab/es/startR/-/issues/159Start() regrid combined with split_multiselected_dims fails2022-07-21T18:01:05+02:00Nuria Pérez-ZanónStart() regrid combined with split_multiselected_dims failsHi @aho
This is not an urgent issue because I know I can do the regrid after loading the data.
I might be doing something wrong because the following code fails:
```
library(startR)
lonmin <- -11.5
lonmax <- 5.35
latmin <- 35.1
latma...Hi @aho
This is not an urgent issue because I know I can do the regrid after loading the data.
I might be doing something wrong because the following code fails:
```
library(startR)
library(lubridate)  # for ymd(), months(), years()
lonmin <- -11.5
lonmax <- 5.35
latmin <- 35.1
latmax <- 44.1
sdates_obs <- format(ymd("20000401") + months(0:2) + rep(years(0:2), each=3), "%Y%m")
dim(sdates_obs) <- c(month = 3, year = 3)
obs <- Start(dat =
'/esarchive/recon/ecmwf/era5/daily_mean/$var$_f1h/$var$_$sdate$.nc',
var = 'tas', time = 'all',
sdate = sdates_obs,
latitude = values(list(latmin, latmax)),
latitude_reorder = Sort(decreasing = FALSE),
longitude = values(list(lonmin, lonmax)),
longitude_reorder = CircularSort(-180, 180),
transform = CDORemapper,
transform_extra_cells = 2,
transform_params = list(grid =
'/esarchive/exp/ecmwf/system5c3s/daily_mean/tas_f6h/tas_20020501.nc',
method = 'conservative',
crop = c(lonmin, lonmax, latmin, latmax)),
transform_vars = c('latitude', 'longitude'),
split_multiselected_dims = TRUE,
synonims = list(var = c('var','variable'),
longitude = c('lon', 'longitude'),
latitude = c('lat', 'latitude')),
return_vars = list(latitude = 'dat', longitude = 'dat'),
num_procs = 1, retrieve = TRUE)
```
While removing the transform parameters and return_vars works:
```
obs <- Start(dat =
'/esarchive/recon/ecmwf/era5/daily_mean/$var$_f1h/$var$_$sdate$.nc',
var = 'tas', time = 'all',
#time = indices((30 - window):(60 + window),
#time_across = 'month',
sdate = sdates_obs,
latitude = values(list(latmin, latmax)),
latitude_reorder = Sort(decreasing = FALSE),
longitude = values(list(lonmin, lonmax)),
longitude_reorder = CircularSort(-180, 180),
split_multiselected_dims = TRUE,
retrieve = TRUE)
```
Adding back only return_vars fails again:
```
obs <- Start(dat =
'/esarchive/recon/ecmwf/era5/daily_mean/$var$_f1h/$var$_$sdate$.nc',
var = 'tas', time = 'all',
sdate = sdates_obs,
latitude = values(list(latmin, latmax)),
latitude_reorder = Sort(decreasing = FALSE),
longitude = values(list(lonmin, lonmax)),
longitude_reorder = CircularSort(-180, 180),
split_multiselected_dims = TRUE,
synonims = list(var = c('var','variable'),
longitude = c('lon', 'longitude'),
latitude = c('lat', 'latitude')),
return_vars = list(latitude = 'dat', longitude = 'dat'),
num_procs = 1, retrieve = TRUE)
```
Please let me know if you are already aware of this issue (I have found https://earth.bsc.es/gitlab/es/startR/-/issues/139, which may be similar).
Thanks in advance,
Núria

https://earth.bsc.es/gitlab/es/startR/-/issues/161
Split inner dimension while loading data (aho, 2022-09-23)

Hi @mlotto @allabres
Following our discussion about splitting the time dimension into two with Start(), I explored the different usages a bit and I'd like to make a summary here. We only tried loading one file, and managed to split the time dim into c(week, day); but if we want to load more than one file (e.g., year = c("2015", "2016") & month = c("06", "07")), the Start call doesn't work well. Fortunately, there are other possible ways to make it work. I don't want to overwhelm you right now, but when you need it, you can go through the scripts and resources and we can have further discussion.
We first load the data without reshaping; we will compare the reshaped results against it.
```r
path1 <- "/esarchive/recon/ecmwf/era5/daily_mean/$var$_f1h/$var$_$year$$month$.nc"
variable <- "prlr"
# Without reshaping
data1 <- Start(dat = path1,
var = variable,
year = c('2015'), month = c('06', '07'),
time = 'all',
latitude = values(list(0, 6)), latitude_reorder = Sort(decreasing = TRUE),
longitude = values(list(0, 5)), longitude_reorder = CircularSort(0, 360),
synonims = list(latitude = c('lat', 'latitude'), longitude = c('lon', 'longitude')),
return_vars = list(latitude = 'dat', longitude = 'dat',
time = c('year', 'month')),
retrieve = TRUE)
dim(data1)
# dat var year month time latitude longitude
# 1 1 1 2 30 21 18
time1 <- attr(data1, 'Variables')$common$time
dim(time1)
# year month time
# 1 2 30
```
**[Method 1: time selector is an array of indices; split]**
I said that the array must be time values, but I was wrong. It could be indices as well (thanks for this use case, I didn't know Start() could work like this!)
```r
time_arr_ind <- array(1:30, dim = c(day = 10, week = 3))
data3 <- Start(dat = path1,
var = variable,
year = c('2015'), month = c('06', '07'),
time = indices(time_arr_ind), # [day, week]
latitude = values(list(0, 6)), latitude_reorder = Sort(decreasing = TRUE),
longitude = values(list(0, 5)), longitude_reorder = CircularSort(0, 360),
synonims = list(latitude = c('lat', 'latitude'), longitude = c('lon', 'longitude')),
return_vars = list(latitude = 'dat', longitude = 'dat',
time = c('year', 'month')),
split_multiselected_dims = TRUE, #*reshape
retrieve = TRUE)
dim(data3)
# dat var year month day week latitude longitude
# 1 1 1 2 10 3 21 18
time3 <- attr(data3, 'Variables')$common$time
dim(time3)
# year month day week
# 1 2 10 3
identical(as.vector(data1), as.vector(data3))
#[1] TRUE
```
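The split performed here corresponds to a plain column-major reshape in base R; a self-contained sketch of what `split_multiselected_dims` does to the time dimension:

```r
# Splitting time = 30 into c(day = 10, week = 3) is just a dim() assignment:
# values are untouched, only the dimension attribute changes (column-major order).
x <- 1:30                                # stand-in for 30 time steps
y <- array(x, dim = c(day = 10, week = 3))
identical(as.vector(y), x)               # TRUE: same values, new shape
y[3, 2]                                  # 13: day 3 of week 2 is time step 13
```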
**[Method 2: time selector is an array of value; merge & split]**
`merge_across_dims` and `split_multiselected_dims` are used to reshape the data. This usage is more complicated, but useful for loading exp and obs with a consistent structure (see [usecase 1_7](/inst/doc/usecase/ex1_7_split_merge.R)). Notice that `year` and `month` need to be combined, because the `time_across` parameter can only take one dimension.
```r
## Use time1 as the following time selector
time_arr <- array(time1, dim = c(yr_m = 2, time = 10, week = 3))
time_arr <- as.POSIXct(time_arr, origin = '1970-01-01', tz = 'UTC')
path2 <- "/esarchive/recon/ecmwf/era5/daily_mean/$var$_f1h/$var$_$dates$.nc" # use $dates$ instead of $year$$month$
data2 <- Start(dat = path2,
var = variable,
dates = c('201506', '201507'),
time = time_arr, #[yr_m, time, week] # must have 'time' dim
time_across = 'dates', #*reshape
merge_across_dims = TRUE, #*reshape
split_multiselected_dims = TRUE, #*reshape
latitude = values(list(0, 6)), latitude_reorder = Sort(decreasing = TRUE),
longitude = values(list(0, 5)), longitude_reorder = CircularSort(0, 360),
synonims = list(latitude = c('lat', 'latitude'), longitude = c('lon', 'longitude')),
return_vars = list(latitude = 'dat', longitude = 'dat',
time = c('dates')),
retrieve = TRUE)
dim(data2)
# dat var yr_m time week latitude longitude
# 1 1 2 10 3 21 18
time2 <- attr(data2, 'Variables')$common$time
dim(time2)
#yr_m time week
# 2 10 3
identical(as.vector(data1), as.vector(data2))
#[1] TRUE
```
----------------------------------
I'm going to create a use case to show the first method, then I'll close this issue. Let me know if you want to know more at some point.
Best,
An-Chi

https://earth.bsc.es/gitlab/es/startR/-/issues/164
Reduce CDORemapper() warnings when multiple cores are used (aho, 2022-11-07)

Following up #157, the repetitive CDORemapper() warnings are reduced, but the function [.withWarnings()](https://earth.bsc.es/gitlab/es/startR/-/blob/master/R/Utils.R#L863) only works well when one core is used (`num_procs = 1`). When multiple cores are used (see [line 3854](https://earth.bsc.es/gitlab/es/startR/-/blob/master/R/Start.R#L3854-3866)), no warning is returned by `parallel::clusterApplyLB`; the warnings show up later at `bigmemory::as.matrix(data_array)`, e.g., [line 3941](https://earth.bsc.es/gitlab/es/startR/-/blob/master/R/Start.R#L3941). I've tried to use .withWarnings() to catch the warnings there, but it doesn't work. The following code shows the repeated warnings.
```r
# Load data
library(startR)
obs_path <- '/esarchive/recon/ecmwf/era5/monthly_mean/$var$_f1h/$var$_$sdate$.nc'
var_name <- 'sfcWind'
lons.min <- 10
lons.max <- 20
lats.min <- 0
lats.max <- 10
obs <- Start(dat = obs_path,
var = var_name,
sdate = '201811',
time = 'all',
latitude = values(list(lats.min, lats.max)),
latitude_reorder = Sort(decreasing = T),
longitude = values(list(lons.min, lons.max)),
longitude_reorder = CircularSort(0, 360),
synonims = list(longitude = c('lon', 'longitude'),
latitude = c('lat', 'latitude')),
transform = CDORemapper,
transform_extra_cells = 2,
transform_params = list(grid = 'r360x181', method = 'conservative', crop = T),
transform_vars = c('latitude', 'longitude'),
return_vars = list(time = NULL, latitude = 'dat', longitude = 'dat'),
num_procs = 2,
retrieve = T)
```
Before the final warning messages from Start(), two warnings are generated during the data loading process.
```
starting worker pid=21846 on localhost:11793 at 17:06:13.703
starting worker pid=21845 on localhost:11793 at 17:06:13.704
Warning messages:
1: In (function (data_array, variables, file_selectors = NULL, crop_domain = NULL, :
Argument 'crop' in 'transform_params' for CDORemapper() is deprecated. It is automatically assigned as the selected domain in Start() call.
2: ! Warning: CDORemap: Using CDO version 1.9.8.
* Successfully retrieved data.
Warning messages:
...
```
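For reference, a warning-collecting wrapper in the spirit of .withWarnings() can be sketched in base R as below. Note that this pattern only captures warnings raised in the calling process, which is consistent with it failing for `parallel::clusterApplyLB` workers (a sketch, not the startR implementation):

```r
# Collect warnings raised while evaluating 'expr' instead of emitting them.
with_warnings <- function(expr) {
  warns <- list()
  value <- withCallingHandlers(expr, warning = function(w) {
    warns[[length(warns) + 1]] <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  })
  list(value = value, warnings = warns)
}
res <- with_warnings({ as.numeric("a"); 42 })
res$value             # 42
length(res$warnings)  # 1: "NAs introduced by coercion"
```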
**A side note**: When using the whole workflow, the warning messages are generated by each chunk, so if there are 10 warnings (not only the one from CDORemapper) and we chunk the data into 4, at the end of Compute() there are 40 warnings. It is not a bug, since the 4 chunks run independently, so it's normal that they all generate warnings. But if there is a way to improve the user experience, it would be good.

https://earth.bsc.es/gitlab/es/startR/-/issues/167
Poor error message when time selector is an array with dimension named as one file dim (aho, 2022-11-15)

If the time selector is an array and the values are across files, its dimension names cannot be the same as other file dimensions. The following example works if `$sdate$` is changed to another name like `$file_date$`, OR if `time_arr` has dimension names like `[time, date]`.
The error message is:
> Error in Start(dat = repos, var = "tas", sdate = sdates, time = time_arr, :
> Provided indices out of range for dimension 'time' for dataset 'dat1' (accepted range: 1 to 1).
This message does not point clearly to the real problem.
```r
sdates <- paste0("20050", 1:6)
time_arr <- array(1:6, dim = c(time = 2, sdate = 3))
repos <- '/esarchive/recon/ecmwf/erainterim/monthly_mean/$var$_f6h/$var$_$sdate$.nc'
data <- Start(dat = repos,
var = 'tas',
sdate = sdates,
time = time_arr, # [time = 2, sdate = 3]
time_across = 'sdate',
merge_across_dims = T,
split_multiselected_dims = T,
lat = values(list(1, 3)),
lat_reorder = Sort(),
lon = values(list(1, 5)),
lon_reorder = CircularSort(-180, 180),
synonims = list(lat = c('lat', 'latitude'),
lon = c('lon', 'longitude')),
return_vars = list(lon = 'dat', lat = 'dat',
time = 'sdate'),
retrieve = FALSE)
```
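An improved message could come from a pre-flight name check along these lines (hypothetical sketch; a real fix must also keep the implicit-dependency case mentioned below working):

```r
# Hypothetical pre-flight check: stop when an inner-dim selector array reuses
# a file-dimension name, which currently yields a misleading out-of-range error.
check_selector_dims <- function(selector, file_dims) {
  clashes <- intersect(names(dim(selector)), file_dims)
  if (length(clashes) > 0) {
    stop("Selector dimension(s) '", paste(clashes, collapse = "', '"),
         "' have the same name as file dimension(s); please rename them.")
  }
  invisible(TRUE)
}
time_arr <- array(1:6, dim = c(time = 2, sdate = 3))
try(check_selector_dims(time_arr, c("dat", "var", "sdate")))  # errors on 'sdate'
```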
But there is another case, [ex1_13](/inst/doc/usecase/ex1_13_implicit_dependency.R), where the inner dim depends on the file dim. Test that case when making the improvement.

https://earth.bsc.es/gitlab/es/startR/-/issues/170
Start(): Merge dependent path dimensions ("*_depends") (vagudets, 2022-12-28)

Hi @aho,
I was wondering if there's any way to merge or remove dependent dimensions when using `*_depends` if neither is an inner dimension and `*_across` and `merge_across_dims` cannot be used.
For example:
```R
library(startR)
# Define the path
path <- "/esarchive/exp/ecmwf/system5c3s/monthly_mean/$var$_$var_freq$/$var$_$sdate$.nc"
sdate <- "20220101"
var <- c("tas", "prlr")
# Define dependent dimension var_freq
var_freq <- list(tas = "f6h", prlr = "s0-24h")
# Load data
data <- Start(dat = path,
var = var,
sdate = sdate,
var_freq = var_freq,
var_freq_depends = "var",
time = "all",
lat = indices(1:5),
lon = indices(1:5),
ensemble = "all",
return_vars = list(time = "sdate"),
retrieve = TRUE)
## var_freq is now in the data dimensions and in the metadata. I would like to not have it.
dim(data)
# dat var file_date var_freq time lat lon ensemble
# 1 2 1 1 8 5 5 51
```
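Until such an option exists, a post-hoc workaround is to drop the length-1 `var_freq` dimension from the retrieved array; a base-R sketch (hypothetical helper, not a startR function):

```r
# Drop a named length-1 dimension while keeping the remaining dimension names.
drop_dim <- function(x, name) {
  d <- dim(x)
  stopifnot(name %in% names(d), d[[name]] == 1)
  array(x, dim = d[names(d) != name])  # named dims preserved via the subset
}
x <- array(1:16, dim = c(dat = 1, var = 2, var_freq = 1, time = 8))
y <- drop_dim(x, "var_freq")
dim(y)  # dat var time: 1 2 8
```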
Thanks,
Victòria

https://earth.bsc.es/gitlab/es/startR/-/issues/172
Start(): load multiple variables from one file (aho, 2023-12-21)

Start() is supposed to load files that have one "main" variable, and that variable should appear in the file name. For example, tas_200011.nc should have `tas` as the variable to load, and the file name should contain `tas`. The following example loads files that have more than one "main" variable ('mean_bias', 'enscorr', 'rpss', 'crpss', 'enssprerr') and whose file names don't contain `$var$`. It still works without returning errors, but the metadata is not 100% correct (it only loads the first variable under $common), and the returned warnings imply that Start() doesn't expect this file structure.
It may be a good start to develop the feature that loads the files with multiple variables inside.
```r
repos <- '/esarchive/scratch/nmilders/scorecards_data/input_data/cross_validation/ecmwfs5/$clim$/scorecards_ecmwfs5_era5_$clim$-skill_1993-2016_s$smonth$.nc'
repos2 <- '/esarchive/scratch/nmilders/scorecards_data/input_data/cross_validation/dwds2/$clim$/scorecards_dwds2_era5_$clim$-skill_1993-2016_s$smonth$.nc'
# Multiple datasets
data <- Start(dat = list(list(name = 'dwds2', path = repos2),
list(name = 'ecmwfs5', path = repos)),
# outer dimensions
var = c('mean_bias', 'enscorr', 'rpss', 'crpss', 'enssprerr'),
smonth = c(paste0('0', 1:9), 10:12),
clim = c('tas'),
# inner dimensions
time = 'all',
latitude = 'all',
longitude = 'all',
return_vars = list(longitude = 'dat',
latitude = 'dat',
time = NULL
),
retrieve = TRUE)
```

https://earth.bsc.es/gitlab/es/startR/-/issues/174
Make Start() load the monthly data only consider the month value (aho, 2023-12-21)

This issue is for a potential development following the discussion here: https://earth.bsc.es/gitlab/es/startR/-/issues/171#note_197274. When trying to load obs data using the time attribute of exp data as the time parameter input, we have problems if exp doesn't have exactly the same time values as the obs data. For example, if exp monthly data has times "2000-11-30" and "2000-12-31" while obs has "2000-11-01" and "2000-12-01", the retrieved obs data will be wrong, since startR looks for the closest value in the data. For November, obs will get December data, since "2000-12-01" is closer to "2000-11-30" than "2000-12-31" is.
It can be solved by tuning the exp time attributes (see the example below), but one potential development is to tell startR that the data is monthly, so that it only matches the month value rather than the complete time value. We can add a parameter like `time_freq` to Start(); if it is NULL, Start() looks for the closest time value as it does now; if it is "monthly"/"daily"/"hourly", Start() matches the time value down to "month"/"day"/"hour" granularity.
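The proposed month-granularity matching can be sketched in base R: compare formatted "%Y%m" strings instead of picking the closest timestamp (hypothetical logic, not in Start() yet):

```r
# Month-level matching vs the current closest-value matching.
exp_time <- as.POSIXct(c("2000-11-30", "2000-12-31"), tz = "UTC")
obs_time <- as.POSIXct(c("2000-11-01", "2000-12-01"), tz = "UTC")
match_monthly <- function(wanted, available) {
  match(format(wanted, "%Y%m"), format(available, "%Y%m"))
}
match_monthly(exp_time, obs_time)  # 1 2: November correctly pairs with November
# Closest-value matching wrongly picks obs "2000-12-01" for exp "2000-11-30":
sapply(exp_time, function(t) which.min(abs(difftime(obs_time, t))))  # 2 2
```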
```r
library(startR)
library(lubridate)
sdate <- as.vector(sapply(1995:1996, function(x) paste0(x, sprintf('%02d', 1:12), '01')))
exp <- Start(
dat = '/esarchive/exp/ecmwf/system5c3s/monthly_mean/$var$_f6h/$var$_$sdate$.nc',
var = 'tas',
sdate = sdate,
time = seq(1:3),
ensemble = 1,
latitude = indices(1),
longitude = indices(1),
synonims = list(latitude = c('lat', 'latitude'), longitude = c('lon', 'longitude')),
return_vars = list(time = 'sdate', latitude = NULL, longitude = NULL),
retrieve = TRUE)
dates <- attr(exp, 'Variables')$common$time
#===========WORKAROUND======================
# Adjust the day to the middle of the month
dates_mid <- dates - lubridate::days(15)
dim(dates_mid) <- dim(dates)
#===========================================
obs <- Start(
dat = '/esarchive/recon/ecmwf/era5/monthly_mean/$var$_f1h-r1440x721cds/$var$_$file_date$.nc',
var = 'tas',
file_date = unique(format(dates, '%Y%m')),
time = values(dates), #values(dates_mid),
time_across = 'file_date',
merge_across_dims = TRUE,
split_multiselected_dims = TRUE,
latitude = indices(1),
longitude = indices(1),
synonims = list(latitude = c('lat', 'latitude'), longitude = c('lon', 'longitude')),
return_vars = list(time = 'file_date', latitude = NULL, longitude = NULL),
retrieve = TRUE)
obs_dates <- attr(obs, 'Variables')$common$time
```

https://earth.bsc.es/gitlab/es/startR/-/issues/179
Start(): 'caught segfault' error when trying to load more than 16 Gb of data on Nord3v2 highmem node (vagudets, 2024-02-29)

Hi @aho
@nmilders reported that she was having memory issues when loading 24 years x 6 time steps experiment data and regridding it to ERA5 resolution (0.25ºx0.25º) using startR on Nord3v2, even on the high memory nodes (128 Gb available).
The size of the data in my own test was as follows:
```
* Detected dimension sizes:
* dat: 1
* var: 1
* sdate: 24
* time: 6
* latitude: 721
* longitude: 1440
* ensemble: 25
* Total size of involved data:
* 1 x 1 x 24 x 6 x 721 x 1440 x 25 x 8 bytes = 27.8 Gb
```
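The reported total follows directly from the dimension sizes (one double is 8 bytes):

```r
# Reproduce the 27.8 Gb figure from the detected dimension sizes.
dims <- c(dat = 1, var = 1, sdate = 24, time = 6,
          latitude = 721, longitude = 1440, ensemble = 25)
bytes <- prod(dims) * 8
round(bytes / 1024^3, 1)  # 27.8 (Gb)
```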
Here is a reproducible example:
```R
library(startR)
# SEAS5 monthly data
hcst.path <- "/esarchive/exp/ecmwf/system5c3s/monthly_mean/$var$_f6h/$var$_$sdate$0301.nc"
# Regrid to ERA5 resolution
target.grid <- "/esarchive/recon/ecmwf/era5/monthly_mean/tas_f1h-r1440x721cds/tas_201805.nc"
# 24 years of hindcast
start.dates <- as.character(c(1993:2016))
# Region
lat.min <- -90
lat.max <- 90
lon.min <- 0
lon.max <- 359.9
hcst <- Start(dat = hcst.path,
var = 'tas',
sdate = start.dates,
time = 1:6,
latitude = values(list(lat.min, lat.max)),
latitude_reorder = Sort(),
longitude = values(list(lon.min, lon.max)),
longitude_reorder = CircularSort(0, 360),
transform = CDORemapper,
transform_params = list(grid = target.grid,
method = 'bilinear'),
transform_vars = c('latitude', 'longitude'),
synonims = list(latitude = c('lat', 'latitude'),
longitude = c('lon', 'longitude'),
ensemble = c('member', 'ensemble')),
ensemble = 'all',
metadata_dims = 'var',
return_vars = list(latitude = 'dat',
longitude = 'dat',
time = 'sdate'),
retrieve = TRUE)
```
Trying to load this results in the following error message:
```
* Progress: 0%[s01r2b17:3954448:0:3954448] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7f47dff27ff8)
==== backtrace (tid:3954448) ====
0 0x00000000000534c9 ucs_debug_print_backtrace() ???:0
1 0x0000000000012b20 .annobin_sigaction.c() sigaction.c:0
2 0x000000000004b216 SetIndivVectorMatrixElements() ???:0
3 0x000000000004491f _bigmemory_SetIndivVectorMatrixElements() ???:0
4 0x00000000001015c0 R_doDotCall() initfini.c:0
5 0x0000000000142717 bcEval() eval.c:0
6 0x000000000014e080 Rf_eval.localalias.34() eval.c:0
7 0x000000000014fe7f R_execClosure() eval.c:0
8 0x0000000000150df7 Rf_applyClosure() ???:0
9 0x000000000014409e bcEval() eval.c:0
10 0x000000000014e080 Rf_eval.localalias.34() eval.c:0
11 0x000000000014fe7f R_execClosure() eval.c:0
12 0x0000000000150df7 Rf_applyClosure() ???:0
13 0x000000000014409e bcEval() eval.c:0
14 0x000000000014e080 Rf_eval.localalias.34() eval.c:0
15 0x000000000014fe7f R_execClosure() eval.c:0
16 0x0000000000151295 R_execMethod() ???:0
17 0x00000000000045b9 R_dispatchGeneric() initfini.c:0
18 0x00000000001962ef do_standardGeneric() initfini.c:0
19 0x000000000013cbd8 bcEval() eval.c:0
20 0x000000000014e080 Rf_eval.localalias.34() eval.c:0
21 0x000000000014fe7f R_execClosure() eval.c:0
22 0x0000000000150df7 Rf_applyClosure() ???:0
23 0x0000000000196e16 R_possible_dispatch() initfini.c:0
24 0x0000000000135e7d tryDispatch() eval.c:0
25 0x000000000013609f tryAssignDispatch() eval.c:0
26 0x000000000013a8a3 bcEval() eval.c:0
27 0x000000000014e080 Rf_eval.localalias.34() eval.c:0
28 0x000000000014fe7f R_execClosure() eval.c:0
29 0x0000000000150df7 Rf_applyClosure() ???:0
30 0x000000000015378c R_forceAndCall() ???:0
31 0x0000000000084e92 do_lapply() initfini.c:0
32 0x0000000000191936 do_internal() initfini.c:0
33 0x000000000013af07 bcEval() eval.c:0
34 0x000000000014e080 Rf_eval.localalias.34() eval.c:0
35 0x000000000014fe7f R_execClosure() eval.c:0
36 0x0000000000150df7 Rf_applyClosure() ???:0
37 0x000000000014409e bcEval() eval.c:0
38 0x000000000014e080 Rf_eval.localalias.34() eval.c:0
39 0x000000000014ea22 forcePromise() eval.c:0
40 0x000000000014eee8 getvar() eval.c:0
41 0x0000000000142515 bcEval() eval.c:0
42 0x000000000014e080 Rf_eval.localalias.34() eval.c:0
43 0x000000000014ea22 forcePromise() eval.c:0
44 0x000000000014eee8 getvar() eval.c:0
45 0x0000000000142515 bcEval() eval.c:0
46 0x000000000014e080 Rf_eval.localalias.34() eval.c:0
47 0x000000000014fe7f R_execClosure() eval.c:0
48 0x0000000000150df7 Rf_applyClosure() ???:0
49 0x000000000014409e bcEval() eval.c:0
50 0x000000000014e080 Rf_eval.localalias.34() eval.c:0
51 0x000000000014fe7f R_execClosure() eval.c:0
52 0x0000000000150df7 Rf_applyClosure() ???:0
53 0x000000000014409e bcEval() eval.c:0
54 0x000000000014e080 Rf_eval.localalias.34() eval.c:0
55 0x000000000014fe7f R_execClosure() eval.c:0
56 0x0000000000150df7 Rf_applyClosure() ???:0
57 0x000000000014e240 Rf_eval.localalias.34() eval.c:0
58 0x00000000001531ea do_set() initfini.c:0
=================================
*** caught segfault ***
address 0x1182003c5710, cause 'unknown'
```

---

**Issue #180: dat dimension cannot be chunked** (aho, 2023-12-21) — https://earth.bsc.es/gitlab/es/startR/-/issues/180

When two datasets are loaded together and the chunking dimensions include "dat", this error is returned:

```
Error in get_chunk_indices(length(dat_selectors[[file_dim]][[j]]), chunks[[file_dim]]["chunk"], :
Requested to divide dimension 'dat' of length 1 in 2 chunks, which is not possible.
```
This is not unexpected, because startR loads datasets one after another (in a for loop). Nor is chunking over 'dat' necessary: if the two datasets are independent, they can be defined in two separate Start() calls and chunked individually. Still, Start() should return a more meaningful error.
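The arithmetic behind the error can be sketched in base R. The function below is a simplified, hypothetical stand-in for startR's internal `get_chunk_indices()` (the name and behaviour here are illustrative assumptions, not the actual implementation):

```r
# Simplified sketch: dividing a dimension of length n into n_chunks
# requires n >= n_chunks, which a length-1 'dat' dimension cannot satisfy.
chunk_indices <- function(n, chunk, n_chunks) {
  if (n < n_chunks) {
    stop("Requested to divide dimension of length ", n, " in ", n_chunks,
         " chunks, which is not possible.")
  }
  # Split 1:n into n_chunks nearly equal consecutive pieces and return
  # the piece belonging to chunk number 'chunk'.
  per_chunk <- ceiling(n / n_chunks)
  first <- (chunk - 1) * per_chunk + 1
  last <- min(chunk * per_chunk, n)
  first:last
}

chunk_indices(6, 2, 3)                      # chunk 2 of a length-6 'sdate' dim: indices 3:4
try(chunk_indices(1, 1, 2), silent = TRUE)  # a length-1 'dat' dim cannot give 2 chunks
```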
The following script can produce this error.
```r
library(startR)
path1 <- "/esarchive/exp/ecmwf/system5c3s/monthly_mean/$var$_f6h/$var$_$sdate$.nc"
path2 <- "/esarchive/exp/ecmwf/system5_m1/monthly_mean/$var$_f6h/$var$_$sdate$.nc"
data <- Start(dat = list(list(name = 'system5c3s', path = path1),
                         list(name = 'system5_m1', path = path2)),
              var = c('tas'),
              sdate = paste0(2017:2018, '0501'),
              ensemble = 'all',
              time = indices(1:3),
              lat = values(list(20, 80)), lat_reorder = Sort(),
              lon = values(list(-80, 40)), lon_reorder = CircularSort(-180, 180),
              transform = CDORemapper,
              transform_extra_cells = 2,
              transform_params = list(grid = 'r360x181',
                                      method = 'conservative'),
              transform_vars = c('lat', 'lon'),
              synonims = list(lat = c('lat', 'latitude'), lon = c('lon', 'longitude')),
              return_vars = list(time = 'sdate', lon = 'dat', lat = 'dat'),
              retrieve = FALSE)

func <- function(x) {
  return(x)
}
step <- Step(func, target_dims = c('lat', 'lon'), output_dims = c('lat', 'lon'))
wf <- AddStep(data, step)
res <- Compute(wf, chunks = list(sdate = 2, dat = 2))
```

---

**Issue #182: Missing data is repeated when Start tries to read files that do not exist** (Eva Rifà, 2023-10-03) — https://earth.bsc.es/gitlab/es/startR/-/issues/182
Hi @aho,
#### Summary
While substituting CST_Load with CST_Start in the CSTools package vignettes I found something unexpected: when Start() tries to read files that do not exist, the data gets repeated. The code is in [MostLikelyTercile_vignette](https://earth.bsc.es/gitlab/external/cstools/-/blob/master/vignettes/MostLikelyTercile_vignette.Rmd). This is unexpected because, with the same code, CST_Load returns the missing data as NA. The vignette with CST_Start is in the following branch: [develop-vignettes_CST_Start](https://earth.bsc.es/gitlab/external/cstools/-/blob/develop-vignettes_CST_Start/vignettes/MostLikelyTercile_vignette.Rmd?ref_type=heads)
#### Example
Below is a piece of code that reproduces the error:
```r
library(CSTools)
library(s2dv)
library(zeallot)
library(startR)
lat_min = 25
lat_max = 35
lon_min = -10
lon_max = 10
dates0 <- c(paste0(2015:2020, c(rep("0630", 2), rep("0615", 2), rep("0630", 2))),
            paste0(2015:2020, c(rep("0731", 2), rep("0716", 2), rep("0731", 2))),
            paste0(2015:2020, c(rep("0831", 2), rep("0816", 2), rep("0831", 2))))
dates0 <- as.POSIXct(dates0, format = "%Y%m%d", "UTC")
dim(dates0) <- c(sdate = 6, ftime = 3)
repos_obs <- paste0('/esarchive/recon/ecmwf/erainterim/monthly_mean/',
                    '$var$/$var$_$date$.nc')
obs <- Start(dataset = repos_obs,
             var = 'tas',
             date = unique(format(dates0, '%Y%m')),
             ftime = values(dates0),
             ftime_across = 'date',
             ftime_var = 'ftime',
             merge_across_dims = TRUE,
             split_multiselected_dims = TRUE,
             lat = values(list(lat_min, lat_max)),
             lat_reorder = Sort(decreasing = TRUE),
             lon = values(list(lon_min, lon_max)),
             lon_reorder = CircularSort(0, 360),
             synonims = list(lon = c('lon', 'longitude'),
                             lat = c('lat', 'latitude'),
                             ftime = c('ftime', 'time')),
             return_vars = list(lon = NULL,
                                lat = NULL,
                                ftime = 'date'),
             retrieve = TRUE)
```
The error messages are the following:
```
* Exploring files... This will take a variable amount of time depending
* on the issued request and the performance of the file server...
Error in R_nc4_open: No such file or directory
Error in R_nc4_open: No such file or directory
Error in R_nc4_open: No such file or directory
Error in R_nc4_open: No such file or directory
Error in R_nc4_open: No such file or directory
Error in R_nc4_open: No such file or directory
* Detected dimension sizes:
* dataset: 1
* var: 1
* sdate: 6
* ftime: 3
* lat: 14
* lon: 29
* Total size of requested data:
* 1 x 1 x 6 x 3 x 14 x 29 x 8 bytes = 57.1 Kb
* If the size of the requested data is close to or above the free shared
* RAM memory, R may crash.
* If the size of the requested data is close to or above the half of the
* free RAM memory, R may crash.
* Will now proceed to read and process 11 data files:
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201506.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201606.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201706.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201806.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201507.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201607.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201707.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201807.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201508.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201608.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201808.nc
* Loading... This may take several minutes...
* Progress: 0% + 10% + 10% + 10% + 10% + 10% + 10% + 10% + 10% + 10% + 10%
* Successfully retrieved data.
Warning messages:
1: ! Warning: Parameter 'pattern_dims' not specified. Taking the first dimension,
! 'dataset' as 'pattern_dims'.
2: ! Warning: Could not find any pattern dim with explicit data set descriptions (in
! the form of list of lists). Taking the first pattern dim, 'dataset',
! as dimension with pattern specifications.
3: ! Warning: Found specified values for dimension 'lat' but no 'lat_var' requested.
! "lat_var = 'lat'" has been automatically added to the Start call.
4: ! Warning: Found specified values for dimension 'lon' but no 'lon_var' requested.
! "lon_var = 'lon'" has been automatically added to the Start call.
5: ! Warning: Date selectors have been provided for a dimension defined along a date
! variable, but no exact match found for all the selectors. Taking the
! index of the nearest values.
```
Even though the messages show that some files are missing, no NAs are added to the data array:
```
> summary(obs)
Min. 1st Qu. Median Mean 3rd Qu. Max.
292.0 302.0 306.3 305.1 308.5 315.2
```
I see that some dates are repeated. Specifically, the date "2017-06-30" appears twice. Also, the years 2019 and 2020 do not appear at all; 2018 is repeated in their place.
```
> dateso <- attributes(obs)$Variable$common$ftime
> dateso
[1] "2015-06-30 18:00:00 UTC" "2016-06-30 18:00:00 UTC"
[3] "2017-06-30 18:00:00 UTC" "2018-06-30 18:00:00 UTC"
[5] "2018-08-31 18:00:00 UTC" "2018-08-31 18:00:00 UTC"
[7] "2015-07-31 18:00:00 UTC" "2016-07-31 18:00:00 UTC"
[9] "2017-06-30 18:00:00 UTC" "2018-06-30 18:00:00 UTC"
[11] "2018-08-31 18:00:00 UTC" "2018-08-31 18:00:00 UTC"
[13] "2015-08-31 18:00:00 UTC" "2016-08-31 18:00:00 UTC"
[15] "2017-07-31 18:00:00 UTC" "2018-07-31 18:00:00 UTC"
[17] "2018-08-31 18:00:00 UTC" "2018-08-31 18:00:00 UTC"
```
The data array matches the repeated dates, in the sense that the data is repeated in the same places:
```
> obs[,,,,1,1]
[,1] [,2] [,3]
[1,] 294.8316 301.7520 300.2737
[2,] 297.0100 301.4115 299.4274
[3,] 298.9523 298.9523 302.0528
[4,] 295.2444 295.2444 300.4945
[5,] 298.9373 298.9373 298.9373
[6,] 298.9373 298.9373 298.9373
```
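The duplication can be spotted programmatically with plain base R; here on a reduced, hypothetical copy of the dates above:

```r
# Toy subset of the returned ftime values: the last entry repeats an earlier one.
dateso <- as.POSIXct(c("2015-06-30 18:00:00", "2016-06-30 18:00:00",
                       "2017-06-30 18:00:00", "2018-06-30 18:00:00",
                       "2018-08-31 18:00:00", "2018-08-31 18:00:00"),
                     tz = "UTC")
which(duplicated(dateso))   # positions that repeat an earlier date: 6
```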
#### Module and Package Version
R version 4.1.2 (2021-11-01)
startR_2.3.0
I think that as long as the dates match the data (both are repeated in the same places) it is not an error. But how could we get NAs for the missing files in this case, as Load was doing? It could also be that I am not calling Start in the right way.
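As a possible post-hoc workaround (my own assumption, not existing startR functionality): compare the requested dates with the returned ones and blank out positions that were filled from the nearest existing file. A toy base-R sketch, where `na_out_missing` is a hypothetical helper:

```r
# Blank out data whose returned date is farther than 'tol_days' from the
# requested date, i.e. positions filled from the nearest existing file.
na_out_missing <- function(x, dates_req, dates_got, tol_days = 1) {
  off <- abs(as.numeric(dates_got - dates_req, units = "days")) > tol_days
  x[off] <- NA
  x
}

dates_req <- as.Date(c("2018-06-30", "2019-06-30", "2020-06-30"))
dates_got <- as.Date(c("2018-06-30", "2018-08-31", "2018-08-31"))  # 2019/2020 files missing
na_out_missing(c(300.1, 298.9, 298.9), dates_req, dates_got)
# -> 300.1 NA NA
```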
Eva

---

**Issue #184: Loading 3 datasets in the same Start call** (Eva Rifà, 2023-10-17) — https://earth.bsc.es/gitlab/es/startR/-/issues/184

Hi @aho,
I am opening this issue because, when I load 3 datasets at once with Start, I get different results than when I load them separately. The results also differ slightly from those of CST_Load. Below I give more details about the code I use. The code is from the CSTools vignette [MultiModelSkill_vignette](https://earth.bsc.es/gitlab/external/cstools/-/blob/master/vignettes/MultiModelSkill_vignette.Rmd). The substitution of CST_Load by CST_Start can be found [here](https://earth.bsc.es/gitlab/external/cstools/-/merge_requests/181/diffs#cc2b59f6ddef0d83ed340435192967ed07efe755).
**Step 0: Source functions and define input params**
```r
# Step (0a): Source functions
source("https://earth.bsc.es/gitlab/external/cstools/-/raw/master/R/CST_Load.R")
source("https://earth.bsc.es/gitlab/external/cstools/-/raw/master/R/CST_Start.R")
source("https://earth.bsc.es/gitlab/external/cstools/-/raw/master/R/zzz.R")
source("https://earth.bsc.es/gitlab/external/cstools/-/raw/master/R/as.s2dv_cube.R")
library(s2dv)
library(startR)
library(zeallot)
# Step (0b): Input params
# shared
mth = '11'
clim_var = 'tas'
ini <- 1993
fin <- 2012
start <- as.Date(paste(ini, mth, "01", sep = ""), "%Y%m%d")
end <- as.Date(paste(fin, mth, "01", sep = ""), "%Y%m%d")
# Load
dateseq <- format(seq(start, end, by = "year"), "%Y%m%d")
grid <- "256x128"
# Start
dateseq2 <- format(seq(start, end, by = "year"), "%Y%m")
lonmin = 50 #-20
lonmax = 70
latmin = 25
latmax = 45 # 75
```
#### Part 1: Load datasets within the same call
**Load Call**
```r
glosea5 <- '/esarchive/exp/glosea5/glosea5c3s/$STORE_FREQ$_mean/$VAR_NAME$_f6h/$VAR_NAME$_$YEAR$$MONTH$.nc'
c(expl, obsl) %<-%
  CST_Load(var = clim_var, exp = list(list(name = 'glosea5', path = glosea5),
                                      list(name = 'ecmwf/system4_m1'),
                                      list(name = 'meteofrance/system5_m1')),
           obs = "erainterim", sdates = dateseq, leadtimemin = 2, leadtimemax = 4,
           lonmin = 50, lonmax = 70, latmin = 25, latmax = 45,
           storefreq = "monthly", sampleperiod = 1, nmember = 4,
           output = "lonlat", method = "bilinear",
           grid = paste("r", grid, sep = ""))
```
**Start call**
```r
repos1 <- "/esarchive/exp/glosea5/glosea5c3s/monthly_mean/$var$_f6h/$var$_$sdate$.nc"
repos2 <- "/esarchive/exp/ecmwf/system4_m1/monthly_mean/$var$_f6h/$var$_$sdate$01.nc"
repos3 <- "/esarchive/exp/meteofrance/system5_m1/monthly_mean/$var$_f6h/$var$_$sdate$01.nc"
exp <- CST_Start(dataset = list(list(name = 'glosea5c3s', path = repos1),
                                list(name = 'ecmwf/system4_m1', path = repos2),
                                list(name = 'meteofrance/system5_m1', path = repos3)),
                 var = clim_var,
                 member = indices(1:4),
                 sdate = dateseq2,
                 ftime = indices(2:4),
                 lat = values(list(latmin, latmax)),
                 lat_reorder = Sort(decreasing = TRUE),
                 lon = values(list(lonmin, lonmax)),
                 lon_reorder = CircularSort(0, 360),
                 synonims = list(lon = c('lon', 'longitude'),
                                 lat = c('lat', 'latitude'),
                                 member = c('member', 'ensemble'),
                                 ftime = c('ftime', 'time')),
                 transform = CDORemapper,
                 transform_extra_cells = 2,
                 transform_params = list(grid = 'r256x128',
                                         method = 'bilinear'),
                 transform_vars = c('lat', 'lon'),
                 return_vars = list(lat = NULL,
                                    lon = NULL, ftime = 'sdate'),
                 retrieve = TRUE)
dates_exp <- exp$attrs$Dates
repos_obs <- "/esarchive/recon/ecmwf/erainterim/monthly_mean/$var$/$var$_$date$.nc"
obs <- CST_Start(dataset = list(list(name = 'erainterim', path = repos_obs)),
                 var = clim_var,
                 date = unique(format(dates_exp, '%Y%m')),
                 ftime = values(dates_exp),
                 ftime_across = 'date',
                 ftime_var = 'ftime',
                 merge_across_dims = TRUE,
                 split_multiselected_dims = TRUE,
                 lat = values(list(latmin, latmax)),
                 lat_reorder = Sort(decreasing = TRUE),
                 lon = values(list(lonmin, lonmax)),
                 lon_reorder = CircularSort(0, 360),
                 synonims = list(lon = c('lon', 'longitude'),
                                 lat = c('lat', 'latitude'),
                                 ftime = c('ftime', 'time')),
                 transform = CDORemapper,
                 transform_extra_cells = 2,
                 transform_params = list(grid = 'r256x128',
                                         method = 'bilinear'),
                 transform_vars = c('lat', 'lon'),
                 return_vars = list(lon = NULL,
                                    lat = NULL,
                                    ftime = 'date'),
                 retrieve = TRUE)
```
#### Part 2: Load exp datasets within separated calls
```r
exp1 <- CST_Start(dataset = list(list(name = 'glosea5c3s', path = repos1)),
                  var = clim_var,
                  member = indices(1:4),
                  sdate = dateseq2,
                  ftime = indices(2:4),
                  lat = values(list(latmin, latmax)),
                  lat_reorder = Sort(decreasing = TRUE),
                  lon = values(list(lonmin, lonmax)),
                  lon_reorder = CircularSort(0, 360),
                  synonims = list(lon = c('lon', 'longitude'),
                                  lat = c('lat', 'latitude'),
                                  member = c('member', 'ensemble'),
                                  ftime = c('ftime', 'time')),
                  transform = CDORemapper,
                  transform_extra_cells = 2,
                  transform_params = list(grid = 'r256x128',
                                          method = 'bilinear'),
                  transform_vars = c('lat', 'lon'),
                  return_vars = list(lat = NULL,
                                     lon = NULL, ftime = 'sdate'),
                  retrieve = TRUE)

exp2 <- CST_Start(dataset = list(list(name = 'ecmwf/system4_m1', path = repos2)),
                  var = clim_var,
                  member = indices(1:4),
                  sdate = dateseq2,
                  ftime = indices(2:4),
                  lat = values(list(latmin, latmax)),
                  lat_reorder = Sort(decreasing = TRUE),
                  lon = values(list(lonmin, lonmax)),
                  lon_reorder = CircularSort(0, 360),
                  synonims = list(lon = c('lon', 'longitude'),
                                  lat = c('lat', 'latitude'),
                                  member = c('member', 'ensemble'),
                                  ftime = c('ftime', 'time')),
                  transform = CDORemapper,
                  transform_extra_cells = 2,
                  transform_params = list(grid = 'r256x128',
                                          method = 'bilinear'),
                  transform_vars = c('lat', 'lon'),
                  return_vars = list(lat = NULL,
                                     lon = NULL, ftime = 'sdate'),
                  retrieve = TRUE)

exp3 <- CST_Start(dataset = list(list(name = 'meteofrance/system5_m1', path = repos3)),
                  var = clim_var,
                  member = indices(1:4),
                  sdate = dateseq2,
                  ftime = indices(2:4),
                  lat = values(list(latmin, latmax)),
                  lat_reorder = Sort(decreasing = TRUE),
                  lon = values(list(lonmin, lonmax)),
                  lon_reorder = CircularSort(0, 360),
                  synonims = list(lon = c('lon', 'longitude'),
                                  lat = c('lat', 'latitude'),
                                  member = c('member', 'ensemble'),
                                  ftime = c('ftime', 'time')),
                  transform = CDORemapper,
                  transform_extra_cells = 2,
                  transform_params = list(grid = 'r256x128',
                                          method = 'bilinear'),
                  transform_vars = c('lat', 'lon'),
                  return_vars = list(lat = NULL,
                                     lon = NULL, ftime = 'sdate'),
                  retrieve = TRUE)
```
#### Part 3: Compare results
```r
# First dataset
> summary(exp1$data) # Start separated
Min. 1st Qu. Median Mean 3rd Qu. Max.
254.3 274.1 277.5 278.1 281.8 297.4
> summary(expl$data[1,,,,,]) # Load
Min. 1st Qu. Median Mean 3rd Qu. Max.
254.3 274.1 277.5 278.1 281.8 297.4
> summary(exp$data[1,,,,,,]) # Start unique call
Min. 1st Qu. Median Mean 3rd Qu. Max.
254.3 274.1 277.5 278.1 281.8 297.4
# OK
# Second dataset
> summary(exp2$data)
Min. 1st Qu. Median Mean 3rd Qu. Max.
252.9 272.7 276.9 277.6 281.8 298.7
> summary(expl$data[2,,,,,])
Min. 1st Qu. Median Mean 3rd Qu. Max.
252.9 272.7 276.9 277.6 281.8 298.7
> summary(exp$data[2,,,,,,]) # Start unique call
Min. 1st Qu. Median Mean 3rd Qu. Max.
246.3 263.0 266.6 266.7 270.4 282.7
# NOT OK: the unique-call result differs
# Third dataset
> summary(exp3$data)
Min. 1st Qu. Median Mean 3rd Qu. Max.
247.8 273.0 276.4 277.5 281.7 300.5
> summary(expl$data[3,,,,,])
Min. 1st Qu. Median Mean 3rd Qu. Max.
247.8 273.0 276.4 277.5 281.7 300.5
> summary(exp$data[3,,,,,,]) # Start unique call
Min. 1st Qu. Median Mean 3rd Qu. Max.
253.8 267.1 270.7 270.4 273.8 283.5
# NOT OK: the unique-call result differs
```
The dates of the 3 datasets loaded separately are different from each other. However, the separately loaded data are equal to the CST_Load results.
```r
> exp1$attrs$Dates[[1]]
[1] "1993-12-16 UTC"
> exp2$attrs$Dates[[1]]
[1] "1994-01-01 UTC"
> exp3$attrs$Dates[[1]]
[1] "1994-01-16 09:00:00 UTC"
```
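If the datasets end up being loaded with separate calls, the resulting arrays can be bound along a new leading 'dataset' dimension in base R, so that each dataset keeps its own (correct) dates. The sketch below uses toy arrays, not real startR output, and `bind_datasets` is a hypothetical helper:

```r
# Bind equally shaped arrays along a new leading 'dataset' dimension.
bind_datasets <- function(...) {
  arrays <- list(...)
  d <- dim(arrays[[1]])
  stopifnot(all(vapply(arrays, function(a) identical(dim(a), d), logical(1))))
  # Stack along a trailing dimension, then move it to the front.
  stacked <- array(unlist(arrays), dim = c(d, length(arrays)))
  aperm(stacked, c(length(d) + 1, seq_along(d)))
}

exp1_data <- array(1, dim = c(lat = 2, lon = 3))  # stand-in for dataset 1
exp2_data <- array(2, dim = c(lat = 2, lon = 3))  # stand-in for dataset 2
combined <- bind_datasets(exp1_data, exp2_data)
dim(combined)   # 2 2 3 (dataset, lat, lon)
```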
Is there a way to load the 3 datasets correctly in a unique Start call? If not, do you think it is a good idea that, for now, I load the datasets separately in the vignette and keep this development for the next releases of CSTools?
Thank you in advance,
Eva

---

**Issue #187: Next release after v2.3.0** (aho, 2024-02-19) — https://earth.bsc.es/gitlab/es/startR/-/issues/187
**Development**
* [x] Use Autosubmit on hub
* [x] Use Collect() on HPCs !226 #189
**Bugfix**
* [x] Correct Collect_autosubmit() .Rds files update !225
* [x] Collect(): Correctly recognize the finished chunk (.Rds file) in local ecFlow folder. Prevent neverending Collect() when using `wait = F` in Compute() and Collect() the result later on. !228 #153
**Others**
* [x] Correct Start() documentation: Add 'ExpectedFiles' and 'PatternDim' to output description !232

---

**Issue #190: GRIB loading** (aho, 2023-12-19) — https://earth.bsc.es/gitlab/es/startR/-/issues/190

I attempted to create some helper functions to load GRIB files with Start(). It can load one GRIB file with some restrictions:
- regular grid data (gridType: regular_ll)
- Global region
- selected time step
- no member dim, only [latitude, longitude, time]
Here are the files: https://earth.bsc.es/gitlab/aho/aho-testtest/-/tree/master/startR/GRIB
In the [testing script](https://earth.bsc.es/gitlab/aho/aho-testtest/-/blob/master/startR/GRIB/script_grib.R), the first two cases cannot work now due to missing files. I found one file (the 3rd case) that Start() can successfully load.
Many things remain to be solved, e.g., correct metadata, flexible dimensions, transform, and loading a region instead of global data.
The helper functions use the "gribr" package. It would be better not to have this dependency.
FYI @vagudets @erifarov