Wrong data output when time selectors have out-of-range or NA values
Hi @nperez
This is the problem I mentioned offline earlier. I tested with monthly and daily data and summarize the results here.
If the time selector is assigned with values() and some of the values are not in the files, then the output data are wrong (misplaced, I think).
- Monthly data
I use ex1_2 but change the dates. exp has years 1978 and 1979, while obs only has 1979. First, run the code with sdate = c(1978:1979), then change sdate to 1979 only. The obs data are not consistent between the two runs, and the time attribute is not correct either.
# exp
repos_exp <- paste0('/esarchive/exp/ecearth/a1tr/cmorfiles/CMIP/EC-Earth-Consortium/',
                    'EC-Earth3/historical/r24i1p1f1/Amon/$var$/gr/v20190312/',
                    '$var$_Amon_EC-Earth3_historical_r24i1p1f1_gr_$sdate$01-$sdate$12.nc')
exp <- Start(dat = repos_exp,
             var = 'tas',
             sdate = as.character(c(1978:1979)),
             time = indices(1:3),
             lat = 'all',
             lon = 'all',
             synonims = list(lat = c('lat', 'latitude'),
                             lon = c('lon', 'longitude')),
             return_vars = list(lon = NULL,
                                lat = NULL,
                                time = 'sdate'),
             retrieve = FALSE)
lats <- attr(exp, 'Variables')$common$lat
dates <- attr(exp, 'Variables')$common$time
dim(dates)
sdate time
2 3
dates
[1] "1978-01-16 12:00:00 UTC" "1979-01-16 12:00:00 UTC"
[3] "1978-02-15 00:00:00 UTC" "1979-02-15 00:00:00 UTC"
[5] "1978-03-16 12:00:00 UTC" "1979-03-16 12:00:00 UTC"
# obs (1978 and 1979)
repos_obs <- '/esarchive/recon/ecmwf/erainterim/monthly_mean/$var$_f6h/$var$_$date$.nc'
obs <- Start(dat = repos_obs,
             var = 'tas',
             date = unique(format(dates, '%Y%m')),
             time = values(dates),
             lat = values(lats),
             lon = 'all',
             time_across = 'date',
             merge_across_dims = TRUE,
             split_multiselected_dims = TRUE,
             synonims = list(lat = c('lat', 'latitude'),
                             lon = c('lon', 'longitude')),
             return_vars = list(lon = NULL,
                                lat = NULL,
                                time = 'date'),
             retrieve = TRUE)
# check data
dim(obs) # correct
dat var sdate time lat lon
1 1 2 3 256 512
obs[1, 1, , , 1, 1] # WRONG!!!
[,1] [,2] [,3]
[1,] 250.3432 250.3432 250.3432
[2,] 250.3432 237.9695 NA
# check time attribute
dim(attr(obs, 'Variables')$common$time)
date time
3 1
attr(obs, 'Variables')$common$time # only 1979 but not 1978
[1] "1979-01-31 18:00:00 UTC" "1979-02-28 18:00:00 UTC"
[3] "1979-03-31 18:00:00 UTC"
#-------------------------
# Change sdate to 1979 only and run the above exp and obs again.
# check data
dim(obs) # correct
dat var sdate time lat lon
1 1 1 3 256 512
obs[1, 1, , , 1, 1] # the 3rd value appears. It is NA above.
[1] 250.3432 237.9695 226.5861
# check time attribute
dim(attr(obs, 'Variables')$common$time)
date time
3 1
attr(obs, 'Variables')$common$time
[1] "1979-01-31 18:00:00 UTC" "1979-02-28 18:00:00 UTC"
[3] "1979-03-31 18:00:00 UTC"
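To make the mismatch explicit, here is a minimal sketch in base R (no Start() call). It assumes that months absent from the obs files (all of 1978) should come back as NA. The numbers are copied from the two outputs above, and the object names (requested, ref_1979, expected, observed) are only for illustration.

```r
# Months requested from obs, laid out as [sdate, time] like `dates` above
requested <- matrix(c("197801", "197901", "197802", "197902", "197803", "197903"),
                    nrow = 2)
# Values returned by the 1979-only run (copied from the output above)
ref_1979 <- c("197901" = 250.3432, "197902" = 237.9695, "197903" = 226.5861)
# Assumption: months missing from the files should be NA in the result
expected <- array(unname(ref_1979[as.vector(requested)]), dim = dim(requested))
# Values returned by the 1978:1979 run (copied from the output above)
observed <- matrix(c(250.3432, 250.3432, 250.3432,
                     250.3432, 237.9695, NA),
                   nrow = 2, byrow = TRUE)
# A cell matches if both are NA or both hold the same value
same <- (is.na(expected) & is.na(observed)) |
  (!is.na(expected) & !is.na(observed) & expected == observed)
# Positions [sdate, time] that differ from the expectation
which(!same, arr.ind = TRUE)
```

Under that assumption, only the first two 1979 values are where I would expect them; the whole 1978 row and the third 1979 value differ.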
- Daily data
I create the time values manually. dates[31, 1] is NA. In the first Start call, I split the time dimension; in the second call, I don't.
path_mpi_esm <- paste0('/esarchive/exp/CMIP6/dcppA-hindcast/mpi-esm1-2-hr/',
                       'cmip6-dcppA-hindcast_i1p1/DCPP/MPI-M/MPI-ESM1-2-HR/',
                       'dcppA-hindcast/r1i1p1f1/day/$var$/gn/v20200101/',
                       '$var$_day_MPI-ESM1-2-HR_dcppA-hindcast_s$sdate$-r1i1p1f1_gn_$fyear$.nc')
# Selectors
sdate <- '2000'  # start date, same value as passed to Start() below
fyear_mpi_esm <- paste0(sdate, '1101-', as.numeric(sdate) + 10, '1231')
# Create an array for time then split them
dates1 <- paste0('2000-11-', sprintf('%02d', 1:30), ' 12:00:00')
dates2 <- paste0('2000-12-', sprintf('%02d', 1:31), ' 12:00:00')
dates_nonsplit <- c(as.POSIXct(dates1, tz = 'UTC'), NA, as.POSIXct(dates2, tz = 'UTC'))
attr(dates_nonsplit, 'tzone') <- 'UTC'
dates <- dates_nonsplit
dim(dates) <- c(time = 31, month = 2)
# split
data <- Start(dat = path_mpi_esm,
              var = 'tasmax',
              sdate = '2000',
              fyear = '20001101-20101231',
              time = values(dates),
              time_across = 'sdate',  # without this line, an error that I don't understand is raised
              merge_across_dims = TRUE,
              merge_across_dims_narm = FALSE,
              split_multiselected_dims = TRUE,
              lat = indices(1), lon = indices(1),
              #fyear_depends = 'sdate',
              return_vars = list(lat = 'dat', lon = 'dat', time = 'sdate'),
              retrieve = TRUE)
# check data
dim(data) #correct
dat var fyear time month lat lon
1 1 1 31 2 1 1
data[1, 1, 1, , , 1, 1] #WRONG!!! The 1st and last are identical. No NA at 31st
# check time attribute
dim(attr(data, 'Variables')$common$time) #Should it be [sdate = 2, time = 31]?
sdate time
1 61
# non-split
data_nonsplit <- Start(dat = path_mpi_esm,
                       var = 'tasmax',
                       sdate = '2000',
                       fyear = '20001101-20101231',
                       time = values(dates_nonsplit),
                       time_across = 'sdate',  # without this line, an out-of-date error is shown
                       split_multiselected_dims = TRUE,
                       lat = indices(1), lon = indices(1),
                       #fyear_depends = 'sdate',
                       return_vars = list(lat = 'dat', lon = 'dat', time = 'sdate'),
                       retrieve = TRUE)
# check data
dim(data_nonsplit) #correct
dat var sdate fyear time lat lon
1 1 1 1 61 1 1
data_nonsplit[1, 1, 1, , , 1, 1] # use it to compare to the split one
# check time attribute
dim(attr(data_nonsplit, 'Variables')$common$time) #correct
sdate time
1 61
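To state the expected behaviour concretely, here is a minimal sketch in base R, without Start(). It rebuilds the time selector from above, locates the NA after the reshape, and shows the relation I would expect to hold: the split result should equal the non-split result reshaped to [time = 31, month = 2]. The data values are placeholders (not real tasmax), and `flat` and `split_expected` are names made up for this illustration.

```r
# Rebuild the time selector exactly as above
dates1 <- paste0('2000-11-', sprintf('%02d', 1:30), ' 12:00:00')
dates2 <- paste0('2000-12-', sprintf('%02d', 1:31), ' 12:00:00')
dates_nonsplit <- c(as.POSIXct(dates1, tz = 'UTC'), NA, as.POSIXct(dates2, tz = 'UTC'))
attr(dates_nonsplit, 'tzone') <- 'UTC'
dates <- dates_nonsplit
dim(dates) <- c(time = 31, month = 2)

# R fills arrays column by column, so element 31 (the NA) lands at [31, 1]
na_pos <- which(is.na(unclass(dates)), arr.ind = TRUE)

# Placeholder values standing in for the non-split data (NOT real data)
flat <- as.numeric(seq_along(dates_nonsplit))
flat[is.na(dates_nonsplit)] <- NA

# Expectation: the split output is the non-split output reshaped
split_expected <- array(flat, dim = c(time = 31, month = 2))
is.na(split_expected[31, 1])                   # the NA stays at the 31st time step
split_expected[1, 1] != split_expected[31, 2]  # first and last values differ
```

With the real arrays, the equivalent check would compare as.vector(data) against as.vector(data_nonsplit); from the outputs above, that comparison currently fails for the split case.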
I cannot identify where the problems are yet. 'split_multiselected_dims' seems to work incorrectly when the selectors contain NAs. Removing 'merge_across_dims' from the daily data case gives the same results, so I guess 'merge_across_dims' is not the cause, or at least contributes less.
If you are not affected by this problem right now, I can work on this issue later. Please let me know if you have any ideas, thanks!
Cheers,
An-Chi