Start: Problem of time_tolerance and time attributes
The issue was detected with the case in #85 (closed). The case uses the time attributes of exp to retrieve the obs data, but the time attributes of exp and obs differ. For example, the exp time 2005-05-16 12:00:00 UTC corresponds to the obs time 2005-05-01 UTC, and the exp time 2005-06-16 00:00:00 UTC corresponds to the obs time 2005-06-01 UTC. The difference is therefore 15.5 days or 15 days.
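The offsets quoted above can be checked directly in R. This is a minimal sketch that uses only the two example pairs from this issue; whether Start() treats 'time_tolerance' as an inclusive or exclusive bound is not verified here.

```r
# Example exp/obs time pairs taken from the description above
exp_time <- as.POSIXct(c("2005-05-16 12:00:00", "2005-06-16 00:00:00"), tz = "UTC")
obs_time <- as.POSIXct(c("2005-05-01 00:00:00", "2005-06-01 00:00:00"), tz = "UTC")

# Offset between each exp time and its corresponding obs time
offset_days  <- as.numeric(difftime(exp_time, obs_time, units = "days"))
offset_hours <- as.numeric(difftime(exp_time, obs_time, units = "hours"))

print(offset_days)   # 15.5 15.0
print(offset_hours)  # 372 360
```

The larger offset, 372 hours, is the value passed as 'time_tolerance' in Part 1 of the script.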
The parameter 'time_tolerance' can be used to loosen the matching of time values. However, even when 'time_tolerance' is set large enough, Start() doesn't retrieve all the files and later returns the error message: Error in while (indices_chunk[i + 1] == indices_chunk[i] & i < length(indices_chunk)) { : missing value where TRUE/FALSE needed
The error may be due to an inconsistency between the expected dimension length and the number of files actually found. (Part 1)
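The message itself is what R raises when a `while` condition evaluates to NA. As an illustration only (the vector below is hypothetical and does not come from the startR internals), an NA inside indices_chunk is enough to reproduce it, which would be consistent with some requested values not being matched to any file.

```r
# Hypothetical chunk indices; the NA stands for a value that matched no file
indices_chunk <- c(1, 1, NA, 2)
i <- 1

# Same loop condition as in the error message, wrapped to capture the error text
msg <- tryCatch({
  while (indices_chunk[i + 1] == indices_chunk[i] & i < length(indices_chunk)) {
    i <- i + 1
  }
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

Reading past the end of the vector alone would not trigger the error, because `NA & FALSE` evaluates to FALSE in R; the NA has to appear while `i < length(indices_chunk)` is still TRUE.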
Another possible reason for the incomplete file search is that the time attributes of obs are retrieved incorrectly. If obs is read independently from exp (i.e., exp time values are not used in the obs call), the returned data is correct but the time metadata is wrong. For example, the first time should be 2005-05-01 UTC but it becomes 2005-08-28 12:00:00 UTC. (Part 2)
library(startR)
# ATL
lonmin <- -80
lonmax <- 50
latmin <- -60
latmax <- 50
# exp
repos_exp <- paste0('/esarchive/scratch/eexarcho/Eleftheria/TRIATLAS/Analysis_a33d_a33e_a33f/',
                    'Data/a33d/tos/',
                    'tos_Amon_EC-Earth3-CC_historical_S$sdate$_$member$_gr_$chunk$.nc')
sdates <- paste0(c(2005:2006), '0501')
exp <- Start(dat = repos_exp,
             var = 'tos',
             member = 'all',
             sdate = sdates,
             chunk = 'all',
             # time = 'all',
             time = indices(1:12), #first time step per day
             chunk_depends = 'sdate',
             time_across = 'chunk',
             merge_across_dims = TRUE,
             lat = values(list(latmin, latmax)),
             lat_reorder = Sort(decreasing = TRUE),
             lon = values(list(lonmin, lonmax)),
             lon_reorder = CircularSort(0, 360),
             transform = CDORemapper,
             transform_extra_cells = 2,
             transform_params = list(grid = 'r360x180',
                                     method = 'conservative',
                                     crop = c(lonmin, lonmax, latmin, latmax)),
             transform_vars = c('lat', 'lon'),
             synonims = list(lat = c('lat', 'latitude'),
                             lon = c('lon', 'longitude')),
             return_vars = list(lon = 'dat',
                                lat = 'dat',
                                time = 'sdate'),
             retrieve = TRUE)
lons <- attr(exp, 'Variables')$common$tos$dim[[1]]$vals
lats <- attr(exp, 'Variables')$common$tos$dim[[2]]$vals
dates <- attr(exp, 'Variables')$common$time
dim(dates)
#sdate time
# 2 12
#================Part 1========================
dates_file <- sort(unique(gsub('-', '', sapply(as.character(dates), substr, 1, 7))))
repos_obs <- '/esarchive/obs/ukmo/hadisst_v1.1/monthly_mean/$var$/$var$_$date$.nc'
obs <- Start(dat = repos_obs,
             var = 'tos',
             date = dates_file,
             time = values(dates), #dim: [sdate = 2, time = 12]
             lat = values(lats),
             lon = values(lons),
             time_var = 'time',
             # PROBLEM!!! Cannot find a proper value for time_tolerance
             time_tolerance = as.difftime(372, units = 'hours'),
             #time values are across all the files
             time_across = 'date',
             merge_across_dims = TRUE,
             merge_across_dims_narm = TRUE,
             split_multiselected_dims = TRUE,
             synonims = list(lat = c('lat', 'latitude'),
                             lon = c('lon', 'longitude')),
             return_vars = list(latitude = NULL,
                                longitude = NULL,
                                time = 'date'),
             retrieve = TRUE)
#Error in while (indices_chunk[i + 1] == indices_chunk[i] & i < length(indices_chunk)) { :
# missing value where TRUE/FALSE needed
#================Part 2====================
obs <- Start(dat = repos_obs,
             var = 'tos',
             date = dates_file,
             time = 'all',
             lat = values(lats),
             lon = values(lons),
             time_across = 'date',
             #combine time and file_date dims
             merge_across_dims = TRUE,
             #exclude the additional NAs generated by merge_across_dims
             merge_across_dims_narm = TRUE,
             synonims = list(lat = c('lat', 'latitude'),
                             lon = c('lon', 'longitude')),
             return_vars = list(latitude = 'dat',
                                longitude = 'dat',
                                time = 'date'),
             retrieve = TRUE)
attr(obs, 'Variables')$common$time #WRONG!!!!
# [1] "2005-08-28 12:00:00 UTC" "2005-09-28 00:00:00 UTC"
# [3] "2005-10-28 12:00:00 UTC" "2005-11-28 00:00:00 UTC"
# [5] "2005-12-28 12:00:00 UTC" "2006-01-28 00:00:00 UTC"
# [7] "2006-02-27 12:00:00 UTC" "2006-03-30 00:00:00 UTC"
# [9] "2006-04-29 12:00:00 UTC" "2006-05-30 00:00:00 UTC"
# [11] "2006-06-29 12:00:00 UTC" "2006-07-30 00:00:00 UTC"
# [13] "2006-08-29 12:00:00 UTC" "2006-09-29 00:00:00 UTC"
# [15] "2006-10-29 12:00:00 UTC" "2006-11-29 00:00:00 UTC"
# [17] "2006-12-29 12:00:00 UTC" "2007-01-29 00:00:00 UTC"
# [19] "2007-02-28 12:00:00 UTC" "2007-03-31 00:00:00 UTC"
# [21] "2007-04-30 12:00:00 UTC" "2007-05-31 00:00:00 UTC"
# [23] "2007-06-30 12:00:00 UTC" "2007-07-31 00:00:00 UTC"
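One observation about the values printed above: consecutive times are all exactly 30.5 days apart, whereas monthly data dated on the first of each month would be 28 to 31 days apart. The sketch below checks this on the first six returned values and compares them with first-of-month dates starting from the expected 2005-05-01. What causes this spacing is not established here.

```r
# First six time values returned in Part 2 (copied from the output above)
returned <- as.POSIXct(c("2005-08-28 12:00:00", "2005-09-28 00:00:00",
                         "2005-10-28 12:00:00", "2005-11-28 00:00:00",
                         "2005-12-28 12:00:00", "2006-01-28 00:00:00"),
                       tz = "UTC")
returned_step <- as.numeric(diff(returned), units = "days")

# First-of-month dates starting at the expected first time, for comparison
expected <- seq(as.POSIXct("2005-05-01 00:00:00", tz = "UTC"),
                by = "month", length.out = 6)
expected_step <- as.numeric(diff(expected), units = "days")

print(returned_step)  # 30.5 30.5 30.5 30.5 30.5
print(expected_step)  # 31 30 31 31 30
```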
@nperez I tag you to keep you in the loop.