Wrong time attribute when loading daily data that spans a leap year
The following script loads daily data over a period that includes a leap day. The time selector is an array with dimensions [time, file_date], which holds the time values to load for each file_date (i.e., "19941101", "19951101", "19961101"). The time array does not contain 1996-02-29, and the data are correctly loaded without that day. However, the time attribute is not adjusted accordingly.
library(startR)
library(lubridate)
tt <- seq(ymd_hms("1994-12-01 18:00:00", tz = "UTC"),
          ymd_hms("1995-03-01 18:00:00", tz = "UTC"),
          by = 'days')
tt <- c(tt, tt + years(1), tt + years(2))
time_array <- array(tt, dim = c(time = 91, file_date = 3))
time_array <- as.POSIXct(time_array, origin = '1970-01-01', tz = 'UTC')
hcst <- Start(dat = "/esarchive/exp/ecmwf/system5c3s/daily_mean/$var$_f6h/$var$_$file_date$.nc",
              var = "tas",
              file_date = paste0(1994:1996, '1101'), #1996 is leap year
              time = time_array, #[time = 91, file_date = 3]
              latitude = indices(1), longitude = indices(1), ensemble = indices(1),
              return_vars = list(latitude = 'dat', longitude = 'dat', time = 'file_date'),
              retrieve = TRUE)
dim(drop(hcst))
#file_date time
# 3 91
time_attr <- attr(hcst, 'Variables')$common$time
dim(time_attr)
#file_date time
# 3 92 #WRONG!! Should be 91
#It is 92 because file_date = 2 contains 1996-02-29, which should not be included
time_attr[1, 91:92]
#[1] "1995-03-01 18:00:00 UTC" "1995-03-02 18:00:00 UTC"
time_attr[2, 91:92]
#[1] "1996-02-29 18:00:00 UTC" "1996-03-01 18:00:00 UTC"
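The mismatch above can be illustrated without calling Start(). The following is a minimal sketch in base R. It assumes that the returned attribute behaves like the first 92 consecutive daily steps of each start date. That assumption reproduces the values printed above, but it is not confirmed to be what startR does internally. The names `selector` and `attr_like` are hypothetical stand-ins for `time_array` and `time_attr`.

```r
# Base R only; no startR call is made here.
start_years <- 1994:1996

# Selector as in the report: 1 Dec to 1 Mar of each winter, without 29 Feb.
selector <- lapply(start_years, function(y) {
  days <- seq(as.POSIXct(sprintf("%d-12-01 18:00:00", y), tz = "UTC"),
              as.POSIXct(sprintf("%d-03-01 18:00:00", y + 1L), tz = "UTC"),
              by = "day")
  days[format(days, "%m%d") != "0229"]
})
n_selected <- sapply(selector, length)

# ASSUMPTION: stand-in for the returned time attribute, taken as the first
# 92 consecutive daily steps of each start date (not startR's actual logic).
attr_like <- lapply(start_years, function(y) {
  seq(as.POSIXct(sprintf("%d-12-01 18:00:00", y), tz = "UTC"),
      by = "day", length.out = 92)
})

# Days present in the stand-in attribute but never requested by the selector.
extra <- unlist(Map(function(a, s) {
  a_day <- format(a, "%Y-%m-%d")
  a_day[!a_day %in% format(s, "%Y-%m-%d")]
}, attr_like, selector))

print(n_selected)
print(extra)
```

Under this assumption, every start date has 91 selected days and exactly one extra day in the stand-in attribute. For the second start date that extra day is 1996-02-29, which matches the reported output.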
P.S.
There is another way to get the same data array, using the reshaping parameters. The following script loads the same data with correct time attributes (using the current master branch, which contains the attribute reshaping development). The dimension name of the time array needs to be changed because of issue #100. Some checks need to be added to distinguish these two usages: (1) if the reshaping parameters are used, the time array cannot have a dimension named after another file dimension; (2) if not, the time array should have dimension names that correspond to the file dimensions.
time_array2 <- time_array
names(dim(time_array2)) <- c('time', 'sdate')
hcst_reshape <- Start(dat = "/esarchive/exp/ecmwf/system5c3s/daily_mean/$var$_f6h/$var$_$file_date$.nc",
                      var = 'tas',
                      file_date = paste0(1994:1996, '1101'), #1996 is leap year
                      time = time_array2, #[time = 91, sdate = 3]
                      time_across = 'file_date',
                      merge_across_dims = TRUE,
                      split_multiselected_dims = TRUE,
                      latitude = indices(1), longitude = indices(1), ensemble = indices(1),
                      return_vars = list(latitude = 'dat', longitude = 'dat', time = 'file_date'),
                      retrieve = TRUE)
dim(drop(hcst_reshape))
# time sdate
# 91 3
time_attr_reshape <- attr(hcst_reshape, 'Variables')$common$time
dim(time_attr_reshape)
# time sdate
# 91 3
# Check if data is the same
all.equal(as.vector(drop(hcst)), as.vector(aperm(drop(hcst_reshape), 2:1)))
#[1] TRUE
# Check if the time attribute is correct
all(format(time_array2, '%Y%m%d') == format(attr(hcst_reshape, 'Variables')$common$time, '%Y%m%d'))
#[1] TRUE
The difference between these two usages is that the first one requires the time array to correspond to file_date. That is, time_array[time, file_date = 1] holds the time values of file_date "19941101". The second method does not require this correspondence: as long as all the time values can be found in the files of those file_date values, startR can find the data and do the reshaping. An example of the first method is https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/usecase/ex1_13_implicit_dependency.R, and an example of the second method is https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/usecase/ex1_2_exp_obs_attr.R
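The dimension-name checks proposed above could look roughly like the sketch below. `check_time_selector()` and its arguments `file_dims`, `reshaping` and `inner_dim` are hypothetical names used only for illustration. This is not startR code, and the real check may need to cover more cases.

```r
# Hypothetical helper, NOT part of startR: validates the dimension names of a
# time selector array against the names of the file dimensions.
check_time_selector <- function(time_array, file_dims, reshaping,
                                inner_dim = "time") {
  dim_names <- names(dim(time_array))
  if (is.null(dim_names)) {
    stop("The time selector array needs named dimensions.")
  }
  other_dims <- setdiff(dim_names, inner_dim)
  if (reshaping && any(other_dims %in% file_dims)) {
    # Usage (1): with reshaping parameters, file dimension names are not allowed.
    stop("With reshaping parameters, the time selector cannot have a ",
         "dimension named after a file dimension.")
  }
  if (!reshaping && !all(other_dims %in% file_dims)) {
    # Usage (2): without reshaping, the extra dimensions must be file dimensions.
    stop("Without reshaping parameters, the time selector dimensions must ",
         "correspond to file dimensions.")
  }
  invisible(TRUE)
}

by_file  <- array(0, dim = c(time = 91, file_date = 3))
by_sdate <- array(0, dim = c(time = 91, sdate = 3))

ok_plain   <- check_time_selector(by_file, "file_date", reshaping = FALSE)
ok_reshape <- check_time_selector(by_sdate, "file_date", reshaping = TRUE)
bad_reshape <- try(check_time_selector(by_file, "file_date", reshaping = TRUE),
                   silent = TRUE)
bad_plain <- try(check_time_selector(by_sdate, "file_date", reshaping = FALSE),
                 silent = TRUE)
```

In this sketch the first two calls pass, and the last two stop with an error because the dimension names do not fit the chosen usage.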