Split inner dimension while loading data
Following our discussion about splitting the time dimension in two with Start(), I explored the different usages and would like to summarize them here. So far we had only tried loading one file, where we managed to split the time dim into c(week, day). If we want to load more than one file (e.g., year = c("2015", "2016") & month = c("06", "07")), that Start() call doesn't work well. Fortunately, there are other ways to do it. I don't want to overwhelm you right now, but when you need it, you can go through the scripts and resources and we can discuss further.
First, we load the data without reshaping. We will compare the reshaped results against it.
path1 <- "/esarchive/recon/ecmwf/era5/daily_mean/$var$_f1h/$var$_$year$$month$.nc"
variable <- "prlr"
# Without reshaping
data1 <- Start(dat = path1,
               var = variable,
               year = c('2015'), month = c('06', '07'),
               time = 'all',
               latitude = values(list(0, 6)), latitude_reorder = Sort(decreasing = TRUE),
               longitude = values(list(0, 5)), longitude_reorder = CircularSort(0, 360),
               synonims = list(latitude = c('lat', 'latitude'), longitude = c('lon', 'longitude')),
               return_vars = list(latitude = 'dat', longitude = 'dat',
                                  time = c('year', 'month')),
               retrieve = TRUE)
dim(data1)
#       dat       var      year     month      time  latitude longitude
#         1         1         1         2        30        21        18
time1 <- attr(data1, 'Variables')$common$time
dim(time1)
#  year month  time
#     1     2    30
[Method 1: time selector is an array of indices; split]
I said earlier that the array must contain time values, but I was wrong: it can contain indices as well (thanks for this use case, I didn't know Start() could work like this!).
time_arr_ind <- array(1:30, dim = c(day = 10, week = 3))
data3 <- Start(dat = path1,
               var = variable,
               year = c('2015'), month = c('06', '07'),
               time = indices(time_arr_ind), # [day, week]
               latitude = values(list(0, 6)), latitude_reorder = Sort(decreasing = TRUE),
               longitude = values(list(0, 5)), longitude_reorder = CircularSort(0, 360),
               synonims = list(latitude = c('lat', 'latitude'), longitude = c('lon', 'longitude')),
               return_vars = list(latitude = 'dat', longitude = 'dat',
                                  time = c('year', 'month')),
               split_multiselected_dims = TRUE, #*reshape
               retrieve = TRUE)
dim(data3)
#       dat       var      year     month       day      week  latitude longitude
#         1         1         1         2        10         3        21        18
time3 <- attr(data3, 'Variables')$common$time
dim(time3)
#  year month   day  week
#     1     2    10     3
identical(as.vector(data1), as.vector(data3))
#[1] TRUE
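To make it clearer why the flattened comparison above is fair, here is a minimal base-R sketch (no startR needed). It only assumes R's usual column-major array layout, and it reuses the same index array as Method 1; the variable flat_ok is just a name I made up for the illustration.

```r
# Same index array as in Method 1: 30 time steps arranged as [day, week].
time_arr_ind <- array(1:30, dim = c(day = 10, week = 3))

# R fills arrays column-major, so 'day' varies fastest:
# element [d, w] holds time index (w - 1) * 10 + d.
stopifnot(time_arr_ind[4, 2] == 14)
stopifnot(time_arr_ind[10, 3] == 30)

# Flattening the array gives back the original 1:30 order, so the split
# [day, week] layout and the unsplit time dimension share one memory order.
flat_ok <- identical(as.vector(time_arr_ind), 1:30)
print(flat_ok)
```

In other words, splitting time into [day, week] only relabels the positions; it does not reorder the values, which is consistent with identical() returning TRUE above.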
[Method 2: time selector is an array of values; merge & split]
merge_across_dims and split_multiselected_dims are used to reshape the data. This usage is more complicated, but it is useful for loading exp and obs with a consistent structure (see usecase 1_7). Notice that year and month need to be combined into a single file dimension, because the parameter time_across can only take one dimension.
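As a small illustration of combining year and month into one selector, the sketch below builds the combined strings in base R. The names years, months, combos and dates are only for this example; the script itself simply hard-codes the resulting values.

```r
# Hypothetical inputs for the illustration.
years  <- c("2015")
months <- c("06", "07")

# expand.grid() varies its first argument fastest, so putting 'month' first
# keeps the combined strings in chronological order.
combos <- expand.grid(month = months, year = years, stringsAsFactors = FALSE)
dates  <- paste0(combos$year, combos$month)
print(dates)
# [1] "201506" "201507"
```

With more than one year, e.g. years <- c("2015", "2016"), the same lines give "201506" "201507" "201606" "201607".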
## Use time1 as the following time selector
time_arr <- array(time1, dim = c(yr_m = 2, time = 10, week = 3))
time_arr <- as.POSIXct(time_arr, origin = '1970-01-01', tz = 'UTC')
path2 <- "/esarchive/recon/ecmwf/era5/daily_mean/$var$_f1h/$var$_$dates$.nc" # use $dates$ instead of $year$$month$
data2 <- Start(dat = path2,
               var = variable,
               dates = c('201506', '201507'),
               time = time_arr, #[yr_m, time, week] # must have 'time' dim
               time_across = 'dates', #*reshape
               merge_across_dims = TRUE, #*reshape
               split_multiselected_dims = TRUE, #*reshape
               latitude = values(list(0, 6)), latitude_reorder = Sort(decreasing = TRUE),
               longitude = values(list(0, 5)), longitude_reorder = CircularSort(0, 360),
               synonims = list(latitude = c('lat', 'latitude'), longitude = c('lon', 'longitude')),
               return_vars = list(latitude = 'dat', longitude = 'dat',
                                  time = c('dates')),
               retrieve = TRUE)
dim(data2)
#       dat       var      yr_m      time      week  latitude longitude
#         1         1         2        10         3        21        18
time2 <- attr(data2, 'Variables')$common$time
dim(time2)
# yr_m time week
#    2   10    3
identical(as.vector(data1), as.vector(data2))
#[1] TRUE
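If it helps, here is a rough base-R sketch of how the [yr_m, time, week] selector maps back to the original [year, month, time] layout. The time stamps are synthetic stand-ins that I generate here (30 daily values per month), not the values returned by Start(), and the *_demo names are made up for the example.

```r
# Synthetic stand-in for time1: seconds since 1970-01-01, 30 days per month.
june <- as.numeric(as.POSIXct("2015-06-01", tz = "UTC")) + 86400 * (0:29)
july <- as.numeric(as.POSIXct("2015-07-01", tz = "UTC")) + 86400 * (0:29)

# rbind() interleaves the months so that 'month' varies fastest,
# matching the dims [year, month, time].
time1_demo <- array(rbind(june, july), dim = c(year = 1, month = 2, time = 30))

# Same reshaping as in Method 2: the month dimension stays first (yr_m),
# and the 30 time steps are split into [time = 10, week = 3].
time_arr_demo <- array(time1_demo, dim = c(yr_m = 2, time = 10, week = 3))

# Element [m, d, w] corresponds to time1_demo[1, m, (w - 1) * 10 + d].
stopifnot(time_arr_demo[2, 4, 3] == time1_demo[1, 2, 24])

picked <- format(as.POSIXct(time_arr_demo[2, 4, 3], origin = "1970-01-01", tz = "UTC"),
                 "%Y-%m-%d")
print(picked)
# [1] "2015-07-24"
```

Because yr_m comes first in the new dims, the values keep the same memory order as time1, which is what the as.vector() comparison above relies on.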
I'm going to create a use case to show the first method, then I'll close this issue. Let me know if you want to know more at some point.
Best,
An-Chi