Loading: decadal multi-path issue with Compute()
I opened this issue to record the remaining problems with multiple paths in the decadal loading script. In the script, the parameter multi_path covers the case where hcst or fcst requires sdates from both dcppA and dcppB, for example loading hcst "EC-Earth3-i4" from some year (dcppA) to 2021 (dcppB). No development is urgently needed, since these are not new findings.
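To make the multi-path case concrete, here is a minimal sketch in base R of how start dates could be split between the two experiments. The cutoff year `first_dcppB_syear`, the root directory and the path pattern are all hypothetical, chosen only for illustration; they are not taken from the loading script or the archive configuration.

```r
# Hypothetical values: the first start year served by dcppB is an assumption
first_dcppB_syear <- 2020
sdates_hcst <- 2016:2021

# Label each start date with the experiment that would provide it
experiment <- ifelse(sdates_hcst < first_dcppB_syear,
                     'dcppA-hindcast', 'dcppB-forecast')

# The multi-path situation arises only when both experiments are involved
multi_path <- length(unique(experiment)) > 1

# One (made-up) path per start date, which is what the workaround gives Start()
paths <- file.path('/hypothetical/root', experiment, paste0('s', sdates_hcst))
```

With these made-up values, the last two start dates fall under dcppB, so one path per sdate is needed rather than a single path pattern.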
Current Behavior
At line 149
hcst <- do.call(Start, Start_hcst_arg_list)
hcst is loaded. It does not yet have the correct dimensions and time attributes: to load all the sdates together, we need to give one path for each sdate (with dcppA or dcppB), so the dimension "dat" is actually "sdate" at this point. The following lines correct the dimensions.
# Reshape and reorder dimensions
## dat should be 1, syear should be length of dat; reorder dimensions
dim(hcst) <- c(dat = 1, syear = as.numeric(dim(hcst))[1], dim(hcst)[2:6])
hcst <- s2dv::Reorder(hcst, c('dat', 'var', 'syear', 'time', 'latitude', 'longitude', 'ensemble'))
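To see what this reshape does, here is a minimal base-R sketch on a dummy array. The dimension sizes and the initial dimension order are made up, and `aperm()` stands in for `s2dv::Reorder()` so that the snippet runs without extra packages.

```r
# Dummy array shaped like the multi-path output of Start():
# "dat" actually indexes the start dates (3 here); all sizes are illustrative.
hcst <- array(seq_len(3 * 1 * 2 * 4 * 5 * 2),
              dim = c(dat = 3, var = 1, ensemble = 2, time = 4,
                      latitude = 5, longitude = 2))

# Split the first dimension into dat = 1 and syear = 3
dim(hcst) <- c(dat = 1, syear = as.numeric(dim(hcst))[1], dim(hcst)[2:6])

# Reorder by dimension name; aperm() is used here in place of s2dv::Reorder()
target <- c('dat', 'var', 'syear', 'time', 'latitude', 'longitude', 'ensemble')
hcst <- aperm(hcst, match(target, names(dim(hcst))))
names(dim(hcst)) <- target  # make sure the dimension names follow the new order
```

Because R stores arrays in column-major order, replacing the leading dimension of length 3 with the pair (1, 3) relabels the data without moving any values.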
After that, the time attributes are corrected. The times we get from Start() are only for the first syear, so we expand them to the full [syear, time].
# Manipulate time attr because Start() cannot read it correctly
wrong_time_attr <- attr(hcst, 'Variables')$common$time # dim: [time], the first syear only
tmp <- array(dim = c(dim(hcst)[c('syear', 'time')]))
tmp[1, ] <- wrong_time_attr
yr_diff <- (sdates_hcst - sdates_hcst[1])[-1] #diff(sdates_hcst)
for (i_syear in 1:length(yr_diff)) {
tmp[(i_syear + 1), ] <- wrong_time_attr + lubridate::years(yr_diff[i_syear])
}
attr(hcst, 'Variables')$common$time <- as.POSIXct(tmp, origin = '1970-01-01', tz = 'UTC')
The same applies to fcst, which is loaded after hcst.
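For reference, the time expansion described above can be reproduced in isolation. This is a sketch with made-up start years and time steps; it shifts the calendar year through `as.POSIXlt()` instead of `lubridate::years()`, purely so that it runs with base R only.

```r
# Hypothetical inputs: three start years and the time axis of the first one
sdates_hcst <- c(1991, 1992, 1993)
first_time <- as.POSIXct(c('1991-11-16', '1991-12-16', '1992-01-16'), tz = 'UTC')

# Shift a POSIXct vector by a whole number of calendar years
shift_years <- function(x, n) {
  lt <- as.POSIXlt(x, tz = 'UTC')
  lt$year <- lt$year + n
  as.POSIXct(lt)
}

# Offset of each start year relative to the first one (0 for the first)
yr_diff <- sdates_hcst - sdates_hcst[1]

# One row per syear, one column per time step, stored as seconds since epoch
tmp <- t(sapply(yr_diff, function(n) as.numeric(shift_years(first_time, n))))

# Convert back to POSIXct and attach the [syear, time] dimensions
time_attr <- as.POSIXct(as.vector(tmp), origin = '1970-01-01', tz = 'UTC')
dim(time_attr) <- c(syear = length(sdates_hcst), time = length(first_time))
```

Each row of `time_attr` then holds the time axis of one start year, which is the shape the corrected attribute is expected to have.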
Expected Behavior & Possible Solutions
This workaround works for the retrieve = TRUE case. For the Compute() case, however, it would be quite painful to adjust the data at the beginning of the self-defined function (not impossible, but painful for sure). The ideal solution is to find a way to specify the dependency between "path" and "sdate" so that the output of Start() is correct directly. Here is a reference:
https://earth.bsc.es/gitlab/es/startR/-/blob/master/inst/doc/faq.md#8-define-a-path-with-multiple-dependencies. I tried to find a way to do this but did not succeed, and I think some modification in startR is needed.
Steps To Reproduce
You can use the unit test recipe recipe-decadal_monthly_1.yml and change the sdates to this:
Time:
  fcst_year: #2021
  hcst_start: 1991
  hcst_end: 2021
- Script:
source("modules/Loading/Loading.R")
recipe_file <- "tests/recipes/recipe-decadal_monthly_1.yml"
recipe <- prepare_outputs(recipe_file)
archive <- read_yaml(paste0(recipe$Run$code_dir, "conf/archive_decadal.yml"))$archive
data <- Loading(recipe)
- Branch/SUNSET Version: master
- Environment: Anywhere
Other Relevant Information
Another piece of functionality is missing from the multi-path case, at line 98:
#TODO: to make this case work; enhance Start() if it's possible
if (multi_path & length(variable) > 1) {
stop("The recipe requests multiple variables and start dates from both dcppA-hindcast and dcppB-forecast. This case is not available for now.")
}
Don't hesitate to let me know if you have questions now or in the future!
Best,
An-Chi