Missing data is repeated when Start tries to read files that do not exist
Hi @aho,
Summary
While replacing CST_Load with CST_Start in the CSTools package vignettes, I found something unexpected: when Start() tries to read files that do not exist, data from other files are repeated in their place. The code is in MostLikelyTercile_vignette. This is unexpected because the same code using CST_Load returns NA for the missing data. The vignette with CST_Start is in the following branch: develop-vignettes_CST_Start
Example
The following code reproduces the problem:
library(CSTools)
library(s2dv)
library(zeallot)
library(startR)
lat_min = 25
lat_max = 35
lon_min = -10
lon_max = 10
dates0 <- c(paste0(2015:2020, c(rep("0630", 2), rep("0615", 2), rep("0630", 2))),
            paste0(2015:2020, c(rep("0731", 2), rep("0716", 2), rep("0731", 2))),
            paste0(2015:2020, c(rep("0831", 2), rep("0816", 2), rep("0831", 2))))
dates0 <- as.POSIXct(dates0, format = "%Y%m%d", tz = "UTC")
dim(dates0) <- c(sdate = 6, ftime = 3)
repos_obs <- paste0('/esarchive/recon/ecmwf/erainterim/monthly_mean/',
                    '$var$/$var$_$date$.nc')
obs <- Start(dataset = repos_obs,
             var = 'tas',
             date = unique(format(dates0, '%Y%m')),
             ftime = values(dates0),
             ftime_across = 'date',
             ftime_var = 'ftime',
             merge_across_dims = TRUE,
             split_multiselected_dims = TRUE,
             lat = values(list(lat_min, lat_max)),
             lat_reorder = Sort(decreasing = TRUE),
             lon = values(list(lon_min, lon_max)),
             lon_reorder = CircularSort(0, 360),
             synonims = list(lon = c('lon', 'longitude'),
                             lat = c('lat', 'latitude'),
                             ftime = c('ftime', 'time')),
             return_vars = list(lon = NULL,
                                lat = NULL,
                                ftime = 'date'),
             retrieve = TRUE)
The output, including the error messages, is the following:
* Exploring files... This will take a variable amount of time depending
* on the issued request and the performance of the file server...
Error in R_nc4_open: No such file or directory
Error in R_nc4_open: No such file or directory
Error in R_nc4_open: No such file or directory
Error in R_nc4_open: No such file or directory
Error in R_nc4_open: No such file or directory
Error in R_nc4_open: No such file or directory
* Detected dimension sizes:
* dataset: 1
* var: 1
* sdate: 6
* ftime: 3
* lat: 14
* lon: 29
* Total size of requested data:
* 1 x 1 x 6 x 3 x 14 x 29 x 8 bytes = 57.1 Kb
* If the size of the requested data is close to or above the free shared
* RAM memory, R may crash.
* If the size of the requested data is close to or above the half of the
* free RAM memory, R may crash.
* Will now proceed to read and process 11 data files:
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201506.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201606.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201706.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201806.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201507.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201607.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201707.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201807.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201508.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201608.nc
* /esarchive/recon/ecmwf/erainterim/monthly_mean/tas/tas_201808.nc
* Loading... This may take several minutes...
* Progress: 0% + 10% + 10% + 10% + 10% + 10% + 10% + 10% + 10% + 10% + 10%
* Successfully retrieved data.
Warning messages:
1: ! Warning: Parameter 'pattern_dims' not specified. Taking the first dimension,
! 'dataset' as 'pattern_dims'.
2: ! Warning: Could not find any pattern dim with explicit data set descriptions (in
! the form of list of lists). Taking the first pattern dim, 'dataset',
! as dimension with pattern specifications.
3: ! Warning: Found specified values for dimension 'lat' but no 'lat_var' requested.
! "lat_var = 'lat'" has been automatically added to the Start call.
4: ! Warning: Found specified values for dimension 'lon' but no 'lon_var' requested.
! "lon_var = 'lon'" has been automatically added to the Start call.
5: ! Warning: Date selectors have been provided for a dimension defined along a date
! variable, but no exact match found for all the selectors. Taking the
! index of the nearest values.
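To see which of the requested files are absent before calling Start(), the path pattern can be expanded by hand and checked with file.exists(). This is only a sketch in base R: `expand_paths` is a hypothetical helper written for this example (it is not part of startR), and `months` is typed out to match `unique(format(dates0, '%Y%m'))` from the reproducer.

```r
# Path pattern and variable as in the Start() call above.
repos_obs <- paste0('/esarchive/recon/ecmwf/erainterim/monthly_mean/',
                    '$var$/$var$_$date$.nc')
var <- 'tas'

# Same values and order as unique(format(dates0, '%Y%m')).
months <- c(paste0(2015:2020, '06'),
            paste0(2015:2020, '07'),
            paste0(2015:2020, '08'))

# Hypothetical helper: substitute the $var$ and $date$ tags literally.
expand_paths <- function(pattern, var, dates) {
  with_var <- gsub('$var$', var, pattern, fixed = TRUE)
  vapply(dates,
         function(d) gsub('$date$', d, with_var, fixed = TRUE),
         character(1), USE.NAMES = FALSE)
}

paths <- expand_paths(repos_obs, var, months)

# Files that the request expects but that are not on disk.
missing_files <- paths[!file.exists(paths)]
print(missing_files)
```

The result depends on the file system where it is run, so no output is shown here.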
Although the messages show that some files are missing, no NAs are added to the data array:
> summary(obs)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  292.0   302.0   306.3   305.1   308.5   315.2
Some of the returned dates are repeated. For the year 2017, the date "2017-06-30" appears twice. The years 2019 and 2020 do not appear at all; "2018-08-31" is repeated in their place.
> dateso <- attributes(obs)$Variable$common$ftime
> dateso
[1] "2015-06-30 18:00:00 UTC" "2016-06-30 18:00:00 UTC"
[3] "2017-06-30 18:00:00 UTC" "2018-06-30 18:00:00 UTC"
[5] "2018-08-31 18:00:00 UTC" "2018-08-31 18:00:00 UTC"
[7] "2015-07-31 18:00:00 UTC" "2016-07-31 18:00:00 UTC"
[9] "2017-06-30 18:00:00 UTC" "2018-06-30 18:00:00 UTC"
[11] "2018-08-31 18:00:00 UTC" "2018-08-31 18:00:00 UTC"
[13] "2015-08-31 18:00:00 UTC" "2016-08-31 18:00:00 UTC"
[15] "2017-07-31 18:00:00 UTC" "2018-07-31 18:00:00 UTC"
[17] "2018-08-31 18:00:00 UTC" "2018-08-31 18:00:00 UTC"
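The affected positions can be located by comparing the year and month of each requested date with the date that came back. The sketch below uses base R only. `returned` is typed in by hand from the printed `dateso` above (in a real session it would be `attributes(obs)$Variable$common$ftime`), and the name `mismatch` is just for illustration.

```r
# Requested dates, built as in the reproducer.
dates0 <- c(paste0(2015:2020, c(rep("0630", 2), rep("0615", 2), rep("0630", 2))),
            paste0(2015:2020, c(rep("0731", 2), rep("0716", 2), rep("0731", 2))),
            paste0(2015:2020, c(rep("0831", 2), rep("0816", 2), rep("0831", 2))))
requested <- as.POSIXct(dates0, format = "%Y%m%d", tz = "UTC")

# Returned dates, copied from the printed output of dateso.
returned <- as.POSIXct(c(
  "2015-06-30 18:00:00", "2016-06-30 18:00:00",
  "2017-06-30 18:00:00", "2018-06-30 18:00:00",
  "2018-08-31 18:00:00", "2018-08-31 18:00:00",
  "2015-07-31 18:00:00", "2016-07-31 18:00:00",
  "2017-06-30 18:00:00", "2018-06-30 18:00:00",
  "2018-08-31 18:00:00", "2018-08-31 18:00:00",
  "2015-08-31 18:00:00", "2016-08-31 18:00:00",
  "2017-07-31 18:00:00", "2018-07-31 18:00:00",
  "2018-08-31 18:00:00", "2018-08-31 18:00:00"), tz = "UTC")

# TRUE where the returned year-month differs from the requested one.
mismatch <- format(requested, "%Y%m") != format(returned, "%Y%m")
dim(mismatch) <- c(sdate = 6, ftime = 3)

print(which(mismatch))
print(sum(mismatch))
```

With the values above, 10 of the 18 positions do not match: 5, 6, 9, 10, 11, 12, 15, 16, 17 and 18.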
The data array matches the repeated dates, in the sense that the data are repeated in the same positions:
> obs[,,,,1,1]
[,1] [,2] [,3]
[1,] 294.8316 301.7520 300.2737
[2,] 297.0100 301.4115 299.4274
[3,] 298.9523 298.9523 302.0528
[4,] 295.2444 295.2444 300.4945
[5,] 298.9373 298.9373 298.9373
[6,] 298.9373 298.9373 298.9373
Module and Package Version
R version 4.1.2 (2021-11-01)
startR_2.3.0
I think that, as long as the dates match the data (both are repeated in the same places), it is not an error. But how could we get NAs for the missing files in this case, as Load did? It may also be that I am not calling Start correctly.
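In the meantime, one possible workaround would be to mask the repeated values after loading. The sketch below is not a startR feature: `mask_mismatched_dates` is a hypothetical helper, it assumes the dimension order (dataset, var, sdate, ftime, lat, lon) shown in the output above, and the small arrays and dates are toy values made up to keep the example self-contained.

```r
# Hypothetical helper: set to NA every (sdate, ftime) slice whose returned
# year-month differs from the requested one.
# Assumes 'data' has dimensions (dataset, var, sdate, ftime, lat, lon).
mask_mismatched_dates <- function(data, requested, returned) {
  n_sdate <- dim(data)[3]
  bad_idx <- which(format(requested, "%Y%m") != format(returned, "%Y%m"))
  for (idx in bad_idx) {
    s <- (idx - 1) %% n_sdate + 1    # position along sdate
    f <- (idx - 1) %/% n_sdate + 1   # position along ftime
    data[, , s, f, , ] <- NA
  }
  data
}

# Toy example with 2 start dates and 2 forecast times (values are made up).
toy_requested <- as.POSIXct(c("20170615", "20180615", "20170716", "20180716"),
                            format = "%Y%m%d", tz = "UTC")
toy_returned <- as.POSIXct(c("2017-06-30 18:00:00", "2018-06-30 18:00:00",
                             "2017-06-30 18:00:00", "2018-07-31 18:00:00"),
                           tz = "UTC")
toy_data <- array(as.numeric(1:16),
                  dim = c(dataset = 1, var = 1, sdate = 2, ftime = 2,
                          lat = 2, lon = 2))

toy_masked <- mask_mismatched_dates(toy_data, toy_requested, toy_returned)
print(toy_masked[1, 1, , , 1, 1])
```

In the toy data only the third date (sdate 1, ftime 2) does not match, so only that slice becomes NA. For the real case, `requested` would be `dates0` and `returned` would be `attributes(obs)$Variable$common$ftime`.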
Eva