CST_PeriodStandardization (2 issues: dates with ref_period and NA's issue)
Hi @tkariyat and @abatalla (not sure if the issue should be addressed to you, please tag anyone else if needed),
R and packages version
Which R version are you using? R 4.1.2
Which R packages versions are you using? CSIndicators_1.1.1
Which machine are you using? WS
Summary
Bug: there are 2 different bugs:
- first, an issue with the parameter ref_period: when using it I get a warning that it is not used because the parameter dates is not provided (but I do provide it in the s2dv_cube attrs$Dates, see the example below)
- second, regardless of ref_period, when using handle_infinity = TRUE I would not expect NA's anywhere that the original data didn't have NA's, but I do get them. The sample data that I'm providing (see "Other Relevant Information" below) has NA's in time 1 and 2, but nowhere else; the result has NA's in other places. It seems that the leadtime information (dimension "time") is somehow being mixed in the calculation. I don't think this should happen: leadtimes should always be independent.
Example
load('./sample_data_spei.RData')
# Issue1 about Dates:
test <- CST_PeriodStandardization(data = sample_data_spei, handle_infinity = TRUE, ref_period = list(1994,2016))
#Warning message:
#In PeriodStandardization(data = data$data, data_cor = data_cor$data, :
# Parameter 'dates' is not provided so 'ref_period' can't be used.
class(sample_data_spei$attrs$Dates)
#[1] "POSIXct" "POSIXt"
# Issue2 mixing leadtimes and including NA from first leadtimes in the final result (other leadtimes):
summary(sample_data_spei$data)
# Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#-469.79 -8.67 89.62 84.83 170.46 1695.77 69414
dim(sample_data_spei$data)
#latitude syear time ensemble
# 1509 23 8 1
summary(sample_data_spei$data[,,3:8,])
# Min. 1st Qu. Median Mean 3rd Qu. Max.
#-469.790 -8.666 89.616 84.831 170.464 1695.765
# NAs come from leadtimes 1 and 2
summary(test$data)
# Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
# -8.56 -0.78 0.01 0.00 0.76 7.13 69629
dim(test$data)
#latitude syear time ensemble
# 1509 23 8 1
summary(test$data[,,3:8,])
# Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#-8.56118 -0.77889 0.01000 0.00384 0.76391 7.12988 215
# I wouldn't expect NA's here: handle_infinity = TRUE, and the first 2 leadtimes (where the original data had NA's) have been excluded from this subset
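A minimal sketch of how the NA's could be counted per leadtime, in case it helps narrow this down. Here count_na_by_dim is a hypothetical helper (not part of CSIndicators), and toy is a synthetic array standing in for sample_data_spei$data, reusing only the dimension names shown above:

```r
# Hypothetical helper (not part of CSIndicators):
# count the NA's in each slice along one named dimension.
count_na_by_dim <- function(x, dim_name = "time") {
  idx <- match(dim_name, names(dim(x)))
  stopifnot(!is.na(idx))  # the array must have a dimension with this name
  apply(is.na(x), idx, sum)
}

# Synthetic stand-in for sample_data_spei$data (assumption: much smaller,
# random values), with NA's only in leadtimes 1 and 2.
set.seed(1)
toy <- array(rnorm(4 * 5 * 8 * 1),
             dim = c(latitude = 4, syear = 5, time = 8, ensemble = 1))
toy[, , 1:2, ] <- NA

na_by_time <- count_na_by_dim(toy, "time")
print(na_by_time)
```

Running count_na_by_dim(sample_data_spei$data) and count_na_by_dim(test$data) on the real objects would show which leadtimes gained NA's after the standardization; passing "syear" or "latitude" as dim_name would show whether they cluster in particular years or regions.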
Other Relevant Information
The coordinates of the data are only latitude because the data has been aggregated by regions and the latitude is the latitude of the centroid of the region (that I need to calculate SPEI)
FYI @eball