Clim(): incorrect results when NA values are present
According to the description of Clim():
Clim() computes climatologies using the startdates covered by the whole experiments/observational data sets. The startdates not available for all the data (model and obs) are excluded when computing the climatologies.
From the sentence above, we expect that:
- If a start date contains an NA, that start date is excluded from the calculation (e.g., if the 1st sdate has an NA, the climatology is computed over sdate[2:end]).
- When there is an NA in obs, the corresponding exp point should be removed (set to NA) too.
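For comparison, here is a minimal base-R sketch of the behaviour described above. `expected_clim()` is a hypothetical helper, not the s2dv implementation of Clim(). It assumes every array is ordered [dataset, member, sdate, <other dims>] as in the examples, and it reads both expectations as one per-point rule: at each (ftime, lon) point, a start date is dropped from both exp and obs if any dataset/member of either contains an NA there. The remaining values are then averaged over dataset, member and start date.

```r
# Hypothetical helper, NOT the s2dv implementation: a sketch of the NA
# handling that the Clim() description implies. Assumes arrays are ordered
# [dataset, member, sdate, <other dims>].
expected_clim <- function(exp, obs, time_dim = "sdate",
                          dat_dim = c("dataset", "member")) {
  dims <- names(dim(exp))
  time_pos <- which(dims == time_dim)
  other_pos <- which(!dims %in% c(dat_dim, time_dim))
  # TRUE where a (sdate, other dims) point has an NA in any dataset/member
  na_mask <- function(x) apply(is.na(x), c(time_pos, other_pos), any)
  # A start date is dropped at a point if exp OR obs has an NA there
  bad <- na_mask(exp) | na_mask(obs)
  # Blank out the flagged start dates; rep(each = ...) is valid because the
  # dat dims are the leading (fastest-varying) dims of the array.
  drop_bad <- function(x) {
    n_dat <- prod(dim(x)[seq_len(time_pos - 1)])
    x[rep(bad, each = n_dat)] <- NA
    x
  }
  # Average over dataset, member and the surviving start dates
  clim <- function(x) apply(x, other_pos, mean, na.rm = TRUE)
  list(clim_exp = clim(drop_bad(exp)), clim_obs = clim(drop_bad(obs)))
}
```

Under these assumptions, setting `obs[1] <- NA` in the first example changes only the ftime = 1, lon = 1 entry of both `clim_exp` and `clim_obs`, which are then averaged over sdate 2:5; every other point keeps its value.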
However, Clim() doesn't always return the expected results.
Problematic example:
```r
# Complete data (without NA)
set.seed(1)
exp <- array(rnorm(60), dim = c(dataset = 1, member = 3, sdate = 5, ftime = 2, lon = 2))
set.seed(2)
obs <- array(rnorm(40), dim = c(dataset = 1, member = 2, sdate = 5, ftime = 2, lon = 2))
res <- Clim(exp, obs)
summary(res$clim_exp)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
#-0.22940 -0.00555 0.05701 0.10760 0.11770 0.82740
summary(res$clim_obs)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
#-0.40080 -0.35150 -0.24380 0.09759 0.68040 0.99100
#--------------------------------
# Same data, but set the first obs value to NA
obs[1] <- NA
res <- Clim(exp, obs)
summary(res$clim_exp) # same as above
# Min. 1st Qu. Median Mean 3rd Qu. Max.
#-0.22940 -0.00555 0.05701 0.10760 0.11770 0.82740
summary(res$clim_obs) # different from above
# Min. 1st Qu. Median Mean 3rd Qu. Max.
#-0.4008 -0.3515 -0.2438 0.1463 0.8025 1.0500
```
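The second result shows that clim_exp does not change, while clim_obs does because of the NA in obs. According to the description, the exp value at the same point should be excluded as well, so clim_exp should change too.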
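summary() only compares the distribution of values, so it cannot show which grid points changed. A small base-R check like the one below compares two results element by element. `changed_points()` is a hypothetical helper, and the names `res_full` / `res_na` in the usage line are assumptions: the example above overwrites `res`, so the two Clim() results would need to be stored separately first.

```r
# Hypothetical helper (not part of s2dv): return the array indices at which
# two climatologies differ, counting NA/NaN mismatches as differences.
changed_points <- function(a, b, tol = 1e-10) {
  stopifnot(identical(dim(a), dim(b)))
  differs <- xor(is.na(a), is.na(b)) |
    (!is.na(a) & !is.na(b) & abs(a - b) > tol)
  which(differs, arr.ind = TRUE)
}
```

Usage (assumed names): `changed_points(res_full$clim_exp, res_na$clim_exp)`. Going by the description, the expected result is exactly the point where the NA was introduced, rather than no change at all.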
The following example shows that clim_exp does change when obs contains an NA. The only difference between the two examples is that the first one has an extra dimension, 'lon'.
```r
# Complete data (without NA)
set.seed(1)
exp <- array(rnorm(30), dim = c(dataset = 1, member = 3, sdate = 5, ftime = 2))
set.seed(2)
obs <- array(rnorm(20), dim = c(dataset = 1, member = 2, sdate = 5, ftime = 2))
res <- Clim(exp, obs)
summary(res$clim_exp)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
#0.02360 0.04206 0.09641 0.08246 0.10880 0.14150
summary(res$clim_obs)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
#-0.3802 -0.2738 0.2112 0.1955 0.6804 0.7397
#--------------------------------
# Same data, but set the first obs value to NA
obs[1] <- NA
res <- Clim(exp, obs)
summary(res$clim_exp)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
#0.02360 0.04315 0.11640 0.14830 0.25210 0.31750
summary(res$clim_obs)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
#-0.3802 -0.3531 0.1978 0.2664 0.8173 1.0500
```
This issue is related to #243 (closed). We need to clarify the best way to handle NA values in Clim().