Earth Sciences / s2dverification / Issues / #245
Issue created Jan 22, 2020 by aho (@aho, Maintainer)

Clim(): Incorrect results when NA exists

According to the description of Clim():

Clim() computes climatologies using the startdates covered by the whole experiments/observational data sets. The startdates not available for all the data (model and obs) are excluded when computing the climatologies.

From the sentence above, we expect that:

  1. If a start date (sdate) contains an NA, that start date is excluded from the calculation (e.g., if the 1st sdate has an NA, the climatology is calculated from sdate[2:end]).
  2. When obs has an NA, the corresponding exp point should be removed (set to NA) too.
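
The two expectations above can be sketched in base R. This is only an illustration of the expected behaviour, not the implementation of Clim(): `clim_expected` is a hypothetical helper, and it assumes that 'sdate' is the third dimension and that a start date is dropped entirely as soon as any value of exp or obs at that start date is NA.

```r
# Hypothetical helper, not part of s2dverification.
# Assumption: dimension 3 of both arrays is 'sdate', and a start date is
# dropped entirely if ANY value of exp or obs at that start date is NA.
clim_expected <- function(exp, obs, sdate_dim = 3) {
  # TRUE for every start date that has no NA anywhere in the array
  complete_sdate <- function(x) {
    apply(x, sdate_dim, function(slice) !anyNA(slice))
  }
  keep <- complete_sdate(exp) & complete_sdate(obs)

  # Mean over the retained start dates, keeping all other dimensions
  clim <- function(x) {
    other_dims <- seq_along(dim(x))[-sdate_dim]
    apply(x, other_dims, function(v) mean(v[keep]))
  }

  list(clim_exp = clim(exp), clim_obs = clim(obs), kept_sdates = which(keep))
}

# Same synthetic data as in the example of this issue
set.seed(1)
exp <- array(rnorm(60), dim = c(dataset = 1, member = 3, sdate = 5, ftime = 2, lon = 2))
set.seed(2)
obs <- array(rnorm(40), dim = c(dataset = 1, member = 2, sdate = 5, ftime = 2, lon = 2))
obs[1] <- NA  # the first start date now has a missing observation

res <- clim_expected(exp, obs)
res$kept_sdates  # 2 3 4 5: the first start date is excluded for exp and obs alike
```

Under this reading, a single NA in obs changes both clim_exp and clim_obs, because the affected start date is removed from both arrays before averaging.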

However, Clim() doesn't always return the expected results.
Problematic example:

# Complete data (without NA)
set.seed(1)
exp <- array(rnorm(60), dim = c(dataset = 1, member = 3, sdate = 5, ftime = 2, lon = 2))
set.seed(2)
obs <- array(rnorm(40),  dim = c(dataset = 1, member = 2, sdate = 5, ftime = 2, lon = 2))

res <- Clim(exp, obs)
summary(res$clim_exp)
#    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#-0.22940 -0.00555  0.05701  0.10760  0.11770  0.82740 
summary(res$clim_obs)
#    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#-0.40080 -0.35150 -0.24380  0.09759  0.68040  0.99100 

#--------------------------------
# Same data but change the first obs to NA
obs[1] <- NA

res <- Clim(exp, obs)
summary(res$clim_exp)  #SAME as above
#    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#-0.22940 -0.00555  0.05701  0.10760  0.11770  0.82740 
summary(res$clim_obs)  # DIFFERENT from above
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#-0.4008 -0.3515 -0.2438  0.1463  0.8025  1.0500 

The second result shows that clim_exp does not change, while clim_obs changes because of the NA.
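
To see which start date that single NA belongs to, the position of `obs[1]` can be expanded into per-dimension indices. The snippet below is a small base-R check that only rebuilds the obs array from the example; it does not call Clim().

```r
# Rebuild the obs array from the example and blank out its first element
set.seed(2)
obs <- array(rnorm(40), dim = c(dataset = 1, member = 2, sdate = 5, ftime = 2, lon = 2))
obs[1] <- NA

# One row per NA, one column per dimension (in the order of dim(obs))
na_pos <- which(is.na(obs), arr.ind = TRUE)
colnames(na_pos) <- names(dim(obs))
na_pos  # a single row with index 1 in every dimension, i.e. sdate 1
```

So the missing value sits in the first start date (member 1, ftime 1, lon 1), which is the start date that expectation 1 says should be excluded.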

The following example shows that clim_exp also changes when obs has an NA. The only difference between the two examples is that the one above has an additional dimension, 'lon'.

# Complete data (without NA)
set.seed(1)
exp <- array(rnorm(30), dim = c(dataset = 1, member = 3, sdate = 5, ftime = 2))
set.seed(2)
obs <- array(rnorm(20),  dim = c(dataset = 1, member = 2, sdate = 5, ftime = 2))

res <- Clim(exp, obs)
summary(res$clim_exp)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#0.02360 0.04206 0.09641 0.08246 0.10880 0.14150 
summary(res$clim_obs)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#-0.3802 -0.2738  0.2112  0.1955  0.6804  0.7397 

#--------------------------------
# Same data but change the first obs to NA
obs[1] <- NA

res <- Clim(exp, obs)
summary(res$clim_exp)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#0.02360 0.04315 0.11640 0.14830 0.25210 0.31750 
summary(res$clim_obs)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#-0.3802 -0.3531  0.1978  0.2664  0.8173  1.0500 

This issue is related to #243 (closed). We need to clarify the best way to handle NA in Clim().
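
One candidate policy to weigh in that discussion is point-wise masking, which follows expectation 2 literally: wherever any member of exp or obs is NA at a given point, that point is set to NA in both arrays for all members, and the mean over start dates ignores it. The sketch below is an assumption-laden illustration, not what Clim() does or necessarily should do: `clim_pointwise` is a hypothetical helper, 'member' and 'sdate' are assumed to be dimensions 2 and 3, and exp and obs are assumed to agree on every dimension except 'member'.

```r
# Hypothetical helper, not part of s2dverification.
# Assumptions: dimension 2 is 'member', dimension 3 is 'sdate', and exp and obs
# agree on every dimension except 'member'.
clim_pointwise <- function(exp, obs, member_dim = 2, sdate_dim = 3) {
  non_member <- seq_along(dim(obs))[-member_dim]

  # TRUE at each (dataset, sdate, ftime, ...) point where any member is NA
  bad <- apply(exp, non_member, anyNA) | apply(obs, non_member, anyNA)

  # Set all members of x to NA at the flagged points
  mask_members <- function(x) {
    perm <- c(non_member, member_dim)  # move 'member' to the last position
    x_perm <- aperm(x, perm)
    x_perm[rep(bad, times = dim(x)[member_dim])] <- NA
    aperm(x_perm, order(perm))         # restore the original dimension order
  }

  # Mean over start dates, ignoring the masked points
  clim <- function(x) {
    other <- seq_along(dim(x))[-sdate_dim]
    apply(mask_members(x), other, mean, na.rm = TRUE)
  }

  list(clim_exp = clim(exp), clim_obs = clim(obs))
}

# Same synthetic data as in the second example of this issue
set.seed(1)
exp <- array(rnorm(30), dim = c(dataset = 1, member = 3, sdate = 5, ftime = 2))
set.seed(2)
obs <- array(rnorm(20), dim = c(dataset = 1, member = 2, sdate = 5, ftime = 2))
obs[1] <- NA

res <- clim_pointwise(exp, obs)
dim(res$clim_exp)  # dataset x member x ftime
```

With this policy only the first forecast time loses its first start date; the second forecast time still averages over all five start dates, which differs from dropping the whole start date.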
