New RandomWalkTest() function

Hi @nperez and @aho,

As we were speaking, we could add the RandomWalkTest() function to the s2dv package. The code is based on @amanriqu's code and it has been adapted to the format of s2dv (added documentation, example, sanity checks and it is parallelized with multiApply::Apply()).

The function compares 2 forecasts to know if one is significantly better than the other one. It returns the score (number of times that forecaster A has been better than forecaster B minus the number of times that forecaster B has been better than forecaster A) and whether this difference is significant or not at the 5% significance level.

This is the code:


#'Random walk test for skill differences
#'
#'@description Forecast comparison of 2 forecasts based on Random Walks (given skill of the 2 forecasts with respect to a common reference), 
#'with significance estimate at the 5% confidence level, as in DelSole and Tippett (2015).
#'
#'@author Andrea Manrique \email{andrea.manrique@bsc.es}
#'@author Carlos Delgado-Torres \email{carlos.delgado@bsc.es}
#'
#'@param skill_A Time serie of the skill with the forecaster A's.
#'@param skill_B Time serie of the skill with the forecaster B's.
#'@param time_dim A character string indicating the name of the dimension along which the tests are computed. The default value is 'sdate'.
#'@param ncores An integer indicating the number of cores to use for parallel computation. The default value is NULL.
#'
#'@return A list of 2:
#'\item{$score}{
#'The number of times that forecaster A has been better than forecaster B minus 
#'the number of times that forecaster B has been better than forecaster A (for skill positively oriented).
#'If $score is positive forecaster A is better than forecaster B, and if $score is negative forecaster B is better than forecaster B.
#'}
#'\item{$signif}{
#'Whether the difference is significant or not at the 5% significance level.
#'}
#'
#'@examples
#' dat1 <- array(c(1:40), dim = c(sdate = 10, lat = 2, lon = 2))
#' dat2 <- array(c(11:50), dim = c(sdate = 10, lat = 2, lon = 2))
#' print(RandomWalkTest(skill_A = dat1, skill_B = dat2, time_dim = 'sdate', ncores = 1))
#'
#'@import multiApply
#'@export
RandomWalkTest <- function(skill_A, skill_B, time_dim = 'sdate', ncores = NULL){
  
  ## Check inputs
  if (is.null(skill_A) | is.null(skill_B)){
    stop("Parameters 'skill_A' and 'skill_B' cannot be NULL.")
  }
  if(!is.numeric(skill_A) | !is.numeric(skill_B)){
    stop("Parameter 'skill_A' and 'skill_B' must be a numerical array.")
  }
  if (!identical(dim(skill_A),dim(skill_B))){
    stop("Parameters 'skill_A' and 'skill_B' must have the same dimensions.")
  }
  if(!is.character(time_dim)){
    stop("Parameter 'time_dim' must be a character string.")
  }
  if(!time_dim %in% names(dim(skill_A)) | !time_dim %in% names(dim(skill_B))){
    stop("Parameter 'time_dim' is not found in 'skill_A' or 'skill_B' dimensions.")
  }
  if (!is.null(ncores)){
    if (!is.numeric(ncores) | ncores %% 1 != 0 | ncores < 0 | length(ncores) > 1){
      stop("Parameter 'ncores' must be a positive integer.")
    }
  }
  
  ## Compute the Random Walk Test
  res <- multiApply::Apply(data = list(skill_A, skill_B), 
                           target_dims = time_dim,
                           fun = .RandomWalkTest, 
                           ncores = ncores)
  return(res)
}

.RandomWalkTest <- function(skill_A, skill_B){
  
   score <- cumsum(skill_A > skill_B) - cumsum(skill_A < skill_B)
   
   ## TRUE if significant (if last value is above or below 2*sqrt(N))
   signif<- ifelse(test = (score[length(skill_A)] <  (-2)*sqrt(length(skill_A))) | (score[length(skill_A)] >  2*sqrt(length(skill_A))), 
                   yes = TRUE, no = FALSE)
   
   return(list("score"=score[length(skill_A)],"signif"=signif))
}

Please tell me if anything is unclear or you need more information.

Best regards,

Carlos