Comparison of BrierScore() and RPSS() ouputs with binary probabilities

Hi @vagudets and @nperez,

This issue is to share a test I've done comparing the outputs of the BrierScore() and RPSS() functions using randomly generated data with two categories.

#### Comparing BrierScore() with RPSS() ####

library(devtools); devtools::load_all()

# Generate random forecasts and binary obs
set.seed(123)
exp_probs <- runif(100)  # forecast probabilities
obs_binary <- rbinom(100, 1, exp_probs)

# ---- BrierScore --------------------------------------------------------------
brier_res <- BrierScore(exp_probs, obs_binary, thresholds = 0.5)

# ---- RPSS --------------------------------------------------------------------

# Convert to probs with 2 categories) 
exp_probs_bin <- cbind(1 - exp_probs, exp_probs)  # [non-event, event]
obs_probs_bin <- cbind(1 - obs_binary, obs_binary)

# Add dim names
dim(exp_probs_bin) <- c(sdate = length(exp_probs), bin = 2)
dim(obs_probs_bin) <- c(sdate = length(obs_binary), bin = 2)

# RPSS with default ref (climatological forecast)
rpss_res <- RPSS(exp = exp_probs_bin, obs = obs_probs_bin, memb_dim = NULL, cat_dim = "bin")

# RPSS with observed climatology as ref
ref_probs_bin <- cbind(1 - mean(obs_binary), mean(obs_binary))
ref_probs_bin <- array(rep(ref_probs_bin, each = 100), dim = c(sdate = 100, bin = 2))
dimnames(ref_probs_bin) <- list(sdate = NULL, bin = c("no", "yes"))
rpss_res2 <- RPSS(exp = exp_probs_bin, obs = obs_probs_bin, ref = ref_probs_bin, memb_dim = NULL, cat_dim = "bin")


# ---- Compare scores ----------------------------------------------------------

# Brier Skill Score from BrierScore()
brier_res$bss_res  # 0.3697053
brier_res$bss_gres # 0.3987596

# Manual Brier Skill Score
print(1 -
        mean((exp_probs - obs_binary)^2) / 
        mean((rep(mean(obs_binary), 100) - obs_binary)^2)) # 0.3987596

# RPSS using default reference
rpss_res$rpss # 0.6134883

# RPSS using reference based on obs mean
rpss_res2$rpss # 0.3987596

As shown, the output of RPSS() with a reference forecast calculated from the mean of the observations matches with both the manually computed BSS and the $bss_gres element of the BrierScore() output. The other outputs ($bss_res and rpss_res$rpss) are different. This raises the following questions:

What exactly does $bss_gres represent, and how does it differ from $bss_res?
Why is the RPSS() result so different when using the function’s default reference?

BrierScore.R documentation:

     {$rel}{standard reliability}
     {$res}{standard resolution}
     {$unc}{standard uncertainty}
     {$gres}{generalized resolution}
     {$bss_res}{res - rel / unc}       #bss_res
     {$bss_gres}{gres - rel / unc}     #bss_gres

Best,

Ariadna

Edited Jun 02, 2025 by abatalla