Bayesian Statistics

10. Linear Models (Marginal Means and Comparison)

Dominique Makowski
D.Makowski@sussex.ac.uk

Recap

  • Is there a relationship between Petal Length and Petal Width (built-in iris dataset)?
library(brms)

f <- brms::bf(Petal.Length ~ 0 + Intercept + Petal.Width)

model <- brms::brm(f, data = iris, refresh=0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.6 seconds.
performance::performance(model)
# Indices of model performance

ELPD     | ELPD_SE |   LOOIC | LOOIC_SE |    WAIC |    R2 | R2 (adj.) |  RMSE | Sigma
-------------------------------------------------------------------------------------
-104.322 |   9.519 | 208.645 |   19.039 | 208.624 | 0.927 |     0.925 | 0.475 | 0.483
parameters::parameters(model)
# Fixed Effects

Parameter   | Median |       95% CI |   pd |  Rhat |     ESS
------------------------------------------------------------
(Intercept) |   1.08 | [0.94, 1.22] | 100% | 1.000 | 1802.00
Petal.Width |   2.23 | [2.13, 2.33] | 100% | 1.000 | 1769.00

# Sigma

Parameter | Median |       95% CI |   pd |  Rhat |     ESS
----------------------------------------------------------
sigma     |   0.48 | [0.43, 0.54] | 100% | 1.001 | 2197.00

Marginal Means and Effects

Factors as Predictors

  • Is Petal Length different between different flower species?
summary(iris$Species)
    setosa versicolor  virginica 
        50         50         50 
  • Species is a categorical factor with 3 levels: setosa, versicolor, and virginica
  • Factors are very common in psychology, e.g.:
    • Condition (control vs. treatment)
    • Gender
    • Group (healthy vs. patient)
  • When a factor is used as a predictor, the first level is used as the reference level (i.e., becomes the intercept)
  • The other levels are compared to the reference level
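  • We can see this coding explicitly by inspecting the design matrix (a quick base-R check; rows 1, 51 and 101 are a setosa, a versicolor and a virginica flower):
model.matrix(~ Species, data = iris)[c(1, 51, 101), ]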

Petal.Length ~ Species

  • Fit the model with Species as a predictor
model2 <- brms::brm(Petal.Length ~ 0 + Intercept + Species, 
                    data = iris, refresh=0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.
  • Interpret parameters
parameters::parameters(model2)
# Fixed Effects

Parameter         | Median |       95% CI |   pd |  Rhat |     ESS
------------------------------------------------------------------
(Intercept)       |   1.46 | [1.34, 1.59] | 100% | 1.002 | 1667.00
Speciesversicolor |   2.80 | [2.62, 2.96] | 100% | 1.000 | 2107.00
Speciesvirginica  |   4.09 | [3.92, 4.26] | 100% | 1.002 | 1777.00

# Sigma

Parameter | Median |       95% CI |   pd |  Rhat |     ESS
----------------------------------------------------------
sigma     |   0.43 | [0.38, 0.49] | 100% | 1.002 | 2913.00
  • What is the value of Petal.Length for “versicolor”?
  • Can we easily get the value of the outcome at each level?
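  • By hand, the model-implied mean for versicolor is the intercept (setosa) plus its coefficient; a minimal sketch from the posterior draws (assuming brms’s default b_* parameter names):
draws <- brms::as_draws_df(model2)
median(draws$b_Intercept + draws$b_Speciesversicolor)  # ~ 1.46 + 2.80 = 4.26
  • This quickly becomes tedious with many levels, hence the tools below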

Marginal Means

  • A statistical model defines a relationship between the outcome and any predictor values
  • It is straightforward to compute the model-based value (i.e., the prediction) of the outcome at each level of a factor
modelbased::estimate_relation(model2, by="Species")
Model-based Predictions

Species    | Predicted |   SE |       95% CI
--------------------------------------------
setosa     |      1.46 | 0.06 | [1.34, 1.59]
versicolor |      4.26 | 0.06 | [4.14, 4.38]
virginica  |      5.55 | 0.06 | [5.44, 5.67]

Variable predicted: Petal.Length
Predictors modulated: Species
  • Marginal means are the average predicted values of the outcome at each level of a factor, averaged over all other predictors / random effects (i.e., marginalized over everything else)
modelbased::estimate_means(model2, by="Species")
Estimated Marginal Means

Species    | Median |       95% CI |   pd |          ROPE | % in ROPE
---------------------------------------------------------------------
setosa     |   1.46 | [1.34, 1.59] | 100% | [-0.10, 0.10] |        0%
versicolor |   4.26 | [4.14, 4.38] | 100% | [-0.10, 0.10] |        0%
virginica  |   5.55 | [5.44, 5.67] | 100% | [-0.10, 0.10] |        0%

Variable predicted: Petal.Length
Predictors modulated: Species
  • Why are they the same? Because this model contains a single predictor: there is nothing else to marginalize over, so the predictions at each level are the marginal means
  • Marginal means are conceptually (and in practice) different from empirical means (i.e., the direct average of the observed values): because they are model-based, they can better generalize or extrapolate
  • Marginal means are a powerful tool to estimate any quantities of interest in a statistical model
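  • In this simple model (a single factor, with weak priors), they essentially coincide with the empirical means; compare with:
aggregate(Petal.Length ~ Species, data = iris, FUN = mean)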

Marginal Contrasts

  • One very useful application of marginal means is to compute marginal contrasts (contrasts of marginal means)
modelbased::estimate_contrasts(model2, contrast="Species")
Marginal Contrasts Analysis

Level1     | Level2     | Median |       95% CI |   pd |          ROPE | % in ROPE
----------------------------------------------------------------------------------
versicolor | setosa     |   2.80 | [2.62, 2.96] | 100% | [-0.10, 0.10] |        0%
virginica  | setosa     |   4.09 | [3.92, 4.26] | 100% | [-0.10, 0.10] |        0%
virginica  | versicolor |   1.29 | [1.12, 1.47] | 100% | [-0.10, 0.10] |        0%

Variable predicted: Petal.Length
Predictors contrasted: Species
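  • Under the hood, a contrast is simply the difference between the posterior draws of two marginal means; a minimal sketch for virginica vs. versicolor (assuming default parameter names):
draws <- brms::as_draws_df(model2)
contrast_draws <- draws$b_Speciesvirginica - draws$b_Speciesversicolor  # the intercept cancels out
median(contrast_draws)  # ~ 1.29, matching the table above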

Interactions

  • Is the relationship between Petal Length and Petal Width different between species?
model3 <- brms::brm(Petal.Length ~ 0 + Intercept + Species * Petal.Width, 
                    data = iris, refresh=0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.5 seconds.
Chain 2 finished in 0.5 seconds.
Chain 3 finished in 0.5 seconds.
Chain 4 finished in 0.4 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.5 seconds.
Total execution time: 2.3 seconds.
parameters::parameters(model3)
# Fixed Effects

Parameter                     | Median |        95% CI |     pd |  Rhat |     ESS
---------------------------------------------------------------------------------
(Intercept)                   |   1.32 | [ 1.05, 1.59] |   100% | 1.004 | 1160.00
Speciesversicolor             |   0.45 | [-0.30, 1.24] | 87.55% | 1.002 | 1968.00
Speciesvirginica              |   2.90 | [ 2.12, 3.72] |   100% | 1.002 | 2066.00
Petal.Width                   |   0.57 | [-0.43, 1.56] | 85.75% | 1.005 | 1083.00
Speciesversicolor:Petal.Width |   1.32 | [ 0.21, 2.45] | 98.72% | 1.005 | 1109.00
Speciesvirginica:Petal.Width  |   0.09 | [-1.03, 1.19] | 57.05% | 1.005 | 1072.00

# Sigma

Parameter | Median |       95% CI |   pd |  Rhat |     ESS
----------------------------------------------------------
sigma     |   0.36 | [0.33, 0.41] | 100% | 1.002 | 2436.00
  • The interaction parameters show how much the effect of Petal.Width on Petal.Length changes from one species to another

Visualization (1)

library(ggplot2)

p <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
  geom_point() 
p

Visualization (2)

  • modelbased::estimate_relation() is a convenient function to create a data grid and make predictions
    • keep_iterations adds individual posterior iterations to the predictions
    • length (default is 10) sets the number of points in the grid (more points: smoother line, but slower computation)
pred <- modelbased::estimate_relation(model3, length=20, keep_iterations=15)
reshaped_iters <- bayestestR::reshape_iterations(pred)

p <- p + geom_line(data = reshaped_iters, 
            aes(y = iter_value, group = interaction(Species, iter_group)), 
            alpha = 0.5)
p

Visualization (3)

  • Add the “median” prediction
p + geom_line(data = pred, aes(y = Predicted), 
              linewidth=2)

Effect of Species (Marginal Means)

modelbased::estimate_means(model2, by="Species")
Estimated Marginal Means

Species    | Median |       95% CI |   pd |          ROPE | % in ROPE
---------------------------------------------------------------------
setosa     |   1.46 | [1.34, 1.59] | 100% | [-0.10, 0.10] |        0%
versicolor |   4.26 | [4.14, 4.38] | 100% | [-0.10, 0.10] |        0%
virginica  |   5.55 | [5.44, 5.67] | 100% | [-0.10, 0.10] |        0%

Variable predicted: Petal.Length
Predictors modulated: Species
modelbased::estimate_means(model3, by="Species")
Estimated Marginal Means

Species    | Median |       95% CI |   pd |          ROPE | % in ROPE
---------------------------------------------------------------------
setosa     |   2.00 | [1.04, 2.96] | 100% | [-0.10, 0.10] |        0%
versicolor |   4.02 | [3.90, 4.14] | 100% | [-0.10, 0.10] |        0%
virginica  |   5.01 | [4.70, 5.34] | 100% | [-0.10, 0.10] |        0%

Variable predicted: Petal.Length
Predictors modulated: Species
Predictors averaged: Petal.Width (1.2)
  • The means at each level differ between the two models. Why?
  • Because the second model averages over the effect of Petal.Width (it takes more information into account)
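  • We can reproduce the setosa value by hand: the marginal mean from model3 fixes Petal.Width at its sample mean (1.2, as indicated in the footnote); a sketch assuming default parameter names:
draws <- brms::as_draws_df(model3)
median(draws$b_Intercept + draws$b_Petal.Width * mean(iris$Petal.Width))  # ~ 2.00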

Effect of Species along Petal.Width (Marginal Contrasts)

  • Marginal contrasts can be computed at any level of the other predictors
modelbased::estimate_contrasts(model3, contrast="Species", by = "Petal.Width", length = 3, 
                               test = "pd", centrality = "mean")
Marginal Contrasts Analysis

Level1     | Level2     | Petal.Width |  Mean |        95% CI |     pd
----------------------------------------------------------------------
versicolor | setosa     |        0.10 |  0.59 | [-0.09, 1.30] | 95.25%
virginica  | setosa     |        0.10 |  2.92 | [ 2.19, 3.68] |   100%
virginica  | versicolor |        0.10 |  2.33 | [ 1.37, 3.34] |   100%
versicolor | setosa     |        1.30 |  2.17 | [ 1.10, 3.21] |   100%
virginica  | setosa     |        1.30 |  3.03 | [ 1.94, 4.14] |   100%
virginica  | versicolor |        1.30 |  0.87 | [ 0.58, 1.18] |   100%
versicolor | setosa     |        2.50 |  3.74 | [ 1.43, 6.05] | 99.78%
virginica  | setosa     |        2.50 |  3.15 | [ 0.85, 5.42] | 99.55%
virginica  | versicolor |        2.50 | -0.59 | [-1.25, 0.10] | 95.47%

Variable predicted: Petal.Length
Predictors contrasted: Species
  • Useful to understand interactions

Effect of Petal Width (Marginal Effects)

  • The model-based framework can also be used to compute marginal effects
    • The slope of a predictor at any value of the other predictors
modelbased::estimate_slopes(model3, trend="Petal.Width", by = "Species")
Estimated Marginal Effects

Species    | Median |        95% CI |     pd |          ROPE | % in ROPE
------------------------------------------------------------------------
setosa     |   0.57 | [-0.43, 1.56] | 85.75% | [-0.10, 0.10] |     8.39%
versicolor |   1.87 | [ 1.32, 2.40] |   100% | [-0.10, 0.10] |        0%
virginica  |   0.65 | [ 0.28, 1.02] | 99.98% | [-0.10, 0.10] |        0%

Marginal effects estimated for Petal.Width
Type of slope was dY/dX
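  • These slopes can be recovered from the raw parameters: e.g., for versicolor, the slope is the main effect of Petal.Width plus its interaction term (a sketch assuming brms’s default names):
draws <- brms::as_draws_df(model3)
median(draws$b_Petal.Width + draws[["b_Speciesversicolor:Petal.Width"]])  # ~ 1.87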

Exercise

  • Interpret and visualize the effect of Sepal Width on Sepal Length (depending on the Species)

Model Comparison

Coefficient of Determination - R2

  • R2 measures how well the model explains the data already observed, focusing on variance reduction
  • R2 can sometimes give misleading impressions in cases of overfitting; a model might appear to perform very well on the training data but poorly on new data
  • R2 is primarily applicable (and intuitive) to linear models where the relationship between variables is the primary interest
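  • For Bayesian models, \(R^2\) is typically computed from the posterior predictions as \(R^2 = \mathrm{Var}(\hat{y}) \, / \, (\mathrm{Var}(\hat{y}) + \mathrm{Var}(y - \hat{y}))\) (Gelman et al., 2019), so it has a full posterior distribution (hence the credible intervals below)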
performance::r2(model)
# Bayesian R2 with Compatibility Interval

  Conditional R2: 0.927 (95% CI [0.920, 0.932])
performance::r2(model2)
# Bayesian R2 with Compatibility Interval

  Conditional R2: 0.941 (95% CI [0.936, 0.945])
performance::r2(model3)
# Bayesian R2 with Compatibility Interval

  Conditional R2: 0.958 (95% CI [0.955, 0.961])

ELPD

  • Other “relative” indices of fit measure how well the model predicts each data point, and thus how well it could (in theory) generalize to new data
    • This quality-of-fit metric is called the ELPD (Expected Log Pointwise Predictive Density)
  • It can be computed using 2 main methods
    • WAIC (Widely Applicable Information Criterion)
    • LOO (Leave-One-Out cross-validation, summarized by the LOOIC)
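  • Conceptually (in its LOO form): \(\mathrm{ELPD} = \sum_{i=1}^{n} \log p(y_i \mid y_{-i})\), i.e., the summed log-probability of each observation under the model fitted to all the other observations (higher is better)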

ELPD - WAIC

  • WAIC (Widely Applicable Information Criterion)
    • An index of prediction error adjusted for the number of parameters
    • It provides a balance between model fit and complexity, penalizing models that have too many parameters (in the same spirit as the AIC)
    • Computationally more straightforward than LOO, but can be less accurate (it is more sensitive to outliers)
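    • In formulas: \(\widehat{\mathrm{elpd}}_{WAIC} = \mathrm{lppd} - p_{WAIC}\) (the log pointwise predictive density minus the effective number of parameters), and \(\mathrm{WAIC} = -2 \times \widehat{\mathrm{elpd}}_{WAIC}\), which can be checked against the outputs below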
loo::waic(model)

Computed from 4000 by 150 log-likelihood matrix.

          Estimate   SE
elpd_waic   -104.3  9.5
p_waic         3.2  0.5
waic         208.6 19.0
loo::waic(model2)

Computed from 4000 by 150 log-likelihood matrix.

          Estimate   SE
elpd_waic    -89.3 10.6
p_waic         4.5  0.8
waic         178.5 21.3

1 (0.7%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
loo::waic(model3)

Computed from 4000 by 150 log-likelihood matrix.

          Estimate   SE
elpd_waic    -64.5 11.5
p_waic         7.0  1.2
waic         129.0 23.0

3 (2.0%) p_waic estimates greater than 0.4. We recommend trying loo instead. 

loo_compare()

  • Note that \(\mathrm{ELPD} = -\frac{\mathrm{WAIC}}{2}\)
  • Use loo::loo_compare() to get the difference in ELPD between models
loo::loo_compare(loo::waic(model), loo::waic(model2), loo::waic(model3))
       elpd_diff se_diff
model3   0.0       0.0  
model2 -24.8       6.6  
model  -39.8      10.3  
  • Shows the “best” model first

ELPD - LOOIC

  • Leave-One-Out (LOO) Cross-Validation is a method to assess model performance by estimating how well a model predicts each observation, one at a time, using the rest of the data
  • Instead of refitting the model n times, each time leaving out one of the n data points, approximations like PSIS (Pareto Smoothed Importance Sampling) are used to avoid extensive computation
  • Provides a robust measure of the model’s predictive accuracy (complexity is not penalized directly, but indirectly: overly complex models overfit and thus predict left-out observations poorly)
loo::loo(model)

Computed from 4000 by 150 log-likelihood matrix.

         Estimate   SE
elpd_loo   -104.3  9.5
p_loo         3.2  0.5
looic       208.6 19.0
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.2]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
loo::loo(model2)

Computed from 4000 by 150 log-likelihood matrix.

         Estimate   SE
elpd_loo    -89.3 10.6
p_loo         4.6  0.8
looic       178.6 21.3
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.1]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
loo::loo(model3)

Computed from 4000 by 150 log-likelihood matrix.

         Estimate   SE
elpd_loo    -64.5 11.5
p_loo         7.0  1.2
looic       129.1 23.0
------
MCSE of elpd_loo is 0.1.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.3, 1.0]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
loo::loo_compare(loo::loo(model), loo::loo(model2), loo::loo(model3))
       elpd_diff se_diff
model3   0.0       0.0  
model2 -24.8       6.6  
model  -39.8      10.3  

Better by how much? Interpretation rules of thumb

  • We can focus on the ELPD difference and its SE
  • The “standardized” difference (\(\Delta_{ELPD} / SE\)) can be interpreted as a z-score
    • \(|z| > 1.96\) ~ p < .05
    • \(|z| > 3.29\) ~ p < .001
    • Use 1 - pnorm(abs(z)) to get the one-tailed p-value (see the sketch after this list)
  • But the standardized difference does not make much sense if the absolute difference is small (\(|\Delta_{ELPD}| < 4\))
    • It is not useful to say that one model is likely to provide better predictions than another if the predictions are almost the same anyway
  • Reporting should include the raw ELPD difference and its SE (and possibly the z-score or p-value, if meaningful)
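  • A minimal sketch of this computation from the loo_compare() output (column names as returned by the loo package):
comp <- loo::loo_compare(loo::loo(model), loo::loo(model2), loo::loo(model3))
z <- comp[, "elpd_diff"] / comp[, "se_diff"]  # NaN for the best model (0 / 0)
1 - pnorm(abs(z))  # one-tailed p-values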
loo::loo_compare(loo::loo(model), loo::loo(model2), loo::loo(model3)) |> 
  report::report()
The difference in predictive accuracy, as indexed by Expected Log Predictive
Density (ELPD-LOO), suggests that 'model3' is the best model (ELPD = -64.54),
followed by 'model2' (diff-ELPD = -24.75 +- 6.63, p < .001) and 'model'
(diff-ELPD = -39.79 +- 10.30, p < .001)

Exercise

  • Draw a conclusion about which model is better and report the analysis

Exercise

  • A researcher is interested in the relationship between the two dimensions (verbal and quantitative) of the SAT (a standardized test for college admissions), in particular whether the quantitative score is predicted by the verbal score. A colleague suggests that gender is important.
  • Using the real dataset sat.act from the psych package, analyze the data and answer the question
  • Load the dataset as follows:
df <- psych::sat.act
df$gender <- as.factor(df$gender)  # gender=1: males, 2: females
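  • A possible starting point (one reasonable specification among several):
model_sat <- brms::brm(SATQ ~ 0 + Intercept + SATV * gender,
                       data = df, refresh = 0)  # rows with missing values are excluded by brms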

Conclusion

  • You have (hopefully) learned:
    • What Bayesian statistics is about: the philosophy and applications of Bayes’ theorem
    • The strengths and drawbacks, and the differences and similarities with other approaches
    • How to apply it using simple and less simple models
  • This background knowledge should give you confidence about where to look to learn more
  • Bayesian statistics is a very rapidly evolving field
    • Challenge to teach & learn (lots of uncertainty)
  • Important to keep up-to-date with the latest developments
    • Follow the development of new and existing packages (e.g., brms, rstan, …)
    • Follow forums, blogs and developers (e.g., r-bloggers, Twitter/X, …)
  • Learn by engaging with the community
    • Ask for help - your questions will often be helpful to others too
    • Help others (share tips and tricks, little successes)
  • Spread the Bayesian gospel to your fellow academics
    • But don’t be a blind zealot - think for yourselves

The End (for good)

Thank you!