Marginal means are the average predicted values of the outcome at each level of a factor, averaged over all other predictors / random effects (i.e., marginalized over the other terms in the model)
Marginal means are conceptually (and in practice) different from empirical means (i.e., the direct average of the observed values) because they are model-based (and can thus better generalize or extrapolate)
Marginal means are a powerful tool to estimate any quantity of interest in a statistical model
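For instance, the marginal mean of the outcome for each Species can be obtained with modelbased::estimate_means() (a minimal sketch; it assumes model3 is the Bayesian model used in the examples below, with Species among its predictors):
# Marginal mean of the outcome per Species, averaged over the other predictors
modelbased::estimate_means(model3, by = "Species")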
Marginal Contrasts
One very useful application of marginal means is to compute marginal contrasts (contrasts of marginal means)
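For example, pairwise differences between Species can be estimated with modelbased::estimate_contrasts() (a sketch, again assuming model3 includes Species as a predictor):
# Pairwise contrasts of the marginal means between Species levels
modelbased::estimate_contrasts(model3, contrast = "Species")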
The model-based framework can also be used to compute marginal effects
I.e., the slope of a predictor at any given value of the other predictors
modelbased::estimate_slopes(model3, trend = "Petal.Width", by = "Species")
Estimated Marginal Effects
Species | Median | 95% CI | pd | ROPE | % in ROPE
------------------------------------------------------------------------
setosa | 0.57 | [-0.43, 1.56] | 85.75% | [-0.10, 0.10] | 8.39%
versicolor | 1.87 | [ 1.32, 2.40] | 100% | [-0.10, 0.10] | 0%
virginica | 0.65 | [ 0.28, 1.02] | 99.98% | [-0.10, 0.10] | 0%
Marginal effects estimated for Petal.Width
Type of slope was dY/dX
Exercise
Interpret and visualize the effect of Sepal.Width on Sepal.Length (depending on the Species)
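A possible starting point (a sketch; the model formula and the use of brms are assumptions, and plotting relies on the see package):
# Fit a model with an interaction between Sepal.Width and Species,
# then estimate and plot the slope of Sepal.Width within each Species
model_sw <- brms::brm(Sepal.Length ~ Sepal.Width * Species, data = iris)
slopes <- modelbased::estimate_slopes(model_sw, trend = "Sepal.Width", by = "Species")
plot(slopes)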
Model Comparison
Coefficient of Determination - R2
R2 measures how well the model explains the data already observed, focusing on variance reduction
R2 can sometimes give misleading impressions in cases of overfitting; a model might appear to perform very well on the training data but poorly on new data
R2 is primarily applicable (and intuitive) to linear models where the relationship between variables is the primary interest
performance::r2(model)
# Bayesian R2 with Compatibility Interval
Conditional R2: 0.927 (95% CI [0.920, 0.932])
performance::r2(model2)
# Bayesian R2 with Compatibility Interval
Conditional R2: 0.941 (95% CI [0.936, 0.945])
performance::r2(model3)
# Bayesian R2 with Compatibility Interval
Conditional R2: 0.958 (95% CI [0.955, 0.961])
ELPD
Other “relative” indices of fit can be used that measure how well the model predicts each data point and how well it could, in theory, generalize to new data
This quality-of-fit metric is called the ELPD (Expected Log Pointwise Predictive Density)
It can be computed using two main methods:
WAIC (Widely Applicable Information Criterion)
LOO (Leave-One-Out cross-validation)
ELPD - WAIC
WAIC (Widely Applicable Information Criterion)
An index of prediction error adjusted for the number of parameters
It provides a balance between model fit and complexity, penalizing models that have too many parameters (similar to the AIC).
Computationally more straightforward than LOO, but might not be as accurate (more sensitive to outliers)
loo::waic(model)
Computed from 4000 by 150 log-likelihood matrix.
Estimate SE
elpd_waic -104.3 9.5
p_waic 3.2 0.5
waic 208.6 19.0
loo::waic(model2)
Computed from 4000 by 150 log-likelihood matrix.
Estimate SE
elpd_waic -89.3 10.6
p_waic 4.5 0.8
waic 178.5 21.3
1 (0.7%) p_waic estimates greater than 0.4. We recommend trying loo instead.
loo::waic(model3)
Computed from 4000 by 150 log-likelihood matrix.
Estimate SE
elpd_waic -64.5 11.5
p_waic 7.0 1.2
waic 129.0 23.0
3 (2.0%) p_waic estimates greater than 0.4. We recommend trying loo instead.
loo_compare()
Note that \(ELPD = -(\frac{WAIC}{2})\)
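For instance, for the first model above, \(-\frac{208.6}{2} = -104.3\), which matches its elpd_waic.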
Use loo::loo_compare() to get the difference in ELPD between models
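For example (a sketch, assuming the three models fitted earlier):
# Compare the models' ELPD (models are ordered best-first; differences are relative to the best)
loo::loo_compare(loo::loo(model), loo::loo(model2), loo::loo(model3))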
ELPD - LOO
Leave-One-Out (LOO) Cross-Validation is a method to assess model performance by estimating how well a model predicts each observation, one at a time, using the rest of the data
Instead of refitting the model n times, each time leaving out one of the n data points, approximations like PSIS (Pareto Smoothed Importance Sampling) are used to avoid extensive computation
Provides a robust measure of the model’s predictive accuracy (without a direct complexity adjustment, but an indirect one through its sensitivity to overfitting)
loo::loo(model)
Computed from 4000 by 150 log-likelihood matrix.
Estimate SE
elpd_loo -104.3 9.5
p_loo 3.2 0.5
looic 208.6 19.0
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.2]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
loo::loo(model2)
Computed from 4000 by 150 log-likelihood matrix.
Estimate SE
elpd_loo -89.3 10.6
p_loo 4.6 0.8
looic 178.6 21.3
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.1]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
loo::loo(model3)
Computed from 4000 by 150 log-likelihood matrix.
Estimate SE
elpd_loo -64.5 11.5
p_loo 7.0 1.2
looic 129.1 23.0
------
MCSE of elpd_loo is 0.1.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.3, 1.0]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
The difference in predictive accuracy, as indexed by Expected Log Predictive
Density (ELPD-LOO), suggests that 'model3' is the best model (ELPD = -64.54),
followed by 'model2' (diff-ELPD = -24.75 +- 6.63, p < .001) and 'model'
(diff-ELPD = -39.79 +- 10.30, p < .001)
Exercise
Make a conclusion about which model is better and report the analysis
Exercise
A researcher is interested in the relationship between the two dimensions (verbal and quantitative) of the SAT (a standardized test for college admissions), in particular whether the quantitative score is predicted by the verbal score. A colleague suggests that gender is important.
Using the real dataset sat.act from the psych package, analyze the data and answer the question
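A possible starting point (a sketch; the variable names come from psych::sat.act, but the model specifications, the use of brms, and the comparison strategy are assumptions):
# Load the data; gender is coded numerically (1 = male, 2 = female), so treat it as a factor
data("sat.act", package = "psych")
sat.act$gender <- as.factor(sat.act$gender)
# Fit a model with the verbal score only, and one adding gender (and its interaction)
m1 <- brms::brm(SATQ ~ SATV, data = sat.act)
m2 <- brms::brm(SATQ ~ SATV * gender, data = sat.act)
# Compare their predictive accuracy via ELPD-LOO
loo::loo_compare(loo::loo(m1), loo::loo(m2))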