Bayesian Statistics

8. Linear Models (Priors)

Dominique Makowski
D.Makowski@sussex.ac.uk

Recap

  • You know how to:
    • Use Bayes Factors in simple tests (e.g., t-tests and correlations)
    • Run Bayesian Correlations
    • Sample from the posterior using MCMC
    • Describe and report the posterior distribution

Goal

  • Last week, we were interested in the correlation between mpg and qsec in the mtcars dataset
BayesFactor::correlationBF(mtcars$mpg, mtcars$qsec)
Bayes factor analysis
--------------
[1] Alt., r=0.333 : 4.416463 ±0%

Against denominator:
  Null, rho = 0 
---
Bayes factor type: BFcorrelation, Jeffreys-beta*
ggplot(mtcars, aes(x = mpg, y = qsec)) +
  geom_point(size=3) +
  stat_ellipse(color="blue", linewidth=1)

  • This week, we are interested in the relationship itself (i.e., not just the strength of the association, but the characteristics of the regression line) between these 2 variables

1. Visualizing Model Predictions

Exercise

  • Fit a Frequentist linear model to predict qsec from mpg. Write down its parameters.
model <- lm(qsec ~ mpg, data = mtcars) 
summary(model)

Call:
lm(formula = qsec ~ mpg, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8161 -1.0287  0.0954  0.8623  4.7149 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 15.35477    1.02978  14.911 2.05e-15 ***
mpg          0.12414    0.04916   2.525   0.0171 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.65 on 30 degrees of freedom
Multiple R-squared:  0.1753,    Adjusted R-squared:  0.1478 
F-statistic: 6.377 on 1 and 30 DF,  p-value: 0.01708
  • Visualize the relationship
ggplot(mtcars, aes(x = mpg, y = qsec)) +
  geom_point(size=3) +
  geom_smooth(method="lm", color="red")

Slope and Intercept

  • Using geom_smooth(method="lm") is convenient, but we don’t know what’s happening under the hood
  • What is happening under the hood?
    • A linear model is computed between x and y (\(y = ax + b\))
    • The slope a (aka \(\beta\)) and the intercept b (aka \(\beta_{0}\) or the Intercept) are extracted
    • We draw a segment of this line between the minimum and the maximum of x
ggplot(mtcars, aes(x = mpg, y = qsec)) +
  geom_point(size=3) +
  geom_abline(intercept = 15, slope = 0.12, color="red")

How to Visualize a Model

  • There is another way to get this regression line
  • A model (i.e., an equation with a set of parameters, e.g., intercept and slope) is a “function”
    • \(y = 2 + 3x\)
    • It expresses y “as a function” of x
    • For any value of x, we can predict a value of y
  • A statistical model can be used to generate predictions
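  • For instance, with made-up parameters (a minimal sketch):
f_linear <- function(x) 2 + 3 * x  # the model y = 2 + 3x expressed as an R function
f_linear(c(0, 1, 2))  # predicts y = 2, 5, 8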

Making Predictions

  • One can get predictions from almost any model (using the predict() or the insight::get_predicted() function - the latter being slightly more user-friendly and comprehensive)
pred <- insight::get_predicted(model, data=mtcars) 
length(pred)
[1] 32
  • What is pred? And why is its length 32? (One predicted value of qsec per row of mtcars)
  • Let’s add it to the dataframe and visualize it
data <- mtcars  # Copy the dataframe (it's not good to modify built-in datasets)
data$Predicted <- pred
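  • As a sanity check, base R’s predict() gives the same fitted values here:
head(predict(model))  # should match the first values of pred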

Visualizing Predicted Distribution

  • We can compare the actual distribution of the outcome variable qsec with that of the predicted outcome
  • A good model where predictions are close to the actual values should reproduce the same distribution of the outcome variable
  • However, in general, a linear model is geared toward predicting the “mean” of the outcome variable (by definition)
ggplot(data) +
  geom_density(aes(x=qsec), fill="green", alpha=0.5) +
  geom_density(aes(x=Predicted), fill="red", alpha=0.5)

Visualizing Predicted Relationship (1)

  • But we can also visualize the relationship between the predictor and the outcome using the predictions
ggplot(data, aes(x = mpg, y = qsec)) +
  geom_point(size=3) +
  geom_point(aes(y=Predicted), size=3, color="red", alpha=0.5) 

Visualizing Predicted Relationship (2)

  • Interesting pattern… what if we connect the points with a line?
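  • For example (a sketch reusing the Predicted column created above):
ggplot(data, aes(x = mpg, y = qsec)) +
  geom_point(size=3) +
  geom_line(aes(y = Predicted), color="red")  # connect the predicted points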

  • It looks like the line we got using the equation

Datagrids

  • The relationship estimated by a model can be visualized using predictions (rather than directly using the parameters), which is convenient
  • We can make predictions for non-observed values, for example, equally spaced values, simply to visualize the relationship
newdata <- data.frame(mpg = c(10, 20, 30, 40))
  • This is called a “datagrid” (a grid of new data points used for model predictions and visualizations)
  • We can also predict the confidence interval around the predictions
pred <- insight::get_predicted(model, data=newdata, ci=0.95) |> 
  as.data.frame() |> 
  cbind(newdata)
head(pred)
  Predicted        SE   CI_low  CI_high mpg
1  16.59613 0.5754098 15.42099 17.77128  10
2  17.83750 0.2916457 17.24188 18.43312  20
3  19.07887 0.5677467 17.91937 20.23836  30
4  20.32023 1.0212413 18.23458 22.40588  40

Visualize Predictions with Confidence Intervals

  • Plot everything
mtcars |> 
  ggplot(aes(x = mpg)) +
  geom_point(aes(y = qsec), size=3) +
  geom_ribbon(data=pred, aes(ymin=CI_low, ymax=CI_high), fill="red", alpha=0.2) +
  geom_line(data=pred, aes(y=Predicted), color="red") 

2. Bayesian Models: Quick n’ Dirty

Installing brms

  • From now until the end of the module, we will use the brms package to fit Bayesian models
  • We took some time to do the installation in the 2nd module, but if you missed it, let me know during the break
  • It is a package that is complicated to install. Why?
  • Because it needs to install another programming language…

Stan, JAGS, and BUGS

  • As mentioned, performing Bayesian inference is complex
  • Modern full Bayesian inference (using MCMC) is known to be resource-intensive and slow; it is currently best performed (fastest and most flexibly) by dedicated languages, aka probabilistic programming languages
    • Stan (the most popular one): Named in honour of Stanislaw Ulam, pioneer of the Monte Carlo method
    • JAGS: “Just Another Gibbs Sampler”
    • BUGS: “Bayesian Inference Using Gibbs Sampling”
    • PyMC (Python)
    • Turing (Julia)
  • These languages are designed to perform Bayesian inference, and are optimized for that purpose
  • They are not general-purpose programming languages
  • What a hassle to learn a new language just to do Bayesian inference!

About brms

  • There are “wrappers” around these languages that allow us to use them from other general purpose languages like R or Python
  • In R:
    • brms = Bayesian Regression Models using ‘Stan’
    • rstanarm = R interface to Stan for Applied Regression Modeling
  • rstanarm
    • Has different functions for different models (e.g., stan_lm(), stan_glm(), stan_lmer(), stan_glmer(), stan_gamm4() etc.)
    • Each model is implemented with precompiled Stan code
    • It is slightly easier to install, but less flexible
  • brms
    • Became the most popular
    • Main difference is that brms has only one main function, brm(), which can fit almost any model
    • These models are translated into Stan code, and then compiled
    • Overhead time when fitting new models, but it is more flexible

Doing a Bayesian Regression

  • Frequentist model
model <- lm(qsec ~ mpg, data = mtcars) 
parameters::parameters(model)
Parameter   | Coefficient |   SE |         95% CI | t(30) |      p
------------------------------------------------------------------
(Intercept) |       15.35 | 1.03 | [13.25, 17.46] | 14.91 | < .001
mpg         |        0.12 | 0.05 | [ 0.02,  0.22] |  2.53 | 0.017 
  • Bayesian model
model <- brms::brm(qsec ~ mpg, data = mtcars, refresh=0) 
parameters::parameters(model)
# Fixed Effects

Parameter   | Median |         95% CI |     pd |  Rhat |     ESS
----------------------------------------------------------------
(Intercept) |  15.34 | [13.13, 17.40] |   100% | 1.001 | 3582.00
mpg         |   0.12 | [ 0.03,  0.23] | 99.30% | 1.000 | 3549.00

# Sigma

Parameter | Median |       95% CI |   pd |  Rhat |     ESS
----------------------------------------------------------
sigma     |   1.68 | [1.33, 2.21] | 100% | 1.001 | 2972.00
  • refresh=0 silences the printing of the sampling progress (to avoid cluttering this slide)
  • R-hat: Convergence diagnostic index. Should be less than 1.05.
  • ESS: Effective Sample Size: how many “good iterations” (roughly independent draws) we have. Should be > 400 (the maximum being the total number of iterations drawn).
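  • These diagnostics can also be retrieved on their own, e.g., with bayestestR (a sketch):
bayestestR::diagnostic_posterior(model)  # returns ESS and Rhat for each parameter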

Voilà!

  • You’ve fitted and reported a Bayesian model.
  • The results are very coherent with the Frequentist version.
  • This is a valid Bayesian analysis…
  • But because we are on our way to becoming Bayesian Masters, we can spend more time and thought on aspects of this model.

3. Bayesian Models: Priors

brms formula

  • The brm() function’s first argument is a formula, like in lm()
  • Because formulas can become complex, it is useful to specify the formula in a separate variable
f <- brms::brmsformula(qsec ~ mpg)
  • We could then run brms::brm(f, data=mtcars) to fit the Bayesian linear model

Regression formulas: the intercept

  • In most model functions in R, the intercept is included by default:
    • lm(qsec ~ mpg) is equivalent to lm(qsec ~ 1 + mpg)
coef(lm(qsec ~ mpg, data=mtcars))
(Intercept)         mpg 
 15.3547689   0.1241366 
coef(lm(qsec ~ 1 + mpg, data=mtcars))
(Intercept)         mpg 
 15.3547689   0.1241366 
  • One can remove the intercept by using lm(qsec ~ 0 + mpg)
coef(lm(qsec ~ 0 + mpg, data=mtcars))
     mpg 
0.827125 

Exercise

  • Visualize the relationship predicted by the No Intercept model
newdata <- data.frame(mpg = seq(-5, 35, 5))

lm(qsec ~ 0 + mpg, data=mtcars) |> 
  insight::get_predicted(data=newdata, ci=0.95) |> 
  as.data.frame() |> 
  cbind(newdata) |> 
  ggplot(aes(x = mpg, y = Predicted)) +
  geom_ribbon(aes(ymin=CI_low, ymax=CI_high), fill="red", alpha=0.2) +
  geom_line(color="red") +
  geom_point(data=mtcars, aes(x = mpg, y = qsec), size=3) 

  • It forces the regression line to go through the origin (0, 0)

brms formula exception: the intercept

  • In brms, there is a twist. For reasons that we will clarify later, it is often good to explicitly include the intercept as a “variable”
    • y ~ 0 + Intercept is equivalent to y ~ 1
    • This treats the intercept as a regular coefficient rather than a special term, which makes prior specification easier
  • Instead of the previous formula, we will use:
f <- brms::brmsformula(qsec ~ 0 + Intercept + mpg)
  • The 2 formulas would have resulted in the same model, but explicitly including the intercept makes it easier to…
  • Specify priors

Get Default Priors

  • Bayesian inference is the process of reallocating credibility over parameters, from priors to posteriors
  • In the correlation example, we had a prior over the correlation coefficient
  • In Bayesian regression models, we have priors over the coefficients (regression parameters)
  • How many parameters do we have in this linear model?
  • The get_prior() function returns the priors associated with a given model or a formula (i.e., the default priors made by brms)
brms::get_prior(f, data = mtcars)
                prior class      coef group resp dpar nlpar lb ub       source
               (flat)     b                                            default
               (flat)     b Intercept                             (vectorized)
               (flat)     b       mpg                             (vectorized)
 student_t(3, 0, 2.5) sigma                                  0         default

Understanding Priors Parameters

brms::get_prior(f, data = mtcars)
                prior class      coef group resp dpar nlpar lb ub       source
               (flat)     b                                            default
               (flat)     b Intercept                             (vectorized)
               (flat)     b       mpg                             (vectorized)
 student_t(3, 0, 2.5) sigma                                  0         default
  • brms has set a default prior for each parameter
  • 2 types of parameters for linear models:
    • Coefficients (b) and Distributional (sigma)
  • For coefficients b, brms uses “flat” priors for any coefficient
    • The first row just indicates this “default”, and then it shows that this default is applied to all the following coefficients (vectorized)
    • Changing the prior in the first row is a convenient way to apply the same prior to all coefficients (see the sketch below)
    • We can ignore the first row because we want to be specific with our priors
  • We will ignore the last row sigma (for now) to focus on the coefficients
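  • For example, omitting coef applies the same prior to every coefficient of the class (a sketch):
brms::set_prior("normal(0, 1)", class = "b")  # one prior for all b coefficients at once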

Flat priors

  • What are “flat” priors?
    • Uniform priors
  • These are the “least informative” priors: we consider that all possible values have the same probability, i.e., that slopes of -1 million and +1 million are equally likely
    • Using this prior gives results equivalent to Maximum Likelihood Estimation (the prior has no influence, so only the likelihood matters)
  • It might seem like a good idea to use flat priors, but it is not
    • Not realistic: we often don’t expect the slope to be -1 million or +1 million
    • Not good for the inference algorithm: we waste energy exploring regions of the parameter space that are not realistic
    • Can lead to convergence issues
  • In general, avoid flat priors

Alternative to Flat Priors

  • Instead of flat priors, we can set weakly informative priors
  • Priors that are “weakly informative” give some information to the model, but not too much
  • For example, we can use a normal distribution with a very large standard deviation
    • It helps the model converge by avoiding unrealistic regions of the parameter space
    • At the same time, it doesn’t provide so much information that it would “bias” the estimation
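  • For instance, we can check how much room such a prior actually leaves (a quick sketch):
qnorm(c(0.025, 0.975), mean = 0, sd = 5)  # 95% of a Normal(0, 5) prior lies within about ±9.8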

What priors to use? Intercept (1)

  • The Intercept represents the predicted value of the dependent variable (qsec) when the independent variable (mpg) is 0
  • We can plot our data to visualize the unit and range of our data
p <- ggplot(mtcars, aes(x = mpg, y = qsec)) +
  geom_point(size=3) + 
  geom_vline(xintercept = 0, linetype="dashed") +
  geom_hline(yintercept = 0, linetype="dashed") +
  coord_cartesian(xlim = c(-1, 35), ylim = c(-1, 30))
p

  • What would be a good range for the Intercept?

What priors to use? Intercept (2)

  • We can visualize regression lines with different intercepts (and a slope of 0)
    • e.g., 10? 15? 17? 20? 25?
p + geom_abline(intercept = c(10, 15, 17, 20, 25), slope = 0, 
                color=c("purple", "red", "orange", "green", "blue"), linewidth=1)

  • Which one feels the most probable? Which one the least?

What priors to use? Intercept (3)

  • We could use a normal distribution for the intercept, centered on a “best guess”
  • The center could be set to the average y value
    • Because if the slope = 0 (no effect), the intercept would be the average y value (cf. “constant” models)
    • mean(mtcars$qsec) = 17.85
  • What precision? In the example above, we used \(SD=4\)

What priors to use? Intercept (4)

  • Such a prior (\(Normal(17.85, 4)\)) is actually quite narrow (i.e., “strongly informative”)
    • The actual SD of the response variable is sd(mtcars$qsec) = 1.79
  • You can express more or less strong beliefs by changing the precision
  • It is good to use informative priors, but it requires thought and understanding of the parameters and the data
  • However, since we want weakly informative priors, we can use a wider distribution
    • e.g., \(Normal(17.85, 40)\)
    • Or \(Normal(M_{qsec}, 10*SD_{qsec})\) = \(Normal(17.85, 17.9)\)
  • It can be convenient to express the prior on the intercept as a function of the outcome variable’s mean and SD
    • But the details don’t actually have a huge impact (e.g., taking 10 SD vs. 12 SD)
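  • In practice, these values can be computed directly from the data (a sketch):
m <- mean(mtcars$qsec)     # 17.85
s <- 10 * sd(mtcars$qsec)  # ~17.9
paste0("normal(", round(m, 2), ", ", round(s, 1), ")")  # "normal(17.85, 17.9)"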

Priors for Coefficients (1)

  • In psychology, we are often interested in testing effects
    • Coefficients (slopes) often represent some effects of interest
    • We are often interested in assessing whether a coefficient is different from 0
  • Thus, in order to avoid “biasing” the effects in one or the other direction, we often use a symmetric distribution centered at 0
    • This means that we are not assuming that the effect is positive or negative
    • In other words, this prior is agnostic (doesn’t provide information) about the direction of the effect, only its absolute size, penalizing very large effects
    • We consider that the most likely value is 0, but we equally allow for the possibility of positive or negative effects
    • This kind of prior centered at 0 is conservative (more or less, depending on its width): it pushes the effects towards 0
  • That’s one of the reasons why the idea that we can “cheat” with priors to make things significant is not true
    • Using wide symmetric priors centered at 0 is a common default choice and, if anything, is more conservative than Frequentist estimation

Priors for Coefficients (2)

  • Assuming the intercept is the mean of qsec (17.85), what do various slopes look like? (see the sketch below)
    • e.g., -1, -0.5, 0, 0.5, 1
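  • A sketch mirroring the intercept plot above (reusing the scatter plot p):
p + geom_abline(intercept = 17.85, slope = c(-1, -0.5, 0, 0.5, 1),
                color = c("purple", "red", "orange", "green", "blue"), linewidth = 1)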

  • Slope values of \(\pm 1\) are quite extreme
  • We can pick a distribution that does not give very high plausibility to values larger than \(|1|\)

Priors for Coefficients (3)

  • E.g., \(Normal(0, 0.5)\)

Priors for Coefficients (4)

  • Because we want the information to be even weaker, we are going to pick \(Normal(0, 5)\)
  • What does this prior say?
    • Any slope between -3 SD and +3 SD (i.e., between -15 and +15) is deemed possible
    • A slope of 1,000,000 is very unlikely
  • Note that other priors could have been suitable
  • For instance, if the data were particularly noisy or prone to outliers (extreme values) that could make the slope “wobbly”, we could use a t-distribution
    • Red: \(Normal(\mu=0, \sigma=0.5)\)
    • Purple: \(Normal(\mu=0, \sigma=5)\)
    • Blue: \(t(df=2, \mu=0, \sigma=5)\)
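  • A sketch to draw these three prior shapes with base R:
curve(dnorm(x, mean = 0, sd = 0.5), from = -20, to = 20, col = "red", ylab = "Density")
curve(dnorm(x, mean = 0, sd = 5), col = "purple", add = TRUE)
curve(dt(x / 5, df = 2) / 5, col = "blue", add = TRUE)  # t density rescaled to scale 5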

Chill out, it’s just a prior

  • The choice of priors might seem like a big deal, tedious, daunting and scary
  • But it’s not:
    • If you understand your model, you can easily and quickly make reasonable choices
    • It’s forgiving: even if you put a “bad” prior (e.g., one that is too informative), if the effect is there, the data will overwhelm the prior (unless the prior is mis-specified, e.g., a \(Beta\) prior on a negative relationship)
    • In many cases, the priors don’t matter as much
    • In many cases, even using “flat” priors is fine and won’t make any difference
  • But it is useful to think about priors
    • Especially when you can have strong prior knowledge
    • If you want to be particularly conservative (“error control”)
    • If you have noisy or little data
    • If you care about efficiency (slow models that have trouble converging)

Set Priors in brms

  • Now that we know what priors we want to set:
    • Intercept: \(Normal(17.85, 17.9)\)
    • Slope: \(Normal(0, 5)\)
  • How to actually set them?
brms::get_prior(f, data=mtcars)
                prior class      coef group resp dpar nlpar lb ub       source
               (flat)     b                                            default
               (flat)     b Intercept                             (vectorized)
               (flat)     b       mpg                             (vectorized)
 student_t(3, 0, 2.5) sigma                                  0         default
  • These are the default priors

Using set_prior()

  • We create new priors using set_prior(), but we need to identify them using class and coef
brms::set_prior("normal(17.85, 17.9)", class = "b", coef = "Intercept")
b_Intercept ~ normal(17.85, 17.9)

Validate Priors in brms

  • A collection of priors is created using the c() function
  • This collection of priors is then “validated” using validate_prior()
    • It checks whether all the specified priors match existing parameters
    • It checks whether the replacement of the default priors was successful
    • It returns a new set of priors (defaults + replacements) that we can pass to the model
priors <- c(
  brms::set_prior("normal(17.85, 17.9)", class = "b", coef = "Intercept"),
  brms::set_prior("normal(0, 5)", class = "b", coef = "mpg")
) |> 
  brms::validate_prior(f, data=mtcars)

priors
                prior class      coef group resp dpar nlpar lb ub  source
               (flat)     b                                       default
  normal(17.85, 17.9)     b Intercept                                user
         normal(0, 5)     b       mpg                                user
 student_t(3, 0, 2.5) sigma                                  0    default
  • Our new object named priors is a collection of priors that we can provide to the model

Check Priors

  • Are these priors reasonable? i.e., not too narrow, not too wide?
  • In the case of a simple linear model, where we have a good idea of the range of the coefficients, we can gauge that just from the prior parameters themselves
  • In more complex models, it can become complicated
    • Because parameters may not have an intuitive meaning or be directly interpretable
    • Because parameters interact with one another
  • One way to check if priors are appropriate is to do a “Prior Predictive Check”
    • This means that we will generate model predictions using the priors only
    • We can then see if these predictions are reasonable

Make a model with priors only

  • We can fit a model in brms using the brm function
    • Enter the formula, the data, and priors
    • If the prior argument is not specified, it will use the default that we got from get_prior()
  • Here we want to only use the priors (as if we didn’t see the data)
    • Using the argument: sample_prior="only"
  • The model is first compiled (into Stan and C++ code), then sampled from
model_priors <- brms::brm(
  formula=f,
  data = mtcars,
  prior = priors,
  sample_prior="only"
)

Generate predictions

  • We want to see what this “prior-only” model predicts
  • We want to make predictions on a datagrid (to visualize the regression lines that it predicts)
newdata <- data.frame(mpg = seq(10, 35, length.out=6))
newdata
  mpg
1  10
2  15
3  20
4  25
5  30
6  35
pred <- get_predicted(model_priors, data=newdata, iterations=250) |> 
  as.data.frame() |> 
  cbind(newdata)
nrow(pred)
[1] 6
ncol(pred)
[1] 252
names(pred)
  [1] "Predicted" "iter_1"    "iter_2"    "iter_3"    "iter_4"    "iter_5"   
  [7] "iter_6"    "iter_7"    "iter_8"    "iter_9"    "iter_10"   "iter_11"  
 [13] "iter_12"   "iter_13"   "iter_14"   "iter_15"   "iter_16"   "iter_17"  
 [19] "iter_18"   "iter_19"   "iter_20"   "iter_21"   "iter_22"   "iter_23"  
 [25] "iter_24"   "iter_25"   "iter_26"   "iter_27"   "iter_28"   "iter_29"  
 [31] "iter_30"   "iter_31"   "iter_32"   "iter_33"   "iter_34"   "iter_35"  
 [37] "iter_36"   "iter_37"   "iter_38"   "iter_39"   "iter_40"   "iter_41"  
 [43] "iter_42"   "iter_43"   "iter_44"   "iter_45"   "iter_46"   "iter_47"  
 [49] "iter_48"   "iter_49"   "iter_50"   "iter_51"   "iter_52"   "iter_53"  
 [55] "iter_54"   "iter_55"   "iter_56"   "iter_57"   "iter_58"   "iter_59"  
 [61] "iter_60"   "iter_61"   "iter_62"   "iter_63"   "iter_64"   "iter_65"  
 [67] "iter_66"   "iter_67"   "iter_68"   "iter_69"   "iter_70"   "iter_71"  
 [73] "iter_72"   "iter_73"   "iter_74"   "iter_75"   "iter_76"   "iter_77"  
 [79] "iter_78"   "iter_79"   "iter_80"   "iter_81"   "iter_82"   "iter_83"  
 [85] "iter_84"   "iter_85"   "iter_86"   "iter_87"   "iter_88"   "iter_89"  
 [91] "iter_90"   "iter_91"   "iter_92"   "iter_93"   "iter_94"   "iter_95"  
 [97] "iter_96"   "iter_97"   "iter_98"   "iter_99"   "iter_100"  "iter_101" 
[103] "iter_102"  "iter_103"  "iter_104"  "iter_105"  "iter_106"  "iter_107" 
[109] "iter_108"  "iter_109"  "iter_110"  "iter_111"  "iter_112"  "iter_113" 
[115] "iter_114"  "iter_115"  "iter_116"  "iter_117"  "iter_118"  "iter_119" 
[121] "iter_120"  "iter_121"  "iter_122"  "iter_123"  "iter_124"  "iter_125" 
[127] "iter_126"  "iter_127"  "iter_128"  "iter_129"  "iter_130"  "iter_131" 
[133] "iter_132"  "iter_133"  "iter_134"  "iter_135"  "iter_136"  "iter_137" 
[139] "iter_138"  "iter_139"  "iter_140"  "iter_141"  "iter_142"  "iter_143" 
[145] "iter_144"  "iter_145"  "iter_146"  "iter_147"  "iter_148"  "iter_149" 
[151] "iter_150"  "iter_151"  "iter_152"  "iter_153"  "iter_154"  "iter_155" 
[157] "iter_156"  "iter_157"  "iter_158"  "iter_159"  "iter_160"  "iter_161" 
[163] "iter_162"  "iter_163"  "iter_164"  "iter_165"  "iter_166"  "iter_167" 
[169] "iter_168"  "iter_169"  "iter_170"  "iter_171"  "iter_172"  "iter_173" 
[175] "iter_174"  "iter_175"  "iter_176"  "iter_177"  "iter_178"  "iter_179" 
[181] "iter_180"  "iter_181"  "iter_182"  "iter_183"  "iter_184"  "iter_185" 
[187] "iter_186"  "iter_187"  "iter_188"  "iter_189"  "iter_190"  "iter_191" 
[193] "iter_192"  "iter_193"  "iter_194"  "iter_195"  "iter_196"  "iter_197" 
[199] "iter_198"  "iter_199"  "iter_200"  "iter_201"  "iter_202"  "iter_203" 
[205] "iter_204"  "iter_205"  "iter_206"  "iter_207"  "iter_208"  "iter_209" 
[211] "iter_210"  "iter_211"  "iter_212"  "iter_213"  "iter_214"  "iter_215" 
[217] "iter_216"  "iter_217"  "iter_218"  "iter_219"  "iter_220"  "iter_221" 
[223] "iter_222"  "iter_223"  "iter_224"  "iter_225"  "iter_226"  "iter_227" 
[229] "iter_228"  "iter_229"  "iter_230"  "iter_231"  "iter_232"  "iter_233" 
[235] "iter_234"  "iter_235"  "iter_236"  "iter_237"  "iter_238"  "iter_239" 
[241] "iter_240"  "iter_241"  "iter_242"  "iter_243"  "iter_244"  "iter_245" 
[247] "iter_246"  "iter_247"  "iter_248"  "iter_249"  "iter_250"  "mpg"      

Iterations

  • Bayesian sampling is done in iterations
  • Each iteration represents one possibility of model parameters
  • More plausible possibilities will appear more frequently
  • In our case, this means that we have one line (Intercept + Slope) for each iteration
  • We can visualize these various lines, which correspond to possibilities generated by our priors (i.e., before seeing the data)
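  • We can inspect these iterations directly (a sketch; as_draws_df() comes from the posterior package and is re-exported by brms):
draws <- brms::as_draws_df(model_priors)
head(draws[, c("b_Intercept", "b_mpg")])  # one candidate intercept and slope per row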

Prior Predictive Check - Relationship

p <- ggplot(mtcars, aes(x=mpg, y=qsec)) +
  geom_point(size=3) 
p

p + geom_line(data=pred, aes(y=iter_1))

p + 
  geom_line(data=pred, aes(y=iter_1)) + 
  geom_line(data=pred, aes(y=iter_2)) +
  geom_line(data=pred, aes(y=iter_3)) +
  geom_line(data=pred, aes(y=iter_4)) +
  geom_line(data=pred, aes(y=iter_5)) 

Reshaping iterations

  • We can reshape the dataframe (pivot it into long format) to get one column containing all iteration values and another identifying the iteration number
pred <- reshape_iterations(pred)
head(pred)
  Predicted mpg iter_index iter_group iter_value
1  16.91122  10          1          1   41.22650
2  16.38351  15          2          1   46.27231
3  15.85580  20          3          1   51.31811
4  15.32809  25          4          1   56.36392
5  14.80038  30          5          1   61.40972
6  14.27267  35          6          1   66.45553

Visualizing iterations

  • We can visualize the relationship between mpg and qsec for each iteration
p + 
  geom_line(data=pred, aes(y=iter_value, group=iter_group), alpha=0.2) +
  coord_cartesian(ylim=c(-400, 400))

  • What can we say?
  • This is rather good
    • As we can see, the priors don’t predict the exact relationship (they have no knowledge of what we want to conclude)
    • They don’t predict crazy numbers either (e.g., a billion)

Prior Predictive Check - Distribution

  • We can also visualize the distribution of the predictions to see whether the model predicts the outcome in a reasonable range of values
pred <- get_predicted(model_priors, data=mtcars, iterations=250) |> 
  as.data.frame() |> 
  reshape_iterations()

head(pred)
  Predicted iter_index iter_group iter_value
1  16.40507          1          1  -73.97298
2  16.40507          2          1  -73.97298
3  16.34220          3          1  -83.88313
4  16.39110          4          1  -76.17524
5  16.48541          5          1  -61.31002
6  16.50636          6          1  -58.00664
ggplot(mtcars, aes(x=qsec)) +
  geom_density(fill="skyblue") +
  geom_density(data=pred, aes(x=iter_value, group=iter_group), color="orange", alpha=0.5) +
  coord_cartesian(xlim=c(-100, 100), ylim=c(0, 0.3))

Exercise

  • Set very narrow priors for the Intercept and mpg parameters, centered on the values that we know from the Frequentist model
  • Run prior predictive checks. Try to obtain a plot that looks like this:
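  • One possible starting point (a sketch; the SDs are deliberately tiny and somewhat arbitrary):
narrow_priors <- c(
  brms::set_prior("normal(15.35, 0.2)", class = "b", coef = "Intercept"),
  brms::set_prior("normal(0.12, 0.01)", class = "b", coef = "mpg")
) |> 
  brms::validate_prior(f, data = mtcars)

model_narrow <- brms::brm(f, data = mtcars, prior = narrow_priors, sample_prior = "only")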

The Art of Priors

Prior choice can depend on:

  • Mathematical constraints
    • “sigma is strictly positive”
    • “correlations are between -1 and 1”
  • Theoretical knowledge
    • Expert knowledge (“this is a plausible range”)
    • Model knowledge (e.g., Intercept to the mean of the outcome)
  • Empirical knowledge
    • From previous studies
    • From descriptive stats
  • Types of Priors (different terms, no established taxonomy)
    • Weakly informative, mildly informative, strongly informative
    • Uninformative, regularizing, principled, informative
    • Skeptical, agnostic, believer

The Practical Side

  • In general, don’t panic and follow the rule of “good enough”
    • If you have reasons to care about priors (e.g., complex models, convergence issues, having ideas about priors), then it’s worth spending time to think about them
    • If not, going with the default / flat / very weakly informative priors is fine
    • But don’t forget to assess the convergence of the model (e.g., MCMC sampling process) to make sure the posterior has been well explored…

Homework

  • Watch McElreath’s lecture on MCMC
    • https://www.youtube.com/watch?v=rZk2FqX2XnY&t=788s&ab_channel=RichardMcElreath

The End (for now)

Thank you!