Bayesian Statistics

2. On Regression Parameters in a Frequentist Framework

Dominique Makowski
D.Makowski@sussex.ac.uk

Recap

Solution

Maximum Likelihood

Fitting distributions to data

  • Maximum Likelihood Estimation (MLE) is the process of estimating the parameters of a distribution that best fits the data.
  • Typical algorithms proceed iteratively: they adjust the parameters of the distribution, compute the divergence between the observed and predicted probabilities, and keep the parameters that minimize this divergence (i.e., that maximize the likelihood). A minimal “manual” sketch is shown after the fitdistr() output below.
  • These are not trivial algorithms
  • We can use the fitdistr() (“fit distribution”) function from the MASS package.
x <- rnorm(1000, mean=50, sd=10)

MASS::fitdistr(x, "normal")
      mean          sd    
  49.8828661    9.9825633 
 ( 0.3156764) ( 0.2232169)
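To make the iterative idea above concrete, here is a minimal sketch of doing the optimisation “by hand” with optim(): we minimise the negative log-likelihood, which is equivalent to maximising the likelihood. The function name negloglik is just illustrative (it is not part of MASS).

# Negative log-likelihood of a Normal distribution for given data
negloglik <- function(pars, data) {
  -sum(dnorm(data, mean = pars[1], sd = pars[2], log = TRUE))
}

# Search for the (mean, sd) pair that minimizes it
optim(par = c(mean = 0, sd = 1), fn = negloglik, data = x,
      method = "L-BFGS-B", lower = c(-Inf, 1e-6))$par

The result should be very close to the estimates returned by MASS::fitdistr(x, "normal") above.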
Show figure code
params <- MASS::fitdistr(x, "normal")

data.frame(x=x) |> 
  ggplot(aes(x=x)) +
  geom_histogram(aes(y=after_stat(density)), bins=60, fill="grey", color="black") +
  geom_line(
    data = data.frame(
      x=seq(0, 100, length.out = 500), 
      y=dnorm(seq(0, 100, length.out = 500), mean=params$estimate["mean"], sd=params$estimate["sd"])), 
    aes(x=x, y=y), color="red", linewidth=2) +
  theme_bw() +
  labs(x="x", y="Density")

x <- rgamma(1000, 2, 3)

MASS::fitdistr(x, "normal")
      mean          sd    
  0.65923460   0.45948465 
 (0.01453018) (0.01027439)
Show figure code
params <- MASS::fitdistr(x, "normal")

data.frame(x=x) |> 
  ggplot(aes(x=x)) +
  geom_histogram(aes(y=after_stat(density)), bins=100, fill="grey", color="black") +
  geom_line(
    data = data.frame(
      x=seq(-1.5, 3.5, length.out = 500), 
      y=dnorm(seq(-1.5, 3.5, length.out = 500), mean=params$estimate["mean"], sd=params$estimate["sd"])), 
    aes(x=x, y=y), color="red", linewidth=2) +
  theme_bw() +
  labs(x="x", y="Density")

Gaussian Maximum Likelihood vs. The Mean

x <- rgamma(1000, 2, 3)

MASS::fitdistr(x, "normal")
      mean          sd    
  0.67509327   0.47089492 
 (0.01489100) (0.01052953)
mean(x)
[1] 0.6750933
x <- runif(1000, -10, 4)

MASS::fitdistr(x, "normal")
      mean           sd     
  -2.87920735    3.92875625 
 ( 0.12423818) ( 0.08784966)
mean(x)
[1] -2.879207
  • Fitting the best Gaussian distribution has interesting properties
  • The “location” (mean) of the best fitting Normal distribution of the data = the actual (empirical) mean of the data
    • We can use one to find the other
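A quick check (assuming a freshly simulated x) also shows what happens for the spread: the maximum-likelihood \(\sigma\) divides by \(n\), so it is very slightly smaller than R’s sd(), which divides by \(n - 1\).

x <- rnorm(1000, mean = 50, sd = 10)
params <- MASS::fitdistr(x, "normal")$estimate

params["mean"] - mean(x)                    # essentially 0: ML mean = empirical mean
params["sd"] - sqrt(mean((x - mean(x))^2))  # essentially 0: ML sd divides by n
params["sd"] - sd(x)                        # tiny negative difference: sd() divides by n - 1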

Maximum Likelihood: Caveats

  • MLE will always find some best fitting parameters: it doesn’t mean that the distribution is a good fit for the data.
x <- rbeta(2000, 0.5, 0.5)

MASS::fitdistr(x, "normal")
      mean           sd     
  0.500284152   0.360421491 
 (0.008059270) (0.005698764)
Show figure code
params <- MASS::fitdistr(x, "normal")

data.frame(x=x) |> 
  ggplot(aes(x=x)) +
  geom_histogram(aes(y=after_stat(density)), bins=100, fill="grey", color="black") +
  geom_line(
    data = data.frame(
      x=seq(-0.5, 1.5, length.out = 500), 
      y=dnorm(seq(-0.5, 1.5, length.out = 500), mean=params$estimate["mean"], sd=params$estimate["sd"])), 
    aes(x=x, y=y), color="red", linewidth=2) +
  theme_bw() +
  labs(x="x", y="Density")

Quiz

  • Run the following:
df <- data.frame(y = rgamma(50000, 0.5, 8))
Code
ggplot(df, aes(x=y)) +
  geom_histogram(aes(y=after_stat(density)), bins=100, fill="grey", color="black") +
  theme_bw() +
  labs(x="y", y="Density")

  • What are the parameters of the best fitting Normal distribution of y?
MASS::fitdistr(df$y, "normal")
       mean            sd     
  0.0626589570   0.0886953261 
 (0.0003966576) (0.0002804792)
  • Fit an intercept-only linear regression model to y and look at its summary.
model <- lm(y ~ 1, data=df)
summary(model)

Call:
lm(formula = y ~ 1, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.06266 -0.05617 -0.03399  0.02038  1.30855 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.0626590  0.0003967     158   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0887 on 49999 degrees of freedom
  • What do you notice?

Linear Model = “Gaussian” Model

  • You might have heard that an “intercept only” model predicts the mean of the data
  • That is true, but it is not the whole story…
  • More generally, a linear model finds the parameters of the best fitting normal distribution
    • The location \(\mu\) corresponds to the mean of the data
    • The spread \(\sigma\) corresponds to the standard deviation of the “residuals” (i.e., the distance between the observed data and the predicted mean)
  • In other words, a typical “linear model” is actually a “Gaussian” model (based on a Normal distribution)
Code
df <- data.frame(y = rnorm(5000, 10, 5))

ggplot(df, aes(y=y)) +
  geom_jitter(aes(x=0), alpha=0.3) +
  geom_linerange(data=data.frame(x=0, y=mean(df$y), ymin=mean(df$y)-0.5*sd(df$y), ymax=mean(df$y)+0.5*sd(df$y)), 
                 aes(x=x, y=y, ymin=ymin, ymax=ymax), color="red", linewidth=2) +
  geom_point(data=data.frame(x=0, y=mean(df$y)), aes(x=x, y=y), color="red", size=4) +
  geom_hline(yintercept = 10, linetype="dashed", color="red", linewidth=1) +
  coord_cartesian(xlim = c(-1, 0.4)) + 
  labs(x="", y="y") +
  ggside::geom_ysidedensity(data=data.frame(y = rnorm(50000, 10, 5)), color = "red", linewidth = 2) +
  theme_minimal() +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(), panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank()) 

Homoscedasticity

  • In typical linear models, we express \(\mu\) as a function of the predictors (e.g., \(\mu = \beta_{0} + \beta_{1}x\))
  • \(\sigma\) is assumed to be constant across all observations (i.e., homoscedasticity)
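In code (a minimal sketch using the mpg ~ qsec model from the figure below): each observation gets its own predicted \(\mu\), but they all share a single \(\sigma\).

model <- lm(mpg ~ qsec, data = mtcars)

head(predict(model))   # a different predicted mu for each observation
sigma(model)           # one constant sigma shared by all observations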
Code
model <- lm(mpg ~ qsec, data=mtcars)

p <- mtcars |> 
  ggplot(aes(x=qsec, y=mpg)) +
  geom_smooth(method="lm", formula = 'y ~ x', se=FALSE, linewidth=2) +
  theme_bw()

# Function to add normal distribution curves
add_normals <- function(p, model) {
  sigma <- summary(model)$sigma  # Standard deviation of residuals
  n <- 100  # Number of points for each curve
  
  for(i in 1:nrow(mtcars)) {
    x_val <- mtcars$qsec[i]
    y_pred <- predict(model, newdata = data.frame(qsec = x_val))
    
    # Create a sequence of y values for the normal curve
    y_seq <- seq(y_pred - 3*sigma, y_pred + 3*sigma, length.out = n)
    density <- dnorm(y_seq, mean = y_pred, sd = sigma)
    
    # Adjust density to match the scale of the plot
    max_width <- 1  # Max width of areas
    density_scaled <- (density / max(density)) * max_width
    
    # Create a dataframe for each path
    path_df <- data.frame(x = x_val + density_scaled, y = y_seq)
    path_dfv <- data.frame(x=path_df$x[1], ymin=min(path_df$y), ymax=max(path_df$y))
    
    # Add the path to the plot
    p <- p + 
      geom_segment(data = path_dfv, aes(x = x, xend=x, y = ymin, yend=ymax), 
                   color = "#FF9800", linewidth = 0.7, alpha=0.8, linetype="dotted") +
      geom_path(data = path_df, aes(x = x, y = y), 
                color = "#FF9800", linewidth = 1, alpha=0.8) 
  }
  p
}


# Create the final plot
p_ml <- add_normals(p, model) +
  geom_segment(aes(x = qsec, y = mpg, xend = qsec, yend = predict(lm(mpg ~ qsec, data=mtcars))), 
               color="red", linetype="solid", linewidth=1) +
  geom_point(alpha=0.8, size=6) 
p_ml

Quiz

  • Estimating the parameters that maximize the likelihood of the data given a model (i.e., a distribution) is called…
  • Linear regressions use what model (what distribution)?
  • Linear regressions estimate how many parameters?
    • \(\mu\) (location): often expressed as a function of other predictors
    • \(\sigma\) (spread): assumed to be constant across all observations (i.e., homoscedasticity)

MLE vs. Ordinary Least Squares (OLS)

  • The regression line obtained with MLE with a Gaussian Distribution (i.e., the solution to the Gaussian model MLE algorithm) = the line that minimizes the sum of squared residuals (OLS)
    • Conversely: the line that minimizes the sum of squared residuals (OLS) = the solution to the Gaussian model MLE algorithm
    • USEFUL!
  • The OLS solution can be easily computed using simple formulas, without the need for complex optimization algorithms (as in MLE)
  • The formula for OLS solves a special case of MLE (when we assume a Normal distribution with a single fixed \(\sigma\) parameter)
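As an illustration of those “simple formulas” in the one-predictor case (a sketch, using the mtcars data used later in this lecture): the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means.

x <- mtcars$qsec
y <- mtcars$mpg

b1 <- cov(x, y) / var(x)       # slope
b0 <- mean(y) - b1 * mean(x)   # intercept
c(intercept = b0, slope = b1)

coef(lm(mpg ~ qsec, data = mtcars))  # same values from lm()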

Ordinary Least Squares (OLS)

  • The Ordinary Least Squares (OLS) line is the line that minimizes the sum of squared residuals (Residual Sum of Squares - RSS)
    • \(RSS = \sum(residuals^{2})\) (the square is to make them positive)
  • The residuals are the differences between observed and predicted values
    • \(RSS = \sum_{i=1}^{n_{obs}}(y_{i} - \hat{y}_{i})^{2}\)
  • The residuals are the model’s estimates of the errors (epsilon \(\varepsilon\))
Code
p <- mtcars |> 
  ggplot(aes(x=qsec, y=mpg)) +
  geom_point(alpha=0.7, size=6) +
  geom_smooth(method="lm", formula = 'y ~ x', se=FALSE, linewidth=2) +
  geom_segment(aes(x = qsec, y = mpg, xend = qsec, yend = predict(lm(mpg ~ qsec, data=mtcars))), 
               color="red", linetype="dotted", linewidth=1) +
  theme_bw() +
  labs(x="x", y="y", title="The residuals are the vertical distances between each point and the line.")
p

From RSS to Sigma

  • Under OLS, we can compute the regression line using a simple formula (which is the same as under MLE with a Gaussian distribution)
  • Once we have the line, we can compute the residuals, from which we can work out the \(\sigma\) (sigma) of the best fitting Normal distribution
    • The standard deviation of the errors, also called the residual standard error, is computed as:
    • \(\sigma = \sqrt{\frac{\sum_{i=1}^{n_{obs}} (y_i - \hat{y}_i)^2}{n_{obs} - k}}\)
    • \(\sigma = \sqrt{\frac{RSS}{n_{obs} - k}}\)
    • \(k\) is the number of estimated coefficients (including the intercept)
    • \(n_{obs}\) observations - \(k\) coefficients = degrees of freedom (df)
    • \(\sigma = \sqrt{\frac{RSS}{df}}\) (a quick check of this formula is shown below)
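A quick check of the formula, using the mpg ~ qsec model that appears later in this lecture:

model <- lm(mpg ~ qsec, data = mtcars)

rss      <- sum(residuals(model)^2)   # Residual Sum of Squares
df_resid <- df.residual(model)        # 32 observations - 2 estimated coefficients = 30
sqrt(rss / df_resid)                  # sigma "by hand"
sigma(model)                          # the "Residual standard error" reported by summary()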

Summary: OLS vs. MLE

  • MLE is a general approach to estimating parameters of models and is not limited to linear models
    • Same approach used in many other models (e.g., logistic regression, Poisson regression, etc.) where the key change is the family (type) of distribution
  • In linear models, where the distribution of errors is Normal, OLS estimation = MLE solution
  • OLS is one easy way (i.e., solvable using simple formulas) to estimate a special case of MLE models
  • In R, there are actually two separate functions that can be used for Gaussian models
    • lm() solves a Gaussian model using OLS
    • glm() solves generalized linear models using MLE and can also be used for simple Gaussian models
summary(lm(mpg ~ qsec, data=mtcars))

Call:
lm(formula = mpg ~ qsec, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.8760 -3.4539 -0.7203  2.2774 11.6491 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -5.1140    10.0295  -0.510   0.6139  
qsec          1.4121     0.5592   2.525   0.0171 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.564 on 30 degrees of freedom
Multiple R-squared:  0.1753,    Adjusted R-squared:  0.1478 
F-statistic: 6.377 on 1 and 30 DF,  p-value: 0.01708
summary(glm(mpg ~ qsec, data=mtcars, family="gaussian"))

Call:
glm(formula = mpg ~ qsec, family = "gaussian", data = mtcars)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -5.1140    10.0295  -0.510   0.6139  
qsec          1.4121     0.5592   2.525   0.0171 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 30.95518)

    Null deviance: 1126.05  on 31  degrees of freedom
Residual deviance:  928.66  on 30  degrees of freedom
AIC: 204.59

Number of Fisher Scoring iterations: 2
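Note also that the “Dispersion parameter” reported by glm() for the Gaussian family is simply \(\sigma^2\), the squared residual standard error from lm():

sigma(lm(mpg ~ qsec, data = mtcars))^2   # ~30.96: the dispersion reported above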

Regression Parameters

Tests vs. Models

  • Psychology uses a lot of tests (t-tests, correlation tests, ANOVAs, etc.)
  • Tests != Models
    • We can’t predict the value of y given x with a test
  • However, tests are usually based on models
  • In particular, tests are related to specific parameters of models
cor.test(mtcars$wt, mtcars$qsec) |> 
  parameters::parameters()
Pearson's product-moment correlation

Parameter1 |  Parameter2 |     r |        95% CI | t(30) |     p
----------------------------------------------------------------
mtcars$wt  | mtcars$qsec | -0.17 | [-0.49, 0.19] | -0.97 | 0.339

Alternative hypothesis: true correlation is not equal to 0
lm(wt ~ qsec, data=mtcars) |> 
  parameters::parameters()
Parameter   | Coefficient |   SE |        95% CI | t(30) |     p
----------------------------------------------------------------
(Intercept) |        4.92 | 1.77 | [ 1.32, 8.53] |  2.79 | 0.009
qsec        |       -0.10 | 0.10 | [-0.30, 0.11] | -0.97 | 0.339
  • What are Parameters?

Linear Model

  • Linear function: \(y = ax + b\)
  • Regression formula: \(\hat{y} = Intercept + \beta_{1}x\)
    • \(\hat{y}\) (Y-hat): predicted response/outcome/dependent variable
    • \(Intercept\) (\(\beta_{0}\)): “starting point”. Value of \(\hat{y}\) when \(x=0\)
    • \(\beta_{1}\) (beta) : slope/“effect”. Change of \(\hat{y}\) from the intercept when \(x\) increases by 1
    • \(x\): predictor/independent variable
  • What are the parameters of the regression line below? Intercept = 1 and beta = 3
Code
df <- bayestestR::simulate_correlation(n=500, r=0.7)

ggplot(df, aes(x=V1, y=V2)) +
  geom_point(alpha=0) +
  geom_vline(xintercept = 0, linetype="dotted") +
  geom_abline(intercept = 1, slope = 3, color="red", linewidth=2)  +
  geom_segment(aes(x = 0, y = 1, xend = 1, yend = 1), linewidth=1, 
               color="green", linetype="dashed") +
  geom_segment(aes(x = 1, y = 1, xend = 1, yend = 4), linewidth=1, 
               color="green", linetype="solid", arrow=arrow(length = unit(0.1, "inches"))) +
  geom_segment(aes(x = 0, y = 0, xend = 0, yend = 1), linewidth=1, 
               color="blue", linetype="solid", arrow=arrow(length = unit(0.1, "inches"))) +
  geom_point(aes(x = 0, y = 0), color="purple", size=8, shape="+") +
  labs(x="x", y="y") +
  theme_bw() +
  coord_cartesian(xlim = c(-1, 1.5), ylim = c(-3, 4))

Impact of Parameters

  • What happens when we change the slope and the intercept?

Hands-on!

  • Compute the linear regression coefficients for:
    • Response variable: mpg
    • Predictor variable: qsec
    • Dataset: mtcars (built-in in R)

Linear Regression in R

  • In R, we use the lm() function (linear model) with the response ~ predictor formula to specify the model
  • We can apply the summary() method to get the results (base R equivalent of easystats’ model_parameters())
  • What are the coefficients?
model <- lm(mpg ~ qsec, data=mtcars)
summary(model)

Call:
lm(formula = mpg ~ qsec, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.8760 -3.4539 -0.7203  2.2774 11.6491 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -5.1140    10.0295  -0.510   0.6139  
qsec          1.4121     0.5592   2.525   0.0171 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.564 on 30 degrees of freedom
Multiple R-squared:  0.1753,    Adjusted R-squared:  0.1478 
F-statistic: 6.377 on 1 and 30 DF,  p-value: 0.01708
  • \(Intercept = -5.1140\) and \(\beta_{qsec} = 1.4121\)

Unit sensitivity

df <- mtcars
df$qsec10 <- df$qsec / 10
model <- lm(mpg ~ qsec10, data=df)
  • What happens if you divide the predictor by 10?
    • Will it change the model? Will it make the effect less significant? Will it change the coefficient?
summary(model)

Call:
lm(formula = mpg ~ qsec10, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.8760 -3.4539 -0.7203  2.2774 11.6491 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -5.114     10.030  -0.510   0.6139  
qsec10        14.121      5.592   2.525   0.0171 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.564 on 30 degrees of freedom
Multiple R-squared:  0.1753,    Adjusted R-squared:  0.1478 
F-statistic: 6.377 on 1 and 30 DF,  p-value: 0.01708
  • Raw coefficients in a regression model are unit-dependent (it is important to know the scale of your variables when interpreting the coefficients)
  • But the other indices (t-value, p-value) haven’t changed: the two models give us the same answer.

Model Parameters

  • In regression models, 2 types of parameters are estimated (using OLS/MLE)
    • The coefficients
      • Intercept
      • The slope(s)
    • “Distributional” (aka “Auxiliary”) parameters (the other parameters of the distribution used)
      • E.g., \(\sigma\) (sigma) for linear models
      • Usually, most people tend to ignore these… but they are important!
  • Two models can have the same coefficients, but different sigmas
  • The standard error of a coefficient (the uncertainty related to it) is computed using \(\sigma\) via a fairly complex formula (a sketch for the simple one-predictor case is shown below)
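For the one-predictor case the formula reduces to something manageable; here is a hedged sketch (the general case involves the full design matrix):

model <- lm(mpg ~ qsec, data = mtcars)

sigma_hat <- sigma(model)
x   <- mtcars$qsec
sxx <- sum((x - mean(x))^2)

sigma_hat / sqrt(sxx)                                # SE of the slope (~0.559)
sigma_hat * sqrt(1/nrow(mtcars) + mean(x)^2 / sxx)   # SE of the intercept (~10.03)
summary(model)$coefficients[, "Std. Error"]          # compare with R's values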

Quiz

  • What happens if you divide the coefficient by its SE?
  • You obtain the t-value! (a quick check against the summary output is shown below)
  • Why is it called a t-value? 🤔
model <- lm(mpg ~ qsec, data=mtcars)
summary(model)

Call:
lm(formula = mpg ~ qsec, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.8760 -3.4539 -0.7203  2.2774 11.6491 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -5.1140    10.0295  -0.510   0.6139  
qsec          1.4121     0.5592   2.525   0.0171 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.564 on 30 degrees of freedom
Multiple R-squared:  0.1753,    Adjusted R-squared:  0.1478 
F-statistic: 6.377 on 1 and 30 DF,  p-value: 0.01708
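As a quick check of the answer above, the ratio can be reproduced directly from the summary table:

coefs <- summary(model)$coefficients
coefs[, "Estimate"] / coefs[, "Std. Error"]   # reproduces the "t value" column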

The t-value

  • The t-value is the ratio between the coefficient and its standard error
  • It can be seen as a “standardized” index of the coefficient’s precision (independent of the scale/unit of the variable)
    • Why “precision”?
  • It corresponds to a point on a t-distribution of corresponding degrees of freedom (df), centered at 0
  • This t-distribution is the assumption made in Frequentist statistics about the distribution of coefficients (effects) under the null hypothesis
    • Not to be confused with the distribution of residuals (which is assumed to be Normal)
    • This t-distribution adjusts for the sample size (df): its tails are heavier than the Normal’s, which accounts for the extra uncertainty of small samples (hence the use of t instead of Normal). For large sample sizes (= large df), it becomes equivalent to a Normal distribution (see the quick comparison below)
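A quick comparison of critical values (a small illustration, not from the slides): with few degrees of freedom the t-distribution is noticeably wider than the Normal, and with many it is practically identical.

qt(0.975, df = 5)      # 2.57: wider critical value with few df
qt(0.975, df = 1000)   # 1.96
qnorm(0.975)           # 1.96: the Normal limit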

The t-value

  • The t-value shows how likely our coefficient is compared to the assumed distribution of possible coefficients under the null hypothesis (i.e., if there is an absence of effect)
Code
# Plot  t-distribution
x <- seq(-5, 5, length.out = 1000)
y <- dt(x, df=30)
df <- data.frame(x = x, y = y)

t_value <- insight::get_statistic(model)[2, 2]

df |>
  ggplot(aes(x = x, y = y)) +
  geom_area(color="black", fill="grey") +
  geom_segment(aes(x = t_value, y = 0, xend = t_value, yend = dt(t_value, df=30)), 
               color="red", linetype="solid", linewidth=1) +
  geom_point(aes(x = t_value, y = dt(t_value, df=30)), color="red", size=3) +
  theme_minimal() +
  labs(x = "\nt-values - standardized coefficient under the null hypothesis", y = "Probability", 
       title = paste0("t-Distribution (df=30); ", "t-value = ", round(t_value, 2), "\n"))

Is our coefficient precisely different from 0?

  • Is our coefficient likely to be observed under the null hypothesis?
    • What is the probability of obtaining a coefficient at least as precise as ours (in both directions) under the null hypothesis?
    • \(Prob > |t|\)
Code
df$Probability <- ifelse(df$x < -t_value, "< -t", "Smaller")
df$Probability <- ifelse(df$x > t_value, "> +t", df$Probability)
df |>
  ggplot(aes(x = x, y = y)) +
  geom_line() +
  geom_area(aes(x = x, y = y, fill = Probability), alpha = 0.5) +
  geom_segment(aes(x = t_value, y = 0, xend = t_value, yend = dt(t_value, df=30)), 
               color="red", linetype="solid", linewidth=1) +
  geom_point(aes(x = t_value, y = dt(t_value, df=30)), color="red", size=3) +
  theme_minimal() +
  scale_fill_manual(values = c("red", "red", "grey")) +
  labs(x = "\nt-values - standardized coefficient under the null hypothesis", y = "Probability", 
       title = paste0("t-Distribution (df=30); ", "t-value = ", round(t_value, 2), "\n"))

Is our coefficient “big” (i.e., precise)?

  • Is our coefficient (more specifically, its “precision”) likely to be observed under the null hypothesis?
    • What is the probability \(p\) of obtaining a coefficient at least as precise as ours (in both directions) under the null hypothesis?
    • \(Prob > |t|\)
prob <- pt(-t_value, df=30, lower.tail = TRUE) + 
  pt(t_value, df=30, lower.tail = FALSE)

round(prob, 4)
[1] 0.0171
  • Look again at your summary…

Call:
lm(formula = mpg ~ qsec, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.8760 -3.4539 -0.7203  2.2774 11.6491 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -5.1140    10.0295  -0.510   0.6139  
qsec          1.4121     0.5592   2.525   0.0171 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.564 on 30 degrees of freedom
Multiple R-squared:  0.1753,    Adjusted R-squared:  0.1478 
F-statistic: 6.377 on 1 and 30 DF,  p-value: 0.01708
  • It is the p-value!
  • You computed the p-value by hand!

p-value

  • The p-value is the probability of observing a t-value at least as extreme as ours under the null hypothesis. In other words, it is the probability of obtaining a result at least as “big” (precisely away from 0) as the one actually observed, if we repeated the experiment an infinite number of times and there were no true effect.
    • It is quite convoluted…
    • It is not a “trivial” value. It has some delicate aspects.
  • Estimation: It is the product of several complicated steps:
    • Estimate the SE of the coefficients (this is not a straightforward process and gets complicated in more complex models)
    • Estimate the df (degrees of freedom) of the model (this is also problematic for complex models, e.g., mixed models)
    • Make a strong assumption about the distribution of the coefficients under the null hypothesis (e.g., the t-distribution)
  • Interpretation:
    • We tend to focus on a dichotomous interpretation of the p-value (significant or not), based on an arbitrary threshold \(\alpha\) typically set to 5% (.05) for literally no reason in particular
    • We often interpret it as the probability of the null hypothesis being true, or as an index of effect magnitude, which is not correct (it is more related to the width of the uncertainty around the estimate than to its absolute size)
  • However, p-values are not fundamentally bad. They are very good at one thing: controlling how often, if we repeat an experiment infinitely, we’ll make a particular kind of error. It’s just that it’s often not what we care about in practice.

Confidence Intervals (CI)

  • Confidence Intervals are often seen as a “range” of likely values for the coefficient, with values closer to the coefficient being more “likely”. This interpretation is incorrect.
  • The confidence interval (CI) is the range of values that would contain, e.g., with 95% probability, the value of the coefficient if the t-distribution were centered around the t-value (as if the estimated coefficient were the null effect).
  • The interpretation is, again, fairly convoluted (the interval is computed by hand below)
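In practice, the interval is constructed as the estimate plus or minus the critical t-value times the standard error; a minimal sketch for the qsec coefficient:

model <- lm(mpg ~ qsec, data = mtcars)

est <- coef(model)["qsec"]
se  <- summary(model)$coefficients["qsec", "Std. Error"]

est + qt(c(0.025, 0.975), df = df.residual(model)) * se   # CI "by hand"
confint(model)["qsec", ]                                   # same interval from R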
Code
# Get 95% ETI of the t distribution
ci <- qt(c(0.025, 0.975), df=30, lower.tail = TRUE) + t_value
# ci * parameters::parameters(model)$SE[2]  # Indeed shows the right CI

df |>
  ggplot(aes(x = x, y = y)) +
  geom_area(fill="grey", alpha=0.5) +
  geom_area(aes(x = x + t_value, y = y), color="red", fill="red", alpha=0.5) +
  geom_segment(aes(x = t_value, y = 0, xend = t_value, yend = dt(t_value, df=30)), 
               color="red", linetype="solid", linewidth=1) +
  geom_point(aes(x = t_value, y = dt(t_value, df=30)), color="red", size=3) +
  # Horizontal segment indicating the CI range
  geom_segment(aes(x = ci[1], y = 0.05, xend = ci[2], yend = 0.05), 
               color="orange", linetype="solid", linewidth=1, arrow = arrow(ends='both')) +
  geom_vline(xintercept = ci[1], color="orange", linetype="dotted", linewidth=0.5) +
  geom_vline(xintercept = ci[2], color="orange", linetype="dotted", linewidth=0.5) +
  theme_minimal() +
  labs(x = "\nt-values - standardized coefficient under the null hypothesis", y = "Probability", 
       title = paste0("Shifted t-Distribution (df=30); ", "location = ", round(t_value, 2), "(t-value)\n"))

Next Week

  • Can we do better and get more intuitive quantities?
    • The answer starts with “B”
    • And ends with “…ootstrapping”

The End (for now)

Thank you!