Bayesian Statistics

0. RefreshR

Dominique Makowski
D.Makowski@sussex.ac.uk

How to do Bayesian Correlations

Frequentist Pearson Correlation Matrix

df <- iris # A dataframe available in base R

correlation::correlation(df) |>
    summary()

# Correlation Matrix (pearson-method)

Parameter    | Petal.Width | Petal.Length | Sepal.Width
-------------------------------------------------------
Sepal.Length |     0.82*** |      0.87*** |       -0.12
Sepal.Width  |    -0.37*** |     -0.43*** |            
Petal.Length |     0.96*** |              |            

p-value adjustment method: Holm (1979)
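As a sanity check, a single cell of this matrix can be reproduced with base R. This is a minimal sketch, assuming only the built-in iris data and cor.test() from the stats package. Note that it reports an unadjusted p-value, whereas the matrix above applies the Holm correction.

```r
# Pearson correlation between two iris columns, using base R only
result <- cor.test(iris$Sepal.Length, iris$Petal.Length, method = "pearson")

r <- unname(result$estimate) # the correlation coefficient
round(r, 2) # corresponds to the Sepal.Length / Petal.Length cell (0.87)

result$p.value # unadjusted p-value (no Holm correction applied here)
```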

Bayesian Pearson Correlation Matrix

df <- iris # A dataframe available in base R

correlation::correlation(df, bayesian = TRUE) |>
    summary()

# Correlation Matrix (pearson-method)

Parameter    | Petal.Width | Petal.Length | Sepal.Width
-------------------------------------------------------
Sepal.Length |     0.80*** |      0.86*** |       -0.11
Sepal.Width  |    -0.36*** |     -0.41*** |            
Petal.Length |     0.96*** |              |

Bayesian stats are easy* to do

This module is about understanding what we are doing, and why we are doing it.

* Easy as in “not too hard”

Why Bayesian Statistics?

The Bayesian framework for statistics has quickly gained popularity among scientists, associated with the general shift towards open, transparent, and more rigorous science. Reasons to prefer this approach are:

Reliability and flexibility (Kruschke, Aguinis, & Joo, 2012; Etz & Vandekerckhove, 2016)
The possibility of introducing prior knowledge into the analysis (Andrews & Baguley, 2013; Kruschke et al., 2012)
Intuitive results and their straightforward interpretation (Kruschke, 2010; Wagenmakers et al., 2018)

Learning Bayes? Back to the Bayesics first

This module adopts a slightly unorthodox approach: instead of starting with Bayesian theory and equations, we will first consolidate concepts from Frequentist statistics that will help us understand and apply the Bayesian framework
We won’t be talking about Bayes until a few sessions in (“awww 😞”)
- But everything we do until then will be in preparation for it
Understanding Bayes painlessly requires being very clear about some fundamental concepts, in particular probability distributions and model parameters

Differences from Previous Sussex Stats Course

No self-paced “tutorials”, everything is workshop style
Less emphasis on “THE ONLY WAY”, focus on understanding concepts and applying them critically
No imposed organization, templates, model answers, or code functions to follow
But paradoxically we now actually care about the code
Change of mindset: you are more early-career researchers (ECRs) than students
Assessments involve understanding what we discuss in class (and at the end being able to do a Bayesian analysis), no more no less

“There will be no foolish tutorial-reading or silly puppets in this class. As such, I don’t expect many of you to appreciate the subtle science and exact art that is Bayesian Statistics.” Snape (allegedly)

How to successfully attend this module

Goal: Become a master* of Bayesian statistics
- Master User: be comfortable using and reading Bayesian statistics
- $\neq$ becoming a master mathematician
- Right level of understanding: not too superficial, not too deep
Code shown in the slides should in general be understood
- But you don’t need to memorize it
- Best to follow along by trying and running the code on your own system
- (If you need me to slow down, let me know!)
- DO NOT READ AHEAD OR LOOK UP THE ANSWERS
Ideally, make a Quarto file and write your notes and code examples in it
- Slides will be available online
Equations are not generally important
- No need to memorize them, but you should understand the concepts
- Memorizing a few Greek symbols will be useful
  - In particular beta $\beta$, sigma $\sigma$, mu $\mu$
Please engage (don’t leave me hanging 😢)

Follow the slides

https://github.com/DominiqueMakowski/teaching
- 👉 2025-26: Week 1

The Environment

Setup

Make sure you have R and RStudio on your computer
- Why local installation? Independence, flexibility, and power
Follow the instructions on Canvas (in /Module Information)
If you have any problem, let me know

Vocabulary

R = programming language
RStudio = the “editor” that we use to work with R
Posit = The company that created RStudio and that provides cloud services (the company itself used to be called RStudio)
Posit cloud = The online version of RStudio
Quarto = The successor to R Markdown. A system to combine code, text, and code output into nice documents (similar to Jupyter)
Markdown = A simple syntax to *format* text, widely used on the **internet**

Panels

Source (text editor)
Console (interactive)
Environment (objects)
Files (navigate)

Creating New Document

You can interact with code inside code chunks

Options

Remove “editor” (or replace visual with source)
- editor: source
Select “Chunk Output in Console”

Interacting

Create a new document (file)
- .R (R script) or .qmd (quarto document)
Write some code in the script
Run the code
- Click somewhere on the same line that you want to execute
- Or select the code that you want to execute
- Hit Ctrl+Enter

2 + 2

[1] 4

Programming Concepts

Classes

In R, each thing has a class (type)
- Numeric (aka integers and floats; numbers)
- Character (aka string; text)
- Logical (aka booleans; TRUE/FALSE)
  - Note: TRUE and FALSE are equivalent to 1 and 0
  - Try: TRUE + TRUE
- Factors (aka categorical; e.g. experimental conditions)
- Comments (with hash #, CTRL + SHIFT + C)
- “Functions” (end with (); e.g. mean())
- Many more…
You can access a function’s documentation with ?mean or clicking on the function and pressing F1
Functions can be “nested”, e.g., sqrt(sqrt(sqrt(2)))
You can check the class of an object with class()
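A few of the classes listed above can be inspected directly. The following is a small sketch using base R only; the `conditions` object and its values are hypothetical and just serve as an illustration.

```r
# Character: text between quotation marks
class("text") # "character"

# Factor: a categorical variable with a fixed set of levels
conditions <- factor(c("control", "treatment", "control")) # hypothetical conditions
class(conditions) # "factor"
levels(conditions) # "control" "treatment"

# Nested functions are evaluated from the inside out
sqrt(sqrt(16)) # sqrt(16) is 4, then sqrt(4) is 2
```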

Types

# Number
3
# Character "quotations"
"text"
# Logical
TRUE

Check class

x <- 3
class(x)

[1] "numeric"

Printing

x <- 3

The content of something is only shown on demand, by printing it

print(x)

[1] 3

The last line of a code chunk, if not assigned to anything, is automatically printed by R

x  # = print(x)

[1] 3

Printing things doesn’t modify (“save”) them:

x + 1

[1] 4

x

[1] 3
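To actually keep the new value, the result has to be assigned back to a name. A minimal sketch of the difference between printing and saving:

```r
x <- 3
x + 1 # the result (4) is printed, but not stored anywhere
x # still 3

x <- x + 1 # the result is assigned back to x
x # now 4
```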

Exercise

How many functions do we use in sqrt(mean(c(1, 2, 3)))?
What’s the difference between mean() and mean?
A friend tells you to use mad() but you don’t know what it does. How do you find out?
What are the arguments of the function mean()?
What’s the difference between sqrt(2) and sqrt(x=2)?
What is the output of class(TRUE)
What is the output of class(class)
What is the output of class(class(class))
What is the output of class(class(class(class)))

Assignment and Equality

Assign names to objects with <- (or =, but <- is preferred in R)
= is used for arguments inside functions (e.g., sqrt(x = 2))
== is used to test for equality (e.g., x == 3 returns TRUE if x is equal to 3, FALSE otherwise)
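The three operators can be seen side by side in a short sketch (the values are arbitrary):

```r
x <- 3 # assignment: the name x now refers to the value 3

x == 3 # equality test: TRUE
x == 4 # equality test: FALSE

sqrt(x = 16) # inside a function call, = passes an argument: returns 4
x # the argument name did not overwrite our object: still 3
```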

Exercise

What is the output of class(3 == 2 + 1)
What is the output of "a" <- 3?
What’s the difference between an object and a variable?
- Object = anything that exists in R
- Variable = The name assigned to an object OR a column in a data frame
- x <- 3. “x” is a variable.

Vectors

A vector is a “list” of elements of the same class, indexed by their position
In R, most operations are by default vectorized (i.e., applied to each element of the vector)
Create and concatenate vectors with the combine function c()

# Vector
x <- c(0, 1, 2)
x + 3

[1] 3 4 5

c(x, 3)

[1] 0 1 2 3
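Vectorization also applies to other operators and to many functions. A short sketch with hypothetical values:

```r
scores <- c(10, 20, 30) # hypothetical values

scores * 2 # each element is multiplied: 20 40 60
scores + c(1, 2, 3) # element-wise addition of two vectors: 11 22 33
sqrt(c(1, 4, 9)) # functions are often vectorized too: 1 2 3

length(c(scores, 40)) # combining adds an element: length is now 4
```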

Vector Indexing

x <- c(0, 1, 2, 3)

R starts counting at 1, not 0.

x[2]

[1] 1

Vectors can also be indexed via logical vectors

x[c(TRUE, FALSE, TRUE, FALSE)]

[1] 0 2

Useful for filtering

mask <- x >= 1
mask

[1] FALSE  TRUE  TRUE  TRUE

x[mask]

[1] 1 2 3
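The same ideas can be combined in a single line. A small sketch, reusing the vector x from above:

```r
x <- c(0, 1, 2, 3)

x[c(1, 4)] # several positions at once: 0 3
x[x >= 1] # the mask can be built inline: 1 2 3
x[x >= 1 & x < 3] # two conditions combined with &: 1 2

sum(x >= 1) # TRUE counts as 1, so this counts the matches: 3
```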

Vectors vs. Lists

A list is a container of named elements of any kind, indexed by their name
The order of things doesn’t matter (unlike in a vector)
We can extract elements via their names (instead of via their index)

mylist <- list(var1 = "some text", var2 = 30, var3 = x)
mylist$var3 # = mylist[["var3"]]

[1] 0 1 2 3

Warning

mylist[] returns a list, while mylist[[]] returns the element itself

You can also merge lists with c()

mylist2 <- list(var4 = "some other text")
c(mylist, mylist2)

$var1
[1] "some text"

$var2
[1] 30

$var3
[1] 0 1 2 3

$var4
[1] "some other text"

Exercise

What is the output?

it_is_a_character <- 3
it_is_a_character <- c(it_is_a_character, 3)
it_is_a_character <- c(it_is_a_character, 3)
it_is_a_character + 3

What is the output?

mylist <- list(
    numbers = c(1, 2, 3),
    note = "hello"
)

a <- mylist["numbers"]
b <- mylist[["numbers"]]

class(a)
class(b)

Exercise

What is the output?

mylist2 <- list(
    note = "new note",
    extra = TRUE
)

merged <- c(mylist, mylist2)
names(merged)
merged

Sequences

You can create vectors with the : operator, e.g., 1:10 creates a vector containing the sequence 1, 2, …, 10

5:9

[1] 5 6 7 8 9

The function seq() can be used to create vectors with more control, by specifying the step size OR the length of the output vector

seq(0, 1, by = 0.2) # from 0 to 1, step 0.2

[1] 0.0 0.2 0.4 0.6 0.8 1.0

seq(0, 1, length.out=3)

[1] 0.0 0.5 1.0
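Two further variations of seq(), sketched with arbitrary numbers: a negative step to count down, and length.out to get a fixed number of evenly spaced values.

```r
# A negative step counts down
seq(10, 0, by = -2.5) # 10.0 7.5 5.0 2.5 0.0

# length.out fixes the number of values; the step is computed for us
evenly_spaced <- seq(0, 1, length.out = 11)
length(evenly_spaced) # 11
evenly_spaced[2] # 0.1
```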

Pipes

Pipe: |>, with CTRL + SHIFT + M
- If old pipe %>%: Tools -> Global Options -> Code -> Native Pipe Operator
Puts the previous “stuff” as the first argument of the next function

4 |> sqrt() # equivalent to

[1] 2

sqrt(4)

[1] 2

Pipes are useful to chain operations in a Human-readable way (“do this then this then this”)

result <- 4 |>
    sqrt() |>
    c(1, 0) |>
    as.character()

result

[1] "2" "1" "0"

But the result always ends up at the start (i.e., the output of the end of the chain gets “stored” in the name at the beginning of the statement)
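To see that a pipe chain is just another way of writing nested calls, the two forms can be compared directly. A minimal sketch, assuming R 4.1 or later (where the native pipe |> is available); the object names are arbitrary.

```r
values <- c(1, 4, 9)

# Nested form: read from the inside out
nested <- sum(sqrt(values))

# Piped form: read from left to right ("take values, then sqrt, then sum")
piped <- values |>
    sqrt() |>
    sum()

nested # 6
piped # 6
identical(nested, piped) # TRUE
```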

DataFrames

A data frame is a collection of vectors of the same length (i.e., a table)
Each vector is a column of the data frame
Each column can have a different class (e.g., numeric, character, logical, etc.)

# Create a data frame
df <- data.frame(
    var1 = c(1, 2, 3),
    var2 = c("a", "b", "c"),
    var3 = c(TRUE, FALSE, TRUE)
)

A few “example” dataframes are directly available in base R, e.g., mtcars, iris

Tip

You can view the first rows of a data frame with head()

head(iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
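Besides head(), a few other base R functions give a quick overview of a data frame. A short sketch on the built-in iris data:

```r
nrow(iris) # number of rows: 150
ncol(iris) # number of columns: 5
names(iris) # column names

str(iris) # compact overview: class and first values of each column
head(iris, n = 3) # only the first 3 rows
```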

Data classes

Similarly to lists, you can access columns via their names with [[]]
Or with the $ operator, which is a shorthand for [[]]

df$var1

[1] 1 2 3

Each column of a data frame has a class: the type

class(df$var1)

[1] "numeric"

class(df$var2)

[1] "character"

class(df$var3)

[1] "logical"
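The equivalence between $ and [[]] can be checked directly, and the class of every column can be obtained in one go. A sketch that recreates the df from the previous slide; it assumes R 4.0 or later, where text columns stay character rather than being converted to factors.

```r
df <- data.frame(
    var1 = c(1, 2, 3),
    var2 = c("a", "b", "c"),
    var3 = c(TRUE, FALSE, TRUE)
)

# Both ways of accessing a column return the same vector
identical(df$var1, df[["var1"]]) # TRUE

# Apply class() to each column
sapply(df, class)
```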

Data Concatenation

df1 <- data.frame(x = c(1, 2), y = c("a", "b"))
df2 <- data.frame(x = c(3, 4), y = c("c", "d"))
df3 <- data.frame(z = c(TRUE, FALSE))

c() would be ambiguous for dataframes because tables can be combined vertically (by rows) or horizontally (by columns)
rbind() and cbind() are used to combine dataframes by rows or by columns

df4 <- rbind(df1, df2)
df4

  x y
1 1 a
2 2 b
3 3 c
4 4 d

df5 <- cbind(df1, df3)
df5

  x y     z
1 1 a  TRUE
2 2 b FALSE
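The two functions have different requirements: rbind() needs the same columns, cbind() needs the same number of rows. A small sketch that recreates the three data frames and uses try() to show what happens when the columns do not match:

```r
df1 <- data.frame(x = c(1, 2), y = c("a", "b"))
df2 <- data.frame(x = c(3, 4), y = c("c", "d"))
df3 <- data.frame(z = c(TRUE, FALSE))

dim(rbind(df1, df2)) # 4 rows, 2 columns
dim(cbind(df1, df3)) # 2 rows, 3 columns

# Stacking data frames with different columns fails
attempt <- try(rbind(df1, df3), silent = TRUE)
inherits(attempt, "try-error") # TRUE
```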

Packages

Install packages with install.packages()

install.packages("tidyverse")
install.packages("easystats")

tidyverse and easystats are actually collections of packages
Load packages with library()
- This simply makes the functions of the package available in the current session
- You can still call functions from packages that are not loaded by explicitly mentioning the package name pkg::fun()

Tip

It is good practice to explicitly mention a function’s package when using it, e.g. dplyr::select(), especially when using less popular functions.
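The pkg::fun() syntax can be tried without installing anything, since some packages (such as stats and utils) ship with R. A minimal sketch:

```r
# Call functions by naming their package explicitly (no library() needed)
stats::median(c(1, 2, 10)) # 2
utils::head(letters, 3) # "a" "b" "c"

# Check whether a package is installed, without loading it
requireNamespace("stats", quietly = TRUE) # TRUE
```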

ggplot basics (1)

ggplot2 is the main R package for data visualization
It is based on the Grammar of Graphics (Wilkinson, 2005)
The main function is ggplot()
1. Takes a data frame as first argument
2. Followed by a mapping of variables to aesthetic characteristics (x, y, color, shape, etc.)
3. We can then add layers to the plot with +
Note: In ggplot2 (and most tidyverse packages), variables are not quoted (x=Sepal.Length, not x="Sepal.Length")
- This is not typically the case (in other packages and languages)

library(tidyverse)

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() +
    geom_density_2d() +
    theme_classic()
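The three steps listed above can also be taken one at a time by storing the plot in an object. This is only an illustration, assuming ggplot2 is installed and using the built-in mtcars data; the object name p is arbitrary.

```r
library(ggplot2)

# Steps 1 and 2: data frame + mapping of variables to aesthetics (no layer yet)
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Step 3: add layers with +
p <- p + geom_point()
p <- p + theme_minimal()

p # printing the object displays the plot
```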

ggplot basics (2)

The arguments passed to ggplot() are inherited by the layers
One can specify different data & aesthetics for each layer

ggplot() +
    geom_point(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_density_2d(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    theme_classic()
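Inheritance and per-layer data can be mixed: a layer uses what was passed to ggplot() unless it is given its own. A sketch, assuming ggplot2 is installed, that highlights one species on top of all the points (the setosa subset is created here only for the example):

```r
library(ggplot2)

# Subset used only by the second layer (base R filtering with a logical mask)
setosa <- iris[iris$Species == "setosa", ]

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point(color = "grey") + # inherits data and aesthetics
    geom_point(data = setosa, color = "red") + # same aesthetics, different data
    theme_classic()
p
```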

ggplot basics (3)

Aside from aesthetics and data, other arguments can be used to customize the plot

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point(color = "yellow", size = 4, shape = "triangle") +
    geom_density_2d(color = "red") +
    see::theme_abyss() # Package in easystats

ggplot basics (4)

Warning

Misnomer: do NOT confuse arguments that are “aesthetics” in aes() (i.e., map variable names to aesthetic features) with arguments that control the appearance of the plot (not in aes())

ggplot(iris) +
    geom_point(aes(x = Sepal.Length, y = Sepal.Width, color = "blue"))

ggplot(iris) +
    geom_point(aes(x = Sepal.Length, y = Sepal.Width), color = "blue")

ggplot basics (5)

The appearance of aesthetic mappings can be controlled with scale_*()

iris |>
    ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
    geom_point(size = 3) +
    scale_color_manual(
        values = c(
            setosa = "orange",
            versicolor = "purple",
            virginica = "green"
        )
    ) +
    see::theme_abyss()

Exercise

1. Make a dataframe with 3 columns:
- age: with a vector from 0 to 10
- height: with a vector from 150 to 170
- weight: with a vector from 40 to 70
- The dataframe should have 30 rows
2. Visualize:
- “Values of age as a function of height. The size of the points should reflect weight. Make the points green. Add a minimal theme.”

df <- data.frame(age=seq(0, 10, length.out=30),
                 height=seq(150, 170, length.out=30),
                 weight=seq(40, 70, length.out=30))

ggplot(df, aes(x=height, y=age, size=weight)) +
  geom_point(color = "green") +
  theme_minimal()

For loops

For loops are used to iterate over a sequence of values

myvec <- c("Tom", "Dom", "Harry")
for (x in myvec) {
    print(x)
}

[1] "Tom"
[1] "Dom"
[1] "Harry"

It is convenient to iterate over sequences of numbers, e.g., 1:10

for (i in 1:10) {
    print(i)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

It is useful to initialize an empty vector to then store some result at each iteration

myseq <- c() # Initialize empty vector
for (i in 1:10) {
    # Draw 10 random elements (with replacement) from this vector
    newvector <- sample(c(1, 2, 3), 10, replace = TRUE)
    # Compute mean
    mu <- mean(newvector)
    # Append to myseq
    myseq <- c(myseq, mu)
}
myseq

 [1] 1.8 2.2 2.4 1.9 2.2 1.7 1.9 2.3 2.0 2.2
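A variant of the same loop, sketched under two assumptions that are not in the original: a seed is set so that the random draws are reproducible (the value 42 is arbitrary), and the result vector is created at its final length instead of being grown with c().

```r
set.seed(42) # arbitrary seed, makes the random draws reproducible

n_iterations <- 10
means <- numeric(n_iterations) # vector of 10 zeros, to be filled in

for (i in seq_len(n_iterations)) {
    # Draw 10 random elements (with replacement) and store their mean
    draws <- sample(c(1, 2, 3), 10, replace = TRUE)
    means[i] <- mean(draws)
}
means
```

Filling a pre-sized vector by position gives the same kind of result as appending, and tends to be faster when the number of iterations is large.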

Functions

Functions are self-contained factories
- They take some variables in (through arguments)
- They return some output

# Define a new function
do_an_addition <- function(x, y) {
    result <- x + y
    return(result)
}

# Call the function
result <- do_an_addition(x = 2, y = 3)
result

[1] 5
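Arguments can be matched by position or by name, and they can have default values. A sketch with a hypothetical function, add_and_scale(), written only for this illustration:

```r
# scale has a default value, so it can be omitted when calling the function
add_and_scale <- function(x, y, scale = 1) {
    result <- (x + y) * scale
    return(result)
}

add_and_scale(2, 3) # arguments matched by position: 5
add_and_scale(y = 3, x = 2) # matched by name, so the order is free: 5
add_and_scale(2, 3, scale = 10) # default overridden: 50
```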

Quiz Time

1 + "1" returns an error. Why?
What’s the difference between c() and list()?
In ggplot, aesthetics refer to visual customization (e.g., change the color of all points)
A pipe takes the output of the previous function as the first argument of the next
What will True * 3 return?
What will TRUE / 10 return?
I do ggplot(iris, aes(x="Sepal.Length", y="Petal.Length")) but it throws an error. Why?
I do ggplot(iris, aes(x=Sepal.Length, y=Petal.length)) but it throws an error. Why?
I am running mutate(data, x = 3) but it says Error in mutate(data, x = 3) : could not find function "mutate". Why?
What is the problem with the following:

do_a_multiplication <- function(numbers) {
    result <- x * y
    return(result)
}

What will the following return?

a_cool_function <- function(v1, v2) {
    return(v1 + v2)
}

the_answer_is_42 <- c()
for (t in 1:2) {
  for (h in 3:4) {
    v <- t |> 
      a_cool_function(h) 
    the_answer_is_42 <- c(the_answer_is_42, v)
  }
}
the_answer_is_42

The End (for now)

Thank you!
