Bayesian Statistics

0. RefreshR

Dominique Makowski
D.Makowski@sussex.ac.uk

Follow the slides

  • https://github.com/DominiqueMakowski/teaching
    • 👉 Lecture 0

The Environment

Setup

  • Make sure you have R and RStudio on your computer
    • Why local installation? Independence, flexibility, and power
  • Follow the instructions on Canvas (in /Module Information)
  • If you run into any problems, let me know

Vocabulary

  • R = programming language
  • RStudio = the “editor” that we use to work with R
  • Posit = The company that created RStudio and provides cloud services (the company itself used to be called RStudio)
  • Posit Cloud = The online version of RStudio
  • Quarto = The successor to R Markdown. A system to combine code, text, and code output into nice documents (similar to Jupyter)
  • Markdown = A simple syntax to *format* text, widely used on the **internet**

Panels

  1. Source (text editor)
  2. Console (interactive)
  3. Environment (objects)
  4. Files (navigate)

Creating New Document

  • You can interact with code inside code chunks

Options

  • In the document header (YAML), remove the “editor” line (or replace visual with source)
    • editor: source
  • Select “Chunk Output in Console”

Interacting

  1. Create a new document (file)
    • .R (R script) or .qmd (quarto document)
  2. Write some code in the script
  3. Run the code
    • Click anywhere on the line that you want to execute
    • Or select the code that you want to execute
    • Hit Ctrl+Enter
2 + 2
[1] 4

Programming Concepts

Classes

  • In R, each thing has a class (type)
    • Numeric (aka integers and floats; numbers)
    • Character (aka string; text)
    • Logical (aka booleans; TRUE/FALSE)
      • Note: TRUE and FALSE are equivalent to 1 and 0
      • Try: TRUE + TRUE
    • Factors (aka categorical; e.g. experimental conditions)
    • Comments (with hash #, CTRL + SHIFT + C)
    • “Functions” (ends with (); e.g. mean())
    • Many more…
  • You can check the class of an object with class()
  • You can access a function’s documentation with ?mean or clicking on the function and pressing F1

Types

# Number
3
# Character "quotations"
"text"
# Logical
TRUE

Check class

x <- 3
class(x)
[1] "numeric"
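
The same check works for the other classes listed above. A minimal sketch (the object names are illustrative and not part of the slides):

```r
# Illustrative objects (hypothetical names)
a_number <- 3.5
a_text <- "text"
a_logical <- TRUE
a_factor <- factor(c("control", "treatment", "control"))

class(a_number)   # "numeric"
class(a_text)     # "character"
class(a_logical)  # "logical"
class(a_factor)   # "factor"
class(mean)       # "function"

# Logicals behave like 1 and 0 in arithmetic
two <- TRUE + TRUE
two               # 2
```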

Vectors vs. Lists (1)

  • A vector is a “list” of elements of the same class, indexed by their position
  • In R, most operations are by default vectorized (i.e., applied to each element of the vector)
  • Create and concatenate vectors with the combine function c()
# Vector
x <- c(0, 1, 2)
x + 3
[1] 3 4 5
c(x, 3)
[1] 0 1 2 3

Warning

R starts counting at 1, not 0.

x[2]
[1] 1
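
Position-based indexing can do a bit more than pick one element. A small sketch on the same vector (the names first, picked, without_first, n and doubled are illustrative):

```r
x <- c(0, 1, 2)

first <- x[1]            # first element: 0
picked <- x[c(1, 3)]     # elements 1 and 3: 0 2
without_first <- x[-1]   # a negative index drops that element: 1 2
n <- length(x)           # number of elements: 3

# Vectorized arithmetic applies to every element
doubled <- x * 2         # 0 2 4
```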

Vectors vs. Lists (2)

  • A list is a container of (typically named) elements of any kind, usually accessed by their name
  • The order of elements matters much less than in a vector, because we rarely rely on it
  • We can extract elements via their names (instead of via their index)
mylist <- list(var1 = "some text", var2 = 30, var3 = x)
mylist$var3 # = mylist[["var3"]]
[1] 0 1 2

Warning

Single brackets, e.g. mylist["var3"], return a list, while double brackets, e.g. mylist[["var3"]], return the element itself
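
A quick way to see the difference between the two kinds of brackets, as a sketch reusing the list defined earlier (as_list and as_element are illustrative names):

```r
x <- c(0, 1, 2)
mylist <- list(var1 = "some text", var2 = 30, var3 = x)

as_list <- mylist["var3"]       # single brackets: a list of length 1
as_element <- mylist[["var3"]]  # double brackets: the numeric vector itself

class(as_list)     # "list"
class(as_element)  # "numeric"

# Only the extracted element supports arithmetic directly
as_element + 1     # 1 2 3
```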

  • You can also merge lists with c()
mylist2 <- list(var4 = "some other text")
c(mylist, mylist2)
$var1
[1] "some text"

$var2
[1] 30

$var3
[1] 0 1 2

$var4
[1] "some other text"

Pipes

  • Pipe: |>, with CTRL + SHIFT + M
    • If the shortcut inserts the old pipe %>%: Tools -> Global Options -> Code -> Native Pipe Operator
  • Passes whatever is on its left as the first argument of the function on its right
4 |> sqrt()  # equivalent to
[1] 2
sqrt(4)
[1] 2
  • Pipes are useful to chain operations in a human-readable way (“do this, then this, then this”)
result <- 4 |>
  sqrt() |>
  c(1, 0) |>
  as.character()
result
[1] "2" "1" "0"
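
Functions in a chain can still receive their other arguments as usual. A short sketch, assuming R 4.1 or later (where the native pipe is available); rounded and total are illustrative names:

```r
# Extra arguments go after the piped value
rounded <- c(1.234, 5.678) |>
  round(digits = 1)   # same as round(c(1.234, 5.678), digits = 1)
rounded               # 1.2 5.7

# A longer chain, read from top to bottom
total <- c(1, 4, 9) |>
  sqrt() |>           # 1 2 3
  sum()               # 6
total
```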

DataFrames

  • A data frame is a collection of vectors of the same length (i.e. a table)
  • Each vector is a column of the data frame
  • Each column can have a different class (e.g., numeric, character, logical, etc.)
# Create a data frame
df <- data.frame(
  var1 = c(1, 2, 3),
  var2 = c("a", "b", "c"),
  var3 = c(TRUE, FALSE, TRUE)
)
  • A few “example” data frames are directly available in base R, e.g., mtcars, iris

Tip

You can view the first rows of a data frame with head()

head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
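
Beyond head(), a few base R operations cover most everyday needs. A minimal sketch on the small data frame created earlier (kept and var4 are illustrative names, and R 4.0 or later is assumed so that text columns stay character):

```r
df <- data.frame(
  var1 = c(1, 2, 3),
  var2 = c("a", "b", "c"),
  var3 = c(TRUE, FALSE, TRUE)
)

dim(df)           # 3 rows, 3 columns
names(df)         # "var1" "var2" "var3"
df$var1           # one column as a vector: 1 2 3
mean(df$var1)     # 2

# Keep only the rows where var3 is TRUE
kept <- df[df$var3, ]
kept$var2         # "a" "c"

# Add a new column computed from an existing one
df$var4 <- df$var1 * 10
```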

Packages

  • Install packages with install.packages()
install.packages("tidyverse")
install.packages("easystats")
  • tidyverse and easystats are actually collections of packages
  • Load packages with library()
    • This simply makes the functions of the package available in the current session
    • You can still call functions from packages that are not loaded by explicitly mentioning the package name pkg::fun()

Tip

It is good practice to explicitly mention a function’s package when using it, e.g. dplyr::select(), especially when using less popular functions.
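
To illustrate the pkg::fun() notation without installing anything, here is a sketch using stats and utils, two packages that ship with R (has_ggplot2 is an illustrative name):

```r
# Call a function without loading its package: pkg::fun()
stats::median(c(1, 5, 9))   # 5
utils::head(letters, 3)     # "a" "b" "c"

# After library(), the prefix becomes optional
library(stats)
median(c(1, 5, 9))          # 5

# Check whether a package is installed before relying on it
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
```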

ggplot basics (1)

  • ggplot2 is the main R package for data visualization
  • It is based on the Grammar of Graphics (Wilkinson, 2005)
  • The main function is ggplot()
    1. Takes a data frame as first argument
    2. Followed by a mapping of variables to aesthetic characteristics (x, y, color, shape, etc.)
    3. We can then add layers to the plot with +
  • Note: In ggplot2 (and most tidyverse packages), variables are not quoted (x=Sepal.Length, not x="Sepal.Length")
    • This is not typically the case in other packages and languages
library(tidyverse)

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_density_2d() +
  theme_classic()

ggplot basics (2)

  • The arguments passed to ggplot() are inherited by the layers
  • One can specify different data & aesthetics for each layer
ggplot() +
  geom_point(data=iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_density_2d(data=iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  theme_classic()

ggplot basics (3)

  • Aside from aesthetics and data, other arguments can be used to customize the plot
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color="yellow", size=4, shape="triangle") +
  geom_density_2d(color="red") +
  see::theme_abyss()  # Package in easystats

ggplot basics (4)

Warning

Misnomer: do NOT confuse arguments that are “aesthetics” in aes() (i.e., that map variables to aesthetic features) with arguments that control the appearance of the plot (set outside aes())

ggplot(iris) +
  geom_point(aes(x = Sepal.Length,
                 y = Sepal.Width,
                 color="blue"))

ggplot(iris) +
  geom_point(aes(x = Sepal.Length,
                 y = Sepal.Width),
             color="blue")
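
In the first plot, "blue" is treated as a one-level variable, so the points get a default color and a legend labelled "blue"; only the second plot draws blue points. For contrast, a sketch of what a genuine mapping looks like, assuming ggplot2 is installed (p_mapped and p_fixed are illustrative names):

```r
library(ggplot2)  # assumes ggplot2 (part of the tidyverse) is installed

# Inside aes(): color is mapped to a variable, so each species gets its own
# color and a legend is drawn
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species))

# Outside aes(): color is a fixed setting applied to all points
p_fixed <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "blue")
```

Printing p_mapped or p_fixed in the console displays the corresponding plot.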

ggplot basics (5)

  • The appearance of aesthetic mappings can be controlled with scale_*()
iris |>
  ggplot(aes(x = Sepal.Length, y = Sepal.Width, color=Species)) +
  geom_point(size=3) +
  scale_color_manual(values=c(setosa="orange",
                              versicolor="purple",
                              virginica="green")) +
  see::theme_abyss()

For loops

  • For loops are used to iterate over a sequence of values
myvec <- c("Tom", "Dom", "Harry")
for(x in myvec){
  print(x)
}
[1] "Tom"
[1] "Dom"
[1] "Harry"
  • It is convenient to iterate over sequences of numbers, e.g., 1:10
for(i in 1:10){
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
  • It is useful to initialize an empty vector to then store some result at each iteration
myseq <- c()  # Initialize empty vector
for(i in 1:10){
  # Draw 10 random values (with replacement) from c(1, 2, 3)
  newvector <- sample(c(1, 2, 3), 10, replace = TRUE)
  # Compute mean
  mu <- mean(newvector)
  # Append to myseq
  myseq <- c(myseq, mu)
}
myseq
 [1] 1.7 2.1 1.8 1.8 2.7 2.2 1.6 1.9 2.2 1.9
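
Because the draws are random, the values above change on every run. One possible variant, offered as a sketch rather than the approach used in the slides, fixes the random seed and pre-allocates the result vector instead of growing it (n_iter and the seed value 123 are arbitrary choices):

```r
set.seed(123)  # arbitrary seed so the random draws can be reproduced

n_iter <- 10
myseq <- numeric(n_iter)  # pre-allocate a numeric vector of zeros
for (i in seq_len(n_iter)) {
  newvector <- sample(c(1, 2, 3), 10, replace = TRUE)
  myseq[i] <- mean(newvector)  # store the result at position i
}
myseq
```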

Functions

  • Functions are self-contained factories
    • They take some variables in (through arguments)
    • They return some output
# Define a new function
do_an_addition <- function(x, y) {
  result <- x + y
  return(result)  
}  
# Call the function
result <- do_an_addition(x=2, y=3)
result
[1] 5
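
Arguments can also be given default values, which are used when the caller omits them. A small sketch with a hypothetical function, add_and_scale(), that is not part of the slides:

```r
# Hypothetical function with a default value for its third argument
add_and_scale <- function(x, y, multiplier = 1) {
  result <- (x + y) * multiplier
  return(result)
}

add_and_scale(2, 3)                   # default multiplier: 5
add_and_scale(2, 3, multiplier = 10)  # 50
add_and_scale(c(1, 2), 3)             # vectorized: 4 5
```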

Quiz Time

  • 1 + "1" returns an error. Why?
  • What’s the difference between c() and list()?
  • True or false: in ggplot, aesthetics refer to visual customization (e.g., change the color of all points)
  • True or false: a pipe takes the output of the previous function as the first argument of the next
  • What will True * 3 return?
  • What will TRUE / 10 return?
  • I do ggplot(iris, aes(x="Sepal.Length", y="Petal.Length")) but it throws an error. Why?
  • I do ggplot(iris, aes(x=Sepal.Length, y=Petal.length)) but it throws an error. Why?
  • I am running mutate(data, x = 3) but it says Error in mutate(data, x = 3) : could not find function "mutate". Why?
  • What is the problem with the following:
do_a_multiplication <- function(numbers) {
  result <- x * y
  return(result)  
}  

The End (for now)

Thank you!