5  Descriptive Models

5.1 The Data

For this chapter, we will use the data from Wagenmakers et al. (2008) - Experiment 1 (also reanalyzed by Heathcote and Love 2012), which contains responses and response times for several participants in two conditions (where instructions emphasized either speed or accuracy). Using the same procedure as the authors, we excluded all trials with uninterpretable response times, i.e., responses that were too fast (< 180 ms) or too slow (> 2 s instead of the original > 3 s; see Thériault et al. 2024 for a discussion of outlier removal).

using Downloads, CSV, DataFrames, Random
using Turing, Distributions, StatsFuns, SequentialSamplingModels
using GLMakie

Random.seed!(123)  # For reproducibility

df = CSV.read(Downloads.download("https://raw.githubusercontent.com/DominiqueMakowski/CognitiveModels/main/data/wagenmakers2008.csv"), DataFrame)

# Show 10 first rows
first(df, 10)
10×5 DataFrame
 Row │ Participant  Condition  RT       Error  Frequency
     │ Int64        String15   Float64  Bool   String15
─────┼───────────────────────────────────────────────────
   1 │           1  Speed        0.7    false  Low
   2 │           1  Speed        0.392  true   Very Low
   3 │           1  Speed        0.46   false  Very Low
   4 │           1  Speed        0.455  false  Very Low
   5 │           1  Speed        0.505  true   Low
   6 │           1  Speed        0.773  false  High
   7 │           1  Speed        0.39   false  High
   8 │           1  Speed        0.587  true   Low
   9 │           1  Speed        0.603  false  Low
  10 │           1  Speed        0.435  false  High

In the previous chapter, we modeled the error rate (the probability of making an error) using a logistic model, and observed that it was higher in the "Speed" condition. But what about speed itself? We will first focus on the response times (RT) of correct answers only (as we can assume that errors are underpinned by a different generative process).

After filtering out the errors, we create a new column, Accuracy, which is the “binarization” of the Condition column: it is true (1) when the condition is "Accuracy" and false (0) when it is "Speed".

Code
df = df[df.Error .== 0, :]
df.Accuracy = df.Condition .== "Accuracy"
Code Tip

Note the usage of vectorization .== as we want to compare each element of the Condition vector to the target "Accuracy".
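
For instance, a toy example of this element-wise comparison:

Code
condition = ["Speed", "Accuracy", "Speed"]
condition .== "Accuracy"  # Returns [false, true, false]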

Code
function plot_distribution(df, title="Empirical Distribution of Data from Wagenmakers et al. (2008)")
    fig = Figure()
    ax = Axis(fig[1, 1], title=title,
        xlabel="RT (s)",
        ylabel="Distribution",
        yticksvisible=false,
        xticksvisible=false,
        yticklabelsvisible=false)
    Makie.density!(df[df.Condition .== "Speed", :RT], color=("#EF5350", 0.7), label = "Speed")
    Makie.density!(df[df.Condition .== "Accuracy", :RT], color=("#66BB6A", 0.7), label = "Accuracy")
    Makie.axislegend("Condition"; position=:rt)
    Makie.ylims!(ax, (0, nothing))
    return fig
end

plot_distribution(df, "Empirical Distribution of Data from Wagenmakers et al. (2008)")

5.2 Gaussian (aka Linear) Model

Note

Note that until the last section of this chapter, we will disregard the existence of multiple participants (which requires the inclusion of random effects in the model). We will first treat the data as if it came from a single participant, to better understand the parameters, and will show how to add random effects at the end.

A linear model is the most common type of model. It aims to predict the mean mu \(\mu\) of the outcome variable using a Normal (aka Gaussian) distribution for the residuals. In other words, it models the outcome \(y\) as a Normal distribution with a mean mu \(\mu\) that is itself the result of a linear function of the predictors \(X\), and a standard deviation sigma \(\sigma\) that is constant across all values of the predictors. It can be written as \(y \sim Normal(\mu, \sigma)\), where \(\mu = intercept + slope \times X\).

In order to fit a Linear Model for RTs, we need to set a prior on each of these parameters (we will draw from these priors below to see what they imply), namely:

  • Sigma \(\sigma\) : The standard deviation (corresponding to the “spread” of RTs).
  • Mu \(\mu\) intercept : The mean RT in the reference condition (in our case, "Speed").
  • Mu \(\mu\) condition : The effect of the condition (the slope) on the mean (\(\mu\)) RT.
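
Before specifying the model, a quick prior sketch: we can draw a few values from these priors to get a feel for the Normal distributions of RTs they imply (illustrative only; the exact priors are those used in the model below).

Code
# Draw a few values from the priors (same distributions as in the model below)
σ_prior = rand(truncated(Normal(0, 0.5); lower=0), 3)
μ_prior = rand(truncated(Normal(0, 1); lower=0), 3)
# Each (μ, σ) pair implies a different Normal distribution of RTs
collect(zip(μ_prior, σ_prior))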

5.2.1 Model Specification

@model function model_Gaussian(rt; condition=nothing)

    # Prior on the standard deviation
    σ ~ truncated(Normal(0, 0.5); lower=0)  # Strictly positive half-normal distribution

    # Priors on intercept and effect of condition
    μ_intercept ~ truncated(Normal(0, 1); lower=0)
    μ_condition ~ Normal(0, 0.3)

    # Iterate through every observation
    for i in 1:length(rt)
        # Apply formula
        μ = μ_intercept + μ_condition * condition[i]
        # Likelihood family
        rt[i] ~ Normal(μ, σ)
    end
end

# Fit the model with the data
fit_Gaussian = model_Gaussian(df.RT; condition=df.Accuracy)
# Sample results using MCMC
chain_Gaussian = sample(fit_Gaussian, NUTS(), 400)
# Summary (95% CI)
hpd(chain_Gaussian; alpha=0.05)
HPD
   parameters     lower     upper 
       Symbol   Float64   Float64 
            σ    0.1652    0.1701
  μ_intercept    0.5071    0.5168
  μ_condition    0.1319    0.1457

The effect of Condition is significant: people are on average slower (higher RT) when the condition is "Accuracy". But is our model any good?

5.2.2 Posterior Predictive Check

Code
pred = predict(model_Gaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Gaussian)
pred = Array(pred)
Code
fig = plot_distribution(df, "Predictions made by Gaussian (aka Linear) Model")
for i in 1:length(chain_Gaussian)
    lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, "#388E3C", "#D32F2F"), alpha=0.1)
end
fig

As you can see, the linear model is good at predicting the mean RT (the centre of the distribution), but poor at capturing the spread and the shape of the data.

5.3 Scaled Gaussian Model

The previous model, despite its poor fit to the data, suggests that the mean RT is higher in the Accuracy condition. But the green (Accuracy) distribution also seems wider (i.e., the response times are more variable), which is not captured by our model (the predicted distributions all have the same width). This is expected, as typical linear models estimate only one value of sigma \(\sigma\) for the whole model, hence the requirement of homoscedasticity.

Note

Homoscedasticity, or homogeneity of variances, is the assumption that the spread of the outcome is similar across different values of the predictors. It is important in linear models because only one value of sigma \(\sigma\) is estimated.

Is it possible to set sigma \(\sigma\) as a parameter that would depend on the condition, in the same way as mu \(\mu\)? In Julia, this is very simple.

All we need is to set sigma \(\sigma\) as the result of a linear function, such as \(\sigma = intercept + slope \times condition\). This means setting a prior on the intercept of sigma \(\sigma\) (in our case, the spread in the reference condition) and a prior on how much this value changes in the other condition. This change can, by definition, be positive or negative (i.e., the other condition can have either a bigger or a smaller spread), so the prior on the effect of condition should ideally allow for both positive and negative values (e.g., σ_condition ~ Normal(0, 0.1)).

But this leads to an important problem.

Important

The combination of an intercept and a (possibly negative) slope for sigma \(\sigma\) technically allows for negative values of sigma \(\sigma\), which is impossible (a distribution cannot have a negative standard deviation). This issue is one of the most important to address when setting up complex models for RTs.

Indeed, even if we set a very narrow prior on the intercept of sigma \(\sigma\) to fix it at, for instance, 0.14, and a narrow prior on the effect of condition, say \(Normal(0, 0.001)\), an effect of condition of -0.15 is still possible (albeit with very low probability). Such an effect would lead to a sigma \(\sigma\) of \(0.14 - 0.15 = -0.01\), which would throw an error (and this will often happen, as the sampling process does explore unlikely regions of the parameter space).

5.3.1 Solution 1: Directional Effect of Condition

One possible (but not recommended) solution is to simply make it impossible for the effect of condition to be negative by truncating its prior at a lower bound of 0. This can work in our case because we know that the comparison condition is likely to have a higher spread than the reference condition (the intercept) - and if that were not the case, we could have changed the reference level. However, this is not good practice, as it enforces a strong a priori assumption about the direction of the effect, which is not always justified.

@model function model_ScaledGaussian(rt; condition=nothing)

    # Priors
    μ_intercept ~ truncated(Normal(0, 1); lower=0)
    μ_condition ~ Normal(0, 0.3)

    σ_intercept ~ truncated(Normal(0, 0.5); lower=0)  # Same prior as previously
    σ_condition ~ truncated(Normal(0, 0.1); lower=0)  # Enforce positivity

    for i in 1:length(rt)
        μ = μ_intercept + μ_condition * condition[i]
        σ = σ_intercept + σ_condition * condition[i]
        rt[i] ~ Normal(μ, σ)
    end
end

fit_ScaledGaussian = model_ScaledGaussian(df.RT; condition=df.Accuracy)
chain_ScaledGaussian = sample(fit_ScaledGaussian, NUTS(), 400)
# Summary (95% CI)
hpd(chain_ScaledGaussian; alpha=0.05)
HPD
   parameters     lower     upper 
       Symbol   Float64   Float64 
  μ_intercept    0.5081    0.5148
  μ_condition    0.1330    0.1446
  σ_intercept    0.1219    0.1271
  σ_condition    0.0714    0.0810

We can see that the effect of condition on sigma \(\sigma\) is significantly positive: the spread of RTs is higher in the Accuracy condition than in the Speed condition.
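
To make this concrete, we can combine the intercept and the slope to recover the implied standard deviation in each condition (a rough point summary based on the posterior means):

Code
# Implied SD per condition, using posterior means (rough point summary)
σ_speed = mean(chain_ScaledGaussian[:σ_intercept])
σ_accuracy = σ_speed + mean(chain_ScaledGaussian[:σ_condition])
(σ_speed, σ_accuracy)  # ≈ (0.12, 0.20): wider spread in the "Accuracy" condition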

5.3.2 Solution 2: Avoid Exploring Negative Variance Values

The other trick is to prevent the sampling algorithm from exploring negative values of sigma \(\sigma\). This can be done with a conditional statement: when sigma \(\sigma\) is negative, instead of evaluating the likelihood (and erroring), the model returns an infinitely low log-probability (-Inf), which pushes the sampler away from this impossible region.

@model function model_ScaledGaussian(rt; condition=nothing)

    # Priors
    μ_intercept ~ truncated(Normal(0, 1); lower=0)
    μ_condition ~ Normal(0, 0.3)

    σ_intercept ~ truncated(Normal(0, 0.5); lower=0)
    σ_condition ~ Normal(0, 0.1)

    for i in 1:length(rt)
        μ = μ_intercept + μ_condition * condition[i]
        σ = σ_intercept + σ_condition * condition[i]
        if σ < 0  # Avoid negative SD values
            Turing.@addlogprob! -Inf
            return nothing
        end
        rt[i] ~ Normal(μ, σ)
    end
end

fit_ScaledGaussian = model_ScaledGaussian(df.RT; condition=df.Accuracy)
chain_ScaledGaussian = sample(fit_ScaledGaussian, NUTS(), 400)
hpd(chain_ScaledGaussian; alpha=0.05)
HPD
   parameters     lower     upper 
       Symbol   Float64   Float64 
  μ_intercept    0.5076    0.5148
  μ_condition    0.1316    0.1444
  σ_intercept    0.1223    0.1273
  σ_condition    0.0709    0.0803

5.4 The Problem with Linear Models

Reaction times (RTs) have traditionally been modeled using linear models and their derived statistical tests, such as t-tests and ANOVAs. Importantly, linear models - by definition - try to predict the mean of the outcome variable by estimating the “best fitting” Normal distribution. In the context of reaction times (RTs), this is not ideal, as RTs typically exhibit a non-normal, right-skewed distribution: most responses cluster at the fast end, with a long tail of slow responses extending to the right. This means that the parameters of a Normal distribution (mean \(\mu\) and standard deviation \(\sigma\)) are not good descriptors of the data.

Linear models try to find the best fitting Normal distribution for the data. However, for reaction times, even the best fitting Normal distribution (in red) does not capture well the actual data (in grey).

A popular mitigation method to account for the non-normality of RTs is to transform the data, for instance with the popular log-transform. However, this practice should be avoided, as it leads to various issues, including loss of power and distorted interpretation of the results (Lo and Andrews 2015; Schramm and Rouder 2019). Rather than applying an arbitrary data transformation, it is better to swap the Normal distribution used by the model for a more appropriate one that can capture the characteristics of an RT distribution.

5.5 Shifted LogNormal Model

One obvious alternative to log-transforming the data is to use a model based on a LogNormal distribution, i.e., the distribution of a random variable whose logarithm is normally distributed. In this model, the mean \(\mu\) is defined on the log scale, and effects must be interpreted as multiplicative rather than additive (the condition increases the mean RT by a factor of \(\exp(\mu_{condition})\)).
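
To make this multiplicative interpretation concrete, here is a hypothetical coefficient value (chosen for illustration, not taken from the model output):

Code
# A log-scale coefficient of 0.3 corresponds to a ~35% multiplicative increase
μ_condition = 0.3  # Hypothetical value
exp(μ_condition)   # ≈ 1.35: the (unshifted) RT component is multiplied by ~1.35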

Note that for LogNormal distributions (as is the case for many of the models introduced in the rest of the chapter), the distribution parameters (\(\mu\) and \(\sigma\)) do not map independently onto the empirical mean and standard deviation (SD). The empirical SD increases when the mean \(\mu\) increases (which is seen as a feature rather than a bug, as it is consistent with typical reaction time data; Wagenmakers, Grasman, and Molenaar 2005).

A Shifted LogNormal model introduces a shift (a delay) parameter tau \(\tau\) that corresponds to the minimum “starting time” of the response process.

We need to set a prior for this parameter, which is usually truncated between 0 (to exclude negative minimum times) and the minimum RT of the data (the logic being that the minimum possible delay must be lower than the fastest response actually observed).

While \(Uniform(0, min(RT))\) is a common choice of prior, it is not ideal, as it implies that all values between 0 and the minimum RT are equally likely, which is not the case. Indeed, psychology research has shown that the minimum response time for humans is often between 100 and 250 ms. Moreover, in our case, we explicitly removed all RTs below 180 ms, suggesting that the minimum response time is more likely to approach 180 ms than 0 ms.

5.5.1 Prior on Minimum RT

Instead of a \(Uniform\) prior, we will use a \(Gamma(1.1, 11)\) distribution (shape 1.1 and scale 11 in the Distributions.jl parameterization, truncated at the minimum RT), as this particular parameterization reflects the low probability of very low minimum RTs (near 0) and a steadily increasing probability for increasing times.

Code
xaxis = range(0, 0.3, 1000)
fig = lines(xaxis, pdf.(Gamma(1.1, 11), xaxis); color=:blue, label="Gamma(1.1, 11)")
vlines!([minimum(df.RT)]; color="red", linestyle=:dash, label="Min. RT = 0.18 s")
axislegend()
fig

5.5.2 Model Specification

@model function model_LogNormal(rt; min_rt=minimum(df.RT), condition=nothing)

    # Priors 
    τ ~ truncated(Gamma(1.1, 11); upper=min_rt)

    μ_intercept ~ Normal(0, exp(1))  # On the log-scale: exp(μ) to get value in seconds
    μ_condition ~ Normal(0, exp(0.3))

    σ_intercept ~ truncated(Normal(0, 0.5); lower=0)
    σ_condition ~ Normal(0, 0.1)

    for i in 1:length(rt)
        μ = μ_intercept + μ_condition * condition[i]
        σ = σ_intercept + σ_condition * condition[i]
        if σ < 0  # Avoid negative SD values
            Turing.@addlogprob! -Inf
            return nothing
        end
        rt[i] ~ ShiftedLogNormal(μ, σ, τ)
    end
end

fit_LogNormal = model_LogNormal(df.RT; condition=df.Accuracy)
chain_LogNormal = sample(fit_LogNormal, NUTS(), 400)

5.5.3 Interpretation

hpd(chain_LogNormal; alpha=0.05)
HPD
   parameters     lower     upper 
       Symbol   Float64   Float64 
            τ    0.1721    0.1794
  μ_intercept   -1.1584   -1.1274
  μ_condition    0.3123    0.3407
  σ_intercept    0.3073    0.3232
  σ_condition    0.0326    0.0523
Code
pred = predict(model_LogNormal([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_LogNormal)
pred = Array(pred)

fig = plot_distribution(df, "Predictions made by Shifted LogNormal Model")
for i in 1:length(chain_LogNormal)
    lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, "#388E3C", "#D32F2F"), alpha=0.1)
end
fig

This model provides a much better fit to the data, and confirms that the Accuracy condition is associated with higher RTs and higher variability (i.e., a larger distribution width).

LogNormal distributions in nature

The reason why the Normal distribution is so ubiquitous in nature (and hence used as a good default) is the Central Limit Theorem, which states that the sum of a large number of independent random variables will be approximately normally distributed. Because many things in nature result from the addition of many random processes, the Normal distribution is very common in real life.

However, it turns out that the multiplication of random variables results in a LogNormal distribution, and multiplicative (rather than additive) cascades of processes are also very common in nature, from the lengths of latent periods of infectious diseases, to the distribution of mineral resources in the Earth's crust, to the elemental mechanisms at stake in physics and cell biology (Limpert, Stahel, and Abbt 2001).

Thus, using LogNormal distributions for RTs can be justified with the assumption that response times are the result of multiplicative stochastic processes happening in the brain.
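
We can illustrate this with a quick simulation (a minimal sketch using only the packages already loaded): the product of many positive random variables is approximately LogNormal, i.e., the logarithm of the product is approximately Normal.

Code
# Products of many positive random variables are approximately LogNormal
products = [prod(rand(Uniform(0.5, 1.5), 100)) for _ in 1:10_000]
# The log of a product is a sum of logs, so the CLT applies on the log scale
Makie.density(log.(products))  # Approximately bell-shaped (Normal)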

5.6 ExGaussian Model

Another popular model to describe RTs uses the ExGaussian distribution, i.e., the Exponentially-modified Gaussian distribution (Balota and Yap 2011; Matzke and Wagenmakers 2009).

This distribution is a convolution of a Normal and an Exponential distribution and has three parameters, namely mu \(\mu\) and sigma \(\sigma\) - the mean and standard deviation of the Gaussian component - and tau \(\tau\) - the exponential component of the distribution (note that although denoted by the same letter, it does not correspond directly to a shift of the distribution). Intuitively, these parameters reflect the centrality, the width and the tail dominance, respectively.
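
This convolution can be checked by simulation: summing draws from a Normal and an Exponential distribution should mimic draws from the corresponding ExGaussian (a sketch assuming that the ExGaussian distribution from SequentialSamplingModels can be sampled with rand, like the other distributions used in this chapter):

Code
# An ExGaussian sample is the sum of a Normal and an Exponential sample
μ, σ, τ = 0.4, 0.05, 0.1
by_sum = rand(Normal(μ, σ), 10_000) .+ rand(Exponential(τ), 10_000)
by_dist = rand(ExGaussian(μ, σ, τ), 10_000)
(mean(by_sum), mean(by_dist))  # Both ≈ μ + τ = 0.5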

Beyond the descriptive value of these types of models, some have tried to interpret their parameters in terms of cognitive mechanisms, arguing for instance that changes in the Gaussian components (\(\mu\) and \(\sigma\)) reflect changes in attentional processes [e.g., “the time required for organization and execution of the motor response”; Hohle (1965)], whereas changes in the exponential component (\(\tau\)) reflect changes in intentional (i.e., decision-related) processes (Kieffaber et al. 2006). However, Matzke and Wagenmakers (2009) demonstrate that there is likely no direct correspondence between ex-Gaussian parameters and cognitive mechanisms, and underline their value primarily as descriptive tools, rather than models of cognition per se.

Descriptively, the three parameters can be interpreted as:

  • Mu \(\mu\) : The location / centrality of the RTs. Would correspond to the mean in a symmetrical distribution.
  • Sigma \(\sigma\) : The variability and dispersion of the RTs. Akin to the standard deviation in normal distributions.
  • Tau \(\tau\) : Tail weight / skewness of the distribution.
Important

Note that these parameters are not independent with respect to distribution characteristics, such as the empirical mean and SD. Below is an example of different distributions with the same location (mu \(\mu\)) and dispersion (sigma \(\sigma\)) parameters: although only the tail weight parameter (tau \(\tau\)) is changed, the whole distribution appears to shift its centre of mass. Hence, one should be careful not to interpret mu \(\mu\) directly as the “mean” or the distribution peak, nor sigma \(\sigma\) as the SD or the “width”.
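
Such a figure can be reproduced with a short sketch (hypothetical parameter values chosen for illustration):

Code
# Same μ and σ, varying only τ: the apparent centre of mass shifts to the right
fig = Figure()
ax = Axis(fig[1, 1], xlabel="RT (s)", ylabel="Distribution")
for τ in [0.1, 0.2, 0.3]
    Makie.density!(ax, rand(ExGaussian(0.4, 0.05, τ), 10_000), label="τ = $τ")
end
axislegend(ax)
fig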

5.6.1 Conditional Tau \(\tau\) Parameter

In the same way as we modeled the effect of the condition on the dispersion component sigma \(\sigma\), we can do the same for any other parameter, including the exponential component tau \(\tau\). All we need is to set a prior on the intercept and the condition effect, and make sure that \(\tau > 0\).

@model function model_ExGaussian(rt; condition=nothing)

    # Priors 
    μ_intercept ~ Normal(0, 1) 
    μ_condition ~ Normal(0, 0.3)

    σ_intercept ~ truncated(Normal(0, 0.5); lower=0)
    σ_condition ~ Normal(0, 0.1)

    τ_intercept ~ truncated(Normal(0, 0.5); lower=0)
    τ_condition ~ Normal(0, 0.1)

    for i in 1:length(rt)
        μ = μ_intercept + μ_condition * condition[i]
        σ = σ_intercept + σ_condition * condition[i]
        if σ < 0  # Avoid negative SD values
            Turing.@addlogprob! -Inf
            return nothing
        end
        τ = τ_intercept + τ_condition * condition[i]
        if τ <= 0  # Avoid negative tau values
            Turing.@addlogprob! -Inf
            return nothing
        end
        rt[i] ~ ExGaussian(μ, σ, τ)
    end
end

fit_ExGaussian = model_ExGaussian(df.RT; condition=df.Accuracy)
chain_ExGaussian = sample(fit_ExGaussian, NUTS(), 400)

5.6.2 Interpretation

hpd(chain_ExGaussian; alpha=0.05)
HPD
   parameters     lower     upper 
       Symbol   Float64   Float64 
  μ_intercept    0.4002    0.4059
  μ_condition    0.0627    0.0733
  σ_intercept    0.0379    0.0424
  σ_condition    0.0101    0.0186
  τ_intercept    0.1047    0.1126
  τ_condition    0.0631    0.0788
Code
pred = predict(model_ExGaussian([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_ExGaussian)
pred = Array(pred)

fig = plot_distribution(df, "Predictions made by Shifted LogNormal Model")
for i in 1:length(chain_ExGaussian)
    lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, "#388E3C", "#D32F2F"), alpha=0.1)
end
fig

The ExGaussian model also provides an excellent fit to the data. Moreover, by modeling more parameters (including tau \(\tau\)), we can draw more nuanced conclusions. In this case, the Accuracy condition is associated with higher RTs, higher variability, and a heavier tail (i.e., more extreme values).

5.7 Shifted Wald Model

The Wald distribution, also known as the Inverse Gaussian distribution, corresponds to the distribution of the first passage time of a Wiener process with a drift rate \(\mu\) and a diffusion rate \(\sigma\). While we will unpack this definition below and emphasize its important consequences, one can first note that it has been described as a potential model for RTs when convolved with an exponential distribution (in the same way that the ExGaussian distribution is a convolution of a Gaussian and an exponential distribution). However, this Ex-Wald model (Schwarz 2001) was shown to be less appropriate than one of its variants, the Shifted Wald distribution (Heathcote 2004; Anders et al. 2016).

Note that the Wald distribution, similarly to the models that we will cover next (the “generative” models), differs from the previous distributions in that it is not characterized by “location” and “scale” parameters (mu \(\mu\) and sigma \(\sigma\)). Instead, the parameters of the Shifted Wald distribution are:

  • Nu \(\nu\) : A drift parameter, corresponding to the strength of the evidence accumulation process.
  • Alpha \(\alpha\) : A threshold parameter, corresponding to the amount of evidence required to make a decision.
  • Tau \(\tau\) : A delay parameter, corresponding to the non-decision time (i.e., the minimum time required to process the stimulus and respond). A shift parameter similar to the one in the Shifted LogNormal model.

As we can see, these parameters do not have a direct correspondence with the mean and standard deviation of the distribution. Their interpretation is more complex but, as we will see below, offers a window to a new level of interpretation.
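
To build a first intuition before the next chapter, we can sketch how the density reacts when these parameters change (hypothetical values; this assumes the Wald distribution from SequentialSamplingModels provides a pdf method, like standard distributions):

Code
# Lower drift ν or higher threshold α → slower and wider RT distributions
xaxis = range(0.15, 1.5, 500)
fig = lines(xaxis, pdf.(Wald(5.0, 1.5, 0.1), xaxis); label="ν=5.0, α=1.5, τ=0.1")
lines!(xaxis, pdf.(Wald(3.0, 1.5, 0.1), xaxis); label="ν=3.0, α=1.5, τ=0.1")
lines!(xaxis, pdf.(Wald(5.0, 2.5, 0.1), xaxis); label="ν=5.0, α=2.5, τ=0.1")
axislegend()
fig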

Note

Explanations regarding these new parameters will be provided in the next chapter.

5.7.1 Model Specification

@model function model_Wald(rt; min_rt=minimum(df.RT), condition=nothing)

    # Priors 
    ν_intercept ~ truncated(Normal(1, 3); lower=0)
    ν_condition ~ Normal(0, 1)

    α_intercept ~ truncated(Normal(0, 1); lower=0)
    α_condition ~ Normal(0, 0.5)

    τ_intercept ~ truncated(Gamma(1.1, 11); upper=min_rt)
    τ_condition ~ Normal(0, 0.01)

    for i in 1:length(rt)
        ν = ν_intercept + ν_condition * condition[i]
        if ν <= 0  # Avoid negative drift
            Turing.@addlogprob! -Inf
            return nothing
        end
        α = α_intercept + α_condition * condition[i]
        if α <= 0  # Avoid negative or zero threshold values
            Turing.@addlogprob! -Inf
            return nothing
        end
        τ = τ_intercept + τ_condition * condition[i]
        if τ < 0  # Avoid negative tau values
            Turing.@addlogprob! -Inf
            return nothing
        end
        rt[i] ~ Wald(ν, α, τ)
    end
end

fit_Wald = model_Wald(df.RT; condition=df.Accuracy)
chain_Wald = sample(fit_Wald, NUTS(), 600)
hpd(chain_Wald; alpha=0.05)
HPD
   parameters     lower     upper 
       Symbol   Float64   Float64 
  ν_intercept    5.0817    5.3248
  ν_condition   -1.3539   -1.0550
  α_intercept    1.6585    1.7402
  α_condition    0.2191    0.3506
  τ_intercept    0.1818    0.1870
  τ_condition   -0.0363   -0.0233
Code
pred = predict(model_Wald([(missing) for i in 1:length(df.RT)]; condition=df.Accuracy), chain_Wald)
pred = Array(pred)

fig = plot_distribution(df, "Predictions made by Shifted Wald Model")
for i in 1:length(chain_Wald)
    lines!(Makie.KernelDensity.kde(pred[:, i]), color=ifelse(df.Accuracy[i] == 1, "#388E3C", "#D32F2F"), alpha=0.1)
end
fig

5.7.2 Model Comparison

At this stage, given the multiple options available to model RTs, you might be wondering which model is best. One way to compare them is Leave-One-Out Cross-Validation (LOO-CV), a Bayesian method for estimating the out-of-sample predictive accuracy of a model.

Code
using ParetoSmooth

loo_Gaussian = psis_loo(fit_Gaussian, chain_Gaussian, source="mcmc")
loo_ScaledGaussian = psis_loo(fit_ScaledGaussian, chain_ScaledGaussian, source="mcmc")
loo_LogNormal = psis_loo(fit_LogNormal, chain_LogNormal, source="mcmc")
loo_ExGaussian = psis_loo(fit_ExGaussian, chain_ExGaussian, source="mcmc")
loo_Wald = psis_loo(fit_Wald, chain_Wald, source="mcmc")

loo_compare((
    Gaussian = loo_Gaussian, 
    ScaledGaussian = loo_ScaledGaussian, 
    LogNormal = loo_LogNormal, 
    ExGaussian = loo_ExGaussian, 
    Wald = loo_Wald))
┌────────────────┬──────────┬────────┬────────┐
│                │  cv_elpd │ cv_avg │ weight │
├────────────────┼──────────┼────────┼────────┤
│     ExGaussian │     0.00 │   0.00 │   1.00 │
│      LogNormal │  -323.67 │  -0.03 │   0.00 │
│           Wald │  -381.68 │  -0.04 │   0.00 │
│ ScaledGaussian │ -2466.17 │  -0.26 │   0.00 │
│       Gaussian │ -2974.91 │  -0.31 │   0.00 │
└────────────────┴──────────┴────────┴────────┘

The loo_compare() function orders models from best to worst based on their ELPD (Expected Log Pointwise Predictive Density) and reports the difference in ELPD between the best model and each of the others. As one can see, the traditional linear models perform terribly.

5.8 Shifted LogNormal Mixed Model

This section will walk through a more complex example: extending the Shifted LogNormal model with random effects to account for the fact that the data contains multiple participants.
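
As a starting point, below is a minimal sketch of one possible way to add participant-level random intercepts on mu \(\mu\) (using a non-centered parameterization). This is an illustrative sketch, not a final model, and it assumes that df.Participant contains integer indices from 1 to the number of participants.

@model function model_LogNormal_Mixed(rt; min_rt=minimum(df.RT), condition=nothing, participant=nothing)

    # Fixed effects (same priors as the single-level Shifted LogNormal model)
    τ ~ truncated(Gamma(1.1, 11); upper=min_rt)
    μ_intercept ~ Normal(0, exp(1))
    μ_condition ~ Normal(0, exp(0.3))
    σ_intercept ~ truncated(Normal(0, 0.5); lower=0)
    σ_condition ~ Normal(0, 0.1)

    # Random intercepts: one deviation per participant, shrunk towards zero
    μ_participant_sd ~ truncated(Normal(0, 0.3); lower=0)
    μ_participant ~ filldist(Normal(0, 1), length(unique(participant)))  # Non-centered

    for i in 1:length(rt)
        μ = μ_intercept + μ_participant[participant[i]] * μ_participant_sd + μ_condition * condition[i]
        σ = σ_intercept + σ_condition * condition[i]
        if σ < 0  # Avoid negative SD values
            Turing.@addlogprob! -Inf
            return nothing
        end
        rt[i] ~ ShiftedLogNormal(μ, σ, τ)
    end
end

fit_Mixed = model_LogNormal_Mixed(df.RT; condition=df.Accuracy, participant=df.Participant)
chain_Mixed = sample(fit_Mixed, NUTS(), 400)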