BetaPhi2() for Beta Regressions
Regression models with a Beta distribution can be useful to predict scores from bounded variables (i.e., variables with an upper and a lower limit), such as scores from subjective scales.
The SubjectiveScalesModels.jl package defines the BetaPhi2() function that can be used to generate or model this type of data.
Function

SubjectiveScalesModels.BetaPhi2 — Type

BetaPhi2(μ, ϕ)

Construct a Beta distribution with parameters mean μ and precision ϕ. It is defined as Beta(μ * 2ϕ, (1 - μ) * 2ϕ).

Arguments

- μ: Location parameter (range: $]0, 1[$).
- ϕ: Precision parameter (must be $> 0$).
Details
Beta Phi2 is a variant of the traditional Mu-Phi location-precision parametrization. The modification - scaling ϕ by a factor of 1/2 - creates a Beta distribution in which, when μ is at its center (i.e., 0.5), a parameter ϕ equal to 1 results in a flat prior (i.e., $Beta(1, 1)$). This makes it convenient to set priors for ϕ on the log scale in regression models, so that a prior of $Normal(0, 1)$ places the most probability mass on a flat distribution (ϕ = 1).
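Concretely, plugging μ = 0.5 and ϕ = 1 into the definition above recovers the uniform distribution:

```math
\alpha = \mu \cdot 2\phi = 0.5 \times 2 \times 1 = 1, \qquad
\beta = (1 - \mu) \cdot 2\phi = 0.5 \times 2 \times 1 = 1
```

so the result is $Beta(1, 1)$, i.e., a flat distribution over $]0, 1[$.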
The red area shows the region where the distribution assigns the highest probability to extreme values (towards 0 and/or 1). The blue area shows the region where the distribution is "convex" and peaks within the $]0, 1[$ interval.
Examples
julia> BetaPhi2(0.5, 1)
BetaPhi2{Float64}(μ=0.5, ϕ=1.0)
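To see the correspondence with the standard Beta distribution, one can compare a BetaPhi2 distribution against its Beta counterpart. This is a sketch (the parameter values 0.3 and 2.0 are arbitrary, and it assumes BetaPhi2 follows the usual Distributions.jl interface):

```julia
using Distributions
using SubjectiveScalesModels

μ, ϕ = 0.3, 2.0
d1 = BetaPhi2(μ, ϕ)
d2 = Beta(μ * 2ϕ, (1 - μ) * 2ϕ)  # Beta(1.2, 2.8)

# The mean of Beta(α, β) is α / (α + β) = μ*2ϕ / 2ϕ, so it simplifies to μ
mean(d2)  # 0.3
```

Note that in this parametrization the mean of the distribution is exactly μ, which is what makes it convenient for regression on the location.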
Usage
Simulate Data
You can use rand(dist, n) to generate n observations from a BetaPhi2() distribution with pre-specified parameters.
Let's generate some data from a BetaPhi2() distribution with known parameters, which we are going to try to recover using Bayesian modelling.
using DataFrames
using Random
using Turing
using CairoMakie
using StatsFuns: logistic
using SubjectiveScalesModels
Random.seed!(123)
y = rand(BetaPhi2(μ=0.7, ϕ=3.0), 1000)
hist(y, bins=beta_bins(30), color=:dodgerblue, normalization=:pdf)
We use the beta_bins(n_bins) function to conveniently visualize Beta distributions.
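As a quick sanity check of the simulated data (a hypothetical step, continuing from the y generated above), the empirical mean should be close to the true μ = 0.7, and all values should lie strictly within $]0, 1[$:

```julia
using Statistics

mean(y)     # should be close to μ = 0.7
extrema(y)  # both values strictly inside ]0, 1[
```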
Prior Specification
Expressing parameters on the logit scale for μ and the log scale for ϕ is recommended, with default priors of $Normal(0, 1)$. This makes the priors more interpretable and easier to specify, and avoids computational issues caused by the bounded nature of the parameters.
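A quick way to see why $Normal(0, 1)$ is a sensible default on these scales: a prior mean of 0 maps back to the most "neutral" values of both parameters.

```julia
using StatsFuns: logistic

logistic(0.0)  # 0.5 → the center of the scale for μ
exp(0.0)       # 1.0 → ϕ = 1, i.e., a flat Beta(1, 1) when μ = 0.5
```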
μ = Normal(0, 1.0)
ϕ = Normal(0, 1.0)

fig = Figure(size=(850, 600))

# Prior on μ, on the logit scale and after back-transformation
ax1 = Axis(fig[1, 1],
    xlabel="Prior on the logit scale",
    ylabel="Prior on μ",
    yticksvisible=false,
    xticksvisible=false,
    yticklabelsvisible=false)
xaxis1 = range(-10, 10, 1000)
lines!(ax1, xaxis1, pdf.(μ, xaxis1), color=:purple, linewidth=2, label="μ ~ Normal(0, 1)")
axislegend(ax1; position=:rt)

ax2 = Axis(fig[1, 2],
    xlabel="Prior after logistic transformation",
    yticksvisible=false,
    xticksvisible=false,
    yticklabelsvisible=false)
lines!(ax2, logistic.(xaxis1), pdf.(μ, xaxis1), color=:purple, linewidth=2, label="μ")

# Prior on ϕ, on the log scale and after back-transformation
ax3 = Axis(fig[2, 1],
    xlabel="Prior on the log scale",
    ylabel="Prior on ϕ",
    yticksvisible=false,
    xticksvisible=false,
    yticklabelsvisible=false)
lines!(ax3, xaxis1, pdf.(ϕ, xaxis1), color=:green, linewidth=2, label="ϕ ~ Normal(0, 1)")
axislegend(ax3; position=:rt)

ax4 = Axis(fig[2, 2],
    xlabel="Prior after exponential transformation",
    yticksvisible=false,
    xticksvisible=false,
    yticklabelsvisible=false)
vlines!(ax4, [1], color=:black, linestyle=:dash, linewidth=1)  # ϕ = 1 (flat distribution)
lines!(ax4, exp.(xaxis1), pdf.(ϕ, xaxis1), color=:green, linewidth=2, label="ϕ")
xlims!(ax4, -0.5, 15)

fig[0, :] = Label(fig, "Priors for Beta Regressions", fontsize=20, color=:black, font=:bold)
fig;
Bayesian Model with Turing
We can easily use this distribution to fit a Beta regression model using the Turing
package.
@model function model_beta(y)
    # Priors (μ on the logit scale, ϕ on the log scale)
    μ ~ Normal(0, 1)
    ϕ ~ Normal(0, 1)

    # Likelihood (parameters transformed back to their natural scales)
    for i in 1:length(y)
        y[i] ~ BetaPhi2(logistic(μ), exp(ϕ))
    end
end
fit = model_beta(y)
posteriors = sample(fit, NUTS(), 500)
# 95% CI
hpd(posteriors)
HPD
  parameters    lower    upper
      Symbol  Float64  Float64

           μ   0.7870   0.8832
           ϕ   1.0061   1.1616
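Note that these intervals are on the logit/log scale. Transforming the bounds back (a sketch using the values printed above) shows that both 95% CIs comfortably contain the true parameter values:

```julia
using StatsFuns: logistic

logistic(0.7870), logistic(0.8832)  # ≈ (0.687, 0.708), contains μ = 0.7
exp(1.0061), exp(1.1616)            # ≈ (2.74, 3.19), contains ϕ = 3.0
```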
Let us run a posterior predictive check, which involves generating predictions from the model and comparing the predicted distribution against the actually observed data.
# Make predictions
pred = predict(model_beta([missing for _ in 1:length(y)]), posteriors)
pred = Array(pred)
fig = hist(y, bins=beta_bins(30), color=:dodgerblue, normalization=:pdf)
for i in 1:size(pred, 1) # Iterate over each draw
density!(pred[i, :], color=(:black, 0), strokecolor=(:crimson, 0.05), strokewidth=1)
end
xlims!(0, 1)
fig
Recover Parameters
Use the logistic() (from the StatsFuns package) and exp() functions to transform the model parameters back to the original scale.
Let us compare the parameters estimated by the model (the mean of the posteriors) with the true values used to generate the data (μ=0.7, ϕ=3.0).
means = DataFrame(mean(posteriors))
table = DataFrame(
    Parameter = means.parameters,
    PosteriorMean = means.mean,
    Estimate = [logistic(means.mean[1]), exp(means.mean[2])],
    TrueValue = [0.7, 3.0]
)
 Row │ Parameter  PosteriorMean  Estimate  TrueValue
     │ Symbol     Float64        Float64   Float64
─────┼───────────────────────────────────────────────
   1 │ μ          0.831805       0.696737  0.7
   2 │ ϕ          1.07873        2.94093   3.0
Mission accomplished!