BetaPhi2() for Beta Regressions
Regression models with a Beta distribution can be useful to predict scores from bounded variables (i.e., variables with an upper and a lower limit), such as scores from subjective scales.
The SubjectiveScalesModels.jl package defines the BetaPhi2() function that can be used to generate or model this type of data.
Function

SubjectiveScalesModels.BetaPhi2 — Type

BetaPhi2(μ, ϕ)

Construct a Beta distribution with parameters mean μ and precision ϕ. It is defined as Beta(μ * 2ϕ, (1 - μ) * 2ϕ).

Arguments

- μ: Location parameter (range: $]0, 1[$).
- ϕ: Precision parameter (must be $> 0$).
Details
Beta Phi2 is a variant of the traditional Mu-Phi location-precision parametrization. The modification - scaling ϕ by a factor of 1/2 - creates a Beta distribution in which, when μ is at its center (i.e., 0.5), a parameter ϕ equal to 1 results in a flat prior (i.e., $Beta(1, 1)$). This makes it convenient to set priors for ϕ on the log scale in regression models, so that a prior of $Normal(0, 1)$ places the most probability mass on a flat distribution (ϕ = 1).
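Concretely, plugging μ = 0.5 and ϕ = 1 into the definition above recovers the uniform distribution:

```math
\alpha = \mu \cdot 2\phi = 0.5 \times 2 \times 1 = 1, \qquad
\beta = (1 - \mu) \cdot 2\phi = 0.5 \times 2 \times 1 = 1
```

so the result is $Beta(1, 1)$, i.e., a flat distribution over $]0, 1[$.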
The red area shows the region where the distribution assigns the highest probability to extreme values (towards 0 and/or 1). The blue area shows the region where the distribution is "convex" and peaks within the $]0, 1[$ interval.
Examples
julia> BetaPhi2(0.5, 1)
BetaPhi2{Float64}(μ=0.5, ϕ=1.0)
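To see the correspondence with the standard Beta distribution, one can compare a BetaPhi2 distribution against its Beta counterpart. This is a sketch (the parameter values 0.3 and 2.0 are arbitrary, and it assumes BetaPhi2 follows the usual Distributions.jl interface):

```julia
using Distributions
using SubjectiveScalesModels

μ, ϕ = 0.3, 2.0
d1 = BetaPhi2(μ, ϕ)
d2 = Beta(μ * 2ϕ, (1 - μ) * 2ϕ)  # Beta(1.2, 2.8)

# The mean of Beta(α, β) is α / (α + β) = μ*2ϕ / 2ϕ, so it simplifies to μ
mean(d2)  # 0.3
```

Note that in this parametrization the mean of the distribution is exactly μ, which is what makes it convenient for regression on the location.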
Usage
Simulate Data
You can use rand(dist, n) to generate n observations from a BetaPhi2() distribution with pre-specified parameters.
Let's generate some data from a BetaPhi2() distribution with known parameters, which we are going to try to recover using Bayesian modelling.
using DataFrames
using Random
using Turing
using CairoMakie
using StatsFuns: logistic
using SubjectiveScalesModels
Random.seed!(123)
y = rand(BetaPhi2(μ=0.7, ϕ=3.0), 1000)
hist(y, bins=beta_bins(30), color=:dodgerblue, normalization=:pdf)
We use the beta_bins(n_bins) function to conveniently visualize Beta distributions.
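As a quick sanity check of the simulated data (a hypothetical step, continuing from the y generated above), the empirical mean should be close to the true μ = 0.7, and all values should lie strictly within $]0, 1[$:

```julia
using Statistics

mean(y)     # should be close to μ = 0.7
extrema(y)  # both values strictly inside ]0, 1[
```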
Prior Specification
Expressing parameters on the logit scale for μ and the log scale for ϕ is recommended, with default priors of $Normal(0, 1)$. This makes the priors more interpretable and easier to specify, and avoids computational issues caused by the bounded nature of the parameters.
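A quick way to see why $Normal(0, 1)$ is a sensible default on these scales: a prior mean of 0 maps back to the most "neutral" values of both parameters.

```julia
using StatsFuns: logistic

logistic(0.0)  # 0.5 → the center of the scale for μ
exp(0.0)       # 1.0 → ϕ = 1, i.e., a flat Beta(1, 1) when μ = 0.5
```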
μ = Normal(0, 1.0)
ϕ = Normal(0, 1.0)

fig = Figure(size=(850, 600))

# Prior on μ, on the logit scale and after back-transformation
ax1 = Axis(fig[1, 1],
    xlabel="Prior on the logit scale",
    ylabel="Prior on μ",
    yticksvisible=false,
    xticksvisible=false,
    yticklabelsvisible=false)
xaxis1 = range(-10, 10, 1000)
lines!(ax1, xaxis1, pdf.(μ, xaxis1), color=:purple, linewidth=2, label="μ ~ Normal(0, 1)")
axislegend(ax1; position=:rt)

ax2 = Axis(fig[1, 2],
    xlabel="Prior after logistic transformation",
    yticksvisible=false,
    xticksvisible=false,
    yticklabelsvisible=false)
lines!(ax2, logistic.(xaxis1), pdf.(μ, xaxis1), color=:purple, linewidth=2, label="μ")

# Prior on ϕ, on the log scale and after back-transformation
ax3 = Axis(fig[2, 1],
    xlabel="Prior on the log scale",
    ylabel="Prior on ϕ",
    yticksvisible=false,
    xticksvisible=false,
    yticklabelsvisible=false)
lines!(ax3, xaxis1, pdf.(ϕ, xaxis1), color=:green, linewidth=2, label="ϕ ~ Normal(0, 1)")
axislegend(ax3; position=:rt)

ax4 = Axis(fig[2, 2],
    xlabel="Prior after exponential transformation",
    yticksvisible=false,
    xticksvisible=false,
    yticklabelsvisible=false)
vlines!(ax4, [1], color=:black, linestyle=:dash, linewidth=1)  # ϕ = 1 (flat distribution)
lines!(ax4, exp.(xaxis1), pdf.(ϕ, xaxis1), color=:green, linewidth=2, label="ϕ")
xlims!(ax4, -0.5, 15)

fig[0, :] = Label(fig, "Priors for Beta Regressions", fontsize=20, color=:black, font=:bold)
fig;
Bayesian Model with Turing
We can easily use this distribution to fit a Beta regression model using the Turing
package.
@model function model_beta(y)
    # Priors (μ on the logit scale, ϕ on the log scale)
    μ ~ Normal(0, 1)
    ϕ ~ Normal(0, 1)

    # Likelihood (parameters transformed back to their natural scales)
    for i in 1:length(y)
        y[i] ~ BetaPhi2(logistic(μ), exp(ϕ))
    end
end
fit = model_beta(y)
posteriors = sample(fit, NUTS(), 500)
# 95% CI
hpd(posteriors)
HPD
  parameters    lower    upper
      Symbol  Float64  Float64

           μ   0.7870   0.8832
           ϕ   1.0061   1.1616
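Note that these intervals are on the logit/log scale. Transforming the bounds back (a sketch using the values printed above) shows that both 95% CIs comfortably contain the true parameter values:

```julia
using StatsFuns: logistic

logistic(0.7870), logistic(0.8832)  # ≈ (0.687, 0.708), contains μ = 0.7
exp(1.0061), exp(1.1616)            # ≈ (2.74, 3.19), contains ϕ = 3.0
```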
Let us run a posterior predictive check, which involves generating predictions from the model and comparing the predicted distribution against the actually observed data.
# Make predictions
pred = predict(model_beta([missing for _ in 1:length(y)]), posteriors)
pred = Array(pred)
fig = hist(y, bins=beta_bins(30), color=:dodgerblue, normalization=:pdf)
for i in 1:size(pred, 1) # Iterate over each draw
density!(pred[i, :], color=(:black, 0), strokecolor=(:crimson, 0.05), strokewidth=1)
end
xlims!(0, 1)
fig
Recover Parameters
Use the logistic() (from the StatsFuns package) and exp() functions to transform the model parameters back to the original scale.
Let us compare the parameters estimated by the model (the mean of the posteriors) with the true values used to generate the data (μ=0.7, ϕ=3.0).
means = DataFrame(mean(posteriors))
table = DataFrame(
    Parameter = means.parameters,
    PosteriorMean = means.mean,
    Estimate = [logistic(means.mean[1]), exp(means.mean[2])],
    TrueValue = [0.7, 3.0]
)
 Row │ Parameter  PosteriorMean  Estimate  TrueValue
     │ Symbol     Float64        Float64   Float64
─────┼───────────────────────────────────────────────
   1 │ μ          0.831805       0.696737  0.7
   2 │ ϕ          1.07873        2.94093   3.0
Mission accomplished!