Ordered Beta Regressions
Data from subjective scales often exhibit clustered responses at the extremes (e.g., zeros and ones), which makes them challenging to model with regular Beta regressions. The Ordered Beta distribution allows for the presence of zeros and ones in a convenient way.
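For instance, under a Beta(2, 2) distribution an observation of exactly 1 has zero density, so a plain Beta likelihood cannot accommodate exact zeros and ones (a quick illustration using Distributions.jl):

using Distributions

pdf(Beta(2, 2), 1.0)     # 0.0: an observation of exactly 1 has zero density
logpdf(Beta(2, 2), 1.0)  # -Inf: the log-likelihood is degenerate at the boundary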
Function
SubjectiveScalesModels.OrderedBeta — Type

OrderedBeta(μ, ϕ, k0, k1)

This distribution was introduced by Kubinec (2023) as an appropriate and parsimonious way of describing data commonly observed in psychological science (such as from slider scales). It is defined with a Beta distribution on the interval ]0, 1[ and additional point masses at 0 and 1. The Beta distribution is specified using the BetaPhi2 parametrization.
Arguments
- μ: location parameter on the scale 0-1.
- ϕ: precision parameter (must be positive). Note that this parameter is based on the BetaPhi2 reparametrization of the Beta distribution, which corresponds to half the precision of the traditional Beta distribution as implemented in, for example, the ordbetareg package.
- k0: first cutpoint, beyond which the probability of zeros increases. Likely lower than 0.5 (also referred to as cutzero in ordbetareg and k1 in the paper).
- k1: second cutpoint, beyond which the probability of ones increases. Likely higher than 0.5 and must be greater than k0 (also referred to as cutone in ordbetareg and k2 in the paper).
Note that when k0 = 0 and k1 = 1, the distribution is equivalent to a Beta distribution (i.e., there are no zeros or ones).
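As a quick, illustrative check of this note (assuming the constructor accepts the boundary values k0 = 0 and k1 = 1), sampling should then produce no exact zeros or ones:

using SubjectiveScalesModels

x = rand(OrderedBeta(0.5, 1, 0.0, 1.0), 10_000)
sum(x .== 0), sum(x .== 1)  # expected to be (0, 0): no extreme values generated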
Details
The figure above depicts the parameter space for k0 and k1, highlighting in red the regions that produce a large proportion of zeros and ones. Understanding this is important for setting appropriate priors on these parameters.
Compared to the ordbetareg R package, the main differences are:
- ϕ (Julia version) = ϕ (R version) / 2
- k0 and k1 are specified on the raw scale [0, 1] and independently of each other (in the R package, they are specified on the logit scale and k1 is expressed as a difference from k0; see the Validation section below for the exact conversion).
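For reference, here is a minimal, hypothetical helper (not part of the package) sketching the conversion from ordbetareg estimates to this parametrization. It assumes, as described in the Validation section below, that cutone in R is the log of the difference from cutzero on the logit scale:

using StatsFuns: logistic

function from_ordbetareg(ϕ_R, cutzero, cutone)
    ϕ = ϕ_R / 2                           # half the precision of the R parametrization
    k0 = logistic(cutzero)                # back to the raw 0-1 scale
    k1 = logistic(cutzero + exp(cutone))  # cutone assumed to be the log-difference from cutzero
    return (ϕ=ϕ, k0=k0, k1=k1)
end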
Examples
julia> OrderedBeta(0.5, 1, 0.1, 0.9)
OrderedBeta{Float64}(μ=0.5, ϕ=1.0, k0=0.1, k1=0.9)
References
- Kubinec, R. (2023). Ordered beta regression: A parsimonious, well-fitting model for continuous data with lower and upper bounds. Political Analysis, 31(4), 519-536.
Usage
Simulate Data
Data with clustered extreme responses are common in psychology and cognitive neuroscience. Ordered Beta models are a convenient and parsimonious way of modelling such data.
The model is based on a distribution with 4 parameters: 2 of them are the parameters of the BetaPhi2 distribution (modeling the data in between the extremes), while k0 and k1 delimit fuzzy boundaries beyond which the probability of extreme values increases.
Let us start by generating data from a distribution with known parameters, and then fitting a model to recover these parameters.
using CairoMakie
using Turing
using DataFrames
using StatsFuns: logistic
using SubjectiveScalesModels
μ = 0.6
ϕ = 2.5
k0 = 0.05
k1 = 0.9
data = rand(OrderedBeta(μ, ϕ, k0, k1), 1000)
hist(data, color=:forestgreen, normalization=:pdf, bins=beta_bins(20))
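We can check what proportion of the simulated data consists of exact zeros and ones (the values depend on μ, k0, and k1, and will vary across simulations):

using Statistics

mean(data .== 0), mean(data .== 1)  # proportions of exact zeros and ones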
Prior Specification
We recommend the following priors:
μ ~ Normal(0, 1) # Logit scale
ϕ ~ Normal(0, 1) # Log scale
k0 ~ -Gamma(3, 3) # Logit scale
k1 ~ Gamma(3, 3) # Logit scale
Because these 4 parameters come with their own constraints (i.e., $\phi$ must be positive; $\mu$, k0, and k1 must be between 0 and 1), it is convenient to express them on a transformed scale, on which they become unconstrained and can take any value.
In particular, $\mu$ is typically expressed on the logit scale and $\phi$ on the log scale (see here for details).
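For intuition, one can draw from the Normal(0, 1) prior on the logit scale and inspect the implied prior on $\mu$ after the logistic transformation (an illustrative check, assuming the packages loaded above):

using Statistics

μ_logit = rand(Normal(0, 1), 10_000)
quantile(logistic.(μ_logit), [0.025, 0.5, 0.975])  # roughly 0.12, 0.50, 0.88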
Priors for the cut points require a bit more thought. They share the same constraint as $\mu$ (they must be on the 0-1 scale), which lends itself to expressing them on the logit scale and transforming them back with the logistic function. Additionally, however, we would like to ensure that k0 < k1 (i.e., that the lower boundary is below the upper boundary). This can be achieved by setting a prior for k0 with an upper bound at 0 on the logit scale (corresponding to 0.5 on the raw scale) and a prior for k1 with 0 as its lower bound. In other words, we force k0 to lie between 0 and 0.5 and k1 to lie between 0.5 and 1, which also solves the issue of k1 potentially being smaller than k0.
This can be achieved using the Gamma distribution and its mirrored ("minus") version. We can then pick a shape that places most of the prior mass on values lower than 0.05 for k0 and higher than 0.95 for k1 (which is within the typical range in psychological science), such as $Gamma(3, 3)$ and $-Gamma(3, 3)$, whose mode is 6 on the logit scale (i.e., ~0.998 on the raw scale).
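A quick numerical check of this choice (illustrative, using Distributions and StatsFuns):

using Distributions
using StatsFuns: logit, logistic

logistic(mode(Gamma(3, 3)))        # mode is 6 on the logit scale, i.e. ≈ 0.998 on the raw scale
1 - cdf(Gamma(3, 3), logit(0.95))  # ≈ 0.92: most of the prior mass for k1 lies above 0.95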
See code
using StatsFuns: logit
k0 = -Gamma(3, 3)
k1 = Gamma(3, 3)
fig = Figure(size = (850, 600))
ax1 = Axis(fig[1, 1],
xlabel="Prior on the logit scale",
ylabel="Prior on k0",
yticksvisible=false,
xticksvisible=false,
yticklabelsvisible=false)
xaxis1 = range(-30, 30, 1000)
vlines!(ax1, [0], color=:red, linestyle=:dash, linewidth=1)
vlines!(ax1, logit.([0.05]), color=:black, linestyle=:dot, linewidth=1)
lines!(ax1, xaxis1, pdf.(k0, xaxis1), color=:purple, linewidth=2, label="k0 ~ -Gamma(3, 3)")
axislegend(ax1; position=:rt)
ax2 = Axis(fig[1, 2],
xlabel="Prior after logistic transformation",
yticksvisible=false,
xticksvisible=false,
yticklabelsvisible=false)
vlines!(ax2, [0.05], color=:black, linestyle=:dot, linewidth=1)
lines!(ax2, logistic.(xaxis1), pdf.(k0, xaxis1), color=:purple, linewidth=2, label="k0")
ax3 = Axis(fig[2, 1],
xlabel="Prior on the logit scale",
ylabel="Prior on k1",
yticksvisible=false,
xticksvisible=false,
yticklabelsvisible=false)
vlines!(ax3, [0], color=:red, linestyle=:dash, linewidth=1)
vlines!(ax3, logit.([0.95]), color=:black, linestyle=:dot, linewidth=1)
lines!(ax3, xaxis1, pdf.(k1, xaxis1), color=:green, linewidth=2, label="k1 ~ Gamma(3, 3)")
axislegend(ax3; position=:rt)
ax4 = Axis(fig[2, 2],
xlabel="Prior after logistic transformation",
yticksvisible=false,
xticksvisible=false,
yticklabelsvisible=false)
vlines!(ax4, [0.95], color=:black, linestyle=:dot, linewidth=1)
lines!(ax4, logistic.(xaxis1), pdf.(k1, xaxis1), color=:green, linewidth=2, label="k1")
fig[0, :] = Label(fig, "Priors for Cut Points in Ordered Beta Regressions", fontsize=20, color=:black, font=:bold)
fig;
Bayesian Model with Turing
The parameters (and their priors) are expressed on the transformed scale, and then transformed using the logistic() or exp() functions before being used in the model.
@model function model_ordbeta(y)
    # Priors
    μ ~ Normal(0, 1)
    ϕ ~ Normal(0, 1)
    k0 ~ -Gamma(3, 3)
    k1 ~ Gamma(3, 3)

    # Inference
    for i in 1:length(y)
        y[i] ~ OrderedBeta(logistic(μ), exp(ϕ), logistic(k0), logistic(k1))
    end
end
posteriors = sample(model_ordbeta(data), NUTS(), 1000);
Chains MCMC chain (1000×16×1 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 1
Samples per chain = 1000
Wall duration = 6.33 seconds
Compute duration = 6.33 seconds
parameters = μ, ϕ, k0, k1
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯
μ 0.4132 0.0281 0.0008 1363.4594 872.8682 0.9995 ⋯
ϕ 0.9720 0.0456 0.0011 1858.5837 726.1515 0.9990 ⋯
k0 -3.0535 0.1769 0.0046 1483.6769 649.1879 0.9996 ⋯
k1 2.1484 0.0938 0.0027 1162.5996 647.5708 0.9995 ⋯
1 column omitted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
μ 0.3618 0.3930 0.4135 0.4317 0.4690
ϕ 0.8778 0.9414 0.9734 1.0031 1.0581
k0 -3.3755 -3.1758 -3.0523 -2.9308 -2.7100
k1 1.9765 2.0823 2.1494 2.2107 2.3339
means = DataFrame(mean(posteriors))
table = DataFrame(
    Parameter = means.parameters,
    PosteriorMean = means.mean,
    Estimate = [logistic(means.mean[1]), exp(means.mean[2]), logistic(means.mean[3]), logistic(means.mean[4])],
    TrueValue = [0.6, 2.5, 0.05, 0.9]
)
Row | Parameter | PosteriorMean | Estimate | TrueValue
----|-----------|---------------|-----------|----------
1 | μ | 0.413168 | 0.601847 | 0.6
2 | ϕ | 0.971992 | 2.64321 | 2.5
3 | k0 | -3.05349 | 0.0450672 | 0.05
4 | k1 | 2.14838 | 0.895518 | 0.9
Validation against R Implementation
R Output
Let's start by making some toy data (the data itself doesn't matter, as we are only interested in getting the same results) and fit an Ordered Beta model using the ordbetareg package.
library(ordbetareg)
data <- iris
data$y <- data$Petal.Width - min(data$Petal.Width)
data$y <- data$y / max(data$y)
data$x <- data$Petal.Length / max(data$Petal.Length)
# Inflate zeros and ones
data <- rbind(data, data[(data$y==0) | (data$y == 1), ])
data <- rbind(data, data[(data$y==0) | (data$y == 1), ])
ordbetareg(formula=y ~ x, data=data)
Family: ord_beta_reg
Links: mu = identity; phi = identity; cutzero = identity; cutone = identity
Formula: y ~ x
Data: data (Number of observations: 174)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -3.82 0.14 -4.09 -3.55 1.00 2701 2494
x 6.42 0.21 6.00 6.83 1.00 2999 2808
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi 22.33 2.56 17.61 27.69 1.00 4220 3026
cutzero -3.41 0.27 -3.95 -2.88 1.00 3202 3248
cutone 1.87 0.06 1.75 2.00 1.00 3323 3298
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Julia Implementation
Let us now do the same thing in Julia and Turing.
using RDatasets
using CairoMakie
using Turing
using StatsFuns: logistic
using SubjectiveScalesModels
data = dataset("datasets", "iris")
data.y = data.PetalWidth .- minimum(data.PetalWidth)
data.y = data.y ./ maximum(data.y)
data.x = data.PetalLength ./ maximum(data.PetalLength)
# Inflate zeros and ones
data = vcat(data, data[(data.y .== 0) .| (data.y .== 1), :])
data = vcat(data, data[(data.y .== 0) .| (data.y .== 1), :])
println("N-zero: ", sum(data.y .== 0) , ", N-one: ", sum(data.y .== 1))
N-zero: 20, N-one: 12
@model function model_ordbeta(y, x)
    μ_intercept ~ Normal(0, 3)
    μ_x ~ Normal(0, 3)
    ϕ ~ Normal(0, 3)
    cutzero ~ Normal(-3, 3)
    cutone ~ Normal(3, 3)

    for i in 1:length(y)
        μ = μ_intercept + μ_x * x[i]
        y[i] ~ OrderedBeta(logistic(μ), exp(ϕ), logistic(cutzero), logistic(cutone))
    end
end
fit = model_ordbeta(data.y, data.x)
posteriors = sample(fit, NUTS(), 1000)
# Mean posterior
mean(posteriors)
Mean
parameters mean
Symbol Float64
μ_intercept -3.8180
μ_x 6.4139
ϕ 2.4165
cutzero -3.4651
cutone 3.2057
Note that, due to Stan limitations, the R implementation specifies k1 (cutone) as the log of the difference from k0 (cutzero). We can convert the Julia results by computing log(cutone - cutzero).
The estimates for the μ coefficients are very similar, while that of ϕ differs, which is expected given the different parametrization. The values for the cut points k0 and k1 (after the transformation specified above) are also very similar.
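As a rough numerical check, using the posterior means reported above (these values are copied from the output and will vary slightly between runs):

cutzero_julia = -3.4651            # comparable to R's cutzero estimate of -3.41
cutone_julia = 3.2057
log(cutone_julia - cutzero_julia)  # ≈ 1.90, close to R's cutone estimate of 1.87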
# Make predictions
pred = predict(model_ordbeta([missing for _ in 1:length(data.y)], data.x), posteriors)
pred = Array(pred)
fig = hist(data.y, color=:forestgreen, normalization=:pdf, bins=beta_bins(20))
for i in 1:size(pred, 1)  # Iterate over each draw
    # density!(pred[i, :], color=(:black, 0), strokecolor=(:crimson, 0.05), strokewidth=1)
    hist!(pred[i, :], color=(:crimson, 0.01), normalization=:pdf, bins=beta_bins(20))
end
fig
The sharp spikes at zero and one make it hard for typical plotting approaches to accurately reflect the distribution. Density plots tend to be heavily distorted at the edges (due to the Gaussian kernel used), and histograms depend on the binning. One option is to specify the bin edges in a way that conveniently captures the zeros and ones (which is what the beta_bins function does).
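For illustration, the general idea behind such bin edges can be sketched as follows (an assumption about the approach, not the actual beta_bins implementation, which may differ):

# Narrow bins hugging 0 and 1 so that the point masses get their own bars,
# plus evenly spaced bins for the continuous part in between.
n = 20
edges = vcat(-1/(2n), range(1/(2n), 1 - 1/(2n); length=n), 1 + 1/(2n))
hist(data.y, bins=edges, normalization=:pdf, color=:forestgreen)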
Conclusion: the Julia implementation appears to behave as expected when compared to the original R implementation.