| Title: | Dirichlet Regression |
|---|---|
| Description: | Implements Dirichlet regression models. |
| Authors: | Marco Johannes Maier [cre, aut] (ORCID: <https://orcid.org/0000-0002-1715-7456>) |
| Maintainer: | Marco Johannes Maier <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.7-1 |
| Built: | 2026-06-06 08:44:51 UTC |
| Source: | https://github.com/maiermarco/dirichletreg |
This package provides functions to analyze compositional data using Dirichlet regression methods.
| Package: | DirichletReg |
|---|---|
| Type: | Package |
| Version: | 0.7-1 |
| Date: | 2021-04-29 |
| License: | GPL (>= 2) |
Marco J. Maier
    example(plot.DirichletRegData)
    example(DirichReg)
This function allows for pairwise tests of Dirichlet regression models using a likelihood ratio test (LRT).
    ## S3 method for class 'DirichletRegModel'
    anova(object, ..., sorted = FALSE)
| Argument | Description |
|---|---|
| object | the model to be compared against those listed in ... |
| ... | models to be tested against the one specified as object |
| sorted | should the models be sorted according to their number of parameters? |
The test statistic is computed as $LR = -2\,[\log L_a - \log L_b]$, where $L_i$ is the likelihood of model $i \in \{a, b\}$, with $df$ equal to the difference in the number of parameters of the models.
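As a minimal sketch, a likelihood ratio statistic of the form LR = -2 [log L_a - log L_b] can be computed by hand in base R from two log-likelihoods. The helper name `lr_test` and the numbers are hypothetical, and the sketch assumes that model a is nested in model b; for fitted models, anova() is the intended interface.

```r
# Likelihood ratio test from two log-likelihoods
# (illustrative helper, not part of the package).
lr_test <- function(ll_a, ll_b, npar_a, npar_b) {
  stat <- -2 * (ll_a - ll_b)   # LR = -2 [log L_a - log L_b]
  df   <- npar_b - npar_a      # difference in the number of parameters
  p    <- pchisq(stat, df = df, lower.tail = FALSE)
  c(statistic = stat, df = df, p.value = p)
}

# hypothetical values: model a (3 parameters) nested in model b (6 parameters)
res <- lr_test(ll_a = -105, ll_b = -100, npar_a = 3, npar_b = 6)
res
```

With real models, the log-likelihoods would come from logLik() applied to each fit.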
Marco J. Maier
    ALake <- ArcticLake
    ALake$AL <- DR_data(ArcticLake[, 1:3])
    mod0 <- DirichReg(AL ~ 1, ALake)
    mod1 <- DirichReg(AL ~ depth, ALake)
    mod2 <- DirichReg(AL ~ depth + I(depth^2), ALake)
    anova(mod1, mod0, mod2, sorted = TRUE)
These data are taken from Aitchison (2003) and contain information on the relation of sediment composition with depth in an Arctic lake.
    ArcticLake
A data frame with 39 observations on the following 4 variables:
sand, silt, clay
relative frequencies of sand, silt, and clay
depth
water depth in meters
Aitchison, J. (2003). The Statistical Analysis of Compositional Data. The Blackburn Press, Caldwell, NJ.
    head(ArcticLake)
    AL <- DR_data(ArcticLake[, 1:3])
    plot(AL)
    summary(AL)
These data (Aitchison, 2003) list blood samples' compositions of Albumin, Pre-Albumin, Globulin A, and Globulin B in relation to two types of diseases. 14 patients suffer from disease A, 16 from disease B and 6 are unclassified.
    BloodSamples
A data frame with 36 observations on the following 6 variables.
Albumin, Pre.Albumin, Globulin.A, Globulin.B
the amounts of Albumin, Pre-Albumin, Globulin A, and Globulin B.
Disease
diagnosis of disease A, B, or NA for unclassified observations.
New
a factor indicating whether the observations are old and classified (No) or new and unclassified (Yes).
Aitchison, J. (2003). The Statistical Analysis of Compositional Data. The Blackburn Press, Caldwell, NJ.
    head(BloodSamples)
    Bl <- DR_data(BloodSamples[, 1:4])
    summary(Bl)
Density function and random number generation for the Dirichlet distribution
    rdirichlet(n, alpha)
    ddirichlet(x, alpha, log = FALSE, sum.up = FALSE)
    ddirichlet_R(x, alpha, log = FALSE, sum.up = FALSE)
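| Argument | Description |
|---|---|
| n | number of random observations to draw |
| x | a matrix containing observations |
| alpha | the Dirichlet distribution's parameters. Can be a vector (one set of parameters for all observations) or a matrix (a different set of parameters for each observation), see “Details” |
| log | if TRUE, the logarithm of the densities is returned |
| sum.up | if TRUE, the product (or, with log = TRUE, the sum) of the densities is returned, i.e., the likelihood or log-likelihood |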
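The Dirichlet distribution is a multidimensional generalization of the Beta distribution where each dimension is governed by an $\alpha$-parameter.
Formally this is
$$f(y \mid \alpha) = \frac{1}{\mathrm{B}(\alpha)} \prod_{c=1}^{C} y_c^{\alpha_c - 1}, \qquad \mathrm{B}(\alpha) = \frac{\prod_{c=1}^{C} \Gamma(\alpha_c)}{\Gamma\left(\sum_{c=1}^{C} \alpha_c\right)}.$$
Usually, alpha is a vector, so the same parameters are used for all observations.
If alpha is a matrix, a complete set of $\alpha$-parameters must be supplied for each observation.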
log returns the logarithm of the densities (therefore the log-likelihood) and sum.up returns the product or sum and thereby the likelihood or log-likelihood.
Dirichlet (log-)densities are by default computed using C routines (ddirichlet_log_vector and ddirichlet_log_matrix); a version using only R is provided by ddirichlet_R.
Caution: Although .C() can be used to call the C routines directly, R will crash or produce wrong values if, for example, data types are not set properly.
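The density can be sketched in a few lines of base R, which may help when checking results by hand. The function name `ddirichlet_sketch` is hypothetical and the code is only an illustration of the standard Dirichlet log-density, log f = lgamma(sum(alpha)) - sum(lgamma(alpha)) + sum((alpha - 1) * log(x)); it is not the package's implementation.

```r
# Dirichlet (log-)density in base R (illustrative; the package provides ddirichlet()).
ddirichlet_sketch <- function(x, alpha, log = FALSE, sum.up = FALSE) {
  x <- as.matrix(x)
  # a vector of parameters is recycled to one row per observation
  if (!is.matrix(alpha)) {
    alpha <- matrix(alpha, nrow(x), length(alpha), byrow = TRUE)
  }
  ld <- lgamma(rowSums(alpha)) - rowSums(lgamma(alpha)) +
    rowSums((alpha - 1) * log(x))
  if (sum.up) ld <- sum(ld)   # sum of log-densities = log-likelihood
  if (log) ld else exp(ld)
}

x <- rbind(c(0.2, 0.3, 0.5), c(0.6, 0.3, 0.1))
d_flat <- ddirichlet_sketch(x, c(1, 1, 1))   # flat Dirichlet: density 2 everywhere
d_flat
ddirichlet_sketch(x, c(2, 3, 4), log = TRUE, sum.up = TRUE)
```

Working on the log scale and exponentiating at the end avoids overflow in the gamma functions for large parameters.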
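| Function | Value |
|---|---|
| rdirichlet | returns a matrix with random numbers according to the supplied alpha vector or matrix. |
| ddirichlet | returns a vector of densities (if sum.up = FALSE) or the (log-)likelihood (if sum.up = TRUE) |
| ddirichlet_R | as ddirichlet, but using only R |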
Marco J. Maier
    X1 <- rdirichlet(100, c(5, 5, 10))
    a.mat <- cbind(1:10, 5, 10:1)
    a.mat
    X2 <- rdirichlet(10, a.mat)
    # note how the probabilities in the first and last column relate to a.mat
    round(X2, 2)
    ddirichlet(X1, c(5, 5, 10))
    ddirichlet(X2, a.mat)
    ddirichlet(X2[1:3, ], c(1, 2, -1))
    ddirichlet(X2[1:3, ], c(1, 2, -1), sum.up = TRUE)
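For intuition about what the random number generator returns, Dirichlet variates can be obtained by normalizing independent gamma draws. The following is a sketch of that textbook construction; the name `rdirichlet_sketch` is hypothetical and this is not necessarily how rdirichlet() is implemented.

```r
# Draw Dirichlet variates by normalizing independent gamma draws
# (a textbook construction; not necessarily how rdirichlet() works internally).
rdirichlet_sketch <- function(n, alpha) {
  # alpha: vector (shared parameters) or n-row matrix (one set per observation)
  if (!is.matrix(alpha)) {
    alpha <- matrix(alpha, n, length(alpha), byrow = TRUE)
  }
  g <- matrix(rgamma(length(alpha), shape = alpha), nrow(alpha), ncol(alpha))
  g / rowSums(g)   # each row now sums to 1
}

set.seed(1)
X <- rdirichlet_sketch(1000, c(5, 5, 10))
colMeans(X)   # should be close to alpha / sum(alpha) = (0.25, 0.25, 0.50)
```

Each row is a composition, so the column means estimate the expected values alpha / sum(alpha).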
This function prepares a matrix with compositional variables for further processing in the DirichletReg package.
    DR_data(Y, trafo = sqrt(.Machine$double.eps), base = 1,
            norm_tol = sqrt(.Machine$double.eps))

    ## S3 method for class 'DirichletRegData'
    print(x, type = c("processed", "original"), ...)

    ## S3 method for class 'DirichletRegData'
    summary(object, ...)
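| Argument | Description |
|---|---|
| Y | A matrix or data.frame containing compositional variables, see “Details” |
| trafo | Either a logical or numeric value. Transformation of variables causes the values to shrink away from extreme values of 0 and 1, see “Details”. |
| base | The “base” component to use in the reparametrized model |
| norm_tol | Due to numerical precision, row sums of Y may deviate slightly from 1; norm_tol sets the tolerance for this check |
| x | An object created by DR_data |
| type | Displays either the (possibly normalized or transformed) processed data or the original data |
| object | An object created by DR_data |
| ... | Further arguments |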
Y
Y is a matrix or data.frame containing compositional variables.
If they do not sum up to 1 for all observations, normalization is forced where each row entry is divided by the row's sum (a warning will be issued that normalization was applied).
In case one row-entry (or more) is NA, the whole row will be returned as NA.
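The normalization rule described here can be sketched in base R as follows. The helper `normalize_rows` and its `tol` argument are hypothetical names used for illustration; DR_data() performs this step itself and additionally issues a warning.

```r
# Row-wise normalization of compositional data (illustration only).
normalize_rows <- function(Y, tol = sqrt(.Machine$double.eps)) {
  Y <- as.matrix(Y)
  Y[!complete.cases(Y), ] <- NA   # any NA in a row invalidates the whole row
  rs <- rowSums(Y)
  needs_norm <- any(abs(rs - 1) > tol, na.rm = TRUE)
  if (needs_norm) Y <- Y / rs     # divide each entry by its row sum
  list(Y = Y, normalized = needs_norm)
}

raw <- rbind(c(10, 30, 60), c(2, NA, 5), c(0.2, 0.3, 0.5))
out <- normalize_rows(raw)
out
```

The first row is rescaled to proportions, the second row is returned as NA throughout, and the third row already sums to 1 and is left unchanged.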
Beta-distributed variables can be supplied as a single vector which, however, has to have values in the interval $[0, 1]$.
The second variable will be generated (1 - Y) and a matrix consisting of the columns 1 - Y and Y will be returned.
A message will be issued that a beta-distributed variable was assumed and that this assumption needs to be checked.
trafo
The transformation (done if trafo = TRUE) is a generalization of that proposed by Smithson and Verkuilen (2006) that transforms each component $y$ of $Y$ by computing $y^{*} = \frac{y (n - 1) + 1/2}{n}$, where $n$ is the number of observations in $Y$ (this approach is also used in the package betareg, see Cribari-Neto & Zeileis, 2010).
For an arbitrary number of dimensions (or variables) $d$ the transformation is $y^{*} = \frac{y (n - 1) + 1/d}{n}$.
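A minimal base R sketch of the generalized transformation, y* = (y (n - 1) + 1/d) / n with n observations and d components, is given here. The function name `shrink_composition` is hypothetical; DR_data() applies the transformation itself when requested via trafo.

```r
# Shrink compositions away from 0 and 1: y* = (y (n - 1) + 1/d) / n
shrink_composition <- function(Y) {
  Y <- as.matrix(Y)
  n <- nrow(Y)   # number of observations
  d <- ncol(Y)   # number of components
  (Y * (n - 1) + 1 / d) / n
}

Y <- rbind(c(0, 0.4, 0.6), c(1, 0, 0), c(0.2, 0.3, 0.5), c(0.5, 0.5, 0))
Ys <- shrink_composition(Y)
range(Ys)     # strictly inside (0, 1)
rowSums(Ys)   # rows still sum to 1
```

Because the d components of a row each receive 1/d, the row sums are preserved: ((n - 1) + 1) / n = 1.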
base
To set the base (i.e., omitted) component of Y for the “alternative” (mean/precision) model, the argument base can be used. This is by default set to the first variable in Y (if a vector is supplied, the column 1 - Y becomes the base component).
Note that the definition can be overruled in DirichReg.
x and object
Objects created by DR_data.
type
specifies for the print method whether the original or processed data are displayed.
The function returns a matrix object of class DirichletRegData with the following attributes:
| Attribute | Description |
|---|---|
| attr(*, "dimnames") | a list with two entries: row names and column names |
| attr(*, "Y.original") | the original data |
| attr(*, "dims") | number of dimensions of Y |
| attr(*, "dim.names") | the number of components in Y |
| attr(*, "obs") | number of observations of Y |
| attr(*, "valid_obs") | number of valid observations |
| attr(*, "normalized") | a logical value indicating whether the data were normalized |
| attr(*, "transformed") | a logical value indicating whether the data were transformed |
| attr(*, "base") | number of the variable used as the base in the reparametrized model |
Marco J. Maier
Smithson, M. & Verkuilen, J. (2006). A Better Lemon Squeezer? Maximum-Likelihood Regression With Beta-Distributed Dependent Variables. Psychological Methods, 11(1), 54–71.
Cribari-Neto, F. & Zeileis, A. (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24.
    # create a DirichletRegData object from the Arctic Lake data
    head(ArcticLake[, 1:3])
    AL <- DR_data(ArcticLake[, 1:3])
    summary(AL)
    head(AL)
DirichletRegModel
These are available methods for the results of Dirichlet regression models and objects of class DirichletRegModel.
They comprise print and summary for displaying a model, fitted for extracting fitted values, and predict for predicting new values.
Various types of residuals are implemented and confint can be used to compute confidence intervals of the parameters.
Furthermore logLik extracts the log-likelihood of the model and vcov extracts the covariance matrix of the parameter estimates.
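To show how parameter estimates and their covariance matrix relate to confidence intervals, here is a sketch of normal-approximation (Wald-type) intervals in base R. That confint() uses exactly this construction is an assumption, and the helper `wald_ci` and its numbers are hypothetical.

```r
# Wald-type confidence intervals from estimates and their covariance matrix
# (assumption: normal approximation; hypothetical numbers).
wald_ci <- function(est, V, level = 0.95, exp = FALSE) {
  se <- sqrt(diag(V))                # standard errors from the covariance matrix
  z  <- qnorm(1 - (1 - level) / 2)   # two-sided normal quantile
  ci <- cbind(lower = est - z * se, estimate = est, upper = est + z * se)
  if (exp) base::exp(ci) else ci     # optionally report on the exponentiated scale
}

est <- c(a = 0.5, b = -1.2)
V   <- diag(c(0.04, 0.09))
ci  <- wald_ci(est, V)
ci
wald_ci(est, V, exp = TRUE)
```

With a fitted model, the inputs would correspond to the coefficient estimates and vcov(model).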
    ## S3 method for class 'DirichletRegModel'
    print(x, digits = max(3, getOption("digits") - 3), ...)

    ## S3 method for class 'DirichletRegModel'
    summary(object, ...)

    ## S3 method for class 'DirichletRegModel'
    fitted(object, mu = TRUE, alpha = FALSE, phi = FALSE, ...)

    ## S3 method for class 'DirichletRegModel'
    predict(object, newdata, mu = TRUE, alpha = FALSE, phi = FALSE, ...)

    ## S3 method for class 'DirichletRegModel'
    residuals(object, type = c("standardized", "composite", "raw"), ...)

    ## S3 method for class 'DirichletRegModel'
    confint(object, parm, level, ..., type = c("all", "beta", "gamma"), exp = FALSE)

    ## S3 method for class 'DirichletRegConfint'
    print(x, digits = 3, ...)

    ## S3 method for class 'DirichletRegModel'
    logLik(object, ...)

    ## S3 method for class 'DirichletRegModel'
    AIC(object, ..., k = 2)

    ## S3 method for class 'DirichletRegModel'
    BIC(object, ...)

    ## S3 method for class 'DirichletRegModel'
    nobs(object, ...)

    ## S3 method for class 'DirichletRegModel'
    vcov(object, ...)

    ## S3 method for class 'DirichletRegModel'
    update(object, formula., ..., evaluate = TRUE)

    ## S3 method for class 'DirichletRegModel'
    drop1(object, scope, test = c("LRT", "none"), k = 2, sort = TRUE, ...)
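| Argument | Description |
|---|---|
| x | an object of class DirichletRegModel |
| object | an object of class DirichletRegModel |
| alpha | logical; returns alpha values |
| mu | logical; returns expected values |
| phi | logical; returns precision values |
| type | for residuals: the type of residuals ("standardized", "composite", or "raw"); for confint: the type of parameters ("all", "beta", or "gamma") |
| newdata | a data.frame containing the predictor values for which predictions are computed |
| k | number for the weighting of parameters |
| parm | a vector containing names of the parameters to print |
| level | (a vector of) confidence level(s) |
| exp | logical; returns parameters in exponentiated form |
| digits | the number of digits in the output |
| formula. | the new formula to be updated |
| evaluate | if FALSE, the updated call is returned instead of being evaluated |
| scope | defines the scope of variables to be dropped, see drop1 |
| test | defines the type of test for drop1 ("LRT" or "none") |
| sort | if TRUE, the output of drop1 is sorted |
| ... | further arguments |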
Marco J. Maier
    ALake <- ArcticLake
    ALake$AL <- DR_data(ArcticLake[, 1:3])
    mod1 <- DirichReg(AL ~ depth + I(depth^2) | depth, data = ALake, model = "alternative")
    update(mod1, . ~ . | . + I(depth^2), evaluate = FALSE)
    mod1
    drop1(mod1)  ### issues a caveat when used for the first time in an R session
    summary(mod1)
    head(fitted(mod1))
    predict(mod1, newdata = data.frame("depth" = seq(10, 100, 10)))
    head(residuals(mod1))
    confint(mod1)
    confint(mod1, exp = TRUE)
    logLik(mod1)
    round(vcov(mod1), 5)
This function allows for fitting Dirichlet regression models using two different parametrizations.
    DirichReg(formula, data, model = c("common", "alternative"),
              subset, sub.comp, base, weights, control, verbosity = 0)
| Argument | Description |
|---|---|
| formula | the model formula (for different specifications see “Details”) |
| data | a data.frame containing the dependent variable and all covariates |
| model | specifies whether the “common” or the “alternative” (mean/precision) parametrization is used, see “Details” |
| subset | estimates the model for a subset of the data |
| sub.comp | analyze a subcomposition by selecting specific components (see “Details”) |
| base | redefine the base variable |
| weights | frequency weights |
| control | a list containing control parameters used for the optimization |
| verbosity | prints information about the function's progress, see “Details” |
formula determines the used predictors.
The responses must be prepared by DR_data and can be optionally stored in the object containing all covariates which is then specified as the argument data.
(Although “on-the-fly” processing of DR_data in a formula works, it is only intended for testing purposes and may be removed at any time – use at your own risk.)
There are two different parametrizations (controlled by the argument model, see below):
the “common” parametrization, which models each $\alpha$ by a (possibly individual) set of predictors, and
the “alternative” parametrization, which models expected values ($\mu$; as in multinomial logistic regression) and precision parameters ($\phi$) with two sets of predictors.
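The two parametrizations describe the same distribution: the precision is the sum of the alpha parameters and the expected values are their shares, so that alpha = mu * phi. A small base R sketch of this standard relationship follows; the helper names are hypothetical.

```r
# Convert between the two parametrizations (alpha <-> mu, phi).
alpha_to_mu_phi <- function(alpha) {
  phi <- sum(alpha)                 # precision
  list(mu = alpha / phi, phi = phi) # expected values sum to 1
}
mu_phi_to_alpha <- function(mu, phi) mu * phi

mp <- alpha_to_mu_phi(c(5, 5, 10))
mp$mu    # 0.25 0.25 0.50
mp$phi   # 20
mu_phi_to_alpha(mp$mu, mp$phi)   # back to 5 5 10
```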
As the two models offer different modeling strategies, the specification of their formulae differs:
The simplest possible model in the “common” parametrization includes only an intercept for all components.
If DV is the ‘dependent variable’ (i.e., compositional data) with three components, we can request this null-model by DV ~ 1.
We always have at least two dependent variables, so simple formulae such as the one given above will be expanded to DV ~ 1 | 1 | 1, because DV has three components.
Likewise, it is possible to specify a common set of predictors for all components, as in DV ~ p1 * p2, where p1 and p2 are predictors.
If the covariates of the components shall differ, one has to set up a complete formula for each subcomposition, using | as separators between the components, for example, DV ~ p1 | p1 + p2 | p1 * p2 will lead to a model where the first response in DV will be modeled using p1, the second will be predicted by p1 + p2 and the third by p1 * p2.
Note that if you use the latter approach, the predictors have to be stated explicitly for all response variables.
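The three ways of writing a formula for the “common” parametrization can be tried on the Arctic Lake data. This is a sketch that assumes the DirichletReg package is installed (the block is skipped otherwise); the object names are arbitrary and the choice of predictors is purely illustrative.

```r
# Formula variants for the "common" parametrization (requires DirichletReg).
if (requireNamespace("DirichletReg", quietly = TRUE)) {
  library(DirichletReg)
  ALake <- ArcticLake
  ALake$AL <- DR_data(ArcticLake[, 1:3])

  # intercept only; expanded to AL ~ 1 | 1 | 1
  m_null <- DirichReg(AL ~ 1, ALake)
  # the same predictor for all three components
  m_shared <- DirichReg(AL ~ depth, ALake)
  # one part per component, separated by |
  m_indiv <- DirichReg(AL ~ depth | depth + I(depth^2) | 1, ALake)

  # number of estimated parameters in each model
  print(sapply(list(m_null, m_shared, m_indiv), function(m) m$npar))
}
```

Comparing the parameter counts shows how each part of the formula contributes its own set of coefficients.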
The simplest possible model in the “alternative” parametrization includes an intercept for all components (except the base) and an intercept for precision. This can be achieved by DV ~ 1, which is expanded to DV ~ 1 | 1. The part modeling the ‘mean’ (first element on the right-hand side) is mandatory; if no specification for precision is included, an intercept will be added. Note that you need to set model = "alternative" to use this parametrization!
The alternative parametrization consists of two parts: modeled expected values ($\mu$) and their ‘precision’ ($\phi$).
As in multinomial logistic regression, one response variable is omitted (by default the first, but this can be changed by the base argument in DR_data or DirichReg) and for the rest a set of predictors is used with a multinomial logit-link.
For precisions, a different set of predictors can be set up using a log-link.
DV ~ p1 * p2 | p1 + p2 will set up a model where the expected values are predicted by p1 * p2 and the precision is modeled using p1 + p2.
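How the two link functions turn linear predictors into parameters can be sketched in base R: a multinomial logit with the base component's linear predictor fixed at 0 for the expected values, and a log-link for precision. The helper `mean_precision` and the numbers are hypothetical.

```r
# Map linear predictors to mu (multinomial logit, base component fixed at 0)
# and phi (log-link). Numbers are hypothetical.
mean_precision <- function(eta, zeta, base = 1) {
  # eta:  linear predictors of the non-base components
  # zeta: linear predictor of the precision
  k <- length(eta) + 1
  full <- numeric(k)
  full[-base] <- eta                 # the base component keeps linear predictor 0
  mu  <- exp(full) / sum(exp(full))  # expected values, sum to 1
  phi <- exp(zeta)                   # precision, always positive
  list(mu = mu, phi = phi, alpha = mu * phi)
}

mp <- mean_precision(eta = c(0.5, -1), zeta = log(20))
mp$mu
sum(mp$mu)   # 1
mp$alpha
```

The coefficients of the mean model are therefore interpreted relative to the omitted base component, as in multinomial logistic regression.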
The data argument accepts a data.frame that must include the dependent variable as a named element (see examples how to do this).
The base-component (i.e., omitted component) is initially set during the stage of data preparation DR_data, but can easily be changed using the argument base which takes integer values from 1 to the maximum number of components.
If a data set contains a large number of components, of which only a few are relevant, the latter can be ‘sorted out’ and the irrelevant (i.e., not selected) components will be aggregated into a single variable (row sums) that automatically becomes the base category for the model, unless specified otherwise by base. The positioning of variables will necessarily change: the aggregated variable takes the first column and the others are appended in their order of selection.
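The aggregation rule for subcompositions can be illustrated in base R. The helper `select_subcomposition` and the column name `rest` are hypothetical; in the package this is controlled by the sub.comp argument.

```r
# Aggregate unselected components into one variable placed in the first column
# (illustration of the rule; `select_subcomposition` is a hypothetical helper).
select_subcomposition <- function(Y, keep) {
  Y <- as.matrix(Y)
  rest <- rowSums(Y[, -keep, drop = FALSE])     # all components not selected
  cbind(rest = rest, Y[, keep, drop = FALSE])   # aggregate first, then selection order
}

Y <- cbind(a = c(0.1, 0.2), b = c(0.2, 0.2), c = c(0.3, 0.1), d = c(0.4, 0.5))
S <- select_subcomposition(Y, keep = c(4, 2))
S
```

Here the components d and b are kept in their order of selection, while a and c are summed into the first column, which would then serve as the base category.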
Using subset, the model can be fitted to only a part of the data; for more information about this functionality, see subset.
Note that, unlike in glm, weights are not treated as prior weights, but as frequency weights!
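The meaning of frequency weights can be checked with a small base R sketch: weighting an observation by w contributes the same to the log-likelihood as repeating that observation w times. The helper `loglik_dirichlet` is hypothetical and uses one common alpha vector for all observations.

```r
# Frequency weights: weighting a row by w is equivalent to repeating it w times.
loglik_dirichlet <- function(Y, alpha, w = rep(1, nrow(Y))) {
  ll_i <- lgamma(sum(alpha)) - sum(lgamma(alpha)) +
    as.vector(log(Y) %*% (alpha - 1))   # per-observation log-density
  sum(w * ll_i)
}

Y <- rbind(c(0.2, 0.3, 0.5), c(0.6, 0.3, 0.1))
w <- c(3, 1)
alpha <- c(2, 3, 4)
ll_weighted   <- loglik_dirichlet(Y, alpha, w)
ll_replicated <- loglik_dirichlet(Y[rep(1:2, times = w), ], alpha)
all.equal(ll_weighted, ll_replicated)   # TRUE
```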
Using the control argument, the settings passed to the optimizers can be altered.
This argument takes a named list.
To supply user-defined starting values, use control = list(sv=c(...)) and supply a vector containing initial values for all parameters.
Optimizer-specific options include the number of iterations (iterlim = 1000) and convergence criteria for the BFGS- and NR-optimization (tol1 = 1e-5 and tol2 = 1e-10).
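A control list using the option names quoted above might look as follows. The values are simply the ones given in the text; the mapping of tol1 to the BFGS stage and tol2 to the NR stage is assumed from the order in which they are mentioned.

```r
# Control settings as a named list (values as quoted in the text).
ctrl <- list(
  iterlim = 1000,   # number of iterations
  tol1    = 1e-5,   # convergence criterion (assumed: BFGS stage)
  tol2    = 1e-10   # convergence criterion (assumed: NR stage)
)
# starting values would be added as ctrl$sv <- c(...), one value per parameter
str(ctrl)
# passed on as: DirichReg(formula, data, control = ctrl)
```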
Verbosity takes integer values from 0 to 4.
0 prints no information (default).
1 prints information about 3 stages (preparation, starting values, estimation).
2 prints little information about optimization (verbosity values greater than one are passed to maxBFGS and maxNR as print.level = verbosity - 1).
3 prints more information about optimization.
4 prints all information about optimization.
The returned object contains the following elements: call, parametrization, varnames, n.vars, dims, Y, X, Z, sub.comp, base, weights, orig.resp, data, d, formula, mf_formula, npar, coefficients, coefnames, fitted.values, logLik, vcov, hessian, se, optimization.
Marco J. Maier
    ALake <- ArcticLake
    ALake$Y <- DR_data(ALake[, 1:3])

    # fit a quadratic Dirichlet regression model ("common")
    res1 <- DirichReg(Y ~ depth + I(depth^2), ALake)

    # fit a Dirichlet regression with quadratic predictor for the mean and
    # a linear predictor for precision ("alternative")
    res2 <- DirichReg(Y ~ depth + I(depth^2) | depth, ALake, model = "alternative")

    # test both models
    anova(res1, res2)

    res1
    summary(res2)
Data from Aitchison (2003)
    GlacialTills
A data frame with 92 observations on the following 5 variables.
Red.Sandstone
a numeric vector
Gray.Sandstone
a numeric vector
Crystalline
a numeric vector
Miscellaneous
a numeric vector
Pcount
a numeric vector
Aitchison, J. (2003). The Statistical Analysis of Compositional Data. The Blackburn Press, Caldwell, NJ.
With this function you can plot Dirichlet-distributed data in 2, 3 and 4 dimensions.
    ## S3 method for class 'DirichletRegData'
    plot(x, dims, ticks = TRUE, ref.lines = NULL, dim.labels,
         a2d = list(colored = TRUE, c.grid = TRUE,
                    col.scheme = c("dims", "entropy"),
                    entropy.contours = FALSE, entropy.colors = FALSE),
         a3d = list(rgl = TRUE, ...),
         rug = TRUE, reset_par = TRUE, ...)
| Argument | Description |
|---|---|
| x | data prepared with DR_data |
| dims | select two, three, or four dimensions of your data |
| ticks | display ticks? |
| ref.lines | |
| dim.labels | a character vector giving labels for the dimensions/variables |
| a2d | a named list of settings for ternary plots (3 variables), see Details |
| a3d | a named list of settings for quaternary plots (4 variables), see Details |
| rug | display a rug for a one-dimensional plot (2 variables) |
| reset_par | reset graphical parameters of par after plotting |
| ... | further graphical arguments |
Marco J. Maier
    # plot of reading accuracy in the ReadingSkills data set
    plot(DR_data(ReadingSkills[, 1]), main = "Reading Accuracy")

    # ternary plot of Arctic Lake data
    plot(DR_data(ArcticLake[, 1:3]), a2d = list(colored = FALSE))
These data provide transformed reading accuracy scores predicted by IQ and diagnosed dyslexia.
    ReadingSkills
A data frame containing 44 observations on 3 variables.
accuracy
reading accuracy score transformed to fit into $(0, 1)$
dyslexia
a factor with the diagnosis of dyslexia (“yes” or “no”)
iq
non-verbal IQ ($z$-scores; $\mu = 0$, $\sigma = 1$)
Example 3 from http://www.michaelsmithson.online/stats/betareg/betareg.html
A compilation of four datasets listed in Aitchison (2003)
Each type of rock has 25 observations – to use only a certain type of rock, see “Details”.
    Rocks
A data frame with 100 observations on the following 8 variables.
Albite, Blandite, Cornite, Daubite, Endite
numeric vectors
depth
a numeric vector
porosity
a numeric vector
type
a factor with levels Boxite Coxite Hongite Kongite
Aitchison, J. (2003). The Statistical Analysis of Compositional Data. The Blackburn Press, Caldwell, NJ.
These functions transform a matrix with three or four components to fit into a two- or three-dimensional simplex (triangle or tetrahedron).
    toSimplex(x)
    toTernary(abc)
    toTernaryVectors(c1, c2, c3)
    toQuaternary(abcd)
    toQuaternaryVectors(c1, c2, c3, c4)
| Argument | Description |
|---|---|
| x | a matrix-like object with 3 or 4 columns. |
| abc | a matrix-like object with 3 columns. |
| abcd | a matrix-like object with 4 columns. |
| c1 | a numeric vector with values of the first component. |
| c2 | a numeric vector with values of the second component. |
| c3 | a numeric vector with values of the third component. |
| c4 | a numeric vector with values of the fourth component. |
Most of these functions are only used internally, but sometimes it might be useful to plot “custom” ternary or quaternary graphics.
Note that, apart from toSimplex(), these functions do not perform any checks, so it is advisable to use toSimplex() when adding elements to plots or creating custom graphics.
The function returns a matrix object with coordinates in two or three dimensions.
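To illustrate what such a transformation does, the following base R sketch maps three-part compositions to Cartesian coordinates in a triangle. The corner layout used here is an assumption made for the example and the helper `ternary_xy` is hypothetical; the orientation used by toSimplex() and toTernary() may differ.

```r
# Barycentric -> Cartesian coordinates for a ternary plot.
# Assumption: equilateral triangle with corners (0, 0), (1, 0), (0.5, sqrt(3)/2);
# the package's own orientation may differ.
ternary_xy <- function(abc) {
  abc <- as.matrix(abc)
  abc <- abc / rowSums(abc)   # make sure rows are compositions
  corners <- rbind(c(0, 0), c(1, 0), c(0.5, sqrt(3) / 2))
  xy <- abc %*% corners       # weighted average of the corner points
  colnames(xy) <- c("x", "y")
  xy
}

ternary_xy(diag(3))                 # pure components land on the corners
ternary_xy(rbind(c(1, 1, 1) / 3))   # the centroid of the triangle
```

The resulting coordinates could be passed to points() or lines() to add custom elements to a plot drawn in the same coordinate system.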
In prior versions (up to 0.5-0), an unexported function coord.trafo() was used internally and could also be accessed via DirichletReg:::coord.trafo().
If you have used this in your code, you will get a message that the function is now deprecated and will become defunct in the future.
Use toSimplex() instead.
Marco J. Maier
    # create a DirichletRegData object from the Arctic Lake data
    "to be added"