Title: Companion Package to the Book 'R: Einführung durch angewandte Statistik'
Description: Provides functions used in 'R: Einführung durch angewandte Statistik' (second edition).
Authors: Marco Johannes Maier [cre, aut]
Maintainer: Marco Johannes Maier <[email protected]>
License: GPL-2
Version: 0.9.4
Built: 2025-02-19 04:44:51 UTC
Source: https://github.com/cran/REdaS
The REdaS package provides functions used in the second edition of “R: Einführung durch angewandte Statistik”.
Package: REdaS
Type: Package
Version: 0.9.4
Date: 2022-06-11
License: GPL-2
Author and Maintainer: Marco J. Maier <[email protected]>
Hatzinger, R., Hornik, K., Nagel, H., & Maier, M. J. (2014). R: Einführung durch angewandte Statistik. München: Pearson Studium.
Implements Bartlett's test of sphericity, which tests whether a correlation matrix is significantly different from an identity matrix.
bart_spher(x, use = c("everything", "all.obs", "complete.obs",
                      "na.or.complete", "pairwise.complete.obs"))

## S3 method for class 'bart_spher'
print(x, ...)
x: a data matrix or the object to be printed.
use: defines the method to use if missing values are present (see the Details and Examples sections and cor).
...: further arguments for the print method.
The test statistic, as defined in Eq. (3) in Bartlett (1951), is

X^2 = -[ (n - 1) - (2k + 5)/6 ] * ln( det(R) ),

where n is the number of observations, k the number of variables, and R the correlation matrix of the data supplied in x; det(R) is the determinant of R. Bartlett's X^2 is asymptotically chi-square distributed with df = k(k - 1)/2 under the null hypothesis. Note that, because the bias-corrected correlation matrix is used, (n - 1) is employed instead of n as in the paper.
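As a cross-check of the formula above, the statistic can be computed by hand when the data contain no missing values; the following sketch is illustrative only and should agree with bart_spher() in that case.

set.seed(5L)
dat <- data.frame("A" = rnorm(100), "B" = rnorm(100), "C" = rnorm(100))
n  <- nrow(dat)                                   # number of observations
k  <- ncol(dat)                                   # number of variables
R  <- cor(dat)                                    # (bias-corrected) correlation matrix
X2 <- -((n - 1) - (2 * k + 5) / 6) * log(det(R))  # Bartlett's test statistic
df <- k * (k - 1) / 2                             # degrees of freedom
p  <- pchisq(X2, df = df, lower.tail = FALSE)     # asymptotic p-value
c(X2 = X2, df = df, p.value = p)
# compare: bart_spher(dat)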
If no missing values are present in the data matrix x, use will work with any setting and no adjustments are necessary. In this case, n is the number of rows in x.

For listwise deletion (use = "complete.obs" or "na.or.complete"), n is the number of remaining (complete) rows in x.

When use = "pairwise.complete.obs", n is approximated as the sum of relative non-missing responses over all observations with 2 or more valid responses.
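One possible reading of that approximation is sketched below; this is an interpretation of the description above for illustration, not necessarily the exact internal computation of bart_spher().

# data with a single missing value, as in the Examples section
set.seed(5L)
datamatrix <- data.frame("A" = rnorm(100), "B" = rnorm(100), "C" = rnorm(100))
datamatrix[1, 1] <- NA

valid <- !is.na(datamatrix)                        # TRUE where a response is present
keep  <- rowSums(valid) >= 2                       # observations with 2 or more valid responses
n_pw  <- sum(rowSums(valid[keep, , drop = FALSE]) / ncol(datamatrix))
n_pw                                               # about 99.67 for this example (99 + 2/3)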
If listwise/pairwise methods are used to compute the correlation matrix and the test statistic, a warning will be issued when printing the object.
A list object of class 'bart_spher'
call: the issued function call
x: the original data
cormat: the correlation matrix computed from the data
use: the treatment of missing values (the use argument as supplied)
n: the number of used observations
k: the number of variables/items
X2: the computed test statistic
df: the degrees of freedom
p.value: the p-value
warn: logical value indicating whether a warning regarding missing values will be issued (see Details)
Marco J. Maier
Bartlett, M. S. (1951). The Effect of Standardization on a chi-square Approximation in Factor Analysis. Biometrika, 38(3/4), 337–344.
# generate a data frame with 3 variables and 100 observations
set.seed(5L)
datamatrix <- data.frame("A" = rnorm(100), "B" = rnorm(100), "C" = rnorm(100))
head(datamatrix)

# correlation matrix
cor(datamatrix)

# Bartlett's test
bart_spher(datamatrix)

# effects of missing observations on correlations: to illustrate this, the first
# observation on variable A is set to NA
datamatrix[1, 1] <- NA
head(datamatrix)

# "everything" (the default) causes all correlations involving a variable with
# missing values to be NA (in this case, all pairwise correlations with the
# variable "A")
cor(datamatrix)

# "all.obs" generates an error if missing values are present.
## Not run:
cor(datamatrix, use = "all.obs")
## End(Not run)

# "complete.obs" and "na.or.complete" delete complete observations if there are
# NA (in this case, the first case would be deleted). If there are no complete
# cases left after the listwise deletion, "complete.obs" results in an error
# while "na.or.complete" returns a matrix with all elements being NA.
cor(datamatrix, use = "complete.obs")
cor(datamatrix, use = "na.or.complete")

# "pairwise.complete.obs" uses all non-missing pairwise values. If there are no
# non-missing value pairs in two variables, the results will be NA.
# It is possible that correlation matrices are not positive semi-definite.
cor(datamatrix, use = "pairwise.complete.obs")

# with the missing value in the first cell, the test does not work anymore:
## Not run:
bart_spher(datamatrix)
## End(Not run)

# deleting the whole first observation (listwise) gives
bart_spher(datamatrix, use = "na.or.complete")

# using pairwise correlations, the result is
bart_spher(datamatrix, use = "pairwise.complete.obs")
This function computes (one or more) confidence intervals (CIs) for a vector of observations or a table object and returns an object of class 'freqCI'; print and barplot methods are available, the latter to draw a bar plot of the results.
freqCI(x, level = 0.95)

## S3 method for class 'freqCI'
print(x, percent = TRUE, digits, ...)

## S3 method for class 'freqCI'
barplot(height, percent = TRUE, ...)
x: must either be a numeric or factor object of individual observations (character vectors are also accepted, but a warning is issued) or an object of class 'table'.
level: a numeric vector of confidence levels in (0, 1); defaults to 0.95.
percent: if TRUE (the default), results are printed/plotted as percentages; otherwise as relative frequencies.
digits: the number of digits to print (defaults to 2 if values are represented as percentages and to 4 if relative frequencies are used).
height: to plot the proportions and confidence intervals, an object of class 'freqCI'.
...: further arguments.
Hatzinger, R., Hornik, K., Nagel, H., & Maier, M. J. (2014). R: Einführung durch angewandte Statistik. München: Pearson Studium.
freqCI() returns an object of class 'freqCI' as a list:
call: the function call issued
x: the original object
level: the confidence levels
freq: a numeric vector of frequencies
n: the number of observations
rel_freq: relative frequencies
cat_names: category names
CIs_low: lower confidence interval boundary/boundaries
CIs_high: upper confidence interval boundary/boundaries
print.freqCI() invisibly returns a matrix with the confidence intervals and estimates.
barplot.freqCI() invisibly returns a vector with the x-coordinates of the plotted bars.
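The help page does not spell out which interval formula freqCI() uses, so the sketch below is only a generic illustration of per-category confidence intervals based on a simple Wald (normal-approximation) interval; freqCI() itself may use a different or corrected method.

# frequencies from the example data
mydata <- rep(letters[1:3], c(100, 200, 300))
freq <- table(mydata)
n    <- sum(freq)
p    <- as.vector(freq) / n                 # relative frequencies
z    <- qnorm(0.975)                        # quantile for a 95% interval
half <- z * sqrt(p * (1 - p) / n)           # Wald half-width per category
ci   <- cbind(lower = p - half, estimate = p, upper = p + half)
rownames(ci) <- names(freq)
ci
# compare: freqCI(mydata)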
Marco J. Maier
# generate some simple data using rep() and inspect them using table()
mydata <- rep(letters[1:3], c(100, 200, 300))
table(mydata)
100 * prop.table(table(mydata))

# compute 95% and 99% confidence intervals and print them with standard settings
res <- freqCI(mydata, level = c(.95, .99))
res

# print the result as relative frequencies rounded to 3 digits, save the result
# and print the invisibly returned matrix
resmat <- print(res, percent = FALSE, digits = 3)
resmat

# plot the results and save the x-coordinates
x_coo <- barplot(res)
x_coo

# use the x-coordinates to plot the frequencies per category
text(x_coo, 0, labels = paste0("n = ", res$freq), pos = 3)
Converts radians to degrees and vice versa.
deg2rad(d)
rad2deg(r)
d: degrees
r: radians
Since 360° = 2·pi rad, degrees (d) can be converted to radians (r) via r = d · pi/180, and radians to degrees via d = r · 180/pi.
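The two conversions amount to a single multiplication. The sketch below re-implements them under hypothetical names (deg2rad_manual()/rad2deg_manual()) purely to spell out the arithmetic; in practice the package functions should be used.

deg2rad_manual <- function(d) d * pi / 180   # degrees to radians
rad2deg_manual <- function(r) r * 180 / pi   # radians to degrees

deg2rad_manual(180)      # pi
rad2deg_manual(2 * pi)   # 360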
Marco J. Maier
see Trigonometric Functions, Hyperbolic Functions, Constants in R
# pi is available as a constant
pi

# 180° are pi radians
deg2rad(180)

# 2 * pi radians are 360°
rad2deg(2 * pi)
This function draws a (grouped) boxplot-like plot combined with kernel density estimates.
densbox(formula, data, rug = FALSE, from, to, gsep = .5, kernel, bw,
        main, ylab, var_names, box_out = TRUE, horizontal = FALSE, ...)
formula: a formula object defining the dependent variable and (optional) grouping variables (see Details).
data: a data frame containing the variables specified in formula.
rug: a logical value; if TRUE, a rug is added to the individual density-boxes.
from: an optional lower boundary for the kernel density estimation (see density).
to: an optional upper boundary for the kernel density estimation (see density).
gsep: a numeric value controlling the additional space between groups (default 0.5; see Examples).
kernel: a string specifying the type of kernel (see density).
bw: the bandwidth for the kernel density estimation (see density).
main: a character object for the title.
ylab: a character object for the y-axis label.
var_names: a character object to print the grouping variables' names in the lower left margin; grouping variables are treated in the order they are given in the formula.
box_out: if TRUE (the default), outliers are drawn in the boxplots (see Examples).
horizontal: not implemented yet.
...: further arguments, see Details.
This function plots a combination of boxplots and kernel density plots to get a more informative graphic of a metric dependent variable with respect to grouped data. The central element is the formula
argument that defines the dependent variable (dv) and grouping variables (independent variables, iv). For a meaningful plot, the ivs should be categorical variables (they are treated as factors).
In the simplest case, there is no grouping, so formula
is DV ~ 1
.
As grouping variables are added, the plot will be split up accordingly.
Note that the ordering of ivs in the formula defines how the plot is split up – the first variable is the most general grouping, the second will form subgroups in the first variable's groups and so on ...
If there are cases where a level of a factor is completely missing ab initio, the level will be dropped.
Subgroups with fewer than 5 observations will be dropped and “” will be plotted instead.
Marco J. Maier
density, boxplot, and the grid package
# plot a density-box-plot of one (log-normal) variable
set.seed(5L)
data1 <- rlnorm(100, 1, .5)
densbox(data1 ~ 1, from = 0, rug = TRUE)

# plot a continuous variable with 2 grouping variables
data2 <- data.frame(y  = rnorm(400, rep(c(0, 1, -1, 0), each = 100), 1),
                    x1 = rep(c("A", "B"), each = 200),
                    x2 = rep(c("X", "Y", "X", "Y"), each = 100))
with(data2, tapply(y, list(x1, x2), mean))

# a density-box-plot of the data with a multi-line title and custom labels
# for the grouping variables
densbox(y ~ x2 + x1, data2, main = "Plot with some\nSpecials",
        var_names = c("Second\nVariable", "First Variable"))

# the same plot with a rug and ignoring outliers in the boxplots
densbox(y ~ x2 + x1, data2, rug = TRUE, box_out = FALSE)

# density-box-plot with the same data, but no additional space between groups
# (gsep = 0); the kernel density estimates use a rectangular kernel with a
# bandwidth of 0.25, which results in a "jagged" appearance
densbox(y ~ x2 + x1, data2, gsep = 0, kernel = "rectangular", bw = 0.25)
Computes the Measures of Sampling Adequacy (MSA) for individual variables and the Kaiser-Meyer-Olkin (KMO) criterion for the data as a whole (see Details).
KMOS(x, use = c("everything", "all.obs", "complete.obs",
                "na.or.complete", "pairwise.complete.obs"))

## S3 method for class 'MSA_KMO'
print(x, stats = c("both", "MSA", "KMO"), vars = "all", sort = FALSE,
      show = "all", digits = getOption("digits"), ...)
x: the data.
use: defines the method to use if missing values are present (for a detailed explanation see bart_spher and cor).
stats: determines whether both statistics ("both", the default), only the MSAs ("MSA"), or only the KMO criterion ("KMO") are printed.
vars: can be "all" (the default) to show all variables, or a vector of variable names or indices to show a subset.
sort: if TRUE, sorts the MSAs in increasing order.
show: shows the specified number of variables (from 1 to the number of potentially sorted variables) or "all" (the default).
digits: the number of decimal places to print.
...: further arguments.
The Measure of Sampling Adequacy (MSA) for individual items and the Kaiser-Meyer-Olkin (KMO) criterion rely on the anti-image correlation matrix (for details see Kaiser & Rice, 1974), which contains all bivariate partial correlations given all other items in the data and is

Q = -S R^(-1) S  with  S = ( diag(R^(-1)) )^(-1/2),

where R is the correlation matrix based on the data x.

The KMO and the MSAs for individual items are (adapted from Equations (3) and (4) in Kaiser & Rice, 1974; note that the notation here differs slightly from the article):

KMO   = sum_{i != j} r_ij^2 / ( sum_{i != j} r_ij^2 + sum_{i != j} q_ij^2 )
MSA_j = sum_{i != j} r_ij^2 / ( sum_{i != j} r_ij^2 + sum_{i != j} q_ij^2 ),  with the sums taken over i only,

where r_ij and q_ij are the off-diagonal elements of R and Q, respectively.
Historically, as suggested in Kaiser (1974) and Kaiser & Rice (1974), a rule of thumb for those values is:
KMO/MSA >= 0.90: marvelous
0.80 <= KMO/MSA < 0.90: meritorious
0.70 <= KMO/MSA < 0.80: middling
0.60 <= KMO/MSA < 0.70: mediocre
0.50 <= KMO/MSA < 0.60: miserable
KMO/MSA < 0.50: unacceptable
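For complete data, the quantities defined by the formulas above can be reproduced directly from the correlation matrix. The sketch below is an illustration of those formulas (the variable names Q, r2, and q2 are local helpers, not part of the package) and should agree with KMOS() in the no-missing-data case.

set.seed(5L)
daten <- data.frame("A" = rnorm(100), "B" = rnorm(100), "C" = rnorm(100))
R  <- cor(daten)                     # correlation matrix
Ri <- solve(R)                       # its inverse
S  <- diag(1 / sqrt(diag(Ri)))       # scaling matrix (diag R^-1)^(-1/2)
Q  <- -S %*% Ri %*% S                # anti-image (partial) correlations

r2 <- R^2; diag(r2) <- 0             # squared correlations, off-diagonal only
q2 <- Q^2; diag(q2) <- 0             # squared partial correlations, off-diagonal only

KMO <- sum(r2) / (sum(r2) + sum(q2))              # overall criterion
MSA <- colSums(r2) / (colSums(r2) + colSums(q2))  # per-variable measures
KMO; MSA
# compare: KMOS(daten)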
A list of class 'MSA_KMO'
call: the issued function call
cormat: the correlation matrix
pcormat: the normalized negative inverse of the correlation matrix (pairwise partial correlations given all other variables)
n: the number of observations
k: the number of variables/items
MSA: the measures of sampling adequacy for the individual variables
KMO: the Kaiser-Meyer-Olkin criterion
Marco J. Maier
Kaiser, H. F. (1970). A Second Generation Little Jiffy. Psychometrika, 35(4), 401–415.
Kaiser, H. F. (1974). An Index of Factorial Simplicity. Psychometrika, 39(1), 31–36.
Kaiser, H. F., & Rice, J. (1974). Little Jiffy, Mark IV. Educational and Psychological Measurement, 34, 111–117.
set.seed(5L)
daten <- data.frame("A" = rnorm(100), "B" = rnorm(100), "C" = rnorm(100),
                    "D" = rnorm(100), "E" = rnorm(100))
cor(daten)
KMOS(daten, use = "pairwise.complete.obs")
This function computes the (log-)odds ratio (OR) for a 2 x 2 table (x must be an object of class 'table', created either by using table or as.table). For a data frame of variables with 2 categories each, all pairwise (log-)odds ratios are computed.
odds_ratios(x)

## S3 method for class 'REdaS_ORs'
print(x, ...)

## S3 method for class 'REdaS_ORs'
summary(object, ...)
x: either a 2 x 2 'table' object or a data frame of variables with 2 categories each (see Description); for the print method, an object of class 'REdaS_ORs'.
object: an object of class 'REdaS_ORs' (for the summary method).
...: further arguments.
Note that tables where one or more cells are 0 are not processed and a warning is issued in such cases.
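As background for the table case, the sample odds ratio and the usual Wald standard error of its logarithm can be computed by hand as in the sketch below; this is a generic textbook illustration and does not claim to reproduce the exact set of statistics reported by odds_ratios().

# the 2 x 2 table from the Examples section
tab <- as.table(matrix(c(49, 1, 5, 45), 2))
a <- tab[1, 1]; b <- tab[1, 2]
c_ <- tab[2, 1]; d <- tab[2, 2]

OR     <- (a * d) / (b * c_)                          # sample odds ratio
log_OR <- log(OR)
se     <- sqrt(1/a + 1/b + 1/c_ + 1/d)                # Wald SE of log(OR)
ci     <- exp(log_OR + c(-1, 1) * qnorm(0.975) * se)  # 95% CI for the OR

c(OR = OR, log_OR = log_OR, SE = se, CI_low = ci[1], CI_high = ci[2])
# compare: odds_ratios(tab)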
odds_ratios() returns a list of class 'REdaS_ORs':
call: the issued function call.
x: the original data.
tables: a list of one or more tables.
comps: a list of the compared variables' names.
ORs: a list with the (log-)odds ratios, standard errors, and related statistics for each comparison.
print.REdaS_ORs() invisibly returns a matrix containing all statistics shown by the print method.
Marco J. Maier
# create a table from a 2 x 2 matrix of frequencies using as.table()
tab <- as.table(matrix(c(49, 1, 5, 45), 2))
dimnames(tab) <- list("LED on?" = c("no", "yes"), "PC running?" = c("no", "yes"))
tab
odds_ratios(tab)

# generate a data frame with 3 variables and 100 observations
# note that each variable must have exactly two categories
set.seed(5)
x <- data.frame("A" = as.factor(sample(1:2, 100, TRUE)),
                "B" = as.factor(sample(3:4, 100, TRUE)),
                "C" = as.factor(sample(5:6, 100, TRUE)))
head(x)
res <- odds_ratios(x)

# print the results and save the summarized information in a matrix
resmat <- print(res)
resmat

# the summary method gives a rather lengthy output with all tables etc.
summary(res)