Package 'REdaS'

Title: Companion Package to the Book 'R: Einführung durch angewandte Statistik'
Description: Provides functions used in the book 'R: Einführung durch angewandte Statistik' (second edition).
Authors: Marco Johannes Maier [cre, aut]
Maintainer: Marco Johannes Maier <[email protected]>
License: GPL-2
Version: 0.9.4
Built: 2025-02-19 04:44:51 UTC
Source: https://github.com/cran/REdaS


The REdaS Package

Description

The REdaS Package provides functions used in the second edition of “R: Einführung durch angewandte Statistik”.

Details

Package: REdaS
Type: Package
Version: 0.9.4
Date: 2022-06-11
License: GPL-2

Author(s)

Author and Maintainer: Marco J. Maier [email protected]

References

Hatzinger, R., Hornik, K., Nagel, H., & Maier, M. J. (2014). R: Einführung durch angewandte Statistik. München: Pearson Studium.


Bartlett's Test of Sphericity

Description

Implements Bartlett's Test of Sphericity, which tests whether a correlation matrix is significantly different from an identity matrix.

Usage

bart_spher(x, use = c("everything", "all.obs", "complete.obs",
                      "na.or.complete", "pairwise.complete.obs"))

## S3 method for class 'bart_spher'
print(x, ...)

Arguments

x

a data matrix or the object to be printed.

use

defines the method to use if missing values are present (see Examples and cor).

...

further arguments for the print method.

Details

The test statistic $X^2$ as defined in Eq. (3) in Bartlett (1951) is $X^2=-[(n-1)-(2k+5)/6]\cdot\log(\left|\mathbf{R}\right|)$, where $n$ is the number of observations, $k$ the number of variables, and $\mathbf{R}$ the correlation matrix of the data supplied in x. $\left|\mathbf{R}\right|$ is the determinant of $\mathbf{R}$.

Bartlett's $X^2$ is asymptotically $\chi^2$-distributed with $\mathit{df}=k(k-1)/2$ under the null hypothesis.

Note that, because the bias-corrected correlation matrix is used, $(n-1)$ is employed here in place of the $n$ used in the paper.
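
The statistic described above can be reproduced by hand with base R. The following is a minimal sketch, not the package's internal code; the object names (dat, X2, p_value) are chosen for illustration only, and complete data without missing values are assumed.

```r
# Hand computation of Bartlett's statistic using base R only
# (illustrative sketch; object names are hypothetical)
set.seed(5L)
dat <- data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100))

n <- nrow(dat)  # number of observations
k <- ncol(dat)  # number of variables
R <- cor(dat)   # correlation matrix

# X^2 = -[(n - 1) - (2k + 5)/6] * log(|R|)
X2 <- -((n - 1) - (2 * k + 5) / 6) * log(det(R))

# degrees of freedom and upper-tail p-value of the chi-squared distribution
df <- k * (k - 1) / 2
p_value <- pchisq(X2, df = df, lower.tail = FALSE)

c(X2 = X2, df = df, p.value = p_value)
```

Because the determinant of a correlation matrix lies between 0 and 1, its logarithm is non-positive and the statistic is non-negative.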

Treatment of Missing Values

If no missing values are present in the data matrix x, use will work with any setting and no adjustments are necessary. In this case, $n$ is the number of rows in x.

For listwise deletion (use = "complete.obs" or "na.or.complete"), $n$ is the number of remaining rows in x.

When use = "pairwise.complete.obs", $n$ is approximated as the sum of relative non-missing responses for all observations with 2 or more valid responses.

If listwise/pairwise methods are used to compute the correlation matrix and the test statistic, a warning will be issued when printing the object.
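
One possible reading of the pairwise approximation of the number of observations is sketched below. This is an interpretation of the sentence above, not the package's source code: it is assumed that the "relative non-missing responses" of an observation is its share of non-missing values, and the names x, valid, rel and n_approx are hypothetical.

```r
# Sketch of the pairwise approximation of n (interpretation, see lead-in)
x <- data.frame(A = c(NA, 1, 2, 3),
                B = c(1, NA, 3, 4),
                C = c(NA, 2, 5, 6))

valid <- rowSums(!is.na(x))  # number of valid responses per observation
rel   <- valid / ncol(x)     # share of valid responses per observation

# only observations with 2 or more valid responses contribute
n_approx <- sum(rel[valid >= 2])
n_approx
```

In this toy data set, the first observation has only one valid response and is ignored, while the remaining three contribute 2/3, 1 and 1.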

Value

A list object of class 'bart_spher'

call

the issued function call

x

the original data

cormat

the correlation matrix computed from the data

use

treatment of NAs

n

the number of used observations

k

the number of variables/items

X2

the computed $X^2$ value

df

degrees of freedom

p.value

the $p$-value

warn

logical value indicating whether a warning regarding missing values will be issued (see Details)

Author(s)

Marco J. Maier

References

Bartlett, M. S. (1951). The Effect of Standardization on a $\chi^2$ Approximation in Factor Analysis. Biometrika, 38(3/4), 337–344.

See Also

cor() and KMOS()

Examples

# generate a data frame with 3 variables and 100 observations
set.seed(5L)
datamatrix <- data.frame("A" = rnorm(100), "B" = rnorm(100), "C" = rnorm(100))
head(datamatrix)

# correlation matrix
cor(datamatrix)


# bartlett's test
bart_spher(datamatrix)


# effects of missing observations on correlations: to illustrate this, the first
# observation on variable A is set to NA
datamatrix[1, 1] <- NA
head(datamatrix)

# "everything" (the default) causes all correlations involving a variable with
# missing values to be NA (in this case, all pairwise correlations with the
# variable "A")
cor(datamatrix)

# "all.obs" generates an error if missing values are present.
## Not run: 
cor(datamatrix, use = "all.obs")
## End(Not run)

# "complete.obs" and "na.or.complete" delete every observation that contains at
# least one NA (in this case, the first case is deleted). If there are no complete
# cases left after the listwise deletion, "complete.obs" results in an error
# while "na.or.complete" returns a matrix with all elements being NA.
cor(datamatrix, use = "complete.obs")
cor(datamatrix, use = "na.or.complete")

# "pairwise.complete.obs" uses all non-missing pairwise values. If there are no
# non-missing value pairs in two variables, the results will be NA.
# It is possible that correlation matrices are not positive semi-definite.
cor(datamatrix, use = "pairwise.complete.obs")


# with the missing value in the first cell, the test does not work anymore:
## Not run: 
bart_spher(datamatrix)
## End(Not run)

# deleting the whole first observation (listwise) gives
bart_spher(datamatrix, use = "na.or.complete")

# using pairwise-correlation, the result is
bart_spher(datamatrix, use = "pairwise.complete.obs")

Confidence Intervals for Relative Frequencies

Description

This function computes (one or more) confidence intervals (CIs) for a vector of observations or a table object and returns an object of class 'freqCI' to draw a bar plot of the results.

Usage

freqCI(x, level = 0.95)

## S3 method for class 'freqCI'
print(x, percent = TRUE, digits, ...)

## S3 method for class 'freqCI'
barplot(height, percent = TRUE, ...)

Arguments

x

must either be a numeric or factor object of individual observations (character vectors are also accepted, but a warning is issued) or an object of class 'table' of frequencies (produced using table or as.table)

level

a numeric vector of confidence levels in $(0,\,1)$.

percent

if TRUE, all values are printed as percentages, else relative frequencies are printed.

digits

the number of digits to print (defaults to 2 if values are shown as percentages, or 4 if relative frequencies are used).

height

to plot the proportions and confidence intervals, an object of class 'freqCI' must be used with the generic barplot function.

...

further arguments.

Details

For details, see the book listed in the package references (Hatzinger et al., 2014).

Value

freqCI() returns an object of class 'freqCI' as a list:

call

the function call issued

x

the original object

level

the confidence levels

freq

a numeric vector of frequencies

n

the number of observations

rel_freq

relative frequencies

cat_names

category names

CIs_low

lower confidence interval boundary/boundaries

CIs_high

upper confidence interval boundary/boundaries

print.freqCI() invisibly returns a matrix with the confidence intervals and estimates.

barplot.freqCI() invisibly returns a vector with the $x$-coordinates of the plotted bars.

Author(s)

Marco J. Maier

See Also

table, as.table, barplot

Examples

# generate some simple data using rep() and inspect them using table()
mydata <- rep(letters[1:3], c(100,200,300))
table(mydata)
100 * prop.table(table(mydata))

# compute 95% and 99% confidence intervals and print them with standard settings
res <- freqCI(mydata, level = c(.95, .99))
res

# print the result as relative frequencies rounded to 3 digits, save the result
# and print the invisibly returned matrix
resmat <- print(res, percent = FALSE, digits = 3)
resmat

# plot the results and save the x-coordinates
x_coo <- barplot(res)
x_coo

# use the x-coordinates to plot the frequencies per category
text(x_coo, 0, labels = paste0("n = ", res$freq), pos = 3)

Conversion between Radians and Degrees

Description

Converts radians to degrees and vice versa.

Usage

deg2rad(d)

rad2deg(r)

Arguments

d

degrees

r

radians

Details

Since $\pi\,\mathrm{rad}=180^{\circ}$, degrees ($d$) can be converted to radians ($r$) using $r=d\cdot\pi/180$, and the conversion of radians to degrees is $d=r\cdot 180/\pi$.
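
The two conversion formulas can be written directly in base R. The helper names deg_to_rad and rad_to_deg below are hypothetical stand-ins used for illustration, so that the sketch does not depend on the package being attached.

```r
# Illustrative re-implementation of the two conversions (hypothetical names)
deg_to_rad <- function(d) d * pi / 180  # r = d * pi / 180
rad_to_deg <- function(r) r * 180 / pi  # d = r * 180 / pi

# both helpers are vectorized because they only use arithmetic operators
deg_to_rad(c(0, 90, 180))
rad_to_deg(c(pi / 2, pi))

# converting back and forth recovers the input (up to floating-point error)
all.equal(rad_to_deg(deg_to_rad(45)), 45)
```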

Author(s)

Marco J. Maier

See Also

see Trigonometric Functions, Hyperbolic Functions, Constants in R

Examples

# pi is available as a constant
pi

# 180° are pi radians
deg2rad(180)

# 2 * pi radians are 360°
rad2deg(2 * pi)

Density-Box-Plots

Description

This function draws a (grouped) boxplot-like plot with kernel density estimators.

Usage

densbox(formula, data, rug = FALSE, from, to, gsep = .5, kernel, bw, main, ylab,
    var_names, box_out = TRUE, horizontal = FALSE, ...)

Arguments

formula

a formula object that references elements in data, see Details

data

a data frame containing the variables specified in formula

rug

a logical value to add a rug to the individual density-boxes

from

an optional lower boundary for the kernel density estimation (see density)

to

an optional upper boundary for the kernel density estimation (see density)

gsep

a numeric value $\geq 0$ that specifies the length of group separation if two or more grouping variables are used

kernel

a string specifying the type of the kernel (default: "gaussian", see density)

bw

the bandwidth for kernel density estimation (see density)

main

a character object for the title

ylab

a character object for the $y$-axis label

var_names

a character object to print grouping variables' names in the lower left margin – grouping variables are treated in the order they are given in the formula

box_out

if TRUE (default), outliers are treated as in standard boxplots (plotted as stars outside the boxplot's whiskers); if FALSE, outliers are not treated differently, i.e., minimum and maximum will span the full range, no matter how far individual observations may be from the median with respect to the IQR (interquartile range; see boxplot.stats and fivenum for details on the computation of boxplot statistics).

horizontal

not implemented yet...

...

further arguments, see Details

Details

This function plots a combination of boxplots and kernel density plots to get a more informative graphic of a metric dependent variable with respect to grouped data. The central element is the formula argument that defines the dependent variable (dv) and grouping variables (independent variables, iv). For a meaningful plot, the ivs should be categorical variables (they are treated as factors).

In the simplest case, there is no grouping, so formula is DV ~ 1. As grouping variables are added, the plot will be split up accordingly. Note that the ordering of ivs in the formula defines how the plot is split up – the first variable is the most general grouping, the second will form subgroups in the first variable's groups and so on ...

If there are cases where a level of a factor is completely missing ab initio, the level will be dropped. Subgroups with fewer than 5 observations will be dropped and “$<5$” will be plotted instead.
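
The ingredients of a density-box, that is boxplot statistics with or without outlier treatment and a kernel density estimate, can be inspected with base R alone. The sketch below only illustrates the idea behind the box_out, from, kernel and bw arguments; it is an assumption that densbox() relies on comparable computations internally, and the object names are hypothetical.

```r
# Illustration of the building blocks using base R (not densbox() internals)
set.seed(5L)
y <- rlnorm(100, 1, 0.5)

# analogue of box_out = TRUE: whiskers end within 1.5 * IQR of the box,
# observations beyond them are reported as outliers
bs_out <- boxplot.stats(y, coef = 1.5)

# analogue of box_out = FALSE: coef = 0 extends the whiskers to the extremes,
# so no observation is flagged as an outlier
bs_all <- boxplot.stats(y, coef = 0)

bs_out$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs_all$stats  # identical to fivenum(y)

# kernel density estimate with a lower boundary at 0 (cf. 'from')
dens <- density(y, from = 0, kernel = "gaussian")
range(dens$x)
```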

Author(s)

Marco J. Maier

See Also

density, boxplot, grid (Package)

Examples

# plot a density-box-plot of one (log-normal) variable
set.seed(5L)
data1 <- rlnorm(100, 1, .5)
densbox(data1 ~ 1, from = 0, rug = TRUE)

# generate a continuous variable with 2 grouping variables
data2 <- data.frame(y  = rnorm(400, rep(c(0, 1, -1, 0), each = 100), 1),
                    x1 = rep(c("A", "B"), each = 200),
                    x2 = rep(c("X", "Y", "X", "Y"), each = 100))
with(data2, tapply(y, list(x1, x2), mean))

# a density-box-plot of the data with a custom title and
# custom names for the grouping variables
densbox(y ~ x2 + x1, data2, main = "Plot with some\nSpecials",
  var_names = c("Second\nVariable", "First Variable"))

# the same plot with a rug and ignoring outliers in the boxplot
densbox(y ~ x2 + x1, data2, rug = TRUE, box_out = FALSE)

# density-box-plot with the same data, but no additional space between groups
# by setting gsep = 0.
# the kernel density plots have a rectangular kernel with a bandwidth of 0.25
# which results in a "jagged" appearance.
densbox(y ~ x2 + x1, data2, gsep = 0, kernel = "rectangular", bw = 0.25)

Kaiser-Meyer-Olkin Statistics

Description

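Computes the Kaiser-Meyer-Olkin (KMO) Criterion and the Measures of Sampling Adequacy (MSA) for individual items.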

Usage

KMOS(x, use = c("everything", "all.obs", "complete.obs", "na.or.complete",
    "pairwise.complete.obs"))

## S3 method for class 'MSA_KMO'
print(x, stats = c("both", "MSA", "KMO"), vars = "all",
    sort = FALSE, show = "all", digits = getOption("digits"), ...)

Arguments

x

The data $\mathbf{X}$ for KMOS(), an object of class 'MSA_KMO' for the print method.

use

defines the method to use if missing values are present (for a detailed explanation see bart_spher; see also cor).

stats

determines if "MSA", "KMO" or "both" (default) are printed.

vars

can be "all" or a vector of index numbers of variables to print the MSAs for.

sort

sorts the MSAs in increasing order.

show

shows the specified number of variables (from 1 to the number of potentially sorted variables).

digits

the number of decimal places to print.

...

further arguments.

Details

The Measure of Sampling Adequacy (MSA) for individual items and the Kaiser-Meyer-Olkin (KMO) Criterion rely on the Anti-Image-Correlation Matrix $\mathbf{A}$ (for details see Kaiser & Rice, 1974), which contains all bivariate partial correlations given all other items, $a_{ij}=r_{ij\,\vert\,\mathbf{X}\setminus\{i,\,j\}}$, and is computed as:

$$\mathbf{A}=\left[\mathrm{diag}(\mathbf{R}^{-1})\right]^{-1/2}\,\mathbf{R}^{-1}\,\left[\mathrm{diag}(\mathbf{R}^{-1})\right]^{-1/2}$$

where $\mathbf{R}$ is the correlation matrix, based on the data $\mathbf{X}$.

The KMO and MSAs for individual items are (adapted from Equations (3) and (4) in Kaiser & Rice, 1974; note that $a$ is $q$ in the article):

$$\mathit{KMO}=\frac{\sum_{i=1}^{k}\sum_{j=1}^{k}r_{ij}^2}{\sum_{i=1}^{k}\sum_{j=1}^{k}\left(r_{ij}^2+a_{ij}^2\right)},\qquad i\neq j$$

$$\mathit{MSA}_i=\frac{\sum_{j=1}^{k}r_{ij}^2}{\sum_{j=1}^{k}\left(r_{ij}^2+a_{ij}^2\right)},\qquad j\neq i$$

Historically, as suggested in Kaiser (1974) and Kaiser & Rice (1974), a rule of thumb for those values is:

$\geq .9$      marvelous
$[.8,\,.9)$    meritorious
$[.7,\,.8)$    middling
$[.6,\,.7)$    mediocre
$[.5,\,.6)$    miserable
$<.5$          unacceptable
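
The formulas and the rule of thumb above can be traced step by step in base R. This is a minimal sketch under the assumption of complete data, not the package's implementation; all object names are hypothetical. Because only squared elements enter the sums, the sign convention of the off-diagonal elements of the anti-image matrix does not affect the results.

```r
# Hand computation of KMO and MSAs using base R (illustrative sketch)
set.seed(5L)
dat <- data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100),
                  D = rnorm(100), E = rnorm(100))

R <- cor(dat)

# rescale the inverse correlation matrix by its diagonal;
# cov2cor() computes diag(M)^(-1/2) %*% M %*% diag(M)^(-1/2)
A <- cov2cor(solve(R))

# squared elements, with the diagonal (i == j) excluded from all sums
R2 <- R^2
A2 <- A^2
diag(R2) <- 0
diag(A2) <- 0

KMO <- sum(R2) / (sum(R2) + sum(A2))               # overall criterion
MSA <- rowSums(R2) / (rowSums(R2) + rowSums(A2))   # one value per item

# verbal labels following the rule of thumb (left-closed intervals)
verdict <- cut(MSA, breaks = c(-Inf, .5, .6, .7, .8, .9, Inf), right = FALSE,
               labels = c("unacceptable", "miserable", "mediocre",
                          "middling", "meritorious", "marvelous"))

KMO
data.frame(MSA = MSA, verdict = verdict)
```

With uncorrelated random data such as these, low values are to be expected, since there is no common structure to extract.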

Value

A list of class 'MSA_KMO'

call

the issued function call

cormat

correlation matrix

pcormat

normalized negative inverse of the correlation matrix (pairwise correlations given all other variables)

n

the number of observations

k

the number of variables/items

MSA

measure of sampling adequacy

KMO

Kaiser-Meyer-Olkin criterion

Author(s)

Marco J. Maier

References

Kaiser, H. F. (1970). A Second Generation Little Jiffy. Psychometrika, 35(4), 401–415.

Kaiser, H. F. (1974). An Index of Factorial Simplicity. Psychometrika, 39(1), 31–36.

Kaiser, H. F., & Rice, J. (1974). Little Jiffy, Mark IV. Educational and Psychological Measurement, 34, 111–117.

See Also

cor, bart_spher

Examples

set.seed(5L)
daten <- data.frame("A"=rnorm(100), "B"=rnorm(100), "C"=rnorm(100),
                    "D"=rnorm(100), "E"=rnorm(100))
cor(daten)
KMOS(daten, use = "pairwise.complete.obs")

Compute (Log) Odds Ratios

Description

This function computes the (log-)odds ratio (OR) for a $2\times 2$ table (x must be an object of class 'table', created using either table or as.table). For a data frame of $k$ variables with 2 categories each, all $k(k-1)/2$ pairwise (log-)odds ratios are computed.

Usage

odds_ratios(x)

## S3 method for class 'REdaS_ORs'
print(x, ...)

## S3 method for class 'REdaS_ORs'
summary(object, ...)

Arguments

x

either a $2\times 2$ table object or a data frame where each variable has two categories.

object

an object of class 'REdaS_ORs'.

...

further arguments.

Details

Note that tables where one or more cells are 0 are not processed and a warning is issued in such cases.
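
The restriction on zero cells becomes clear when the quantities are computed by hand. The sketch below uses the common Wald-type standard error of the log-odds ratio; that choice is an assumption made for illustration and is not confirmed by this documentation as the package's method. The object names are hypothetical.

```r
# Hand computation of a (log-)odds ratio for a 2 x 2 table (illustrative sketch)
tab <- as.table(matrix(c(49, 1, 5, 45), 2))

# odds ratio: product of the main diagonal over product of the off-diagonal
or <- as.numeric((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
log_or <- log(or)

# assumed Wald-type standard error: sqrt of the sum of reciprocal cell counts;
# a zero cell would make 1/0 infinite and the log-odds ratio undefined
se <- sqrt(sum(1 / tab))

z <- log_or / se
p <- 2 * pnorm(-abs(z))  # two-sided p-value

c(OR = or, logOR = log_or, SE = se, z = z, p = p)
```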

Value

odds_ratios() returns a list of class 'REdaS_ORs':

call

the issued function call.

x

the original data.

tables

a list of one or more tables.

comps

a list of the compared variables' names.

ORs

a list with (log-)odds ratios, standard errors, $z$- and $p$-values.

print.REdaS_ORs() invisibly returns a matrix containing all statistics shown by the print-method.

Author(s)

Marco J. Maier

Examples

# create a table from a 2 x 2 matrix of frequencies using as.table()
tab <- as.table( matrix(c(49, 1, 5, 45), 2) )
dimnames(tab) <- list("LED on?" = c("no", "yes"),
                      "PC running?" = c("no", "yes"))
tab

odds_ratios(tab)

# generate a data frame with 3 variables and 100 observations
# note that each variable must have exactly two categories
set.seed(5)
x <- data.frame("A" = as.factor(sample(1:2, 100, TRUE)),
                "B" = as.factor(sample(3:4, 100, TRUE)),
                "C" = as.factor(sample(5:6, 100, TRUE)))
head(x)

res <- odds_ratios(x)

# print the results and save the summarized information in a matrix
resmat <- print(res)
resmat

# the summary method gives a rather lengthy output with all tables etc.
summary(res)