Title: | Calculate Crosstab and Topline Tables of Weighted Survey Data |
---|---|
Description: | Calculate common types of tables for weighted survey data. Options include topline and (2-way and 3-way) crosstab tables of categorical or ordinal data as well as summary tables of weighted numeric variables. Optionally, include the margin of error at a selected confidence level, adjusted for the design effect. The design effect is calculated as described by Kish (1965) <doi:10.1002/bimj.19680100122> beginning on page 257. Output takes the form of tibbles (simple data frames). This package conveniently handles labelled data, such as that commonly used by 'Stata' and 'SPSS'. Complex survey design is not supported at this time. |
Authors: | John D. Johnson [aut, cre] |
Maintainer: | John D. Johnson <[email protected]> |
License: | CC0 |
Version: | 0.1.6 |
Built: | 2025-02-24 04:03:32 UTC |
Source: | https://github.com/jdjohn215/pollster |
crosstab
returns a tibble containing a weighted crosstab of two variables
crosstab( df, x, y, weight, remove = "", n = TRUE, pct_type = "row", format = "wide", unwt_n = FALSE )
df |
The data source |
x |
The independent variable |
y |
The dependent variable |
weight |
The weighting variable |
remove |
An optional character vector of values to remove from final table (e.g. "refused"). This will not affect any calculations made. The vector is not case-sensitive. |
n |
logical, if TRUE numeric totals are included. They are included in a separate column for row and cell percentages, but in a separate row for wide format column percentages. |
pct_type |
Controls the kind of percentage values returned. One of "row", "cell", or "column". |
format |
one of "long" or "wide" |
unwt_n |
logical, if TRUE a column "unweighted_n" is included containing the unweighted frequency count. It is not available when pct_type is "column". |
Options include row, column, or cell percentages. The tibble can be in long or wide format.
a tibble
crosstab(df = illinois, x = voter, y = raceethnic, weight = weight)
crosstab(df = illinois, x = voter, y = raceethnic, weight = weight, n = FALSE)
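
To make the difference between row and cell percentages concrete, here is a minimal base R sketch on a small made-up data frame. It is not the package's implementation; the data frame `toy` and its columns `x`, `y`, and `w` are hypothetical names chosen for illustration.

```r
# Toy data: hypothetical respondents, not the illinois dataset
toy <- data.frame(
  x = c("a", "a", "b", "b", "b"),
  y = c("yes", "no", "yes", "yes", "no"),
  w = c(1, 3, 2, 2, 4)
)

# Sum the weights within each x-by-y cell
wtd_counts <- xtabs(w ~ x + y, data = toy)

# Row percentages: each row of the table sums to 100
row_pct <- prop.table(wtd_counts, margin = 1) * 100

# Cell percentages: the whole table sums to 100
cell_pct <- prop.table(wtd_counts) * 100

row_pct
```

With row percentages, each category of x is read on its own; with cell percentages, every cell is a share of the full weighted sample.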
crosstab_3way
returns a tibble containing a weighted crosstab of two variables by a third variable
crosstab_3way( df, x, y, z, weight, remove = c(""), n = TRUE, pct_type = "row", format = "wide", unwt_n = FALSE )
df |
The data source |
x |
The independent variable |
y |
The dependent variable |
z |
The second control variable |
weight |
The weighting variable |
remove |
An optional character vector of values to remove from final table (e.g. "refused"). This will not affect any calculations made. The vector is not case-sensitive. |
n |
logical, if TRUE numeric totals are included. |
pct_type |
Controls the kind of percentage values returned. One of "row" or "cell". |
format |
one of "long" or "wide" |
unwt_n |
logical, if TRUE a column is added containing unweighted frequency counts |
Options include row or cell percentages. The tibble can be in long or wide format. These tables are ideal for use with small multiples created with ggplot2::facet_wrap.
a tibble
crosstab_3way(df = illinois, x = sex, y = educ6, z = maritalstatus, weight = weight)
crosstab_3way(df = illinois, x = sex, y = educ6, z = maritalstatus, weight = weight, format = "wide")
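
The idea of row percentages computed separately within each level of a third variable can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the data frame `toy` and the columns `x`, `y`, `z`, and `w` are hypothetical.

```r
# Toy data: two levels of z, each containing an x-by-y table
toy <- data.frame(
  x = c("a", "a", "b", "b", "a", "a", "b", "b"),
  y = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
  z = c("g1", "g1", "g1", "g1", "g2", "g2", "g2", "g2"),
  w = c(1, 3, 2, 2, 4, 4, 1, 0.5)
)

# Weighted counts in a three-dimensional table
wtd_counts <- xtabs(w ~ x + y + z, data = toy)

# Row percentages within each z: for every (x, z) pair, the y values sum to 100
row_pct <- prop.table(wtd_counts, margin = c(1, 3)) * 100

# Long layout: one row per x-y-z combination, convenient for faceted plots
long <- as.data.frame(as.table(row_pct), responseName = "pct")
long
```

A long table of this shape has one column per variable plus a percentage column, which is the layout that faceting functions generally expect.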
deff_calc
returns a single number
deff_calc(w)
w |
a vector of weights |
This function returns the design effect of a given sample using the formula length(w)*sum(w^2)/(sum(w)^2). It is designed for use in the moe family of functions. If any weights are equal to 0, they are removed prior to calculation.
A number
deff_calc(illinois$weight)
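
The formula quoted above can be written out directly in base R. The sketch below is a re-statement of that formula for illustration; the function name `kish_deff` is hypothetical and is not part of the package.

```r
# Design effect from a vector of weights, following the formula in the text:
# length(w) * sum(w^2) / (sum(w)^2)
kish_deff <- function(w) {
  w <- w[w != 0]  # drop zero weights before calculating, as described above
  length(w) * sum(w^2) / (sum(w)^2)
}

# Equal weights carry no design effect
kish_deff(c(1, 1, 1, 1))

# Unequal weights inflate the design effect; the trailing 0 is dropped
kish_deff(c(0.5, 1, 1.5, 2, 0))
```

The more the weights vary, the larger the result, and the wider any margin of error that is scaled by it.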
A dataset containing the responses of 36,207 Illinois respondents to the biennial Voting and Registration Supplement of the Current Population Survey, 1996-2018.
illinois
A data frame with 36207 rows and 10 variables:
year of survey
the state fips code
sex of the respondent, labelled value
highest level of education for respondent, labelled values
one of white, black, Hispanic, or other, labelled values
one of Married, Widowed/divorced/Sep, or Never Married, labelled values
indicates if the respondent is registered to vote, labelled values
indicates if the respondent voted, labelled values
the age of the respondent, numeric values
the number of people each respondent is calculated to represent
https://www.census.gov/topics/public-sector/voting.html
moe_crosstab
returns a tibble containing a weighted crosstab of two variables with margin of error
moe_crosstab( df, x, y, weight, remove = c(""), n = TRUE, pct_type = "row", format = "long", zscore = 1.96, unwt_n = FALSE )
df |
The data source |
x |
The independent variable |
y |
The dependent variable |
weight |
The weighting variable |
remove |
An optional character vector of values to remove from final table (e.g. "refused"). This will not affect any calculations made. The vector is not case-sensitive. |
n |
logical, if TRUE numeric totals are included. |
pct_type |
Controls the kind of percentage values returned. One of "row" or "cell". Column percents are not supported. |
format |
one of "long" or "wide" |
zscore |
defaults to 1.96, consistent with a 95% confidence interval |
unwt_n |
logical, if TRUE it adds a column with unweighted frequency values |
Options include row or cell percentages. The tibble can be in long or wide format. The margin of error includes the design effect of the weights.
a tibble
moe_crosstab(df = illinois, x = voter, y = raceethnic, weight = weight)
moe_crosstab(df = illinois, x = voter, y = raceethnic, weight = weight, n = FALSE)
moe_crosstab_3way
returns a tibble containing a weighted crosstab of two variables by a third variable with margin of error
moe_crosstab_3way( df, x, y, z, weight, remove = c(""), n = TRUE, pct_type = "row", format = "long", zscore = 1.96, unwt_n = FALSE )
df |
The data source |
x |
The independent variable |
y |
The dependent variable |
z |
The second control variable |
weight |
The weighting variable |
remove |
An optional character vector of values to remove from final table (e.g. "refused"). This will not affect any calculations made. The vector is not case-sensitive. |
n |
logical, if TRUE numeric totals are included. |
pct_type |
Controls the kind of percentage values returned. One of "row" or "cell". |
format |
one of "long" or "wide" |
zscore |
defaults to 1.96, consistent with a 95% confidence interval |
unwt_n |
logical, if TRUE it adds a column with unweighted frequency values |
Options include row or cell percentages. The tibble can be in long or wide format. These tables are ideal for use with small multiples created with ggplot2::facet_wrap.
a tibble
moe_crosstab_3way(df = illinois, x = sex, y = educ6, z = maritalstatus, weight = weight)
moe_crosstab_3way(df = illinois, x = sex, y = educ6, z = maritalstatus, weight = weight, format = "wide")
moe_topline
returns a tibble containing a weighted topline of one variable with margin of error
moe_topline( df, variable, weight, remove = c(""), n = TRUE, pct = TRUE, valid_pct = TRUE, cum_pct = TRUE, zscore = 1.96 )
df |
The data source |
variable |
the variable name |
weight |
The weighting variable |
remove |
An optional character vector of values to remove from final table (e.g. "refused"). This will not affect any calculations made. The vector is not case-sensitive. |
n |
logical, if TRUE a frequency column is included. |
pct |
logical, if TRUE a column of percents is included |
valid_pct |
logical, if TRUE a column of valid percents is included |
cum_pct |
logical, if TRUE a column of cumulative percents is included |
zscore |
defaults to 1.96, consistent with a 95% confidence interval |
By default the table includes a column for frequency count, percent, valid percent, and cumulative percent.
a tibble
moe_topline(df = illinois, variable = educ6, weight = weight)
moe_topline(df = illinois, variable = educ6, weight = weight, remove = c("LT HS"))
moe_wave_crosstab
returns a tibble containing a weighted crosstab of two variables with margin of error. Use this function when the x-variable indicates different survey waves for which weights were calculated independently.
moe_wave_crosstab( df, x, y, weight, remove = c(""), n = TRUE, pct_type = "row", format = "long", zscore = 1.96, unwt_n = FALSE )
df |
The data source |
x |
The independent variable, which uniquely identifies survey waves |
y |
The dependent variable |
weight |
The weighting variable |
remove |
An optional character vector of values to remove from final table (e.g. "refused"). This will not affect any calculations made. The vector is not case-sensitive. |
n |
logical, if TRUE numeric totals are included. |
pct_type |
Controls the kind of percentage values returned. One of "row" or "cell". Column percents are not supported. |
format |
one of "long" or "wide" |
zscore |
defaults to 1.96, consistent with a 95% confidence interval |
unwt_n |
logical, if TRUE it adds a column with unweighted frequency values |
Options include row or cell percentages. The tibble can be in long or wide format. The margin of error includes the design effect of the weights, calculated separately for each survey wave.
a tibble
moe_wave_crosstab(df = illinois, x = year, y = maritalstatus, weight = weight)
moe_wave_crosstab(df = illinois, x = year, y = maritalstatus, weight = weight, format = "wide")
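
Calculating the design effect separately for each wave can be sketched in base R by applying the design-effect formula to the weights of each wave in turn. This is an illustration only: `toy`, `wave`, `w`, and `kish_deff` are hypothetical names, and the package's internals may differ.

```r
# Toy data: two survey waves whose weights were built independently
toy <- data.frame(
  wave = c(2016, 2016, 2016, 2018, 2018, 2018),
  w    = c(1, 1, 1, 0.5, 1, 1.5)
)

# Design effect formula: length(w) * sum(w^2) / (sum(w)^2), zero weights dropped
kish_deff <- function(w) {
  w <- w[w != 0]
  length(w) * sum(w^2) / (sum(w)^2)
}

# One design effect per wave instead of one for the pooled sample
deff_by_wave <- tapply(toy$w, toy$wave, kish_deff)
deff_by_wave
```

Pooling the weights across waves would mix weighting schemes that were never meant to be compared, which is why a per-wave calculation is the safer choice when waves were weighted independently.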
moe_wave_crosstab_3way
returns a tibble containing a weighted crosstab of two variables by a third variable with margin of error. Use this function when the z-variable indicates different survey waves for which weights were calculated independently.
moe_wave_crosstab_3way( df, x, y, z, weight, remove = c(""), n = TRUE, pct_type = "row", format = "long", zscore = 1.96, unwt_n = FALSE )
df |
The data source |
x |
The independent variable |
y |
The dependent variable |
z |
The second control variable, uniquely identifies survey waves |
weight |
The weighting variable |
remove |
An optional character vector of values to remove from final table (e.g. "refused"). This will not affect any calculations made. The vector is not case-sensitive. |
n |
logical, if TRUE numeric totals are included. |
pct_type |
Controls the kind of percentage values returned. One of "row" or "cell". |
format |
one of "long" or "wide" |
zscore |
defaults to 1.96, consistent with a 95% confidence interval |
unwt_n |
logical, if TRUE it adds a column with unweighted frequency values |
Options include row or cell percentages. The tibble can be in long or wide format. These tables are ideal for use with small multiples created with ggplot2::facet_wrap.
a tibble
moe_wave_crosstab_3way(df = illinois, x = sex, y = educ6, z = year, weight = weight)
moe_wave_crosstab_3way(df = illinois, x = sex, y = educ6, z = year, weight = weight, format = "wide")
moedeff_calc
returns a single number. It is designed for use in the moe family of functions.
moedeff_calc(pct, deff, n, zscore = 1.96)
pct |
a proportion |
deff |
a design effect |
n |
the sample size |
zscore |
defaults to 1.96, consistent with a 95% confidence interval. |
This function returns the margin of error including design effect of a given sample of weighted data using the formula sqrt(deff)*zscore*sqrt((pct*(1-pct))/(n-1))*100. Here pct is a proportion between 0 and 1, and the result is expressed in percentage points.
A percentage
moedeff_calc(pct = 0.515, deff = 1.6, n = 214)
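
The margin-of-error formula above can be checked by hand with a few lines of base R. The function name `moe_with_deff` is hypothetical and simply restates the documented formula; it is not the package's source.

```r
# Margin of error (in percentage points) with a design-effect adjustment:
# sqrt(deff) * zscore * sqrt((pct * (1 - pct)) / (n - 1)) * 100
moe_with_deff <- function(pct, deff, n, zscore = 1.96) {
  sqrt(deff) * zscore * sqrt((pct * (1 - pct)) / (n - 1)) * 100
}

# Same inputs as the example above; the result is roughly 8.5 points
moe_95 <- moe_with_deff(pct = 0.515, deff = 1.6, n = 214)

# A smaller z-score (1.645, the usual 90% value) gives a narrower margin
moe_90 <- moe_with_deff(pct = 0.515, deff = 1.6, n = 214, zscore = 1.645)

c(moe_95 = moe_95, moe_90 = moe_90)
```

Because the design effect enters under a square root, a deff of 1.6 widens the margin by about 26 percent relative to an unweighted sample of the same size.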
summary_table
returns a tibble containing a weighted summary table of a single variable.
summary_table(df, variable, weight, name_style = "clean")
df |
The data source |
variable |
the variable to summarize, it should be numeric |
weight |
The weighting variable |
name_style |
the style of the column names: one of "clean" or "pretty". Clean names are all lower case and words are separated by an underscore. Pretty names begin with a capital letter and words are separated by a space. |
The resulting tibble includes columns for the variable name, unweighted observations, weighted observations, weighted mean, minimum value, maximum value, unweighted missing values, and weighted missing values.
a tibble
summary_table(illinois, age, weight)
summary_table(illinois, age, weight, name_style = "pretty")
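
The quantities listed above can be computed by hand in base R, which may help when checking a table. The sketch below is illustrative only: the data frame `toy` is made up, and the column names in `summary_sketch` are hypothetical rather than the package's exact output names.

```r
# Toy data with one missing value in the summarized variable
toy <- data.frame(age = c(20, 30, NA, 40), w = c(1, 2, 1, 1))

ok <- !is.na(toy$age)  # rows with a usable value

summary_sketch <- data.frame(
  variable_name           = "age",
  unweighted_observations = sum(ok),          # count of non-missing rows
  weighted_observations   = sum(toy$w[ok]),   # sum of their weights
  weighted_mean           = weighted.mean(toy$age[ok], toy$w[ok]),
  minimum                 = min(toy$age[ok]),
  maximum                 = max(toy$age[ok]),
  unweighted_missing      = sum(!ok),         # count of missing rows
  weighted_missing        = sum(toy$w[!ok])   # sum of their weights
)
summary_sketch
```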
topline
returns a tibble containing a weighted topline of one variable
topline( df, variable, weight, remove = c(""), n = TRUE, pct = TRUE, valid_pct = TRUE, cum_pct = TRUE )
df |
The data source |
variable |
the variable name |
weight |
The weighting variable |
remove |
An optional character vector of values to remove from final table (e.g. "refused"). This will not affect any calculations made. The vector is not case-sensitive. |
n |
logical, if TRUE a frequency column is included. |
pct |
logical, if TRUE a column of percents is included |
valid_pct |
logical, if TRUE a column of valid percents is included |
cum_pct |
logical, if TRUE a column of cumulative percents is included |
By default the table includes a column for frequency count, percent, valid percent, and cumulative percent.
a tibble
topline(illinois, sex, weight)
topline(illinois, sex, weight, pct = FALSE)
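
The difference between percent and valid percent is easiest to see on a tiny example. The base R sketch below uses a made-up data frame (`toy`, `answer`, `w` are hypothetical names) and assumes that valid percent excludes missing responses from the denominator and that cumulative percent accumulates the valid percents; the package may define these columns differently.

```r
# Toy data: one respondent did not answer
toy <- data.frame(
  answer = c("yes", "no", "yes", NA, "no"),
  w      = c(1, 2, 1, 1, 1)
)

valid <- toy[!is.na(toy$answer), ]

# Weighted frequency of each non-missing response
wtd_n <- tapply(valid$w, valid$answer, sum)

# Percent: share of all weighted respondents, including the missing one
pct <- wtd_n / sum(toy$w) * 100

# Valid percent: share of weighted respondents who gave an answer
valid_pct <- wtd_n / sum(valid$w) * 100

# Cumulative percent (assumption: running total of valid percents)
cum_pct <- cumsum(valid_pct)

data.frame(response = names(wtd_n), n = as.numeric(wtd_n),
           pct = as.numeric(pct), valid_pct = as.numeric(valid_pct),
           cum_pct = as.numeric(cum_pct))
```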
wtd_mean
returns the weighted mean of a variable. It's a tidy-compatible wrapper around stats::weighted.mean().
wtd_mean(df, variable, weight)
df |
The data source |
variable |
the variable, it should be numeric |
weight |
The weighting variable |
a numeric value
wtd_mean(illinois, age, weight)
library(dplyr)
illinois %>% wtd_mean(age, weight)
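
For readers who want to see what a wrapper around stats::weighted.mean() might look like, here is a minimal base R sketch. It is not the package's source: `wtd_mean_sketch` is a hypothetical name, it takes column names as strings rather than bare names, and dropping missing values with `na.rm = TRUE` is an assumption.

```r
# A data-frame-first wrapper so the call can sit at the end of a pipeline
wtd_mean_sketch <- function(df, variable, weight) {
  stats::weighted.mean(df[[variable]], df[[weight]], na.rm = TRUE)
}

toy <- data.frame(age = c(20, 30, 40), w = c(1, 2, 1))

# Weighted mean via the wrapper
by_wrapper <- wtd_mean_sketch(toy, "age", "w")

# The same quantity written out: sum of value * weight over sum of weights
by_hand <- sum(toy$age * toy$w) / sum(toy$w)

c(by_wrapper = by_wrapper, by_hand = by_hand)
```

Putting the data frame first is what makes a function of this kind usable with a pipe.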