---
title: "crosstabs"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{crosstabs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(pollster)
library(dplyr)
library(knitr)
library(ggplot2)
```

Crosstabs can come in [wide or long format](https://en.wikipedia.org/wiki/Wide_and_narrow_data). Each is useful, depending on your purpose. Wide data is best for display tables. Long data is usually better for making plots.

Here is a wide table.

```{r}
crosstab(df = illinois, x = sex, y = educ6, weight = weight) %>%
  kable()
```

And here is long format.

```{r}
crosstab(df = illinois, x = sex, y = educ6, weight = weight, format = "long")
```

By default, row percentages are used. You can also explicitly choose cell or column percentages using the `pct_type` argument. I discourage the use of column percentages--it's better to just flip the x and y variables and make row percents--but the option is included to match functionality provided by other standard statistical software.

```{r}
# cell percentages
crosstab(df = illinois, x = sex, y = educ6, weight = weight, pct_type = "cell")

# column percentages
crosstab(df = illinois, x = sex, y = educ6, weight = weight, pct_type = "column")
```

To make a graph, just feed your `tibble` output to a `ggplot2` function.

```{r, fig.width=5.6}
crosstab(df = illinois, x = sex, y = educ6, weight = weight, format = "long") %>%
  ggplot(aes(x = educ6, y = pct, fill = sex)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  labs(title = "Educational attainment of the Illinois adult population by gender")
```

## Margin of error

### How the margin of error is calculated

The margin of error is calculated including the design effect of the sample weights, using the following formula:

`sqrt(design effect)*zscore*sqrt((pct*(1-pct))/(n-1))*100`

The design effect is calculated using the formula `length(weights)*sum(weights^2)/(sum(weights)^2)`.

------

Get a crosstab with the margin of error in a separate column using the `moe_crosstab` function. By default, a z-score of 1.96 (95% confidence interval) is used. Supply your own desired z-score using the `zscore` argument. Only row and cell percents are supported. By default, the table format is long because I anticipate that making visualizations will be the most common use case for this function.

```{r}
moe_crosstab(illinois, educ6, voter, weight)
```

A wide format table looks like this.

```{r}
moe_crosstab(illinois, educ6, voter, weight, format = "wide")
```

`ggplot2` offers [multiple ways](http://www.sthda.com/english/wiki/ggplot2-error-bars-quick-start-guide-r-software-and-data-visualization) to visualize the margin of error. Here is one good option. (Please note: if you don't have ggplot2 >= [3.3.0](https://www.tidyverse.org/blog/2020/03/ggplot2-3-3-0/), you'll get an error message.)

```{r, fig.width=5}
illinois %>%
  filter(year == 2016) %>%
  moe_crosstab(educ6, voter, weight) %>%
  ggplot(aes(x = pct, y = educ6, xmin = (pct - moe), xmax = (pct + moe), color = voter)) +
  geom_pointrange(position = position_dodge(width = 0.2))
```

### Special case: the x-variable identifies survey waves

If the x-variable in your crosstab uniquely identifies survey waves for which the weights were independently generated, it is best practice to calculate the design effect independently for each wave. `moe_wave_crosstab` does just that. All of the arguments remain the same as in `moe_crosstab`.

```{r}
moe_wave_crosstab(df = illinois, x = year, y = rv, weight = weight)
```
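
To see why the per-wave calculation matters, here is a minimal base R sketch of the two formulas from "How the margin of error is calculated". It is only an illustration, not the package's internal code: the `toy` data frame and the helper names `deff` and `moe` are hypothetical, and it assumes `n` is the unweighted number of respondents and `pct` is a proportion between 0 and 1.

```r
# Hypothetical toy data: two survey waves whose weights were generated independently.
toy <- data.frame(
  wave   = c(1, 1, 1, 1, 2, 2, 2, 2),
  weight = c(1, 1, 1, 1, 0.5, 0.5, 1.5, 1.5)
)

# Design effect, following the formula above:
# length(weights) * sum(weights^2) / (sum(weights)^2)
deff <- function(w) length(w) * sum(w^2) / sum(w)^2

# Margin of error in percentage points, following the formula above.
# Assumption: n is the unweighted respondent count, pct is a proportion (0 to 1).
moe <- function(pct, w, zscore = 1.96) {
  n <- length(w)
  sqrt(deff(w)) * zscore * sqrt((pct * (1 - pct)) / (n - 1)) * 100
}

# One design effect for the pooled sample ...
deff_pooled <- deff(toy$weight)

# ... versus one design effect per wave.
deff_by_wave <- tapply(toy$weight, toy$wave, deff)

deff_pooled
deff_by_wave

# Margin of error for a 50% estimate within each wave.
moe_by_wave <- tapply(toy$weight, toy$wave, function(w) moe(0.5, w))
moe_by_wave
```

In this toy example the equally weighted wave has a design effect of exactly 1, while the unequally weighted wave has a larger one, so pooling the weights would assign both waves the same intermediate value and misstate the margin of error for each.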