---
title: "crosstabs"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{crosstabs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup}
library(pollster)
library(dplyr)
library(knitr)
library(ggplot2)
```
Crosstabs can come in [wide or long format](https://en.wikipedia.org/wiki/Wide_and_narrow_data). Each is useful, depending on your purpose. Wide data is best for display tables. Long data is usually better for making plots.
Here is a wide table.
```{r}
crosstab(df = illinois, x = sex, y = educ6, weight = weight) %>%
  kable()
```
And here is long format.
```{r}
crosstab(df = illinois, x = sex, y = educ6, weight = weight, format = "long")
```
By default, row percentages are used. You can also explicitly choose cell or column percentages using the `pct_type` argument. I discourage the use of column percentages, because it is usually clearer to flip the x and y variables and use row percentages, but the option is included to match functionality provided by other standard statistical software.
```{r}
# cell percentages
crosstab(df = illinois, x = sex, y = educ6, weight = weight, pct_type = "cell")
# column percentages
crosstab(df = illinois, x = sex, y = educ6, weight = weight, pct_type = "column")
```
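To make the three percentage types concrete, here is a minimal base R sketch on a small made-up data set. The data frame `toy` and its columns `group`, `answer`, and `w` are hypothetical and are not part of `pollster`. The sketch only illustrates the arithmetic, not how `crosstab` is implemented.

```{r}
# Hypothetical toy data: two groups, a yes/no answer, and a weight
toy <- data.frame(
  group  = c("a", "a", "a", "b", "b", "b"),
  answer = c("yes", "no", "yes", "no", "no", "yes"),
  w      = c(1, 1, 2, 2, 2, 2)
)

# Weighted counts: rows are groups, columns are answers
counts <- tapply(toy$w, list(toy$group, toy$answer), sum)

row_pct  <- prop.table(counts, margin = 1) * 100  # each row sums to 100
col_pct  <- prop.table(counts, margin = 2) * 100  # each column sums to 100
cell_pct <- prop.table(counts) * 100              # the whole table sums to 100

row_pct
col_pct
cell_pct

# Column percentages are the row percentages of the flipped table
flipped_row_pct <- prop.table(t(counts), margin = 1) * 100
all.equal(t(col_pct), flipped_row_pct)
```

The last comparison is the reason flipping the x and y variables is a drop-in replacement for column percentages: the numbers are the same, only the table orientation differs.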
To make a graph, just feed your `tibble` output to a `ggplot2` function.
```{r, fig.width=5.6}
crosstab(df = illinois, x = sex, y = educ6, weight = weight, format = "long") %>%
  ggplot(aes(x = educ6, y = pct, fill = sex)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  labs(title = "Educational attainment of the Illinois adult population by gender")
```
## Margin of error
### How the margin of error is calculated
The margin of error is calculated including the design effect of the sample weights, using the following formula:
`sqrt(design effect)*zscore*sqrt((pct*(1-pct))/(n-1))*100`
The design effect is calculated using the formula `length(weights)*sum(weights^2)/(sum(weights)^2)`.
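As a rough worked example, the two formulas can be evaluated by hand in base R. The weights and the 40% estimate below are made up for illustration. I am also assuming that `pct` enters the formula as a proportion between 0 and 1 and that `n` is the number of respondents, so check the package source if the exact convention matters to you.

```{r}
# Hypothetical inputs, not taken from the illinois data
weights <- c(0.5, 1.0, 1.5, 2.0, 1.0, 0.8, 1.2, 1.0)
pct     <- 0.4              # estimated share, as a proportion (assumption)
zscore  <- 1.96             # 95% confidence
n       <- length(weights)  # number of respondents (assumption)

# Design effect: equals 1 when all weights are equal, larger otherwise
design_effect <- length(weights) * sum(weights^2) / (sum(weights)^2)

# Margin of error in percentage points
moe <- sqrt(design_effect) * zscore * sqrt((pct * (1 - pct)) / (n - 1)) * 100

design_effect
moe
```

Because the design effect sits under a square root, unequal weights widen the margin of error by a factor of `sqrt(design_effect)` relative to an unweighted sample of the same size.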
------
Get a crosstab with the margin of error in a separate column using the `moe_crosstab` function. By default, a z-score of 1.96 (a 95% confidence interval) is used. Supply your own desired z-score using the `zscore` argument. Only row and cell percents are supported. By default, the table format is long because I anticipate that making visualizations will be the most common use case for this function.
```{r}
moe_crosstab(illinois, educ6, voter, weight)
```
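If you want a confidence level other than 95%, you need the matching z-score. Here is a small base R sketch that derives two-sided z-scores with `qnorm`. The variable names are just for illustration.

```{r}
# Two-sided z-scores for some common confidence levels
conf_levels <- c(0.90, 0.95, 0.99)
z <- qnorm(1 - (1 - conf_levels) / 2)
names(z) <- paste0(conf_levels * 100, "%")
round(z, 3)
```

Any of these values can then be passed to the `zscore` argument, for example `zscore = qnorm(0.95)` for a 90% interval.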
A wide format table looks like this.
```{r}
moe_crosstab(illinois, educ6, voter, weight, format = "wide")
```
`ggplot2` offers [multiple ways](http://www.sthda.com/english/wiki/ggplot2-error-bars-quick-start-guide-r-software-and-data-visualization) to visualize the margin of error. Here is one good option. (Please note: this plot needs ggplot2 >= [3.3.0](https://www.tidyverse.org/blog/2020/03/ggplot2-3-3-0/). With an older version you'll get an error message.)
```{r, fig.width=5}
illinois %>%
  filter(year == 2016) %>%
  moe_crosstab(educ6, voter, weight) %>%
  ggplot(aes(x = pct, y = educ6, xmin = (pct - moe), xmax = (pct + moe),
             color = voter)) +
  geom_pointrange(position = position_dodge(width = 0.2))
```
### Special case: the x-variable identifies survey waves
If the x-variable in your crosstab uniquely identifies survey waves for which the weights were independently generated, it is best practice to calculate the design effect independently for each wave. `moe_wave_crosstab` does just that. All of the arguments remain the same as in `moe_crosstab`.
```{r}
moe_wave_crosstab(df = illinois, x = year, y = rv, weight = weight)
```
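To see why the per-wave calculation matters, here is a minimal base R sketch that applies the design effect formula given earlier to each wave separately and to the pooled sample. The data frame `waves`, its columns, and the helper `wave_design_effect` are hypothetical. This illustrates the idea only and is not the internals of `moe_wave_crosstab`.

```{r}
# Hypothetical two-wave data: equal weights in 2016, unequal weights in 2020
waves <- data.frame(
  wave = c(2016, 2016, 2016, 2016, 2020, 2020, 2020, 2020),
  w    = c(1, 1, 1, 1, 0.5, 0.5, 1.5, 1.5)
)

# Illustrative helper implementing the design effect formula
wave_design_effect <- function(w) length(w) * sum(w^2) / (sum(w)^2)

# One design effect per wave
deff_by_wave <- tapply(waves$w, waves$wave, wave_design_effect)

# A single design effect for the pooled sample, for comparison
deff_pooled <- wave_design_effect(waves$w)

deff_by_wave
deff_pooled
```

In this made-up example the pooled value falls between the two per-wave values, so using it would overstate the margin of error for the evenly weighted wave and understate it for the other.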