---
title: "crosstabs"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{crosstabs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(pollster)
library(dplyr)
library(knitr)
library(ggplot2)
```

Crosstabs can come in [wide or long format](https://en.wikipedia.org/wiki/Wide_and_narrow_data). Each is useful, depending on your purpose. Wide data is best for display tables. Long data is usually better for making plots, for instance..

Here is a wide table.

```{r}
crosstab(df = illinois, x = sex, y = educ6, weight = weight) %>%
  kable()
```

And here is long format.

```{r}
crosstab(df = illinois, x = sex, y = educ6, weight = weight, format = "long")
```

By default, row percentages are used. You can also explicitly choose cell or column percentages using the `pct_type` argument. I discourage the use of column percentages--it's better to just flip the x and y variables and make row percents--but the option is included to match functionality provided by other standard statistical software.

```{r}
# cell percentages
crosstab(df = illinois, x = sex, y = educ6, weight = weight, pct_type = "cell")

# column percentages
crosstab(df = illinois, x = sex, y = educ6, weight = weight, pct_type = "column")
```

To make a graph, just feed your `tibble` output to a `ggplot2` function.

```{r, fig.width=5.6}
crosstab(df = illinois, x = sex, y = educ6, weight = weight, format = "long") %>%
  ggplot(aes(x = educ6, y = pct, fill = sex)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  labs(title = "Educational attainment of the Illinois adult population by gender")
```

## Margin of error

### How the margin of error is calculated

The margin of error is calculated including the design effect of the sample weights, using the following formula:

`sqrt(design effect)*zscore*sqrt((pct*(1-pct))/(n-1))*100`

The design effect is calculated using the formula `length(weights)*sum(weights^2)/(sum(weights)^2)`.

------

Get at topline table with the margin of error in a separate column using the `moe_crosstab` function. By default, a z-score of 1.96 (95% confidence interval is used). Supply your own desired z-score using the `zscore` argument. Only row and cell percents are supported. By default, the table format is long because I anticipate making visualizations will be the most common use-case for this graphic.

```{r}
moe_crosstab(illinois, educ6, voter, weight)
```

A wide format table looks like this.

```{r}
moe_crosstab(illinois, educ6, voter, weight, format = "wide")
```

`ggplot2` offers [multiple ways](http://www.sthda.com/english/wiki/ggplot2-error-bars-quick-start-guide-r-software-and-data-visualization) to visualize the margin of error. Here is one good option. (Please note, if you don't have ggplot2 >= [3.3.0](https://www.tidyverse.org/blog/2020/03/ggplot2-3-3-0/) you'll get an error message.)

```{r, fig.width=5}
illinois %>%
  filter(year == 2016) %>%
  moe_crosstab(educ6, voter, weight) %>%
  ggplot(aes(x = pct, y = educ6, xmin = (pct - moe), xmax = (pct + moe),
             color = voter)) +
  geom_pointrange(position = position_dodge(width = 0.2))
```

### Special case, the x-variable identifies survey waves

If the x-variable in your crosstab uniquely identifies survey waves for which the weights were independently generated, it is best practice to calculate the design effect independently for each wave. `moe_wave_crosstab` does just that. All of the arguments remain the same as in `moe_crosstab`.

```{r}
moe_wave_crosstab(df = illinois, x = year, y = rv, weight = weight)
```