class: center, middle, inverse, title-slide

.title[
# Lecture 17: More on ggplot
]
.author[
### Robin Liu
]
.institute[
### UCSB
]
.date[
### 2022-07-20
]

---
# Prereq

For this lecture we need to install and load the `socviz` package.

```r
install.packages("socviz")
library(socviz)
```

https://socviz.co/

---
# Aesthetic mappings

Last time we set the aesthetic mappings at the top level: in the call to `ggplot`.

```r
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(alpha = 0.3) +
  scale_x_log10(labels = scales::label_dollar())
```

<img src="Lec17_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---
# Aesthetic mappings

We can set the mapping directly in the geom.

```r
p <- ggplot(data = gapminder)
p + geom_point(mapping = aes(x = gdpPercap, y = lifeExp), alpha = 0.3) +
  scale_x_log10(labels = scales::label_dollar())
```

<img src="Lec17_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---
# Aesthetic mappings

But then further geoms do not *inherit* the mapping.

```r
p <- ggplot(data = gapminder)
p + geom_point(mapping = aes(x = gdpPercap, y = lifeExp), alpha = 0.3) +
  geom_smooth(method = "lm") +
  scale_x_log10(labels = scales::label_dollar())
```

```
## `geom_smooth()` using formula 'y ~ x'
```

```
## Error in `check_required_aesthetics()`:
## ! stat_smooth requires the following missing aesthetics: x and y
```

<img src="Lec17_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

The `x` and `y` aesthetics are *required* in `geom_smooth`.

---
# Aesthetic mappings

Mapping `x` and `y` aesthetics in `geom_smooth`.

```r
p <- ggplot(data = gapminder)
p + geom_point(mapping = aes(x = gdpPercap, y = lifeExp), alpha = 0.3) +
  geom_smooth(mapping = aes(x = gdpPercap, y = lifeExp), method = "lm") +
  scale_x_log10(labels = scales::label_dollar())
```

<img src="Lec17_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---
# Aesthetic mappings

Better: mappings in the top layer are *inherited* by the geoms. Less duplication.

```r
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(alpha = 0.3) +
  geom_smooth(method = "lm") +
  scale_x_log10(labels = scales::label_dollar())
```

<img src="Lec17_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---
class: inverse, center, middle

# Grouped data

---
# Grouped data

Recall what's in `gapminder`:

```r
head(gapminder, 5)
```

```
## # A tibble: 5 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
```

---
# Grouped data

```r
gapminder |>
  filter(country == "Afghanistan") |>
  ggplot(mapping = aes(x = year, y = gdpPercap)) +
  geom_line()
```

<img src="Lec17_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---
# Grouped data

```r
gapminder |>
  ggplot(mapping = aes(x = year, y = gdpPercap)) +
  geom_line()
```

<img src="Lec17_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

--

What happened?

---
# Grouped data

`ggplot` did not know that the rows belong to different countries, so it joined every observation into a single line. We need to specify countries as a `group`.

```r
gapminder |>
  ggplot(mapping = aes(x = year, y = gdpPercap)) +
  geom_line(aes(group = country))
```

<img src="Lec17_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

The plot is still a mess.

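---
# Grouped data

To see exactly what the `group` aesthetic changes, here is a minimal sketch on a hypothetical toy data frame (`toy`, with made-up countries "A" and "B", is not part of `gapminder`). It uses `layer_data()` from `ggplot2` to inspect how many separate lines each plot would draw.

```r
library(ggplot2)

# Hypothetical toy data: two "countries" observed in the same three years.
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(c(2000, 2005, 2010), times = 2),
  value   = c(1, 2, 3, 10, 11, 12)
)

# Without `group`, geom_line() treats all six rows as one path.
p_single <- ggplot(toy, aes(x = year, y = value)) +
  geom_line()

# With `group = country`, each country gets its own line.
p_grouped <- ggplot(toy, aes(x = year, y = value)) +
  geom_line(aes(group = country))

# layer_data() returns the data ggplot2 computed for a layer;
# its `group` column records which line each row belongs to.
n_lines_single  <- length(unique(layer_data(p_single)$group))
n_lines_grouped <- length(unique(layer_data(p_grouped)$group))
```

Printing `p_single` shows one zigzag path through all six points, which is the same thing that went wrong in the full `gapminder` plot.
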
---
# Grouped data

Let's further split up the countries by continent.

```r
p <- ggplot(gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line(aes(group = country)) +
* facet_wrap(~ continent)
```

<img src="Lec17_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

---
# Grouped data

**Very** useful visualization tool: facet the data by a factor.

Allows us to compare data among groups on the same scale.

<img src="Lec17_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

---
# Facets

Facets are especially effective for data with many factors.

[General Social Survey 2016](https://rdrr.io/github/kjhealy/socviz/man/gss_sm.html)

```r
# library(socviz)
gss_sm
```

```
## # A tibble: 2,867 x 32
##     year    id ballot       age childs sibs   degree race  sex   region income16
##    <dbl> <dbl> <labelled> <dbl>  <dbl> <labe> <fct>  <fct> <fct> <fct>  <fct>
##  1  2016     1 1             47      3 2      Bache~ White Male  New E~ $170000~
##  2  2016     2 2             61      0 3      High ~ White Male  New E~ $50000 ~
##  3  2016     3 3             72      2 3      Bache~ White Male  New E~ $75000 ~
##  4  2016     4 1             43      4 3      High ~ White Fema~ New E~ $170000~
##  5  2016     5 3             55      2 2      Gradu~ White Fema~ New E~ $170000~
##  6  2016     6 2             53      2 2      Junio~ White Fema~ New E~ $60000 ~
##  7  2016     7 1             50      2 2      High ~ White Male  New E~ $170000~
##  8  2016     8 3             23      3 6      High ~ Other Fema~ Middl~ $30000 ~
##  9  2016     9 1             45      3 5      High ~ Black Male  Middl~ $60000 ~
## 10  2016    10 3             71      4 1      Junio~ White Male  Middl~ $60000 ~
## # ... with 2,857 more rows, and 21 more variables: relig <fct>, marital <fct>,
## #   padeg <fct>, madeg <fct>, partyid <fct>, polviews <fct>, happy <fct>,
## #   partners <fct>, grass <fct>, zodiac <fct>, pres12 <labelled>,
## #   wtssall <dbl>, income_rc <fct>, agegrp <fct>, ageq <fct>, siblings <fct>,
## #   kids <fct>, religion <fct>, bigregion <fct>, partners_rc <fct>, obama <dbl>
```

---
# Facets

```r
p <- ggplot(data = gss_sm, mapping = aes(y = degree))
p + geom_bar()
```

<img src="Lec17_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---
# Facets

```r
p <- ggplot(data = gss_sm, mapping = aes(y = degree))
p + geom_bar() +
* facet_wrap(~ race)
```

<img src="Lec17_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" />

---
# Facets

```r
p <- ggplot(data = gss_sm, mapping = aes(y = degree))
p + geom_bar() +
* facet_grid(sex ~ race)
```

<img src="Lec17_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

---
# Facets

<img src="Lec17_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />
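---
# Facets

As a sketch of how the two faceting functions differ, the example below uses `mpg`, a dataset bundled with `ggplot2`. That choice is an assumption made here so the code runs without `socviz`; the object names `p_wrap` and `p_grid` are illustrative only.

```r
library(ggplot2)

# Base plot: count of cars in each class (mpg ships with ggplot2).
p <- ggplot(data = mpg, mapping = aes(y = class)) +
  geom_bar()

# facet_wrap(): one panel per value of a single variable.
# `drv` takes three values, so this gives three panels.
p_wrap <- p + facet_wrap(~ drv)

# facet_grid(): rows ~ columns, one panel per combination.
# `year` takes two values, so this gives a 2 x 3 grid.
p_grid <- p + facet_grid(year ~ drv)

# The built plot's layout table has one row per panel.
n_wrap <- nrow(ggplot_build(p_wrap)$layout$layout)
n_grid <- nrow(ggplot_build(p_grid)$layout$layout)
```

`facet_grid()` keeps a panel for every row/column combination, even an empty one, which is what makes comparisons across the grid easy to read.
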
---
# geom_bar vs geom_col

```r
gss_sm |>
  ggplot(mapping = aes(y = fct_infreq(degree))) +
* geom_bar()
```

<img src="Lec17_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />

---
# geom_bar vs geom_col

```r
gss_sm |>
  group_by(degree) |>
  summarize(count = n()) |>
  ggplot(mapping = aes(x = count,
                       y = reorder(degree, count, decreasing = TRUE))) +
* geom_col()
```

<img src="Lec17_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />

---
# group_by, summarize

A complicated `group_by`:

```r
(rel_by_region <- gss_sm |>
  group_by(bigregion, religion) |>
  summarize(N = n()) |>
  mutate(freq = N / sum(N),
         pct = round((freq*100), 0)))
```

```
## # A tibble: 24 x 5
## # Groups:   bigregion [4]
##    bigregion religion       N    freq   pct
##    <fct>     <fct>      <int>   <dbl> <dbl>
##  1 Northeast Protestant   158 0.324      32
##  2 Northeast Catholic     162 0.332      33
##  3 Northeast Jewish        27 0.0553      6
##  4 Northeast None         112 0.230      23
##  5 Northeast Other         28 0.0574      6
##  6 Northeast <NA>           1 0.00205     0
##  7 Midwest   Protestant   325 0.468      47
##  8 Midwest   Catholic     172 0.247      25
##  9 Midwest   Jewish         3 0.00432     0
## 10 Midwest   None         157 0.226      23
## # ... with 14 more rows
```

---
# group_by, summarize

```r
p <- ggplot(rel_by_region,
            mapping = aes(x = bigregion, y = pct, fill = religion))
p + geom_col() +
  labs(x = "Region", y = "Percent", fill = "Religion")
```

<img src="Lec17_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />

We used `geom_col()` because the percentages are already computed.

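---
# group_by, summarize

Why do the percentages add up to (about) 100 within each region? `summarize()` drops only the last grouping variable, so the result is still grouped by region and `sum(N)` is a per-region total. Below is a minimal sketch on a hypothetical toy tibble; the names `toy`, `toy_pct`, and `p_toy` are made up for illustration, and `.groups = "drop_last"` simply spells out dplyr's default behaviour here.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per respondent.
toy <- tibble(
  region   = c("East", "East", "East", "West", "West"),
  religion = c("A", "A", "B", "A", "B")
)

toy_pct <- toy |>
  group_by(region, religion) |>
  # Count rows per (region, religion); the result stays grouped by region.
  summarize(N = n(), .groups = "drop_last") |>
  # sum(N) is therefore computed within each region.
  mutate(freq = N / sum(N),
         pct  = round(freq * 100, 0))

# The values are already computed, so geom_col() is the right geom.
p_toy <- ggplot(toy_pct,
                mapping = aes(x = region, y = pct, fill = religion)) +
  geom_col()
```

If the result were ungrouped before `mutate()`, `sum(N)` would be the grand total and the bars would show each cell's share of the whole sample instead.
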
---
# Dodged barplot

```r
p <- ggplot(rel_by_region,
            mapping = aes(x = bigregion, y = pct, fill = religion))
p + geom_col(position = "dodge") +
  labs(x = "Region", y = "Percent", fill = "Religion") +
  theme(legend.position = "top")
```

<img src="Lec17_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />

---
# Dodged barplot

```r
p <- ggplot(rel_by_region,
            mapping = aes(y = religion, x = pct, fill = religion))
p + geom_col(position = "dodge") +
  labs(x = "Percent", y = NULL, fill = "Religion") +
  guides(fill = "none") +
  facet_grid(~ bigregion) +
  theme_minimal()
```

<img src="Lec17_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" />

---
# Organ data example

```r
set.seed(100)
organdata |>
  select(1:6) |>
  slice_sample(n = 5)
```

```
## # A tibble: 5 x 6
##   country        year       donors   pop pop_dens   gdp
##   <chr>          <date>      <dbl> <int>    <dbl> <int>
## 1 Switzerland    1995-01-01   13    7041     17.1 26304
## 2 Germany        1993-01-01   13.9 81156     22.7 19983
## 3 Germany        NA           NA      NA     NA      NA
## 4 United Kingdom 1996-01-01   13.6 58139     23.9 20839
## 5 Switzerland    1999-01-01   14.4  7144     17.3 28562
```

---
# Organ data example

```r
p <- ggplot(data = organdata, mapping = aes(x = year, y = donors))
p + geom_line(aes(group = country)) +
  facet_wrap(~ country)
```

```
## Warning: Removed 34 row(s) containing missing values (geom_path).
```

<img src="Lec17_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" />

---
# Organ data

Cleveland dotplot demo

```r
(by_country <- organdata |>
  group_by(consent_law, country) |>
  summarize(donors_mean = mean(donors, na.rm = TRUE),
            donors_sd = sd(donors, na.rm = TRUE)))
```

```
## # A tibble: 17 x 4
## # Groups:   consent_law [2]
##    consent_law country        donors_mean donors_sd
##    <chr>       <chr>                <dbl>     <dbl>
##  1 Informed    Australia             10.6     1.14
##  2 Informed    Canada                14.0     0.751
##  3 Informed    Denmark               13.1     1.47
##  4 Informed    Germany               13.0     0.611
##  5 Informed    Ireland               19.8     2.48
##  6 Informed    Netherlands           13.7     1.55
##  7 Informed    United Kingdom        13.5     0.775
##  8 Informed    United States         20.0     1.33
##  9 Presumed    Austria               23.5     2.42
## 10 Presumed    Belgium               21.9     1.94
## 11 Presumed    Finland               18.4     1.53
## 12 Presumed    France                16.8     1.60
## 13 Presumed    Italy                 11.1     4.28
## 14 Presumed    Norway                15.4     1.11
## 15 Presumed    Spain                 28.1     4.96
## 16 Presumed    Sweden                13.1     1.75
## 17 Presumed    Switzerland           14.2     1.71
```

---
# Organ data

Cleveland dotplot demo

<img src="Lec17_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" />

---
# Organ data

Cleveland dotplot demo

<img src="Lec17_files/figure-html/unnamed-chunk-29-1.png" style="display: block; margin: auto;" />

---
# Plot text directly

```r
(by_country <- organdata |>
  group_by(consent_law, country) |>
  summarize(donors_mean = mean(donors, na.rm = TRUE),
            roads_mean = mean(roads, na.rm = TRUE)))
```

```
## # A tibble: 17 x 4
## # Groups:   consent_law [2]
##    consent_law country        donors_mean roads_mean
##    <chr>       <chr>                <dbl>      <dbl>
##  1 Informed    Australia             10.6      105.
##  2 Informed    Canada                14.0      109.
##  3 Informed    Denmark               13.1      102.
##  4 Informed    Germany               13.0      113.
##  5 Informed    Ireland               19.8      118.
##  6 Informed    Netherlands           13.7       76.1
##  7 Informed    United Kingdom        13.5       67.9
##  8 Informed    United States         20.0      155.
##  9 Presumed    Austria               23.5      150.
## 10 Presumed    Belgium               21.9      155.
## 11 Presumed    Finland               18.4       93.6
## 12 Presumed    France                16.8      156.
## 13 Presumed    Italy                 11.1      122.
## 14 Presumed    Norway                15.4       70.0
## 15 Presumed    Spain                 28.1      161.
## 16 Presumed    Sweden                13.1       72.3
## 17 Presumed    Switzerland           14.2       96.4
```

---
# Plot text directly

```r
p <- ggplot(data = by_country,
            mapping = aes(x = roads_mean, y = donors_mean))
p + geom_point() +
  geom_text(mapping = aes(label = country))
```

<img src="Lec17_files/figure-html/unnamed-chunk-31-1.png" style="display: block; margin: auto;" />

---
# Plot text directly

```r
library(ggrepel)
p <- ggplot(data = by_country,
            mapping = aes(x = roads_mean, y = donors_mean))
p + geom_point() +
  geom_text_repel(mapping = aes(label = country))
```

<img src="Lec17_files/figure-html/unnamed-chunk-32-1.png" style="display: block; margin: auto;" />

---
# Plot text directly

```r
library(ggrepel)
p <- ggplot(data = by_country,
            mapping = aes(x = roads_mean, y = donors_mean))
p + geom_point() +
  geom_text_repel(data = filter(by_country, donors_mean > 22),
                  mapping = aes(label = country))
```

<img src="Lec17_files/figure-html/unnamed-chunk-33-1.png" style="display: block; margin: auto;" />

---
class: inverse, middle, center

# Other aesthetics

---
# Mapping other aesthetics

```r
(by_country <- organdata |>
  group_by(consent_law, country) |>
  summarize(donors_mean = mean(donors, na.rm = TRUE),
            roads_mean = mean(roads, na.rm = TRUE),
            pop_mean = mean(pop, na.rm = TRUE),
            external_mean = mean(external, na.rm = TRUE)))
```

```
## # A tibble: 17 x 6
## # Groups:   consent_law [2]
##    consent_law country        donors_mean roads_mean pop_mean external_mean
##    <chr>       <chr>                <dbl>      <dbl>    <dbl>         <dbl>
##  1 Informed    Australia             10.6      105.    18318.          393
##  2 Informed    Canada                14.0      109.    29608.          411.
##  3 Informed    Denmark               13.1      102.     5257.          532.
##  4 Informed    Germany               13.0      113.    80255.          391.
##  5 Informed    Ireland               19.8      118.     3674.          394
##  6 Informed    Netherlands           13.7       76.1   15548.          286.
##  7 Informed    United Kingdom        13.5       67.9   58187.          288.
##  8 Informed    United States         20.0      155.   269330.          530
##  9 Presumed    Austria               23.5      150.     7927.          507.
## 10 Presumed    Belgium               21.9      155.    10153.          542.
## 11 Presumed    Finland               18.4       93.6    5112.          722.
## 12 Presumed    France                16.8      156.    58056.          603.
## 13 Presumed    Italy                 11.1      122.    57360.          369.
## 14 Presumed    Norway                15.4       70.0    4386.          423.
## 15 Presumed    Spain                 28.1      161.    39666.          377.
## 16 Presumed    Sweden                13.1       72.3    8789.          396.
## 17 Presumed    Switzerland           14.2       96.4    7037.          488.
```

---
# Other aesthetics

```r
p <- ggplot(data = by_country,
            mapping = aes(x = roads_mean, y = external_mean,
                          color = donors_mean, size = pop_mean,
                          shape = consent_law))
p + geom_point() +
  scale_size_area(max_size = 8) +
  scale_colour_distiller(palette = "YlGnBu")
```

<img src="Lec17_files/figure-html/unnamed-chunk-35-1.png" style="display: block; margin: auto;" />

---
# Discrete color palettes

```r
organdata |>
  group_by(world, country) |>
  summarize(donors_mean = mean(donors, na.rm = TRUE)) |>
  ggplot(mapping = aes(x = donors_mean, y = country, fill = world)) +
  geom_col() +
  scale_fill_brewer(type = "qual")
```

<img src="Lec17_files/figure-html/unnamed-chunk-36-1.png" style="display: block; margin: auto;" />

---
# Summary

- Faceting by a discrete variable
- The `group` aesthetic

--

Each `geom_` function requires certain aesthetics.

- `geom_point()` requires `x` and `y`
- `geom_pointrange()` requires `x` and `y`, plus `ymin` and `ymax` (or `xmin` and `xmax` for a horizontal range)
- Different geoms can use the same aesthetics: `x`, `y`, `color`, and `fill`.

--

Explore the different geoms and pick one suitable for your project.

https://r-graph-gallery.com/ggplot2-package.html

---
# For fun

https://twitter.com/accidental__aRt

https://github.com/djnavarro/jasmines

https://www.data-imaginist.com/art