library(tidyverse)
library(gapminder)

Grouping

Grouping is an “invisible” aesthetic: it has no legend and no visual appearance of its own. You need it in “collective” geoms such as lines, paths, and smooths. These geoms connect observations, or summarize them in one curve, and must know which categorical (or at least discrete) variable defines the sets of observations that belong together.

Grouping in smooth

geom_smooth() shows the trend in your data as a line or a curve, with a confidence-interval ribbon by default.

gapminder %>% 
  filter(year == 2007) %>%
  ggplot(mapping = aes(y = lifeExp, x = gdpPercap)) +
  geom_point()

gapminder %>% 
  filter(year > 2000) %>%
  ggplot(mapping = aes(y = lifeExp, x = gdpPercap)) +
  geom_point() +
  geom_smooth(method = "lm")

When you map a categorical variable to an aesthetic such as color, ggplot groups the data by it, and collective geoms like geom_smooth() are drawn once per group.

gapminder %>% 
  filter(year > 2000) %>%
  # NB: continent is mapped to color
  ggplot(mapping = aes(y = lifeExp, x = gdpPercap, color = continent)) +
  geom_point() +
  geom_smooth(method = "lm")
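
To check the implicit grouping rather than only trusting the picture, one option is to inspect the data that the smooth layer actually draws. The following is a minimal sketch using ggplot2's layer_data(); the object names (p_by_continent, smooth_data, n_smooth_groups) are illustrative, and formula = y ~ x is spelled out only to silence the message about the default formula.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

p_by_continent <- gapminder %>%
  filter(year > 2000) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# layer_data() returns the data a layer draws; layer 2 is geom_smooth()
smooth_data <- layer_data(p_by_continent, 2)

# Number of distinct groups the smooth was fitted for
n_smooth_groups <- length(unique(smooth_data$group))
n_smooth_groups
```

The count should match the number of continents in the filtered data, that is, one fitted line per color.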

Written out explicitly, this is how ggplot reads the plot above (the confidence ribbon is switched off here with se = FALSE):

gapminder %>% 
  filter(year > 2000) %>%
  # NB: continent is mapped to color
  ggplot(mapping = aes(y = lifeExp, x = gdpPercap, color = continent,
                       # NB: explicit grouping
                       group = continent)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

If you want colored points but one common smooth line for all of them, override the grouping implied by color by setting group = 1. Any constant would work; group = 1 is simply the idiom, so learn it by heart.

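gapminder %>% 
  filter(year > 2000) %>%
  # NB: continent is mapped to color
  ggplot(mapping = aes(y = lifeExp, x = gdpPercap, color = continent,
                       # NB: one common group for all layers
                       group = 1)) +
  geom_point() +
  # group = 1 is inherited from ggplot(); repeating it in the layer is optional
  geom_smooth(method = "lm", se = FALSE, mapping = aes(group = 1))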
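
Recent ggplot2 versions may warn that the colour aesthetic was dropped when a smooth layer inherits a color mapping but is forced into a single group. One alternative, sketched here as an assumption about what reads more cleanly and not as the notebook's own approach, is to map color only in the point layer, so the smooth layer never sees a categorical variable. The names recent, p_single and n_groups are illustrative.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

recent <- gapminder %>% filter(year > 2000)

p_single <- ggplot(recent, aes(x = gdpPercap, y = lifeExp)) +
  # color is mapped only here, so it does not affect grouping in other layers
  geom_point(aes(color = continent)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The smooth layer (layer 2) should contain a single group
n_groups <- length(unique(layer_data(p_single, 2)$group))
n_groups
```

Both variants draw one common line. The difference is only where the color mapping lives and therefore which layers it groups.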

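If you map more than one categorical variable, ggplot by default groups the data by all combinations of their values.

To try this out, keep the years after 2000 and convert year to a factor, because shape needs a discrete variable:

gapminder2000 <- gapminder %>% 
  filter(year > 2000) %>% 
  mutate(year = factor(year))
distinct(gapminder2000, year)

This gives two values in year and five values in continent.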

gapminder2000 %>% 
  ggplot(mapping = aes(y = lifeExp, x = gdpPercap, color = continent,
                       shape = year)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

And this is explicitly what ggplot reads here:

gapminder2000 %>% 
  ggplot(mapping = aes(y = lifeExp, x = gdpPercap, color = continent,
                       shape = year,
                       # NB: explicit grouping
                       group = interaction(continent, year))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
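
To see how many groups the combination of continent and year produces, and how few observations some of them contain, the combinations can be counted directly. This is a small sketch that rebuilds the same subset so it runs on its own; combos, n_countries and n_groups are illustrative names.

```r
library(dplyr)
library(gapminder)

# Same subset as in the text: years after 2000, with year as a factor
gapminder2000 <- gapminder %>%
  filter(year > 2000) %>%
  mutate(year = factor(year))

# One row per continent-year combination, i.e. per implicit group
combos <- gapminder2000 %>%
  count(continent, year, name = "n_countries")

n_groups <- nrow(combos)
combos
```

Each row of combos corresponds to one fitted line in the plot. Groups with very few countries yield lines that rest on very little data, which is worth keeping in mind when reading them.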

You can override this default, either with one common group (group = 1) or with one group per continent:

gapminder2000 %>% 
  ggplot(mapping = aes(y = lifeExp, x = gdpPercap, color = continent,
                       shape = year, group = 1)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

gapminder2000 %>% 
  ggplot(mapping = aes(y = lifeExp, x = gdpPercap, color = continent,
                       shape = year, group = continent)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Grouping in lines

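With geom_line, you usually have to set group explicitly, because each line should connect the observations of one unit (here, one country), and that variable is often not mapped to any other aesthetic. Typical use: timelines.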

gapminder %>% 
  ggplot(mapping = aes(x = year, y = lifeExp, color = continent)) +
  # NB: grouping; it could also have been set in ggplot() above.
  # One line per country across all years, colored by continent.
  geom_line(mapping = aes(group = country), alpha = 0.4)
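
The effect of group = country can be made concrete by counting the groups that geom_line draws with and without it. The sketch below uses ggplot2's layer_data(); count_groups() is a hypothetical helper defined only for this illustration.

```r
library(ggplot2)
library(gapminder)

base_plot <- ggplot(gapminder, aes(x = year, y = lifeExp, color = continent))

# Without group: ggplot falls back to the only discrete variable, continent
p_no_group <- base_plot + geom_line()

# With group = country: one line per country
p_by_country <- base_plot + geom_line(aes(group = country), alpha = 0.4)

# Hypothetical helper: number of distinct groups in the first layer of a plot
count_groups <- function(p) length(unique(layer_data(p, 1)$group))

groups_no_group <- count_groups(p_no_group)
groups_by_country <- count_groups(p_by_country)
c(without_group = groups_no_group, by_country = groups_by_country)
```

Without the explicit group, each continent becomes a single zigzag line that runs through all of its countries within every year, which is rarely what a timeline should show.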

If the data do not contain too many years, we can, for example, combine the individual country timelines with boxplots that show the distribution of lifeExp across all countries in each year.

gapminder %>% 
  ggplot(mapping = aes(x = year, y = lifeExp)) + 
  geom_line(mapping = aes(color = continent, group = country)) +
  geom_boxplot(mapping = aes(group = year), fill = NA)
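
The group = year mapping matters because year is numeric in gapminder: without it, geom_boxplot would treat x as continuous and, as far as I can tell, collapse everything into a single box. A quick sketch to confirm that one box per year is computed (p_box, box_stats and n_boxes are illustrative names):

```r
library(ggplot2)
library(gapminder)

p_box <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_boxplot(aes(group = year), fill = NA)

# The boxplot stat returns one row of summary statistics per group
box_stats <- layer_data(p_box, 1)
n_boxes <- nrow(box_stats)
n_boxes
```

The number of rows should equal the number of distinct years in the data.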


By the way, which countries had the lowest life expectancy in Africa and in Asia between 1975 and 1995?

gapminder %>% 
  filter(year > 1975, year < 1995, continent %in% c("Africa", "Asia")) %>% 
  group_by(continent) %>% 
  filter(lifeExp == min(lifeExp)) %>% 
  ungroup() %>% 
  select(country, year, lifeExp)
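
The same per-continent minimum can also be written with dplyr's slice_min(), which some readers find easier to scan than filtering on min(). This is an alternative sketch, not the notebook's original code; it assumes dplyr 1.0.0 or later, and lowest is an illustrative name. The continent column is kept so that each row can be attributed.

```r
library(dplyr)
library(gapminder)

lowest <- gapminder %>%
  filter(year > 1975, year < 1995, continent %in% c("Africa", "Asia")) %>%
  group_by(continent) %>%
  # keep the row(s) with the smallest lifeExp within each continent
  slice_min(lifeExp, n = 1) %>%
  ungroup() %>%
  select(continent, country, year, lifeExp)

lowest
```

Like the filter() version, slice_min() keeps ties by default, so a continent could in principle return more than one row.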

Asia: Cambodia (Khmer Rouge regime). Africa: Rwanda (Hutu genocide against the Tutsi).
