---
title: "ggplot2 statistical transformation objects"
output: html_notebook
---
```{r message=FALSE}
library(gapminder)
library(tidyverse)
```

# Geoms to display summarizing statistics

For each x value, these geoms compute and plot summary y-values: 
a central value, a measure of dispersion, or both.

You have two options to say what values the geom should render:

  - supply individual functions: one for the central value and one each for 
  the lower and upper ends of the dispersion range. Argument slots to fill:
  `fun`, `fun.min`, `fun.max`.  
  - supply a single summary function that returns all of them. 
  Its output must be a data frame with columns `y`, `ymin`, and `ymax`. 
  Argument slot to fill: `fun.data`. 
  
You can either write these functions yourself or use the ready-made ones 
shipped with ggplot2, most of which are wrappers around matching functions from 
the `Hmisc` package. 
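
As a minimal sketch of the second option: a hand-written `fun.data` function only has to return a data frame with the columns `y`, `ymin`, and `ymax`. The helper name `median_iqr` is hypothetical (it is not part of ggplot2 or Hmisc); it reports the median together with the interquartile range.

```{r}
# Hypothetical helper, for illustration only: median with the interquartile range.
# Returns a one-row data frame with the columns expected from a fun.data function.
median_iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  data.frame(y = q[2], ymin = q[1], ymax = q[3])
}

# Quick check on a toy vector
median_iqr(c(1, 2, 3, 4, 5))
```

It could then be passed on as `stat_summary(fun.data = median_iqr, geom = "pointrange")`.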

_Examples of summary functions_


### `geom_errorbar` and `geom_linerange`
Just ranges, without the central value

__Compute the standard error of the mean life expectancy
for each continent in each year. Render it as errorbars 
(i.e. without the mean).__

```{r}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  stat_summary(fun.data = "mean_se", geom = "errorbar", size = 1)
```
__Compute standard deviation of the life expectancy for each continent in each 
year using linerange (i.e. without the mean)__

Use the `ggplot2::mean_sdl` function. NB: it is a wrapper around the 
`Hmisc::smean.sdl` function, which is documented below. By default, the range 
spans two standard deviations on either side of the mean (`mult = 2`). To alter 
this, use the `mult` parameter, just like in the original `Hmisc::smean.sdl` function.
Note how `stat_summary` passes such arguments on: `fun.args = list()`. 
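
To make the effect of `mult` concrete, here is a base R sketch of the computation that `mean_sdl` is documented to perform: the mean plus or minus `mult` standard deviations. The name `mean_sd_range` is hypothetical and the code is an illustration, not the Hmisc implementation.

```{r}
# Hypothetical stand-in for mean_sdl(): mean +/- mult * standard deviation
mean_sd_range <- function(x, mult = 2) {
  x <- x[!is.na(x)]   # drop missing values before summarising
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - mult * s, ymax = m + mult * s)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
mean_sd_range(x)            # default: two standard deviations on each side
mean_sd_range(x, mult = 1)  # one standard deviation on each side
```

In `stat_summary`, the same choice is expressed as `fun.args = list(mult = 1)`.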

For `fun.data`, preferably use these functions:

```
Usage

mean_cl_boot(x, ...)

mean_cl_normal(x, ...)

mean_sdl(x, ...)

median_hilow(x, ...)

Arguments
x 	      a numeric vector

... 	    other arguments passed on to the respective Hmisc function.

Value

A data frame with columns y, ymin, and ymax. 
```
These are wrappers around summary functions from the `Hmisc` package, and they
accept the parameters of the underlying Hmisc functions. The documentation of those functions follows below. 

```
Usage

smean.cl.normal(x, mult=qt((1+conf.int)/2,n-1), conf.int=.95, na.rm=TRUE)

smean.sd(x, na.rm=TRUE)

smean.sdl(x, mult=2, na.rm=TRUE)

smean.cl.boot(x, conf.int=.95, B=1000, na.rm=TRUE, reps=FALSE)

smedian.hilow(x, conf.int=.95, na.rm=TRUE)

```

These cannot be used directly as `fun.data`, since they return a named vector 
rather than a data frame with the columns `y`, `ymin`, and `ymax`. Example:

```{r message=FALSE}
library(Hmisc)
library(magrittr)
c(1,1,1,10,10,10,10) %>% Hmisc::smean.cl.normal() %T>% str() #mean and confidence intervals
```
As seen above, the output is a named vector whose names differ from `y`, `ymin`, `ymax`.
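
One way to bridge that gap is a small adapter that renames the three values. The sketch below assumes the input holds its values in the order centre, lower, upper, as the `smean.cl.normal` output above does. `as_summary_df` is a hypothetical helper, and the input vector is hand-built with made-up numbers that only mimic the shape of the Hmisc output.

```{r}
# Hypothetical adapter: turn a length-3 numeric vector (centre, lower, upper)
# into the one-row data frame layout used for fun.data results.
as_summary_df <- function(v) {
  stopifnot(is.numeric(v), length(v) == 3)
  data.frame(y = v[[1]], ymin = v[[2]], ymax = v[[3]])
}

# Made-up values shaped like the Hmisc output (names Mean, Lower, Upper)
hmisc_like <- c(Mean = 5, Lower = 3, Upper = 7)
as_summary_df(hmisc_like)
```

With Hmisc loaded, this could be used as `stat_summary(fun.data = function(x) as_summary_df(Hmisc::smean.cl.normal(x)), geom = "errorbar")`. The ggplot2 wrappers listed above make writing such an adapter unnecessary for the common cases.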

In `stat_summary`, we can supply either `fun.data` or a separate function for each statistic. These arguments are called `fun` (the central value), `fun.min` 
(the lower dispersion value), and `fun.max` (the upper dispersion value). The linerange example below uses `fun.data`.  


```{r}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1), 
               geom = "linerange", size = 0.7,
               position = position_dodge(width = 0.8))
```
### `geom_crossbar` and `geom_pointrange`
These include the central value

```{r}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  stat_summary(fun.data = "mean_se", geom = "crossbar", size = 0.7)

```
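The same data with `geom_pointrange`, showing the mean with a range of one standard deviation on either side: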
```{r}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1), 
               geom = "pointrange", position = position_dodge(width = 0.8),
               size = 0.5)

```

With `fun`, `fun.min`, and `fun.max`, you have to write your own functions first :-/ (here: helpers for the lower and upper quartiles).  

```{r}
# Custom helpers for the lower and upper quartiles
low_f <- function(x) {quantile(x, probs = 0.25)}
hi_f <- function(x) {quantile(x, probs = 0.75)}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  # Median as the point, quartiles as the ends of the range
  stat_summary(fun = "median", fun.min = "low_f", fun.max = "hi_f", 
               geom = "pointrange", position = position_dodge(width = 0.8),
               size = 0.5)

```

### `geom_smooth`, `stat_smooth`

```{r}
gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) + 
  geom_point() + 
  geom_smooth(method = "lm")
```

### `geom_quantile`, `stat_quantile`

```{r}
gapminder %>% filter(year > 1995) %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp)) + 
  geom_point(alpha = 0.3) + 
  # Quantile regression lines, one per requested quantile (needs the quantreg package)
  geom_quantile(aes(color = factor(after_stat(quantile))), formula = y ~ x, 
                quantiles = c(0.01, 0.25, 0.5, 0.75), size = 2) +
  # Ordinary least-squares fit for comparison
  geom_smooth(formula = y ~ x, method = "lm", color = "black", linetype = 2, se = FALSE)
# Optional addition to the plot: + facet_wrap(~continent)
```

# `stat_function` 


```{r}

# Filter once and reuse the subset for the density and the normal parameters
europe_recent <- gapminder %>% filter(continent == "Europe", year > 2000)
europe_recent %>% 
  ggplot(aes(x = gdpPercap)) + 
  geom_density() +
  # Overlay a normal density with the sample mean and standard deviation
  stat_function(fun = dnorm, color = "red", 
                args = list(mean = mean(europe_recent$gdpPercap), 
                            sd = sd(europe_recent$gdpPercap)))
```





