---
title: "This time really about geoms"
output: html_notebook
---
Let us explore some common geoms. Look also at OrgPad and at the ggplot2 
cheatsheet by RStudio.


```{r}
library(tidyverse)
```

A large dataset at a glance: 

```{r}
glimpse(diamonds)
```

```{r}
summary(diamonds)
```

# `geom_histogram` for one continuous variable 
```{r}
diamonds %>% 
  ggplot() + geom_histogram(mapping = aes(x = price, y = after_stat(count)),
                            binwidth = 500) # each bin spans 500 USD
  
```
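The help page for `geom_histogram()` lists the variables that the stat computes; any of them can be mapped to `y`: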

```
Computed variables

count
    number of points in bin

density
    density of points in bin, scaled to integrate to 1

ncount
    count, scaled to maximum of 1

ndensity
    density, scaled to maximum of 1
```
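
As a hedged illustration of the computed variables listed above, the sketch below maps `density` instead of `count` to `y`. It assumes ggplot2 3.3.0 or later, where `after_stat()` is available; the object names `p_density` and `bins` are made up for this example. `layer_data()` returns the values the stat computed, so we can check that the bar areas add up to 1.

```{r}
library(ggplot2)

# Same histogram as above, but with the computed variable `density` on the y axis
p_density <- ggplot(diamonds) +
  geom_histogram(mapping = aes(x = price, y = after_stat(density)),
                 binwidth = 500)
p_density

# Inspect what the stat computed: one row per bin
bins <- layer_data(p_density)
head(bins[, c("xmin", "xmax", "count", "density")])

# Bar height times bin width, summed over all bins, is the total area
sum(bins$density * (bins$xmax - bins$xmin))
```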

# `geom_bar` for one discrete variable
```{r}
diamonds %>% 
  ggplot() + geom_bar(mapping = aes(x = clarity))
```
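
`geom_bar()` counts the rows in each category for us. A minimal sketch of the same plot with the counting done by hand, assuming dplyr is available; `clarity_counts` is a name introduced here for illustration:

```{r}
library(dplyr)
library(ggplot2)

# Count the diamonds in each clarity class explicitly
clarity_counts <- diamonds %>% 
  count(clarity)
clarity_counts

# geom_col() draws bars whose heights are taken from the data as they are
clarity_counts %>% 
  ggplot() + geom_col(mapping = aes(x = clarity, y = n))
```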


# `geom_bar` for two discrete variables: the `position` argument

```{r}
diamonds %>% 
  ggplot() + geom_bar(mapping = aes(x = clarity, fill = cut), position = "stack")
```



```{r}
diamonds %>% 
  ggplot() + geom_bar(mapping = aes(x = clarity, fill = cut), position = "fill")
```

```{r}
diamonds %>% 
  ggplot() + geom_bar(mapping = aes(x = clarity, fill = cut), position = "dodge")
```
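
To see what the three `position` values do with the same counts, it can help to compute the numbers behind the bars. The sketch below is only an illustration, with `cut_share` and `share` as made-up names: `n` is what `"stack"` and `"dodge"` draw (on top of each other or side by side), and `share` is what `"fill"` draws.

```{r}
library(dplyr)
library(ggplot2)

cut_share <- diamonds %>% 
  count(clarity, cut) %>%          # n: segment heights for "stack" and "dodge"
  group_by(clarity) %>% 
  mutate(share = n / sum(n)) %>%   # share: segment heights for "fill"
  ungroup()
cut_share
```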




# `geom_point` and its derived plots
A good dataset to demonstrate `geom_point()` is `diamonds`, because it has 
many observations and we will have to deal with overplotting (too many points in 
the exact same place). 

```{r}
glimpse(diamonds)
```
```{r}
summary(diamonds)
```


`geom_point()` shows the relation between two continuous variables. Further variables can optionally be mapped to other aesthetics.

```{r}
diamonds %>% 
  ggplot(mapping = aes(x = carat, y = price)) + 
  geom_point(stat = "identity", position = "identity", na.rm = FALSE, show.legend = NA, 
             inherit.aes = TRUE) # the default argument values, written out explicitly
```
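
An additional variable can be shown by mapping it to another aesthetic. A minimal sketch, assuming we want `clarity` as colour (the choice of `clarity` and the name `p_colour` are just for illustration):

```{r}
library(ggplot2)

# A third variable, clarity, mapped to the colour of the points
p_colour <- ggplot(diamonds, mapping = aes(x = carat, y = price, colour = clarity)) + 
  geom_point()
p_colour
```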
There are several ways to deal with overplotting.

## alpha 
```{r}
diamonds %>% 
  ggplot(mapping = aes(x = carat, y = price)) + 
  geom_point(stat = "identity", position = "identity", alpha = 0.1) #alpha added
```

## tiny shape

```{r}
diamonds %>% 
  ggplot(mapping = aes(x = carat, y = price)) + 
  geom_point(stat = "identity", position = "identity", shape = ".") # shape "." draws each point about one pixel large
```
## `geom_hex`

```{r}
diamonds %>% 
  ggplot(mapping = aes(x = carat, y = price)) + 
  geom_hex(bins = 20) + # geom_hex() requires the hexbin package
  scale_x_continuous(breaks = seq(from = 0, to = 5, by = 0.5)) +
  scale_y_continuous(breaks = seq(from = 0, to = max(diamonds$price), by = 1000))
```


```{r}
diamonds %>% 
  ggplot(mapping = aes(x = carat, y = price)) + 
  geom_density_2d_filled(contour_var = "count", binwidth = 10) + 
  scale_x_continuous(breaks = seq(from = 0, to = 5, by = 0.5)) +
  scale_y_continuous(breaks = seq(from = 0, to = max(diamonds$price), by = 1000))
```


```{r}
diamonds %>% 
  ggplot(mapping = aes(x = carat, y = price)) + 
  geom_smooth(method = "gam")
```
## `jitter` when dataset smaller 
The `table` variable has fewer distinct values than the other numeric variables.
```{r}
diamonds %>% 
  ggplot() + geom_histogram(mapping = aes(x = table, y = after_stat(count)), binwidth = 1)
```

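Plotted against `carat`, `table` is going to be overplotted: what we see below cannot 
possibly be over 53,000 separate points. `table` mostly increments in steps of 1, so the 
points pile up in vertical stripes. 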
```{r}
diamonds %>% 
  ggplot() + geom_point(mapping = aes(x = table, y = carat), shape = ".")
```


```{r}
diamonds %>% 
  ggplot() + geom_point(mapping = aes(x = table, y = carat), 
                        position = position_jitter(width = 0.6, height = 0.05, seed = 1222), 
                        shape = ".")
```
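
`position_jitter()` moves every point by a small random amount. According to the ggplot2 documentation the noise is added in both directions, so `width = 0.6` should shift a point by at most 0.6 to the left or to the right. A rough check of this, with `p_jitter` and `jittered` as names introduced here:

```{r}
library(ggplot2)

p_jitter <- ggplot(diamonds) + 
  geom_point(mapping = aes(x = table, y = carat), 
             position = position_jitter(width = 0.6, height = 0.05, seed = 1222), 
             shape = ".")

# Positions after jittering, as they are actually drawn
jittered <- layer_data(p_jitter)

# The jittered x range should extend the original range by no more than 0.6 on each side
range(diamonds$table)
range(jittered$x)
```

Fixing `seed` makes the jitter reproducible, so the plot looks the same every time the chunk is run.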


## `geom_count` for two discrete variables

```{r}
diamonds %>% 
  ggplot(mapping = aes(x = cut, y = clarity)) +
  geom_count()
```
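
`geom_count()` counts the observations at each combination of the two variables and maps that count to the point size. The table behind the plot can be sketched with dplyr (`combo_counts` is a name made up here):

```{r}
library(dplyr)
library(ggplot2)

# One row per combination of cut and clarity, with the number of diamonds in n
combo_counts <- diamonds %>% 
  count(cut, clarity)
combo_counts

# Combinations sorted from the most to the least frequent
combo_counts %>% 
  arrange(desc(n))
```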



