# A tibble: 234 × 11
manufacturer model displ year cyl trans drv cty hwy fl class
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
3 audi a4 2 2008 4 manu… f 20 31 p comp…
4 audi a4 2 2008 4 auto… f 21 30 p comp…
5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
# … with 224 more rows
# if you want to inspect the data:
# ?mpg
Bare plot syntax
Then we create a simple plot:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
mapping is a special argument: combined with the aes() function, it maps variables in the data to visual properties of the plot (aesthetics). Besides x and y, aes() accepts other aesthetics, such as colour, shape or size.
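As a minimal sketch of mapping an additional aesthetic, the snippet below colours the points by the class column of mpg. The object names p_mapped and p_fixed are illustrative only. The second plot shows the contrast with a fixed value, which is set outside aes() and is not tied to any variable.

```r
library(ggplot2)

# colour is mapped to a variable: one colour per car class, with a legend
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = class))

# colour is a constant: every point is blue and no legend is drawn
p_fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), colour = "blue")

p_mapped
p_fixed
```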
We can use {magrittr}’s pipe operator (%>%) or the native pipe (|>, available since R 4.1) to pass the data as an argument:
mpg %>%
  ggplot() +
  geom_point(mapping = aes(x = displ, y = hwy))
mpg |>
  ggplot() +
  geom_point(mapping = aes(x = displ, y = hwy))
The mapping argument
We can also specify the mapping argument inside the ggplot call:
mpg %>%
  ggplot(mapping = aes(x = displ, y = hwy)) +
  geom_point()
The aesthetics specified in this way are inherited by default by every subsequent geom_* (geometry) layer. We can always override them inside the geom_* function.
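A small sketch of inheriting and overriding aesthetics follows. The object name p_override is illustrative. The points inherit x, y and colour from the ggplot() call, while the smoother removes the inherited colour by setting it to NULL, so a single line is fitted to all the data.

```r
library(ggplot2)

p_override <- ggplot(mpg, mapping = aes(x = displ, y = hwy, colour = drv)) +
  # inherits x, y and colour from ggplot()
  geom_point() +
  # overrides the inherited colour mapping: one smoother for all points
  geom_smooth(mapping = aes(colour = NULL))

p_override
```

Passing inherit.aes = FALSE to a geom is the stricter alternative: the layer then ignores the global mapping altogether and needs its own complete aes().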
Scales
{ggplot2} also has elegant methods to change the scales of a plot:
mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(aes(color = class, shape = drv), size = 2) +
  scale_colour_viridis_d() +
  scale_y_log10() +
  labs(
    title = "Engine displacement vs Highway consumption",
    subtitle = "Y is in log scale",
    y = ""
  ) +
  theme(
    plot.title.position = "plot",
    legend.position = "left"
  )
To interpret the scale_* layers, read the function name as three components separated by underscores:
The first component is always scale.
The second component denotes the aesthetic being scaled. In this case:
scale_colour will affect the color.
scale_y will affect the scale of the y variable.
The third component denotes the type of scale or transformation applied:
viridis_d denotes that the viridis discrete colormap is used.
log10 applies a log10 transform to the axis.
Note that the axis gridlines and labels are scaled accordingly: they are still expressed in the original units of hwy, just spaced logarithmically. This would not have happened if we had transformed the y variable directly (e.g. with log10(hwy)): the axis would then show log values, which makes the plot much harder to interpret.
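To make the difference concrete, here is a hedged sketch of the two approaches side by side. The object names p_scale and p_data are illustrative.

```r
library(ggplot2)

# Option A: transform the scale. The axis labels stay in the original
# units of hwy and only their spacing becomes logarithmic.
p_scale <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_y_log10()

# Option B: transform the variable. The axis now shows log10(hwy),
# so the reader has to convert the values back mentally.
p_data <- ggplot(mpg, aes(x = displ, y = log10(hwy))) +
  geom_point()

p_scale
p_data
```

The points sit in the same relative positions in both plots. Only the axis breaks and labels differ.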
Coordinate planes
Here we use an example of a slightly more advanced data transformation. The mutate() function (from the {dplyr} package) can apply the same transformation to multiple columns. Instead of manually writing down every column, we can use the across() function, which has a .cols argument and a .fns argument. To the first one we pass where(is_character), which automatically selects all the columns of our dataset that are of type character. The second argument is the transformation applied to those columns, in this case a type conversion into factors.
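The behaviour of across() is easier to see on a tiny table. The sketch below uses a made-up tibble called toy. To stay self-contained with {dplyr} alone, it also assumes the base R functions is.character() and as.factor() in place of is_character and as_factor.

```r
library(dplyr)

# hypothetical toy data: two character columns and one numeric column
toy <- tibble(
  name  = c("b", "a", "b"),
  kind  = c("x", "y", "x"),
  value = c(1, 2, 3)
)

converted <- toy %>%
  # only the columns for which is.character() is TRUE are transformed
  mutate(across(.cols = where(is.character), .fns = as.factor))

# inspect the resulting column types
sapply(converted, class)
```

One design note: as.factor() sorts the levels alphabetically, whereas forcats::as_factor() keeps them in order of first appearance, which can change the order of bars or legend entries in a plot.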
With a simple + coord_*, we can change the coordinates of our plots:
mpg %>%
  # change the column type from char to factor
  mutate(across(.cols = where(is_character), .fns = as_factor)) %>%
  ggplot(aes(x = manufacturer)) +
  geom_bar(aes(fill = drv)) +
  scale_fill_viridis_d() +
  coord_polar()
mpg %>%
  # change the column type from char to factor
  mutate(across(.cols = where(is_character), .fns = as_factor)) %>%
  ggplot(aes(x = manufacturer)) +
  geom_bar(aes(fill = drv)) +
  scale_fill_viridis_d() +
  coord_flip()
While coord_polar() might be exotic or niche, coord_flip() is an elegant way to swap the axes of our plot. We could simply swap the x and y arguments of the aes() mapping, but this has not always worked: geom_bar(), for example, traditionally takes only an x aesthetic and computes y “under the hood” (ggplot2 3.3.0 and later also accept a y aesthetic on its own).
Facets
The facet_wrap() “layer” can be used to split the plot into different facets.
The first argument of facet_wrap() should be a formula, which you create with ~ followed by a variable name (here “formula” is the name of a data structure in R, not a synonym for “equation”). The variable that you pass to facet_wrap() should be discrete.
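As a minimal sketch of the formula interface, the snippet below splits a scatterplot into one panel per value of class. The object name p_facets and the choice of nrow = 2 are illustrative, not taken from the original example.

```r
library(ggplot2)

p_facets <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  # one panel per car class, arranged in two rows
  facet_wrap(~ class, nrow = 2)

p_facets
```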
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
As said above, the geoms can take a mapping argument. However, not every aesthetic is understood by every geom: for example, the shape aesthetic has no effect in geom_line().
Sometimes combining mappings makes a visualisation much simpler to grasp:
p2a <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, group = drv)) +
  labs(
    title = "Smooth Regression Line",
    subtitle = "By type of drive train (no color)"
  ) +
  theme(plot.title.position = "plot")
p2b <- ggplot(data = mpg) +
  geom_smooth(
    mapping = aes(x = displ, y = hwy, color = drv),
    show.legend = FALSE
  ) +
  labs(
    title = "Smooth Regression Line",
    subtitle = "By type of drive train (with color)"
  ) +
  theme(plot.title.position = "plot")
# placing plots side by side with | requires the {patchwork} package
p2a | p2b
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
We can also layer them:
mpg %>%
  ggplot(mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Smooth Regression Line + Scatterplot") +
  theme(plot.title.position = "plot")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
In this case, setting a “global” mapping can help remove code duplication. We can still set custom aesthetics for each layer:
mpg %>%
  ggplot(mapping = aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  labs(title = "Smooth Regression Line + Scatterplot") +
  theme(plot.title.position = "plot")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
We can also change the data in each layer:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = mpg %>% filter(class == "subcompact"), se = FALSE) +
  labs(
    title = "Smooth regression line + scatterplot",
    subtitle = "Smooth line fitted only on `class == 'subcompact'`"
  ) +
  theme(plot.title.position = "plot")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Extensions to {ggplot2}
{ggplot2} is so influential and versatile that many packages have been written to extend its functionality, or to wrap it in order to build more advanced visualisations. You can see a list here.
Visualise Models
The {parameters} and {see} packages are part of the {easystats} framework, which aims to make statistical modelling and plotting easier.
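A minimal sketch of how the two packages fit together, assuming both are installed: model_parameters() from {parameters} summarises a fitted model, and {see} supplies a plot() method for the result. The linear model below (hwy explained by displ and cyl) is an arbitrary choice for illustration, not one taken from the original text.

```r
library(parameters)
library(see)

# an illustrative linear model on the mpg data
model <- lm(hwy ~ displ + cyl, data = ggplot2::mpg)

# tidy table of coefficients, confidence intervals and p-values
params <- model_parameters(model)
params

# coefficient plot built on ggplot2 by the {see} package
plot(params)
```

Because the result is built on ggplot2, it can typically be customised further with the usual + labs() or + theme() layers.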