1 Introduction

1.1 Disclaimer

This material is heavily inspired (stolen?) from

1.2 Prerequisites

  • basic knowledge of R (as from the ‘RIntro’ course)

1.3 Course objective

Main objective: hands-on ggplot2

Schedule

  • Morning (9:30-12:00) be able to read the ggplot2 cheatsheet
  • Afternoon (14:00-16:00) draw your own plot

What is ggplot2?

  • ggplot2 creates an R object that can be transformed with ‘+’ and then exported to a device.
  • general approach to plotting that follows the ‘grammar of graphics’ of Wilkinson (2005): Wilkinson L (2005) The grammar of graphics. Statistics and computing, 2nd edn. Springer, New York
  • Use case: exploratory data analysis and publication-quality figures
# install.packages("ggplot2")
library(ggplot2)
ggplot(faithful, aes(x=waiting, y=eruptions)) + 
  geom_point() + 
  labs(title="ggplot2 default")
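
The "object, then device" idea can be made explicit with a small sketch. The file name is hypothetical and a PDF in the temporary directory is only one possible device; ggsave() is the ggplot2 helper for exporting a plot object.

```r
library(ggplot2)

# Build the plot step by step; each '+' returns a new, modified plot object.
p <- ggplot(faithful, aes(x = waiting, y = eruptions))
p <- p + geom_point()
p <- p + labs(title = "ggplot2 default")

# Export the finished object to a device (here: a PDF file, hypothetical name).
out_file <- file.path(tempdir(), "faithful_scatter.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```

Typing p at the console would instead print the same object to the active plot window.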

1.4 Not in this course (next one?)

Theory of data visualisation: how to draw information to convey a (non-misleading) message

Base R plotting

  • e.g. plot(x, y, …) works like drawing on a piece of paper: elements are added to a device (plot window, pdf, html, etc.) until the device is closed.
  • Specific plotting functions, e.g. hist(), barplot(), etc.
  • Use case: still the quickest way to get a first look at the data
plot(x=faithful$waiting, y=faithful$eruptions, main="base r default")
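
The "piece of paper" behaviour can be sketched as follows. The file name is hypothetical, and the threshold 3.1 is borrowed from the exercise later in this material; the point is only that every call after plot() draws on top of what is already on the open device.

```r
# Open a device: everything drawn from now on goes into this PDF file.
out_file <- file.path(tempdir(), "faithful_base.pdf")
pdf(out_file, width = 6, height = 4)

# First call sets up the "paper" (axes, title, points).
plot(x = faithful$waiting, y = faithful$eruptions, main = "base r default")

# Later calls add to the same drawing; nothing already drawn can be changed.
long <- faithful$eruptions > 3.1
points(faithful$waiting[long], faithful$eruptions[long], col = "red", pch = 19)
abline(h = 3.1, lty = 2)

# Closing the device finishes the file.
dev.off()
```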

plotly

  • plotly creates interactive web-based graphs (html-widgets) via the open source JavaScript graphing library plotly.js.
  • provides interface to ggplot2 to create interactive ggplot versions
  • expands into dashboards based on Dash (written on top of plotly.js)
  • Use case: exploratory data analysis, discussing the interpretation of the data with others
library(plotly)
plot_ly(data = faithful, x = ~waiting, y = ~eruptions) %>%
  layout(title="plotly default")
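
The interface to ggplot2 mentioned above is the function ggplotly(). A minimal sketch, assuming both packages are installed (the plot title is made up for illustration):

```r
library(ggplot2)
library(plotly)

# An ordinary ggplot2 object ...
p <- ggplot(faithful, aes(x = waiting, y = eruptions)) +
  geom_point() +
  labs(title = "ggplot2 via plotly")

# ... converted into an interactive html-widget.
p_interactive <- ggplotly(p)
```

Printing p_interactive shows the widget in the viewer or browser, with hover labels and zooming.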

many other packages/attempts

ggplot extensions

2 Background

ggplot2 is heavily inspired by the grammar of graphics, a general approach to plotting described by Wilkinson L (2005) The grammar of graphics. Statistics and computing, 2nd edn. Springer, New York.

The grammar of graphics distinguishes several elements:

  • Data
    • information to display
  • Mapping
    • aesthetic mapping: variables in data to graphical properties in the geometry
    • facet mapping: variables in data to panels (facets) in the plot
  • Statistics
    • transform data to values to be displayed (e.g. smooth data, calculate histogram)
  • Scales
    • translate data values into figure properties (e.g. categories into colors)
  • Geometries
    • data displayed as points, lines, bars, etc…
  • Facets
    • look at subsets of data in different panels
  • Coordinates
    • the coordinate system to display position values
  • Theme
    • look-and-feel of the graph

All of these elements are represented in the R package ggplot2.
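
As a rough illustration, the sketch below names one ggplot2 function per element. The choice of functions, colours and the threshold 3.1 are assumptions made for this example, not the only way to express each element.

```r
library(ggplot2)

p <- ggplot(data = faithful,                                   # Data
            mapping = aes(x = eruptions, y = waiting)) +       # Mapping (aesthetics)
  geom_point(aes(colour = eruptions > 3.1)) +                  # Geometries
  stat_smooth(method = "lm", formula = y ~ x) +                # Statistics
  scale_colour_manual(values = c("grey40", "darkorange")) +    # Scales
  facet_wrap(vars(eruptions > 3.1)) +                          # Facets (facet mapping)
  coord_cartesian() +                                          # Coordinates
  theme_minimal()                                              # Theme
p
```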

3 Discover ggplot2

3.1 A first basic plot

A first basic plot:

ggplot(data = faithful,
       mapping = aes(x = eruptions,y = waiting)) +
  geom_point()

Which elements have been used?

  • Data, data.frame faithful
  • Mapping, aes() maps variables to the coordinate axes
  • Statistics
  • Scales
  • Geometries, geoms tell how to display data (geom_point() - scatterplot)
  • Facets
  • Coordinates
  • Theme

Answer: All elements

  • some (the minimum) specified by user,
  • all others by default settings.
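
To see the defaults, one can spell them out. The sketch below assumes the usual defaults for a scatterplot of two continuous variables (identity stat and position, continuous scales, no facetting, Cartesian coordinates, grey theme); both objects should then describe the same plot.

```r
library(ggplot2)

# Minimal specification: data, mapping and one geom.
p_short <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point()

# The same plot with the remaining elements written out explicitly.
p_long <- ggplot(data = faithful, mapping = aes(x = eruptions, y = waiting)) +
  geom_point(stat = "identity", position = "identity") +  # Statistics
  scale_x_continuous() +                                  # Scales
  scale_y_continuous() +
  facet_null() +                                          # Facets
  coord_cartesian() +                                     # Coordinates
  theme_grey()                                            # Theme
```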

Note the ‘+’ to combine different parts of the graphic (the layers)

# same plot (now with blue points) - alternative specification
ggplot(data = faithful) +
  geom_point(mapping = aes(x = eruptions, y = waiting),
             colour = "blue")

# local (here) vs. global (above) aesthetics
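
The difference between global and local aesthetics shows up as soon as a plot has more than one layer. A small sketch, where the second layer (a linear smoother) is added only for illustration:

```r
library(ggplot2)

# Global aesthetics: given once in ggplot() and inherited by every layer.
p_global <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Local aesthetics: given inside a layer and visible to that layer only,
# so each layer has to repeat them.
p_local <- ggplot(faithful) +
  geom_point(aes(x = eruptions, y = waiting)) +
  geom_smooth(aes(x = eruptions, y = waiting), method = "lm", formula = y ~ x)
```

Both objects draw the same figure; the global form is shorter when all layers share the mapping.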

3.1.1 Exercise - change a geom

  • set ‘shape’ of points to all diamonds
  • set ‘color’ of points to all blue
  • set ‘alpha’ of points (transparency) to all 0.3
  • vary ‘size’ of dots by eruptions/waiting
  • vary ‘color’ of points by ‘eruptions > 3.1’

Use ‘?geom_point’, the ggplot2 cheatsheet, or vignette("ggplot2-specs").

Look at g and g_built with the View() function. Can you find size? It is in the ‘data’ part of one of the objects.

Insights

  • settings vs. aesthetics
  • x, y, size, etc. are passed as expressions rather than as strings, so you can calculate with them (e.g. size = eruptions/waiting).
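
A compact way to compare settings and aesthetics, sketched with the colour property (the object names are made up for this example):

```r
library(ggplot2)

# Setting: a constant outside aes(); applies to all points, no legend.
p_setting <- ggplot(faithful) +
  geom_point(aes(x = eruptions, y = waiting), colour = "blue")

# Aesthetic: an expression inside aes(); evaluated per row of the data
# and translated into colours by a scale (with a legend).
p_mapped <- ggplot(faithful) +
  geom_point(aes(x = eruptions, y = waiting, colour = eruptions > 3.1))

# Pitfall: a constant inside aes() is treated as data with the single
# value "blue", so the scale picks the colour and the points are not blue.
p_pitfall <- ggplot(faithful) +
  geom_point(aes(x = eruptions, y = waiting, colour = "blue"))
```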

3.1.2 When is what calculated/executed?

g <- ggplot(data = faithful) +
  geom_point(mapping = aes(x = eruptions,y = waiting, size=eruptions/waiting))

No output - result of ggplot() captured in object g. What’s in g? Can you find ‘size’?

Insight

  • the function ggplot() returns the data and the settings/parameters/function names, not the values to be plotted.
  • the values are calculated only when print() is called.
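
One way to check this is ggplot_build(), which performs the calculations that print() would otherwise trigger before drawing. The sketch assumes that the object called g_built in this material is the result of ggplot_build(g).

```r
library(ggplot2)

g <- ggplot(data = faithful) +
  geom_point(mapping = aes(x = eruptions, y = waiting, size = eruptions / waiting))

# In g, 'size' is still the unevaluated expression eruptions / waiting.
g$layers[[1]]$mapping$size

# Assumption: g_built is obtained like this.
g_built <- ggplot_build(g)

# In g_built, 'size' is a column of computed point sizes, one per observation.
head(g_built$data[[1]][, c("x", "y", "size")])
```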

3.1.3 Exercise - try out different geom_*

df <- data.frame(x = c(1,2,5,4,5), y = c(9, 1, 9,3,4))
base <- ggplot(df, aes(x, y))
base + geom_point()
base + geom_line()
base + geom_path()
base + geom_polygon()
base + geom_rect(aes(xmin = x, xmax = x + 1, ymin = y, ymax = y + 1))  # geom_rect() needs xmin, xmax, ymin and ymax
base + geom_ribbon(aes(ymin=x,ymax=y))
base + geom_label(aes(label=paste0("(",x,",",y,")")))
library(dplyr)  # provides %>%, slice() and filter()
faithful %>% slice(10)
##   eruptions waiting
## 1      4.35      85
faithful[10,]
##    eruptions waiting
## 10      4.35      85
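# a layer can use its own data: large red points for a subset,
# drawn underneath the scatterplot of all observations
ggplot(data = faithful) +
  geom_point(mapping = aes(x = eruptions, y = waiting),
             data = faithful %>% filter(eruptions > 3 & eruptions < 3.1),
             colour = "red", size = 10) +
  geom_point(mapping = aes(x = eruptions, y = waiting))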