Graphs

Charts and How they are generated from Data

Data Variables

Geometry

Graph Types

Mappable Aesthetics

Author

Arvind V.

Published

November 1, 2021

“He is one of those who don’t want millions, but an answer to their questions.”

— Alyosha, in The Brothers Karamazov

1 Setting up R Packages

library(tidyverse)
library(mosaic) # Our all-in-one package
library(skimr) # Looking at data
library(ggformula) # Our plotting package
library(visdat) # Mapping missing data
library(naniar) # Missing data visualization and munging
library(janitor) # Clean the data
library(tinytable) # Printing Tables for our data
library(DT) # Interactive Tables for our data
##
# devtools::install_github("rpruim/Lock5withR")
library(Lock5withR)
library(Lock5Data) # Some neat little datasets from a lovely textbook

Plot Fonts and Theme


library(systemfonts)
library(showtext)
## Clean the slate
systemfonts::clear_local_fonts()
systemfonts::clear_registry()
##
showtext_opts(dpi = 96) # set DPI for showtext
sysfonts::font_add(
  family = "Alegreya",
  regular = "../../../../../../fonts/Alegreya-Regular.ttf",
  bold = "../../../../../../fonts/Alegreya-Bold.ttf",
  italic = "../../../../../../fonts/Alegreya-Italic.ttf",
  bolditalic = "../../../../../../fonts/Alegreya-BoldItalic.ttf"
)

sysfonts::font_add(
  family = "Roboto Condensed",
  regular = "../../../../../../fonts/RobotoCondensed-Regular.ttf",
  bold = "../../../../../../fonts/RobotoCondensed-Bold.ttf",
  italic = "../../../../../../fonts/RobotoCondensed-Italic.ttf",
  bolditalic = "../../../../../../fonts/RobotoCondensed-BoldItalic.ttf"
)
showtext_auto(enable = TRUE) # enable showtext
##
theme_custom <- function() {
  theme_bw(base_size = 10) +
    #
    # theme(panel.widths = unit(11, "cm"),
    #       panel.heights = unit(6.79, "cm")) + # Golden Ratio
    #
    theme(
      plot.margin = margin_auto(t = 1, r = 2, b = 1, l = 1, unit = "cm"),
      plot.background = element_rect(
        fill = "bisque",
        colour = "black",
        linewidth = 1
      )
    ) +

    theme_sub_axis(
      title = element_text(
        family = "Roboto Condensed",
        size = 10
      ),
      text = element_text(
        family = "Roboto Condensed",
        size = 8
      )
    ) +

    theme_sub_legend(
      text = element_text(
        family = "Roboto Condensed",
        size = 6
      ),
      title = element_text(
        family = "Alegreya",
        size = 8
      )
    ) +

    theme_sub_plot(
      title = element_text(
        family = "Alegreya",
        size = 14, face = "bold"
      ),
      title.position = "plot",
      subtitle = element_text(
        family = "Alegreya",
        size = 10
      ),
      caption = element_text(
        family = "Alegreya",
        size = 6
      ),
      caption.position = "plot"
    )
}

## Use available fonts in ggplot text geoms too!
ggplot2::update_geom_defaults(geom = "text", new = list(
  family = "Roboto Condensed",
  fontface = "plain",
  size = 3.5,
  color = "#2b2b2b"
))
ggplot2::update_geom_defaults(geom = "label", new = list(
  family = "Roboto Condensed",
  fontface = "plain",
  size = 3.5,
  color = "#2b2b2b"
))

ggplot2::update_geom_defaults(geom = "marquee", new = list(
  family = "Roboto Condensed",
  face = "plain",
  size = 3.5,
  color = "#2b2b2b"
))
ggplot2::update_geom_defaults(geom = "text_repel", new = list(
  family = "Roboto Condensed",
  face = "plain",
  size = 3.5,
  color = "#2b2b2b"
))
ggplot2::update_geom_defaults(geom = "label_repel", new = list(
  family = "Roboto Condensed",
  face = "plain",
  size = 3.5,
  color = "#2b2b2b"
))

## Set the theme
ggplot2::theme_set(new = theme_custom())

## tinytable options
options("tinytable_tt_digits" = 2)
options("tinytable_format_num_fmt" = "significant_cell")
options(tinytable_html_mathjax = TRUE)


## Set defaults for flextable
flextable::set_flextable_defaults(font.family = "Roboto Condensed")

2 Why Visualize?

2.1 An Iconic Presentation

2.2 Some Reasons

We can digest information more easily when it is pictorial.
Our working memory is both short-term and limited in capacity. A picture abstracts away the details and presents us with an overall summary, an insight, or a story that is easy to retain and easy to recall.
Data Viz includes shapes that carry strong cultural memories and impressions for us. These cultural memories help us to use data viz in a universal way that appeals to a wide variety of audiences. (Do humans have a gene for geometry?¹)

2.3 Some Pictures

Visuals help sift facts from mere statements, for example:

Visuals are a good starting point for making hypotheses about what may be happening in the situation represented by the data.

3 Why Analyze?

3.1 Analysis

Visualizations may not tell us the true magnitude or significance of things.
We need analytic methods or statistics to assure ourselves that something is happening.
These methods reduce human bias and ensure that we are speaking with the assurance that our problem deserves.
Analysis uses numbers, or metrics, that allow us to crystallize our ambiguous words/guesses.
These metrics are calculable from our data, of course, but are not directly visible, despite often being intuitive.
Using these metrics, we need to become, paradoxically enough, sure of our uncertainty.

So we need both visuals and analytics. And as we will see, we will not be content with that: we will visualize our analytics, and analyze our visualizations!
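As a small sketch of what such a metric looks like, we could compare the mileage of automatic and manual cars. The choice of the built-in mtcars dataset and of a Welch t-test is our assumption for illustration only; any other dataset or method would serve equally well.

```r
# Assumption: the built-in mtcars data, where am = 0 (automatic) and 1 (manual)
fit <- t.test(mpg ~ am, data = mtcars)

fit$estimate # the two group means: our metric
fit$conf.int # a 95% interval for the difference in means: our uncertainty
fit$p.value  # how probable a difference at least this large would be if the true means were equal
```

The interval is the "sure of our uncertainty" part: it states a range for the difference instead of a single number.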

3.2 What is Tidy Data?

Let us recall first what we meant by tidy data:

3.3 Tidy Data Principles

Each variable is a column.
Each column contains one kind of data.
Each observation or case is a row.
Each observation contains one value for each variable.
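A minimal sketch of these principles, using a small made-up table (the names and numbers below are hypothetical): the wide version spreads one variable, the year, across column names, and tidyr's pivot_longer() gathers it back into a column.

```r
library(tibble)
library(tidyr)

# Hypothetical "wide" data: the year variable is hidden in the column names
heights_wide <- tribble(
  ~name,  ~`2020`, ~`2021`,
  "Asha", 150,     154,
  "Ravi", 160,     163
)

# Tidy form: one column per variable, one row per (name, year) observation
heights_tidy <- pivot_longer(
  heights_wide,
  cols = c(`2020`, `2021`),
  names_to = "year",
  values_to = "height_cm"
)

heights_tidy
```

In the tidy version each of the three variables (name, year, height_cm) has its own column, which is the form that plotting functions expect.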

4 What is a Data Visualization?

4.1 Data Viz = Data + Geometry

How many geometric things do we know?
Shapes? Lines? Axes? Curves? Angles? Patterns? Textures? Colours? Sizes? Positions? Lengths? Heights? Breadths? Radii?
All these are geometric aspects or aesthetics, each with a unique property.
Some “geometric things” which we might consider are shown in the figure below.

Figure 4: Common Geometric Aesthetics in Charts

4.2 Mapping

How can we manipulate these geometric aesthetics, perhaps like Kandinsky?
The aesthetic has a property, an attribute, which we can manipulate in accordance with a data variable!
This act of “mapping” a geometric thing to a variable and modifying its essential property is called Data Visualization.

4.3 Mapping Examples

Length or height of a bar can be made proportional to the age or income of a person.
Colour of points can be mapped to gender, with a unique colour for each gender.
Position along the X-axis can vary in accordance with a height variable.
Position along the Y-axis can vary with a bodyWeight variable.
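The mappings listed above can be sketched with ggformula. The data frame below is invented for illustration; only the variable names (height, bodyWeight, gender) follow the examples in the text.

```r
library(ggformula)

# Hypothetical data, made up for this sketch
people <- data.frame(
  height     = c(150, 158, 165, 171, 178, 183),
  bodyWeight = c(48, 55, 61, 70, 76, 85),
  gender     = c("F", "M", "F", "M", "F", "M")
)

# x position <- height, y position <- bodyWeight, colour <- gender
p_map <- gf_point(bodyWeight ~ height, colour = ~gender, size = 3, data = people)
p_map
```

Note the difference between colour = ~gender, which maps the aesthetic to a variable, and size = 3, which simply sets a constant.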

4.4 Using Multiple Geometries

A chart may use more than one aesthetic: position, shape, colour, height, angle, pattern, or texture, to name several.
Usually, each aesthetic is mapped to just one variable to ensure there is no cognitive error.
There is of course a choice and you should be able to map any kind of variable to any geometric aspect/aesthetic that may be available.

4.5 A Natural Mapping

Note that there is also a “natural” mapping between an aesthetic and the kind of variable, Quantitative or Qualitative.
For instance, shape is rarely mapped to a Quantitative variable: a Quantitative variable varies continuously, whereas shape varies only in discrete steps, so the two do not vary in a similar way.
Bad choices may lead to bad, or worse, misleading charts!
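Here is a hedged sketch of a "natural" pairing, using the built-in mtcars data (our choice, not the module's): a Quantitative variable goes to colour, which can vary continuously, and a Qualitative one goes to shape.

```r
library(ggformula)

# colour <- hp (Quantitative): shown as a continuous colour gradient
# shape  <- cyl, treated as Qualitative via factor(): a few distinct shapes
p_natural <- gf_point(mpg ~ wt,
  colour = ~hp,
  shape = ~ factor(cyl),
  size = 3, data = mtcars
)
p_natural
```

Swapping the two, that is mapping shape directly to the continuous hp variable, typically makes ggplot2 stop with an error, since there is no continuous family of shapes to draw from.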

4.6 A Data Visualization Example


set.seed(1947)
diamonds %>%
  slice_sample(n = 150, weight_by = cut) %>%
  gf_point(price ~ carat,
    colour = ~cut,
    shape = ~cut,
    size = 2, data = .
  ) %>%
  gf_labs(
    title = "Plot Title = DIAMONDS ARE FOREVER",
    subtitle = "Plot Subtitle = AND A GIRL'S BEST FRIEND",
    caption = "Plot Caption = From the diamonds dataset",
    x = "x-Axis Title = CARAT",
    y = "y-Axis Title = PRICE"
  ) %>%
  # Use same name for scales to merge legends
  gf_refine(
    scale_color_brewer(
      name = "Legend = DIAMOND QUALITY",
      palette = "Set1"
    ),
    scale_shape_manual(
      name = "Legend = DIAMOND QUALITY",
      values = c(15:21)
    )
  ) %>%
  gf_annotate("text",
    x = 1.0, y = 16000,
    label = "These DIAMONDS are\n Super Affordable!!",
    fontface = "bold",
    size = 2
  ) %>%
  gf_annotate("curve",
    x = 0.9,
    y = 14500,
    yend = 8000,
    xend = 0.95,
    linewidth = 0.5,
    curvature = 0.5,
    arrow = arrow(length = unit(0.25, "cm"))
  ) %>%
  gf_annotate(
    "rect",
    xmin = 1,
    xmax = 1.25,
    ymin = 2250,
    ymax = 10000,
    alpha = 0.5,
    fill = "grey80",
    col = "black"
  )

Figure 5: Data Vis Components and Features

4.7 What were the Components?

In the above chart, it is pretty clear what kind of variable is plotted on the x-axis and the y-axis.
The dominant geometry is a point, whose position is determined by the x and y variables.
The shape of the point is determined by the cut variable.
What about colour? Could this be considered as another axis in the chart?
There are also other aspects that you can choose (not explicitly shown here), such as the plot theme (colours, fonts, backgrounds, etc.), which may not be mapped to data but are nonetheless choices to be made.
We will get acquainted with this aspect as we build charts.

4.8 Transformations

As we will see, Data Variables may be transformed before being mapped to some geometric aesthetic; e.g. we may perform counts with a Qual variable that contains only the entries {S, M, L, XL}.
We may also transform the axes (make them logarithmic, or even polar) to create precisely the shape-meaning we wish.
This allows us considerable flexibility in making charts!
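Both kinds of transformation can be sketched briefly. The shirts data frame is hypothetical; diamonds ships with ggplot2, which ggformula attaches for us.

```r
library(ggformula)

# Hypothetical Qual variable holding only the entries S, M, L, XL
shirts <- data.frame(
  size = factor(c("S", "M", "M", "L", "L", "L", "XL"),
    levels = c("S", "M", "L", "XL")
  )
)

# Transforming the variable: gf_bar() counts the rows in each level
# and maps that count to the height of the bar
p_counts <- gf_bar(~size, data = shirts)

# Transforming the axes: the same bars drawn on a polar coordinate system
p_polar <- gf_refine(p_counts, coord_polar())

# Transforming the axes: logarithmic x and y scales for the diamonds scatter plot
p_log <- gf_point(price ~ carat, data = diamonds, alpha = 0.2)
p_log <- gf_refine(p_log, scale_x_log10(), scale_y_log10())
```

Printing p_counts, p_polar, or p_log draws the respective chart. In none of these cases is the underlying data frame modified; the transformation happens on the way to the chart.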

4.9 Facets

Finally, if the graph is too busy, with lots of colours and shapes, then we can split the graph into many small multiples or facets, each showing a subset of the data.
This is called faceting and is a powerful way to reduce cognitive load on the viewer.


set.seed(1947)
diamonds %>%
  slice_sample(n = 150, weight_by = cut) %>%
  gf_point(price ~ carat | clarity,
    colour = ~cut,
    shape = ~cut,
    size = 2, data = .
  ) %>%
  gf_labs(
    title = "Plot Title = DIAMONDS ARE FOREVER",
    subtitle = "Plot Subtitle = AND A GIRL'S BEST FRIEND",
    caption = "Plot Caption = From the diamonds dataset",
    x = "x-Axis Title = CARAT",
    y = "y-Axis Title = PRICE"
  ) %>%
  # Use same name for scales to merge legends
  gf_refine(
    scale_color_brewer(
      name = "Legend = DIAMOND QUALITY",
      palette = "Set1"
    ),
    scale_shape_manual(
      name = "Legend = DIAMOND QUALITY",
      values = c(15:21)
    )
  )

5 Basic Types of Charts

5.1 Mapping Variables to Aesthetics

We can therefore think of simple visualizations as combinations of aesthetics, mapped to combinations of variables.
It should be possible to use the many shapes we know, or can conceive of, and marry them to data to create a brand new visualization method that advances both understanding and retention! You should try!!

5.2 Mappings and Charts: A Catalogue

Geometries, Combinations, and Graphs
Variable #1	Variable #2	Chart Names
Quant	None	Histogram and Density
Qual	None	Bar Chart
Quant	Quant	Scatter Plot, Line Chart, Bubble Plot, Area Chart
Quant	Qual	Pie Chart, Donut Chart, Column Chart, Box-Whisker Plot, Radar Chart, Bump Chart, Tree Diagram
Qual	Qual	Stacked Bar Chart, Mosaic Chart, Sankey, Chord Diagram, Network Diagram
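A few rows of this catalogue can be tried out right away. The sketch below pairs one chart from each row with a ggformula function, using the diamonds dataset; the particular variables chosen are our own picks for illustration.

```r
library(ggformula)

charts <- list(
  # Quant, None: histogram and density
  histogram = gf_histogram(~price, data = diamonds, bins = 30),
  density   = gf_density(~price, data = diamonds),
  # Qual, None: bar chart of counts
  bar       = gf_bar(~cut, data = diamonds),
  # Quant, Quant: scatter plot
  scatter   = gf_point(price ~ carat, data = diamonds),
  # Quant, Qual: box-whisker plot
  box       = gf_boxplot(price ~ cut, data = diamonds),
  # Qual, Qual: stacked bar chart
  stacked   = gf_bar(~cut, fill = ~clarity, data = diamonds)
)

charts$box # print any one of them to draw it
```

The formula shape follows the table: a one-sided formula (~price) for a single variable, and a two-sided one (price ~ cut) for a pair.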

6 Conclusion

6.1 Data Science Workflow

6.2 Workflow Description

So there we have it:

Data: We generate data by experiment, or obtain readily available data. We import and clean the data.
Variables: Questions lead us to identify Types of Variables (Quant and Qual).
Transform: Sometimes we may need to transform the data (long to wide, summarize, create new variables…).
Explore: Further Questions lead us to infer relationships between variables and the relative size of things, which we describe using Data Visualizations.
Report: What we find may be of interest or, best of all, outright surprising! We finally communicate it with charts and descriptions in a research report.
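The first four steps can be sketched in a few lines. This is only an illustration under our own assumptions: we use the mpg dataset that ships with ggplot2 as the "readily available data", and the object names are made up.

```r
library(dplyr)
library(ggformula)

# Data: a readily available dataset; "clean" here just means keeping complete rows
mpg_clean <- mpg %>%
  filter(!is.na(hwy), !is.na(class))

# Variables: hwy (highway mileage) is Quant, class (type of car) is Qual

# Transform: summarise the mileage within each class
class_summary <- mpg_clean %>%
  group_by(class) %>%
  summarise(mean_hwy = mean(hwy), n_cars = n())

# Explore: height of column <- mean_hwy, x position <- class
p_workflow <- gf_col(mean_hwy ~ class, data = class_summary)
p_workflow
```

The Report step would then wrap a chart like this, along with a description of what it shows, into a document.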

6.3 Grammar of Data Visualization

You might think of all these Questions, Answers, and Mappings as being equivalent to a grammar, a language in itself.

And indeed, in R we use a philosophy called the Grammar of Graphics! We will use this grammar in the R graphics packages that we will encounter when we make Graphs next.

Other parts of the Workflow (Transformation, Faceting, Analysis and Modelling) also fall within the grammar, as we shall see.
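To give a flavour of this grammar, here is a minimal sketch written directly in ggplot2, where each "word" of the grammar is a separate term added with +. The dataset (mpg) and the variables chosen are our assumptions for illustration.

```r
library(ggplot2)

p_grammar <-
  ggplot(data = mpg, mapping = aes(x = displ, y = hwy, colour = class)) + # data + mappings
  geom_point() +          # geometry
  scale_x_log10() +       # transformation of an axis
  facet_wrap(vars(drv)) + # facets: one panel per drive type
  labs(
    title = "Engine size and highway mileage",
    x = "Engine displacement (litres, log scale)",
    y = "Highway miles per gallon"
  )

p_grammar
```

The ggformula functions used earlier in this module build the same kind of object; the formula y ~ x and arguments such as colour = ~class are another way of writing the mappings.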

7 AI Generated Summary and Podcast

7.1 Summary

This is a tutorial on data visualization using the R programming language. It introduces concepts such as data types, variables, and visualization techniques. The tutorial utilizes metaphors to explain these concepts, emphasizing the use of geometric aesthetics to represent data. It also highlights the importance of both visual and analytic approaches in understanding data. The tutorial then demonstrates basic chart types, including histograms, scatterplots, and bar charts, and discusses the “Grammar of Graphics” philosophy that guides data visualization in R. The text concludes with a workflow diagram for data science, emphasizing the iterative process of data import, cleaning, transformation, visualization, hypothesis generation, analysis, and communication.

8 References

Claus Wilke. Fundamentals of Data Visualization. https://clauswilke.com/dataviz/
Kieran Healy. Data Visualization: A Practical Introduction. https://socviz.co/
Winston Chang. R Graphics Cookbook. https://r-graphics.org/
Hadley Wickham and Garrett Grolemund. R for Data Science. https://r4ds.had.co.nz/
Jack Dougherty and Ilya Ilyankou. Hands-On Data Visualization. https://handsondataviz.org/
Albert Rapp. Adding images to ggplot. https://albert-rapp.de/posts/ggplot2-tips/27_images/27_images

R Package Citations

Package	Version	Citation
ggformula	1.0.0	Kaplan and Pruim (2025)
Lock5Data	3.0.0	Lock (2021)
mosaic	1.9.2	Pruim, Kaplan, and Horton (2017)
TeachingDemos	2.13	Snow (2024)

Kaplan, Daniel, and Randall Pruim. 2025. ggformula: Formula Interface to the Grammar of Graphics. https://doi.org/10.32614/CRAN.package.ggformula.

Lock, Robin. 2021. Lock5Data: Datasets for “Statistics: UnLocking the Power of Data”. https://doi.org/10.32614/CRAN.package.Lock5Data.

Pruim, Randall, Daniel T Kaplan, and Nicholas J Horton. 2017. “The Mosaic Package: Helping Students to ‘Think with Data’ Using R.” The R Journal 9 (1): 77–102. https://journal.r-project.org/archive/2017/RJ-2017-024/index.html.

Snow, Greg. 2024. TeachingDemos: Demonstrations for Teaching and Learning. https://doi.org/10.32614/CRAN.package.TeachingDemos.

Footnotes

¹ https://www.xcode.in/genes-and-personality/how-genes-influence-your-math-ability/

Citation

BibTeX citation:

@online{v.2021,
  author = {V., Arvind},
  title = {Graphs},
  date = {2021-11-01},
  url = {https://madhatterguide.netlify.app/content/courses/Analytics/10-Descriptive/Modules/09-Graphs/},
  langid = {en}
}

For attribution, please cite this work as:

V., Arvind. 2021. “Graphs.” November 1, 2021. https://madhatterguide.netlify.app/content/courses/Analytics/10-Descriptive/Modules/09-Graphs/.