The Mad Hatter’s Guide to Data Viz and Stats in R

Graphs

Charts and How they are generated from Data

Data Variables
Geometry
Graph Types
Mappable Aesthetics
Author

Arvind V.

Published

November 1, 2021

“He is one of those who don’t want millions, but an answer to their questions.”

— Alyosha, in The Brothers Karamazov

1 Setting up R Packages

library(tidyverse) # Data wrangling, plus ggplot2 for plotting
library(mosaic) # Our all-in-one package
library(skimr) # Looking at data
library(ggformula) # Our plotting package
library(visdat) # Mapping missing data
library(naniar) # Missing data visualization and munging
library(janitor) # Clean the data
library(tinytable) # Printing Tables for our data
library(DT) # Interactive Tables for our data
library(ggrepel) # Repelling labels; must be attached for the "text_repel"/"label_repel" defaults set below
library(marquee) # Markdown-styled text; must be attached for the "marquee" defaults set below
##
# devtools::install_github("rpruim/Lock5withR")
library(Lock5withR)
library(Lock5Data) # Some neat little datasets from a lovely textbook

Plot Fonts and Theme

library(systemfonts)
library(showtext)
## Clean the slate
systemfonts::clear_local_fonts()
systemfonts::clear_registry()
##
showtext_opts(dpi = 96) # set DPI for showtext
sysfonts::font_add(
  family = "Alegreya",
  regular = "../../../../../../fonts/Alegreya-Regular.ttf",
  bold = "../../../../../../fonts/Alegreya-Bold.ttf",
  italic = "../../../../../../fonts/Alegreya-Italic.ttf",
  bolditalic = "../../../../../../fonts/Alegreya-BoldItalic.ttf"
)

sysfonts::font_add(
  family = "Roboto Condensed",
  regular = "../../../../../../fonts/RobotoCondensed-Regular.ttf",
  bold = "../../../../../../fonts/RobotoCondensed-Bold.ttf",
  italic = "../../../../../../fonts/RobotoCondensed-Italic.ttf",
  bolditalic = "../../../../../../fonts/RobotoCondensed-BoldItalic.ttf"
)
showtext_auto(enable = TRUE) # enable showtext
##
## theme_sub_axis(), theme_sub_legend() and theme_sub_plot() need ggplot2 >= 4.0.0
theme_custom <- function() {
  theme_bw(base_size = 10) +

    theme_sub_axis(
      title = element_text(
        family = "Roboto Condensed",
        size = 8
      ),
      text = element_text(
        family = "Roboto Condensed",
        size = 6
      )
    ) +

    theme_sub_legend(
      text = element_text(
        family = "Roboto Condensed",
        size = 6
      ),
      title = element_text(
        family = "Alegreya",
        size = 8
      )
    ) +

    theme_sub_plot(
      title = element_text(
        family = "Alegreya",
        size = 14, face = "bold"
      ),
      title.position = "plot",
      subtitle = element_text(
        family = "Alegreya",
        size = 10
      ),
      caption = element_text(
        family = "Alegreya",
        size = 6
      ),
      caption.position = "plot"
    )
}

## Use available fonts in ggplot text geoms too!
ggplot2::update_geom_defaults(geom = "text", new = list(
  family = "Roboto Condensed",
  fontface = "plain",
  size = 3.5,
  color = "#2b2b2b"
))
ggplot2::update_geom_defaults(geom = "label", new = list(
  family = "Roboto Condensed",
  fontface = "plain",
  size = 3.5,
  color = "#2b2b2b"
))

ggplot2::update_geom_defaults(geom = "marquee", new = list(
  family = "Roboto Condensed",
  face = "plain",
  size = 3.5,
  color = "#2b2b2b"
))
ggplot2::update_geom_defaults(geom = "text_repel", new = list(
  family = "Roboto Condensed",
  fontface = "plain",
  size = 3.5,
  color = "#2b2b2b"
))
ggplot2::update_geom_defaults(geom = "label_repel", new = list(
  family = "Roboto Condensed",
  fontface = "plain",
  size = 3.5,
  color = "#2b2b2b"
))

## Set the theme
ggplot2::theme_set(new = theme_custom())

## tinytable options
options("tinytable_tt_digits" = 2)
options("tinytable_format_num_fmt" = "significant_cell")
options(tinytable_html_mathjax = TRUE)


## Set defaults for flextable
flextable::set_flextable_defaults(font.family = "Roboto Condensed")

2 Why Visualize?

2.1 An Iconic Presentation

2.2 Some Reasons

  • We can digest information more easily when it is pictorial.
  • Our working memory is both short-term and limited in capacity. A picture abstracts away the details and presents us with an overall summary, an insight, or a story that is easy both to recall and to retain.
  • Data viz uses shapes that carry strong cultural memories and impressions for us. These shared memories let us use data viz in a near-universal way, appealing to a wide variety of audiences. (Do humans have a gene for geometry?¹)

2.3 Some Pictures

  • Visualization helps us sift facts from mere statements. For example:
Figure 1: Rape Capital
Figure 2: Data Reveals Crime
  • Visuals are a good starting point for forming hypotheses about what may be happening in the situation the data represent.

3 Why Analyze?

3.1 Analysis

  • Visualizations may not tell us the true magnitude or significance of things.
  • We need analytic methods, or statistics, to assure ourselves that something real is happening.
  • These methods reduce human bias and let us speak with the degree of assurance that our problem deserves.
  • Analysis uses numbers, or metrics, that allow us to crystallize our ambiguous words and guesses.
  • These metrics are calculable from our data, of course, but are not directly visible, despite often being intuitive.
  • Using these metrics, we need to become, paradoxically enough, sure of our uncertainty.

So we need both visuals and analytics. And as we will see, we will not be content with that: we will visualize our analytics, and analyze our visualizations!
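To make the idea of a metric concrete, here is a minimal sketch using the `diamonds` data that ships with ggplot2. The choice of metrics (count, mean, median and standard deviation of price, grouped by cut) is ours, for illustration only. None of these numbers can be read directly off a scatter plot, yet each is easy to compute from the data.

```r
library(dplyr)
library(ggplot2) # provides the diamonds dataset (price is in US dollars)

# One row per cut, with a few summary metrics of price
price_summary <- diamonds %>%
  group_by(cut) %>%
  summarise(
    n = n(),
    mean_price = mean(price),
    median_price = median(price),
    sd_price = sd(price)
  )

price_summary
```

Looking at the mean and the median side by side already hints at something a chart would show: when they differ a lot, the distribution is skewed.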

3.2 What is Tidy Data?

Let us recall first what we meant by tidy data:

Figure 3: Tidy Data

3.3 Tidy Data Principles

  • Each variable is a column.
  • Each column contains one kind of data.
  • Each observation, or case, is a row.
  • Each observation contains one value for each variable.
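As a sketch of these principles, the small table below holds made-up test scores (the students and numbers are hypothetical) in an untidy "wide" layout: the years live in the column names instead of in a column of their own. `tidyr::pivot_longer()` turns it into a tidy table where `year` is a variable and each row is one observation.

```r
library(tibble)
library(tidyr)

# Hypothetical scores: the variable "year" is spread across column names (untidy)
scores_wide <- tibble(
  student = c("Ana", "Ben", "Chitra"),
  `2022` = c(71, 64, 88),
  `2023` = c(75, 70, 91)
)

# Tidy version: one column per variable, one row per observation
scores_long <- pivot_longer(
  scores_wide,
  cols = c(`2022`, `2023`),
  names_to = "year",
  values_to = "score"
)

scores_long
```

The tidy version has six rows (three students times two years), and every column now holds exactly one kind of data.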

4 What is a Data Visualization?

4.1 Data Viz = Data + Geometry

  • How many geometric things do we know?
  • Shapes? Lines? Axes? Curves? Angles? Patterns? Textures? Colours? Sizes? Positions? Lengths? Heights? Breadths? Radii?
  • All these are geometric aspects, or aesthetics, each with its own property that can be varied.
  • Some “geometric things” that we might consider are shown in the figure below.
Figure 4: Common Geometric Aesthetics in Charts

4.2 Mapping

  • How can we manipulate these geometric aesthetics, perhaps like Kandinsky?
  • The aesthetic has a property, an attribute, which we can manipulate in accordance with a data variable!
  • This act of “mapping” a geometric thing to a variable, and modifying its essential property accordingly, is called Data Visualization.

4.3 Mapping Examples

  • The length or height of a bar can be made proportional to the age or income of a person.
  • The colour of points can be mapped to gender, with a unique colour for each gender.
  • Position along the X-axis can vary in accordance with a height variable, and
  • Position along the Y-axis can vary with a bodyWeight variable.
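The mappings above translate almost word for word into ggplot2's `aes()`. The data in this sketch are simulated with `rnorm()` purely for illustration: the variable names `height`, `bodyWeight` and `gender` follow the bullets, but the numbers are not real measurements.

```r
library(ggplot2)

set.seed(42)
# Simulated (not real) data: 40 people with a height (cm), a gender and a body weight (kg)
people <- data.frame(
  height = rnorm(40, mean = 165, sd = 10),
  gender = rep(c("female", "male"), each = 20)
)
people$bodyWeight <- 0.9 * people$height - 85 + rnorm(40, sd = 6)

# height -> position on x, bodyWeight -> position on y, gender -> colour
p_people <- ggplot(people, aes(x = height, y = bodyWeight, colour = gender)) +
  geom_point(size = 2) +
  labs(x = "Height (cm)", y = "Body weight (kg)", colour = "Gender")

p_people
```

In ggformula, the same mapping should read `gf_point(bodyWeight ~ height, colour = ~gender, data = people)`.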

4.4 Using Multiple Geometries

  • A chart may use more than one aesthetic: position, shape, colour, height, angle, and pattern or texture, to name several.
  • Usually, each aesthetic is mapped to just one variable, so that the reader is never confused about what it encodes.
  • There is a great deal of choice here: in principle, any kind of variable can be mapped to any available aesthetic, although some pairings work far better than others.

4.5 A Natural Mapping

  • Note that there is also a “natural” mapping between an aesthetic and the kind of variable, Quantitative or Qualitative.
  • For instance, shape is rarely mapped to a Quantitative variable: a Quantitative variable varies continuously, whereas shapes come in distinct kinds with no in-between values.
  • Bad choices may lead to bad or, worse, misleading charts!
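ggplot2 itself enforces part of this natural mapping. In the sketch below, using ggplot2's built-in `mpg` data, mapping the Quantitative `cty` (city miles per gallon) to shape fails when the plot is built, while mapping the Qualitative `drv` (drive train) to shape and `cty` to colour works. The exact wording of the error message may differ between ggplot2 versions.

```r
library(ggplot2)

# Quantitative variable -> shape: ggplot2 refuses when the plot is built
p_bad <- ggplot(mpg, aes(x = displ, y = hwy, shape = cty)) + geom_point()
res <- tryCatch(ggplot_build(p_bad), error = function(e) e)
inherits(res, "error") # TRUE
conditionMessage(res)

# Qualitative variable -> shape; Quantitative variable -> colour (a gradient)
p_good <- ggplot(mpg, aes(x = displ, y = hwy, shape = drv, colour = cty)) +
  geom_point(size = 2)
p_good
```

Colour, unlike shape, can vary smoothly, which is why ggplot2 gives `cty` a continuous colour gradient.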

4.6 A Data Visualization Example

set.seed(1947) # make the random sample reproducible
diamonds %>%
  slice_sample(n = 150, weight_by = cut) %>% # 150 of the 53,940 diamonds
  gf_point(price ~ carat,
    colour = ~cut,
    shape = ~cut,
    size = 2, data = .
  ) %>%
  gf_labs(
    title = "Plot Title = DIAMONDS ARE FOREVER",
    subtitle = "Plot Subtitle = AND A GIRL'S BEST FRIEND",
    caption = "Plot Caption = From the diamonds dataset",
    x = "x-Axis Title = CARAT",
    y = "y-Axis Title = PRICE"
  ) %>%
  # Use same name for scales to merge legends
  gf_refine(
    scale_color_brewer(
      name = "Legend = DIAMOND QUALITY",
      palette = "Set1"
    ),
    scale_shape_manual(
      name = "Legend = DIAMOND QUALITY",
      values = c(15:21)
    )
  ) %>%
  gf_annotate("text",
    x = 1.0, y = 16000,
    label = "These DIAMONDS are\n Super Affordable!!",
    fontface = "bold",
    size = 2
  ) %>%
  gf_annotate("curve",
    x = 0.9,
    y = 14500,
    yend = 8000,
    xend = 0.95,
    linewidth = 0.5,
    curvature = 0.5,
    arrow = arrow(length = unit(0.25, "cm"))
  ) %>%
  gf_annotate(
    "rect",
    xmin = 1,
    xmax = 1.25,
    ymin = 2250,
    ymax = 10000,
    alpha = 0.5,
    fill = "grey80",
    col = "black"
  )
Figure 5: Data Vis Components and Features

4.7 What were the Components?

  • In the above chart, it is pretty clear what kind of variable is plotted on each axis: two Quantitative variables, carat on the x-axis and price on the y-axis.
  • The dominant geometry is a point, whose position is determined by the x and y variables.
  • The shape of the point is determined by the Qualitative cut variable.
  • What about colour? It too is mapped to cut. Could colour be considered as another axis in the chart?
  • There are also other aspects that you can choose, such as the plot theme (colours, fonts, backgrounds, etc., not explicitly shown here) and the annotations (the text, the curved arrow and the shaded box), which are not mapped to data but are nonetheless choices to be made.
  • We will get acquainted with these aspects as we build charts.
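To make each component explicit, here is a sketch of the core of the diamonds chart in plain ggplot2 rather than ggformula, leaving out the titles and annotations. For simplicity it draws an unweighted random sample of 150 rows, unlike the chart above. Mapping `cut` to both `colour` and `shape` is a redundant encoding: since both scales carry the same variable and the same name, ggplot2 merges them into one legend.

```r
library(dplyr)
library(ggplot2)

set.seed(1947)
sampled <- slice_sample(diamonds, n = 150) # unweighted sample, for illustration

p_components <- ggplot(sampled, aes(
  x = carat, y = price, # position <- two Quant variables
  colour = cut, shape = cut # colour and shape <- the same Qual variable
)) +
  geom_point(size = 2) + # the geometry: points
  theme_bw() # a theme: a choice not mapped to data

p_components
```

Because `cut` is an ordered factor, ggplot2 will likely warn that using shapes for an ordinal variable is not advised; the chart above sidesteps the default shape scale by choosing shapes by hand with `scale_shape_manual()`.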

4.8 Transformations

  • As we will see, Data Variables may be transformed before being mapped to some geometric aesthetic.
  • For example, we may count how often each entry occurs in a Qual variable that contains only the entries {S, M, L, XL}, and map those counts to bar heights.
  • We may also transform the axes (make them logarithmic, or even polar) to create precisely the shape-meaning we wish.
  • This gives us considerable flexibility in making charts!
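Each of these transformations is a single function call in ggplot2. This sketch uses a hypothetical set of seven T-shirt sizes for the counting example and the built-in `diamonds` data for the logarithmic axes; the particular data are ours, for illustration.

```r
library(ggplot2)

# Hypothetical T-shirt sizes: a Qual variable with only four possible entries
shirts <- data.frame(
  size = factor(c("M", "L", "S", "M", "XL", "L", "M"),
                levels = c("S", "M", "L", "XL"))
)

# Transformation 1: geom_bar() counts each level (stat = "count") and maps counts to bar height
p_counts <- ggplot(shirts, aes(x = size)) + geom_bar()
layer_data(p_counts)$count # 1 3 2 1, for S, M, L, XL

# Transformation 2: the same bars drawn in polar coordinates
p_polar <- p_counts + coord_polar()

# Transformation 3: logarithmic axes for the right-skewed carat and price
p_log <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  scale_x_log10() +
  scale_y_log10()
```

Note that the raw data never contain the counts: they are computed by the chart's statistical transformation on the way to the bars.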

4.9 Facets

  • Finally, if the graph is too busy, with lots of colours and shapes, then we can split the graph into many small multiples or facets, each showing a subset of the data.
  • This is called faceting and is a powerful way to reduce cognitive load on the viewer.
set.seed(1947)
diamonds %>%
  slice_sample(n = 150, weight_by = cut) %>%
  gf_point(price ~ carat | clarity,
    colour = ~cut,
    shape = ~cut,
    size = 2, data = .
  ) %>%
  gf_labs(
    title = "Plot Title = DIAMONDS ARE FOREVER",
    subtitle = "Plot Subtitle = AND A GIRL'S BEST FRIEND",
    caption = "Plot Caption = From the diamonds dataset",
    x = "x-Axis Title = CARAT",
    y = "y-Axis Title = PRICE"
  ) %>%
  # Use same name for scales to merge legends
  gf_refine(
    scale_color_brewer(
      name = "Legend = DIAMOND QUALITY",
      palette = "Set1"
    ),
    scale_shape_manual(
      name = "Legend = DIAMOND QUALITY",
      values = c(15:21)
    )
  )
Figure 6: Data Vis Facets
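In ggformula, the `| clarity` part of the formula asks for facets. A rough plain-ggplot2 equivalent uses `facet_wrap()`. This sketch applies it to the full `diamonds` data (not the 150-row sample above), where each of the eight clarity grades gets its own small multiple.

```r
library(ggplot2)

p_facets <- ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
  geom_point(size = 0.5, alpha = 0.3) +
  facet_wrap(~clarity) # one panel per level of clarity

p_facets
```

Every panel shares the same axes by default, so comparisons across panels remain honest; `facet_wrap(scales = "free")` would relax this at the cost of comparability.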

5 Basic Types of Charts

5.1 Mapping Variables to Aesthetics

  • We can therefore think of simple visualizations as combinations of aesthetics, mapped to combinations of variables.
  • It should be possible to take the many shapes we know, or can conceive of, and marry them to data to create a brand new visualization method that advances both understanding and retention! You should try!

5.2 Mappings and Charts: A Catalogue

Geometries, Combinations, and Graphs

Variable #1   Variable #2   Chart Names
Quant         None          Histogram and Density
Qual          None          Bar Chart
Quant         Quant         Scatter Plot, Line Chart, Bubble Plot, Area Chart
Quant         Qual          Pie Chart, Donut Chart, Column Chart, Box-Whisker Plot, Radar Chart, Bump Chart, Tree Diagram
Qual          Qual          Stacked Bar Chart, Mosaic Chart, Sankey, Chord Diagram, Network Diagram
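As a minimal sketch, here is one chart from each row of the catalogue, paired with the ggplot2 geom that draws it. It uses the built-in `mpg` data, where `hwy` and `displ` are Quant and `class` and `drv` are Qual; the choice of variables is ours, for illustration only.

```r
library(ggplot2)

charts <- list(
  # Quant, None: histogram
  histogram = ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 20),
  # Qual, None: bar chart of counts
  bar = ggplot(mpg, aes(x = class)) + geom_bar(),
  # Quant, Quant: scatter plot
  scatter = ggplot(mpg, aes(x = displ, y = hwy)) + geom_point(),
  # Quant, Qual: box-whisker plot
  boxplot = ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot(),
  # Qual, Qual: stacked bar chart
  stacked = ggplot(mpg, aes(x = class, fill = drv)) + geom_bar()
)

charts$stacked
```

Swapping a single geom, say `geom_boxplot()` for `geom_violin()`, gives another chart from the same row of the catalogue.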

6 Conclusion

6.1 Data Science Workflow

Figure 7: Data Science Workflow

6.2 Workflow Description

So there we have it:

  • Data: We generate data by experiment, or obtain readily available data. We then import and clean the data.
  • Variables: Questions lead us to identify the types of variables (Quant and Qual).
  • Transform: Sometimes we need to transform the data (long to wide, summarize, create new variables…).
  • Explore: Further questions lead us to look for relationships between variables and the relative sizes of things, which we describe using data visualizations.
  • Report: What we find may be interesting or, best of all, outright surprising! It is finally communicated with charts and descriptions in a research report.
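The five steps can be strung together in a short R pipeline. This is only a sketch under stated assumptions: to stay self-contained, it first writes a sample of ggplot2's `diamonds` data to a temporary CSV file, which stands in for data you would import from your own experiment or source, and the question asked (how does price per carat vary with cut?) is ours.

```r
library(dplyr)
library(ggplot2)
library(readr)

# Data: a stand-in for importing real data; write a sample to a temporary CSV, then read it back
set.seed(2021)
csv_path <- tempfile(fileext = ".csv")
write_csv(slice_sample(diamonds, n = 500), csv_path)
raw <- read_csv(csv_path, show_col_types = FALSE)

# Variables: after the CSV round trip, cut is text (Qual); carat and price are numbers (Quant)
glimpse(raw)

# Transform: restore the level order of cut, create a new variable, then summarise
clean <- raw %>%
  mutate(
    cut = factor(cut, levels = levels(diamonds$cut)),
    price_per_carat = price / carat
  )
by_cut <- clean %>%
  group_by(cut) %>%
  summarise(median_ppc = median(price_per_carat))

# Explore: visualize the relationship
p_workflow <- ggplot(clean, aes(x = cut, y = price_per_carat)) + geom_boxplot()

# Report: a table and a chart for the write-up
by_cut
p_workflow
```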

6.3 Grammar of Data Visualization

You might think of all these questions, answers and mappings as forming a grammar, a language in itself.

And indeed, in R we use a philosophy called the Grammar of Graphics! We will use this grammar in the R graphics packages that we will encounter when we make Graphs next.

Other parts of the workflow (transformation, faceting, analysis and modelling) also fall within the grammar, as we shall see.
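In ggplot2 the grammar is quite literal: a chart is assembled from components joined with `+`. This sketch, using the built-in `mpg` data, labels each component with the part of the grammar it plays. The particular choices (a linear fit, a log scale, facets by `drv`, `theme_minimal()`) are ours, for illustration.

```r
library(ggplot2)

p_grammar <- ggplot(data = mpg) + # data
  aes(x = displ, y = hwy, colour = class) + # aesthetic mappings
  geom_point() + # geometry
  geom_smooth( # a second geometry, with a statistical transformation (linear fit)
    aes(group = 1), method = "lm", formula = y ~ x, colour = "black"
  ) +
  scale_y_log10() + # scale
  coord_cartesian() + # coordinate system
  facet_wrap(~drv) + # facets
  labs(title = "Highway mileage vs engine size") + # labels
  theme_minimal() # theme (not mapped to data)

p_grammar
```

Removing any one line (except the data and the geometry) still gives a valid chart, which is what makes the grammar composable.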

7 AI Generated Summary and Podcast

7.1 Summary

This is a tutorial on data visualization using the R programming language. It introduces concepts such as data types, variables, and visualization techniques. The tutorial utilizes metaphors to explain these concepts, emphasizing the use of geometric aesthetics to represent data. It also highlights the importance of both visual and analytic approaches in understanding data. The tutorial then demonstrates basic chart types, including histograms, scatterplots, and bar charts, and discusses the “Grammar of Graphics” philosophy that guides data visualization in R. The text concludes with a workflow diagram for data science, emphasizing the iterative process of data import, cleaning, transformation, visualization, hypothesis generation, analysis, and communication.


8 References

  1. Claus Wilke. Fundamentals of Data Visualization. https://clauswilke.com/dataviz/
  2. Kieran Healy. Data Visualization: A Practical Introduction. https://socviz.co/
  3. Winston Chang. R Graphics Cookbook. https://r-graphics.org/
  4. Hadley Wickham and Garrett Grolemund. R for Data Science. https://r4ds.had.co.nz/
  5. Jack Dougherty and Ilya Ilyankou. Hands-On Data Visualization. https://handsondataviz.org/
  6. Albert Rapp. Adding images to ggplot. https://albert-rapp.de/posts/ggplot2-tips/27_images/27_images

R Package Citations

Package         Version   Citation
ggformula       0.12.2    Kaplan and Pruim (2025)
Lock5Data       3.0.0     Lock (2021)
mosaic          1.9.2     Pruim, Kaplan, and Horton (2017)
TeachingDemos   2.13      Snow (2024)
Kaplan, Daniel, and Randall Pruim. 2025. ggformula: Formula Interface to the Grammar of Graphics. https://doi.org/10.32614/CRAN.package.ggformula.
Lock, Robin. 2021. Lock5Data: Datasets for “Statistics: UnLocking the Power of Data”. https://doi.org/10.32614/CRAN.package.Lock5Data.
Pruim, Randall, Daniel T Kaplan, and Nicholas J Horton. 2017. “The mosaic Package: Helping Students to ‘Think with Data’ Using R.” The R Journal 9 (1): 77–102. https://journal.r-project.org/archive/2017/RJ-2017-024/index.html.
Snow, Greg. 2024. TeachingDemos: Demonstrations for Teaching and Learning. https://doi.org/10.32614/CRAN.package.TeachingDemos.

Footnotes

  1. https://www.xcode.in/genes-and-personality/how-genes-influence-your-math-ability/

Citation

BibTeX citation:
@online{v.2021,
  author = {V., Arvind},
  title = {Graphs},
  date = {2021-11-01},
  url = {https://madhatterguide.netlify.app/content/courses/Analytics/10-Descriptive/Modules/09-Graphs/},
  langid = {en}
}
For attribution, please cite this work as:
V., Arvind. 2021. “Graphs.” November 1, 2021. https://madhatterguide.netlify.app/content/courses/Analytics/10-Descriptive/Modules/09-Graphs/.

License: CC BY-SA 2.0

Website made with ❤️ and Quarto, by Arvind V.

Hosted by Netlify.