The Mad Hatter’s Guide to Data Viz and Stats in R

Antarctic Sea ice

1 Setting up R Packages

library(tidyverse)
library(mosaic)
library(skimr)
library(ggformula)

Plot Fonts and Theme

library(systemfonts)
library(showtext)
## Clean the slate
systemfonts::clear_local_fonts()
systemfonts::clear_registry()
##
showtext_opts(dpi = 96) # set DPI for showtext
sysfonts::font_add(
  family = "Alegreya",
  regular = "../../../../../../fonts/Alegreya-Regular.ttf",
  bold = "../../../../../../fonts/Alegreya-Bold.ttf",
  italic = "../../../../../../fonts/Alegreya-Italic.ttf",
  bolditalic = "../../../../../../fonts/Alegreya-BoldItalic.ttf"
)

sysfonts::font_add(
  family = "Roboto Condensed",
  regular = "../../../../../../fonts/RobotoCondensed-Regular.ttf",
  bold = "../../../../../../fonts/RobotoCondensed-Bold.ttf",
  italic = "../../../../../../fonts/RobotoCondensed-Italic.ttf",
  bolditalic = "../../../../../../fonts/RobotoCondensed-BoldItalic.ttf"
)
showtext_auto(enable = TRUE) # enable showtext
##
theme_custom <- function() {
  font <- "Alegreya" # assign font family up front
  "%+replace%" <- ggplot2::"%+replace%" # nolint

  theme_classic(base_size = 14, base_family = font) %+replace% # replace elements we want to change

    theme(
      text = element_text(family = font), # set base font family

      # text elements
      plot.title = element_text( # title
        family = font, # set font family
        size = 24, # set font size
        face = "bold", # bold typeface
        hjust = 0, # left align
        margin = margin(t = 5, r = 0, b = 5, l = 0)
      ), # margin
      plot.title.position = "plot",
      plot.subtitle = element_text( # subtitle
        family = font, # font family
        size = 14, # font size
        hjust = 0, # left align
        margin = margin(t = 5, r = 0, b = 10, l = 0)
      ), # margin

      plot.caption = element_text( # caption
        family = font, # font family
        size = 9, # font size
        hjust = 1
      ), # right align

      plot.caption.position = "plot", # position caption relative to the whole plot

      axis.title = element_text( # axis titles
        family = "Roboto Condensed", # font family
        size = 12
      ), # font size

      axis.text = element_text( # axis text
        family = "Roboto Condensed", # font family
        size = 9
      ), # font size

      axis.text.x = element_text( # margin for axis text
        margin = margin(5, b = 10)
      )

      # since the legend often requires manual tweaking
      # based on plot content, don't define it here
    )
}

## Use available fonts in ggplot text geoms too!
ggplot2::update_geom_defaults(geom = "text", new = list(
  family = "Roboto Condensed",
  fontface = "plain", # geom_text's aesthetic is `fontface`, not `face`
  size = 3.5,
  colour = "#2b2b2b"
))

## Set the theme
ggplot2::theme_set(new = theme_custom())

2 Introduction

The extent of Antarctic sea ice over time is monitored by the National Snow and Ice Data Center (https://nsidc.org/).

3 Read the Data

Note: Excel Data

The data is in an Excel workbook with multiple sheets. Inspect it first in Excel and decide which sheet, and which part of the data, you need. Then use readxl::read_xlsx(..) to read it into R. NOTE: The sheet that contains our data of interest is titled “SH-Daily-Extent”.
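
As a minimal sketch of the reading step described above: the file path `data/antarctic-sea-ice.xlsx` is a placeholder (an assumption, not the actual location of the workbook); only the sheet name “SH-Daily-Extent” comes from the note. `readxl::excel_sheets()` lists the sheets so you can confirm the name before reading.

```r
library(readxl)

# HYPOTHETICAL path: replace with wherever you saved the NSIDC workbook
ice_path <- "data/antarctic-sea-ice.xlsx"
# Sheet name taken from the note above
ice_sheet <- "SH-Daily-Extent"

if (file.exists(ice_path)) {
  # List all sheets to confirm the one we want is present
  print(readxl::excel_sheets(ice_path))

  # Read only the sheet of interest
  ice <- readxl::read_xlsx(ice_path, sheet = ice_sheet)

  # Quick structural look: column names, types, first values
  dplyr::glimpse(ice)
} else {
  message("File not found: ", ice_path)
}
```

Columns whose header cell is blank in Excel are given names such as `...1` and `...2` by readxl, so expect to see those when you inspect the result.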

4 Inspect the Data

Appreciate the structure of this data. You may even want to open it in Excel for a closer look. List any imperfections in your Data Dictionary. Why do these matter now? Why might they not have mattered in the datasets you have seen up to now?

5 Data Dictionary

Note: Quantitative Variables

Write in.

Note: Qualitative Variables

Write in.

Note: Observations

Write in.

6 Analyse/Transform the Data

Based on the imperfections noted above, try to figure out what you need to do to clean the data. Refer to your “list of imperfections” in the data.

Then look at the code below and execute it line by line to see what each step does.

```{r}
#| label: data-preprocessing
#
# Write in your code here
# to prepare this data as shown below
# to generate the plot that follows
```
ice %>%
  # Select columns
  # Rename some while selecting !!
  select("month" = ...1, "day" = ...2, c(4:49)) %>%
  # Fill the month column! Yes!!
  tidyr::fill(month) %>%
  # Make Wide Data into Long
  pivot_longer(
    cols = -c(month, day),
    names_to = "series",
    values_to = "values"
  ) %>%
  # Regular Munging
  mutate(
    series = as.integer(series),
    month = factor(month,
      levels = month.name,
      labels = month.name,
      ordered = TRUE
    ),
    # Note munging for date!!
    # Using the lubridate package, part of tidyverse
    # month is now an ordered factor; its integer codes are 1 to 12
    date = lubridate::make_date(
      year = series,
      month = as.integer(month),
      day = day
    )
  ) -> ice_prepared

ice_prepared
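
To see what the fill and pivot steps do without the full workbook, here is a small self-contained sketch on a toy table. The toy values are made up (they are not NSIDC measurements), and the layout, with the month name present only on the first row of each month and one column per year, is an assumption inferred from the code above.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Made-up toy values, NOT real NSIDC measurements
toy <- tibble(
  month  = c("January", NA, "February", NA),
  day    = c(1L, 2L, 1L, 2L),
  `1979` = c(7.1, 7.0, 3.9, 3.8),
  `1980` = c(6.9, 6.8, 3.7, 3.6)
)

toy_long <- toy %>%
  # Copy each month name down into the blank rows beneath it
  fill(month) %>%
  # One row per (month, day, year) instead of one column per year
  pivot_longer(
    cols = -c(month, day),
    names_to = "series",
    values_to = "values"
  ) %>%
  mutate(
    # Year column names arrive as text; convert to integer
    series = as.integer(series),
    # match() turns "January" into 1, "February" into 2, and so on
    date = lubridate::make_date(
      year = series,
      month = match(month, month.name),
      day = day
    )
  )

toy_long
```

Comparing `toy` with `toy_long` row by row shows why the wide year columns had to become a single `series` column before a date could be built.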

7 Research Question

Note

Write in! Look first at the graph!

8 Plot the Data

Error in `position_dodge()`:
! `orientation` must be a string or character vector.
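
The plot did not render here, so the following is only one plausible way to chart data shaped like `ice_prepared` (columns `series`, `values`, `date`); it is not necessarily the chart the tasks below refer to. The toy values are made up purely so that the snippet runs on its own, and the day-of-year x-axis is an illustrative choice.

```r
library(ggplot2)
library(dplyr)

# Made-up toy values standing in for ice_prepared; NOT real measurements
toy_prepared <- expand.grid(
  doy = seq(1, 361, by = 30), # day of year, roughly monthly
  series = 1979:1981          # one series per year
) %>%
  mutate(
    date = as.Date(paste0(series, "-01-01")) + doy - 1,
    # Smooth seasonal curve plus a small per-year offset
    values = 11 - 8 * cos(2 * pi * (doy - 50) / 365) + 0.2 * (series - 1980)
  )

# Overlay every year on a common day-of-year axis
p <- ggplot(
  toy_prepared,
  aes(x = doy, y = values, group = series, colour = factor(series))
) +
  geom_line() +
  labs(
    x = "Day of year",
    y = "Sea ice extent",
    colour = "Year",
    title = "Toy data: one line per year"
  )

p
```

With the real data you would replace `toy_prepared` with `ice_prepared` and derive the day of year from `date`, for example with `lubridate::yday(date)`.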

9 Tasks and Discussion

  • Complete the Data Dictionary.
  • Select and Transform the variables as shown.
  • Create the graphs shown and discuss the following questions:
    • Identify the types of charts.
    • Identify the variables used for various geometrical aspects (x, y, fill…). Name the variables appropriately.
    • What research activity might have been carried out to obtain the data graphed here? Provide some details.
    • What might have been the Hypothesis/Research Question to which this chart was the response?
    • What might the red points represent?
    • What is perhaps a befuddling aspect of this graph until you…Ohhh!!!!!!
    • Draw a sketch of a similar chart for ice extents in the Arctic.

License: CC BY-SA 2.0

Website made with ❤️ and Quarto, by Arvind V.

Hosted by Netlify.