The Mad Hatter’s Guide to Data Viz and Stats in R
Tutorial on Inference for Two Paired Means

Author: Arvind Venkatadri
Published: November 22, 2022
Modified: September 22, 2025

1 Setting up R Packages

library(tidyverse)
library(mosaic)

library(resampledata)

2 Case Study-1: IceCream!!

What is there not to like about ice cream!! Here is a dataset with the calories, fat, and sugar content of vanilla and chocolate ice creams, across several brands. Is this a sample of paired data? Let us check:

2.1 Inspecting and Charting Data

data("IceCream")
IceCream
inspect(IceCream)

categorical variables:  
   name  class levels  n missing                                  distribution
1 Brand factor     39 39       0 Baskin Robbins (2.6%) ...                    

quantitative variables:  
               name   class   min    Q1 median    Q3 max      mean        sd  n missing
1   VanillaCalories integer 120.0 140.0    160 240.0 307 191.41026 58.644207 39       0
2        VanillaFat numeric   4.5   7.5      9  15.5  21  11.28718  4.431655 39       0
3      VanillaSugar numeric  10.0  12.5     17  21.0  27  17.13077  4.841333 39       0
4 ChocolateCalories integer 120.0 140.0    170 260.0 320 198.74359 63.063342 39       0
5      ChocolateFat numeric   5.0   7.5      9  14.7  21  11.12051  4.597378 39       0
6    ChocolateSugar numeric  12.0  15.0     18  22.3  33  18.97436  5.402812 39       0

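Hmm…the data record the calories, fat, and sugar of the two flavours of ice cream sold by each brand. There are 39 brands, and each brand contributes one vanilla and one chocolate measurement in the same row, so the observations are paired by brand.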

Let us plot the data first:

# Back-to-back bars: vanilla extends to the right, chocolate (negated) to the left
IceCream %>%
  gf_col(fct_reorder(Brand, VanillaCalories) ~ VanillaCalories,
    fill = "red"
  ) %>%
  gf_col(fct_reorder(Brand, VanillaCalories) ~ -ChocolateCalories,
    fill = "green",
    xlab = "Calories", ylab = "Brand",
    title = "Calories across Icecream Brands",
    subtitle = "Red = Vanilla, Green = Chocolate"
  ) %>%
  gf_theme(theme_classic())

# Same back-to-back layout for fat
IceCream %>%
  gf_col(fct_reorder(Brand, VanillaFat) ~ VanillaFat,
    fill = "red"
  ) %>%
  gf_col(fct_reorder(Brand, VanillaFat) ~ -ChocolateFat,
    fill = "green",
    xlab = "Fat", ylab = "Brand",
    title = "Fat across Icecream Brands",
    subtitle = "Red = Vanilla, Green = Chocolate"
  ) %>%
  gf_theme(theme_classic())

# Same back-to-back layout for sugar
IceCream %>%
  gf_col(fct_reorder(Brand, VanillaSugar) ~ VanillaSugar,
    fill = "red"
  ) %>%
  gf_col(fct_reorder(Brand, VanillaSugar) ~ -ChocolateSugar,
    fill = "green",
    xlab = "Sugar", ylab = "Brand",
    title = "Sugar across Icecream Brands",
    subtitle = "Red = Vanilla, Green = Chocolate"
  ) %>%
  gf_theme(theme_classic())

We may hypothesize that the nutrient content of the two flavours is similar on a per-brand basis. That is, if, say, Baskin Robbins has high sugar in its vanilla flavour, it is likely to have high sugar in its chocolate flavour too.
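One way to probe this per-brand similarity is to look at the correlation between the vanilla and chocolate values of each nutrient. The following is a minimal sketch in base R, assuming the `IceCream` data from `resampledata` as loaded above; the name `pair_cor` is introduced here purely for illustration.

```r
library(resampledata)
data("IceCream")

# Correlation between the vanilla and chocolate values, brand by brand.
# Values close to 1 would support treating the two flavours as paired.
pair_cor <- c(
  calories = cor(IceCream$VanillaCalories, IceCream$ChocolateCalories),
  fat      = cor(IceCream$VanillaFat, IceCream$ChocolateFat),
  sugar    = cor(IceCream$VanillaSugar, IceCream$ChocolateSugar)
)
round(pair_cor, 3)
```

A strong positive correlation is what makes a paired analysis worthwhile: the brand-to-brand variation cancels out when we take per-brand differences.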

Let us compute the per-brand differences (vanilla minus chocolate) in calories, fat, and sugar, and then take their means:

IceCream %>%
  mutate(
    diff_calories = VanillaCalories - ChocolateCalories,
    diff_fat = VanillaFat - ChocolateFat,
    diff_sugar = VanillaSugar - ChocolateSugar
  ) %>%
  summarise(
    mean_diff_calories = mean(diff_calories),
    mean_diff_fat = mean(diff_fat),
    mean_diff_sugar = mean(diff_sugar)
  )

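Hmm…the mean differences (vanilla minus chocolate) vary a lot in size: about -7.33 for calories, 0.17 for fat, and -1.84 for sugar, which agrees with the means reported by inspect() above. We still need to perform tests to infer whether these differences are statistically significant.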

2.2 Hypothesis

How do we specify our Hypotheses? (Of course, there is more than one!)

Write the Null and Alternate hypotheses here.
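One possible formulation, assuming a two-sided test on the mean per-brand difference, is sketched below. The symbol $\mu_d$ is introduced here for illustration: it stands for the population mean of $d = \text{Vanilla} - \text{Chocolate}$, and the same pair of hypotheses is written once each for calories, fat, and sugar.

```latex
\begin{aligned}
H_0 &: \mu_d = 0 \\
H_a &: \mu_d \neq 0
\end{aligned}
```

A one-sided alternative (for example $\mu_d < 0$, chocolate having more sugar) is equally legitimate, provided it is chosen before looking at the data.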

2.3 Null Distribution Computations

How do we compute the NULL distributions, for each of the three components of the ice creams, using pair-wise analysis?
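One common approach for paired data is a sign-flip permutation test: under the null hypothesis, the vanilla and chocolate labels within a brand are exchangeable, so each per-brand difference is equally likely to be positive or negative. The following is a minimal sketch in base R, not necessarily the method intended for this exercise; the helper `paired_null()`, the seed, and the 4999 replications are assumptions made for illustration.

```r
library(resampledata)
data("IceCream")

set.seed(42) # arbitrary seed, only for reproducibility

# Hypothetical helper: null distribution of the mean paired difference,
# built by randomly flipping the sign of each per-brand difference.
paired_null <- function(d, n_reps = 4999) {
  replicate(n_reps, mean(d * sample(c(-1, 1), length(d), replace = TRUE)))
}

# Per-brand differences (vanilla minus chocolate) for each nutrient
diffs <- list(
  calories = IceCream$VanillaCalories - IceCream$ChocolateCalories,
  fat      = IceCream$VanillaFat - IceCream$ChocolateFat,
  sugar    = IceCream$VanillaSugar - IceCream$ChocolateSugar
)

obs <- sapply(diffs, mean)               # observed mean differences
null_dists <- lapply(diffs, paired_null) # one null distribution per nutrient

# Two-sided p-values: share of null values at least as extreme as observed
p_values <- mapply(
  function(null, o) (sum(abs(null) >= abs(o)) + 1) / (length(null) + 1),
  null_dists, obs
)
p_values
```

Each element of `null_dists` can be plotted as a histogram with the matching value of `obs` marked on it, to see how far out in the tails the observed mean difference falls.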

3 Conclusions

So are there significant differences in sugar, fat, and calorie content between the two flavours?

Is this conclusion different if you ignore the pairing and treat the vanilla and chocolate readings as two independent samples?
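To explore this question, one can run the same comparison both ways and set the results side by side. The sketch below uses the classical t-test from base R (an assumption; a permutation test would serve equally well) and shows sugar only. The names `paired_t` and `indep_t` are introduced here for illustration.

```r
library(resampledata)
data("IceCream")

# Paired analysis: tests the mean of the per-brand differences
paired_t <- t.test(IceCream$VanillaSugar, IceCream$ChocolateSugar,
  paired = TRUE
)

# Independent analysis: Welch two-sample t-test, ignoring the brand pairing
indep_t <- t.test(IceCream$VanillaSugar, IceCream$ChocolateSugar)

# Compare the p-values from the two analyses
c(paired = paired_t$p.value, independent = indep_t$p.value)
```

Both analyses estimate the same mean difference. What changes is the standard error: the paired test removes brand-to-brand variation, while the independent test leaves it in. Repeating the comparison for calories and fat follows the same pattern.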


License: CC BY-SA 2.0
