The Mad Hatter’s Guide to Data Viz and Stats in R

School Scores

1 Setting up R Packages

library(tidyverse)
library(mosaic)
library(skimr)
library(ggformula)
library(GGally)

Plot Fonts and Theme

library(systemfonts)
library(showtext)
## Clean the slate
systemfonts::clear_local_fonts()
systemfonts::clear_registry()
##
showtext_opts(dpi = 96) # set DPI for showtext
sysfonts::font_add(
  family = "Alegreya",
  regular = "../../../../../../fonts/Alegreya-Regular.ttf",
  bold = "../../../../../../fonts/Alegreya-Bold.ttf",
  italic = "../../../../../../fonts/Alegreya-Italic.ttf",
  bolditalic = "../../../../../../fonts/Alegreya-BoldItalic.ttf"
)

sysfonts::font_add(
  family = "Roboto Condensed",
  regular = "../../../../../../fonts/RobotoCondensed-Regular.ttf",
  bold = "../../../../../../fonts/RobotoCondensed-Bold.ttf",
  italic = "../../../../../../fonts/RobotoCondensed-Italic.ttf",
  bolditalic = "../../../../../../fonts/RobotoCondensed-BoldItalic.ttf"
)
showtext_auto(enable = TRUE) # enable showtext
##
theme_custom <- function() {
  font <- "Alegreya" # assign font family up front
  "%+replace%" <- ggplot2::"%+replace%" # nolint

  theme_classic(base_size = 14, base_family = font) %+replace% # replace elements we want to change

    theme(
      text = element_text(family = font), # set base font family

      # text elements
      plot.title = element_text( # title
        family = font, # set font family
        size = 24, # set font size
        face = "bold", # bold typeface
        hjust = 0, # left align
        margin = margin(t = 5, r = 0, b = 5, l = 0)
      ), # margin
      plot.title.position = "plot",
      plot.subtitle = element_text( # subtitle
        family = font, # font family
        size = 14, # font size
        hjust = 0, # left align
        margin = margin(t = 5, r = 0, b = 10, l = 0)
      ), # margin

      plot.caption = element_text( # caption
        family = font, # font family
        size = 9, # font size
        hjust = 1
      ), # right align

      plot.caption.position = "plot", # align caption to the whole plot area

      axis.title = element_text( # axis titles
        family = "Roboto Condensed", # font family
        size = 12
      ), # font size

      axis.text = element_text( # axis text
        family = "Roboto Condensed", # font family
        size = 9
      ), # font size

      axis.text.x = element_text( # margin for axis text
        margin = margin(5, b = 10)
      )

      # since the legend often requires manual tweaking
      # based on plot content, don't define it here
    )
}

## Use available fonts in ggplot text geoms too!
ggplot2::update_geom_defaults(geom = "text", new = list(
  family = "Roboto Condensed",
  fontface = "plain", # text geoms use the `fontface` aesthetic, not `face`
  size = 3.5,
  color = "#2b2b2b"
))
ggplot2::update_geom_defaults(geom = "label", new = list(
  family = "Roboto Condensed",
  fontface = "plain", # text geoms use the `fontface` aesthetic, not `face`
  size = 3.5,
  color = "#2b2b2b"
))

## Set the theme
ggplot2::theme_set(new = theme_custom())

2 Introduction

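This dataset records Math and Verbal test scores for students, aggregated by US state and year, along with average GPAs and years of study in several academic subjects. The scores are also broken down by family income bracket, GPA band, and gender.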

3 Read the Data

4 Inspect and Clean the Data

Hint: Use the janitor package here to clean up the variable names. Try the big_camel case format for the variable names.
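
As a minimal sketch of the hint above: `janitor::clean_names()` takes a `case` argument, and `"big_camel"` is one of its options. The toy tibble and its raw column names are assumptions made for illustration, since the headers of the original file are not shown on this page.

```r
library(tibble)
library(janitor)

# Toy data: the raw column names here are assumed, for illustration only
raw_scores <- tibble(
  `State.Code` = c("AL", "AK", "AZ"),
  `Total.Math` = c(559, 519, 530),
  `Total.Verbal` = c(567, 523, 526)
)

# Convert every column name to BigCamelCase
scores <- clean_names(raw_scores, case = "big_camel")

# Inspect the cleaned names
names(scores)
```

Running `dplyr::glimpse(scores)` afterwards gives a column listing in the same style as the one shown below.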

Rows: 577
Columns: 99
$ Year                                              <dbl> 2005, 2005, 2005, 20…
$ StateCode                                         <chr> "AL", "AK", "AZ", "A…
$ StateName                                         <chr> "Alabama", "Alaska",…
$ TotalMath                                         <dbl> 559, 519, 530, 552, …
$ TotalTestTakers                                   <dbl> 3985, 3996, 18184, 1…
$ TotalVerbal                                       <dbl> 567, 523, 526, 563, …
$ AcademicSubjectsArtsMusicAverageGpa               <dbl> 3.92, 3.76, 3.85, 3.…
$ AcademicSubjectsArtsMusicAverageYears             <dbl> 2.2, 1.9, 2.1, 2.2, …
$ AcademicSubjectsEnglishAverageGpa                 <dbl> 3.53, 3.35, 3.45, 3.…
$ AcademicSubjectsEnglishAverageYears               <dbl> 3.9, 3.9, 3.9, 4.0, …
$ AcademicSubjectsForeignLanguagesAverageGpa        <dbl> 3.54, 3.34, 3.41, 3.…
$ AcademicSubjectsForeignLanguagesAverageYears      <dbl> 2.6, 2.1, 2.6, 2.6, …
$ AcademicSubjectsMathematicsAverageGpa             <dbl> 3.41, 3.06, 3.25, 3.…
$ AcademicSubjectsMathematicsAverageYears           <dbl> 4.0, 3.5, 3.9, 4.1, …
$ AcademicSubjectsNaturalSciencesAverageGpa         <dbl> 3.52, 3.25, 3.43, 3.…
$ AcademicSubjectsNaturalSciencesAverageYears       <dbl> 3.9, 3.2, 3.4, 3.7, …
$ AcademicSubjectsSocialSciencesHistoryAverageGpa   <dbl> 3.59, 3.39, 3.55, 3.…
$ AcademicSubjectsSocialSciencesHistoryAverageYears <dbl> 3.9, 3.4, 3.3, 3.6, …
$ FamilyIncomeBetween20_40KMath                     <dbl> 513, 492, 498, 513, …
$ FamilyIncomeBetween20_40KTestTakers               <dbl> 324, 401, 2121, 180,…
$ FamilyIncomeBetween20_40KVerbal                   <dbl> 527, 500, 495, 526, …
$ FamilyIncomeBetween40_60KMath                     <dbl> 539, 517, 520, 543, …
$ FamilyIncomeBetween40_60KTestTakers               <dbl> 442, 539, 2270, 245,…
$ FamilyIncomeBetween40_60KVerbal                   <dbl> 551, 522, 518, 555, …
$ FamilyIncomeBetween60_80KMath                     <dbl> 550, 513, 524, 553, …
$ FamilyIncomeBetween60_80KTestTakers               <dbl> 473, 603, 2372, 227,…
$ FamilyIncomeBetween60_80KVerbal                   <dbl> 564, 519, 523, 570, …
$ FamilyIncomeBetween80_100KMath                    <dbl> 566, 528, 534, 570, …
$ FamilyIncomeBetween80_100KTestTakers              <dbl> 475, 444, 1866, 147,…
$ FamilyIncomeBetween80_100KVerbal                  <dbl> 577, 534, 533, 580, …
$ FamilyIncomeLessThan20KMath                       <dbl> 462, 464, 485, 489, …
$ FamilyIncomeLessThan20KTestTakers                 <dbl> 175, 191, 891, 107, …
$ FamilyIncomeLessThan20KVerbal                     <dbl> 474, 467, 474, 486, …
$ FamilyIncomeMoreThan100KMath                      <dbl> 588, 541, 554, 572, …
$ FamilyIncomeMoreThan100KTestTakers                <dbl> 980, 540, 3083, 314,…
$ FamilyIncomeMoreThan100KVerbal                    <dbl> 590, 544, 546, 589, …
$ GpaAMinusMath                                     <dbl> 569, 544, 541, 559, …
$ GpaAMinusTestTakers                               <dbl> 724, 673, 3334, 298,…
$ GpaAMinusVerbal                                   <dbl> 575, 546, 535, 572, …
$ GpaAPlusMath                                      <dbl> 622, 600, 605, 629, …
$ GpaAPlusTestTakers                                <dbl> 563, 173, 1684, 273,…
$ GpaAPlusVerbal                                    <dbl> 623, 604, 593, 639, …
$ GpaAMath                                          <dbl> 600, 580, 571, 579, …
$ GpaATestTakers                                    <dbl> 1032, 671, 3854, 457…
$ GpaAVerbal                                        <dbl> 608, 578, 563, 583, …
$ GpaBMath                                          <dbl> 514, 492, 498, 492, …
$ GpaBTestTakers                                    <dbl> 1253, 1622, 7193, 43…
$ GpaBVerbal                                        <dbl> 525, 499, 499, 511, …
$ GpaCMath                                          <dbl> 436, 466, 458, 419, …
$ GpaCTestTakers                                    <dbl> 188, 418, 1184, 57, …
$ GpaCVerbal                                        <dbl> 451, 472, 464, 436, …
$ GpaDOrLowerMath                                   <dbl> 0, 424, 439, 0, 419,…
$ GpaDOrLowerTestTakers                             <dbl> 0, 12, 16, 0, 240, 1…
$ GpaDOrLowerVerbal                                 <dbl> 0, 466, 435, 0, 408,…
$ GpaNoResponseMath                                 <dbl> 0, 0, 0, 0, 0, 0, 0,…
$ GpaNoResponseTestTakers                           <dbl> 225, 427, 919, 78, 1…
$ GpaNoResponseVerbal                               <dbl> 0, 0, 0, 0, 0, 0, 0,…
$ GenderFemaleMath                                  <dbl> 538, 505, 513, 536, …
$ GenderFemaleTestTakers                            <dbl> 2072, 2161, 9806, 85…
$ GenderFemaleVerbal                                <dbl> 561, 521, 522, 558, …
$ GenderMaleMath                                    <dbl> 582, 535, 549, 570, …
$ GenderMaleTestTakers                              <dbl> 1913, 1835, 8378, 74…
$ GenderMaleVerbal                                  <dbl> 574, 526, 531, 570, …
$ ScoreRangesBetween200To300MathFemales             <dbl> 22, 30, 119, 12, 297…
$ ScoreRangesBetween200To300MathMales               <dbl> 10, 20, 72, 7, 1453,…
$ ScoreRangesBetween200To300MathTotal               <dbl> 32, 50, 191, 19, 443…
$ ScoreRangesBetween200To300VerbalFemales           <dbl> 14, 26, 115, 9, 3382…
$ ScoreRangesBetween200To300VerbalMales             <dbl> 17, 26, 86, 3, 2433,…
$ ScoreRangesBetween200To300VerbalTotal             <dbl> 31, 52, 201, 12, 581…
$ ScoreRangesBetween300To400MathFemales             <dbl> 173, 233, 881, 68, 1…
$ ScoreRangesBetween300To400MathMales               <dbl> 93, 153, 450, 31, 71…
$ ScoreRangesBetween300To400MathTotal               <dbl> 266, 386, 1331, 99, …
$ ScoreRangesBetween300To400VerbalFemales           <dbl> 123, 218, 739, 46, 1…
$ ScoreRangesBetween300To400VerbalMales             <dbl> 84, 171, 613, 42, 10…
$ ScoreRangesBetween300To400VerbalTotal             <dbl> 207, 389, 1352, 88, …
$ ScoreRangesBetween400To500MathFemales             <dbl> 514, 696, 3215, 210,…
$ ScoreRangesBetween400To500MathMales               <dbl> 293, 485, 1948, 137,…
$ ScoreRangesBetween400To500MathTotal               <dbl> 807, 1181, 5163, 347…
$ ScoreRangesBetween400To500VerbalFemales           <dbl> 430, 656, 3048, 183,…
$ ScoreRangesBetween400To500VerbalMales             <dbl> 332, 552, 2398, 141,…
$ ScoreRangesBetween400To500VerbalTotal             <dbl> 762, 1208, 5446, 324…
$ ScoreRangesBetween500To600MathFemales             <dbl> 722, 813, 3576, 316,…
$ ScoreRangesBetween500To600MathMales               <dbl> 614, 616, 3152, 244,…
$ ScoreRangesBetween500To600MathTotal               <dbl> 1336, 1429, 6728, 56…
$ ScoreRangesBetween500To600VerbalFemales           <dbl> 690, 729, 3661, 302,…
$ ScoreRangesBetween500To600VerbalMales             <dbl> 617, 596, 3101, 236,…
$ ScoreRangesBetween500To600VerbalTotal             <dbl> 1307, 1325, 6762, 53…
$ ScoreRangesBetween600To700MathFemales             <dbl> 485, 342, 1688, 204,…
$ ScoreRangesBetween600To700MathMales               <dbl> 611, 445, 2126, 239,…
$ ScoreRangesBetween600To700MathTotal               <dbl> 1096, 787, 3814, 443…
$ ScoreRangesBetween600To700VerbalFemales           <dbl> 596, 423, 1831, 242,…
$ ScoreRangesBetween600To700VerbalMales             <dbl> 613, 375, 1679, 226,…
$ ScoreRangesBetween600To700VerbalTotal             <dbl> 1209, 798, 3510, 468…
$ ScoreRangesBetween700To800MathFemales             <dbl> 156, 47, 327, 49, 54…
$ ScoreRangesBetween700To800MathMales               <dbl> 292, 116, 630, 83, 8…
$ ScoreRangesBetween700To800MathTotal               <dbl> 448, 163, 957, 132, …
$ ScoreRangesBetween700To800VerbalFemales           <dbl> 219, 109, 412, 77, 5…
$ ScoreRangesBetween700To800VerbalMales             <dbl> 250, 115, 501, 93, 4…
$ ScoreRangesBetween700To800VerbalTotal             <dbl> 469, 224, 913, 170, …

5 Data Dictionary

Note: Quantitative Variables

Write in.

Note: Qualitative Variables

Write in.

Note: Observations

Write in.

6 Analyse the Data

```{r}
#| label: data-preprocessing
#
# Write in your code here
# to prepare this data as shown below
# to generate the plot that follows
```
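
One plausible way to start the preparation, offered as a sketch and not as the intended solution: many columns share a prefix and suffix, so `tidyr::pivot_longer()` with `names_pattern` can gather them into a long table with one row per state and subject. The three toy rows are copied from the first three states in the `glimpse()` output above. The names `subject_gpa`, `Subject`, and `AverageGpa` are assumptions.

```r
library(dplyr)
library(tidyr)

# Toy rows taken from the first three states in the glimpse() output
scores <- tibble::tibble(
  Year = c(2005, 2005, 2005),
  StateCode = c("AL", "AK", "AZ"),
  AcademicSubjectsArtsMusicAverageGpa = c(3.92, 3.76, 3.85),
  AcademicSubjectsEnglishAverageGpa = c(3.53, 3.35, 3.45),
  AcademicSubjectsMathematicsAverageGpa = c(3.41, 3.06, 3.25)
)

# Gather the per-subject GPA columns into Subject / AverageGpa pairs;
# the capture group in names_pattern keeps only the subject name
subject_gpa <- scores %>%
  pivot_longer(
    cols = matches("^AcademicSubjects.*AverageGpa$"),
    names_to = "Subject",
    names_pattern = "^AcademicSubjects(.*)AverageGpa$",
    values_to = "AverageGpa"
  )

subject_gpa
```

The same pattern, with the regular expression changed, should apply to the `FamilyIncome...Math` columns.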

7 Plot the Data: All Subjects

8 Plot the Data: Maths vs Family Income

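
A hedged sketch of one way to chart Math scores against family income, assuming the income columns have already been reshaped to long form. The boxplot is an illustrative choice and not necessarily the chart type used in the original figure. The data are the first three states' values from the `glimpse()` output. The names `income_math`, `IncomeBracket`, and `MathScore` are assumptions.

```r
library(tibble)
library(ggplot2)

# Income brackets in ascending order, so the x-axis reads low to high
brackets <- c(
  "LessThan20K", "Between20_40K", "Between40_60K",
  "Between60_80K", "Between80_100K", "MoreThan100K"
)

# Toy long-format data: three states (AL, AK, AZ) per bracket
income_math <- tibble(
  StateCode = rep(c("AL", "AK", "AZ"), times = 6),
  IncomeBracket = factor(rep(brackets, each = 3), levels = brackets),
  MathScore = c(
    462, 464, 485, 513, 492, 498, 539, 517, 520,
    550, 513, 524, 566, 528, 534, 588, 541, 554
  )
)

# One box per income bracket, summarising scores across states
income_plot <- ggplot(income_math, aes(x = IncomeBracket, y = MathScore)) +
  geom_boxplot() +
  labs(
    x = "Family income bracket",
    y = "Average Math score",
    title = "Math scores by family income (toy subset)"
  )

income_plot
```

Storing `IncomeBracket` as a factor with explicit levels keeps the brackets in income order instead of alphabetical order.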

9 Task and Discussion

Complete the Data Dictionary. Select and transform the variables as shown. Create the graphs from Sections 7 and 8 above and discuss the following questions:

  • Identify the type of each chart.
  • Identify the variables used for the various geometrical aspects (x, y, fill…). Name the variables appropriately.
  • What activity might have been carried out to obtain the data graphed here? Provide some details.
  • What might have been the Hypothesis/Research Question to which Chart #1 was the response?
  • And for Chart #2?
  • Write a 2-line story based on each of the graphs, describing your inference/surprise.

License: CC BY-SA 2.0

Website made with ❤️ and Quarto, by Arvind V.

Hosted by Netlify.