The Mad Hatter’s Guide to Data Viz and Stats in R

Seattle Bicycle Zones

1 Setting up R Packages

library(tidyverse)
library(mosaic)
library(skimr)
library(ggformula)

2 Introduction

This dataset contains hourly counts of bicycles and pedestrians at various locations in Seattle, Washington, USA. It was collected by the City of Seattle and is available on the City of Seattle Open Data Portal. Each record gives the date and time of the count, the crossing where it was taken, the direction of travel, and the number of bicycles and pedestrians counted. The data is useful for understanding patterns of bicycle and pedestrian traffic in Seattle, which can inform transportation planning and policy decisions.

3 Read the Data
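
The file name and location of the dataset are not given on this page, so the following is only a minimal sketch of the reading step. It assumes a local CSV with the five columns listed under Inspect the Data; `csv_path` and the two-row stand-in file are hypothetical and exist only so the example runs on its own.

```r
library(readr)

# Hypothetical stand-in file: replace csv_path with the path to the real download
csv_path <- tempfile(fileext = ".csv")
write_lines(c(
  "date,crossing,direction,bike_count,ped_count",
  "01/01/2014 12:00:00 AM,Broadway Cycle Track North Of E Union St,North,0,NA",
  "01/01/2014 01:00:00 AM,Broadway Cycle Track North Of E Union St,North,3,NA"
), csv_path)

# Declare column types explicitly so the date stays a character column for now
df_bikes <- read_csv(
  csv_path,
  col_types = cols(
    date = col_character(),
    crossing = col_character(),
    direction = col_character(),
    bike_count = col_double(),
    ped_count = col_double()
  )
)

dplyr::glimpse(df_bikes)
```

Keeping `date` as text at this stage mirrors the `<chr>` type in the inspection output; parsing it into a date-time is left to the munging step.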

4 Inspect the Data

Rows: 515,688
Columns: 5
$ date       <chr> "01/01/2014 12:00:00 AM", "01/01/2014 01:00:00 AM", "01/01/…
$ crossing   <chr> "Broadway Cycle Track North Of E Union St", "Broadway Cycle…
$ direction  <chr> "North", "North", "North", "North", "North", "North", "Nort…
$ bike_count <dbl> 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 5, 0, 7, 4, 6, 6, 1, 4, 3, 0,…
$ ped_count  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

5 Data Dictionary

Note: Quantitative Variables

Write in.

Note: Qualitative Variables

Write in.

Note: Observations

Write in.

6 Data Munging

We are going to develop three data-frames from this data, to capture different kinds of time and space averages.

  1. df_bikes_crossing groups the data by crossing and calculates the average bike count for each crossing. Its input, df_bikes_modified, is the raw data with new variables for date, hour, day of the week, type of day (weekend/weekday), month, season, and a cleaned-up version of the crossing names. The data is filtered to remove extreme values (bike counts of 2000 or more) and years after 2017. The average bike count for each crossing is stored in bike_avg_crossing_overall.
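df_bikes_crossing <- df_bikes_modified %>%
  # Drop extreme counts and keep only years up to 2017
  filter(
    bike_count < 2000,
    lubridate::year(date) < 2018
  ) %>%
  group_by(crossing) %>%
  # Alternative that keeps every row:
  # mutate(crossing_avg = mean(bike_count, na.rm = TRUE))
  # One row per crossing: overall mean bike count, ignoring missing values
  summarize(bike_avg_crossing_overall = mean(bike_count, na.rm = TRUE))

df_bikes_crossing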
  2. df_bikes_month calculates the average bike count for each crossing, month, hour, and type_of_day (weekend/weekday). It also calculates the difference between the average bike count and the mean bike count for that crossing. The difference is capped at 2.5 to avoid extreme values.
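
The code for the derived variables and for df_bikes_month is not shown here, so the following is only a sketch of how the two steps described above might look. It runs on a four-row toy data frame with made-up counts. The column names `day_of_week`, `bike_avg`, `crossing_avg`, and `diff_from_crossing` are assumptions, and so is the reading of "difference" as a difference relative to the crossing mean (which would make a cap of 2.5 plausible). Season and the cleaned-up crossing names are left out for brevity.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the raw data (hypothetical counts, same column layout)
df_bikes <- tibble(
  date = c("01/04/2014 08:00:00 AM", "01/04/2014 05:00:00 PM",  # a Saturday
           "01/06/2014 08:00:00 AM", "01/06/2014 05:00:00 PM"), # a Monday
  crossing = "Broadway Cycle Track North Of E Union St",
  direction = "North",
  bike_count = c(2, 2, 90, 6),
  ped_count = NA_real_
)

# Step 1: parse the date string and derive the time variables
df_bikes_modified <- df_bikes %>%
  mutate(
    date = mdy_hms(date),  # "MM/DD/YYYY HH:MM:SS AM/PM" to date-time
    hour = hour(date),
    day_of_week = wday(date, label = TRUE, week_start = 7),
    # With week_start = 7, Sunday is 1 and Saturday is 7
    type_of_day = if_else(
      wday(date, week_start = 7) %in% c(1, 7), "weekend", "weekday"
    ),
    month = month(date, label = TRUE)
  )

# Step 2: average per crossing, month, hour, and type_of_day,
# then compare each average with the overall mean for that crossing
df_bikes_month <- df_bikes_modified %>%
  filter(bike_count < 2000, year(date) < 2018) %>%
  group_by(crossing) %>%
  mutate(crossing_avg = mean(bike_count, na.rm = TRUE)) %>%
  group_by(crossing, month, hour, type_of_day) %>%
  summarize(
    bike_avg = mean(bike_count, na.rm = TRUE),
    crossing_avg = first(crossing_avg),
    .groups = "drop"
  ) %>%
  mutate(
    # Assumption: relative difference from the crossing mean, capped at 2.5
    diff_from_crossing = pmin((bike_avg - crossing_avg) / crossing_avg, 2.5)
  )

df_bikes_month
```

In the toy data the crossing mean is 25, so the weekday 8 AM average of 90 gives a relative difference of 2.6, which `pmin()` caps at 2.5. Capping keeps a few unusually busy hours from dominating a colour scale.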

7 Research Question

Note

Write in! Look at the graph below and “reverse-engineer” the Research Question!

8 Plot the Data

9 Tasks and Discussion

  • Complete the Data Dictionary.
  • Select and Transform the variables as shown.
  • Create the graphs shown and discuss the following questions:
    • Identify the type of charts
    • Identify the variables used for various geometrical aspects (x, y, fill…). Name the variables appropriately.

The chart above depicts hourly, weekly, and monthly activity of bicycle traffic at several key intersections in the city of Seattle, Washington, USA (heat map over time).

  • Discuss the kind of chart.
  • How were the small multiples obtained?
  • Discuss hourly, weekly, and monthly trends of the traffic.
  • If a large street festival were to be planned, which street intersection and which day would you choose? Justify based on the chart.

  • What research activity might have been carried out to obtain the data graphed here? Provide some details.
  • What might have been the Hypothesis/Research Question to which this chart was the response?
  • Write a 2-line story based on the chart, describing your inference/surprise.

License: CC BY-SA 2.0

Website made with ❤️ and Quarto, by Arvind V.
