The Mad Hatter’s Guide to Data Viz and Stats in R

Exploratory Data Analysis and Statistical Inference in R

Published

November 22, 2022

It is a truth universally acknowledged, that a Srishti Art and Design student, in possession of a good Mac, must be terrified of coding.

-Code and Prejudice, Jane Austen, 1813

1 Abstract

This Course takes Business Practitioners and Art-Design Peasants on a journey of Exploratory Data Analysis (EDA) and Statistics: using data to derive insights, make predictions, and decide on plans of action that can be communicated and actualized in a Design and Business context.

“EDA is the process of examining the data at hand, to find patterns in it, and to obtain insights. It is not a rigid process, but a fluid, iterative one, wherein one chases down alleyways and cul-de-sacs of inquiry, and sometimes comes back empty-handed. But in the end, one has a thorough understanding of the data structure and its variables: their values, their measures and variations, and their possible relationships. One also develops hypotheses for modelling and statistical investigation, and plans for the application of machine learning procedures to the data to enable decision making.”


The Course starts with Descriptive Analytics / EDA: Datasets from various domains of business, design, and scientific activity are introduced. The datasets are motivated from the point of view of the types of information they contain: students will relate the Data Variables (Qualitative and Quantitative) to various types of Data/Information Visualizations.

Statistical Concepts such as Sampling, Hypothesis Tests, Simulation / Modelling, and Uncertainty will be introduced. Students will be motivated to set up their own Hypotheses, which they will answer by analyzing data that they gather themselves in a random sampling experiment.

Predictive Analytics will take us into looking at Data and training standard ML algorithms to make predictions with new Data. Regression, Clustering, and Classification will be covered.

Prescriptive Analytics will deal with coming to terms with the uncertainty in Predictions, and using tools such as ML, Linear/non-Linear Programming, and Decision-Making to make Business Decisions, with an assessment of the Risks involved.

The Course will include a full Exploratory Data Analysis Workflow using RStudio and Quarto that covers Data Gathering and Cleaning, Descriptive and Predictive Analytics, Prescriptive Analytics and Decision Making, and Communication, resulting in publication-worthy documents (HTML / PDF / Word) and/or a personal website.

A diagram from a very popular textbook by Ismay and Kim may be relevant here:

2 What you will learn

  • Data Basics: What does data look like and why should we care?
  • Understand the R language, and appreciate how close it is to plain English, for the most part
  • Rapidly and intuitively create Graphs and Data Visualizations using geometric metaphors to explore data for insights
  • Use Statistical Tests, Procedures, Models, and Simulations to answer Business and Design Questions
  • Create crisp and readable Reports with Quarto that can be shared in a Design and Business Context
  • Use ML algorithms such as Regression, Classification, and Clustering to develop Business Insights
  • Use Linear Programming to make Business Decisions
  • Use Git and GitHub to manage your code and website versions, and to share your work with others
  • Create a Personal Website to showcase your work
  • Use AI tools in RStudio where appropriate for code and text completion, and for code development using prompts in English (Hindi, Kannada, Tamil, Gujarati, Tulu…)

3 Our Tools

  1. R https://cran.r-project.org/ and RStudio https://posit.co/

    R is a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc. RStudio is an integrated development environment (IDE) for R and Python.

  2. Quarto https://quarto.org

    Quarto is an open-source scientific and technical publishing system built on Pandoc, and integrated into RStudio. It extends the functionality of R Markdown, Jupyter Notebooks, and other markdown formats to provide a unified authoring framework for data science and scientific communication. Quarto supports multiple programming languages, including R, Python, Julia, and Observable JavaScript.

  3. (Very Foreseeable Future): Positron IDE https://positron.posit.co/

    Positron IDE is a new IDE from Posit, in public beta as of June 2024. Positron unifies exploration and production work in one free, AI-powered environment, empowering the full spectrum of data science in Python and R.

  4. (In a No-Code version of this course): Orange Data Mining https://orangedatamining.com/

    Orange is a FOSS visual point-and-click software for Data Mining and ML, developed at the University of Ljubljana, Slovenia. It can be thought of as Photoshop for Data Science. It is component-based software, with a large toolbox of components (called widgets) for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration techniques. It includes a wide range of standard ML algorithms, and can also be extended using Python scripts. Orange is used for both research and education at many universities worldwide.


  5. Radiant – Business analytics using R and Shiny https://radiant-rstats.github.io/docs/index.html

    Radiant is a FOSS platform-independent browser-based interface for business analytics in R, developed at the University of California San Diego. The application is based on the Shiny package and can be run using R, or in your browser with no installation required. The tool automatically installs a version of R and adds a Shiny-based GUI that removes the need to write R code. Radiant can also be installed on top of an existing installation of R and invoked from within RStudio.


4 References

Visualization and R language

  1. Hadley Wickham, Mine Cetinkaya-Rundel, and Garrett Grolemund. R for Data Science (2e). https://r4ds.hadley.nz. The most important reference for data visualization and analysis in R. Available free online.
  2. Robert Kabacoff. Modern Data Visualization with R. https://rkabacoff.github.io/datavis/. Available free online.
  3. Jack Dougherty and Ilya Ilyankou, Hands-On Data Visualization: Interactive Storytelling from Spreadsheets to Code, https://handsondataviz.org/. Available free online.
  4. Claus O. Wilke, Fundamentals of Data Visualization, https://clauswilke.com/dataviz/. Available free online.
  5. Jonathan Schwabish, Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks, Columbia University Press, 2021.
  6. Alberto Cairo, The Functional Art: An introduction to information graphics and visualization, New Riders. 2013. ISBN-9780133041361.
  7. Cole Nussbaumer Knaflic, Storytelling With Data: A Data Visualization Guide for Business Professionals, Wiley 2015. ISBN-9781119002253.
  8. Reza Mohammadi (2025). Data Science Foundations and Machine Learning with R: From Data to Decisions. https://book-data-science-r.netlify.app/

Analytics

  1. Judd, C.M., McClelland, G.H., & Ryan, C.S. (2017). Data Analysis: A Model Comparison Approach To Regression, ANOVA, and Beyond (3rd ed.). Routledge. https://doi.org/10.4324/9781315744131
  2. Thomas Maydon, The 4 Types of Data Analytics. https://www.kdnuggets.com/2017/07/4-types-data-analytics.html
  3. Keith McNulty. Handbook of Regression Modeling in People Analytics: With Examples in R, Python and Julia https://peopleanalytics-regression-book.org. Available free online.
  4. James R Evans, Business Analytics: Methods, Models, and Decisions, Pearson Education, 2021.
  5. Dimitris Bertsimas, Robert Freund, Data, Models, and Decisions: the Fundamentals of Management Science, Dynamic Ideas Press, 2004.
  6. Cliff T. Ragsdale, Spreadsheet Modeling & Decision Analysis: A Practical Introduction to Management Science, South Western, Cengage Learning, Mason, OH, 2012.

Statistics

  1. Mine Cetinkaya-Rundel, Johanna Hardin. Introduction to Modern Statistics. https://openintro-ims2.netlify.app. Available free online.
  2. Daniel T. Kaplan. Statistical Models (second edition). https://dtkaplan.github.io/SM2-bookdown/. Available free online.
  3. Daniel T. Kaplan, Compact Introduction to Classical Inference, 2020. https://dtkaplan.github.io/CompactInference/. Available free online.
  4. Daniel T. Kaplan and Frank Shaw, Statistical Modeling: Computational Technique. https://www.mosaic-web.org/go/SM2-technique/. Available free online.
  5. Jonas Kristoffer Lindeløv. Common statistical tests are linear models (or: how to teach stats). https://lindeloev.github.io/tests-as-linear/. Available free online.

5 Pedagogical Notes

Why this course is what it is, and why it does what it does! Read this only if you are planning to be an educator yourself, or if you want to understand why your Teacher is…like this!!

PRIMM

The method followed will be based on PRIMM:

  • PREDICT: Inspect the code and guess at what the code might do, write predictions
  • RUN: the code provided and check what happens
  • INFER: what the parameters of the code do and write comments to explain. What bells and whistles can you see?
  • MODIFY: the parameters of the code provided to understand the options available. Write comments to show what you have aimed for and achieved.
  • MAKE: take an idea/concept of your own, and graph it.

From https://primmportal.com, used without permission

So in this course, wherever you see “YOUR TURN”, please respond with questions about the data, explanations, more questions, and, if you are already confident, code chunks to create new calculations and graphs.
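One PRIMM cycle might look like the sketch below. This is only an illustration: the built-in `mtcars` dataset, the `ggplot2` package, and the particular colours and labels are assumptions chosen for the example, not material taken from the course modules.

```r
# Assumes the ggplot2 package is installed: install.packages("ggplot2")
library(ggplot2)

# PREDICT: before running, write down what you think this chunk will draw.
# RUN: execute the chunk and compare the result with your prediction.
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  # INFER: what do `colour` and `size` control? Add a comment for each.
  geom_point(colour = "steelblue", size = 3) +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    title = "Do heavier cars travel fewer miles per gallon?"
  )

print(p)

# MODIFY: change `size`, or map colour to a variable with aes(colour = factor(cyl)).
# MAKE: pick two other columns of mtcars and graph a question of your own.
```

The comments inside the chunk mark where each PRIMM step happens; the MODIFY and MAKE lines are prompts for you, not finished code.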

And why teach R in this way?

Because we all know two things:

  • Pretty decent English
  • \(y = mx + c\)
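To see how those two things meet in R, here is a minimal sketch that fits a straight line with the formula interface. The built-in `mtcars` dataset and the variable names `slope`, `intercept`, and `by_hand` are assumptions made for illustration only.

```r
# Read the formula aloud: "model mpg by wt, using the data in mtcars".
# This fits the straight line mpg = m * wt + c.
fit <- lm(mpg ~ wt, data = mtcars)

# Pull out c (the intercept) and m (the slope) from the fitted model
intercept <- coef(fit)[["(Intercept)"]]
slope <- coef(fit)[["wt"]]

# y = mx + c, worked by hand for a car with wt = 3 (i.e. 3000 lbs)
by_hand <- slope * 3 + intercept

# The same prediction, asking R to do the arithmetic for us
by_model <- unname(predict(fit, newdata = data.frame(wt = 3)))

print(c(slope = slope, intercept = intercept))
print(c(by_hand = by_hand, by_model = by_model))
```

The slope comes out negative: in this dataset, heavier cars go fewer miles per gallon. The two predictions agree because `predict()` is doing nothing more than \(y = mx + c\).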

Let us hear from Amelia McNamara:


6 Learning R with AI

Of course.

  1. https://openai.com/index/improvements-to-data-analysis-in-chatgpt/
  2. https://rtutor.ai
  3. https://intro2r.library.duke.edu/ai.html
  4. https://chatlize.ai

7 Business Analytics Courses elsewhere

  1. University of California San Diego, Rady School of Management. http://lab.rady.ucsd.edu/sawtooth/business_analytics_in_r/index.html

8 Modules

Title | Date
Tools and Software | Dec 31, 2022
Descriptive Analytics | Dec 31, 2022
Statistical Inference | Nov 30, 2022
Inferential Modelling | Apr 13, 2023
Workflow | Feb 9, 2022
Artful Stuff in Data Viz | Aug 1, 2025
Projects for Peasants | Sep 8, 2025
Case Studies | Sep 2, 2024
Using AI tools in R | Jul 21, 2025

License: CC BY-SA 2.0

Website made with ❤️ and Quarto, by Arvind V.

Hosted by Netlify.