Groups and Distributions

Qual Variables

Quant Variables

Box Plots

Violin Plots

Author

Arvind V.

Published

November 15, 2022

Modified

October 17, 2025

Abstract

Quant and Qual Variable Graphs and their Siblings

“Keep away from people who try to belittle your ambitions. Small people always do that, but the really great make you feel that you, too, can become great.”

— Mark Twain

1 Setting up R Packages

library(tidyverse)
library(mosaic)
library(ggformula)
library(skimr)
library(janitor) # Data cleaning and tidying package
library(visdat) # Visualize whole dataframes for missing data
library(naniar) # Clean missing data
library(DT) # Interactive Tables for our data
library(tinytable) # Elegant Tables for our data
library(ggrepel) # Repel overlapping text labels away from each other
library(marquee) # Markdown-formatted text in ggplot2

Plot Fonts and Theme

library(systemfonts)
library(showtext)
## Clean the slate
systemfonts::clear_local_fonts()
systemfonts::clear_registry()
##
showtext_opts(dpi = 96) # set DPI for showtext
sysfonts::font_add(
  family = "Alegreya",
  regular = "../../../../../../fonts/Alegreya-Regular.ttf",
  bold = "../../../../../../fonts/Alegreya-Bold.ttf",
  italic = "../../../../../../fonts/Alegreya-Italic.ttf",
  bolditalic = "../../../../../../fonts/Alegreya-BoldItalic.ttf"
)

sysfonts::font_add(
  family = "Roboto Condensed",
  regular = "../../../../../../fonts/RobotoCondensed-Regular.ttf",
  bold = "../../../../../../fonts/RobotoCondensed-Bold.ttf",
  italic = "../../../../../../fonts/RobotoCondensed-Italic.ttf",
  bolditalic = "../../../../../../fonts/RobotoCondensed-BoldItalic.ttf"
)
showtext_auto(enable = TRUE) # enable showtext
##
theme_custom <- function() {
  theme_bw(base_size = 10) +

    # theme(panel.widths = unit(11, "cm"),
    #       panel.heights = unit(6.79, "cm")) + # Golden Ratio

    theme(
      plot.margin = margin_auto(t = 1, r = 2, b = 1, l = 1, unit = "cm"),
      plot.background = element_rect(
        fill = "bisque",
        colour = "black",
        linewidth = 1
      )
    ) +

    theme_sub_axis(
      title = element_text(
        family = "Roboto Condensed",
        size = 10
      ),
      text = element_text(
        family = "Roboto Condensed",
        size = 8
      )
    ) +

    theme_sub_legend(
      text = element_text(
        family = "Roboto Condensed",
        size = 6
      ),
      title = element_text(
        family = "Alegreya",
        size = 8
      )
    ) +

    theme_sub_plot(
      title = element_text(
        family = "Alegreya",
        size = 14, face = "bold"
      ),
      title.position = "plot",
      subtitle = element_text(
        family = "Alegreya",
        size = 10
      ),
      caption = element_text(
        family = "Alegreya",
        size = 6
      ),
      caption.position = "plot"
    )
}

## Use available fonts in ggplot text geoms too!
## Note: the font-style aesthetic for text and label geoms is `fontface`
ggplot2::update_geom_defaults(geom = "text", new = list(
  family = "Roboto Condensed",
  fontface = "plain",
  size = 3.5,
  color = "#2b2b2b"
))
ggplot2::update_geom_defaults(geom = "label", new = list(
  family = "Roboto Condensed",
  fontface = "plain",
  size = 3.5,
  color = "#2b2b2b"
))

ggplot2::update_geom_defaults(geom = "marquee", new = list(
  family = "Roboto Condensed",
  face = "plain",
  size = 3.5,
  color = "#2b2b2b"
))
## ggrepel geoms also use `fontface` for the font style
ggplot2::update_geom_defaults(geom = "text_repel", new = list(
  family = "Roboto Condensed",
  fontface = "plain",
  size = 3.5,
  color = "#2b2b2b"
))
ggplot2::update_geom_defaults(geom = "label_repel", new = list(
  family = "Roboto Condensed",
  fontface = "plain",
  size = 3.5,
  color = "#2b2b2b"
))

## Set the theme
ggplot2::theme_set(new = theme_custom())

## tinytable options
options("tinytable_tt_digits" = 2)
options("tinytable_format_num_fmt" = "significant_cell")
options(tinytable_html_mathjax = TRUE)


## Set defaults for flextable
flextable::set_flextable_defaults(font.family = "Roboto Condensed")

2 What graphs will we see today?

Variable #1	Variable #2	Chart Names	Chart Shape
Quant	(Qual)	Violin Plot

3 What kind of Data Variables will we choose?

No	Pronoun	Answer	Variable/Scale	Example	What Operations?
1	How Many / Much / Heavy? Few? Seldom? Often? When?	Quantities, with Scale and a Zero Value. Differences and Ratios/Products are meaningful.	Quantitative/Ratio	Length, Height, Temperature in Kelvin, Activity, Dose Amount, Reaction Rate, Flow Rate, Concentration, Pulse, Survival Rate	Correlation

4 Inspiration

Which of the plots above is more evocative of the underlying data? The one that looks like a combination of a box plot and a density plot probably gives us a better sense of the spread of the data than the good old box plot.

5 How do these Chart(s) Work?

Often one needs to view multiple densities at the same time. Ridge plots give us one option: densities of a Quant variable split by a Qual variable. Another option is a density plot faceted into small multiples using a Qual variable.

Yet another plot that allows comparison of multiple densities side by side is the violin plot. A violin plot combines aspects of a box plot (ranking of values, median, quantiles…) with a superimposed density plot. This lets us look at the medians, quantiles, and densities of a Quant variable at each level of a Qual variable. Let us see what this looks like!

Figure 2: Violin Plots for Normal Variables

In Figure 2, the plots show (very artificial!) distributions of a single Quant variable across levels of another Qual variable. At each level of the Qual variable along the X-axis, we have a violin plot showing the density.
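A figure of this kind can be approximated with simulated data. The sketch below is only an illustration, not the code behind Figure 2: the names `sim_data`, `group`, and `value`, the seed, and the means and standard deviations are all assumptions chosen for the example.

```r
library(ggplot2)

set.seed(42) # arbitrary seed, only for reproducibility

# Hypothetical data: one Quant variable (value) drawn from Normal
# distributions at three levels of a Qual variable (group)
sim_data <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 500)),
  value = c(
    rnorm(500, mean = 0, sd = 1),
    rnorm(500, mean = 2, sd = 0.5),
    rnorm(500, mean = 4, sd = 2)
  )
)

# One violin per level of the Qual variable, placed along the X-axis
sim_violins <- ggplot(sim_data, aes(x = group, y = value)) +
  geom_violin(fill = "grey80") +
  labs(
    title = "Violin plots for simulated Normal variables",
    x = "Qual variable (group)",
    y = "Quant variable (value)"
  )
sim_violins
```

Each violin is widest where the simulated values are densest, so the violin for the group with the smaller standard deviation should look shorter and fatter.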

6 Case Study-1: `diamonds` dataset

The diamonds dataset is a classic dataset that contains information about the prices and attributes of over 50,000 diamonds. It includes variables such as carat, cut, color, clarity, and price. We can use this dataset to create violin plots to visualize the distribution of diamond prices across different cuts and clarity levels.

data("diamonds", package = "ggplot2")
glimpse(diamonds)

Rows: 53,940
Columns: 10
$ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.…
$ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ver…
$ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I,…
$ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, …
$ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64…
$ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58…
$ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 34…
$ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.…
$ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.…
$ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.…

There is no particular need to do any munging of this dataset, but for recall and retention purposes, we will do it anyway ;-D

diamonds_modified <- diamonds %>%
  drop_na() %>%
  janitor::clean_names(case = "snake") %>%
  janitor::remove_empty(which = c("rows", "cols")) %>%
  dplyr::mutate(
    across(where(is.character), as.factor)
  ) %>%
  dplyr::relocate(where(is.factor))

diamonds_modified %>% glimpse()

Rows: 53,940
Columns: 10
$ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ver…
$ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I,…
$ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, …
$ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.…
$ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64…
$ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58…
$ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 34…
$ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.…
$ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.…
$ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.…
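The row and column counts are the same before and after the munging, which fits the remark that no munging was really needed. A minimal sketch of how one might check this directly is given below; the names `missing_per_column` and `character_columns` are made up for this example.

```r
library(ggplot2) # only for the diamonds dataset

# Count missing values per column; all zeros means drop_na() removes nothing
missing_per_column <- colSums(is.na(diamonds))
missing_per_column

# Column names are already lower case, so clean_names() has little to do
names(diamonds)

# List character columns; an empty result means the as.factor()
# conversion has nothing to convert
character_columns <- names(diamonds)[sapply(diamonds, is.character)]
character_columns
```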

ggplot2::theme_set(new = theme_custom())

gf_violin(price ~ "All Diamonds",
  data = diamonds_modified,
  draw_quantiles = c(0, .25, .50, .75)
) %>%
  gf_labs(title = "Plot A: Violin plot for Diamond Prices")

ggplot2::theme_set(new = theme_custom())

diamonds_modified %>%
  gf_violin(price ~ cut,
    draw_quantiles = c(0, .25, .50, .75)
  ) %>%
  gf_labs(title = "Plot B: Price by Cut")

ggplot2::theme_set(new = theme_custom())

diamonds_modified %>%
  gf_violin(price ~ cut,
    fill = ~cut,
    color = ~cut,
    alpha = 0.5,
    draw_quantiles = c(0, .25, .50, .75)
  ) %>%
  gf_labs(title = "Plot C: Price by Cut")

ggplot2::theme_set(new = theme_custom())

diamonds_modified %>%
  gf_violin(price ~ cut,
    fill = ~cut,
    colour = ~cut,
    alpha = 0.5,
    draw_quantiles = c(0, .25, .50, .75)
  ) %>%
  gf_facet_wrap(vars(clarity)) %>%
  gf_labs(title = "Plot D: Price by Cut facetted by Clarity") %>%
  gf_theme(theme(axis.text.x = element_text(angle = 45, hjust = 1)))

ggplot2::theme_set(new = theme_custom())

diamonds_modified %>% ggplot() +
  geom_violin(aes(y = price, x = ""),
    draw_quantiles = c(0, .25, .50, .75)
  ) + # note: y, not x
  labs(title = "Plot A: violin for Diamond Prices")
###
diamonds_modified %>% ggplot() +
  geom_violin(aes(cut, price),
    draw_quantiles = c(0, .25, .50, .75)
  ) +
  labs(title = "Plot B: Price by Cut")
###
diamonds_modified %>% ggplot() +
  geom_violin(
    aes(cut, price,
      color = cut, fill = cut
    ),
    draw_quantiles = c(0, .25, .50, .75),
    alpha = 0.4
  ) +
  labs(title = "Plot C: Price by Cut")
###
diamonds_modified %>% ggplot() +
  geom_violin(
    aes(cut,
      price,
      color = cut, fill = cut
    ),
    draw_quantiles = c(0, .25, .50, .75),
    alpha = 0.4
  ) +
  facet_wrap(vars(clarity)) +
  labs(title = "Plot D: Price by Cut facetted by Clarity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Business Insights from diamond Violin Plots

The distribution for price is clearly long-tailed (skewed). The distributions also vary considerably based on both cut and clarity. These Qual variables clearly have a large effect on the prices of individual diamonds.
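The visual impression of skew can be cross-checked with a few summary numbers. The sketch below is one possible check, not part of the original analysis, and the names `price_by_cut`, `median_price`, `mean_price`, and `iqr_price` are assumptions made for this example. In a right-skewed distribution the long upper tail pulls the mean above the median, so that is the pattern to look for in each group.

```r
library(dplyr)
library(ggplot2) # only for the diamonds dataset

# Summarise price at each level of cut
price_by_cut <- diamonds %>%
  group_by(cut) %>%
  summarise(
    n = n(),
    median_price = median(price),
    mean_price = mean(price),
    iqr_price = IQR(price),
    .groups = "drop"
  )
price_by_cut
```

The same summary can be repeated with `group_by(cut, clarity)` to mirror the faceted Plot D.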

7 Wait, But Why?

Box plots give us an idea of medians, IQRs, and outliers. The shape of the density is not apparent from the box.
Densities give us shapes of distributions, but do not provide a visual indication of other metrics like means or medians (at least not without some effort).
Violins help us do both!
Violins can also be cut in half (since they are symmetric, like Buddhist Prayer Wheels), then placed horizontally, and combined with both a boxplot and a dot-plot to give us raincloud plots that look like this. (Yes, there is code over there, which you can reuse.)
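The idea of getting both views at once can be made explicit by drawing a narrow box plot inside each violin. The sketch below uses plain ggplot2 and is only one way to do it; it is not the raincloud code referred to above. The object name `violin_with_box` and the styling values are assumptions.

```r
library(ggplot2)

violin_with_box <- ggplot(diamonds, aes(x = cut, y = price)) +
  # The violin shows the shape of the density at each level of cut
  geom_violin(aes(fill = cut), alpha = 0.4, colour = NA) +
  # A narrow box plot adds the median and quartiles;
  # outlier points are hidden to keep the picture uncluttered
  geom_boxplot(width = 0.1, outlier.shape = NA) +
  labs(title = "Price by Cut: violin with an embedded box plot") +
  theme(legend.position = "none")
violin_with_box
```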

8 Conclusion

Histograms, Frequency Distributions, and Box Plots are used for Quantitative data variables
Histograms “dwell upon” counts, ranges, means and standard deviations
Frequency Density plots “dwell upon” probabilities and densities
Box Plots “dwell upon” medians and Quartiles
Qualitative data variables can be plotted as counts, using Bar Charts, or using Heat Maps
Violin Plots help us visualize multiple distributions at the same time, as when we split a Quant variable by the levels of a Qual variable.
Ridge Plots are density plots used for describing one Quant and one Qual variable (by inherent splitting)
We can split all these plots on the basis of another Qualitative variable. (Ridge Plots are already split.)
Long tailed distributions need care in visualization and in inference making!
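One common way of taking care with a long-tailed variable such as `price` is to plot it on a logarithmic scale, so that the long upper tail no longer squashes the bulk of the data. The sketch below is a suggestion under that assumption, not something the module prescribes, and the object name `log_price_violins` is made up for the example. A log scale requires strictly positive values, which should be checked before using it.

```r
library(ggplot2)

# A log scale only makes sense for strictly positive values
all(diamonds$price > 0)

log_price_violins <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_violin(aes(fill = cut), alpha = 0.4) +
  # The density is estimated on the log10-transformed prices
  scale_y_log10() +
  labs(
    title = "Price by Cut on a log scale",
    y = "price (log10 scale)"
  ) +
  theme(legend.position = "none")
log_price_violins
```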

9 Your Turn

Datasets

Click on the Dataset Icon above, and unzip that archive. Try to make distribution plots with each of the three tools.

CalmCode

A dataset from calmcode.io https://calmcode.io/datasets.html

From Groups

Datasets from the earlier module on Groups.

Inspect the dataset in each case and develop a set of Questions that can be answered by appropriate stat measures, or by using a chart to show the distribution.

10 References

Winston Chang (2024). R Graphics Cookbook. https://r-graphics.org
Aran Lunzer and Amelia McNamara. Exploring Histograms (an essay with a scrolly animation of a histogram). https://tinlizzie.org/histograms/?s=09
Minimal R using mosaic. https://cran.r-project.org/web/packages/mosaic/vignettes/MinimalRgg.pdf
Sebastian Sauer, Plotting multiple plots using purrr::map and ggplot

R Package Citations

Package	Version	Citation
ggnormalviolin	0.2.1	Schneider (2025)
ggridges	0.5.7	Wilke (2025)
NHANES	2.1.0	Pruim (2015)
TeachHist	0.2.1	Lange (2023)
TeachingDemos	2.13	Snow (2024)
tinytable	0.13.0	Arel-Bundock (2025)
visualize	4.5.0	Balamuta (2023)

Arel-Bundock, Vincent. 2025. tinytable: Simple and Configurable Tables in “HTML,” “LaTeX,” “Markdown,” “Word,” “PNG,” “PDF,” and “Typst” Formats. https://doi.org/10.32614/CRAN.package.tinytable.

Balamuta, James. 2023. visualize: Graph Probability Distributions with User Supplied Parameters and Statistics. https://doi.org/10.32614/CRAN.package.visualize.

Lange, Carsten. 2023. TeachHist: A Collection of Amended Histograms Designed for Teaching Statistics. https://doi.org/10.32614/CRAN.package.TeachHist.

Pruim, Randall. 2015. NHANES: Data from the US National Health and Nutrition Examination Study. https://doi.org/10.32614/CRAN.package.NHANES.

Schneider, W. Joel. 2025. ggnormalviolin: A “ggplot2” Extension to Make Normal Violin Plots. https://doi.org/10.32614/CRAN.package.ggnormalviolin.

Snow, Greg. 2024. TeachingDemos: Demonstrations for Teaching and Learning. https://doi.org/10.32614/CRAN.package.TeachingDemos.

Wilke, Claus O. 2025. ggridges: Ridgeline Plots in “ggplot2”. https://doi.org/10.32614/CRAN.package.ggridges.

Citation

BibTeX citation:

@online{v.2022,
  author = {V., Arvind},
  title = {{Groups} and {Distributions}},
  date = {2022-11-15},
  url = {https://madhatterguide.netlify.app/content/courses/Analytics/10-Descriptive/Modules/28-Violins/},
  langid = {en},
  abstract = {Quant and Qual Variable Graphs and their Siblings}
}

For attribution, please cite this work as:

V., Arvind. 2022. “Groups and Distributions.” November 15, 2022. https://madhatterguide.netlify.app/content/courses/Analytics/10-Descriptive/Modules/28-Violins/.