Comparing Multiple Means with ANOVA

Arvind V.

2023-03-28

Setting up R Packages

library(tidyverse) # Tidy data processing
library(ggformula) # Formula based plots
library(mosaic) # Data inspection and Statistical Inference
library(broom) # Tidy outputs from Statistical Analyses
library(infer) # Statistical Inference, Permutation/Bootstrap
library(supernova) # Beginner-Friendly ANOVA Tables
library(ggstatsplot) # Statistical Plots
library(ggcompare) # Improved p.value brackets on graphs
##
library(patchwork) # Arranging Plots
library(ggprism) # Interesting Categorical Axes
library(paletteer) # Color Palettes

Plot Fonts and Theme

Code
library(systemfonts)
library(showtext)
## Clean the slate
systemfonts::clear_local_fonts()
systemfonts::clear_registry()
##
showtext_opts(dpi = 96) # set DPI for showtext
sysfonts::font_add(
  family = "Alegreya",
  regular = "../../../../../../fonts/Alegreya-Regular.ttf",
  bold = "../../../../../../fonts/Alegreya-Bold.ttf",
  italic = "../../../../../../fonts/Alegreya-Italic.ttf",
  bolditalic = "../../../../../../fonts/Alegreya-BoldItalic.ttf"
)

sysfonts::font_add(
  family = "Roboto Condensed",
  regular = "../../../../../../fonts/RobotoCondensed-Regular.ttf",
  bold = "../../../../../../fonts/RobotoCondensed-Bold.ttf",
  italic = "../../../../../../fonts/RobotoCondensed-Italic.ttf",
  bolditalic = "../../../../../../fonts/RobotoCondensed-BoldItalic.ttf"
)
showtext_auto(enable = TRUE) # enable showtext
##
theme_custom <- function() {
  theme_bw(base_size = 10) +

    # theme(panel.widths = unit(11, "cm"),
    #       panel.heights = unit(6.79, "cm")) + # Golden Ratio

    theme(
      plot.margin = margin_auto(t = 1, r = 2, b = 1, l = 1, unit = "cm"),
      plot.background = element_rect(
        fill = "bisque",
        colour = "black",
        linewidth = 1
      )
    ) +

    theme_sub_axis(
      title = element_text(
        family = "Roboto Condensed",
        size = 10
      ),
      text = element_text(
        family = "Roboto Condensed",
        size = 8
      )
    ) +

    theme_sub_legend(
      text = element_text(
        family = "Roboto Condensed",
        size = 6
      ),
      title = element_text(
        family = "Alegreya",
        size = 8
      )
    ) +

    theme_sub_plot(
      title = element_text(
        family = "Alegreya",
        size = 14, face = "bold"
      ),
      title.position = "plot",
      subtitle = element_text(
        family = "Alegreya",
        size = 10
      ),
      caption = element_text(
        family = "Alegreya",
        size = 6
      ),
      caption.position = "plot"
    )
}

## Use available fonts in ggplot text geoms too!
ggplot2::update_geom_defaults(geom = "text", new = list(
  family = "Roboto Condensed",
  face = "plain",
  size = 3.5,
  color = "#2b2b2b"
))
ggplot2::update_geom_defaults(geom = "label", new = list(
  family = "Roboto Condensed",
  face = "plain",
  size = 3.5,
  color = "#2b2b2b"
))

ggplot2::update_geom_defaults(geom = "marquee", new = list(
  family = "Roboto Condensed",
  face = "plain",
  size = 3.5,
  color = "#2b2b2b"
))
ggplot2::update_geom_defaults(geom = "text_repel", new = list(
  family = "Roboto Condensed",
  face = "plain",
  size = 3.5,
  color = "#2b2b2b"
))
ggplot2::update_geom_defaults(geom = "label_repel", new = list(
  family = "Roboto Condensed",
  face = "plain",
  size = 3.5,
  color = "#2b2b2b"
))

## Set the theme
ggplot2::theme_set(new = theme_custom())

## tinytable options
options("tinytable_tt_digits" = 2)
options("tinytable_format_num_fmt" = "significant_cell")
options(tinytable_html_mathjax = TRUE)


## Set defaults for flextable
flextable::set_flextable_defaults(font.family = "Roboto Condensed")

Introduction

Suppose we have three sales strategies on our website, to sell a certain product, say men’s shirts. We have observations of customer website interactions over several months. How do we know which strategy makes people buy the fastest ?

If there is a University course that is offered in parallel in three different classrooms, is there a difference between the average marks obtained by students in each of the classrooms?

In each case we have a set of Quant observations in each Qual category: Interaction Time vs Sales Strategy in the first example, and Student Marks vs Classroom in the second. We can take mean scores in each category and decide to compare them. How do we make the comparisons? One way would be to compare them pair-wise, doing as many t-tests as there are pairs. But this rapidly becomes intractable, and also dangerous: as the number of groups grows, the number of mean comparisons grows as \(\binom{N}{2}\), and with each additional comparison the chance that some difference shows up just by chance increases! We would end up making the wrong inference and perhaps the wrong decision. The trick, of course, is to make all the comparisons at once, and ANOVA is the technique that allows us to do just that.
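To get a feel for how quickly this multiplies, here is a small sketch (not part of the frogspawn analysis) that counts the pairwise comparisons and the chance of at least one spurious "significant" result when each t-test is run at \(\alpha = 0.05\):

## How the number of pairwise comparisons (and the family-wise error rate)
## grows with the number of groups, if each t-test is run at alpha = 0.05
n_groups <- c(3, 5, 10)
n_pairs <- choose(n_groups, 2)
fwer <- 1 - (1 - 0.05)^n_pairs # chance of at least one false positive
tibble(groups = n_groups, pairwise_tests = n_pairs, family_wise_error = round(fwer, 3))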

In this tutorial, we will compare the Hatching Time of frog spawn1, at three different lab temperatures.

In this tutorial, our research question is:

Research Question

Based on the sample dataset at hand, how does frogspawn hatching time vary with different temperature settings?

Workflow: Read the Data

Download the data by clicking the button below.

Data Folder

Save the CSV in a subfolder titled “data” inside your R work folder.

frogs_orig <- read_csv("data/frogs.csv")
frogs_orig

Our response variable is the hatching Time. Our explanatory variable is a factor, Temperature, with 3 levels: 13°C, 18°C and 25°C. Different samples of spawn were subject to each of these temperatures respectively.

Workflow: Clean the Data

The data is in wide-format, with a separate column for each Temperature, and a common column for Sample ID. This is good for humans, but poor for a computer: there are NA entries since not all samples of spawn can be subject to all temperatures. (E.g. Sample ID #1 was maintained at 13°C, and there are obvious NAs in the other two columns, which we don’t need).

We will first stack up the Temperature columns into a single column, separate that into pieces and then retain just the number part (13, 18, 25), getting rid of the word Temperature from the column titles. Then the remaining numerical column with temperatures (13, 18, 25) will be converted into a factor.

We will use pivot_longer() and separate_wider_regex() to achieve this. [See this animation for pivot_longer(): https://haswal.github.io/pivot/ ]

Code
frogs_orig %>%
  pivot_longer(
    .,
    cols = starts_with("Temperature"),
    cols_vary = "fastest",
    # new in pivot_longer
    names_to = "Temp",
    values_to = "Time"
  ) %>%
  drop_na() %>%
  ##
  separate_wider_regex(
    cols = Temp,
    # knock off the unnecessary "Temperature" word
    # Just keep the digits thereafter
    patterns = c("Temperature", TempFac = "\\d+"),
    cols_remove = TRUE
  ) %>%
  # Convert Temp into TempFac, a 3-level factor
  mutate(TempFac = factor(
    x = TempFac,
    levels = c(13, 18, 25),
    labels = c("13", "18", "25")
  )) %>%
  rename("Id" = `Frogspawn sample id`) -> frogs_long
frogs_long
##
frogs_long %>% count(TempFac)
Table 1: Frogspawn data
(a) Clean Data
(b) Counts of samples per Temperature level

So we have cleaned up our data and have 20 samples for Hatching Time per TempFac setting.
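A quick numerical summary per temperature level confirms the counts and gives a first look at the group means and spreads; a small sketch using mosaic::favstats(), which is already loaded:

## Group-wise summary statistics for Hatching Time
mosaic::favstats(Time ~ TempFac, data = frogs_long)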

Workflow: EDA

Let us plot jitter-plots, boxplots, and histograms of Hatching Time:

ggplot2::theme_set(new = theme_custom())

gf_jitter(Time ~ TempFac,
  color = ~TempFac, width = 0.2,
  data = frogs_long, size = 3, alpha = 0.5
) %>%
  gf_labs(
    title = "Frog Spawn Hatching Time vs Temperature",
    x = "Temperature", y = "Hatching Time"
  ) %>%
  gf_hline(yintercept = ~ mean(Time), color = "grey") %>%
  gf_annotate(geom = "text", label = "Overall Mean", x = 3, y = mean(frogs_long$Time) + 0.5, size = 4) %>%
  gf_refine(
    scale_color_paletteer_d("ggthemes::colorblind"),
    scale_x_discrete(guide = "prism_bracket"),
    guides(color = guide_legend(title = "Temperature level (°C)"))
  )

ggplot2::theme_set(new = theme_custom())

gf_boxplot(
  data = frogs_long,
  Time ~ TempFac,
  fill = ~TempFac,
  alpha = 0.5, orientation = "x"
) %>%
  gf_hline(yintercept = ~ mean(Time), color = "grey") %>%
  gf_labs(
    title = "Frog Spawn Hatching Time vs Temperature",
    x = "Temperature", y = "Hatching Time",
    caption = "Using ggprism"
  ) %>%
  gf_annotate(geom = "text", label = "Overall Mean", x = 3, y = mean(frogs_long$Time) + 0.5, size = 4) %>%
  gf_refine(
    scale_fill_paletteer_d("ggthemes::colorblind"),
    scale_x_discrete(guide = "prism_bracket"),
    guides(fill = guide_legend(title = "Temperature level (°C)"))
  )

ggplot2::theme_set(new = theme_custom())

gf_histogram(~Time,
  fill = ~TempFac,
  data = frogs_long, alpha = 0.5
) %>%
  gf_vline(xintercept = ~ mean(frogs_long$Time)) %>%
  gf_labs(
    title = "Frog Spawn Hatching Time vs Temperature",
    x = "Hatching Time", y = "Count"
  ) %>%
  gf_annotate("text",
    y = 7, x = mean(frogs_long$Time) + 1.5,
    label = "Overall Mean", size = 4
  ) %>%
  gf_refine(
    scale_fill_paletteer_d("ggthemes::colorblind"),
    guides(fill = guide_legend(title = "Temperature level (°C)"))
  )

The histograms look well separated and the box plots also show very little overlap. So we can reasonably hypothesize that Temperature has a significant effect on Hatching Time.

The jitter plot seems to show that there are only a few distinct observed values of Hatching Time at each setting of TempFac, which is a little unusual. But we will proceed nonetheless.

Let’s go ahead with our ANOVA test.

Workflow: ANOVA

We will first execute the ANOVA test with code and evaluate the results. Then we will do an intuitive walkthrough of the process and, finally, hand-calculate the entire analysis for clear understanding. For now, a little faith!
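The fitted model object frogs_anova, used throughout the rest of this tutorial, can be created with base R's aov(); a minimal sketch, assuming the cleaned frogs_long data from above:

## One-Way ANOVA: Hatching Time modelled by the Temperature factor
frogs_anova <- aov(Time ~ TempFac, data = frogs_long)

## Beginner-friendly ANOVA table
supernova::supernova(frogs_anova)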

Stating the Model

And supernova gives us a nice linear equation relating Hatching_Time to TempFac:

supernova::equation(frogs_anova)
Fitted equation:
Time = 26.3 + -5.3*TempFac18 + -10.1*TempFac25 + e

TempFac18 and TempFac25 are binary {0,1} dummy variables that indicate whether an observation was made at 18°C or 25°C respectively; e is the remaining error. The equation models the mean Hatching Time at each level of TempFac.
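To see that the fitted equation simply reproduces the group means, we can compare the per-level means with sums of the model coefficients; a small sketch, assuming frogs_anova and frogs_long as above:

## Observed group means of Hatching Time
frogs_long %>%
  group_by(TempFac) %>%
  summarise(mean_time = mean(Time))

## The same means, rebuilt from the model coefficients:
## 13°C -> intercept; 18°C -> intercept + TempFac18; 25°C -> intercept + TempFac25
coefs <- coef(frogs_anova)
c(
  "13" = unname(coefs["(Intercept)"]),
  "18" = unname(coefs["(Intercept)"] + coefs["TempFac18"]),
  "25" = unname(coefs["(Intercept)"] + coefs["TempFac25"])
)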

Code
# https://www.statology.org/r-geom_path-each-group-consists-of-only-one-observation/
library(latex2exp) # provides TeX(), used in the equation annotation below

## Plot the Equation over the Scatterplot
frogs_long %>%
  mutate(fitted = fitted(frogs_anova)) %>%
  gf_jitter(Time ~ TempFac,
    width = 0.2, alpha = 0.6,
    color = ~TempFac,
    data = .
  ) %>%
  gf_summary(
    group = ~1, # See the reference link above. Damn!!!
    fun = "mean", geom = "line", colour = "lightblue",
    lty = 1, linewidth = 2
  ) %>%
  gf_point(fitted ~ TempFac,
    color = ~TempFac,
    shape = 2,
    size = 6
  ) %>%
  gf_segment(fitted + fitted ~ 0 + TempFac, linetype = 2, color = ~TempFac) %>%
  gf_annotate("label",
    label = TeX(r"($Time = 26.3 - 5.3 \times TempFac18  -10.1 \times TempFac25 + e$)", output = "character"),
    parse = TRUE,
    y = 26.3, x = 2.25,
    size = 3, color = "black", fill = "lightblue"
  ) %>%
  gf_annotate("label",
    label = "Triangles are Predictions\n at each TempFac value", y = 20, x = 1.25,
    size = 3, color = "black", fill = "moccasin"
  ) %>%
  gf_labs(
    title = "Fitted ANOVA Model Equation",
    x = "Temperature", y = "Hatching Time"
  ) %>%
  gf_refine(
    scale_colour_paletteer_d("ggthemes::colorblind"),
    guides(colour = guide_legend(title = "Temperature level (°C)"))
  ) %>%
  gf_refine(
    scale_x_discrete(guide = "prism_bracket"),
    scale_y_continuous(breaks = c(10, 15, (26.3 - 10.1), 20, (26.3 - 5.3), 25, 26.3, 30, 35))
  )

Workflow: Checking ANOVA Assumptions

ANOVA makes 3 fundamental assumptions:

  1. Data (and errors) are normally distributed.
  2. Variances are equal.
  3. Observations are independent.

We can check these using statistical tests and graphs.

Checks for Normality

The Shapiro-Wilk test (shapiro.test() in R) checks whether a vector of numeric data could have come from a normal distribution; we reject the hypothesis of normality when the p-value is less than or equal to 0.05.

shapiro.test(x = frogs_long$Time)

    Shapiro-Wilk normality test

data:  frogs_long$Time
W = 0.92752, p-value = 0.001561

The p-value is very low, so we reject the NULL hypothesis of normality: the overall data are not normally distributed. How about normality at each level of the TempFac factor?

Group-wise Normality

We will use an advanced dplyr technique to do this. We will first group the data by TempFac, then use dplyr::group_modify() to apply the shapiro.test() function to each of these groups. Finally we will use broom::tidy() to convert the output of shapiro.test() into a tidy dataframe. See the code below.

Code
frogs_long %>%
  group_by(TempFac) %>%
  group_modify(~ .x %>%
    select(Time) %>%
    as_vector() %>%
    shapiro.test() %>%
    broom::tidy())

The Shapiro-Wilk test takes as its NULL hypothesis that the data are normally distributed, and estimates the probability of obtaining data like ours if that hypothesis were true. Except for TempFac = 18, the p-values are less than 0.05, so at those levels we can reject the NULL hypothesis of normality. Perhaps this is a sign that we need more than 20 samples per factor level. Let there be more frogs!!! We need more frogs!!

We can also check the residuals post-model:

Code
ggplot2::theme_set(new = theme_custom())

frogs_anova$residuals %>%
  as_tibble() %>%
  gf_dhistogram(~value, data = .) %>%
  gf_labs(
    title = "Residuals Histogram",
    x = "Residuals", y = "Count"
  ) %>%
  gf_fitdistr()
##
frogs_anova$residuals %>%
  as_tibble() %>%
  gf_qq(~value, data = .) %>%
  gf_qqstep() %>%
  gf_labs(
    title = "Residuals Q-Q Plot",
    x = "Theoretical Quantiles", y = "Sample Quantiles"
  ) %>%
  gf_qqline()

shapiro.test(frogs_anova$residuals)

    Shapiro-Wilk normality test

data:  frogs_anova$residuals
W = 0.94814, p-value = 0.01275

Unsurprisingly, the residuals are not normally distributed either. We really need more samples/observations!! But the differences in means are so large that we can still be confident of the results.

Check for Similar Variance

Response data with different variances at different levels of an explanatory variable are said to exhibit heteroscedasticity. This violates one of the assumptions of ANOVA.

To check whether the Time readings have similar variance across levels of TempFac, we can use the Levene Test, or, since our per-group observations are not normally distributed, the non-parametric rank-based Fligner-Killeen Test. The NULL hypothesis in both is that the groups have similar variances; the tests assess how probable the observed data are under this NULL hypothesis:

frogs_long %>%
  group_by(TempFac) %>%
  summarise(variance = var(Time))

Not too different…OK on with the tests…

DescTools::LeveneTest(Time ~ TempFac, data = frogs_long)
##
fligner.test(Time ~ TempFac, data = frogs_long)

    Fligner-Killeen test of homogeneity of variances

data:  Time by TempFac
Fligner-Killeen:med chi-squared = 0.53898, df = 2, p-value = 0.7638

It seems that there is no cause for concern here; the data do not have significantly different variances at different levels of TempFac.

Independent Observations

This is an experiment-design concern: the data-gathering procedure must ensure that observations at each level of the factor (or at each factor combination, if there is more than one factor) are independent of one another.

Workflow: Effect Size

The simplest way to find the actual effect sizes detected by an ANOVA test is something we have already done, with the supernova package. Here are the table and plot again:

ggplot2::theme_set(new = theme_custom())

frogs_supernova <-
  supernova::pairwise(frogs_anova,
    plot = TRUE,
    alpha = 0.05,
    correction = "Bonferroni"
  )

frogs_supernova$TempFac

This table, the plot, and the equation we set up earlier all give us a sense of how TempFac affects Time. The differences are given pair-wise between levels of the Qual factor TempFac, and the standard error is a pooled estimate (all groups together).
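As a rough check on the "pooled" idea, the standard error of any pairwise difference can be hand-computed from the residual mean square; a sketch, assuming frogs_anova from above and 20 observations per group:

## Pooled (residual) mean square from the ANOVA table
mse <- broom::tidy(frogs_anova) %>%
  filter(term == "Residuals") %>%
  pull(meansq)

## Standard error of the difference between any two group means
n_per_group <- 20
se_diff <- sqrt(mse * (1 / n_per_group + 1 / n_per_group))
se_diff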

Using other packages : ggstatsplot

There is a very neat package called ggstatsplot1 that allows us to plot very comprehensive statistical graphs. Let us quickly do this:

Code
ggplot2::theme_set(new = theme_custom())

library(ggstatsplot)
frogs_long %>%
  ggstatsplot::ggbetweenstats(
    x = TempFac, y = Time,
    colour = TempFac, alpha = 0.8,
    type = "parametric",
    p.adjust.method = "bonferroni",
    conf.level = 0.95,
    # Remove the violin plots
    violin.args = list(width = 0.0)
  ) +

  scale_colour_paletteer_d("ggthemes::colorblind") +
  labs(
    title = "ANOVA : Frogs Spawn Time vs Temperature Setting",
    x = "Temperature (°C)", y = "Spawn Time (days)",
    subtitle = "", caption = ""
  )
Figure 1: ggstatsplot ANOVA

This plot also shows vanishingly small p-values for each pairwise comparison, attesting to their significance.

The ggstatsplot package is very useful for quick visualizations of statistical tests.

Workflow: ANOVA using Permutation Tests

We wish to establish the significance of the effect size due to each of the levels in TempFac. From the normality tests conducted earlier we saw that, except at one level of TempFac, the times are not normally distributed. Hence we opt for a Permutation Test to check the significance of the effect.

As remarked in Ernst1, the non-parametric permutation test can be both exact and also intuitively easier for students to grasp.

We proceed with a Permutation Test for TempFac. We shuffle the levels (13, 18, 25) randomly among the Time observations, repeat the ANOVA each time, and calculate the F-statistic. The NULL distribution is the distribution of these F-statistics over the many permutations, and the p-value is the proportion of permuted F-statistics that equal or exceed the observed one.
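Here is a bare-bones version of that idea in base R plus dplyr (a sketch; the seed and number of repetitions are arbitrary). The infer workflow that follows does the same thing declaratively:

## Bare-bones permutation ANOVA: shuffle TempFac, recompute F each time
obs_F <- summary(aov(Time ~ TempFac, data = frogs_long))[[1]]$`F value`[1]
set.seed(2023)
perm_F <- replicate(4999, {
  shuffled <- frogs_long %>% mutate(TempFac = sample(TempFac))
  summary(aov(Time ~ TempFac, data = shuffled))[[1]]$`F value`[1]
})
## p-value: proportion of permuted F-statistics at least as large as observed
mean(perm_F >= obs_F)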

We will use infer to do this. First we calculate the observed F-statistic with infer, which has a very direct, if verbose, syntax for permutation tests:

observed_infer <-
  frogs_long %>%
  specify(Time ~ TempFac) %>%
  hypothesise(null = "independence") %>%
  calculate(stat = "F")
observed_infer

We see that the observed F-Statistic is of course \(385.8966\) as before. Now we use infer to generate a NULL distribution using permutation of the factor TempFac:

null_dist_infer <- frogs_long %>%
  specify(Time ~ TempFac) %>%
  hypothesise(null = "independence") %>%
  generate(reps = 4999, type = "permute") %>%
  calculate(stat = "F")
##
null_dist_infer
##
null_dist_infer %>%
  visualise(method = "simulation") +
  shade_p_value(obs_stat = observed_infer$stat, direction = "right") +
  scale_x_continuous(trans = "log10", expand = c(0, 0)) +
  coord_cartesian(xlim = c(0.2, 500), clip = "off") +
  annotation_logticks(outside = FALSE) +
  theme_custom()

As seen, the infer-based permutation test also shows that the permutation-generated F-statistics come nowhere near the observed value. The effect of TempFac is very strong.

Wait, But Why?

  • In marketing, design, or business research, similar quantities may be measured across different locations, or stores, or categories of people, for instance.
  • ANOVA is the tool to decide if the Quant variable has differences across the Qual categories.
  • This approach can be extended to more than one Qual variable, and also if there is another Quant variable in the mix.

Conclusions

We have discussed ANOVA as a means of modelling the effect of a Categorical variable on a Continuous (Quant) variable. ANOVA can be carried out using the standard aov() function when the assumptions on distributions, variances, and independence are met; permutation ANOVA tests can be used when these assumptions do not quite hold.

Two-Way ANOVA

What if we have two Categorical variables as predictors?

We then need to perform a Two-Way ANOVA analysis, where we look at the predictors individually (main effects) and together (interaction effects). Here too, we need to verify whether the number of observations is balanced across all combinations of levels of the two Qualitative predictors. There are three classical approaches (Type I, Type II, and Type III ANOVA) for testing hypotheses in such unbalanced designs, as they are called (Langsrud 2003).
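The frogspawn data has only one factor, but the formula syntax for a two-way design is easy to show with a small simulated example; the second factor Light and the toy dataset below are entirely hypothetical:

## Hypothetical, simulated two-factor layout just to illustrate the syntax
set.seed(42)
toy <- expand_grid(
  TempFac = factor(c("13", "18", "25")),
  Light = factor(c("Dark", "Bright")),
  rep = 1:10
) %>%
  mutate(Time = 26 - 5 * (TempFac == "18") - 10 * (TempFac == "25") +
    1.5 * (Light == "Dark") + rnorm(n(), sd = 1))

## Main effects of TempFac and Light, plus their interaction
toy_anova <- aov(Time ~ TempFac * Light, data = toy)
summary(toy_anova)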

Informative Hypothesis Testing: Models which incorporate a priori Beliefs

Note that when we specified our research question, we had no specific hypothesis about the means, other than that they might be different. In many situations, we may have reason to believe in the relative “ordering” of the means for different levels of the Categorical variable. The one-sided t-test is the simplest example (e.g., \(\mu_1 \geq 0\) or \(\mu_1 \geq \mu_2\)); this readily extends to the multi-parameter setting, where more than one inequality constraint can be imposed on the parameters (e.g., \(\mu_1 \leq \mu_2 \leq \mu_3\)).
It is possible to incorporate these beliefs into the ANOVA model using what is called informative hypothesis testing, which has certain advantages over unconstrained models. The R package restriktor can fit such belief-constrained models.
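As a sketch of how such an ordering belief might be encoded, assuming restriktor's restriktor() function accepts a fitted model plus a text constraint on the coefficient names (check ?restriktor for the exact syntax in your installed version):

library(restriktor)

## Belief: hatching time decreases with temperature, i.e.
## mean(25°C) < mean(18°C) < mean(13°C).
## With treatment coding this is: TempFac25 < TempFac18 < 0.
## (Constraint syntax assumed from the package documentation.)
frogs_restricted <- restriktor(
  frogs_anova,
  constraints = "TempFac25 < TempFac18; TempFac18 < 0"
)
summary(frogs_restricted)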

Your Turn

  1. Try the simple datasets at https://www.performingmusicresearch.com/datasets/

  2. Can you try to ANOVA-analyse the datasets we dealt with in plotting Groups with Boxplots?

References

  1. The ANOVA tutorial at Our Coding Club
  2. Antoine Soetewey. How to: one-way ANOVA by hand. https://statsandr.com/blog/how-to-one-way-anova-by-hand/
  3. ANOVA in R - Stats and R https://statsandr.com/blog/anova-in-r/
  4. Michael Crawley. (2013). The R Book, second edition. Chapter 11.
  5. David C Howell, Permutation Tests for Factorial ANOVA Designs
  6. Marti Anderson, Permutation tests for univariate or multivariate analysis of variance and regression
  7. Judd, Charles M., Gary H. McClelland, and Carey S. Ryan.(2017). “Introduction to Data Analysis.” In, 1–9. Routledge. https://doi.org/10.4324/9781315744131-1.
  8. Patil, I. (2021). Visualizations with statistical details: The ‘ggstatsplot’ approach. Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167
  9. Langsrud, Øyvind. (2003). ANOVA for unbalanced data: Use type II instead of type III sums of squares. Statistics and Computing. 13. 163-167.
  10. Kim TK. (2017). Understanding one-way ANOVA using conceptual figures. Korean J Anesthesiol. 2017 Feb;70(1):22-26. https://ekja.org/upload/pdf/kjae-70-22.pdf
  11. Anova – Type I/II/III SS explained. https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/
  12. Bidyut Ghosh (Aug 28, 2017). One-way ANOVA in R. https://datascienceplus.com/one-way-anova-in-r/
R Package Citations
Package Version Citation
DescTools 0.99.60 Signorell (2025)
ggcompare 0.0.5 Wang (2025)
ggprism 1.0.7 Dawson (2025)
ggstatsplot 0.13.3 Patil (2021)
ggtext 0.1.2 Wilke and Wiernik (2022)
restriktor 0.6.10 Vanbrabant and Kuiper (2024)
supernova 3.0.0 Blake et al. (2024)
Blake, Adam, Jeff Chrabaszcz, Ji Son, and Jim Stigler. 2024. supernova: Judd, McClelland, & Ryan Formatting for ANOVA Output. https://doi.org/10.32614/CRAN.package.supernova.
Dawson, Charlotte. 2025. ggprism: A ggplot2 Extension Inspired by GraphPad Prism. https://doi.org/10.32614/CRAN.package.ggprism.
Langsrud, Øyvind. 2003. “ANOVA for Unbalanced Data: Use Type II Instead of Type III Sums of Squares.” Statistics and Computing 13 (2): 163–67. https://doi.org/10.1023/a:1023260610025.
Patil, Indrajeet. 2021. Visualizations with statistical details: The ggstatsplot approach.” Journal of Open Source Software 6 (61): 3167. https://doi.org/10.21105/joss.03167.
Signorell, Andri. 2025. DescTools: Tools for Descriptive Statistics. https://doi.org/10.32614/CRAN.package.DescTools.
Vanbrabant, Leonard, and Rebecca Kuiper. 2024. restriktor: Restricted Statistical Estimation and Inference for Linear Models. https://doi.org/10.32614/CRAN.package.restriktor.
Wang, Hao. 2025. ggcompare: Mean Comparison in ggplot2. https://doi.org/10.32614/CRAN.package.ggcompare.
Wilke, Claus O., and Brenton M. Wiernik. 2022. ggtext: Improved Text Rendering Support for ggplot2. https://doi.org/10.32614/CRAN.package.ggtext.

An AI Generated Explanation of the Statistical Model

We have used the statlingua package to generate an AI explanation of the statistical model frogs_anova we have used in this module. statlingua interfaces to the ellmer package, which in turn interfaces to the ollama API for AI explanations. The AI model used is llama3.1, and the explanation is tailored for a novice audience with moderate verbosity.

AI Generated Explanation

The command was run in the Console and the results pasted into this Quarto document. I have not edited this at all. Do look for complex verbiage and outright hallucinatory responses.

Code
library(ellmer)
library(statlingua)
client <- ellmer::chat_ollama(model = "llama3.1", echo = FALSE)
###

exam_context <- "This model uses ANOVA to analyze the effect of temperature on frog spawn time. The model is based on a dataset where frog spawn were observed at three different temperatures (13°C, 18°C, and 25°C) to see how long it took for them to spawn. There were 20 samples of spawn at each temperature. The data was in wide form and pivoted to long form, and the Temperature variable was converted to a 3-level factor. The goal is to determine if the temperature-factor has a significant impact on the time it takes for frogs to spawn."

explanation_anova <- statlingua::explain(frogs_anova,
  client = client,
  context = exam_context,
  audience = "novice", verbosity = "moderate"
) # moderate / detailed
ANOVA Model Output Interpretation

The provided ANOVA table summarizes the results of our ANOVA linear regression model, examining the effect of temperature on frog spawn time.

  • Df (Degrees of Freedom): These values represent the number of parameters in each part of the model.
    • TempFac (2): The two degrees of freedom for the temperature factor correspond to the three levels of the factor minus one. This reflects that there are two independent comparisons within the factor, which can be derived by considering pair-wise differences between levels while controlling for the overall mean, effectively reducing the complexity from a 3-level comparison down to 2.
    • Residuals (57): The 57 degrees of freedom for residuals is the total number of observations minus the model’s intercept and slope parameters. Assuming no loss of data during preparation, this should be your sample size minus one, reflecting how much variation in the observed times might not be explained by temperature.
  • Sum Sq (Sum of Squares): These values measure the amount of variation attributable to each part of the model.
    • TempFac: The sum of squares for the temp factor is substantial relative to the variability attributed to random errors, indicating that temperature has a significant impact on frog spawn time.
    • Residuals: This value represents how much observed variance remains unexplained by our model’s inclusion of temperature differences.
  • Mean Sq (Mean Square) and F Value:
    • The mean square for any effect is its sum of squares divided by its degrees of freedom.
    • In this case, the much larger Mean Square for TempFac reflects both that it explains more variance & includes more variation per comparison due to having fewer degrees of freedoms.
    • F Value (385.9): The F value represents a test statistic comparing the explained variance by temp factor to what we’d expect from randomized chance under our assumption of no effect.
  • Pr(>F) (<2e-16 ): A very small Pr(>F) associated with a significant result indicates there is practically zero probability (considered a very strong effect size) that data as extreme would be observed in experiments testing the null hypothesis assuming temp has no overall effect on spawn time. **** signifies this high level of statistical significance.
Section: Checking Assumptions

Given the complexity and specificity to research goals, consulting with your dataset on assumptions via graphical explorations is prudent. Here are brief suggestions tailored to our specific model: 1. Plot residuals against fitted values or a normal Q-Q plot. 2. For linearity: Observe whether scatter around each predicted spawn time level is fairly random for each temperature category in the residuals vs fitted-values plot. 3. For homoscedasticity (Constant Deviation): Assess if variability at any one temperature’s range of times are roughly consistent with those from other temperatures’ ranges, also considering graphical displays. 4. Normality: Perform a Shapiro-Wilk test on these model residuals to quantify how far the data deviates from normality; it’s especially relevant given the small size & homogeneity across treatment conditions.

Section: Interpreting Coefficients

Given the context and ANOVA results here, we’d primarily focus on the Temperature (TempFac levels) effect, not the individual intercept or coefficients of the TempFac categories themselves as our primary object of study is their collective overall influence relative to randomness. - The practical interpretation would be that any 2-level average comparison between our treatment conditions, as determined by the specific model parameters and the nature of factorization here applied toward ANOVA analysis. For a simple way of framing your result consider whether some combination of those two categories produces an increase or decrease in observed spawn times when comparing at higher temperatures than our baseline (for example).

Conclusion

The strong effect and extremely low p-value for the TempFac indicate that temperature has a significant impact on frog spawn time, and there is strong evidence to support an overall association. Your analysis thus supports your research goal and provides valuable insight into how temperature influences this ecological process.

In summary, using linear regression or ANOVA (equivalent in terms of testing effect magnitude) with careful attention toward ensuring the appropriateness of linearity assumptions allows drawing significant conclusions about how the temperature factor affects frog spawn times.