Lesson 7: ANOVAs

In the previous lessons, we have been working with t-tests, which are used when you have a categorical independent variable with only 2 values and a continuous dependent variable. In this lesson, we will build on what you have learned and cover ANOVAs. ANOVAs are similar to t-tests, but they can be used when the categorical independent variable has 2 or more values. In other words, they are are generalization of the t-test. They can be used in for the same type of data as a t-test, but in some additional circumstances as well.

For this lesson, we will work with a new data set on the Coachella Valley fringe-toed lizard. This lizard is very endangered, so we are interested in how to best manage the species to improve the chances of persistence. We have data based on a simulation that estimates the time to extinction (TTE) for three different management plans: no reserve (“none”), a single reserve (“single”), and multiple reserves in a network (“network”).

First, download the lizard data and load the data into R (don’t worry if you get a warning message when you do this - it’s not a problem).

lizard <- read.csv("lizard.csv")

Because when we checked assumptions, we determined that we needed to work with the log-transformed extinction times, we will add the log-transformed variable to our data set:

lizard$log_TTE <- log(lizard$TTE)

Also, load the ggplot2 package:

library(ggplot2)

Now, we are ready to get started. We will learn both the classical frequentist and likelihood approach.

7.1 Frequentist approach

To run an ANOVA with the frequentist approach, the structure is similar to a t-test, but we will use a different function: aov. However, the arguments (input) of the function are the same as what we use for a t-test: our independent and dependent variable (don’t forget to use the log of TTE) and the data set. To view the output of the test, instead of just typing the name of the test object, we will use the summary function, with the test object as the input of the function.

lizard_anova <- aov(log_TTE ~ Plan, data=lizard)
summary(lizard_anova)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Plan          2   6.66   3.329   10.96 3.67e-05 ***
## Residuals   146  44.33   0.304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As with the t-test, when we view the output, we can see the degrees of freedom (Df), the F value, and the p-value (Pr(>F)). Note that for an ANOVA, we have two degrees of freedom: the between categories degrees of freedom (number of categories - 1) and the within categories degrees of freedom (total sample size - number of categories). We report both when we report our results.

Looking at the p-value, we can see that it is below 0.05, so we reject the null hypothesis and tentatively accept that at least one of our categories has a different time to extinction than the rest.

If your assumptions of normally-distributed residuals and/or unequal variances are not met and can’t be fixed with a transform, there are alternative tests that you can run.

The Kruskal-Wallis test is a rank test similar to the Mann-Whitney U test that we used as an alternative for the t-test. It can be used when your residuals are not normally distributed. If we were going to use it for the lizard data (we don’t have to because our assumption were met, but this is for the sake of demonstration), it would look like this:

kruskal.test(TTE ~ Plan, data=lizard)

If our variances are not equal, we can instead use the Welch’s ANOVA, which would look like this:

oneway.test(TTE ~ Plan, data=lizard, var.equal = FALSE)

Again, you don’t need to run these tests in addition to the normal ANOVA. You would run one of these as an alternative to the ANOVA if the assumption is not met.

Post-hoc tests

If we want to know specifically which pairs of treatments differ from each other, we can use post-hoc tests to do pairwise comparisons between our categories. This is like running t-tests between each pair of categories, except the test corrects for the problem of multiple comparisons. We will use a function (pairwise.t.test) that gives us the flexibility to choose the type of correction we want to do, and will also work for data with unequal variances. We will use the Holm correction, which is a good choice in general because it is not too conservative but also doesn’t add other assumptions about our data.

The required inputs of this function are the dependent and independent variables for the test, in that order. There is not a separate argument for the data frame that contains those variables, so we have to provide the data frame nane as part of the variable names. To choose the type of p-value correction we want to use, we also need to add the “p.adjust.method” argument. We set this equal to “holm” to use the Holm correction.

pairwise.t.test(lizard$log_TTE,lizard$Plan,p.adjust.method="holm")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  lizard$log_TTE and lizard$Plan 
## 
##        network none 
## none   2e-05   -    
## single 0.029   0.029
## 
## P value adjustment method: holm

The output shows a table of p-values for each pairwise comparison between our groups. The row and column names tell you which two groups are being compared. Let’s look at the value in the upper left corner. It compares no reserve (“none”) to a network of reserves (“network”). The p-value for this pairwise comparison is 0.00005, which is clearly less than 0.05, so that tells us that there is a significant difference in the extinction time between the two treatments.

Now look at the remaining two pairwise comparisons. Based on the p-values, which other pairs of categories had significantly different extinction times?

In this example, it’s not necessary because after the log-tranform, our data meet the assumptions of an ANOVA and these follow-up post-hoc tests, but if our data did violate these assumptions, we can adjust the tests we use for the post hoc tests.

If we do not have equal variances, we can still use the pairwise.t.test function. We just add an additional argument (pool.sd=FALSE) to the function. If we were going to use this approach for the lizard data, it would look like this:

pairwise.t.test(lizard$TTE,lizard$Plan,p.adjust.method="holm",pool.sd=FALSE)
## 
##  Pairwise comparisons using t tests with non-pooled SD 
## 
## data:  lizard$TTE and lizard$Plan 
## 
##        network none  
## none   4.6e-05 -     
## single 0.0490  0.0086
## 
## P value adjustment method: holm

To do the pairwise comparisons if we don’t have normally-distributed residuals, we need a new function: pairwise.wilcox.test. This will run pairwise tests similar to t-test, but that allow for the non-normal distribution. The arguments of this function are the same as the arguments for the pairwise.t.test fucntion, so to use this function on the lizard data, it would look like this:

pairwise.wilcox.test(lizard$TTE,lizard$Plan,p.adjust.method="holm")
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  lizard$TTE and lizard$Plan 
## 
##        network none   
## none   0.00011 -      
## single 0.07094 0.02147
## 
## P value adjustment method: holm

We interpret the output of these test in the same way we would for the normal pairwise comparisons.

7.2 Maximum likelihood approach

To use the maximum likelihood approach for an ANOVA, we use exactly the same procedure that we used for a t-test.

We start by building our null and alternative models, using the lm function:

lizard_null <- lm(log_TTE ~ 1, data=lizard)
lizard_alt <- lm(log_TTE ~ Plan, data=lizard)

To compare the models, we again use the AIC function:

AIC(lizard_null,lizard_alt)
##             df      AIC
## lizard_null  2 267.0508
## lizard_alt   4 250.2030

What is your conclusion based on the AIC comparison?

If we want to view more information about our models, such as the values of extinction time for each category, we can use the summary function. Let’s try it just for the alternative model:

summary(lizard_alt)
## 
## Call:
## lm(formula = log_TTE ~ Plan, data = lizard)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1710 -0.3913 -0.0580  0.4827  1.4904 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.38991    0.07872  55.769  < 2e-16 ***
## Plannone    -0.51831    0.11076  -4.679  6.5e-06 ***
## Plansingle  -0.24550    0.11076  -2.216   0.0282 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.551 on 146 degrees of freedom
## Multiple R-squared:  0.1306, Adjusted R-squared:  0.1187 
## F-statistic: 10.96 on 2 and 146 DF,  p-value: 3.665e-05

At the top, you can see the formula you used to build the model, and below that is a table showing the quartiles of your model residuals. Then you can see the coefficients table. The first column of this table (“Estimate”) can be used to figure our the time to extinction for each category. The first row in the estimate column (the “intercept” row) is the time to extinction for the network category. We can tell by process of elimination, because the other two row names are “Plannone” and “Plansingle”. To calculate the time to extinction for the no reserve category, we take the estimate value in the “Plan none” row and add it to the estimate from the intercept row:

-0.51831 + 4.38991
## [1] 3.8716

You would then use a similar approach to calculate the time to extinction for the single reserve category.

You don’t need to understand all of the rest of the output here (but I am happy to answer any questions you have if you are curious!), however, I will point out that the final line provides an F statistic, degrees of freedom, and a p-value that should match the ones from the aov function you used for the classical frequentist approach. So you can use the lm function for the frequentist approach as well, and just look at the final line from the summary of the alternative model to get the output!

With the maximum likelihood approach, we do not typically follow up with post-hoc tests. This has to do with differences in the philosophy of the two approaches. Maximum likelihood approaches are more about identifying important explanatory variables for our data and not so much about digging into the nitty-gritty differences between the different categories within a variable. Therefore, if it is important to your question to test for those pairwise differences, you should use the classical frequentist approach.