In the previous lessons, we have been working with t-tests, which are used when you have a categorical independent variable with only 2 values and a continuous dependent variable. In this lesson, we will build on what you have learned and cover ANOVAs. ANOVAs are similar to t-tests, but they can be used when the categorical independent variable has 2 or more values. In other words, they are a generalization of the t-test: they can be used for the same types of data as a t-test, but in some additional circumstances as well.
For this lesson, we will work with a new data set on the Coachella Valley fringe-toed lizard. This lizard is endangered, so we are interested in how best to manage the species to improve its chances of persistence. We have data based on a simulation that estimates the time to extinction (TTE) under three different management plans: no reserve (“none”), a single reserve (“single”), and multiple reserves in a network (“network”).
First, download the lizard data and load the data into R (don’t worry if you get a warning message when you do this - it’s not a problem).
lizard <- read.csv("lizard.csv")
When we checked our assumptions, we determined that we needed to work with the log-transformed extinction times, so we will add the log-transformed variable to our data set:
lizard$log_TTE <- log(lizard$TTE)
Also, load the ggplot2 package:
library(ggplot2)
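Before running any tests, it never hurts to look at the data. Here is a quick, optional sketch (nothing later in the lesson depends on it) of a boxplot of log extinction time by management plan:
# Boxplot of log extinction time for each management plan
ggplot(lizard, aes(x = Plan, y = log_TTE)) +
  geom_boxplot()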
Now, we are ready to get started. We will learn both the classical frequentist approach and the likelihood approach.
To run an ANOVA with the frequentist approach, the structure is similar to a t-test, but we will use a different function: aov. However, the arguments (input) of the function are the same as what we use for a t-test: our independent and dependent variables (don’t forget to use the log of TTE) and the data set. To view the output of the test, instead of just typing the name of the test object, we will use the summary function, with the test object as the input of the function.
lizard_anova <- aov(log_TTE ~ Plan, data=lizard)
summary(lizard_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## Plan 2 6.66 3.329 10.96 3.67e-05 ***
## Residuals 146 44.33 0.304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As with the t-test, when we view the output, we can see the degrees of freedom (Df), the F value, and the p-value (Pr(>F)). Note that for an ANOVA, we have two degrees of freedom values: the between-categories degrees of freedom (number of categories - 1) and the within-categories degrees of freedom (total sample size - number of categories). We report both when we report our results.
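If you want to verify those degrees of freedom yourself, here is a minimal sketch (the total sample size of 149 is inferred from the table above, since the two Df values must sum to one less than the number of observations):
k <- length(unique(lizard$Plan)) # number of categories: 3 management plans
n <- nrow(lizard)                # total sample size: 149 simulations
k - 1                            # between-categories df: 2
n - k                            # within-categories df: 146
When reporting, you would write these alongside the F value, e.g., F(2, 146) = 10.96, p < 0.001.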
Looking at the p-value, we can see that it is below 0.05, so we reject the null hypothesis and tentatively accept that at least one of our categories has a different time to extinction than the rest.
If your assumptions of normally-distributed residuals and/or equal variances are not met and can’t be fixed with a transform, there are alternative tests that you can run.
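If you want to re-check those assumptions for this model, one common way is with the Shapiro-Wilk and Bartlett tests from base R. This is just a reference sketch using the lizard_anova object from above, not a required step:
# Shapiro-Wilk test for normality of the model residuals
shapiro.test(residuals(lizard_anova))
# Bartlett test for equal variances across the management plans
bartlett.test(log_TTE ~ Plan, data = lizard)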
The Kruskal-Wallis test is a rank test, similar to the Mann-Whitney U test that we used as an alternative to the t-test. It can be used when your residuals are not normally distributed. If we were going to use it for the lizard data (we don’t have to because our assumptions were met, but this is for the sake of demonstration), it would look like this:
kruskal.test(TTE ~ Plan, data=lizard)
If our variances are not equal, we can instead use Welch’s ANOVA, which would look like this:
oneway.test(TTE ~ Plan, data=lizard, var.equal = FALSE)
Again, you don’t need to run these tests in addition to the normal ANOVA. You would run one of these as an alternative to the ANOVA if the assumption is not met.
If we want to know specifically which pairs of treatments differ from each other, we can use post-hoc tests to do pairwise comparisons between our categories. This is like running t-tests between each pair of categories, except the test corrects for the problem of multiple comparisons. We will use a function (pairwise.t.test) that gives us the flexibility to choose the type of correction we want to do, and will also work for data with unequal variances. We will use the Holm correction, which is a good choice in general because it is not too conservative but also doesn’t add other assumptions about our data.
The required inputs of this function are the dependent and independent variables for the test, in that order. There is not a separate argument for the data frame that contains those variables, so we have to provide the data frame name as part of the variable names. To choose the type of p-value correction we want to use, we also need to add the “p.adjust.method” argument. We set this equal to “holm” to use the Holm correction.
pairwise.t.test(lizard$log_TTE,lizard$Plan,p.adjust.method="holm")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: lizard$log_TTE and lizard$Plan
##
## network none
## none 2e-05 -
## single 0.029 0.029
##
## P value adjustment method: holm
The output shows a table of p-values for each pairwise comparison between our groups. The row and column names tell you which two groups are being compared. Let’s look at the value in the upper left corner. It compares no reserve (“none”) to a network of reserves (“network”). The p-value for this pairwise comparison is 2e-05 (i.e., 0.00002), which is clearly less than 0.05, so that tells us that there is a significant difference in the extinction time between the two treatments.
Now look at the remaining two pairwise comparisons. Based on the p-values, which other pairs of categories had significantly different extinction times?
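If you would rather extract the adjusted p-values programmatically than read them off the printed table, the test object stores them as a matrix. Here is a small sketch (the object name ph is just a placeholder):
# Save the test result and pull out the matrix of Holm-adjusted p-values
ph <- pairwise.t.test(lizard$log_TTE, lizard$Plan, p.adjust.method="holm")
ph$p.value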
In this example, these adjustments are not necessary, because after the log-transform our data meet the assumptions of an ANOVA and of these follow-up post-hoc tests. But if our data did violate these assumptions, we could adjust the tests we use for the post-hoc tests.
If we do not have equal variances, we can still use the pairwise.t.test function. We just add an additional argument (pool.sd=FALSE) to the function. If we were going to use this approach for the lizard data, it would look like this:
pairwise.t.test(lizard$TTE,lizard$Plan,p.adjust.method="holm",pool.sd=FALSE)
##
## Pairwise comparisons using t tests with non-pooled SD
##
## data: lizard$TTE and lizard$Plan
##
## network none
## none 4.6e-05 -
## single 0.0490 0.0086
##
## P value adjustment method: holm
To do the pairwise comparisons if we don’t have normally-distributed residuals, we need a new function: pairwise.wilcox.test. This will run pairwise tests similar to t-tests, but ones that allow for the non-normal distribution. The arguments of this function are the same as the arguments for the pairwise.t.test function, so using this function on the lizard data would look like this:
pairwise.wilcox.test(lizard$TTE,lizard$Plan,p.adjust.method="holm")
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: lizard$TTE and lizard$Plan
##
## network none
## none 0.00011 -
## single 0.07094 0.02147
##
## P value adjustment method: holm
We interpret the output of these tests in the same way we would for the normal pairwise comparisons.
To use the maximum likelihood approach for an ANOVA, we use exactly the same procedure that we used for a t-test.
We start by building our null and alternative models, using the lm function:
lizard_null <- lm(log_TTE ~ 1, data=lizard)
lizard_alt <- lm(log_TTE ~ Plan, data=lizard)
To compare the models, we again use the AIC function:
AIC(lizard_null,lizard_alt)
## df AIC
## lizard_null 2 267.0508
## lizard_alt 4 250.2030
What is your conclusion based on the AIC comparison?
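If it helps, you can compute the difference directly from the models above (a minimal sketch):
# Difference in AIC between the null and alternative models
AIC(lizard_null) - AIC(lizard_alt) # about 16.8; differences greater than ~2 are usually considered meaningful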
If we want to view more information about our models, such as the values of extinction time for each category, we can use the summary function. Let’s try it just for the alternative model:
summary(lizard_alt)
##
## Call:
## lm(formula = log_TTE ~ Plan, data = lizard)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1710 -0.3913 -0.0580 0.4827 1.4904
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.38991 0.07872 55.769 < 2e-16 ***
## Plannone -0.51831 0.11076 -4.679 6.5e-06 ***
## Plansingle -0.24550 0.11076 -2.216 0.0282 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.551 on 146 degrees of freedom
## Multiple R-squared: 0.1306, Adjusted R-squared: 0.1187
## F-statistic: 10.96 on 2 and 146 DF, p-value: 3.665e-05
At the top, you can see the formula you used to build the model, and below that is a table showing the quartiles of your model residuals. Then you can see the coefficients table. The first column of this table (“Estimate”) can be used to figure out the (log) time to extinction for each category. The first row in the estimate column (the “(Intercept)” row) is the log time to extinction for the network category. We can tell by process of elimination, because the other two row names are “Plannone” and “Plansingle”. To calculate the log time to extinction for the no reserve category, we take the estimate value in the “Plannone” row and add it to the estimate from the intercept row:
-0.51831 + 4.38991
## [1] 3.8716
You would then use a similar approach to calculate the log time to extinction for the single reserve category.
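As a sanity check, for a model with a single categorical variable like this one, those coefficient sums are simply the group means of the (log) dependent variable, which you can compute directly (a quick sketch):
# Mean log extinction time for each plan; these should match the
# intercept and intercept-plus-coefficient values from the summary
tapply(lizard$log_TTE, lizard$Plan, mean)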
You don’t need to understand all of the rest of the output here (but I am happy to answer any questions you have if you are curious!). However, I will point out that the final line provides an F statistic, degrees of freedom, and a p-value that should match the ones from the aov function you used for the classical frequentist approach. So you can use the lm function for the frequentist approach as well, and just look at the final line from the summary of the alternative model to get the output!
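For example, base R’s anova function applied to the lm fit reproduces the same ANOVA table (a quick sketch):
# Classical ANOVA table computed from the lm model object
anova(lizard_alt)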
With the maximum likelihood approach, we do not typically follow up with post-hoc tests. This has to do with differences in the philosophy of the two approaches. Maximum likelihood approaches are more about identifying important explanatory variables for our data and not so much about digging into the nitty-gritty differences between the different categories within a variable. Therefore, if it is important to your question to test for those pairwise differences, you should use the classical frequentist approach.