In this lesson, we will move on to a new statistical test: the t-test. T-tests are part of a broader class of models known as general linear models. We can use general linear models to analyze data when we have a continuous dependent variable that is normally distributed (or at least close to normally distributed). A general linear model tests whether the independent variable(s) included in the model affect the mean of the normally-distributed dependent variable. What distinguishes the different types of general linear models (t-tests, ANOVAs, linear regressions) is the type of independent variable(s).
In this lesson, we will work through three types of t-tests: one-sample, two-sample, and paired.
A one-sample t-test is used when you want to compare the mean of your sample data to a known or expected population mean. For example, say you know that the average historical population growth rate in a population of lizards is 1.05, and you want to test whether the population growth rate from the last 10 years differs from that population average.
The overall process for running a one-sample t-test in the classic frequentist framework is similar to that used for a Chi-square test: you start by identifying your hypotheses, then you use your data to calculate a test statistic, and finally you use a probability distribution to calculate the probability of a test statistic equal to or greater than yours, assuming the null hypothesis is true.
As with our other tests, the null hypothesis is the hypothesis of no effect. In the context of a one-sample t-test, our null and alternative hypotheses would therefore be:
Null: The sample mean equals the known/expected mean.
Alternative: The sample mean does not equal the known/expected mean.
The test statistic that we calculate for t-tests is called the t-statistic. It incorporates three main pieces of information, which should be familiar to you because earlier this semester, you identified them as being important for determining statistical significance.
The equation for the t-statistic for a one-sample t-test is below. You won’t be required to do this calculation by hand, but we will work through how it incorporates the pieces of information above.
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
In the numerator, \(\bar{x}\) is the sample mean and \(\mu\) is the known/expected population mean. Therefore, the numerator represents the signal: the magnitude of the difference between the sample mean and the population mean. The greater the difference, the higher the t-statistic will be. In the denominator, \(s\) is the sample standard deviation, so this is our measure of the noise. The greater the standard deviation, the lower the t-statistic will be. The \(n\) in the denominator is the sample size. The higher the sample size, the higher the t-statistic. This matches our expectations about how the signal, noise, and sample size affect the likelihood of a statistically-significant result. If you have a larger signal and sample size, and lower noise, you will be more likely to get a statistically-significant result, and that corresponds to a higher t-statistic.
Like with the Chi-square test, once we have our test statistic we can use a probability distribution to calculate the probability of a t-statistic greater than or equal to ours. If our data are normally distributed and our null hypothesis is true, we would expect the probability of different values of the t-statistic to follow a t distribution, which is shown below. The t-distribution is similar to a normal distribution, and it is centered at zero, because if the null hypothesis is true, there should be a higher probability that the sample mean is close to the population mean. If the sample mean and population mean are equal, the t-statistic will be equal to zero.
Also like with the Chi-square distribution, the specific shape of the t-distribution depends on the degrees of freedom in our data. For a one-sample t-test, the degrees of freedom is equal to the sample size minus 1. The higher the degrees of freedom, the more narrow the t-distribution. This makes sense because if we have a larger sample size, our sample mean should be more accurate. If our sample population truly has the same mean as the known/expected population mean, then a larger sample size will make it more likely that the sample mean is close to the population mean, and the t-statistic is more likely to be close to zero.
Once we have calculated the degrees of freedom, we can use the correct version of the t distribution to calculate the p-value. As always, if the p-value is < 0.05, we reject our null hypothesis.
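To make the one-sample procedure concrete, here is a small sketch in R. The growth-rate values below are invented for illustration; only the expected mean of 1.05 comes from the lizard example above.

```r
# Hypothetical growth rates from the last 10 years (invented values)
growth <- c(1.02, 1.10, 0.98, 1.07, 1.12, 1.01, 0.95, 1.08, 1.04, 1.11)
mu <- 1.05  # known/expected population mean

# t-statistic by hand: signal / (noise / sqrt(sample size))
t_stat <- (mean(growth) - mu) / (sd(growth) / sqrt(length(growth)))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = length(growth) - 1)

# The built-in t.test function does the same calculation in one step
t.test(growth, mu = 1.05)
```

The by-hand t-statistic and p-value should match the `t.test` output exactly, which is a useful check that you understand what the function is doing.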
A two-sample t-test is used when you have a categorical independent variable with only two values, and you want to compare the means of your two categories. For example, if you want to test whether the photosynthesis rate differs between under-story and over-story plants, this can be done with a two-sample t-test. If your categorical variable has more than two possible values, you cannot use a two-sample t-test. Instead you would use an ANOVA, which will be the next type of test we cover.
The process for running a two-sample t-test is similar to a one-sample t-test, but there are slight differences in how we state our hypotheses and calculate the t-statistic and degrees of freedom.
In the context of a two-sample t-test, our null and alternative hypotheses are:
Null: The means of the two groups are equal.
Alternative: The means of the two groups are not equal.
For a two-sample t-test, we also calculate a t-statistic, but the calculation is slightly different because we now have two sample groups instead of one sample group and one known mean. However, it still contains measures of the signal, noise, and sample size.
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2(\frac{1}{n_1}+\frac{1}{n_2})}} \]
In the numerator, we now have our two sample means, \(\bar{x}_1\) and \(\bar{x}_2\). Therefore, the numerator still represents the signal, but this time it is the difference between the two sample means. In the denominator, \(S_p^2\) is the pooled variance, which is essentially a weighted average of the variances of each sample group, with the weight determined by the sample size in each group (if they have equal sample sizes, they will be weighted equally). This is our new measure of the noise. Finally, we have the sample sizes of our two groups, \(n_1\) and \(n_2\). As before, if you have a larger signal and sample size, and lower noise, you will be more likely to get a statistically-significant result, and that corresponds to a higher t-statistic.
Once you have the t-statistic for a two-sample t-test, the process is the same for calculating the p-value and drawing your conclusions. Because you have two groups, though, the degrees of freedom for the t-distribution is equal to \(n_1 + n_2 - 2\) (the total sample size minus 2), rather than \(n-1\).
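As a sketch, we can check the pooled two-sample formula against R's built-in `t.test`. The measurements for the two groups below are made up for illustration.

```r
# Invented measurements for two groups
g1 <- c(5.1, 4.8, 5.6, 5.0, 4.9, 5.3)
g2 <- c(4.2, 4.6, 4.0, 4.4, 4.1, 4.5)
n1 <- length(g1)
n2 <- length(g2)

# Pooled variance: weighted average of the two group variances
sp2 <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)

# t-statistic for a two-sample t-test, assuming equal variances
t_stat <- (mean(g1) - mean(g2)) / sqrt(sp2 * (1/n1 + 1/n2))

# Matches t.test with var.equal = TRUE; note df = n1 + n2 - 2 = 10
t.test(g1, g2, var.equal = TRUE)
```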
Back when we discussed experimental design, we discussed blocked designs, which are a way of accounting for variation in our data caused by variables that we are not including in our analysis. They work by applying every treatment within a block, which could be the same individual, the same plot, etc. When you analyze the data, you then make comparisons between each treatment within each block. For example, if you are testing a new drug for blood pressure, you could measure the blood pressure of each patient before and after taking the drug, and see how the drug changes the blood pressure in each patient. This will account for any differences in the baseline blood pressure of the patients. When you have a blocked design with only two treatments, known as a paired design, you can analyze the data using a paired t-test.
The null and alternative hypothesis can be phrased in the same way as they are for a two-sample t-test:
Null: The two groups have equal means.
Alternative: The two groups have different means.
However, because the focus of a paired t-test is on the difference between the two treatments, you can also phrase your hypotheses in terms of that difference:
Null: The average difference between the groups is zero.
Alternative: The average difference between the groups is not zero.
A paired t-test has similarities with both a one-sample and two-sample t-test. It is similar to a two-sample t-test in that there are two treatment groups, but the way it is run is more like a one-sample t-test.
The first step in running a paired t-test is to calculate the difference between the treatments for each pair of samples in your study. In the example of the blood pressure drug above, that would mean calculating the difference in blood pressure for each patient before and after taking the drug. At that point, the test is run just like a one-sample t-test, but you run it on the differences between the treatments for each pair, rather than on the raw data for the two treatments. The “known” population mean for your null hypothesis would be zero because if the two treatments have equal means, you would expect the average difference within each pair to be zero.
Calculating the degrees of freedom for a paired t-test is also similar to calculating the degrees of freedom for a one-sample t-test, but keep in mind that you are running the t-test on the difference between the treatments for each pair. Therefore, your degrees of freedom is \(n-1\), where \(n\) is the number of pairs, rather than the total number of data points.
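The blood-pressure example above can be sketched in R. The patient values below are invented for illustration; the point is that a paired t-test is equivalent to a one-sample t-test on the per-patient differences.

```r
# Invented systolic blood pressure for 8 patients, before and after the drug
bp_before <- c(142, 138, 150, 145, 139, 148, 152, 141)
bp_after  <- c(135, 136, 144, 140, 138, 141, 147, 139)

# A paired t-test is a one-sample t-test on the per-patient differences,
# with a "known" population mean of zero under the null hypothesis
t.test(bp_after - bp_before, mu = 0)

# Equivalent built-in form; df = number of pairs - 1 = 7
t.test(bp_after, bp_before, paired = TRUE)
```

Both calls give identical t-statistics, degrees of freedom, and p-values.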
Start off by reading in the civu data set that we have worked with
before. Be sure to first set your working directory to the folder with
the civu dataset. Then use the head function to view the
top rows of the data set, as a reminder of what data we have in the data
frame.
civu <- read.csv("civu.csv")
head(civu)
## herbivory plant_id density_2006 density_2007 survival_t.1
## 1 0 15 20 3 0
## 2 0 7 26 5 0
## 3 0 16 17 1 0
## 4 0 22 26 5 0
## 5 0 23 26 5 0
## 6 0 12 20 3 0
In this data set, the “herbivory” variable is our independent variable and the “density_2006” variable will be our dependent variable. Because the herbivory variable is categorical, we need to convert it to a factor variable, as we have done before:
civu$herbivory <- as.factor(civu$herbivory)
Now we’re ready to get started on our analysis!
First, we are going to visualize the data. We will use two approaches: a standard boxplot and a density plot that shows the full distributions of our data.
We will use the ggplot2 package to make the graphs, so load that first:
library(ggplot2)
We’ll begin with a boxplot. The syntax is similar to what you used to make histograms, with a few modifications. Here is the full code, with explanations of the modifications below:
ggplot(data = civu, aes(x = herbivory, y = density_2006)) +
geom_boxplot() +
labs(x = "Herbivory treatment", y = "Density in 2006") +
theme_classic()

As with the histograms, we start with the ggplot function, where we tell R what data and variables we want to use in our graph. The data argument is where we input the name of our data frame, as before, and the variables go inside the aes function. With a boxplot, we are graphing two variables (our independent and dependent variables) because we want to visualize the effect of one variable on another. Therefore, we need to include both an x and a y variable in the aes function.
In the next row down, we tell R what type of graph we want to make. Here, we want to make a box plot, so we use the geom_boxplot function.
The other two lines are similar to the lines we used when we made histograms: they change the axis labels and some of the aesthetics of the graph.
Now we will make a density plot that compares the full distributions of thistle density for the two herbivory treatments. Again, the syntax will be similar, but I will walk through the differences below:
ggplot(data = civu, aes(x = density_2006, fill = herbivory)) +
geom_density(aes(y = after_stat(density)), alpha = 0.5) +
scale_fill_manual(values = c("#ce9642", "#3b7c70")) +
labs(x = "Thistle Density", y = "Frequency") +
theme_classic()

Now that we have visualized the data, let’s run the t-test itself. We
will set up the model using the t-test function: t.test.
The first argument for this function is the formula for our model, where
we tell R our independent and dependent variables. The dependent
variable goes on the left and the independent variable goes on the
right. The second argument is the name of the data set. The final
argument, var.equal = TRUE, tells R that we are assuming our two
treatments have equal variance.
civu_ttest <- t.test(density_2006 ~ herbivory, data = civu, var.equal = TRUE)
To view the output, just type the name you gave to your t-test object:
civu_ttest
##
## Two Sample t-test
##
## data: density_2006 by herbivory
## t = -8.2506, df = 148, p-value = 7.941e-14
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -21.07172 -12.92828
## sample estimates:
## mean in group 0 mean in group 1
## 21 38
You should see the following pieces in your output: the t-statistic, the degrees of freedom, the p-value, the 95% confidence interval for the difference in means, and the mean of each group.
Based on this output, would you reject the null hypothesis and tentatively accept the alternative hypothesis?
Next we will use a likelihood-based approach to test the same question. For this, we will start by building our two linear models, just as we did in the Model Building lesson:
civu_null <- lm(density_2006 ~ 1, civu)
civu_alt <- lm(density_2006 ~ herbivory, civu)
Now, we just need to calculate the Akaike’s Information Criterion
(AIC) values for the two models. We can do this using the
AIC function. As the arguments, we just need to list the
models we want to compare. We can compare more than two models at once,
but for this question, we just have our null model and one alternative
model.
AIC(civu_null, civu_alt)
## df AIC
## civu_null 2 1178.015
## civu_alt 3 1123.255
You should see the output automatically appear in a table. The first column lists the model. The second column (df) lists the number of parameters in each model (remember, AIC penalizes for adding parameters). The final column lists the AIC values for each model. The lower the AIC value, the better the model. A difference of 2 or more between the AIC values indicates that one model is significantly better than the other.
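If you want to see how this comparison behaves when the alternative model really is better, here is a small self-contained sketch using simulated data (the group means and sample sizes are invented for illustration):

```r
# Simulate two groups whose true means differ
set.seed(1)
group <- factor(rep(c("a", "b"), each = 30))
y <- c(rnorm(30, mean = 10), rnorm(30, mean = 12))

# Null model (intercept only) vs. alternative model (group effect)
null_mod <- lm(y ~ 1)
alt_mod  <- lm(y ~ group)

# Compare the models with AIC; the lower value indicates the better model
aic_tab <- AIC(null_mod, alt_mod)
aic_tab

# Difference in AIC (alternative minus null); a value below -2 indicates
# the alternative model is significantly better
diff(aic_tab$AIC)
```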
Based on this output, which is the better model? Is it significantly better?
Does your conclusion from this approach match your conclusion from the classical frequentist approach?