Lesson 8: Correlations

For our correlation analysis, we are going to look at correlations among three climate variables from a data set on the effects of climate on plant cover. The three variables we will test are: total precipitation (Totprecip), mean temperature (Mean_tempC), and maximum temperature (Max_tempC). Climate variables are often related to one another, but those relationships are not usually causal (e.g., higher precipitation doesn’t cause higher temperatures). A correlation analysis measures the strength and direction of an association without assuming that one variable drives the other, so it is the right way to analyze these relationships.

8.1 Visualizing our data

First, load the data set. Be sure your working directory is set correctly.

plant <- read.csv("PlantSumm.csv")
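If read.csv fails with a “cannot open file” error, the working directory is usually the culprit. The following is a minimal sketch of a check we could run first. It assumes only the file name and column names used in this lesson; the names data_file and needed are introduced here for illustration.

```r
data_file <- "PlantSumm.csv"                          # file name used in this lesson
needed <- c("Totprecip", "Mean_tempC", "Max_tempC")   # columns we will analyze

getwd()   # shows the folder R is currently looking in

if (file.exists(data_file)) {
  plant <- read.csv(data_file)
  str(plant)                        # column names and types
  print(needed %in% names(plant))   # TRUE for each column that is present
} else {
  message("Could not find ", data_file, " in ", getwd())
}
```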

To visualize the relationship between two continuous variables, a scatterplot is a good approach. With a correlation, we don’t always add a best fit line (the best fit line is the output of a regression analysis). Here we have three climate variables, so we want to look at the relationship within each pair of variables. We could do that with three completely separate plots, but there’s a faster way! With the pairs function, we can generate a grid of plots that shows the relationship between all pairs of variables we want to include in our analysis. We just provide a formula listing those variables and the data set from which they come.

pairs(~ Totprecip + Mean_tempC + Max_tempC, data = plant) 

In the output, you can see the variable names on the diagonal and a scatterplot for each pair of variables. (Note that the plots below the diagonal are just repeats of the plots above the diagonal, but with the axes switched.) Which variables appear to have a strong correlation?
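To put numbers on what the plots suggest, the cor function returns the matrix of pairwise correlation coefficients. The sketch below is not run on the plant data: it builds a simulated stand-in (sim, with made-up values and the same three column names) so that it runs on its own. It also shows the lower.panel = NULL argument of pairs, which hides the repeated plots below the diagonal.

```r
# Simulated stand-in for the plant data (made-up values, for illustration only)
set.seed(1)
n <- 19   # number of simulated observations
mean_temp <- rnorm(n, mean = 12, sd = 2)
sim <- data.frame(
  Totprecip  = 500 - 15 * mean_temp + rnorm(n, sd = 40),
  Mean_tempC = mean_temp,
  Max_tempC  = mean_temp + 10 + rnorm(n, sd = 1)
)

# Same kind of grid as before, but without the mirrored plots below the diagonal
pairs(~ Totprecip + Mean_tempC + Max_tempC, data = sim, lower.panel = NULL)

# Matrix of Pearson correlation coefficients for every pair of columns
r_mat <- cor(sim[, c("Totprecip", "Mean_tempC", "Max_tempC")])
round(r_mat, 2)
```

With the real data, replace sim with plant. If any values are missing, cor returns NA for the affected pairs unless we add use = "complete.obs".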

8.2 Running the correlation test

Now we will run a formal test to see whether the correlations are statistically significant. Additional packages can run the test automatically for all pairs of variables in a data set, but we will just use the base R function cor.test, so we’ll run a separate test for each pair of variables. By default, cor.test computes Pearson’s product-moment correlation.

To run the pairwise tests, we’ll use the with function in combination with the cor.test function: with pulls the two variables we want out of the plant data set, and cor.test runs the correlation test on those two variables.
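The with function evaluates an expression using the columns of a data frame as if they were ordinary variables. As a sketch of what that means, the three calls below should give the same test. They are run on a small simulated data frame (sim, made-up values) rather than the plant data. The formula form in the last call is an alternative interface that cor.test also accepts.

```r
# Simulated stand-in with two of the lesson's column names (made-up values)
set.seed(42)
x <- rnorm(19, mean = 12, sd = 2)
sim <- data.frame(Mean_tempC = x, Max_tempC = x + 10 + rnorm(19))

res_with    <- with(sim, cor.test(Mean_tempC, Max_tempC))      # style used in this lesson
res_dollar  <- cor.test(sim$Mean_tempC, sim$Max_tempC)         # explicit $ extraction
res_formula <- cor.test(~ Mean_tempC + Max_tempC, data = sim)  # formula interface

# All three estimates should be identical
c(res_with$estimate, res_dollar$estimate, res_formula$estimate)
```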

with(plant,cor.test(Mean_tempC,Max_tempC))
## 
##  Pearson's product-moment correlation
## 
## data:  Mean_tempC and Max_tempC
## t = 8.4936, df = 17, p-value = 1.6e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7531352 0.9611007
## sample estimates:
##       cor 
## 0.8996063
with(plant,cor.test(Mean_tempC,Totprecip))
## 
##  Pearson's product-moment correlation
## 
## data:  Mean_tempC and Totprecip
## t = -1.6944, df = 17, p-value = 0.1084
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.71147981  0.08956702
## sample estimates:
##        cor 
## -0.3801058
with(plant,cor.test(Max_tempC,Totprecip))
## 
##  Pearson's product-moment correlation
## 
## data:  Max_tempC and Totprecip
## t = -2.5001, df = 17, p-value = 0.02294
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.78728864 -0.08407849
## sample estimates:
##        cor 
## -0.5184872
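
It can help to see where the t and df values in the output come from. For a Pearson correlation, the test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2. The sketch below recomputes t and the two-sided p-value from the cor and df values reported above for Mean_tempC and Max_tempC; the variable names are ours.

```r
r  <- 0.8996063   # cor reported for Mean_tempC and Max_tempC
df <- 17          # df reported in the same output (n - 2, so n = 19)

t_stat <- r * sqrt(df) / sqrt(1 - r^2)   # test statistic
p_val  <- 2 * pt(-abs(t_stat), df)       # two-sided p-value from the t distribution

t_stat   # should be close to the reported t = 8.4936
p_val    # should be close to the reported p-value
```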

Based on the output of these tests, which correlations are statistically significant? Are the correlations positive or negative?
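
Reading three printouts is manageable, but the same numbers can also be collected into one table. The following base R sketch loops over every pair of columns with combn and pulls the estimate and p-value out of each cor.test result. It uses a simulated stand-in (sim, made-up values), so its numbers will not match the output above; the helper name test_pair is hypothetical.

```r
# Simulated stand-in for the plant data (made-up values, for illustration only)
set.seed(1)
n <- 19
mean_temp <- rnorm(n, mean = 12, sd = 2)
sim <- data.frame(
  Totprecip  = 500 - 15 * mean_temp + rnorm(n, sd = 40),
  Mean_tempC = mean_temp,
  Max_tempC  = mean_temp + 10 + rnorm(n, sd = 1)
)

# Run cor.test on one pair of column names and return a one-row summary
test_pair <- function(vars, data) {
  res <- cor.test(data[[vars[1]]], data[[vars[2]]])
  data.frame(var1 = vars[1], var2 = vars[2],
             r = unname(res$estimate), p_value = res$p.value)
}

# combn lists every unordered pair of column names
var_pairs <- combn(names(sim), 2, simplify = FALSE)
results <- do.call(rbind, lapply(var_pairs, test_pair, data = sim))
results
```

With the real data, pass plant[, c("Totprecip", "Mean_tempC", "Max_tempC")] in place of sim.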