For our correlation analysis, we will look at the relationships among three climate variables from a data set on the effects of climate on plant cover. The three variables we will test are total precipitation (Totprecip), mean temperature (Mean_tempC), and maximum temperature (Max_tempC). Climate variables are often related to one another, but those relationships are not usually causal (e.g., higher precipitation doesn’t cause higher temperatures), so a correlation analysis is an appropriate way to examine them.
First, load the data set. Be sure your working directory is set correctly.
plant <- read.csv("PlantSumm.csv")
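If the read.csv call fails, the usual cause is that R is looking in a different folder. The following is a minimal sketch of how you might check this, assuming the file sits in your working directory under the name used above. The object names csv_file, needed, and missing_cols are only illustrative.

```r
# Show the folder R is currently reading from.
getwd()

csv_file <- "PlantSumm.csv"                          # file name used in this exercise
needed   <- c("Totprecip", "Mean_tempC", "Max_tempC") # columns used in the analysis

if (file.exists(csv_file)) {
  plant <- read.csv(csv_file)
  # List any of the three climate columns that are not in the data set.
  missing_cols <- setdiff(needed, names(plant))
  if (length(missing_cols) == 0) {
    str(plant[needed])  # quick look at the types and first values
  } else {
    message("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
} else {
  message("File not found; change the working directory with setwd() and try again.")
}
```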
To visualize relationships between two continuous variables, a
scatterplot is a good approach. With a correlation, we don’t always add
a best fit line (the best fit line is the output of a regression
analysis). Here we have three climate variables, so we want to look at
the relationship between each pair of variables. We could do
that with three completely separate plots, but there’s a faster way!
With the pairs function, we can generate a grid of plots
that shows the relationship between all pairs of variables we want to
include in our analysis. We just provide a formula with the variables we
want to include and the data set from which they come.
pairs(~ Totprecip + Mean_tempC + Max_tempC, data = plant)

In the output, you can see the variable names on the diagonal and a scatterplot for each pair of variables. (Note that the plots below the diagonal are just repeats of the plots above the diagonal, but with the axes switched.) Which variables appear to have a strong correlation?
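Because the lower panels only repeat the upper ones, one option is to use the upper panels to print the correlation coefficient instead. The sketch below shows one way to do that with the upper.panel argument of pairs. It uses R's built-in airquality data as a stand-in (not the plant data), and panel_cor is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: writes the Pearson correlation in the middle of a panel.
panel_cor <- function(x, y, ...) {
  ok  <- complete.cases(x, y)          # drop rows where either value is missing
  r   <- cor(x[ok], y[ok])
  usr <- par("usr")                    # current panel limits: x1, x2, y1, y2
  text(mean(usr[1:2]), mean(usr[3:4]), sprintf("r = %.2f", r), cex = 1.5)
}

# Stand-in data: three weather variables from the built-in airquality data set.
vars <- c("Temp", "Wind", "Ozone")

# Scatterplots below the diagonal, correlation coefficients above it.
pairs(~ Temp + Wind + Ozone, data = airquality, upper.panel = panel_cor)

# The same coefficients as a matrix, using all complete pairs of observations.
r_mat <- cor(airquality[, vars], use = "pairwise.complete.obs")
print(round(r_mat, 2))
```

For the plant data, the same idea would apply with the formula and data arguments used in the pairs call above.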
Now we will run a formal test to see if the correlations are
statistically significant. You can install additional packages that will
automatically run the test for all pairs of variables in a data set,
but we will just use the base R function (cor.test), so
we’ll run a separate test for each pair of variables.
To run the pairwise tests, we’ll use the with function
in combination with the cor.test function, to pull out the
two variables we want from the plant data set and run the correlation
for those two variables.
with(plant, cor.test(Mean_tempC, Max_tempC))
##
## Pearson's product-moment correlation
##
## data: Mean_tempC and Max_tempC
## t = 8.4936, df = 17, p-value = 1.6e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7531352 0.9611007
## sample estimates:
## cor
## 0.8996063
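To see how the pieces of this output fit together, the sketch below recomputes the t statistic, p-value, and confidence interval from just the reported correlation and degrees of freedom. It assumes the standard formulas for Pearson's test: t = r * sqrt(df) / sqrt(1 - r^2) with df = n - 2, and a confidence interval based on Fisher's z transformation. The variable names are only illustrative.

```r
r  <- 0.8996063   # sample correlation reported in the output above
df <- 17          # degrees of freedom reported above (n - 2, so n = 19)

# t statistic for testing whether the true correlation is 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via Fisher's z transformation
z  <- atanh(r)
se <- 1 / sqrt(df - 1)                     # 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(c(t = t_stat, p = p_val))
print(ci)
```

The results should closely match the t, p-value, and interval that cor.test printed, apart from rounding.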
with(plant, cor.test(Mean_tempC, Totprecip))
##
## Pearson's product-moment correlation
##
## data: Mean_tempC and Totprecip
## t = -1.6944, df = 17, p-value = 0.1084
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.71147981 0.08956702
## sample estimates:
## cor
## -0.3801058
with(plant, cor.test(Max_tempC, Totprecip))
##
## Pearson's product-moment correlation
##
## data: Max_tempC and Totprecip
## t = -2.5001, df = 17, p-value = 0.02294
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.78728864 -0.08407849
## sample estimates:
## cor
## -0.5184872
Based on the output of these tests, which correlations are statistically significant? Are the correlations positive or negative?
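With only three variables, typing each cor.test call by hand is manageable, but the number of pairs grows quickly as variables are added. As a rough sketch of how the pairwise tests could be generated in base R, the code below uses combn to list every pair and collects the key numbers in one table. It runs on R's built-in airquality data as a stand-in (not the plant data), and the names vars, var_pairs, and results are only illustrative.

```r
# Stand-in data: three weather variables from the built-in airquality data set.
vars <- c("Temp", "Wind", "Ozone")

# Every unordered pair of variable names, as a list of length-2 vectors.
var_pairs <- combn(vars, 2, simplify = FALSE)

# Run cor.test on each pair and keep the estimate, t, df, and p-value.
results <- do.call(rbind, lapply(var_pairs, function(p) {
  ct <- cor.test(airquality[[p[1]]], airquality[[p[2]]])
  data.frame(
    var1    = p[1],
    var2    = p[2],
    r       = unname(ct$estimate),
    t       = unname(ct$statistic),
    df      = unname(ct$parameter),
    p_value = ct$p.value
  )
}))

print(results)
```

To try this on the plant data, you would replace airquality with plant and vars with the three climate variable names. The rows should then agree with the three separate tests above.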