Pearson correlation in R (With Code Examples)

  • Jan 14, 2024
  • 8 Minutes Read

Understanding the relationships between variables is critical in statistical analysis and data science for gaining meaningful insights and making informed decisions. The Pearson correlation coefficient is extremely useful for determining the strength and direction of a linear relationship between two continuous variables. In this article, we'll look at Pearson correlation in the R programming language, including negative correlation, interpretation, the formula, and how to use it in SPSS. 

What is Pearson Correlation?

The Pearson correlation coefficient, also known as "r", is used to assess the strength and direction of a linear relationship between two continuous variables. Its values range between -1 and 1, where:

  • 1 indicates a perfect positive linear relationship,
  • 0 suggests no linear relationship, and
  • -1 signifies a perfect negative linear relationship.

The Pearson correlation coefficient (r) is calculated as the covariance between the two variables divided by the product of their standard deviations. The formula can be expressed as follows:

r = Σ[(x - mx)(y - my)] / √(Σ(x - mx)^2 * Σ(y - my)^2)

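Here, x and y are the individual data points of the two variables,

and mx and my are the means of the respective variables.

Each sum runs over all n data points, where n is the number of observations in the dataset. This form is equivalent to the covariance divided by the product of the standard deviations, because the 1/(n - 1) factors in the numerator and the denominator cancel.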
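As a quick check, the formula can be applied by hand and compared with R's built-in cor() function. This is a minimal sketch: the vectors below are made-up illustration values, and the variable names are chosen only for this example.

```r
# Illustrative data (assumed values, chosen only for demonstration)
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 7, 9, 15)

# Means of each variable (mx and my in the formula)
mx <- mean(x)
my <- mean(y)

# Numerator: sum of the products of the deviations from the means
numerator <- sum((x - mx) * (y - my))

# Denominator: square root of the product of the summed squared deviations
denominator <- sqrt(sum((x - mx)^2) * sum((y - my)^2))

r_manual <- numerator / denominator
r_builtin <- cor(x, y)  # method = "pearson" is the default

# TRUE when both approaches agree up to floating-point tolerance
print(all.equal(r_manual, r_builtin))
```

Both values should match, which confirms that cor() computes the same quantity as the formula above.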

What is a Negative Correlation?

A negative correlation occurs when one variable tends to decrease as the other increases. Any Pearson correlation coefficient (r) below 0 indicates a negative correlation, and the closer r is to -1, the stronger that correlation is. In R, you can calculate the Pearson correlation coefficient using the cor() function.

Code:

x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)

correlation <- cor(x, y)

print(paste("Pearson correlation coefficient: ", correlation))

 

Output:

[1] "Pearson correlation coefficient:  -1"

 

In this case, the vectors x and y have a perfect negative linear relationship, which yields a Pearson correlation coefficient of -1. This means that as the values in vector x increase, the values in vector y decrease linearly.

Interpretation of Pearson Correlation

To interpret a Pearson correlation coefficient, we need to consider both its magnitude and its sign. The sign shows whether the correlation is positive or negative, and the magnitude shows its strength: the larger the absolute value |r|, the stronger the linear relationship.

  • |r| close to 1 suggests a strong linear relationship,
  • |r| close to 0 implies a weak or no linear relationship.

The sign of r reveals the direction of the relationship.

  • r > 0 indicates a positive linear relationship,
  • r < 0 indicates a negative one.
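Note that a value of r near 0 only rules out a linear relationship, not a relationship of any kind. The sketch below uses made-up data in which y is completely determined by x, yet the Pearson coefficient comes out as essentially zero because the relationship is a symmetric curve rather than a line.

```r
# y depends only on x, but the relationship is quadratic, not linear
x <- -5:5
y <- x^2

# Pearson's r measures only the linear part of the relationship
r_nonlinear <- cor(x, y)

print(r_nonlinear)
```

For this reason, it is usually worth plotting the data before relying on r alone.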

Let us now consider an example to see how Pearson Correlation is used in the real world and how we can interpret it.

Code:

data(mtcars)

correlation_mpg_wt <- cor(mtcars$mpg, mtcars$wt)
correlation_hp_disp <- cor(mtcars$hp, mtcars$disp)

# print() takes a single object, so build each message with paste() first
print(paste("Pearson correlation coefficient between mpg and wt:", round(correlation_mpg_wt, 4)))
print(paste("Pearson correlation coefficient between hp and disp:", round(correlation_hp_disp, 4)))

Output:

[1] "Pearson correlation coefficient between mpg and wt: -0.8677"
[1] "Pearson correlation coefficient between hp and disp: 0.7909"

 

In this example, we use the mtcars dataset that ships with R. There is a strong negative correlation between mpg and wt (about -0.87) and a strong positive correlation between hp and disp (about 0.79).

The negative correlation between miles per gallon (mpg) and weight (wt) indicates that heavier cars tend to travel fewer miles per gallon. Similarly, the positive correlation between horsepower (hp) and engine displacement (disp) indicates that cars with a larger displacement tend to have more horsepower.

SPSS Application

Statistical Package for the Social Sciences (SPSS) is a widely used software package for statistical analysis. While R is a powerful and adaptable programming language, many researchers and analysts are familiar with SPSS because of its user-friendly interface. Pearson correlation analysis is available in both R and SPSS, so it is simple to switch between them.

In SPSS, you can run a Pearson correlation analysis from the "Correlate" menu. Follow these steps:

  1. Open SPSS and load your dataset.

  2. Navigate to "Analyze" in the menu bar and select "Correlate."

  3. Choose "Bivariate" for a pairwise correlation analysis.

  4. Select the variables you want to analyze and move them to the "Variables" box.

  5. Click "OK" to run the analysis.

SPSS will generate a correlation matrix that contains Pearson correlation coefficients and p-values. The output contains useful information about the strength and significance of the relationships between variables.
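In R, a comparable coefficient with its p-value can be obtained from cor.test(). The sketch below reuses the mtcars columns from the earlier example; the variable names (result, r_value, p_value, ci_95) are chosen only for illustration.

```r
data(mtcars)

# cor.test() uses Pearson's method by default
result <- cor.test(mtcars$mpg, mtcars$wt)

r_value <- unname(result$estimate)  # correlation coefficient
p_value <- result$p.value           # p-value for H0: true correlation is 0
ci_95 <- result$conf.int            # 95% confidence interval for r

# Prints the test statistic, degrees of freedom, p-value, and interval
print(result)
```

Unlike the SPSS table, cor.test() handles one pair of variables per call, so it would need to be repeated for each pair of interest.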

You can also bring SPSS data into R with the foreign package, which can read SPSS .sav files.

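library(foreign)

# Read the SPSS data file into R as a data frame
data_spss <- read.spss("your_data_file.sav", to.data.frame = TRUE)

# cor() needs numeric input, so keep only the numeric columns
numeric_cols <- data_spss[sapply(data_spss, is.numeric)]

# Pearson correlations for every pair of columns, skipping missing values pair by pair
correlation_matrix <- cor(numeric_cols, use = "pairwise.complete.obs")

print(correlation_matrix)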

 

In R, you can perform a similar analysis with the cor() function. Read the SPSS data file into R, then apply cor() to the numeric columns of the data frame to obtain the Pearson correlation matrix. This matrix summarises the relationships between all pairs of variables.
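To see what such a matrix looks like without an SPSS file at hand, here is a small sketch that uses the built-in mtcars dataset instead. The choice of four columns is arbitrary and only keeps the output readable.

```r
data(mtcars)

# Keep a few numeric columns so the matrix stays readable
vars <- mtcars[, c("mpg", "wt", "hp", "disp")]

# use = "complete.obs" drops rows with missing values (mtcars has none,
# but imported survey data often does)
corr_matrix <- cor(vars, method = "pearson", use = "complete.obs")

# Round for display; the diagonal is 1 because each variable
# correlates perfectly with itself
print(round(corr_matrix, 2))
```

The matrix is symmetric, so each pair of variables appears twice, once above and once below the diagonal.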

Conclusion

In conclusion, the Pearson correlation is an important tool for quantifying linear relationships between continuous variables. We have covered positive and negative correlations, how to interpret them, and how to compute them in both SPSS and R. Whether you are examining patterns for research, business analytics, or any other data-driven task, assessing the strength and direction of these relationships helps analysts and researchers make informed decisions.


About The Author
Abhisek Ganguly
Passionate machine learning enthusiast with a deep love for computer science, dedicated to pushing the boundaries of AI through academic research and sharing knowledge through teaching.