Evaluating Linear Relationships

Emily Halford 24/11/2020

Scatterplots

Scatterplots are used to visually assess the relationship between two numeric variables. Typically, the explanatory (independent) variable is placed on the X axis and the response (dependent) variable is placed on the Y axis.

Image_by_Author.png

Both linear relationships pictured below are positive: as X increases, Y also increases. The relationship in the scatterplot on the right, however, is far stronger than that in the scatterplot on the left.

Image_by_Emily.png

Often, the relationship between two continuous variables isn’t linear at all. One such non-linear relationship is pictured below: as X increases, Y follows a parabolic shape. There appears to be a strong and important relationship between these variables, but it would not be captured by techniques designed to assess linear relationships (e.g., Pearson correlation and linear regression). The possibility of such a relationship underscores the importance of producing a scatterplot before running analyses, because an analysis that skips data visualization could miss it completely.

Image_by_Emily_2.png

Correlation Coefficient 

Once you’ve seen a roughly linear relationship on your scatterplot, you can calculate a correlation coefficient to get a number representing the strength of the association. Correlation coefficients can be either negative or positive (indicating a negative or positive relationship, respectively) and range from -1 to 1. Values of exactly -1 or 1 represent perfect linear relationships, values close to them represent strong relationships, and 0 indicates that there is no linear relationship between the variables.
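As a rough illustration of how the sign and size of the coefficient behave, the sketch below computes the Pearson correlation by hand in Python. The function name `pearson_r` and the data are hypothetical and are not taken from the article: both Y series follow the same upward trend, one with a small alternating wobble and one with a large wobble.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: the same upward trend with a small or large wobble.
x = list(range(20))
strong_y = [xi + 1 * (-1) ** i for i, xi in enumerate(x)]
weak_y = [xi + 10 * (-1) ** i for i, xi in enumerate(x)]

r_strong = pearson_r(x, strong_y)
r_weak = pearson_r(x, weak_y)
# Flipping the sign of Y flips the sign of the coefficient.
r_negative = pearson_r(x, [-v for v in strong_y])

print(round(r_strong, 2), round(r_weak, 2), round(r_negative, 2))
```

Both positive series give a positive coefficient, but the noisier one sits much further from 1, and the mirrored series gives the same magnitude with a negative sign.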

Monotonic_non-linear_relationship.png

Monotonic, non-linear relationship. Image from https://thenounproject.com/term/graph-curve-up/2064827/

Image_by_Emily_3.png

One line of R code is all it takes to produce both the Pearson correlation coefficient and the associated t-test output for the “weak” positive correlation pictured on the left:

cor.test.png

As can be seen in the output below, the Pearson correlation coefficient (0.78) is fairly large even in this “weak” relationship. The p-value associated with the t-test statistic is well below 0.05, indicating a statistically significant relationship:

Pearsons_Product.png
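For readers curious where the t statistic in this kind of output comes from, the standard test for a Pearson correlation uses t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The Python sketch below applies that formula. The function name is hypothetical, and so is the sample size of 50: the article's sample size is not stated in the text, so the resulting number should not be read as the article's actual output.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    degrees_of_freedom = n - 2
    t = r * sqrt(degrees_of_freedom) / sqrt(1 - r ** 2)
    return t, degrees_of_freedom

# r = 0.78 is the coefficient reported in the article; n = 50 is a
# hypothetical sample size chosen only for illustration.
t, df = correlation_t_statistic(0.78, 50)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

The formula shows why even a moderate coefficient can be highly significant: for a fixed r, the t statistic grows with the square root of the sample size.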

cor.test_data.png

Now the correlation coefficient is almost equal to 1, which confirms the very strong relationship that we could observe in the scatterplot above. Again, the relationship is statistically significant:

Pearson_Product_Moment_Correlation.png

Linear Regression*

*Note: There are many kinds of regression analyses, and lots of complexity that one can dive into in learning about regression. For the purposes of this article, I am keeping it simple and am focused entirely on linear regression and its relationship with scatterplots and correlation coefficients.

Linear regression relies on several assumptions, including:

  1. Linearity: The relationship between X and Y is linear
  2. Homoscedasticity: The variance of the residuals is constant across values of X
  3. Normality: The residuals should be normally distributed around the regression line
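The assumptions listed above are all statements about the fitted line and its residuals, so it can help to see how both are computed. The following Python sketch fits a simple least-squares line by hand. The function name `fit_line` and the data are hypothetical and are not the article's, and the article's own models are fit in R.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data, chosen to be roughly linear.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]

# Residuals are the vertical distances between the data and the line.
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# R-squared: the share of the variation in Y explained by the line.
mean_y = sum(y) / len(y)
ss_residual = sum(e ** 2 for e in residuals)
ss_total = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_residual / ss_total

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, R^2 = {r_squared:.4f}")
print("residuals:", [round(e, 2) for e in residuals])
```

Plotting these residuals against X is a common way to check linearity and homoscedasticity by eye. With a single predictor, R-squared also equals the square of the Pearson correlation coefficient, which is the link between regression and the correlation discussed earlier.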

summaryweak.png

Call_Formula.png

Output for “weak” relationship linear model

Let’s run the model now for our “strong” relationship:

strong_relationship.png

Output_for_strong_relationship_linear_model.png

Output for “strong” relationship linear model

Emily Halford

Data Science & Mental Health Expert

Emily is a data analyst working in psychiatric epidemiology in New York City. She is a suicide-prevention professional who is enthusiastic about taking a data-driven approach to the mental health field. Emily holds a Master of Public Health from Columbia University.

   