Evaluating Linear Relationships

Emily Halford 24/11/2020

Scatterplots

Scatterplots are used to visually assess the relationship between two numeric variables. Typically, the explanatory (independent) variable is placed on the X axis and the response (dependent) variable is placed on the Y axis.

Both linear relationships pictured below are positive: as X increases, Y also increases. However, the relationship represented in the scatterplot on the right is far stronger than that in the scatterplot on the left.
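To make the contrast concrete, here is a minimal sketch in R that simulates one weaker and one stronger positive relationship and plots them side by side. The data are simulated for illustration only, and the names x, y_weak and y_strong are hypothetical; they are not the article's dataset.

```r
# Simulated, hypothetical data (not the article's dataset).
set.seed(42)
x <- runif(100, min = 0, max = 10)

# Same underlying slope, different amounts of noise around the line.
y_weak   <- 2 * x + rnorm(100, sd = 6)    # points scattered widely
y_strong <- 2 * x + rnorm(100, sd = 0.5)  # points hug the line

# Two scatterplots side by side: explanatory variable on X, response on Y.
old_par <- par(mfrow = c(1, 2))
plot(x, y_weak,   main = "Weaker relationship",   xlab = "X", ylab = "Y")
plot(x, y_strong, main = "Stronger relationship", xlab = "X", ylab = "Y")
par(old_par)
```

Both panels slope upward, but the points in the second panel sit much closer to a straight line, which is what "stronger" means here.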

 

Often, the relationship between two continuous variables isn’t linear at all. One such non-linear relationship is pictured below — as X increases, Y follows a parabolic shape. There appears to be a strong and important relationship between these variables, but it would not be captured by techniques designed to assess linear relationships (e.g., correlation and regression). The possibility of a relationship such as that pictured below underscores the importance of producing a scatterplot before running analyses, as this meaningful relationship could be completely missed in an analysis that skips data visualization.
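A short sketch in R illustrates why the scatterplot matters in this case. It uses simulated, hypothetical data (x symmetric around zero, y roughly equal to x squared) and shows that the Pearson correlation is close to zero even though y depends strongly on x.

```r
# Simulated, hypothetical parabolic data (not the article's dataset).
set.seed(1)
x <- seq(-5, 5, length.out = 101)
y <- x^2 + rnorm(101, sd = 1)

# The linear correlation is near zero despite the obvious pattern,
# because the upward and downward halves of the parabola cancel out.
r_linear <- cor(x, y)
print(r_linear)

# The scatterplot reveals the relationship that the coefficient misses.
plot(x, y, xlab = "X", ylab = "Y", main = "Parabolic relationship")
```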

Correlation Coefficient 

Once you’ve seen a roughly linear relationship on your scatterplot, you can calculate a correlation coefficient to get a number representing the strength of the association. Correlation coefficients range from -1 to 1. The sign indicates whether the relationship is negative or positive, values near either end of the range represent strong relationships (with -1 and 1 themselves representing perfect linear relationships), and 0 indicates that there is no linear relationship between the variables.
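For readers who want to see where the number comes from, the sketch below computes the Pearson coefficient by hand (the covariation of X and Y scaled by the variation of each) and compares it with base R's cor(). The data and the variable names are hypothetical and simulated for illustration.

```r
# Simulated, hypothetical data (not the article's dataset).
set.seed(7)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# Pearson's r: sum of cross-products of deviations, divided by the
# square root of the product of the two sums of squared deviations.
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Base R's built-in calculation for comparison.
r_builtin <- cor(x, y, method = "pearson")

print(c(manual = r_manual, builtin = r_builtin))
print(all.equal(r_manual, r_builtin))
```

Because of the scaling in the denominator, the result always falls between -1 and 1 regardless of the units of X and Y.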

 Monotonic, non-linear relationship. Image from https://thenounproject.com/term/graph-curve-up/2064827/

One line of R code is all it takes to produce both the Pearson correlation coefficient and the associated t-test output for the “weak” positive correlation pictured on the left:
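A minimal sketch of such a call uses cor.test() from base R. The vectors x and y below are hypothetical and simulated for illustration, so the printed values will differ from those reported in the article.

```r
# Simulated, hypothetical data standing in for the "weak" relationship.
set.seed(123)
x <- runif(100, min = 0, max = 10)
y <- 2 * x + rnorm(100, sd = 6)

# The one line that does the work: Pearson's r plus its t-test.
result <- cor.test(x, y, method = "pearson")
print(result)

# Individual pieces can be pulled out of the returned object.
r_estimate <- unname(result$estimate)  # correlation coefficient
p_value    <- result$p.value           # p-value of the t-test
```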

 

As the output shows, the Pearson correlation coefficient (0.78) is quite large even in this “weak” relationship. The p-value associated with the t-test statistic is well below 0.05, indicating a statistically significant relationship.

For the “strong” relationship, the correlation coefficient is almost equal to 1, which confirms the very strong relationship that we could observe in the scatterplot above. Yet again, the relationship is statistically significant.

 

Linear Regression*

*Note: There are many kinds of regression analyses, and lots of complexity that one can dive into in learning about regression. For the purposes of this article, I am keeping it simple and am focused entirely on linear regression and its relationship with scatterplots and correlation coefficients.

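Linear regression rests on several assumptions, including:

  1. Linearity: The relationship between X and Y is linear
  2. Homoscedasticity: The variance of the residuals is constant across values of X
  3. Normality: The residuals (the distances of the data points from the regression line) should be normally distributed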
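One common way to eyeball these assumptions is to inspect the residuals of a fitted model. The sketch below is only an illustration on simulated, hypothetical data. The choice of a residuals-versus-fitted plot and a normal Q-Q plot is an assumption about how one might check them, not necessarily the approach the article takes.

```r
# Simulated, hypothetical data (not the article's dataset).
set.seed(2020)
x <- runif(100, min = 0, max = 10)
y <- 1 + 2 * x + rnorm(100, sd = 2)

fit <- lm(y ~ x)
res <- residuals(fit)

# Linearity and homoscedasticity: residuals should scatter evenly
# around zero, with no curve and no funnel shape.
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: points should fall close to the reference line.
qqnorm(res)
qqline(res)
```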

Output for “weak” relationship linear model

Let’s run the model now for our “strong” relationship:

 Output for “strong” relationship linear model
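To tie the regression output back to the correlation coefficient, the following sketch fits a simple linear model with lm() on simulated, hypothetical data and checks two standard identities for one-predictor regression: R-squared equals the square of Pearson's r, and the slope equals r multiplied by sd(y) / sd(x). The printed numbers will not match the article's output.

```r
# Simulated, hypothetical data standing in for the "strong" relationship.
set.seed(99)
x <- runif(100, min = 0, max = 10)
y <- 2 * x + rnorm(100, sd = 0.5)

# Fit and summarise the simple linear regression of y on x.
fit <- lm(y ~ x)
print(summary(fit))

# With a single predictor, R-squared is the squared correlation.
r         <- cor(x, y)
r_squared <- summary(fit)$r.squared
print(all.equal(r^2, r_squared))

# The fitted slope is the correlation rescaled by the two spreads.
slope <- coef(fit)[["x"]]
print(all.equal(slope, r * sd(y) / sd(x)))
```

This is why a relationship that looks tight on the scatterplot produces both a correlation near 1 and a regression model that explains most of the variance in Y.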
