Table of Contents:
- Equation of the best fit line
- What is a line of best fit
- How to find line of best fit
- How to draw a line of best fit
- Example 1
Equation of the best fit line
This lesson continues our past two lessons, where we talked about bivariate data, scatter plots and correlation, and then learnt about regression analysis. We will therefore use the concepts from those two lessons and build on them to study the definition and characteristics of the line of best fit.
What is a line of best fit
As we saw in our past lesson, a line of best fit (or best fit line) is simply a straight line that represents the data points in a scatter plot as closely as possible. This doesn't mean that the line will touch every single data point in the plot; in fact, a line of best fit may touch a few, all or none of the data points. For that reason, the line of best fit is also called the trend line: instead of exactly representing each point of the data set, it presents the overall trend that the data points follow, giving a view of the behaviour of the data and of how the variables are correlated with each other.
How to find line of best fit
Since the line of best fit is simply a straight line, it can be mathematically defined through the equation for a straight line:
$$y = ax + b \qquad \text{(equation 1)}$$
Where we know that:
$y$ = dependent variable
$x$ = independent variable
$a$ = slope of the line (the name can be different depending on the textbook you are using)
$b$ = $y$-intercept (point in the graph where the line crosses the $y$-axis)
Notice the slope can have either one of two names: $a$ or $m$. The name differs depending on which textbook you are using in your class or to study; for this lesson, we will keep the name $a$. Just remember that we are talking about the slope of the best fit line.
When we are looking at a linear regression analysis graph where a bivariate set of data has been plotted, we will always have the values of the variables $x$ and $y$ (since these are the values given in the bivariate data set), and so we will usually have to solve for the slope and the y-intercept in the equation for the line of best fit.
In other words, when we have a bivariate data set, $x$ and $y$ are provided, so $a$ and $b$ have to be calculated. This is not always the case: the line of best fit equation can also be used to solve for the values of the variables themselves when the slope of the line and the y-intercept are given. But if the data table is provided, then we will be solving for $a$ and $b$.
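The formulas for the slope and the y-intercept are as follows:
$$a = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad b = \bar{y} - a\bar{x} \qquad \text{(equation 2)}$$
Where:
$n$ = number of data points
$y$ = dependent variable data value
$x$ = independent variable data value
$a$ = slope of the best fit line
$b$ = $y$-intercept
$\bar{x}$ = mean for the sample of $x$ values
$\bar{y}$ = mean for the sample of $y$ values
$\sum$ is the symbol for summation
therefore:
$$\bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}$$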
In equation 2, notice that $b$ is defined in terms of $a$; therefore, you will always solve for $a$ first. $b$ is also defined in terms of the means $\bar{x}$ and $\bar{y}$, which takes us to an important realization: the data points in the set shown in a regression analysis scatter plot count as a sample, not as a whole population. If you think about it, this makes sense, since a regression analysis scatter plot is usually used to find missing points that have not been graphed, but can be inferred from the relationship shown by the given data points.
Therefore, when obtaining the mean of the values for each of the variables used in the analysis, we are taking the mean of sample data points, and so we use the notation for the mean of a sample: $\bar{x}$ and $\bar{y}$.
After solving for $a$ and $b$, we can substitute these values into the best fit line equation shown in equation 1, and plot the best fit line in the scatter plot.
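The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the standard least-squares formulas for the slope $a$ and the intercept $b$; the function name `best_fit_line` and the small data set are hypothetical and are not taken from this lesson's figures.

```python
def best_fit_line(xs, ys):
    """Return (a, b) for the line y = a*x + b fitted by least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")

    # The sums that appear in the slope formula
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal, so the slope is undefined")

    # Solve for the slope first, then the intercept from the sample means
    a = (n * sum_xy - sum_x * sum_y) / denominator
    b = sum_y / n - a * (sum_x / n)
    return a, b


# Hypothetical sample data, used only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = best_fit_line(xs, ys)
print(round(a, 2), round(b, 2))  # 0.6 2.2
```

Note that the slope is computed before the intercept, matching the order in which the two values are solved by hand.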
How to draw a line of best fit
Let us use the method described above to obtain the best fit line for the bivariate data scatter plot shown in figure 2. We start by producing its corresponding data table so we know the values of $x$ and $y$.
So let us solve for a by making the calculations in pieces:
Now we solve for b:
And so, we can obtain the points for our trend line using the line of best fit formula from equation 1:
And now we can graph the two points found above: (0, 8.9) and (13, 1.23); we connect them with a straight line and we find the line of best fit!
And so, in the scatter plot with the line of best fit shown in figure 4, the points (0, 8.9) and (13, 1.23) appear in green, and the best fit line appears in blue.
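As a quick check, the slope and intercept of this line can be recovered from the two plotted points alone. The values below are inferred from the rounded points (0, 8.9) and (13, 1.23), not from the original data table, so treat them as approximate:

$$a = \frac{1.23 - 8.9}{13 - 0} = \frac{-7.67}{13} = -0.59, \qquad b = 8.9$$

so the best fit line is approximately $y = -0.59x + 8.9$. The negative slope tells us that $y$ decreases as $x$ increases.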
Let us work through another example so you can get more practice:
Example 1
Given the following bivariate data, what is the line of best fit? Use the equation for the line of best fit and plot it in the diagram provided.
We start by doing the calculation for the slope of the line of best fit:
Now we solve for b:
And so, we can obtain the points for our trend line using the line of best fit formula from equation 1:
And now we can graph the two points found above: (0, 0.5) and (4, 4.9); we connect them with a straight line and we obtain the line of best fit:
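Since two points are enough to fix a straight line, the equation of this trend line can be recovered from the two points we graphed, and then used to estimate other values. The sketch below is only an illustration: the helper name `line_through` is hypothetical, and the slope and intercept it returns are inferred from the rounded points (0, 0.5) and (4, 4.9) rather than from the original data.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope-intercept form")
    a = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - a * x1            # value of y where the line crosses x = 0
    return a, b


# The two points graphed in Example 1
a, b = line_through((0, 0.5), (4, 4.9))

# Use y = a*x + b to estimate y at an x value between the two points
y_at_2 = a * 2 + b
print(round(a, 2), round(b, 2), round(y_at_2, 2))  # 1.1 0.5 2.7
```

This is the same idea as reading a missing point off the trend line in the graph, just done with the equation instead of by eye.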
Now we end this lesson with a few recommendations: this lesson on the equation of the line of best fit provides many more examples that you can work through so you can continue practicing what you learned today. And for even more practice on your own, this lines of best fit worksheet can be printed out and worked through!
This is it for our lesson of today, see you in the next one!