Deriving the Least-Squares Estimates for Simple Linear Regression
This document contains the mathematical details for deriving the least-squares estimates of the slope (\(\beta_1\)) and intercept (\(\beta_0\)). We obtain the estimates \(\hat{\beta}_1\) and \(\hat{\beta}_0\) by finding the values that minimize the sum of squared residuals, as shown in Equation 1.
\[ SSR = \sum\limits_{i=1}^{n}[y_i - \hat{y}_i]^2 = \sum\limits_{i=1}^{n}[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2 = \sum\limits_{i=1}^{n}[y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i]^2 \tag{1}\]
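To make Equation 1 concrete, here is a minimal Python sketch of the sum of squared residuals; the `ssr` helper and the toy data are illustrative choices, not part of the derivation.

```python
import numpy as np

def ssr(x, y, b0, b1):
    """Sum of squared residuals from Equation 1 for a candidate
    intercept b0 and slope b1."""
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2)

# Toy data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

print(ssr(x, y, 0.0, 2.0))  # SSR for the candidate line y = 2x
```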
Recall that we can find the values of \(\hat{\beta}_1\) and \(\hat{\beta}_0\) that minimize the sum of squared residuals in Equation 1 by taking the partial derivatives of Equation 1 with respect to each coefficient and setting them equal to 0. Because SSR is a convex function of \(\hat{\beta}_0\) and \(\hat{\beta}_1\), the values that make both partial derivatives equal to 0 are exactly the values that minimize the sum of squared residuals. The partial derivatives are shown in Equation 2.
\[ \begin{aligned} \frac{\partial \text{SSR}}{\partial \hat{\beta}_1} &= -2 \sum\limits_{i=1}^{n}x_i(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) \\ \frac{\partial \text{SSR}}{\partial \hat{\beta}_0} &= -2 \sum\limits_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) \end{aligned} \tag{2}\]
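As a quick sanity check, the closed-form partials in Equation 2 can be compared against central finite differences of the SSR. This sketch reuses the illustrative `ssr` helper and toy data above.

```python
def dssr_db1(x, y, b0, b1):
    # Closed-form partial with respect to the slope (Equation 2)
    return -2 * np.sum(x * (y - b0 - b1 * x))

def dssr_db0(x, y, b0, b1):
    # Closed-form partial with respect to the intercept (Equation 2)
    return -2 * np.sum(y - b0 - b1 * x)

# Central finite differences of SSR should agree closely
h, b0, b1 = 1e-6, 0.5, 1.5
fd_b1 = (ssr(x, y, b0, b1 + h) - ssr(x, y, b0, b1 - h)) / (2 * h)
fd_b0 = (ssr(x, y, b0 + h, b1) - ssr(x, y, b0 - h, b1)) / (2 * h)
print(np.isclose(dssr_db1(x, y, b0, b1), fd_b1))  # True
print(np.isclose(dssr_db0(x, y, b0, b1), fd_b0))  # True
```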
The derivation of \(\hat{\beta}_0\) is shown in Equation 3.
\[ \begin{aligned}\frac{\partial \text{SSR}}{\partial \hat{\beta}_0} &= -2 \sum\limits_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \\&\Rightarrow -\sum\limits_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \\&\Rightarrow - \sum\limits_{i=1}^{n}y_i + n\hat{\beta}_0 + \hat{\beta}_1\sum\limits_{i=1}^{n}x_i = 0 \\&\Rightarrow n\hat{\beta}_0 = \sum\limits_{i=1}^{n}y_i - \hat{\beta}_1\sum\limits_{i=1}^{n}x_i \\&\Rightarrow \hat{\beta}_0 = \frac{1}{n}\Big(\sum\limits_{i=1}^{n}y_i - \hat{\beta}_1\sum\limits_{i=1}^{n}x_i\Big)\\&\Rightarrow \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\\end{aligned} \tag{3}\]
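One consequence of Equation 3 worth checking numerically: for any fixed slope, setting the intercept to \(\bar{y} - \hat{\beta}_1\bar{x}\) makes the residuals sum to zero, which is exactly the condition \(\partial \text{SSR}/\partial \hat{\beta}_0 = 0\). A minimal sketch, reusing the toy data above:

```python
# For any fixed slope, Equation 3 gives the optimal intercept,
# and with that intercept the residuals sum to zero.
b1 = 1.5  # an arbitrary fixed slope
b0_hat = np.mean(y) - b1 * np.mean(x)
residuals = y - (b0_hat + b1 * x)
print(np.isclose(np.sum(residuals), 0.0))  # True
```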
The derivation of \(\hat{\beta}_1\) using the \(\hat{\beta}_0\) we just derived is shown in Equation 4.
\[ \begin{aligned}&\frac{\partial \text{SSR}}{\partial \hat{\beta}_1} = -2 \sum\limits_{i=1}^{n}x_i(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \\&\Rightarrow -\sum\limits_{i=1}^{n}x_iy_i + \hat{\beta}_0\sum\limits_{i=1}^{n}x_i + \hat{\beta}_1\sum\limits_{i=1}^{n}x_i^2 = 0 \\\text{(Fill in }\hat{\beta}_0\text{)}&\Rightarrow -\sum\limits_{i=1}^{n}x_iy_i + (\bar{y} - \hat{\beta}_1\bar{x})\sum\limits_{i=1}^{n}x_i + \hat{\beta}_1\sum\limits_{i=1}^{n}x_i^2 = 0 \\&\Rightarrow (\bar{y} - \hat{\beta}_1\bar{x})\sum\limits_{i=1}^{n}x_i + \hat{\beta}_1\sum\limits_{i=1}^{n}x_i^2 = \sum\limits_{i=1}^{n}x_iy_i \\&\Rightarrow \bar{y}\sum\limits_{i=1}^{n}x_i - \hat{\beta}_1\bar{x}\sum\limits_{i=1}^{n}x_i + \hat{\beta}_1\sum\limits_{i=1}^{n}x_i^2 = \sum\limits_{i=1}^{n}x_iy_i \\&\Rightarrow n\bar{y}\bar{x} - \hat{\beta}_1n\bar{x}^2 + \hat{\beta}_1\sum\limits_{i=1}^{n}x_i^2 = \sum\limits_{i=1}^{n}x_iy_i \\&\Rightarrow \hat{\beta}_1\sum\limits_{i=1}^{n}x_i^2 - \hat{\beta}_1n\bar{x}^2 = \sum\limits_{i=1}^{n}x_iy_i - n\bar{y}\bar{x} \\&\Rightarrow \hat{\beta}_1\Big(\sum\limits_{i=1}^{n}x_i^2 -n\bar{x}^2\Big) = \sum\limits_{i=1}^{n}x_iy_i - n\bar{y}\bar{x} \\&\Rightarrow \hat{\beta}_1 = \frac{\sum\limits_{i=1}^{n}x_iy_i - n\bar{y}\bar{x}}{\sum\limits_{i=1}^{n}x_i^2 -n\bar{x}^2}\end{aligned} \tag{4}\]
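The closed-form estimates from Equation 3 and Equation 4 can be checked against a library fit; `np.polyfit` with degree 1 is one option. Again using the illustrative data above:

```python
# Slope from Equation 4, then intercept from Equation 3
n = len(x)
xbar, ybar = np.mean(x), np.mean(y)
b1_hat = (np.sum(x * y) - n * xbar * ybar) / (np.sum(x ** 2) - n * xbar ** 2)
b0_hat = ybar - b1_hat * xbar

# np.polyfit returns the slope first for a degree-1 fit
slope, intercept = np.polyfit(x, y, 1)
print(np.allclose([b1_hat, b0_hat], [slope, intercept]))  # True
```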
To write \(\hat{\beta}_1\) in a form that’s more recognizable, we will use the following:
\[ \sum\limits_{i=1}^{n} x_iy_i - n\bar{y}\bar{x} = \sum\limits_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = (n-1)\text{Cov}(x,y) \tag{5}\]
\[ \sum\limits_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum\limits_{i=1}^{n}(x_i - \bar{x})^2 = (n-1)s_x^2 \tag{6}\]
where \(\text{Cov}(x,y)\) is the covariance of \(x\) and \(y\), and \(s_x^2\) is the sample variance of \(x\) (\(s_x\) is the sample standard deviation).
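Both identities are easy to verify numerically; note that NumPy uses the \(n-1\) denominator for `np.cov` by default but requires `ddof=1` for the sample variance:

```python
n, xbar, ybar = len(x), np.mean(x), np.mean(y)

# Equation 5: both sides equal (n - 1) * Cov(x, y)
lhs = np.sum(x * y) - n * xbar * ybar
cov_xy = np.cov(x, y)[0, 1]    # sample covariance, n - 1 denominator
print(np.isclose(lhs, (n - 1) * cov_xy))  # True

# Equation 6: both sides equal (n - 1) * s_x^2
lhs = np.sum(x ** 2) - n * xbar ** 2
s2_x = np.var(x, ddof=1)       # sample variance, n - 1 denominator
print(np.isclose(lhs, (n - 1) * s2_x))  # True
```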
Thus, applying Equation 5 and Equation 6, we have
\[ \begin{aligned}\hat{\beta}_1 &= \frac{\sum\limits_{i=1}^{n}x_iy_i - n\bar{y}\bar{x}}{\sum\limits_{i=1}^{n}x_i^2 -n\bar{x}^2} \\&= \frac{\sum\limits_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum\limits_{i=1}^{n}(x_i-\bar{x})^2}\\&= \frac{(n-1)\text{Cov}(x,y)}{(n-1)s_x^2}\\&= \frac{\text{Cov}(x,y)}{s_x^2}\end{aligned} \tag{7}\]
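Numerically, Equation 7 can be checked against the slope computed earlier (the illustrative `b1_hat` from the sketch after Equation 4):

```python
# Equation 7: the slope as sample covariance over sample variance
b1_cov = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(np.isclose(b1_cov, b1_hat))  # True: matches Equation 4
```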
The correlation between \(x\) and \(y\) is \(r = \frac{\text{Cov}(x,y)}{s_x s_y}\). Thus, \(\text{Cov}(x,y) = r s_x s_y\). Plugging this into Equation 7, we have
\[ \hat{\beta}_1 = \frac{\text{Cov}(x,y)}{s_x^2} = r\frac{s_ys_x}{s_x^2} = r\frac{s_y}{s_x} \tag{8}\]
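And the same check for Equation 8, using NumPy's sample correlation and standard deviations (again reusing the illustrative `b1_hat`):

```python
# Equation 8: the slope as r * s_y / s_x
r = np.corrcoef(x, y)[0, 1]
b1_corr = r * np.std(y, ddof=1) / np.std(x, ddof=1)
print(np.isclose(b1_corr, b1_hat))  # True: matches Equations 4 and 7
```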