Homework 3 - Multiple Linear Regression

Due Friday, October 13, 11:59 PM on D2L

Homework Instruction and Requirement

  • Homework 3 covers course material from Weeks 1 to 6.

  • Please submit your work as one PDF file, including all parts, to D2L > Assessments > Dropbox. Multiple files and files that are not in PDF format are not accepted.

  • In your homework, please number and answer questions in order.

  • Your answers to the Mathematical Derivation and Reasoning part may be handwritten. However, you need to scan your paper and convert it to a PDF file.

  • Your entire work on Statistical Computing and Data Analysis should be completed using word processing software (Microsoft Word, Google Docs, (R)Markdown, Quarto, LaTeX, etc.) and your preferred programming language. Your document should be a PDF file.

  • Questions starting with (MSSC) are for MSSC 5780 students.

  • It is your responsibility to make clear what you are trying to show. If you type your answers, make sure there are no typos. I grade your work based on what you show, not what you meant to show. If you choose to handwrite your answers, write them neatly. If I can’t read your handwriting, your answer will be marked wrong.

Mathematical Derivation and Reasoning

The simple linear regression and multiple linear regression models and notation are the same as defined in our course slides and textbook.

In simple linear regression,

  1. (MSSC) Show that \(r^2 = \frac{SS_R}{SS_T} = R^2\), that is, the square of the sample correlation coefficient between \(y\) and \(x\) is equal to the coefficient of determination.
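As a quick numerical sanity check (not a substitute for the derivation), the identity can be verified on a small data set. The sketch below uses plain Python; the data values and variable names are made up for illustration and are not part of the assignment.

```python
# Numerical sanity check (not a proof): r^2 equals SS_R / SS_T in SLR.
# The data below are made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_t = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares fit and regression sum of squares
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
ss_r = sum((yh - y_bar) ** 2 for yh in y_hat)

# Sample correlation coefficient between y and x
r = s_xy / (s_xx * ss_t) ** 0.5

r_squared = r ** 2
R2 = ss_r / ss_t
print(r_squared, R2)  # the two values should agree up to rounding error
```

Agreement on one data set only illustrates the result; the question still asks for the general algebraic argument.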

Suppose \({\bf A}_{n \times n}\) is a symmetric idempotent matrix.

In multiple linear regression, let the hat matrix \({\bf H}_{n \times n} = {\bf X(X'X)}^{-1}{\bf X'}\).

  1. (MSSC) Show that \({\bf H}\) and \({\bf I - H}\) are symmetric and idempotent.

  2. (MSSC) Show that \(\text{tr}({\bf H}) = p\).
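Before writing the proofs, it can help to see these properties hold numerically. The following is a minimal sketch in plain Python, assuming a small made-up design matrix with an intercept column and assuming \(p\) counts the columns of \({\bf X}\) (here \(p = 3\)). The helper functions are written here for illustration; in practice a matrix library or \(\texttt{R}\) would be much shorter.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Made-up 6 x 3 design matrix: intercept column plus two predictors.
X = [[1, 1, 2], [1, 2, 1], [1, 3, 4], [1, 4, 3], [1, 5, 6], [1, 6, 5]]
n, p = len(X), len(X[0])

Xt = transpose(X)
H = matmul(matmul(X, inverse(matmul(Xt, X))), Xt)  # H = X (X'X)^{-1} X'
M = [[(1.0 if i == j else 0.0) - H[i][j] for j in range(n)] for i in range(n)]  # I - H

sym_H = max_abs_diff(H, transpose(H))   # ~0 if H is symmetric
idem_H = max_abs_diff(matmul(H, H), H)  # ~0 if H is idempotent
sym_M = max_abs_diff(M, transpose(M))
idem_M = max_abs_diff(matmul(M, M), M)
trace_H = sum(H[i][i] for i in range(n))

print(sym_H, idem_H, sym_M, idem_M, trace_H)
```

The first four printed values should be zero up to floating-point error, and the trace should match the number of columns of the made-up \({\bf X}\).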

Statistical Computing and Data Analysis

Please perform a data analysis using \(\texttt{R}\) or your preferred language. All results must be generated as computer output, and your work must be done entirely on a computer. Handwriting is not allowed. Attach the relevant code.

We use the same data set mpg.csv for data analysis.

  1. Fit an MLR model \(y = \beta_0 + \beta_1x_1 + \beta_6x_6+\epsilon\) relating gasoline mileage \(y\) (miles per gallon) to engine displacement \(x_1\) and the number of carburetor barrels \(x_6\). Interpret the regression coefficients \(\beta_1\) and \(\beta_6\).

  2. Write down \(H_0\) and \(H_1\) for testing the significance of regression, and construct the ANOVA table to carry out the test. Explain your decision rule and conclusion.

  3. Obtain \(R^2\) and \(R_{Adj}^2\) for this MLR model. Compare these to the \(R^2\) and \(R_{Adj}^2\) for the SLR model relating mileage to engine displacement.

  4. Find a \(95\%\) confidence interval (CI) for \(\beta_1\). Interpret your results.

  5. With \(\alpha = 0.05\), perform the marginal test of \(H_0: \beta_6 = 0\). Interpret your results.

  6. Find a \(95\%\) CI on the mean gasoline mileage when \(x_1 = 275\) in\(^3\) and \(x_6 = 2\) barrels.

  7. Find a \(95\%\) prediction interval (PI) for a new observation on gasoline mileage when \(x_1 = 275\) in\(^3\) and \(x_6 = 2\) barrels.

  8. In Homework 2 you were asked to compute a \(95\%\) CI for the mean gasoline mileage and a PI for a car’s gasoline mileage when the engine displacement \(x_1 = 275\) in\(^3\). Compare the lengths of those intervals to the lengths of the CI and PI from Questions 6 and 7 above. Does adding \(x_6\) to the model help in terms of prediction or uncertainty reduction?

  9. Perform matrix operations to compute \(({\bf y - X b})'({\bf y - X b})\) and verify it is \(SS_{res} = \sum_{i=1}^n(y_i - \hat{y}_i)^2\).

  10. Generate the predictor effect plots for \(x_1\) and \(x_6\). Explain the plots by discussing the effect of each predictor on \(y\).

  11. Construct the \(95\%\) confidence region for the coefficients \((\beta_1, \beta_6)\). Interpret the region.
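The matrix computation in Question 9 can be sketched as follows. This is a minimal illustration in plain Python, assuming synthetic stand-in data (not mpg.csv) and a design matrix whose columns are the intercept, \(x_1\), and \(x_6\). The helper functions and all numbers are hypothetical; with \(\texttt{R}\) or a matrix library the same steps take only a few lines.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Synthetic stand-in data (NOT the values in mpg.csv).
x1 = [100.0, 150.0, 200.0, 250.0, 300.0, 350.0, 400.0, 450.0]
x6 = [2.0, 1.0, 2.0, 4.0, 2.0, 4.0, 1.0, 4.0]
y = [30.0, 28.5, 24.0, 21.5, 19.0, 16.5, 15.0, 12.0]

X = [[1.0, a, c] for a, c in zip(x1, x6)]  # n x 3 design matrix
y_col = [[v] for v in y]                   # n x 1 response vector

# Least-squares estimate b from the normal equations (X'X) b = X'y
Xt = transpose(X)
b = solve(matmul(Xt, X), matmul(Xt, y_col))

# Residual vector e = y - Xb, then SS_res in matrix form: e'e
y_hat = matmul(X, b)
e = [[yi[0] - yh[0]] for yi, yh in zip(y_col, y_hat)]
ss_res_matrix = matmul(transpose(e), e)[0][0]

# SS_res as the scalar sum of squared residuals
ss_res_sum = sum((yi - yh[0]) ** 2 for yi, yh in zip(y, y_hat))

print(ss_res_matrix, ss_res_sum)  # the two values should agree up to rounding
```

For the assignment itself, replace the synthetic vectors with the columns read from mpg.csv and report the values your own code produces.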