Homework 1 - Probability and Statistics Review

Due Friday, September 8, 11:59 PM on D2L

Homework Instructions and Requirements

  • Homework 1 covers the course material from Weeks 1 and 2.

  • Please submit your work as a single PDF file to D2L > Assessments > Dropbox. Multiple files, or files that are not in PDF format, are not accepted.

  • In your homework, please number and answer questions in order.

  • It is your responsibility to make clear what you are trying to show. If you type your answers, make sure there are no typos. I grade your work based on what you show, not what you meant to show. If you choose to handwrite your answers, write them neatly. If I can’t read your handwriting, your answer will be marked wrong.

  • Relevant code should be attached.

Programming and Computing

Please sharpen your coding skills using R or any language you prefer. There is no need to show your work for this part!

Probability and Statistics Review

  1. Suppose \(Y_1 \sim N(3, 8)\) and \(Y_2 \sim N(1, 4)\), where \(Y_1\) and \(Y_2\) are independent. What distribution does \(2Y_1 + 3Y_2\) follow?

  2. Plot normal density curves with different choices of mean and standard deviation.
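
As a starting point for the density-plot exercise, the following is a minimal sketch in base R. The (mean, sd) pairs, the plotting range, and the names `params` and `dens` are arbitrary choices made for illustration, not values prescribed by the assignment.

```r
# Grid of values wide enough to show every curve below.
x <- seq(-10, 10, length.out = 500)

# Illustrative (mean, sd) pairs; these values are arbitrary choices.
params <- data.frame(mean = c(0, 0, 2, -3),
                     sd   = c(1, 2, 1, 0.5))

# One column of density values per (mean, sd) pair.
dens <- sapply(seq_len(nrow(params)),
               function(i) dnorm(x, mean = params$mean[i], sd = params$sd[i]))

# matplot() draws each column of dens as its own line.
matplot(x, dens, type = "l", lty = 1, lwd = 2, col = seq_len(nrow(params)),
        xlab = "y", ylab = "density", main = "Normal density curves")
legend("topright",
       legend = sprintf("mean = %g, sd = %g", params$mean, params$sd),
       col = seq_len(nrow(params)), lty = 1, lwd = 2, bty = "n")
```

Changing only the mean shifts a curve left or right, while changing only the standard deviation makes it wider and flatter or narrower and taller.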

  3. Install the R package ISLR2. Choose a continuous variable in the ISLR2::Boston data set and treat its values as the population. Use the sample() function to draw a simple random sample of size 20 from this population, and calculate the sample average.

  4. Repeat the sampling in Question 3 several times and plot the sampling distribution of the sample mean.
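
The two sampling exercises above could be approached along the following lines. This is only a sketch under stated assumptions: `medv` is used as one possible continuous variable, the number of repetitions (`n_rep`) and the seed are arbitrary, and the fallback to `MASS::Boston` is there only so the sketch still runs when ISLR2 has not been installed yet.

```r
# Assumption: ISLR2 is installed, e.g. via install.packages("ISLR2").
# Fallback for this sketch only: MASS::Boston, which also has a medv column.
boston <- if (requireNamespace("ISLR2", quietly = TRUE)) ISLR2::Boston else MASS::Boston
population <- boston$medv   # one continuous variable; any other would do

set.seed(1)  # arbitrary seed, only for reproducibility

# One simple random sample of size 20 (without replacement) and its average.
one_sample <- sample(population, size = 20)
mean(one_sample)

# Repeat the draw many times; each repetition contributes one sample mean.
n_rep <- 1000  # arbitrary number of repetitions
sample_means <- replicate(n_rep, mean(sample(population, size = 20)))

hist(sample_means, breaks = 30, freq = FALSE,
     xlab = "sample mean (n = 20)",
     main = "Approximate sampling distribution of the sample mean")
abline(v = mean(population), col = "red", lwd = 2)  # population mean for reference
```

The histogram should be centered near the population mean, with a spread much smaller than that of the population itself.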

  5. Suppose \(Y_i \stackrel{iid}{\sim} N(\mu, \sigma^2)\), \(i = 1, 2, \dots, n\), with unknown \(\mu\) and \(\sigma\). The \(100(1-\alpha)\%\) confidence interval (CI) for the population mean \(\mu\) is \(\left( \overline{y}- t_{\alpha/2, n-1}\frac{s}{\sqrt{n}}, \overline{y} + t_{\alpha/2, n-1}\frac{s}{\sqrt{n}} \right)\), where \(\overline{y}\) and \(s\) are the sample mean and sample standard deviation, and \(t_{\alpha/2, n-1}\) is the upper \(\alpha/2\) quantile of the \(t\) distribution with \(n-1\) degrees of freedom. Use simulation with \(\alpha = 0.1\), \(\mu = 4\) and \(\sigma = 2\) to verify that such CIs contain \(\mu\) about \(100(1-\alpha)\%\) of the time. Fill in the percentages in the following table, and comment on your results.

| Number of simulations | \(n=5\) | \(n=30\) | \(n=200\) |
| --- | --- | --- | --- |
| \(20\) | | | |
| \(1000\) | | | |
| \(20000\) | | | |
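
One way to set up the coverage simulation is sketched below. The function name `coverage` and its arguments are hypothetical, introduced only for illustration; the sketch computes a single cell of the table as a proportion and leaves the remaining cells and the interpretation to you.

```r
# Fraction of simulated t-intervals that contain mu.
# n_sim = number of simulated data sets, n = sample size of each data set.
coverage <- function(n_sim, n, mu = 4, sigma = 2, alpha = 0.1) {
  covered <- replicate(n_sim, {
    y <- rnorm(n, mean = mu, sd = sigma)
    # t_{alpha/2, n-1} is the upper alpha/2 quantile, i.e. qt(1 - alpha/2, n - 1).
    half_width <- qt(1 - alpha / 2, df = n - 1) * sd(y) / sqrt(n)
    abs(mean(y) - mu) < half_width  # TRUE when mu lies inside the interval
  })
  mean(covered)
}

set.seed(1)  # arbitrary seed, only for reproducibility
coverage(n_sim = 1000, n = 30)  # proportion for one (simulations, n) combination
```

Calling the same function over the other combinations of simulation count and sample size fills the rest of the table; multiply by 100 to report percentages.
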

  6. If \(U_1\) and \(U_2\) are independent uniform random variables on the interval \([0, 1]\) (see https://en.wikipedia.org/wiki/Continuous_uniform_distribution), then \(X_1\) and \(X_2\) defined by \[X_1 = \sqrt{-2\ln(U_1)}\cos(2\pi U_2), \quad X_2 = \sqrt{-2\ln(U_1)}\sin(2\pi U_2)\] are independent \(N(0, 1)\) variables. Draw 10,000 samples of \(U_1\) and \(U_2\) using the runif() function, and use the transformation to generate samples of \(X_1\) and \(X_2\). Verify
    • the standard normality of \(X_1\) and \(X_2\) by plotting their histograms with a superimposed standard normal density.
    • the independence of \(X_1\) and \(X_2\) by drawing a scatterplot of \(X_1\) against \(X_2\) and computing their correlation coefficient.
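
The transformation above is commonly known as the Box-Muller transform. A minimal base R sketch of the simulation and the two checks follows; the seed, the number of histogram bins, and the plot layout are arbitrary choices rather than requirements of the assignment.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n  <- 10000
u1 <- runif(n)
u2 <- runif(n)

# Apply the transformation from the problem statement.
r  <- sqrt(-2 * log(u1))
x1 <- r * cos(2 * pi * u2)
x2 <- r * sin(2 * pi * u2)

op <- par(mfrow = c(1, 3))  # three panels side by side

# Histograms on the density scale with the N(0, 1) density superimposed.
hist(x1, breaks = 50, freq = FALSE, main = "X1", xlab = "x1")
curve(dnorm(x), add = TRUE, col = "red", lwd = 2)
hist(x2, breaks = 50, freq = FALSE, main = "X2", xlab = "x2")
curve(dnorm(x), add = TRUE, col = "red", lwd = 2)

# Scatterplot and sample correlation as a check on dependence.
plot(x1, x2, pch = ".", main = "X1 vs X2")
par(op)

cor(x1, x2)
```

A roughly circular point cloud with no visible pattern, together with a correlation close to zero, is what one would expect to see if the two variables are independent.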