Homework 1 - Probaility and Statistics Review
Due Friday, September 8, 11:59 PM on D2L
Homework Instruction and Requirement
Homework 1 covers course materials of Week 1 to 2.
Please submit your work in one PDF file to D2L > Assessments > Dropbox. Multiple files or a file that is not in pdf format are not allowed.
In your homework, please number and answer questions in order.
It is your responsibility to let me understand what you try to show. If you type your answers, make sure there are no typos. I grade your work based on what you show, not what you want to show. If you choose to handwrite your answers, write them neatly. If I can’t read your sloppy handwriting, your answer is judged as wrong.
Relevant code should be attached.
Programming and Computing
Please sharpen your coding skill using R or any language you prefer. No need to show your work on this part!
Probability and Statistics Review
- \(Y_1 \sim N(3, 8)\) and \(Y_2 \sim N(1, 4)\), and \(Y_1\) and \(Y_2\) are independent. What is the distribution the variable \(2Y_1 + 3Y_2\) follows?
Plot normal density curves with different choices of mean and standard deviation.
Install the R package
ISLR2
. Choose a continuous variable in theISLR2::Boston
data set. Use thesample()
function to draw a simple random sample of size 20 from this population. Calculate the sample average.Repeat the sampling in 3. several times to plot a sampling distribution of the sample mean.
Suppose \(Y_i \stackrel{iid}{\sim} N(\mu, \sigma^2)\), \(i = 1, 2, \dots, n\), with unknown \(\mu\) and \(\sigma\). The \(100(1-\alpha)\%\) confidence interval (CI) for the population mean \(\mu\) is \(\left( \overline{y}- t_{\alpha/2, n-1}\frac{s}{\sqrt{n}}, \overline{y} + t_{\alpha/2, n-1}\frac{s}{\sqrt{n}} \right)\). Use simulation with \(\alpha = 0.1\), \(\mu = 4\) and \(\sigma = 2\) to verify that such CIs contain \(\mu\) about \(100(1-\alpha)\%\) of times. Fill the percentage in the following table, and comment on your results.
Simulation times | \(n=5\) | \(n=30\) | \(n=200\) |
---|---|---|---|
\(20\) | |||
\(1000\) | |||
\(20000\) |
- If \(U_1\) and \(U_2\) are independent and both are uniform random variables over \([0, 1]\) interval https://en.wikipedia.org/wiki/Continuous_uniform_distribution, then \(X_1\) and \(X_2\) defined by \[X_1 = \sqrt{-2\ln(U_1)}\cos(2\pi U_2), \quad X_2 = \sqrt{-2\ln(U_1)}\sin(2\pi U_2)\] are independent \(N(0, 1)\) variables. Draw 10,000 samples for \(U_1\) and \(U_2\) using the
runif()
function, and use the transformation to generate the samples of \(X_1\) and \(X_2\). Verify- the standard normality of \(X_1\) and \(X_2\) by plotting their histogram with a superimposed standard normal density.
- the independence of \(X_1\) and \(X_2\) by plotting the scatterplot of \(X_1\) and \(X_2\) and computing their correlation coefficient.