Probability and Statistics 🎲

MATH 4780 / MSSC 5780 Regression Analysis

Dr. Cheng-Han Yu
Department of Mathematical and Statistical Sciences
Marquette University

Random Variables

Discrete Random Variables

  • A discrete variable \(Y\) takes values in a countable set, e.g. \(\mathcal{Y} = \{0, 1, 2\}\)

  • Probability (mass) function (pf or pmf) \[P(Y = y) = p(y), \,\, y \in \mathcal{Y}\]

    • \(0 \le p(y) \le 1\) for all \(y \in \mathcal{Y}\)

    • \(\sum_{y \in \mathcal{Y}}p(y) = 1\)

    • \(P(a < Y < b) = \sum_{y: a<y<b}p(y)\)

Give me an example of a discrete variable/distribution!

Binomial Probability Function

\(P(Y = y; m, \pi) = \frac{m!}{y!(m-y)!}\pi^y(1-\pi)^{m-y}, \quad y = 0, 1, 2, \dots, m\)
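As a quick sanity check, the pmf formula can be evaluated directly and compared with R's built-in dbinom(); the values \(m = 5\), \(\pi = 0.4\), and \(y = 3\) below are chosen purely for illustration.

m <- 5; y <- 3; p <- 0.4
## pmf from the formula: m! / (y! (m - y)!) * pi^y * (1 - pi)^(m - y)
factorial(m) / (factorial(y) * factorial(m - y)) * p^y * (1 - p)^(m - y)
## same value (0.2304) from the built-in binomial pmf
dbinom(x = y, size = m, prob = p)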

Continuous Random Variables

  • A continuous variable \(Y\) takes values in an uncountable set, e.g. \(\mathcal{Y} = [0, \infty)\)

  • Probability density function (pdf) \[f(y), \,\, y \in \mathcal{Y}\]

    • \(f(y) \ge 0\) for all \(y \in \mathcal{Y}\)

    • \(\int_{\mathcal{Y}}f(y) \, dy= 1\)

    • \(P(a < Y < b) = \int_{a}^bf(y)\,dy\)

Give me an example of a continuous variable/distribution!

Normal (Gaussian) Density Curve

For continuous variables, \(P(a < Y < b)\) is the area under the density curve between \(a\) and \(b\).
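As a minimal numerical check of this statement, integrate() can compute the area under the \(N(0, 1)\) density directly; the endpoints \(0.5\) and \(1\) are illustrative.

## area under the standard normal curve between a = 0.5 and b = 1
integrate(dnorm, lower = 0.5, upper = 1)  ## about 0.15
## the same probability from the normal cdf
pnorm(1) - pnorm(0.5)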

Expected Value and Variance

For a random variable \(Y\),

  • The expected value or mean: \(E(Y)\) or \(\mu\).

  • The variance: \(\mathrm{Var}(Y)\) or \(\sigma^2\).

  • The mean measures the center of the distribution, or the balancing point of a seesaw.

  • The variance measures the mean squared distance from the mean, or dispersion of a distribution.

Discrete \(Y\):

\[E(Y) := \sum_{y \in \mathcal{Y}}yP(Y = y)\] \[\begin{align} \mathrm{Var}(Y) &:= E\left[(Y - E(Y))^2 \right] \\&= \sum_{y \in \mathcal{Y}}(y - \mu)^2P(Y = y)\end{align}\]

Continuous \(Y\):

\[E(Y) := \int_{-\infty}^{\infty}yf(y)\, dy\] \[\begin{align} \mathrm{Var}(Y) &:= E\left[(Y - E(Y))^2 \right] \\&= \int_{-\infty}^{\infty}(y - \mu)^2f(y)\, dy \end{align}\]

These are the population mean and variance, NOT the sample mean \(\overline{y}\) or sample variance \(s^2\).
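For a concrete continuous example, both integrals can be evaluated numerically. The sketch below uses \(Y \sim \text{Exp}(2)\) (an illustrative choice), for which theory gives \(E(Y) = 1/2\) and \(\mathrm{Var}(Y) = 1/4\).

## E(Y) for Y ~ Exp(rate = 2): integral of y * f(y) over [0, Inf)
mu <- integrate(function(y) y * dexp(y, rate = 2), lower = 0, upper = Inf)$value
mu  ## 0.5
## Var(Y): integral of (y - mu)^2 * f(y) over [0, Inf)
integrate(function(y) (y - mu)^2 * dexp(y, rate = 2), lower = 0, upper = Inf)$value  ## 0.25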

R Lab dpqr Functions

For some distribution (dist),

  • ddist(x, ...): density value \(f(x)\) or probability value \(P(X = x)\).
  • pdist(q, ...): cdf \(F(q) = P(X \le q)\).
  • qdist(p, ...): quantile function; the value \(q\) such that \(F(q) = p\).
  • rdist(n, ...): generate \(n\) random numbers.
## 10 random draws from Binomial(m = 5, pi = 0.4)
rbinom(n = 10, size = 5, prob = 0.4)
 [1] 2 2 2 2 2 2 0 1 3 3
## P(X = 3) of binom(5, 0.4)
dbinom(x = 3, size = 5, prob = 0.4)
[1] 0.23
## P(X <= 2) of binom(5, 0.4)
pbinom(q = 2, size = 5, prob = 0.4)
[1] 0.683

R Lab dpqr Functions

## the default mean = 0 and sd = 1 (standard normal)
rnorm(5)
[1] -0.151  0.259 -0.649  0.846 -0.660
[Figure: \(100\) random draws from \(N(0, 1)\)]

R Lab dpqr Functions

# P(0.5 < Z < 1) where Z ~ N(0, 1)
pnorm(1) - pnorm(0.5)
[1] 0.15

R Lab dpqr Functions

m <- 5
p <- 0.4
## mean
(mu <- sum(0:5 * dbinom(0:5, size = m, prob = p)))
[1] 2
m * p
[1] 2
## var
sum((0:5 - mu) ^ 2 * dbinom(0:5, size = m, prob = p))
[1] 1.2
m * p * (1 - p)
[1] 1.2

https://statisticsglobe.com/probability-distributions-in-r

Distributions

Sum of Normals is Normal

  • If \(Y \sim N(\mu, \sigma^2)\), \(Z = \frac{Y - \mu}{\sigma} \sim N(0, 1)\).
  • If \(X \sim N(\mu_X, \sigma_X^2)\) and \(Y \sim N(\mu_Y, \sigma_Y^2)\) are independent, then for \(a, b \in \mathbf{R}\), \[aX + bY \sim N\left(a\mu_X+b\mu_Y, \color{red}{a^2} \color{black} \sigma_X^2 + \color{red}{b^2} \color{black} \sigma_Y^2\right)\]

What is the distribution of \(a_1Y_1 + a_2Y_2 + \cdots + a_nY_n\) if \(Y_i \sim N(\mu_i, \sigma^2_i)\) and the \(Y_i\)s are independent?
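By repeated use of the two-variable rule, \(\sum_{i=1}^n a_iY_i \sim N\left(\sum_{i=1}^n a_i\mu_i, \sum_{i=1}^n a_i^2\sigma_i^2\right)\). A minimal Monte Carlo sketch of the two-variable case (all constants and parameters below are illustrative):

set.seed(4780)  ## arbitrary seed for reproducibility
x <- rnorm(100000, mean = 1, sd = 2)  ## X ~ N(1, 4)
y <- rnorm(100000, mean = 3, sd = 3)  ## Y ~ N(3, 9)
w <- 2 * x - y                        ## a = 2, b = -1
mean(w)  ## theory: 2 * 1 + (-1) * 3 = -1
var(w)   ## theory: 2^2 * 4 + (-1)^2 * 9 = 25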

Statistics Comes In

Suppose each data point \(Y_i\) of the sample \((Y_1, Y_2, \dots, Y_n)\) is a random variable drawn from the same population whose distribution is \(N(\mu, \sigma^2)\), and the \(Y_i\)s are independent of each other: \[Y_i \stackrel{iid}{\sim} N(\mu, \sigma^2), \quad i = 1, 2, \dots, n\]

Statistics Comes In: Sampling Distribution

If \(Y_i \stackrel{iid}{\sim} N(\mu, \sigma^2), \quad i = 1, 2, \dots, n\),

  • \(\overline{Y} \sim N\left(\mu,\frac{\sigma^2}{n} \right)\)

  • \(Z = \frac{\overline{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)\)

  • Let the sample variance of \(Y\) be \(S^2 = \frac{\sum_{i=1}^n(Y_i - \overline{Y})^2}{n-1}\).

  • \(\frac{\overline{Y} - \mu}{S/\sqrt{n}} \sim t_{n-1}\)

  • Inference: \(\mu\) and \(\sigma^2\) are unknown, and \(\overline{y}\) and \(s^2\) are point estimates for \(\mu\) and \(\sigma^2\), respectively.
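A small simulation makes these sampling distributions concrete. The sketch below (with illustrative \(n = 25\), \(\mu = 10\), \(\sigma = 2\)) checks the mean and variance of \(\overline{Y}\) and the \(t_{n-1}\) behavior of the studentized ratio.

set.seed(4780)
n <- 25; mu <- 10; sigma <- 2
ybar <- replicate(5000, mean(rnorm(n, mean = mu, sd = sigma)))
mean(ybar)  ## close to mu = 10
var(ybar)   ## close to sigma^2 / n = 0.16
## replacing sigma with S gives a t_{n-1} ratio
tstat <- replicate(5000, {
  y <- rnorm(n, mean = mu, sd = sigma)
  (mean(y) - mu) / (sd(y) / sqrt(n))
})
quantile(tstat, 0.975)  ## close to qt(0.975, df = n - 1) = 2.064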

Why Use Normal? Central Limit Theorem (CLT)

  • \(X_1, X_2, \dots, X_n\) are i.i.d. random variables with mean \(\mu\) and variance \(\sigma^2 < \infty\).

  • As \(n\) increases, the sampling distribution of \(\overline{X}_n = \frac{\sum_{i=1}^nX_i}{n}\) looks more and more like \(N(\mu, \frac{\sigma^2}{n})\), regardless of the distribution from which we are sampling \(X_i\)!

Nature Methods 10, 809–810 (2013)
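A minimal CLT simulation, sampling from the right-skewed \(\text{Exp}(1)\) distribution (an illustrative choice with \(\mu = 1\) and \(\sigma^2 = 1\)):

set.seed(4780)
## means of n = 30 draws from a skewed population
xbar <- replicate(5000, mean(rexp(30, rate = 1)))
mean(xbar)  ## close to mu = 1
var(xbar)   ## close to sigma^2 / n = 1/30
hist(xbar)  ## roughly bell-shaped despite the skewed population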

\((1-\alpha)100\%\) Confidence Interval for \(\mu\)

  • \(T = \frac{\overline{Y} - \mu}{S/\sqrt{n}} \sim t_{n-1}\)

\[\small \begin{align} & \quad \quad P(-t_{\alpha/2, n-1} < T < t_{\alpha/2, n-1}) = 1 - \alpha \\ & \iff P(-t_{\alpha/2, n-1} < \frac{\overline{Y} - \mu}{S/\sqrt{n}} < t_{\alpha/2, n-1}) = 1 - \alpha \\ & \iff P(\mu-t_{\alpha/2, n-1}S/\sqrt{n} < \overline{Y} < \mu + t_{\alpha/2, n-1}S/\sqrt{n}) = 1 - \alpha \end{align}\]

\((1-\alpha)100\%\) Confidence Interval for \(\mu\): Probability

\[P\left(\mu-t_{\alpha/2, n-1}\frac{S}{\sqrt{n}} < \overline{Y} < \mu + t_{\alpha/2, n-1}\frac{S}{\sqrt{n}} \right) = 1-\alpha\]

Is the interval \(\left(\mu-t_{\alpha/2, n-1}\frac{S}{\sqrt{n}}, \mu + t_{\alpha/2, n-1}\frac{S}{\sqrt{n}} \right)\) our confidence interval?

No! We don’t know \(\mu\), the quantity we’d like to estimate! But we’re almost there!

\((1-\alpha)100\%\) Confidence Interval for \(\mu\): Formula

\[\begin{align} &P\left(\mu-t_{\alpha/2, n-1}\frac{S}{\sqrt{n}} < \overline{Y} < \mu + t_{\alpha/2, n-1}\frac{S}{\sqrt{n}} \right) = 1-\alpha\\ &P\left( \boxed{\overline{Y}- t_{\alpha/2, n-1}\frac{S}{\sqrt{n}} < \mu < \overline{Y} + t_{\alpha/2, n-1}\frac{S}{\sqrt{n}}} \right) = 1-\alpha \end{align}\]

  • With sample data of size \(n\), \(\left( \overline{y}- t_{\alpha/2, n-1}\frac{s}{\sqrt{n}}, \overline{y} + t_{\alpha/2, n-1}\frac{s}{\sqrt{n}} \right)\) is our \((1-\alpha)100\%\) CI for \(\mu\).
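A sketch of the CI computed by hand with qt() and checked against t.test(); the data are simulated purely for illustration.

set.seed(4780)
y <- rnorm(30, mean = 10, sd = 2)  ## pretend these are the observed data
n <- length(y); alpha <- 0.05
## 95% CI: ybar -+ t_{alpha/2, n-1} * s / sqrt(n)
mean(y) + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * sd(y) / sqrt(n)
## t.test() reports the same interval
t.test(y, conf.level = 1 - alpha)$conf.int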

Hypothesis Testing

  • \(H_0: \mu = \mu_0 \text{ vs. } H_1: \mu > \mu_0\), or \(\mu < \mu_0\), or \(\mu \ne \mu_0\)
  • The significance level \(\alpha = P(\text{Reject } H_0 \mid H_0 \text{ is true}) = P(\text{Type I error})\)
  • The test statistic is \(t_{test} = \frac{\overline{y} - \color{blue}{\mu_0}}{s/\sqrt{n}}\), a value from \(T \sim t_{n-1}\).
  • When calculating a test statistic, we assume \(H_0\) is true.

Reject \(H_0\) if

Method           Right-tailed \((H_1: \mu > \mu_0)\)           Left-tailed \((H_1: \mu < \mu_0)\)           Two-tailed \((H_1: \mu \ne \mu_0)\)
Critical value   \(t_{test} > t_{\alpha, n-1}\)                \(t_{test} < -t_{\alpha, n-1}\)              \(\mid t_{test}\mid \, > t_{\alpha/2, n-1}\)
\(p\)-value      \(\small P(T > t_{test} \mid H_0) < \alpha\)  \(\small P(T < t_{test} \mid H_0) < \alpha\) \(\small 2P(T > \mid t_{test}\mid \, \mid H_0) < \alpha\)

Both Methods Lead to the Same Conclusion
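The two methods are equivalent because \(\mid t_{test} \mid > t_{\alpha/2, n-1}\) holds exactly when the \(p\)-value is below \(\alpha\). A sketch on simulated data (illustrative \(\mu_0 = 9\), \(\alpha = 0.05\), two-tailed):

set.seed(4780)
y <- rnorm(30, mean = 10, sd = 2)  ## illustrative sample
n <- length(y); mu0 <- 9; alpha <- 0.05
t_test <- (mean(y) - mu0) / (sd(y) / sqrt(n))  ## observed test statistic
## critical value method
abs(t_test) > qt(1 - alpha / 2, df = n - 1)
## p-value method: this logical always agrees with the one above
2 * pt(abs(t_test), df = n - 1, lower.tail = FALSE) < alpha
## t.test() reports the same p-value
t.test(y, mu = mu0)$p.value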