MATH 4780 / MSSC 5780 Regression Analysis
\(Y_i= \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \dots + \beta_kX_{ik} + \epsilon_i\)
Without normality,
If the model assumptions are correct, the R-student residuals satisfy \(t_i \sim t_{n-p-1}\), where \(p = k + 1\) is the number of regression coefficients.
Compare the distribution of the \(t_i\)s to \(t_{n-p-1}\) in a QQ plot.
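A minimal sketch of this comparison in base R. It uses simulated data rather than any data set from these notes; the names x1, x2, fit, t_i and the seed are illustrative assumptions.

```r
# Simulated data (an assumption for illustration only)
set.seed(4780)
n  <- 50
x1 <- runif(n)
x2 <- runif(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = 0.5)

fit <- lm(y ~ x1 + x2)
p   <- length(coef(fit))      # p = k + 1 regression coefficients
t_i <- rstudent(fit)          # R-student residuals
df_t <- n - p - 1             # reference t distribution degrees of freedom

# Theoretical t quantiles against the ordered R-student residuals
theo <- qt(ppoints(n), df = df_t)
plot(theo, sort(t_i),
     xlab = "t quantiles", ylab = "Ordered R-student residuals")
abline(0, 1, lty = 2)         # points near this line support the assumptions
```

If the car package is installed, car::qqPlot(fit) draws a similar plot for an lm fit and adds a confidence envelope.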
A response that is close to normal usually makes the assumption of normal errors more tenable.
Heavier tailed errors:
Skewed errors:
Multimodal errors:
After correction, we want the errors, or equivalently the response conditional on the \(x\)s, to be approximately normal.
Power transformation: \(y \rightarrow y^{\lambda}\)
Ladder of powers and roots (Tukey, 1977):
A power transformation with \(\lambda < 0\) reverses the order of the \(y\) values.
A modified power transformation by Box and Cox (1964):
\[y^{(\lambda)} = \begin{cases} \frac{y^{\lambda}-1}{\lambda}, & \quad \lambda \ne 0\\ \ln y, & \quad \lambda = 0 \end{cases}\]
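The effect of the modification can be checked numerically. The sketch below is a hand-written version of the formula above (the helper name bc and the test vector are assumptions for illustration; car::bcPower, used later in these notes, computes the same transformation).

```r
# Box-Cox transformation of a positive vector, written from the formula above
bc <- function(y, lambda, tol = 1e-8) {
  stopifnot(all(y > 0))
  if (abs(lambda) < tol) log(y) else (y^lambda - 1) / lambda
}

y <- c(0.5, 1, 2, 4)              # illustrative increasing values

plain_inverse <- y^(-1)           # plain power with lambda = -1: order reversed
bc_inverse    <- bc(y, -1)        # equals 1 - 1/y: order preserved

near_zero <- bc(y, 1e-6)          # close to log(y): continuous at lambda = 0
```

Dividing by \(\lambda\) keeps the transformed values increasing in \(y\) for negative powers, and subtracting 1 makes the family approach \(\ln y\) as \(\lambda \to 0\).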
            gdp infant gini health  region
Albania    11.1   12.8   34    6.0  Europe
Algeria    14.3   21.0   35    5.2  Africa
Argentina  22.1    9.7   46    8.5 America
Armenia     7.4   13.5   31    4.5  Europe
Australia  46.6    4.4   30    9.1 Oceania
Austria    45.4    3.5   26   11.5  Europe
Azerbaijan 17.9   25.7   34    5.4    Asia
Bangladesh  3.4   44.1   32    3.6    Asia
Belarus    18.2    3.6   27    5.0  Europe
Belgium    41.7    3.4   26   10.8  Europe
Benin       1.9   55.7   36    4.5  Africa
Bhutan      7.7   35.9   39    3.8    Asia
gdp: GDP per capita in thousands of U.S. dollars
infant: Infant mortality rate per 1000 live births
gini: Gini coefficient for the distribution of family income
health: Health expenditures as a percentage of GDP
We study how gdp, health and gini affect infant.
# r_stud holds the R-student residuals of the untransformed fit (the column labelled 1 below)
mat <- matrix(r_stud)
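# Refit with each Box-Cox power of infant and prepend its R-student residuals,
# so the columns end up ordered -1, -0.5, 0 (log), 0.5, 1
for (lam in c(0.5, 0, -0.5, -1)) {
  refit <- update(
    ciafit, car::bcPower(infant, lam) ~ .
  )
  mat <- cbind(rstudent(refit), mat)
}
colnames(mat) <- c(-1, -0.5, "log", 0.5, 1)
# base boxplot() has no id argument (that belongs to car::Boxplot), so it is omitted
boxplot(
  mat,
  xlab = expression("Powers," ~ lambda),
  ylab = expression(
    "R-Student Residuals for "
    ~ Infant ^ (lambda))
)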
bcPower Transformation to Normality
   Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
Y1     -0.22       -0.33        -0.37       -0.072

Likelihood ratio test that transformation parameter is equal to 0
 (log transformation)
                      LRT df  pval
LR test, lambda = (0)   8  1 0.005

Likelihood ratio test that no transformation is needed
                      LRT df   pval
LR test, lambda = (1) 181  1 <2e-16
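The estimate and likelihood ratio tests in this output can be understood through the Box-Cox profile log-likelihood. The sketch below works it out by hand in base R, using only the 12 rows printed earlier in these notes, so its numbers will not match the output above, which comes from the full data. The names cia12, bc, profile_loglik and lrt, and the search interval (-2, 2), are assumptions for illustration.

```r
# The 12 rows shown in these notes (not the full data set)
cia12 <- data.frame(
  gdp    = c(11.1, 14.3, 22.1, 7.4, 46.6, 45.4, 17.9, 3.4, 18.2, 41.7, 1.9, 7.7),
  infant = c(12.8, 21.0, 9.7, 13.5, 4.4, 3.5, 25.7, 44.1, 3.6, 3.4, 55.7, 35.9),
  gini   = c(34, 35, 46, 31, 30, 26, 34, 32, 27, 26, 36, 39),
  health = c(6.0, 5.2, 8.5, 4.5, 9.1, 11.5, 5.4, 3.6, 5.0, 10.8, 4.5, 3.8)
)

# Box-Cox transformation of a positive vector
bc <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda, up to an additive constant:
# -n/2 * log(RSS(lambda) / n) + (lambda - 1) * sum(log(y))
profile_loglik <- function(lambda, data) {
  z   <- bc(data$infant, lambda)
  fit <- lm(z ~ gdp + health + gini, data = data)
  n   <- nrow(data)
  rss <- sum(resid(fit)^2)
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(data$infant))
}

# Maximize over the assumed interval to estimate lambda
opt <- optimize(profile_loglik, interval = c(-2, 2),
                data = cia12, maximum = TRUE)
lambda_hat <- opt$maximum

# Likelihood ratio statistic for H0: lambda = lambda0, compared to chi-square(1)
lrt <- function(lambda0) 2 * (opt$objective - profile_loglik(lambda0, cia12))
lrt_log  <- lrt(0)   # log transformation
lrt_none <- lrt(1)   # no transformation
p_log  <- pchisq(lrt_log,  df = 1, lower.tail = FALSE)
p_none <- pchisq(lrt_none, df = 1, lower.tail = FALSE)
```

The term \((\lambda - 1)\sum \ln y_i\) is the Jacobian of the transformation; without it, residual sums of squares from different powers would not be comparable.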
(Intercept)         gdp      health        gini
      3.016      -0.044      -0.055       0.022
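With the other predictors held fixed, for every one-unit increase in gdp (an extra 1000 U.S. dollars of GDP per capita), the infant mortality rate is expected to decrease, on average, by 4.3%, because \(\exp(-0.044) = 0.957\).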
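The same arithmetic applies to each slope when the response is on the log scale: a one-unit increase in a predictor multiplies the expected response by \(\exp(b)\). A short sketch using the rounded coefficients printed above (the vector name b is an assumption for illustration):

```r
# Rounded slope estimates copied from the coefficient output above
b <- c(gdp = -0.044, health = -0.055, gini = 0.022)

mult <- exp(b)              # multiplicative change in infant per unit increase
pct  <- 100 * (mult - 1)    # the same change expressed in percent

round(mult, 3)
#    gdp health   gini
#  0.957  0.946  1.022
round(pct, 1)
#    gdp health   gini
#   -4.3   -5.4    2.2
```

Because the coefficients are rounded to three decimals, these percentages are approximate.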