MATH 4780 / MSSC 5780 Regression Analysis
\(Y_i= \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \dots + \beta_kX_{ik} + \epsilon_i\)
We care about the partial relationship between \(y\) and each \(x_j\), with the impact of the other \(x\)s controlled.
Residual-based plots are more effective for detecting departures from linearity.
Partial residual plots (component-plus-residual plots) are used to diagnose nonlinearity.
The partial residuals for \(x_j\): \[e_i^{(j)} = b_jx_{ij} + e_i,\] where \(e_i\) is the ordinary least-squares residual and \(b_j\) is the estimated coefficient of \(x_j\).
The partial residual plot for \(x_j\) plots \(e_i^{(j)}\) against \(x_{ij}\).
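As a minimal sketch (the data and variable names here are hypothetical, not from the CIA example), partial residuals can be computed by hand or drawn directly with `car::crPlots()`:

```r
library(car)

# Hypothetical data: y depends nonlinearly on x1 and linearly on x2
set.seed(4780)
x1 <- runif(200, 1, 10)
x2 <- rnorm(200)
y  <- 2 + log(x1) + 0.5 * x2 + rnorm(200, sd = 0.3)
fit <- lm(y ~ x1 + x2)

# Partial residuals for x1: b1 * x1 + ordinary residuals
e_x1 <- coef(fit)["x1"] * x1 + residuals(fit)
plot(x1, e_x1, xlab = "x1", ylab = "Partial residual for x1")

# Equivalently, component-plus-residual plots for every predictor
crPlots(fit)
```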
| The bulge points | Transform | Ladder of powers/roots |
|---|---|---|
| left | \(x\) | down, e.g., \(\log(x)\) |
| right | \(x\) | up |
| down | \(y\) | down |
| up | \(y\) | up |
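A quick simulated illustration of the rule (a sketch; the data are made up): a scatter that bulges up and to the left straightens if we move \(x\) down the ladder, e.g., \(\log(x)\).

```r
# Made-up data whose trend bulges up and to the left
set.seed(4780)
x <- runif(100, min = 1, max = 100)
y <- log(x) + rnorm(100, sd = 0.3)

par(mfrow = c(1, 2))
plot(x, y, main = "Bulges up/left")     # visible curvature
plot(log(x), y, main = "x moved down")  # approximately linear
par(mfrow = c(1, 1))
```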
The partial residual plots suggest transforming

- `gdp` to \(\log(gdp)\)
- `health` to \(health + health^2\)
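For reference, here is how the two models summarized below could be fit (a sketch; the fitting calls are not shown on the slides, and a `CIA` data frame with variables `infant`, `gdp`, `health`, and `gini` is assumed):

```r
# Original fit: log infant mortality on the untransformed predictors
logciafit <- lm(log(infant) ~ gdp + health + gini, data = CIA)

# Refit with log(gdp) and a raw quadratic in health
logciafit2 <- lm(log(infant) ~ log(gdp) +
                   poly(health, degree = 2, raw = TRUE) + gini,
                 data = CIA)
```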
```r
car::brief(logciafit, digits = 2)
```

```
           (Intercept)     gdp  health   gini
Estimate          3.02 -0.0439  -0.055 0.0216
Std. Error        0.29  0.0037   0.022 0.0061

 Residual SD = 0.59 on 130 df, R-squared = 0.71
```
```r
car::brief(logciafit2, digits = 2)
```

```
           (Intercept) log(gdp) poly(health, degree = 2, raw = TRUE)1
Estimate          4.65   -0.720                                -0.221
Std. Error        0.32    0.038                                 0.058
           poly(health, degree = 2, raw = TRUE)2   gini
Estimate                                  0.0096 0.0191
Std. Error                                0.0034 0.0044

 Residual SD = 0.44 on 129 df, R-squared = 0.84
```

The transformations reduce the residual SD from 0.59 to 0.44 and raise \(R^2\) from 0.71 to 0.84.
```r
library(effects)
# Partial residual plot for gdp from the refitted model; the magenta
# curve is a loess smooth (span = 3/4) through the partial residuals
par(mar = c(2, 2, 0, 0))
plot(Effect("gdp", logciafit2, residuals = TRUE),
     lines = list(col = c("blue", "black"), lty = 2),
     axes = list(grid = TRUE), confint = FALSE,
     partial.residuals = list(plot = TRUE, smooth.col = "magenta",
                              lty = 1, span = 3/4),
     xlab = "GDP per Capita", ylab = "Partial Residual",
     main = "", cex.lab = 2)
```
```r
# Same diagnostic for health, now modeled with a quadratic term
par(mar = c(2, 2, 0, 0))
plot(Effect("health", logciafit2, residuals = TRUE),
     lines = list(col = c("blue", "black"), lty = 2),
     axes = list(grid = TRUE), confint = FALSE,
     partial.residuals = list(plot = TRUE, smooth.col = "magenta",
                              lty = 1, span = 3/4),
     xlab = "Health Expenditures", ylab = "Partial Residual",
     main = "", cex.lab = 2)
```
Box and Tidwell (1962) proposed a procedure for estimating \(\lambda_1, \lambda_2, \dots, \lambda_k\) in the model \[y = \beta_0 + \beta_1x_1^{\lambda_1} + \cdots + \beta_kx_k^{\lambda_k}+ \epsilon\]
All \(x_j\)s are positive.
\(\beta_0, \beta_1, \dots, \beta_k\) are estimated after and conditional on the transformations.
\(x_j^{\lambda_j} = \log_e(x_j)\) if \(\lambda_j = 0\).
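Where the estimates and score tests below come from (a sketch of the standard constructed-variable derivation, not shown on the slides): linearizing \(x_j^{\lambda_j}\) around \(\lambda_j = 1\) gives \[x_j^{\lambda_j} \approx x_j + (\lambda_j - 1)\, x_j \log_e x_j,\] so adding the constructed variable \(x_j \log_e x_j\) to the untransformed model yields a coefficient \(\gamma_j \approx \beta_j(\lambda_j - 1)\). The \(t\)-statistic for \(\gamma_j\) is the score test of \(H_0: \lambda_j = 1\), and \(\tilde{\lambda}_j = 1 + \hat{\gamma}_j / \hat{\beta}_j\) is a one-step estimate, iterated to convergence.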
Consider the model \[\log(Infant) = \beta_0 + \beta_1 GDP^{\lambda_1} + \beta_2Gini^{\lambda_2} + \beta_3 Health + \beta_4 Health ^ 2 + \epsilon\]
```r
car::boxTidwell(log(infant) ~ gdp + gini,
                other.x = ~ poly(health, 2, raw = TRUE), data = CIA)
```
```
...
     MLE of lambda Score Statistic (t) Pr(>|t|)
gdp            0.2                10.6   <2e-16 ***
gini          -0.5                -0.4      0.7
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
...
```
The point estimates are \(\hat{\lambda}_1 = 0.2\) for gdp and \(\hat{\lambda}_2 = -0.5\) for gini.
The score test is of \(H_0:\) no transformation is needed \((\lambda_j = 1)\). It rejects for gdp (\(p < 2 \times 10^{-16}\)) but not for gini (\(p = 0.7\)); since \(\hat{\lambda}_1 = 0.2\) is close to 0, the \(\log(gdp)\) transformation used earlier is reasonable.
Can the nonlinear model \(y = \beta_0e^{\beta_1x}\epsilon\) be transformed into a linear one (intrinsically linear)?
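Yes: taking natural logs of both sides (assuming \(y\), \(\beta_0\), and \(\epsilon\) are positive) gives \[\log y = \log\beta_0 + \beta_1 x + \log\epsilon,\] which is linear in \(x\) with intercept \(\log\beta_0\) and error term \(\log\epsilon\).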