MATH 4780 / MSSC 5780 Regression Analysis
Parametric (Linear regression)
Nonparametric (Kernel smoother)
\(w_{ij}\) is larger when \(x_i\) and \(x_j\) are closer. \(y_i\) is affected more by its neighbors.
Can you give me a kernel function?
Let \(\tilde{y}_i\) be the kernel smoother of the \(i\)th response. Then \[\small \tilde{y}_i = \sum_{j=1}^n w_{ij}y_j\] where \(\sum_{j=1}^nw_{ij} = 1\).
The Nadaraya–Watson kernel regression uses the weights given by \[\small w_{ij} = \frac{K \left( \frac{x_i - x_j}{b}\right)}{\sum_{k=1}^nK \left( \frac{x_i - x_k}{b}\right)}\]
The bandwidth \(b\) defines the "neighbors" of \(x_i\) and controls the smoothness of the estimated \(f\): a larger \(b\) averages over more neighbors and gives a smoother estimate.
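The weights above can be computed directly. The following is a minimal sketch, assuming a Gaussian kernel (the slides do not fix a particular \(K\)) and simulated data; the names `gauss_kernel` and `nw_smooth` are hypothetical helpers, not functions from any package.

```r
# Gaussian kernel (an assumption; any symmetric density could be used instead)
gauss_kernel <- function(t) dnorm(t)

# Nadaraya-Watson smoother evaluated at the observed x values
nw_smooth <- function(x, y, b, K = gauss_kernel) {
  # n x n matrix of kernel values K((x_i - x_j) / b)
  Kmat <- K(outer(x, x, "-") / b)
  # divide each row by its sum so that the weights w_ij sum to 1 over j
  W <- Kmat / rowSums(Kmat)
  list(weights = W, fitted = as.vector(W %*% y))
}

set.seed(1)                          # simulated data, for illustration only
x <- sort(runif(50, 0, 2 * pi))
y <- sin(x) + rnorm(50, sd = 0.3)
fit <- nw_smooth(x, y, b = 0.5)
```

Because every row of `fit$weights` sums to 1, each fitted value is a weighted average of the responses, so a constant response is reproduced exactly.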
In locally weighted linear regression, we use a kernel as a weighting function to define neighborhoods and weights, and then perform weighted least squares.
We find the estimates of \(\beta_0\) and \(\beta_1\) at \(x_0\) by minimizing \[\sum_{i=1}^nK_b(x_0, x_i)(y_i - \beta_0 - \beta_1x_i)^2\] The fitted value at \(x_0\) is then \(\hat{\beta}_0 + \hat{\beta}_1x_0\), and the minimization is repeated for every target point \(x_0\).
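The weighted least squares criterion above maps onto `lm()` with a `weights` argument. This is a minimal sketch, assuming a Gaussian kernel for \(K_b\) and simulated data; `local_linear` is a hypothetical helper name.

```r
# Local linear estimate of f(x0) by kernel-weighted least squares
local_linear <- function(x0, x, y, b) {
  w <- dnorm((x - x0) / b)          # K_b(x0, x_i); Gaussian kernel is an assumption
  fit <- lm(y ~ x, weights = w)     # minimizes sum_i w_i * (y_i - b0 - b1 * x_i)^2
  beta <- coef(fit)
  unname(beta[1] + beta[2] * x0)    # fitted value at x0
}

set.seed(1)                          # simulated data, for illustration only
x <- sort(runif(100, 0, 2 * pi))
y <- sin(x) + rnorm(100, sd = 0.3)

# Repeat the local fit at each target point on a grid
grid <- seq(0.5, 5.5, by = 0.5)
f_hat <- sapply(grid, local_linear, x = x, y = y, b = 0.5)
```

A separate regression is fitted for each target point, which is why local regression costs more than a single global fit. If the data lie exactly on a line, every local fit recovers that line.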
Use the KernSmooth or locfit package.
degree = 1: local linear
degree = 2: local quadratic
degree = 0: kernel smoother
library(locfit)
# Template call: y, x, and the data frame `dat` are placeholders.
fit <- locfit(y ~ lp(x, nn = 0.2, h = 0.5, deg = 2),
              data = dat)
# Optional arguments of locfit():
#   weights: Prior weights (or sample sizes)
#            for individual observations.
#   subset:  Subset of observations in the
#            data frame.
# Arguments of lp():
#   nn:  Nearest neighbor component of
#        the smoothing parameter.
#   h:   The constant component of
#        the smoothing parameter.
#   deg: Degree of polynomial to use.
LOESS (LOcally Estimated Scatterplot Smoothing) uses the tricube kernel \(K(x_0, x_i)\) defined as \[K\left( \frac{|x_0 - x_i|}{\max_{k \in N(x_0)} |x_0 - x_k|}\right)\] where \(N(x_0)\) indexes the points in the neighborhood of \(x_0\) (its nearest points, with the fraction of the data used set by the span) and \[K(t) = \begin{cases} (1-t^3)^3 & \quad \text{for } 0 \le t \le 1\\ 0 & \quad \text{otherwise} \end{cases}\]
loess(y ~ x, span = 0.75, degree = 2) ## Default setting
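To make the tricube weighting concrete, here is a minimal sketch of the weights at a single target point. The helpers `tricube` and `loess_weights` are hypothetical, and the neighbor count `floor(span * n)` is a simplifying assumption; `loess()` itself handles further details such as spans larger than 1.

```r
# Tricube kernel: K(t) = (1 - t^3)^3 for 0 <= t <= 1, and 0 otherwise
tricube <- function(t) ifelse(t >= 0 & t <= 1, (1 - t^3)^3, 0)

# LOESS-style weights at x0 using the nearest fraction `span` of the points
loess_weights <- function(x0, x, span = 0.75) {
  d <- abs(x - x0)                         # distances |x0 - x_i|
  q <- max(1, floor(span * length(x)))     # neighborhood size (assumption)
  h <- sort(d)[q]                          # distance to the farthest neighbor
  tricube(d / h)                           # zero weight outside the neighborhood
}

x <- 1:10
w <- loess_weights(x0 = 5, x = x, span = 0.5)
```

The weight is 1 at the target point and falls smoothly to 0 at the edge of the neighborhood, so points outside the neighborhood do not influence the local fit.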
R offers several smoothers: loess(), KernSmooth::locpoly(), locfit::locfit(), and ksmooth(). Not all of them use the same definition of the bandwidth.
ksmooth: The kernels are scaled so that their quartiles are at \(\pm 0.25 \times \text{bandwidth}\).
KernSmooth::locpoly: uses the raw value that we directly plug into the kernel.
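The quartile convention of `ksmooth()` can be translated into more familiar scale parameters. The sketch below only does the arithmetic implied by the statement above, using base R quantile functions; the variable names are illustrative.

```r
bandwidth <- 1

# Normal kernel: quartiles at +/- 0.25 * bandwidth imply this standard deviation
sd_normal <- 0.25 * bandwidth / qnorm(0.75)
q_normal <- qnorm(0.75, mean = 0, sd = sd_normal)       # upper quartile of the kernel

# Box kernel: a uniform density on [-a, a] has quartiles at +/- a / 2,
# so quartiles at +/- 0.25 * bandwidth imply a half-width of 0.5 * bandwidth
half_width_box <- 0.5 * bandwidth
q_box <- qunif(0.75, min = -half_width_box, max = half_width_box)

c(sd_normal = sd_normal, q_normal = q_normal, q_box = q_box)
```

Under this convention, a smoother that plugs its bandwidth directly into a standard normal kernel would need a value of about 0.37 times the `ksmooth()` bandwidth to give comparable smoothing, so bandwidths should not be copied between functions without conversion.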