3. Multivariate Density and Regression Functions

In this section we review kernel smoothing methods for density and regression function estimation in the case of multidimensional variables X.


3.1 Computational Aspects

As in the univariate case, density and regression functions can be estimated either by exact computation or by the WARPing approximation. However, the effect of WARPing differs in the multivariate case: WARPing is still relatively fast in two dimensions, while for three- and higher-dimensional estimates exact computation may be preferred. To allow a choice between exact and WARPing computation, all estimation routines are offered in two versions:

  Functionality                Exact       WARPing
  density estimation           denxestp    denestp
  Nadaraya-Watson regression   regxestp    regestp
  local linear regression      lregxestp   lregestp
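To illustrate why WARPing is fast in low dimensions, the following sketch approximates a two-dimensional kernel density by binning: observations are rounded to a subgrid of binwidth h/M, and the kernel is evaluated only at the finitely many integer offsets between bins. This is an illustrative Python sketch of the WARPing idea under the assumption of a scalar bandwidth h for both coordinates and a product Quartic kernel; it is not the XploRe implementation.

```python
import numpy as np

def warping_density_2d(x, h, M=5):
    """WARPing sketch: approximate a 2D product-Quartic kernel density
    estimate by rounding observations to a grid of binwidth h/M and
    evaluating the kernel only at bin centers.  Assumes a scalar
    bandwidth h used for both coordinates (illustration only)."""
    n, d = x.shape
    delta = h / M                                # binwidth of the subgrid
    idx = np.floor(x / delta).astype(int)        # bin index per observation
    bins = {}                                    # counts of occupied bins
    for i in idx:
        bins[tuple(i)] = bins.get(tuple(i), 0) + 1
    # univariate Quartic kernel evaluated at the integer offsets -M..M
    u = np.arange(-M, M + 1) / M
    w = (15 / 16) * (1 - u**2) ** 2
    est = {}
    for (j1, j2), c in bins.items():             # spread each bin count
        for a in range(-M, M + 1):               # over neighboring bins,
            for b in range(-M, M + 1):           # weighted by the kernel
                key = (j1 + a, j2 + b)
                est[key] = est.get(key, 0.0) + c * w[a + M] * w[b + M]
    # normalize as in the kernel density formula: divide by n * h1 * h2
    return {k: v / (n * h * h) for k, v in est.items()}
```

The cost depends on the number of occupied bins rather than on n directly, which is why the approximation pays off in two dimensions but loses its advantage as the number of grid cells explodes with the dimension.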


3.2 Multivariate Density Estimation


hrot = denrotp (x {,K {,opt}})
computes a rule-of-thumb bandwidth for multivariate density estimation
fh = denestp (x {,h {,K} {,d}})
computes the multivariate kernel density estimate on a grid using the WARPing method
fh = denxestp (x {,h {,K} {,v}})
computes the multivariate kernel density estimate for all observations or on a grid v by exact computation

The kernel density estimator can be generalized to the multivariate case in a straightforward way. Suppose we now have observations $x_1,\ldots,x_n$ where each of the observations is a d-dimensional vector $x_i=(x_{i1},\ldots,x_{id})^T$. The multivariate kernel density estimator at a point $x=(x_{1},\ldots,x_{d})^T$ is defined as

\begin{displaymath}
\widehat{f}_{h}(x)=
\frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_{1}\cdot\ldots\cdot h_{d}}\,
\mathcal{K}\left(\frac{x_{i1}-x_{1}}{h_{1}},\ldots,\frac{x_{id}-x_{d}}{h_{d}}\right),
\end{displaymath} (18)

with $\mathcal{K}$ denoting a multivariate kernel function, i.e. a function working on d-dimensional arguments. Note that (18) assumes that the bandwidth h is a vector of bandwidths $h=\left(h_{1},\ldots, h_{d}\right)^{T}$.
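Estimator (18) can be sketched directly in a few lines. The following Python fragment evaluates the exact multivariate density estimate with a product Quartic kernel on a set of evaluation points; it is an illustration of the formula, not the XploRe denxestp routine.

```python
import numpy as np

def quartic(u):
    """Univariate Quartic kernel K(u) = 15/16 (1 - u^2)^2 on [-1, 1]."""
    return np.where(np.abs(u) <= 1, 15 / 16 * (1 - u**2) ** 2, 0.0)

def kde_product(x, h, grid):
    """Exact multivariate density estimate (18) with a product Quartic
    kernel.  x is (n, d), h a length-d bandwidth vector, grid (m, d).
    Illustrative sketch of the formula, not the XploRe implementation."""
    n, d = x.shape
    fh = np.empty(len(grid))
    for j, point in enumerate(grid):
        u = (x - point) / h                          # (n, d) scaled differences
        fh[j] = np.prod(quartic(u), axis=1).sum() / (n * np.prod(h))
    return fh
```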

What form should the multidimensional kernel function $\mathcal{K}(u)=\mathcal{K}(u_{1},\dots,u_{d})$ take on? The easiest solution is to use a multiplicative or product kernel

\begin{displaymath}\mathcal{K}(u)=K(u_{1})\cdot \ldots \cdot K(u_{d})\end{displaymath}

with K denoting a univariate kernel function. This means that if K is a univariate kernel with support [-1,1] (e.g. the Quartic kernel), observations in a cube around x are used to estimate the density at the point x. An alternative is to use a genuine multivariate kernel function $\mathcal{K}(u)$, e.g. the radial symmetric Quartic kernel

\begin{displaymath}\mathcal{K}(u) \propto (1-u^{T}u)^2\;\,I(u^{T}u\le 1).\end{displaymath}

Radial symmetric kernels can be obtained from univariate kernels by defining $\mathcal{K}(u) \propto K(\Vert u\Vert)$, where $\Vert u\Vert = \sqrt{u^Tu}$ denotes the Euclidean norm of the vector u. The symbol $\propto$ indicates that the kernel has to be multiplied by the appropriate normalizing constant. Radial symmetric kernels use observations from a ball around x to estimate the density at x. Table 4 shows which product and which radial symmetric kernel functions are available in XploRe.
Table 4: Radial symmetric kernel functions.
  Kernel        Product   Radial symmetric
  Uniform       uni       runi
  Triangle      trian     rtrian
  Epanechnikov  epa       repa
  Quartic       qua       rqua
  Triweight     tri       rtri
  Gaussian      gau       gau
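The difference between the two kernel types can be made concrete. The sketch below implements both Quartic variants in Python for d = 2: the product kernel has the cube [-1,1]^2 as support, while the radial symmetric kernel is supported on the unit disk. The normalizing constant 3/pi for the radial Quartic kernel in two dimensions is computed by us for this illustration; neither function is taken from XploRe.

```python
import numpy as np

def product_quartic(u):
    """Product Quartic kernel on (n, d) scaled differences u; the
    support is the cube [-1, 1]^d (observations in a cube around x)."""
    inside = np.all(np.abs(u) <= 1, axis=1)
    return inside * np.prod(15 / 16 * (1 - u**2) ** 2, axis=1)

def radial_quartic_2d(u):
    """Radial symmetric Quartic kernel K(u) = (3/pi)(1 - u'u)^2 I(u'u <= 1)
    for d = 2; the constant 3/pi (our computation for this sketch)
    normalizes the kernel over the unit disk (a ball around x)."""
    s = np.sum(u**2, axis=1)
    return np.where(s <= 1, 3 / np.pi * (1 - s) ** 2, 0.0)
```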

The following quantlet computes a two-dimensional density estimate for the geyser data (see Data Sets). These are two-dimensional data featuring a bimodal density. The function denxestp can be called with only the data as input. In this case, the bandwidth vector is computed by Scott's rule (Scott, 1992). This rule of thumb is also separately implemented in denrotp. The default kernel function is the product Quartic kernel "qua". The resulting surface plot is shown in Figure 13.

  geyser = read("geyser") 
  fh = denxestp(geyser) 
  fh = setmask(fh,"surface","blue")
  axesoff()
  cu = grcube(fh)              ; box
  plot(cu.box,cu.x,cu.y, fh)   ; plot box and fh
  setgopt(plotdisplay,1,1,"title","2D Density Estimate")
  axeson()
smoo13.xpl

Figure 13: Two-dimensional density estimate.
\includegraphics[scale=0.6]{smootherd2d.ps}
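Scott's rule of thumb used by the quantlet above scales a per-coordinate bandwidth with the sample size and the dimension. The following Python sketch shows the idea behind denrotp under the assumption that the rule takes the standard form $h_j = \hat\sigma_j\, n^{-1/(d+4)}$ from Scott (1992); it is not the exact XploRe implementation.

```python
import numpy as np

def scott_rot(x):
    """Scott's rule-of-thumb bandwidth vector for d-dimensional density
    estimation: h_j = sigma_j * n^(-1/(d+4)), where sigma_j is the
    sample standard deviation of column j (Scott, 1992).  Sketch of
    the idea behind denrotp, not its exact implementation."""
    n, d = x.shape
    return x.std(axis=0, ddof=1) * n ** (-1 / (d + 4))
```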

The second example of this subsection shows a three-dimensional density estimate, which can only be graphed in the form of a contour plot. See the Graphics Tutorial for an introduction to contour plots. The estimated data are columns 4 to 6 of the bank2 data (see Data Sets). This data set consists of two clusters, which can easily be detected from the contour plot in Figure 14.

  bank    = read("bank2.dat") 
  bank456 = bank[,4:6]                ; columns 4 to 6
  fh = denxestp(bank456,1.5)
  axesoff()
  fhr  = (max(fh[,4])-min(fh[,4]))    ; range of fh
  cf1= grcontour3(fh,0.4*fhr,2)       ; contours
  cf2= grcontour3(fh,0.6*fhr,4)       ; contours
  cu = grcube(cf1|cf2)                ; box
  plot(cu.box, cf1,cf2)               ; graph contours
  setgopt(plotdisplay,1,1,"title","3D Density Estimate")
  axeson()
smoo14.xpl

Figure 14: Contours of three-dimensional density estimate.
\includegraphics[scale=0.6]{smootherd3d.ps}


3.3 Multivariate Regression


mh = regestp (x {,h {,K} {,d}})
computes the multivariate kernel regression on a grid using the WARPing method
mh = regxestp (x {,h {,K} {,v}})
computes the multivariate kernel regression for all observations or on a grid v by exact computation
mh = lregestp (x {,h {,K} {,d}})
computes the multivariate local linear kernel regression on a grid using the WARPing method
mh = lregxestp (x {,h {,K} {,v}})
computes the multivariate local linear kernel regression for all observations or on a grid v by exact computation

Multivariate nonparametric regression aims to estimate the functional relation between a univariate response variable Y and a d-dimensional explanatory variable X, i.e. the conditional expectation

\begin{displaymath}E(Y\vert X)=E\left(Y\vert X_{1},\ldots,X_{d}\right)
=m(X).\end{displaymath}

The multivariate Nadaraya-Watson estimator can then be written as a generalization of the univariate case. Suppose that we have independent observations $(x_1,y_1),\ldots,(x_n,y_n)$, then this estimator is defined as

\begin{displaymath}\widehat m_{h}(x)=
\frac{\sum\limits_{i=1}^n
\mathcal{K}\left(\frac{\displaystyle x_{i1}-x_{1}}{\displaystyle h_{1}},\ldots,\frac{\displaystyle x_{id}-x_{d}}{\displaystyle h_{d}}\right)\,y_{i}}
{\sum\limits_{i=1}^n
\mathcal{K}\left(\frac{\displaystyle x_{i1}-x_{1}}{\displaystyle h_{1}},\ldots,\frac{\displaystyle x_{id}-x_{d}}{\displaystyle h_{d}}\right)}\,.\end{displaymath}

As in the univariate case, local polynomial approaches can be used. Due to the computational complexity, typically only local linear estimates are computed.
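The Nadaraya-Watson formula above is a kernel-weighted average of the responses $y_i$. The following Python sketch evaluates it with a product Quartic kernel; it is an illustration of the formula, not the XploRe regxestp routine.

```python
import numpy as np

def quartic(u):
    """Univariate Quartic kernel K(u) = 15/16 (1 - u^2)^2 on [-1, 1]."""
    return np.where(np.abs(u) <= 1, 15 / 16 * (1 - u**2) ** 2, 0.0)

def nw_regression(x, y, h, grid):
    """Multivariate Nadaraya-Watson estimate with a product Quartic
    kernel: at each evaluation point, a weighted average of the y_i
    with the kernel weights from the formula above.  Illustrative
    sketch, not the XploRe implementation."""
    mh = np.empty(len(grid))
    for j, point in enumerate(grid):
        w = np.prod(quartic((x - point) / h), axis=1)   # kernel weights
        mh[j] = np.nan if w.sum() == 0 else (w @ y) / w.sum()
    return mh
```

Because the estimate is a weighted average, it reproduces a constant response exactly; a local linear estimate would additionally reproduce linear functions, which is why it behaves better near the boundary of the design.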

The following quantlet compares the two-dimensional Nadaraya-Watson and the two-dimensional local linear estimate for a generated data set. For the bandwidth vector and the kernel function, we accept the defaults which are 20% of the range of the data and the product Quartic kernel "qua", respectively. Figure 15 shows the surface plots of both estimates.

  randomize(0)
  n=200
  x=uniform(n,2)
  m=sin(2*pi*x[,1])+x[,2]
  y=m+normal(n)/4
  mh= regestp(x~y)
  ml=lregestp(x~y)
  mh=setmask(mh,"surface","red")
  ml=setmask(ml,"surface","blue")
  c=grcube(mh)
  d=createdisplay(1,2)
  axesoff()
  show(d,1,1,mh,c.box,c.x,c.y)
  show(d,1,2,ml,c.box,c.x,c.y)
  axeson()
  setgopt(d,1,1,"title","Nadaraya-Watson")
  setgopt(d,1,2,"title","Local Linear")
smoo15.xpl

Figure 15: Bivariate Nadaraya-Watson and local linear estimate.
\includegraphics[scale=0.6]{smootherr2ld.ps}



MD*TECH Method and Data Technologies
http://www.mdtech.de   mdtech@mdtech.de