The goal of density estimation is to approximate the probability density function f of a random variable X. Assume we have n independent observations x_1, ..., x_n from the random variable X. The kernel density estimator \hat{f}_h(x) for the estimation of the density value f(x) at point x is defined as

    \hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right),    (1)

where K denotes the kernel function and h the bandwidth.
In order to illustrate how the kernel density estimator works, let us rewrite equation (1) for the Uniform kernel K(u) = \frac{1}{2} I(|u| \le 1):

    \hat{f}_h(x) = \frac{1}{2nh} \sum_{i=1}^{n} I(|x - x_i| \le h) = \frac{1}{2nh} \#\{i : |x - x_i| \le h\}.
The Uniform kernel gives all observations x_i in the neighborhood of x the same weight. Since observations close to x carry the most information about f(x), it is reasonable to attribute more weight to observations near x and less weight to distant observations. All other kernels in Table 1 have this property. Figure 1 visualizes the construction of a kernel density estimate of 10 data points (red +) using the Gaussian kernel.
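The construction in equation (1) can be sketched in a few lines of illustrative Python (a translation of the formula; the tutorial's own examples use XploRe, and the 10 data points below are hypothetical stand-ins for those in Figure 1):

```python
import math

def kde(data, h, x):
    """Kernel density estimate at point x with Gaussian kernel, as in eq. (1)."""
    # K(u): standard normal density; each observation contributes one rescaled bump
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(k((x - xi) / h) for xi in data) / (len(data) * h)

# 10 hypothetical data points and an equidistant evaluation grid
data = [0.2, 0.5, 0.9, 1.3, 1.4, 2.0, 2.6, 3.1, 3.2, 4.0]
grid = [i * 0.1 for i in range(-20, 61)]
fh = [kde(data, 0.5, x) for x in grid]
```

Since every kernel bump integrates to one, the estimate itself integrates to one over the real line.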
To understand the structure of the smoother library, we need to explain briefly how kernel estimates are computed in practice. Consider equation (1) again. To obtain the kernel density estimates at all observations, we must compute

    \hat{f}_h(x_j) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x_j - x_i}{h}\right), \quad j = 1, \ldots, n,

which requires n^2 evaluations of the kernel function.
For exploratory purposes, i.e. for graphing the density estimate, it is not necessary to evaluate \hat{f}_h at all observations. Instead, the estimate can be computed, for example, on an equidistant grid v_1, ..., v_m:

    \hat{f}_h(v_j) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{v_j - x_i}{h}\right), \quad j = 1, \ldots, m.
Evaluating the kernel function this often is, however, time consuming if the sample size is large. An alternative and faster way is to approximate the kernel density estimate by the WARPing method (Härdle and Scott; 1992).
The basic idea of WARPing (Weighted Average of Rounded Points) is the ``binning'' of the data in bins of length d, starting at the origin x0=0. Each observation is then replaced by the bincenter of the corresponding bin (which means, in fact, that each point is rounded to the precision given by d). A usual choice for d is h/5 or (max_i x_i - min_i x_i)/100. In the latter case, the effective sample size r for the computation (the number of nonempty bins) can be at most 101.
For the WARPing method, the kernel function needs to be evaluated only at the points

    l \cdot \frac{d}{h}, \quad l = 0, 1, \ldots, s,

where s is the number of bins which cover the support of the kernel function. The calculation of the estimated density at a bincenter b_j then reduces to

    \hat{f}_h(b_j) = \frac{1}{nh} \sum_{l=1-s}^{s-1} K\left(\frac{l \cdot d}{h}\right) c_{j+l},

where c_k denotes the number of observations in the k-th bin.
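The three WARPing steps (bin, weight, average) can be sketched in illustrative Python with a Gaussian kernel; this is a schematic of the idea, not the denest implementation:

```python
import math

def warping_kde(data, h, d):
    """Approximate a KDE by Weighted Averaging of Rounded Points."""
    n = len(data)
    # 1. bin the data: each observation is rounded to precision d (origin x0 = 0)
    counts = {}
    for xi in data:
        j = int(math.floor(xi / d))
        counts[j] = counts.get(j, 0) + 1
    # 2. kernel weights, evaluated only at the points l*d/h
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    s = int(math.ceil(5 * h / d))   # Gaussian support truncated at 5 bandwidths
    w = {l: k(l * d / h) for l in range(-s, s + 1)}
    # 3. weighted average of the rounded points, per bincenter
    est = {}
    for j, cj in counts.items():
        for l in range(-s, s + 1):
            est[j + l] = est.get(j + l, 0.0) + w[l] * cj
    return {(j + 0.5) * d: v / (n * h) for j, v in est.items()}
```

Each observation enters only through its bin count, so the cost depends on the number of nonempty bins rather than on n.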
To have a choice between both the exact and the WARPing computation of kernel estimates and related functions, the smoother library offers most functionality in two variants:

Functionality                | Exact   | WARPing
-----------------------------|---------|---------
density estimate             | denxest | denest
density confidence intervals | denxci  | denci
density confidence bands     | -       | dencb
density bandwidth selection  | -       | denbwsel
The easiest way to compute kernel density estimates is the function
denest. This function is based on the WARPing method which
makes the computation very fast.
Let us apply the denest routine to the nicfoo data, which contain observations on household net income in the first column (see Data Sets). denest requires the data as a minimum input. Optionally, the bandwidth h, the kernel function K and the discretization parameter d used for WARPing can be specified. For example,
netinc=read("nicfoo")
netinc=netinc[,1]
h=(max(netinc)-min(netinc))*0.15
fh=denest(netinc,h)
To display the estimated function you may use these lines of code:

fh=setmask(fh,"line")
plot(fh)
tl="Density Estimate"
xl="netincome"
yl="density fh"
setgopt(plotdisplay,1,1,"title",tl,"xlabel",xl,"ylabel",yl)

The resulting density is plotted in Figure 2.
As an alternative to denest, the function denxest can be used to obtain a density estimate by exact computation. As with denest, the data are the minimum input. Optionally, the bandwidth h, the kernel function K and a grid v for the computation can be specified. The following code computes the density estimate at all observations:
h=(max(netinc)-min(netinc))*0.15
fh=denxest(netinc,h)

To compute the density on a grid of 30 equidistant points within the range of x, the previous example can be modified as follows:
h=(max(netinc)-min(netinc))*0.15
v=grid(min(netinc),(max(netinc)-min(netinc))/29,30)
fh=denxest(netinc,h,"qua",v)
Finally, let us calculate the density estimate for a family of different bandwidths in order to see the dependence on the bandwidth parameter. We modify our first code in the following way:
hfac=(max(netinc)-min(netinc))
fh =setmask( denest(netinc,0.15*hfac) ,"line")
fh1=setmask( denest(netinc,0.05*hfac) ,"line","thin","magenta")
fh2=setmask( denest(netinc,0.10*hfac) ,"line","thin","red")
fh3=setmask( denest(netinc,0.20*hfac) ,"line","thin","blue")
fh4=setmask( denest(netinc,0.25*hfac) ,"line","thin","green")
plot(fh1,fh2,fh3,fh4,fh)
tl="Family of Density Estimates"
xl="netincome"
yl="density fh"
setgopt(plotdisplay,1,1,"title",tl,"xlabel",xl,"ylabel",yl)
Kernel density estimation requires two parameters: the kernel function K and the bandwidth parameter h. In practice, the choice of the kernel is not nearly as important as the choice of the bandwidth. The theoretical background of this observation is that kernel functions can be rescaled such that the difference between two kernel density estimates using two different kernels is almost negligible (Marron and Nolan; 1988).
Instead of rescaling kernel functions, we can also rescale the bandwidths. This rescaling works as follows: suppose we have estimated an unknown density f using some kernel KA and bandwidth hA, and we are considering estimating f with a different kernel, say KB. Then the appropriate bandwidth hB to use with KB is

    h_B = h_A \cdot \frac{\delta_0^{K_B}}{\delta_0^{K_A}}, \qquad \delta_0^{K} = \left(\frac{\|K\|_2^2}{\mu_2(K)^2}\right)^{1/5},    (2)

where \mu_2(K) = \int u^2 K(u)\,du and \|K\|_2^2 = \int K(u)^2\,du, and \delta_0^{K} is the canonical bandwidth of K (Marron and Nolan; 1988).
XploRe offers the function canker to do this recomputation of bandwidths:

hqua=canker(1.0,"gau","qua")
hqua

computes the bandwidth

Contents of hqua
[1,]   2.6226

that can be used with the Quartic kernel "qua" in order to find the kernel density estimate that is equivalent to using the Gaussian kernel "gau" and bandwidth 1.0.
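The factor 2.6226 can be reproduced numerically from the canonical-bandwidth formula \delta_0^{K} = (\|K\|_2^2 / \mu_2(K)^2)^{1/5}. A short illustrative Python check (a sketch, not the canker implementation):

```python
import math

def delta0(kernel, lo, hi, m=20000):
    """Canonical bandwidth (||K||_2^2 / mu_2(K)^2)^(1/5) by midpoint Riemann sum."""
    step = (hi - lo) / m
    us = [lo + (i + 0.5) * step for i in range(m)]
    k2 = sum(kernel(u) ** 2 for u in us) * step       # ||K||_2^2
    mu2 = sum(u * u * kernel(u) for u in us) * step   # second moment mu_2(K)
    return (k2 / mu2 ** 2) ** 0.2

gau = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
qua = lambda u: 0.9375 * (1 - u * u) ** 2 if abs(u) < 1 else 0.0

# ratio of canonical bandwidths: approx. 2.6226, the canker factor "gau" -> "qua"
factor = delta0(qua, -1, 1) / delta0(gau, -8, 8)
```

For the Quartic kernel the exact value is \delta_0 = 35^{1/5} \approx 2.0362, and for the Gaussian kernel \delta_0 = (2\sqrt{\pi})^{-1/5} \approx 0.7764.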
Now we can consider the problem of bandwidth selection. The optimal bandwidth for a kernel density estimate is typically calculated on the basis of an estimate for the mean integrated squared error

    MISE(h) = E \int \{\hat{f}_h(x) - f(x)\}^2\,dx.

Plug-in approaches attempt to estimate MISE. Under the asymptotic conditions h \to 0 and nh \to \infty, it is possible to approximate MISE by

    AMISE(h) = \frac{1}{nh}\,\|K\|_2^2 + \frac{h^4}{4}\,\mu_2(K)^2\,\|f''\|_2^2.
Minimizing this asymptotic MISE approximation (AMISE) with respect to h, and replacing the unknown \|f''\|_2^2 by its value for a normal density with estimated standard deviation \hat\sigma, yields Silverman's rule-of-thumb bandwidth for the Gaussian kernel,

    \hat{h}_{ROT} = 1.06\,\hat\sigma\,n^{-1/5}.

If a different kernel function is used, this bandwidth can be rescaled by using equation (2). For example, the rule-of-thumb bandwidth for the Quartic kernel computes as

    \hat{h}_{ROT,qua} = 2.6226 \cdot 1.06\,\hat\sigma\,n^{-1/5} \approx 2.78\,\hat\sigma\,n^{-1/5}.
Of course, the rule of thumb gives a very crude estimate, in particular if the true density f is far from looking like a normal density.
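A minimal Python sketch of Silverman's rule of thumb and its Quartic rescaling (illustrative only; the rescaling factor is the canker output shown earlier, and denbwsel provides the actual XploRe selectors):

```python
import math

def rule_of_thumb(data, kernel="gau"):
    """Silverman's rule of thumb: h = 1.06 * sigma_hat * n^(-1/5)."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    h = 1.06 * sigma * n ** (-0.2)
    if kernel == "qua":
        h *= 2.6226   # canonical-bandwidth factor canker(1.0,"gau","qua")
    return h
```

Because the rule plugs a normal reference density into the AMISE formula, it tends to oversmooth when f is multimodal.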
Refined plug-in methods, such as the Park and Marron plug-in and the Sheather and Jones plug-in, use nonparametric estimates for f''.
The quantlet denbwsel needs the data vector as input:

netinc=read("nicfoo")
netinc=netinc[,1]
tmp=denbwsel(netinc)
Choose for instance the Least Squares CV item to use LSCV. Here, a graphical display appears which shows the LSCV criterion in the upper left, the chosen optimal bandwidth in the upper right, the resulting kernel density estimate in the lower left, and some information about the search grid and the kernel in the lower right. This graphical display is shown in Figure 4.
You can play around now with the different bandwidth selectors. Try out the Park and Marron plug-in for example. Here, in contrast to the cross-validation methods, the aim is to find the root of the bandwidth selection criterion. The graphical display for the Park and Marron selector is shown in Figure 5.
During the use of
denbwsel, all currently obtained results
are displayed in the output window.
Note that for some selectors, it will be necessary to increase the
search grid for h. This can be done by using the menu item
(Change h range or K).
It is difficult to give a general recommendation as to which bandwidth selector is best. As mentioned above, Silverman's rule of thumb works well when the true density is approximately normal, but it may oversmooth for bimodal or multimodal densities. A nice feature of the cross-validation methods is that the selected bandwidth automatically adapts to the smoothness of f. From a theoretical point of view, the BCV and SCV bandwidth selectors have the optimal asymptotic rate of convergence, but they either need a very large sample size or pay with a large variance. As a consequence, we recommend determining bandwidths by different selection methods and comparing the resulting density estimates.
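To make the cross-validation idea concrete, the least squares CV criterion LSCV(h) = \int \hat{f}_h^2 - (2/n)\sum_i \hat{f}_{h,-i}(x_i) can be minimized over a bandwidth grid. An illustrative Python sketch with a Gaussian kernel and hypothetical data (a schematic of the criterion, not the denbwsel code):

```python
import math

def lscv(data, h, grid_lo, grid_hi, m=400):
    """Least squares cross-validation score for bandwidth h (Gaussian kernel)."""
    n = len(data)
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    fh = lambda x, pts: sum(k((x - xi) / h) for xi in pts) / (len(pts) * h)
    # integral of fh^2, approximated by a midpoint Riemann sum
    step = (grid_hi - grid_lo) / m
    isq = sum(fh(grid_lo + (i + 0.5) * step, data) ** 2 for i in range(m)) * step
    # leave-one-out term: estimate at x_i without x_i
    loo = sum(fh(x, data[:i] + data[i + 1:]) for i, x in enumerate(data)) / n
    return isq - 2 * loo

data = [0.2, 0.5, 0.9, 1.3, 1.4, 2.0, 2.6, 3.1, 3.2, 4.0]
hs = [j / 10 for j in range(2, 16)]
hopt = min(hs, key=lambda h: lscv(data, h, -2.0, 6.0))
```

In contrast to the plug-in root-finding selectors, the CV criterion is simply minimized over the search grid for h.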
For detailed overviews on plug-in and cross-validation methods of bandwidth selection, see Härdle (1991), Park and Turlach (1992), and Turlach (1993). Practical experiences are available from the simulation studies in Marron (1989), Jones et al. (1992), Park and Turlach (1992), and Cao et al. (1992).
To derive confidence intervals or confidence bands for f(x), one has to know the sampling distribution of the estimator \hat{f}_h(x). The following result about the asymptotic distribution of \hat{f}_h(x) is known from the literature (Härdle; 1991, p. 62). Suppose that f'' exists and h = c n^{-1/5}. Then

    n^{2/5}\,\{\hat{f}_h(x) - f(x)\} \longrightarrow N(b_x, v_x^2),    (7)

    b_x = \frac{c^2}{2}\,f''(x)\,\mu_2(K),    (8)

    v_x^2 = \frac{1}{c}\,f(x)\,\|K\|_2^2.    (9)
The functions
denci
and
denxci
compute pointwise confidence intervals for a grid
of points (
denci
using WARPing) or all
observations (
denxci
using exact computation).
The following quantlet code computes the confidence intervals for bandwidth 0.2 (Gaussian kernel) and significance level \alpha = 0.05.
For graphical exploration, the pointwise intervals are drawn as curves and displayed in a plot. The resulting graph is shown in Figure 6.
{fh,fl,fu}=denci(netinc,0.2,0.05,"gau")
fh=setmask(fh,"line","blue")
fl=setmask(fl,"line","blue","thin","dashed")
fu=setmask(fu,"line","blue","thin","dashed")
plot(fl,fu,fh)
setgopt(plotdisplay,1,1,"title","Density Confidence Intervals")
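The interval construction itself follows from the asymptotic normality of \hat{f}_h(x): a standard pointwise interval is \hat{f}_h(x) \pm z_{1-\alpha/2}\,(\hat{f}_h(x)\,\|K\|_2^2/(nh))^{1/2}. A Python sketch of this standard form (illustrative; denci additionally uses WARPing and may differ in detail):

```python
import math

def den_ci(data, h, z, xs):
    """Pointwise intervals fh +- z*sqrt(fh*||K||_2^2/(n*h)), Gaussian kernel."""
    n = len(data)
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    ksq = 1.0 / (2.0 * math.sqrt(math.pi))   # ||K||_2^2 for the Gaussian kernel
    out = []
    for x in xs:
        fh = sum(k((x - xi) / h) for xi in data) / (n * h)
        half = z * math.sqrt(fh * ksq / (n * h))
        out.append((max(fh - half, 0.0), fh, fh + half))   # (lower, fh, upper)
    return out
```

Here z is the normal quantile, e.g. z = 1.96 for \alpha = 0.05; the lower limit is clipped at zero since a density is nonnegative.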
Uniform confidence bands for f can only be derived under some rather restrictive assumptions, see Härdle (1991, p. 65). Suppose that f is a density on [0,1] and h = n^{-\delta}, \delta \in (1/5, 1/2). Then the following formula holds under some regularity conditions for all x \in [0,1] (Bickel and Rosenblatt; 1973):

    P\left( f(x) \in \left[ \hat{f}_h(x) \pm z_{n,\alpha}\,\sqrt{\frac{\hat{f}_h(x)\,\|K\|_2^2}{nh}} \right] \ \text{for all } x \right) \longrightarrow 1 - \alpha,    (10)

where z_{n,\alpha} = \frac{-\log\{-\frac{1}{2}\log(1-\alpha)\}}{(2\delta\log n)^{1/2}} + d_n and d_n = (2\delta\log n)^{1/2} + (2\delta\log n)^{-1/2}\,\log\left( \frac{1}{2\pi} \frac{\|K'\|_2}{\|K\|_2} \right).
{fh,fl,fu}=dencb(netinc,0.2,0.05,"gau")
fh=setmask(fh,"line","blue")
fl=setmask(fl,"line","blue","thin","dashed")
fu=setmask(fu,"line","blue","thin","dashed")
plot(fl,fu,fh)
setgopt(plotdisplay,1,1,"title","Density Confidence Bands")
The resulting graph is shown in Figure 7. It can be clearly seen that the confidence bands are wider than the confidence intervals (Figure 6). This is explained by the fact that here the whole function f must lie within the bands with probability 95%.
MD*TECH Method and Data Technologies
http://www.mdtech.de   mdtech@mdtech.de