In this section we consider the problem of recovering the regression function from noisy data based on a wavelet decomposition.
The upper two plots in the display show the underlying regression function (left) and noisy data (right). The lower two plots show the distribution of the wavelet coefficients in the time-scale domain for the regression function and the noisy data.
For simplicity we deal with a regression estimation problem. The noise is assumed to be additive and Gaussian with zero mean and unknown variance. Since the wavelet transform is orthogonal, we can consider the filtering problem in the space of wavelet coefficients: one takes the wavelet transform of the noisy data and estimates the wavelet coefficients of the regression function. Once these estimates are obtained, one applies the inverse wavelet transform to recover the unknown regression function.
To suppress the noise in the data, two approaches are commonly used. The first is the so-called linear method. We have seen in Section 6 that the wavelet decomposition reflects the properties of the signal in the frequency domain well: roughly speaking, higher decomposition scales correspond to higher-frequency components of the regression function. If we assume that the underlying regression function is concentrated in the low-frequency domain, the filtering procedure becomes evident: all empirical wavelet coefficients beyond some resolution scale are set to zero. This procedure works well if the signal is sufficiently smooth and there are no boundary effects in the data. For many practical problems, however, such an approach is not fully appropriate; images, for example, cannot be considered smooth functions.
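A minimal sketch of this linear cut-off in Python, assuming the PyWavelets package (pywt) is available; the wavelet `db4`, the function name and the number of retained levels are illustrative choices, not part of the original text:

```python
import numpy as np
import pywt

def linear_wavelet_smooth(y, wavelet="db4", keep_levels=3):
    """Linear wavelet filter: keep the coarse part of the decomposition
    and set all detail coefficients at the finest scales to zero."""
    # Full decomposition: [cA_n, cD_n, cD_{n-1}, ..., cD_1]
    coeffs = pywt.wavedec(y, wavelet)
    # Zero every detail level finer than the `keep_levels` coarsest ones
    for i in range(1 + keep_levels, len(coeffs)):
        coeffs[i] = np.zeros_like(coeffs[i])
    return pywt.waverec(coeffs, wavelet)[: len(y)]

# Example: noisy observations of a smooth signal on a dyadic grid
x = np.linspace(0, 1, 128)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(128)
y_smooth = linear_wavelet_smooth(y)
```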
To avoid this shortcoming, a nonlinear filtering procedure is often used to suppress the noise in the empirical wavelet coefficients. The main idea is based on a fundamental property of the wavelet transform: the father and mother wavelets are well localized in the time domain. Therefore one can estimate each empirical wavelet coefficient separately. To do this, compare the absolute value of an empirical wavelet coefficient with the standard deviation of the noise. If the coefficient is of the same order as the noise level, the signal and the noise cannot be separated, and a good estimator of the coefficient is zero. If, on the other hand, an empirical wavelet coefficient is clearly larger than the noise level, a natural estimator of the true coefficient is the empirical coefficient itself. This idea is called thresholding.
We distinguish mainly three thresholding methods: hard thresholding, soft thresholding and levelwise thresholding based on Stein's risk estimator; see Subsection 7.3.
To suppress the noise we apply the following nonlinear transform to
the empirical wavelet coefficients:
$$ z \;\longmapsto\; z\,\mathbf{1}\{|z| > t\}, $$
where t is a certain threshold. The choice of the threshold is a
very delicate and important statistical problem.
On the one hand, a big threshold leads to a large bias of the
estimator. But on the other hand, a small threshold increases the
variance of the smoother. Theoretical considerations yield the
following value of the threshold:
$$ t = \sigma\,\sqrt{2\,\log n}\,, $$
where $n$ is the number of observations and $\sigma$ is the standard deviation of the noise.
We provide two possibilities for choosing a threshold. First, you can choose it by ``eye'' using the Hard threshold item and entering the desired value of the threshold; the value offered is the one described in the paragraph above. Note that this threshold value is conservative in most cases, so one should often choose a threshold below the offered one. The item Automatic means that the threshold is chosen as
$$ t = \hat\sigma\,\sqrt{2\,\log n} $$
with a suitably estimated variance $\hat\sigma^2$.
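A sketch of the hard-thresholding rule with the threshold $\hat\sigma\sqrt{2\log n}$, again assuming PyWavelets; estimating $\hat\sigma$ from the median absolute deviation of the finest-scale coefficients is one common choice, not necessarily the one implemented in XploRe:

```python
import numpy as np
import pywt

def hard_threshold_smooth(y, wavelet="db4"):
    """Hard thresholding of the empirical wavelet coefficients with the
    universal threshold t = sigma_hat * sqrt(2 log n)."""
    n = len(y)
    coeffs = pywt.wavedec(y, wavelet)
    # Robust noise-level estimate from the finest-scale detail coefficients
    sigma_hat = np.median(np.abs(coeffs[-1])) / 0.6745
    t = sigma_hat * np.sqrt(2.0 * np.log(n))
    # Keep the coarse approximation, hard-threshold every detail level
    thresholded = [coeffs[0]] + [c * (np.abs(c) > t) for c in coeffs[1:]]
    return pywt.waverec(thresholded, wavelet)[:n]
```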
The plots of the underlying regression function and the data corrupted by noise are shown at the top of the display. You can change the variance of the noise and the regression function using the Noise level and the Change function items.
On the display above you can see the estimated regression function after the thresholding procedure. The left upper window shows how the threshold procedure distorts the underlying regression function when there is no noise in the data. The final estimator is shown in the right upper window.
The two plots at the bottom of the display show the thresholded and the non-thresholded wavelet coefficients. The thresholded coefficients are shown as blue circles and the non-thresholded ones as red circles. The radii of the circles correspond to the magnitudes of the empirical wavelet coefficients.
You can try to improve the estimator using the items Change basis and Change level. Under an appropriate choice of the parameters you can decrease the bias of your smoother.
Along with hard thresholding, soft thresholding procedures are often used in statistical applications. In this section we study the so-called wavelet shrinkage procedure for recovering the regression function from noisy data.
The display shows the same regression function as in the lesson before, without and with noise, after soft thresholding.
The only difference between the hard and the soft thresholding
procedures is in the choice of the nonlinear transform on the
empirical wavelet coefficients. For soft thresholding the following
nonlinear transform is used:
$$ S_t(z) = \mathrm{sign}(z)\,\bigl(|z| - t\bigr)_+\,, $$
applied to each empirical wavelet coefficient $z$.
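In code the soft-thresholding map is a one-liner; the following NumPy sketch (names illustrative) applies it elementwise to an array of empirical coefficients:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft thresholding S_t(z) = sign(z) * max(|z| - t, 0), applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Coefficients below the threshold are set to zero, the others shrink towards zero by t
print(soft_threshold(np.array([-2.0, -0.3, 0.1, 1.5]), t=0.5))   # [-1.5 -0. 0. 1.]
```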
In addition, you can use a data-driven procedure for choosing the threshold. This procedure is based on Stein's principle of unbiased risk estimation (Donoho and Johnstone, 1995). The main idea is the following: since $S_t$ is a continuous function, one can obtain, for any given $t$, an unbiased estimate of the risk of the soft thresholding procedure. The optimal threshold is then obtained by minimizing the estimated risk.
We first give an intuitive idea about adaptive thresholding using Stein's principle. Later on we give a detailed mathematical description of the procedure.
We define a risk function $R(s_k, Z_k, t)$ with
$$ R(s_k, Z_k, t) \;=\; s_k^2 \;-\; 2\,s_k^2\,\mathbf{1}\{|Z_k| \le t\} \;+\; \min\bigl(Z_k^2, t^2\bigr), \tag{1} $$
where $c_k$ is the unknown coefficient in the observation $Z_k = c_k + s_k e_k$, the $s_k$ are known (or to be estimated) scale parameters, the $e_k$ are i.i.d. $N(0,1)$ random variables and $t$ is the threshold. Summed over the coefficients, this quantity is, by Stein's (1981) lemma, an unbiased estimate of the risk $\sum_k E\bigl(S_t(Z_k) - c_k\bigr)^2$ of the soft thresholding rule, so the adaptive threshold can be chosen as its minimizer,
$$ t' = \arg\min_{t \ge 0} \sum_k R(s_k, Z_k, t). \tag{2} $$
This procedure can be performed separately for each resolution level.
Mathematical derivation
Now we discuss the data-driven choice of the threshold, the initial level $j_0$ and the wavelet basis by the Stein (1981) method of unbiased risk estimation. The argument below follows Donoho and Johnstone (1995).
We first explain the Stein method for the idealized one-level observation model discussed in the previous section:
$$ Z_k = c_k + s_k e_k, \qquad k = 1, \dots, M, $$
where $c = (c_1, \dots, c_M)$ is the vector of unknown parameters, $s_k > 0$ are known scale parameters and $e_k$ are i.i.d. $N(0,1)$ random variables.
Let $c' = (c'_1, \dots, c'_M)$ be an estimator of $c$. Introduce the mean squared risk of this estimate:
$$ R(c', c) \;=\; \sum_{k=1}^{M} E\,(c'_k - c_k)^2 . $$
Assume that the estimators $c'_k$ have the form
$$ c'_k = Z_k + H_t(Z_k), $$
where $H_t$ is a fixed (weakly differentiable) function depending on a threshold $t$, and let $t^*$ denote the value of $t$ minimizing the risk $R(c', c)$. If the true parameters $c_k$ were known, one could compute $t^*$ explicitly. In practice this is not possible, and one chooses a certain approximation $t'$ of $t^*$ as a minimizer of an unbiased estimator $R'$ of the risk $R$. To construct $R'$, note that
$$ E\,(c'_k - c_k)^2 \;=\; s_k^2 + 2\,s_k\,E\bigl(e_k H_t(Z_k)\bigr) + E\,H_t^2(Z_k). \tag{3} $$
By Stein's (1981) identity, $E\bigl(e_k H_t(Z_k)\bigr) = s_k\,E\,H'_t(Z_k)$, so that
$$ R'(t) \;=\; \sum_{k=1}^{M}\Bigl[s_k^2 + 2\,s_k^2\,H'_t(Z_k) + H_t^2(Z_k)\Bigr] $$
is an unbiased estimator of the risk $R(c', c)$. The Stein principle is to minimize $R'$ with respect to $t$ and take the minimizer
$$ t' = \arg\min_{t \ge 0} R'(t). \tag{4} $$
In the rest of this paragraph we formulate the Stein principle for the example of soft thresholding wavelet estimators. For soft thresholding (see above) we have
$$ H_t(z) = -\,\mathrm{sign}(z)\,\min(|z|, t), \qquad H'_t(z) = -\,\mathbf{1}\{|z| \le t\}, \qquad H_t^2(z) = \min(z^2, t^2), $$
so that
$$ R'(t) \;=\; \sum_{k=1}^{M}\Bigl[s_k^2 - 2\,s_k^2\,\mathbf{1}\{|Z_k| \le t\} + \min\bigl(Z_k^2, t^2\bigr)\Bigr]. $$
An equivalent expression is
$$ R'(t) \;=\; \Bigl[\sum_{k=1}^{M}\bigl(Z_k^2 - s_k^2\bigr)\Bigr] \;+\; \sum_{k:\,|Z_k| > t}\bigl(2\,s_k^2 + t^2 - Z_k^2\bigr). \tag{5} $$
The expression in square brackets in (5) does not depend on $t$. Thus, the definition (2) is equivalent to
$$ t' = \arg\min_{t \ge 0}\; \sum_{k:\,|Z_k| > t}\bigl(2\,s_k^2 + t^2 - Z_k^2\bigr). \tag{6} $$
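As a sanity check, the unbiasedness of $R'(t)$ can be verified by simulation; the following Monte Carlo sketch (illustrative Python, not part of the original text) compares the average of $R'(t)$ with the average loss of the soft-thresholding rule:

```python
import numpy as np

rng = np.random.default_rng(1)
M, t = 64, 0.8
c = rng.normal(size=M)                 # true (unknown) coefficients c_k
s = np.full(M, 0.5)                    # known scale parameters s_k

true_losses, estimates = [], []
for _ in range(20000):
    Z = c + s * rng.normal(size=M)
    soft = np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)
    true_losses.append(np.sum((soft - c) ** 2))
    # R'(t) = sum_k [ s_k^2 - 2 s_k^2 1{|Z_k| <= t} + min(Z_k^2, t^2) ]
    estimates.append(np.sum(s ** 2 - 2.0 * s ** 2 * (np.abs(Z) <= t)
                            + np.minimum(Z ** 2, t ** 2)))

# Both averages approximate the same risk sum_k E(S_t(Z_k) - c_k)^2
print(np.mean(true_losses), np.mean(estimates))
```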
Let $p = (p_1, \dots, p_M)$ be the permutation ordering the array $|Z_k|$, $k = 1, \dots, M$, such that
$$ |Z_{p_1}| \le |Z_{p_2}| \le \dots \le |Z_{p_M}|, $$
and set $|Z_{p_0}| = 0$. On each interval $|Z_{p_m}| \le t < |Z_{p_{m+1}}|$ the criterion in (6) is nondecreasing in $t$, so its minimum is attained at one of the points $|Z_{p_m}|$. According to (6) one obtains
$$ t' = |Z_{p_l}|, $$
where
$$ l = \arg\min_{0 \le m \le M}\; \sum_{k = m+1}^{M}\Bigl(2\,s_{p_k}^2 + Z_{p_m}^2 - Z_{p_k}^2\Bigr), $$
with the convention that the empty sum (for $m = M$) equals zero.
In particular, for $M = 1$ the above equation yields the estimator
$$ c'_1 = Z_1\,\mathbf{1}\bigl\{|Z_1| > \sqrt{2}\,s_1\bigr\}, $$
since $t' = 0$ if $Z_1^2 > 2\,s_1^2$ and $t' = |Z_1|$ otherwise.
It is easy to see that the computation of the estimate of $t^*$ defined above requires approximately $O(M \log M)$ operations, provided that a quick-sort algorithm is used to order the array $|Z_k|$, $k = 1, \dots, M$.
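A direct NumPy implementation of the threshold choice (6) might look as follows; the function name is hypothetical, and the brute-force evaluation of the criterion at every candidate value is used for clarity instead of the $O(M \log M)$ scheme mentioned above:

```python
import numpy as np

def sure_threshold(z, s):
    """Stein unbiased-risk threshold for soft thresholding.

    z : array of empirical coefficients Z_k
    s : array of their noise standard deviations s_k
    Returns the minimizer t' of (6), attained at 0 or at one of the |Z_k|.
    """
    z, s = np.asarray(z, float), np.asarray(s, float)
    candidates = np.concatenate(([0.0], np.sort(np.abs(z))))
    best_t, best_risk = 0.0, np.inf
    for t in candidates:
        mask = np.abs(z) > t
        risk = np.sum(2.0 * s[mask] ** 2 + t ** 2 - z[mask] ** 2)
        if risk < best_risk:
            best_t, best_risk = t, risk
    return best_t
```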
We proceed from the idealized model $Z_k = c_k + s_k e_k$ to a more realistic density estimation model. In the context of wavelet smoothing the principle of unbiased risk estimation gives the following possibilities for adaptation:
1. the level-by-level choice of the thresholds $t_j$;
2. the choice of the initial resolution level $j_0$;
3. the choice of the wavelet basis.
Please note that in XploRe we have implemented the first possibility. To demonstrate these possibilities, consider the family of wavelet estimators
$$ \hat f(x) \;=\; \sum_{k}\hat\alpha_{j_0 k}\,\varphi_{j_0 k}(x) \;+\; \sum_{j = j_0}^{j_1}\sum_{k} S_{t_j}\bigl(\hat\beta_{jk}\bigr)\,\psi_{jk}(x), \tag{7} $$
indexed by the level-wise thresholds $t_j$, the initial level $j_0$ and the chosen wavelet basis, where $\hat\alpha_{j_0 k}$ and $\hat\beta_{jk}$ denote the empirical coefficients of the father and mother wavelets, respectively, and $S_{t_j}$ is the soft thresholding transform.
In the third case we assume that the minimum is taken over a finite number of given wavelet bases.
Remark 1:
Since in practice the values of the different variances are not available, one can use their empirical versions instead. For example, if (7) is the wavelet density estimator based on the sample $X_1, \dots, X_n$, so that $\hat\beta_{jk} = n^{-1}\sum_{i=1}^{n}\psi_{jk}(X_i)$, one can replace the variance
$$ \sigma_{jk}^2 \;=\; \mathrm{Var}\bigl(\hat\beta_{jk}\bigr) \;=\; n^{-1}\,\mathrm{Var}\bigl(\psi_{jk}(X_1)\bigr) $$
by its estimator
$$ \hat\sigma_{jk}^2 \;=\; \frac{1}{n^2}\sum_{i=1}^{n}\bigl(\psi_{jk}(X_i) - \hat\beta_{jk}\bigr)^2 . $$
It is clear that this yields a consistent estimator for the variances under rather general assumptions on the basis functions and on the underlying density of the $X_i$'s.
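As a small sketch of Remark 1 (the callable `psi_jk` and all names are hypothetical), the empirical coefficient and its variance estimate could be computed as follows:

```python
import numpy as np

def empirical_coeff_and_var(psi_jk, sample):
    """Empirical wavelet coefficient of a density and the estimate of its
    variance, based on an i.i.d. sample X_1, ..., X_n."""
    values = psi_jk(np.asarray(sample, float))   # psi_{jk}(X_i), i = 1..n
    n = len(values)
    beta_hat = values.mean()                     # empirical coefficient
    var_hat = np.sum((values - beta_hat) ** 2) / n ** 2   # estimate of Var(beta_hat)
    return beta_hat, var_hat

# Example with the Haar mother wavelet on [0, 1)
haar_psi = lambda x: np.where((0 <= x) & (x < 0.5), 1.0,
                              np.where((0.5 <= x) & (x < 1), -1.0, 0.0))
beta_hat, var_hat = empirical_coeff_and_var(haar_psi, np.random.rand(500))
```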
Remark 2:
If one wants to threshold only the coefficients of the mother wavelets, which is usually the case, the function $H_t$ should be identically zero for the remaining (unthresholded) coefficients. Therefore, the corresponding terms $s_k^2 + 2\,s_k^2\,H'_t(Z_k) + H_t^2(Z_k)$ in $R'(t)$ should be replaced by $s_k^2$, and Stein's unbiased risk estimator takes the form
$$ R'(t) \;=\; \sum_{k \in K_0} s_k^2 \;+\; \sum_{k \notin K_0}\Bigl[s_k^2 + 2\,s_k^2\,H'_t(Z_k) + H_t^2(Z_k)\Bigr], $$
where $K_0$ denotes the index set of the unthresholded coefficients.
Let us now apply the Stein principle to a regression estimation
example. We choose a step function as the regression function.
The function was observed at 128 equispaced points and disturbed
with Gaussian noise with variance 1/128. We use the Stein rule only
for threshold choice 1 (level by level) and not for the cases 2 and 3
where the adaptive choice of j0 and of the basis is considered.
We thus choose, at each resolution level, the threshold $t'$ as the minimizer with respect to $t \ge 0$ of
$$ \sum_{k:\,|\hat\beta_{jk}| > t}\Bigl(2\,\hat\sigma_{jk}^2 + t^2 - \hat\beta_{jk}^2\Bigr), $$
where $\hat\beta_{jk}$ are the empirical wavelet coefficients at level $j$ and $\hat\sigma_{jk}^2$ their (estimated) variances, cf. (6).
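Finally, a self-contained sketch of the level-by-level procedure in the setting described above (128 equispaced points, noise variance 1/128); the Haar basis, the particular step function and all names are illustrative choices rather than those of the original example:

```python
import numpy as np
import pywt

def sure_threshold(z, s):
    """Minimizer over t >= 0 of sum_{|Z_k|>t} (2 s_k^2 + t^2 - Z_k^2), cf. (6)."""
    best_t, best_risk = 0.0, np.inf
    for t in np.concatenate(([0.0], np.sort(np.abs(z)))):
        mask = np.abs(z) > t
        risk = np.sum(2.0 * s[mask] ** 2 + t ** 2 - z[mask] ** 2)
        if risk < best_risk:
            best_t, best_risk = t, risk
    return best_t

rng = np.random.default_rng(0)
n = 128
x = np.arange(n) / n
f = np.where(x < 0.5, 0.0, 1.0)                        # an illustrative step function
y = f + rng.normal(scale=np.sqrt(1.0 / n), size=n)     # noise variance 1/128

# Orthonormal (Haar) transform: every coefficient keeps the noise level sigma
sigma = np.sqrt(1.0 / n)
coeffs = pywt.wavedec(y, "haar")
denoised = [coeffs[0]]                                 # coarse part left unthresholded
for c in coeffs[1:]:                                   # level-by-level threshold choice
    t = sure_threshold(c, np.full(len(c), sigma))
    denoised.append(np.sign(c) * np.maximum(np.abs(c) - t, 0.0))
f_hat = pywt.waverec(denoised, "haar")[:n]
```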