2. Least Trimmed Squares
In this section the least trimmed squares estimator, its robustness
and asymptotic properties, and computational aspects will be discussed.
2.1 Definition
First of all, let us make precise the verbal description of the estimator given
in the previous section. Consider a linear regression model for a
sample $(x_i, y_i),\ i = 1, \dots, n$, with a response variable $y_i$ and a vector of $p$
explanatory variables $x_i$:

$$ y_i = x_i^{T}\beta + \varepsilon_i, \qquad i = 1, \dots, n. \qquad (1) $$
The least trimmed squares estimator $\hat{\beta}^{(LTS)}$ is defined as

$$ \hat{\beta}^{(LTS)} = \mathop{\rm argmin}\limits_{\beta \in \mathbb{R}^{p}} \sum_{i=1}^{h} r_{[i]}^{2}(\beta), \qquad (2) $$
where $r_{[i]}^{2}(\beta)$ represents the $i$-th order statistic among the squared
residuals $r_{1}^{2}(\beta), \dots, r_{n}^{2}(\beta)$ with $r_{i}(\beta) = y_i - x_i^{T}\beta$
(we believe that the notation is self-explanatory).
The so-called trimming constant $h$ has to satisfy $n/2 \le h \le n$.
This constant determines the breakdown point of the LTS estimator, since
the definition (2) implies that the $n - h$ observations with
the largest residuals do not affect the estimator (except for the fact
that the squared residuals of the excluded points have to be larger than
the $h$-th order statistic of the squared residuals).
The maximum breakdown point is attained for $h = \lfloor n/2 \rfloor + \lfloor (p+1)/2 \rfloor$
(see Rousseeuw and Leroy, 1987, Theorem 6),
whereas for $h = n$, which corresponds to the least
squares estimator, the breakdown point equals 0.
More on the choice of the trimming constant can be found
in Subsection 3.1.
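To make definition (2) concrete, the following short sketch in Python (not XploRe code, merely an illustration; the function and variable names are ours) evaluates the LTS objective for a candidate coefficient vector and computes the trimming constant corresponding to the maximum-breakdown choice quoted above. X and y are assumed to be NumPy arrays, with an intercept column already included in X if desired.

```python
import numpy as np

def lts_objective(beta, X, y, h):
    """Sum of the h smallest squared residuals, i.e. the criterion in (2)."""
    r2 = (y - X @ beta) ** 2          # squared residuals r_i^2(beta)
    return np.sort(r2)[:h].sum()      # sum of the order statistics r_[1]^2, ..., r_[h]^2

def h_max_breakdown(n, p):
    """Trimming constant h = floor(n/2) + floor((p+1)/2) giving the maximum breakdown point."""
    return n // 2 + (p + 1) // 2
```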
Before proceeding to the description of how such an estimate can be
evaluated in XploRe, several issues have to be discussed, namely the
existence of this estimator and its statistical properties (a discussion of
its computational aspects is postponed to Subsection 2.2).
First, the existence of the optimum in (2) under some
reasonable assumptions can be justified in the following way:
the minimization of the objective function in
(2) can be viewed as a process in which we repeatedly choose
a subsample of $h$ observations and find some $\beta$ minimizing the sum of
squared residuals for the selected subsample. Doing this for every subsample (there
are $\binom{n}{h}$ of them), we get $\binom{n}{h}$ candidates for the
LTS estimate, and the one that yields the smallest value of the objective
function is the final estimate. Therefore, the existence of the LTS
estimator is basically equivalent to the existence of the least squares
estimator for subsamples of size $h$.
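The exhaustive strategy just described can be written down directly. The following sketch in Python (again only an illustration with our own naming, not the XploRe implementation) fits ordinary least squares on every subsample of size h and keeps the candidate with the smallest value of the trimmed criterion (2).

```python
from itertools import combinations
import numpy as np

def lts_exact(X, y, h):
    """Exact LTS by full search over all C(n, h) subsamples (feasible only for small n)."""
    n = len(y)
    best_beta, best_crit = None, np.inf
    for subset in combinations(range(n), h):
        idx = list(subset)
        # least squares fit on the selected h-tuple of observations
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        r2 = (y - X @ beta) ** 2
        crit = np.sort(r2)[:h].sum()   # value of objective (2) for this candidate
        if crit < best_crit:
            best_beta, best_crit = beta, crit
    return best_beta
```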
Let us now briefly discuss various statistical properties of LTS.
First, the least trimmed squares estimator is regression, scale, and affine equivariant
(see, for example, Rousseeuw and Leroy, 1987, Lemma 3, Chapter 3).
We have also already remarked that the breakdown point of LTS reaches the upper bound
for regression equivariant estimators if the trimming
constant $h$ equals $\lfloor n/2 \rfloor + \lfloor (p+1)/2 \rfloor$. Furthermore, the $\sqrt{n}$-consistency
and asymptotic normality of LTS can be proved for a general linear regression
model with continuously distributed disturbances (Vísek, 1999b).
Besides these important statistical properties, there are also some
less practical aspects. The main one follows directly from the
non-continuity of the LTS objective function. Because of this, the sensitivity of
the least trimmed squares estimator to a change of one or several observations can
sometimes be rather high (Vísek, 1999a). This property, often referred to as high
subsample sensitivity, is closely connected with the possibility that a change or omission of
some observations may considerably change the subset of the sample that is treated
as the set of ``correct'' data points. This does not necessarily have to be seen
as a disadvantage; the point of view merely depends on the purpose for which we use LTS.
See Vísek (1999b) and Section 3 for further information.
2.2 Computation
b = lts(x, y {, h, all, mult})
    computes the least trimmed squares estimate of a linear regression model
The quantlet of quantlib metrics which serves for least trimmed
squares estimation is lts. To understand the function of its
parameters, the algorithm used for the evaluation of LTS has to be
described first; the description of the quantlet itself follows afterwards.
There are two possible strategies for determining the least trimmed squares
estimate. The first one relies on a full search through all
subsamples of size $h$ and the consecutive LS estimation as described
in the previous subsection, and thus yields the exact solution
(neglecting ubiquitous numerical errors).
Unfortunately, it is hardly possible to examine the total of $\binom{n}{h}$
subsamples unless a very small sample is analyzed. Therefore,
in most cases (when the number of observations is larger) only an approximation can
be computed (note, please, that in the examples presented here we compute the
exact LTS estimates as described above, and thus the computation is relatively
slow). The present algorithm computes the approximation in the following way: having
randomly selected an $h$-tuple of observations, we apply the least squares
method to it and, for the estimated regression coefficients, evaluate
residuals for all $n$ observations. Then the $h$-tuple of data points with
the smallest squared residuals is selected and the LS estimation takes
place again. This step is repeated as long as the sum
of the $h$ smallest squared residuals decreases. When no further
improvement can be found this way, a new subsample of $h$ observations is
randomly generated and the whole process is repeated. The search is stopped
either when the same estimate of the model has been found a prescribed number of
times (an a priori given positive integer) or when an a priori given number of
randomly generated subsamples has been examined. A more refined version of
this algorithm, suitable also for large data sets, was proposed and described
by Rousseeuw and Van Driessen (1999).
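As an illustration only (not the code of the lts quantlet), the approximation scheme described above can be sketched in Python roughly as follows. The names and the simple stopping rule based on a fixed number of random starts are our own simplifications; the inner refinement loop is the "concentration step" idea of Rousseeuw and Van Driessen (1999).

```python
import numpy as np

def lts_approx(X, y, h, n_starts=500, rng=None):
    """Approximate LTS: random h-subsets refined by repeated LS fits."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    best_beta, best_crit = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=h, replace=False)        # random h-tuple
        prev_crit = np.inf
        while True:
            beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
            r2 = (y - X @ beta) ** 2                      # residuals for all n observations
            idx = np.argsort(r2)[:h]                      # h points with smallest squared residuals
            crit = r2[idx].sum()                          # current value of objective (2)
            if crit >= prev_crit:                         # no further decrease: stop refining
                break
            prev_crit = crit
        if crit < best_crit:                              # keep the best candidate over all starts
            best_beta, best_crit = beta, crit
    return best_beta
```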
From now on, the noninteractive quantlet lts is going to be described.
The quantlet expects at least two input parameters: an $n \times p$
matrix x that contains the $n$ observations for each of the $p$ explanatory
variables and an $n \times 1$ vector y of the $n$ observed responses.
If an intercept is to be included in the regression model,
an $n \times 1$ vector of ones can be concatenated to the matrix x
in the following way:
x = matrix(rows(x))~x
Neither the matrix x nor the vector y should contain
missing (NaN) or infinite values (Inf, -Inf).
Their presence can be identified by
isNaN
or
isNumber,
and the invalid observations should be dealt with before running
lts,
e.g., omitted using
paf.
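For readers outside XploRe, the same preprocessing step can be sketched in Python (purely illustrative; within XploRe the isNaN, isNumber, and paf quantlets are the tools to use), assuming x and y are NumPy arrays:

```python
import numpy as np

# keep only rows where every entry of x and the corresponding y are finite numbers
ok = np.isfinite(y) & np.all(np.isfinite(x), axis=1)
x, y = x[ok], y[ok]
```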
These two parameters are enough for the most basic use of the quantlet.
Typing
b = lts(x,y)
results in an approximation of the LTS estimate for the most robust choice of $h$
using the default number of iterations.
Though this might suffice for some purposes, in most cases we would also like
to specify the third parameter, the trimming constant $h$.
So probably the most common use takes the form
b = lts(x,y,h)
The last two parameters of the quantlet, namely all and
mult, provide a way to influence how the estimate is actually computed.
The parameter all allows one to switch from the approximation
algorithm, which corresponds to all equal to 0 and is used by default,
to the precise computation of LTS, which takes place if all is nonzero.
As the precise calculation can take quite a long time unless the given sample is
really small, a warning together with a possibility to cancel the evaluation
is issued whenever the total number of iterations is too high. Finally, the last
parameter mult, which equals 1 by default, offers the possibility to
adjust the maximum number of randomly generated subsamples in the case of
the approximation algorithm: this maximum is calculated from the size of the
given sample and the trimming constant and is subsequently multiplied by
mult.
As a real example, let us show how the time trend in the phonecal
data set was estimated in Section 1. The data set is
two-dimensional, having only one explanatory variable x, the year, in
the first column and the response variable y,
the number of international
phone calls, in the second column.
In order to obtain the LTS estimate for the linear
regression of y on a constant term and x,
you have to type at the command line or in the editor window
z = read("phonecal")
x = matrix(rows(z)) ~ z[,2]
y = z[,3]
b = lts(x,y)
b
The result of the above example should appear in the XploRe output window as follows:
Contents of coefs
[1,] -5.6522
[2,] 0.11649