1.1 Introduction
The purpose of classical least squares estimation is to answer the question
``How does the conditional expectation $E(Y \mid X)$ of a random variable $Y$
depend on some explanatory variables $X$?,'' usually under some assumptions
about the functional form of $E(Y \mid X)$, e.g., linearity.
On the other hand, quantile regression enables us to pose such a question
at any quantile of the conditional distribution. Let us recall that
a real-valued random variable $Y$ is fully characterized by its distribution
function $F(y) = P(Y \leq y)$. Given $F$, we can for any $\tau \in (0,1)$
define the $\tau$-th quantile of $Y$ by
$$ Q_Y(\tau) = F^{-1}(\tau) = \inf\{\, y : F(y) \geq \tau \,\}. \qquad\qquad (1.1) $$
The quantile function, i.e., $Q_Y(\tau)$ as a function of $\tau \in (0,1)$,
completely describes the distribution of the random variable $Y$. Hence, the
estimation of conditional quantile functions allows us to obtain a more
complete picture of the dependence of the conditional distribution of $Y$
on $X$. In other words, we gain the possibility to investigate the influence
of the explanatory variables on the shape of the distribution.
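To make definition (1.1) concrete, the following short Python sketch (with a purely illustrative simulated sample) computes $\inf\{\, y : F_n(y) \geq \tau \,\}$ from the empirical distribution function $F_n$:

```python
import numpy as np

# Sketch of definition (1.1) applied to the empirical distribution function:
# the tau-th quantile is the smallest sample value y with F_n(y) >= tau.
def quantile_by_definition(sample, tau):
    ys = np.sort(sample)                           # candidate values of y
    ecdf = np.arange(1, len(ys) + 1) / len(ys)     # F_n(y) at each order statistic
    return ys[np.searchsorted(ecdf, tau)]          # inf{y : F_n(y) >= tau}

rng = np.random.default_rng(0)
y = rng.normal(size=1000)                          # illustrative sample
for tau in (0.1, 0.5, 0.9):
    print(tau, quantile_by_definition(y, tau))     # close to Phi^{-1}(tau)
```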
To illustrate the concept of quantile regression, we consider three
kinds of linear regression models. First, let us take a sample
$\{(x_i, y_i)\}_{i=1}^{n}$ and discuss a linear regression model with
independent errors identically distributed according to a distribution
function $F_{\varepsilon}$:
$$ y_i = x_i^{\top}\beta + \varepsilon_i, \qquad i = 1, \ldots, n. \qquad\qquad (1.2) $$
The corresponding conditional quantile functions of $y_i$ are
$$ Q_{y_i}(\tau \mid x_i) = x_i^{\top}\beta + F_{\varepsilon}^{-1}(\tau), $$
where $F_{\varepsilon}^{-1}$ denotes the quantile function corresponding to the
distribution function $F_{\varepsilon}$; this follows because
$P(y_i \leq q \mid x_i) = F_{\varepsilon}(q - x_i^{\top}\beta) \geq \tau$
holds exactly for $q \geq x_i^{\top}\beta + F_{\varepsilon}^{-1}(\tau)$.
Apparently, the quantile functions for different values of $\tau$ are just
vertically shifted with respect to each other (they differ only in the
absolute term $F_{\varepsilon}^{-1}(\tau)$).
Therefore, the least squares estimate (or a more robust alternative) of the
conditional expectation, together with some associated measure of dispersion,
would usually be a satisfactory result in such a simple model.
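A small simulation can illustrate this point; the design, the coefficients, and the use of the statsmodels package below are illustrative choices, not part of the original example:

```python
import numpy as np
import statsmodels.api as sm

# Model (1.2) with i.i.d. standard normal errors: fitted quantile lines
# should share the slope and differ only by the shift F^{-1}(tau).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=500)
y = 2.0 + 1.5 * x + rng.normal(size=500)           # beta = (2.0, 1.5)

X = sm.add_constant(x)
for tau in (0.1, 0.5, 0.9):
    fit = sm.QuantReg(y, X).fit(q=tau)
    print(f"tau={tau}: intercept={fit.params[0]:.2f}, slope={fit.params[1]:.2f}")
# Slopes stay near 1.5; intercepts move by roughly Phi^{-1}(tau).
```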
Next, the situation is a little bit more complicated if the model exhibits
some kind of heteroscedasticity. Assuming, for example, that
$\varepsilon_i = \sigma(x_i) e_i$ in equation (1.2), where $e_i$ are
independent and identically distributed errors with distribution function
$F_e$, the conditional quantile functions can be expressed as
$$ Q_{y_i}(\tau \mid x_i) = x_i^{\top}\beta + \sigma(x_i) F_e^{-1}(\tau) $$
($\sigma(\cdot)$ can, of course, depend also on other variables than $x_i$,
and in the most general case, there does not have to be any known function
characterizing the heteroscedasticity of $\varepsilon_i$ at all).
Therefore, the conditional quantile functions are no longer just parallel
to each other--depending on the form of $\sigma(\cdot)$, the coefficients at
$x_i$ can be different for different quantiles $\tau$, since the effect
of a particular explanatory variable now depends on $\beta$, the form of
$\sigma(\cdot)$, and $\tau$.
Such a form of heteroscedasticity can occur, for instance, if we examine
the dependence of a household's consumption on the household income.
Families with higher incomes have a wider range of possibilities for
splitting their earnings between consumption and saving, and can more easily
redistribute their income across time as well. Therefore, it is quite natural
to expect that the spread of consumption choices observed at higher levels of
income is larger than at lower income levels.
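The effect of such heteroscedasticity on quantile regression estimates can again be illustrated by a small simulation; the specific scale function $\sigma(x) = 0.5x$ and the normal errors below are assumptions made for the sketch:

```python
import numpy as np
import statsmodels.api as sm

# Heteroscedastic variant of model (1.2): errors scaled by sigma(x) = 0.5 * x,
# so the tau-th conditional quantile has slope 1.5 + 0.5 * Phi^{-1}(tau),
# which now varies across quantiles.
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=1000)
y = 2.0 + 1.5 * x + 0.5 * x * rng.normal(size=1000)

X = sm.add_constant(x)
for tau in (0.1, 0.5, 0.9):
    fit = sm.QuantReg(y, X).fit(q=tau)
    print(f"tau={tau}: slope={fit.params[1]:.2f}")
# Expected slopes: about 0.86 (tau=0.1), 1.50 (tau=0.5), 2.14 (tau=0.9).
```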
Finally, it is possible to think about models that exhibit some
(e.g., linear) relationship between the conditional quantiles of a dependent
variable and the explanatory variables, but where the relationship itself
depends on the quantile under consideration (i.e., $\beta$ in model (1.2)
would be a function of $\tau$ in such a case). For example, the amount of
sales of a commodity certainly depends on its price and advertisement
expenditures. However, it is imaginable that the effects of price or
advertisement on the amount of sales are quite different for a commodity sold
in high volumes and a similar one with low sales. Hence, similarly to the
heteroscedasticity case, we see that the conditional quantile functions are
not necessarily just vertically shifted with respect to each other, and
consequently, their estimation can provide a more complete description of the
model under consideration than the usual expectation-oriented regression.
To provide a real-data example, let us look at the pullover data set,
which contains information on the amount of sales $y_i$ of pullovers in 10
periods, their prices $x_{i1}$, the corresponding advertisement costs
$x_{i2}$, and the presence of shop assistants $x_{i3}$ in hours. For the sake
of simplicity, we neglect for now possible difficulties related to finding
the correct specification of a parametric model and assume a simple linear
regression model.
1. The standard linear regression model has the form
$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i, \qquad i = 1, \ldots, 10. \qquad\qquad (1.3) $$
Numerical results obtained by the ordinary least squares estimator
for the given data set are presented in Table 1.1.
2. In the quantile regression framework, the model is for a given
$\tau \in (0,1)$ characterized by
$$ Q_{y_i}(\tau \mid x_i) = \beta_0(\tau) + \beta_1(\tau) x_{i1} + \beta_2(\tau) x_{i2} + \beta_3(\tau) x_{i3} \qquad\qquad (1.4) $$
(note that the parameters are now functions of $\tau$).
Numerical results for several choices of $\tau$ are presented in
Table 1.2.
Table 1.2: The QR estimate of model (1.4).  qr02.xpl

| $\tau$ | $\hat{\beta}_0(\tau)$ | $\hat{\beta}_1(\tau)$ | $\hat{\beta}_2(\tau)$ | $\hat{\beta}_3(\tau)$ |
|--------|-----------------------|-----------------------|-----------------------|-----------------------|
| 0.1    | 87.6                  | -0.12                 | 0.57                  | 0.29                  |
| 0.3    | 156.6                 | -0.46                 | 0.58                  | -0.05                 |
| 0.5    | 97.3                  | -0.40                 | 0.60                  | 0.59                  |
| 0.7    | 56.1                  | -0.11                 | 0.34                  | 1.09                  |
| 0.9    | 56.1                  | -0.11                 | 0.34                  | 1.09                  |
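Readers who wish to reproduce estimates of this kind with current software (the link qr02.xpl refers to the original XploRe code) could proceed roughly as follows; the file name pullover.dat and the column ordering are assumptions made for the sketch:

```python
import numpy as np
import statsmodels.api as sm

# Assumed layout: columns are (sales, price, advertisement, assistant hours).
data = np.loadtxt("pullover.dat")                  # hypothetical file name
y, X = data[:, 0], sm.add_constant(data[:, 1:4])

print(sm.OLS(y, X).fit().params)                   # model (1.3): conditional mean
for tau in (0.1, 0.3, 0.5, 0.7, 0.9):              # model (1.4): conditional quantiles
    print(tau, sm.QuantReg(y, X).fit(q=tau).params)
```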
Comparing the two methods, it is easy to see that the traditional
estimation of the conditional expectation (1.3) provides an
estimate of a single regression function, which describes the effects of the
explanatory variables on the average sales, whereas quantile regression
results in several estimates, one for each quantile, and hence gives us an
idea how the effects of the price, the advertisement expenditures, and the
presence of shop assistants may vary across quantiles. For example,
the impact of the pullover price on the (conditional) expected sales as
obtained from the least squares estimate is expressed by the single
coefficient estimate $\hat{\beta}_1$ (see Table 1.1). On the other hand, the
quantile regression estimates indicate that the negative impact of price on
sales is quite important especially in the middle part of the sales
distribution (i.e., $\tau = 0.3$ and $\tau = 0.5$ in Table 1.2), while being
less important for pullovers whose sales lie in the upper or lower tail of
the sales distribution.