Let us turn to estimating the conditional mean function $f(\cdot)$ of a nonlinear autoregressive process of order one (NAR(1) process),
$$Y_t = f(Y_{t-1}) + \xi_t .$$
Before one applies Nadaraya-Watson estimation, one should be aware of the conditions that the underlying data generating mechanism has to fulfil for the estimator to have nice asymptotic properties: most importantly, the function $f(\cdot)$ has to be continuous, the stochastic process has to be stationary, and the dependence among the observations must decline fast enough as the distance between the observations increases. For measuring dependence in nonlinear time series one commonly uses various mixing concepts. For example, a sequence $\{Y_t\}$ is said to be $\alpha$-mixing (strong mixing) (Robinson, 1983) if
$$\alpha(n) = \sup_{k}\; \sup_{A \in \mathcal{F}_{-\infty}^{k},\, B \in \mathcal{F}_{k+n}^{\infty}} \left| P(A \cap B) - P(A)\,P(B) \right| \longrightarrow 0 \quad \text{as } n \to \infty ,$$
where $\mathcal{F}_{i}^{j}$ denotes the $\sigma$-field generated by $Y_i, \ldots, Y_j$.
The quantlet regxest computes Nadaraya-Watson estimates of $f(x)$ for an array of different $x$'s. Its syntax is
mh = regxest(x{, h, K, v})
with the input variables
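To make the computation behind such an estimate transparent, the following is a minimal NumPy sketch of a Nadaraya-Watson estimator with a Gaussian kernel. It is an illustration under names of our own choosing (nw_estimate, grid), not the regxest quantlet, and the simulated AR(1) series merely stands in for real data.

import numpy as np

def nw_estimate(x, y, grid, h):
    """Nadaraya-Watson estimate of E[Y | X = x0] at each point of 'grid'
    using a Gaussian kernel with bandwidth h."""
    u = (grid[:, None] - x[None, :]) / h      # scaled distances to every observation
    w = np.exp(-0.5 * u**2)                   # Gaussian kernel weights (constants cancel)
    return (w @ y) / w.sum(axis=1)            # locally weighted average of the responses

# NAR(1) use: regress Y_t on Y_{t-1}
rng = np.random.default_rng(0)
series = np.zeros(200)
for t in range(1, 200):                       # a simple stationary AR(1), for illustration only
    series[t] = 0.6 * series[t - 1] + rng.standard_normal()
x, y = series[:-1], series[1:]                # first lag and dependent variable
grid = np.linspace(x.min(), x.max(), 100)
h = 0.2 * (x.max() - x.min())                 # the same crude bandwidth rule used below
fhat = nw_estimate(x, y, grid, h)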
In order to illustrate the methods presented in this chapter, we model the dynamics underlying the famous annual Canadian lynx trappings for 1821-1934, see e.g. Brockwell and Davis (1991, Appendix, Series G). Figures 1 and 2, showing the original and the logged time series, are obtained with the quantlet
library("plot")
setsize(640,480)
lynx = read("lynx.dat") ; read data
d1 = createdisplay(1,1)
x1 = #(1821:1934)~lynx
setmaskl (x1, (1:rows(x1))', 0, 1)
show(d1,1,1,x1) ; plot data
setgopt(d1,1,1,"title","Annual Canadian Lynx Trappings, 1821-1934")
setgopt(d1,1,1,"xlabel","Years","ylabel","Lynx")
d2 = createdisplay(1,1)
x2 = #(1821:1934)~log(lynx)
setmaskl (x2, (1:rows(x2))', 0, 1)
show(d2,1,1,x2) ; plot data
setgopt(d2,1,1,"title","Logs of Annual Canadian Lynx Trappings, 1821-1934")
setgopt(d2,1,1,"xlabel","Years","ylabel","Lynx")
library("smoother")
library("plot")
setsize(640,480)
; data preparation
lynx = read("lynx.dat")
lynxrows = rows(lynx)
lag1 = lynx[1:lynxrows-1] ; vector of first lag
y = lynx[2:lynxrows] ; vector of dep. var.
data = lag1~y
data = log(data)
; estimation
h = 0.2*(max(data[,1])-min(data[,1])); crude bandwidth
"Bandwidth used" h
mh = regxest(data,h) ; N-W estimation
; graphics
mh = setmask(mh,"line","blue")
xy = setmask(data,"cross","small")
plot(xy,mh)
setgopt(plotdisplay,1,1,"title","Estimated NAR(1) mean function")
setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")
While the Nadaraya-Watson function estimate is simple to compute, it may suffer from a substantial estimation bias due to the underlying zero-order Taylor expansion. Therefore, it seems natural to increase the order of the expansion. For example, by selecting the order $p=1$ one obtains the local linear estimator, which corresponds to the weighted minimization problem
$$\min_{c_0, c_1} \sum_{t=2}^{T} \left\{ Y_t - c_0 - c_1 \left( Y_{t-1} - x \right) \right\}^2 K\!\left( \frac{Y_{t-1} - x}{h} \right) ,$$
with the estimate of $f(x)$ given by $\hat c_0$. Local polynomial estimation is implemented in the quantlet lpregxest, which is called by
y = lpregxest (x,h {,p {,v}})
where the inputs are:
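For illustration, here is a minimal NumPy sketch of the local linear estimator obtained by solving the weighted least squares problem above at each evaluation point; local_linear and its arguments are names chosen for this sketch, not the lpregxest interface.

import numpy as np

def local_linear(x, y, grid, h):
    """Local linear estimate of the conditional mean on 'grid' (Gaussian kernel)."""
    fhat = np.empty(len(grid))
    for j, x0 in enumerate(grid):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)           # kernel weights around x0
        X = np.column_stack([np.ones_like(x), x - x0])   # local design: intercept and slope term
        WX = X * w[:, None]
        c = np.linalg.solve(X.T @ WX, WX.T @ y)          # weighted least squares
        fhat[j] = c[0]                                   # the local intercept estimates f(x0)
    return fhat

Dropping the slope column of the local design matrix reduces this to the local constant (Nadaraya-Watson) estimator.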
The following quantlet visualizes the difference between local constant and local linear estimation of the first-order nonlinear autoregressive mean function for the lynx data. It produces Figure 4, where the solid and dashed lines display the local linear and local constant estimates, respectively. One notices that the local linear function estimate shows less variation.
library("smoother")
library("plot")
setsize(640,480)
; data preparation
lynx = read("lynx.dat")
lynxrows = rows(lynx)
lag1 = lynx[1:lynxrows-1] ; vector of first lag
y = lynx[2:lynxrows] ; vector of dep. var.
data = lag1~y
data = log(data)
; estimation
h = 0.2*(max(data[,1])-min(data[,1])); crude bandwidth
mh = regxest(data,h) ; N-W estimation
mhlp = lpregxest(data,h) ; local linear estimation
; graphics
mh = setmask(mh,"line","blue","dashed")
mhlp = setmask(mhlp,"line","red")
xy = setmask(data,"cross","small")
plot(xy,mh,mhlp)
setgopt(plotdisplay,1,1,"title","Estimated NAR(1) mean function")
setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")
So far we have used a primitive way of selecting the bandwidth parameter $h$. Of course, there are better methods for bandwidth choice. They are all based on minimizing some estimated distance measure. Since we are interested in one bandwidth for various $x$, we look at ``global'' distances like, for instance, the integrated squared error (ISE)
$$\mathrm{ISE}(h) = \int \left\{ \hat f_h(x) - f(x) \right\}^2 w(x)\, g(x)\, dx , \qquad (10)$$
where $w(\cdot)$ is a weight function and $g(\cdot)$ denotes the stationary density of the process.
The interactive quantlet regxbwsel offers cross-validation and other bandwidth selection methods; the latter may be used in the case of independent data.
It is called by
{hcrit, crit} = regxbwsel(x{, h, K})
with the input variables:
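As a concrete illustration of the cross-validation criterion, the following NumPy sketch computes the leave-one-out score of the Nadaraya-Watson estimator over a grid of bandwidths. The names cv_score, cv_bandwidth and h_grid are chosen for this sketch only; this is not the regxbwsel quantlet.

import numpy as np

def cv_score(x, y, h):
    """Leave-one-out cross-validation score of the Nadaraya-Watson estimator."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)                   # Gaussian kernel weights
    np.fill_diagonal(w, 0.0)                  # leave the i-th observation out
    fhat_loo = (w @ y) / w.sum(axis=1)        # leave-one-out fit at each design point
    return np.mean((y - fhat_loo) ** 2)

def cv_bandwidth(x, y, h_grid):
    """Bandwidth in h_grid that minimises the cross-validation score."""
    scores = np.array([cv_score(x, y, h) for h in h_grid])
    return h_grid[np.argmin(scores)], scores

For the logged lynx regression one would call, for example, cv_bandwidth(lag1, y, np.linspace(0.3, 2.0, 50)) and select the minimizing bandwidth.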
library("smoother")
library("plot")
setsize(640,480)
; data preparation
lynx = read("lynx.dat")
lynxrows = rows(lynx)
lag1 = lynx[1:lynxrows-1] ; vector of first lag
y = lynx[2:lynxrows] ; vector of dep. var.
data = lag1~y
data = log(data)
; interactive bandwidth selection
tmp = regxbwsel(data)
It was already noted that the optimal bandwidth with respect to the ISE (6) or the ASE (7) may vary across samples. In order to obtain a sample-independent optimal bandwidth one may consider the mean integrated squared error (MISE), which is the expectation of the ISE taken over all possible samples.
It is well known that if a fitted model is misspecified, the resulting inference, for example confidence intervals or significance tests, can be misleading. One way to check whether a chosen model is correctly specified is to investigate the resulting residuals. Most importantly, one checks for autocorrelation remaining in the residuals. This can easily be done by inspecting the graph of the autocorrelation function using the quantlet acfplot. It only requires the vector x of estimated residuals as its input variable. The quantlet also draws 95% confidence intervals for the case of no autocorrelation.
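The following NumPy/matplotlib sketch shows the kind of plot this produces: sample autocorrelations of the residuals together with approximate 95% bounds of plus or minus 1.96 divided by the square root of n, valid under the hypothesis of no autocorrelation. The function names are chosen for this sketch; it is not the acfplot quantlet.

import numpy as np
import matplotlib.pyplot as plt

def sample_acf(eps, max_lag=20):
    """Sample autocorrelations of the residuals up to max_lag."""
    eps = eps - eps.mean()
    denom = np.dot(eps, eps)
    return np.array([1.0 if k == 0 else np.dot(eps[k:], eps[:-k]) / denom
                     for k in range(max_lag + 1)])

def acf_plot(eps, max_lag=20):
    rho = sample_acf(eps, max_lag)
    band = 1.96 / np.sqrt(len(eps))           # 95% bounds under no autocorrelation
    plt.stem(np.arange(max_lag + 1), rho)
    plt.axhline(band, linestyle="--")
    plt.axhline(-band, linestyle="--")
    plt.xlabel("lag")
    plt.ylabel("autocorrelation")
    plt.title("Autocorrelation function of residuals")
    plt.show()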
Another issue is to check the normality of the residuals. This is commonly done using the test suggested by Bera and Jarque (1982), usually referred to as the Jarque-Bera (JB) test. It can be computed with the quantlet jarber, which is called by
{jb, probjb, sk, k} = jarber(resid, printout)
with input variables
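A minimal NumPy/SciPy sketch of the Jarque-Bera statistic is given below. The return order mirrors the {jb, probjb, sk, k} output of jarber in spirit, but the function itself is only an illustration, not the quantlet.

import numpy as np
from scipy.stats import chi2

def jarque_bera(resid):
    """Jarque-Bera statistic, its asymptotic p-value, skewness and kurtosis."""
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    z = resid - resid.mean()
    s2 = np.mean(z**2)
    sk = np.mean(z**3) / s2**1.5              # sample skewness (0 under normality)
    k = np.mean(z**4) / s2**2                 # sample kurtosis (3 under normality)
    jb = n / 6.0 * (sk**2 + (k - 3.0)**2 / 4.0)
    probjb = 1.0 - chi2.cdf(jb, df=2)         # JB is asymptotically chi-square(2)
    return jb, probjb, sk, k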
; load required quantlets
library("smoother")
library("plot")
func("acfplot")
func("jarber")
setsize(640,480)
; data preparation
lynx = read("lynx.dat")
lynxrows = rows(lynx)
lag1 = lynx[1:lynxrows-1] ; vector of first lag
y = lynx[2:lynxrows] ; vector of dep. var.
data = lag1~y
data = log(data)
datain = data~#(1:lynxrows-1) ; add index to data
dataso = sort(datain,1) ; sorted data
; estimation
h = 1.12085 ; Cross-validation bandwidth
mhlp = regxest(dataso[,1|2],h) ; local constant estimation
; graphics
mhlp = setmask(mhlp,"line","red")
xy = setmask(data,"cross","small")
plot(xy,mhlp)
setgopt(plotdisplay,1,1,"title","Estimated NAR(1) mean function")
setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")
; diagnostics
yhatso = mhlp.data[,2]~dataso[,3] ; sorted est. fct. values
yhat = sort(yhatso,2) ; undo sorting
eps = data[,2] - yhat[,1] ; compute residuals
acfplot(eps) ; plot autocorrelation function of res.
setgopt(dacf,1,1,"title","Autocorrelation function of NAR(1) residuals")
;
{jb,probjb,sk,k} = jarber(eps,1) ; Jarque-Bera test for normality of residuals
Once one has selected the bandwidth and checked the residuals, one often wants to investigate the variance of the estimated autoregression function. Under appropriate conditions, the variance of both the Nadaraya-Watson and the local linear estimator can be approximated by
$$\mathrm{Var}\left\{ \hat f_h(x) \right\} \approx \frac{\sigma^2(x)\, \|K\|_2^2}{T h\, \mu(x)} , \qquad (13)$$
where $\sigma^2(x)$ denotes the conditional variance of $Y_t$ given $Y_{t-1}=x$, $\mu(x)$ the stationary density of the process, and $\|K\|_2^2 = \int K^2(u)\,du$. A feasible variance estimate is obtained by replacing $\sigma^2(x)$ and $\mu(x)$ with kernel estimates,
$$\hat v_h(x) = \frac{\hat\sigma_h^2(x)\, \|K\|_2^2}{T h\, \hat\mu_h(x)} . \qquad (14)$$
Based on these estimates the quantlet
regxci
computes pointwise
confidence intervals using the Nadaraya-Watson estimator. It is called with
{mh, clo, cup} = regxci(x{, h, alpha, K, xv})
with input variables:
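The sketch below illustrates how such pointwise intervals can be formed from a Nadaraya-Watson fit, using the standard asymptotic variance approximation with kernel estimates plugged in for the conditional variance and the stationary density. It mirrors the {mh, clo, cup} outputs of regxci only in spirit; all names are chosen for this illustration.

import numpy as np
from scipy.stats import norm

def nw_confidence_bands(x, y, grid, h, alpha=0.05):
    """Nadaraya-Watson fit with pointwise (1 - alpha) confidence intervals."""
    T = len(x)
    u = (grid[:, None] - x[None, :]) / h
    w = norm.pdf(u)                                        # Gaussian kernel weights
    wsum = w.sum(axis=1)
    fhat = (w @ y) / wsum                                  # conditional mean estimate
    sig2 = np.maximum((w @ y**2) / wsum - fhat**2, 0.0)    # conditional variance estimate
    mu = wsum / (T * h)                                    # kernel density estimate of the lag
    K2 = 1.0 / (2.0 * np.sqrt(np.pi))                      # ||K||_2^2 for the Gaussian kernel
    se = np.sqrt(sig2 * K2 / (T * h * mu))                 # asymptotic standard error
    z = norm.ppf(1.0 - alpha / 2.0)
    return fhat, fhat - z * se, fhat + z * se              # estimate, lower and upper band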
In Subsection 1.3 we found that the NAR(1) model for the lynx data is misspecified. It is therefore not appropriate for illustrating the computation of pointwise confidence intervals. Instead we will use a simulated time series. The quantlet below generates 150 observations of a stationary exponential AR(1) process
$$Y_t = \left\{ \phi_1 + \phi_2\, e^{-\gamma Y_{t-1}^2} \right\} Y_{t-1} + \xi_t , \qquad \phi_1 = 0.3,\ \phi_2 = 2.2,\ \gamma = 0.1 , \qquad (15)$$
with standard normal innovations $\xi_t$, estimates the mean function with a cross-validated bandwidth, and plots the estimate together with pointwise confidence intervals and the true function.
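Before the XploRe listing, here is a minimal NumPy sketch that simulates such a series directly from the recursion in (15); gen_expar1 and its burn-in argument are illustrative choices, not the genexpar quantlet.

import numpy as np

def gen_expar1(T, phi1=0.3, phi2=2.2, gamma=0.1, burn=100, seed=0):
    """Simulate Y_t = (phi1 + phi2 * exp(-gamma * Y_{t-1}^2)) * Y_{t-1} + eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T + burn)
    y = np.zeros(T + burn)
    for t in range(1, T + burn):
        y[t] = (phi1 + phi2 * np.exp(-gamma * y[t - 1] ** 2)) * y[t - 1] + eps[t]
    return y[burn:]                               # discard burn-in to approximate stationarity

x = gen_expar1(150)                               # 150 observations, as in the example below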
library("smoother")
library("plot")
library("times")
setsize(640,480)
; generate exponential AR(1) process
phi1 = 0.3
phi2 = 2.2
g = 0.1
randomize(0)
x = genexpar(1,g,phi1,phi1+phi2,normal(150))
; data preparation
xrows = rows(x)
lag1 = x[1:xrows-1] ; vector of first lag
y = x[2:xrows] ; vector of dep. var.
data = lag1~y
; true function
f = sort(lag1~(phi1*lag1 + phi2*lag1.*exp(-g*lag1^2)),1)
; estimation
{hcrit,crit} = regxbwsel(data)
{mh, clo, cup} = regxci(data,hcrit)
f = setmask(f,"line","solid","red")
data = setmask(data,"cross")
mh = setmask(mh,"line","dashed","blue")
clo = setmask(clo,"line","blue","thin","dotted")
cup = setmask(cup,"line","blue","thin","dotted")
plot(data,f,mh,clo,cup)
setgopt(plotdisplay,1,1,"title","Confidence intervals of estimated NAR(1) mean function")
setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Y")
When investigating the properties of a conditional mean function, one is often interested in its derivatives. The estimation of derivatives can be accomplished by local polynomial estimation as long as the order $p$ of the polynomial is at least as large as the order $q$ of the derivative to be estimated. Using a local quadratic estimator by default, the quantlet lpderxest computes such derivative estimates. It is called by
mh = lpderxest (x, h{, q, p, K, v})
with input variables
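The following NumPy sketch shows the principle behind local polynomial derivative estimation: fit a weighted polynomial of order p around each point and read off q! times the q-th coefficient. lp_derivative is an illustrative name, not the lpderxest interface.

import numpy as np
from math import factorial

def lp_derivative(x, y, grid, h, q=1, p=2):
    """Estimate the q-th derivative of the conditional mean by a local
    polynomial fit of order p >= q (Gaussian kernel, bandwidth h)."""
    est = np.empty(len(grid))
    for j, x0 in enumerate(grid):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)             # kernel weights around x0
        X = np.vander(x - x0, N=p + 1, increasing=True)    # columns 1, (x-x0), ..., (x-x0)^p
        WX = X * w[:, None]
        c = np.linalg.solve(X.T @ WX, WX.T @ y)            # weighted least squares coefficients
        est[j] = factorial(q) * c[q]                       # q! * c_q estimates f^(q)(x0)
    return est

With q=1 and p=2 this is the local quadratic estimate of the first derivative described above.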
In the following quantlet we estimate the first and second derivatives of the conditional mean function of the exponential AR(1) process (15), based on 150 observations. The true derivatives (solid lines) and their estimates (dashed lines) are shown in Figures 7 and 8.
library("smoother")
library("plot")
library("times")
setsize(640,480)
; generate exponential AR(1) process
phi1 = 0.3
phi2 = 2.2
g = 0.1
randomize(0)
x = genexpar(1,g,phi1,phi1+phi2,normal(150))
; data preparation
xrows = rows(x)
lag1 = x[1:xrows-1] ; vector of first lag
y = x[2:xrows] ; vector of dep. var.
data = lag1~y
; true first and second derivatives of the mean function
ffder = sort(lag1~(phi1 + exp(-g*lag1^2).*phi2.*(1-2.*g.*lag1^2)),1)
fsder = sort(lag1~(exp(-g*lag1^2).*(-2*g.*lag1)*phi2.*(3-2.*g.*lag1^2)),1)
; estimate first derivative
ffder = setmask(ffder,"line","solid","red")
mhfder = lpderxest(data)
mhfder = setmask(mhfder, "line","dashed","blue")
plotder = createdisplay(1,1)
show(plotder,1,1,ffder,mhfder)
setgopt(plotder,1,1,"title","Estimated first derivative of mean function")
setgopt(plotder,1,1,"xlabel","First lag","ylabel","First derivative")
; estimate second derivative
fsder = setmask(fsder,"line","solid","red")
hrot = 2*lpderrot(data,2)
mhsder = lpderxest(data,hrot,2)
mhsder = setmask(mhsder, "line","dashed","blue")
plot(fsder,mhsder)
setgopt(plotdisplay,1,1,"title","Estimated second derivative of mean function")
setgopt(plotdisplay,1,1,"xlabel","First lag","ylabel","Second derivative")