1. Nonlinear Autoregressive Models of Order One


1.1 Estimation of the Conditional Mean


mh = regxest(x{, h, K, v})
computes the univariate conditional mean function using the Nadaraya-Watson estimator
mh = regest(x{, h, K, v})
computes the univariate conditional mean function using the Nadaraya-Watson estimator and WARPing
mh = lpregxest(x{, h, p, v})
computes the univariate conditional mean function using local polynomial estimation
mh = lpregest(x{, h, p, K, d})
computes the univariate conditional mean function using local polynomial estimation and WARPing

Let us turn to estimating the conditional mean function $ f(\cdot)$ of a nonlinear autoregressive process of order one (NAR(1) process)

$ Y_t = f(Y_{t-1}) + \sigma(Y_{t-1})\xi_t$ (2)

using nonparametric techniques. The basic idea is to estimate a Taylor approximation of order $ p$ of the unknown function $ f(\cdot)$ around a given point $ y$. The simplest Taylor approximation is obtained if its order $ p$ is chosen to be zero: one then approximates the unknown function by a constant. Of course, this approximation may turn out to be very bad if one includes observations $ Y_{t-1}$ that are distant from $ y$, since this might introduce a large approximation bias. One therefore gives those observations less weight in the estimation. Using the least squares principle, the estimated function value $ \widehat{f}(y,h)$ is provided by the estimated constant $ \widehat{c}_0$ of a local constant fit around $ y$

$ \widehat{c}_{0}=\textrm{arg min}_{\left\{ c_{0}\right\} }\sum_{t=2}^{T}\left\{ Y_{t}-c_{0}\right\} ^{2}K_{h}({Y}_{t-1}-{y}),$ (3)

where $ K$ denotes the weighting function, which is commonly called a kernel function, and $ K_{h}({Y}_{t-1}-{y})=h^{-1} K\left\{ (Y_{t-1}-y)/h\right\}$. A number of kernel functions are used in practice, e.g. the Gaussian density function or the quartic kernel $ K(u) = \frac{15}{16}(1-u^2)^2$ on the range $ [-1,1]$ and $ K(u)=0$ elsewhere. $ \widehat{f}({y},h)=\widehat c_0$ is known as the Nadaraya-Watson or local constant function estimator and can be written as

$ \widehat{f}(y,h) = \frac{\sum_{t=2}^T K_h(Y_{t-1}-y) Y_t} {\sum_{t=2}^T K_h(Y_{t-1}-y)}.$ (4)

The parameter $ h$ is called the bandwidth and controls the weighting of the lagged variables $ Y_{t-1}$ with respect to their distance from $ y$. While choosing $ h$ too small and therefore including only few observations in the estimation leads to a large estimation variance, taking $ h$ too large implies a large approximation bias. Methods for bandwidth selection are presented in Subsection 1.2.
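
To make the estimator (4) and the role of $ h$ concrete outside of XploRe, the following minimal Python sketch computes the Nadaraya-Watson estimate with the quartic kernel on a grid of points; the file name, the grid and all function names are our own illustrative choices, not part of XploRe.

  import numpy as np

  def quartic(u):
      # quartic kernel K(u) = 15/16 (1 - u^2)^2 on [-1,1], zero elsewhere
      return np.where(np.abs(u) <= 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0)

  def nw_estimate(y_lag, y, grid, h):
      # Nadaraya-Watson estimate of f(y) = E[Y_t | Y_{t-1} = y], cf. eq. (4)
      w = quartic((y_lag[None, :] - grid[:, None]) / h) / h   # K_h weights
      return w @ y / w.sum(axis=1)

  Y = np.log(np.loadtxt("lynx.dat"))         # assumed plain-text data file
  y_lag, y = Y[:-1], Y[1:]                   # regress Y_t on Y_{t-1}
  h = 0.2 * (y_lag.max() - y_lag.min())      # crude rule: 20% of the range
  grid = np.linspace(y_lag.min(), y_lag.max(), 100)
  fhat = nw_estimate(y_lag, y, grid, h)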

Before applying Nadaraya-Watson estimation one should be aware of the conditions that the underlying data generating mechanism has to fulfil for the estimator to have nice asymptotic properties: most importantly, the function $ f(\cdot)$ has to be continuous, the stochastic process has to be stationary, and the dependence among the observations must decline fast enough as the distance between observations increases. For measuring dependence in nonlinear time series one commonly uses various mixing concepts. For example, a sequence is said to be $ \alpha$-mixing (strong mixing) (Robinson, 1983) if

$\displaystyle \sup_{A\in {\cal F}_1^n,\, B\in {\cal F}_{n+k}^\infty} \vert P(A\cap B)-P(A)P(B)\vert \leq \alpha_k, $

where $ \alpha_k\rightarrow 0$ and $ {\cal F}_i^j$ is the $ \sigma$-field generated by $ X_i,\dots, X_j$. An alternative and stronger condition is given by the $ \beta$-mixing condition (absolute regularity)

$\displaystyle E\sup \left\{ \left\vert P(B\vert A)-P(B)\right\vert\right\} \leq \beta_k $

for any $ A\in {\cal F}_1^n$ and $ B\in {\cal F}_{n+k}^\infty $. An even stronger condition is the $ \phi$-mixing (uniformly mixing) condition (Billingsley, 1968) where

$\displaystyle \vert P(A\cap B)-P(A)P(B)\vert\leq \phi_k P(A) $

for any $ A\in {\cal F}_1^n$ and $ B\in {\cal F}_{n+k}^\infty $ and $ \phi_k$ tends to zero for $ k \rightarrow \infty$. The rate at which $ \alpha_k$, $ \beta_k$ or $ \phi_k$ go to zero plays an important role in showing asymptotic properties of the nonparametric smoothing procedures. We note that these conditions are in general difficult to check. However, if the process follows a stationary Markov chain, then geometric ergodicity implies absolute regularity, which in turn implies strong mixing. Techniques exist for checking geometric ergodicity, see e.g. Doukhan (1994) or Lu (1998). Further and more detailed conditions will be discussed in Subsection 2.2.

The quantlet regxest allows one to compute Nadaraya-Watson estimates of $ f(\cdot)$ for an array of different values $ y$. Its syntax is


  mh = regxest(x{, h, K, v})

with the input variables
x
$ (T-1) \times 2$ matrix, in the first column the independent, in the second column the dependent variable,
h
scalar, bandwidth; if not given, 20% of the range of the values in the first column of x is used,
K
string, kernel function on [-1,1] or Gaussian kernel "gau"; if not given, the quartic kernel "qua" is used,
v
$ m \times 1$ vector of values of the independent variable on which to compute the regression; if not given, x is used.
This quantlet returns a $ (T-1) \times 2$ or $ m \times 2$ matrix mh, where the first column is the sorted first column of x or the sorted v, and the second column contains the regression estimate on the values of the first column.

In order to illustrate the methods presented in this chapter, we model the dynamics underlying the famous annual Canadian lynx trappings in 1821-1934, see e.g. Brockwell and Davis (1991, Appendix, Series G). Figures 1 and 2, showing the original and the logged time series, are obtained with the quantlet


  library("plot")
  setsize(640,480)
  lynx        = read("lynx.dat")  ; read data
  d1          = createdisplay(1,1)
  x1          = #(1821:1934)~lynx
  setmaskl(x1, (1:rows(x1))', 0, 1)
  show(d1,1,1,x1)                 ; plot data
  setgopt(d1,1,1,"title","Annual Canadian Lynx Trappings, 1821-1934")
  setgopt(d1,1,1,"xlabel","Years","ylabel","Lynx")
  d2          = createdisplay(1,1)
  x2          = #(1821:1934)~log(lynx)
  setmaskl(x2, (1:rows(x2))', 0, 1)
  show(d2,1,1,x2)                 ; plot data
  setgopt(d2,1,1,"title","Logs of Annual Canadian Lynx Trappings, 1821-1934")
  setgopt(d2,1,1,"xlabel","Years","ylabel","Lynx")

flts01.xpl

Their inspection indicates that taking logarithms is required to make the time series look stationary.

Figure 1: Time series of annual Canadian Lynx Trappings, 1821-1934
\includegraphics[scale=0.6]{plotlynx.ps}

Figure 2: Time series of logarithm of annual Canadian Lynx Trappings, 1821-1934
\includegraphics[scale=0.6]{plotloglynx.ps}

The following quantlet reads the lynx data set, constructs the vectors of the dependent and lagged variables, computes the Nadaraya-Watson estimator and plots the resulting function together with the scatter plot, which is displayed in Figure 3. For selecting the bandwidth we use here the primitive rule of taking one fifth of the data range.

  library("smoother")
  library("plot")
  setsize(640,480)
;                       data preparation
  lynx      = read("lynx.dat")
  lynxrows  = rows(lynx)
  lag1      = lynx[1:lynxrows-1]    ; vector of first lag
  y         = lynx[2:lynxrows]      ; vector of dep. var.
  data      = lag1~y
  data      = log(data)
;                       estimation
  h         = 0.2*(max(data[,1])-min(data[,1])); crude bandwidth
  "Bandwidth used" h
  mh        = regxest(data,h)      ; N-W estimation
;                       graphics
  mh        = setmask(mh,"line","blue")
  xy        = setmask(data,"cross","small")
  plot(xy,mh)
  setgopt(plotdisplay,1,1,"title","Estimated NAR(1) mean function")
  setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")

flts02.xpl

Figure 3: Nadaraya-Watson estimates of NAR(1) mean function for lynx data and scatter plot
\includegraphics[scale=0.6]{lynx1.ps}

For long time series the computation of the Nadaraya-Watson estimates may become quite slow, since there are more points at which to estimate the function and each estimation involves more data. In this case one may use the WARPing (weighted average of rounded points) technique. The basic idea is the ``binning'' of the data in bins of length $ d$. Each observation is then replaced by the bincenter of the corresponding bin, which means that each point is rounded to the precision given by $ d$. A typical choice for $ d$ is $ h/5$ or $ (\max Y_{t-1}-\min Y_{t-1})/100$. In the latter case, the effective sample size $ r$, i.e. the number of nonempty bins, for computation is at most 101. If WARPing is necessary, just call the quantlet regest which has the same parameters as the quantlet regxest.
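
The rounding step itself is easy to state in code. The following Python sketch illustrates only the binning idea (not the full WARPing weighting scheme); all names and the placeholder series are our own.

  import numpy as np

  def bin_data(y_lag, y, d):
      # replace each Y_{t-1} by its bin center on a grid of width d and
      # keep, per nonempty bin, the count and the sum of the responses Y_t
      idx = np.floor(y_lag / d).astype(int)
      centers, inv = np.unique(idx, return_inverse=True)
      counts = np.bincount(inv)                  # observations per bin
      sums = np.bincount(inv, weights=y)         # sum of Y_t per bin
      return (centers + 0.5) * d, counts, sums   # r nonempty bins

  rng = np.random.default_rng(0)
  Y = rng.normal(size=1000)                      # placeholder series
  y_lag, y = Y[:-1], Y[1:]
  d = (y_lag.max() - y_lag.min()) / 100          # at most 101 nonempty bins
  centers, counts, sums = bin_data(y_lag, y, d)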

While the Nadaraya-Watson function estimate is simple to compute, it may suffer from a substantial estimation bias due to the zero order Taylor expansion. Therefore, it seems natural to increase the order $ p$ of the expansion. For example, by selecting $ p=1$ one obtains the local linear estimator, which corresponds to the following weighted minimization problem

$ \{\widehat{c}_{0},\widehat{c}_1\}=\textrm{arg min}_{\left\{ c_{0},c_1\right\}}\sum_{t=2}^{T}\left\{ Y_{t}-c_{0}-c_1({Y}_{t-1}-{y})\right\} ^{2}K_{h}({Y}_{t-1}-{y}),$ (5)

where the estimated function value $ \widehat{f}_2(y,h)$ is provided as before by the estimated constant $ \widehat{c}_0$. In a similar way one obtains the local quadratic estimator if one chooses $ p=2$. The quantlet lpregxest computes local linear or local quadratic function estimates using the quartic kernel; a standalone sketch of the underlying computation follows the parameter list below. Its syntax is

  mh = lpregxest(x{, h, p, v})

where the inputs are:
x
$ (T-1) \times 2$ matrix, in the first column the independent, in the second column the dependent variable,
h
scalar, bandwidth; if not given, the rule-of-thumb bandwidth computed by the quantlet lpregrot is used,
p
integer, order of the polynomial: p=0 yields the Nadaraya-Watson estimator, p=1 yields local linear estimation (the default), p=2 (local quadratic) is the highest possible order,
v
$ m \times 1$, values of the independent variable on which to compute the regression; if not given, x is used.
The output is
mh
$ (T-1) \times 2$ or $ m \times 2$ matrix, the first column is the sorted first column of x or the sorted v, the second column contains the regression estimate on the values of the first column.
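
To spell out what the minimization (5) computes, the following Python sketch solves the weighted least-squares problem at each evaluation point and returns the local intercept $ \widehat{c}_0$; this is our own illustration, and lpregxest itself may be implemented quite differently.

  import numpy as np

  def quartic(u):
      return np.where(np.abs(u) <= 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0)

  def local_linear(y_lag, y, grid, h):
      # solve (5): weighted LS of Y_t on {1, Y_{t-1} - y} for each point y
      fhat = np.empty(grid.size)
      for j, g in enumerate(grid):
          u = y_lag - g
          sw = np.sqrt(quartic(u / h))           # sqrt of kernel weights
          X = np.column_stack((np.ones_like(u), u))
          c, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
          fhat[j] = c[0]                         # f(y) estimate is c0
      return fhat

  # usage with any (y_lag, y, grid, h), e.g. from the lynx sketch above:
  # fhat = local_linear(y_lag, y, grid, h)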

The following quantlet visualizes the difference between local constant and local linear estimation of the first order nonlinear autoregressive mean function for the lynx data. It produces Figure 4, where the solid and dotted lines display the local linear and local constant estimates, respectively. One notices that the local linear function estimate shows less variation.


  library("smoother")
  library("plot")
  setsize(640,480)
;                       data preparation
  lynx      = read("lynx.dat")
  lynxrows  = rows(lynx)
  lag1      = lynx[1:lynxrows-1]    ; vector of first lag
  y         = lynx[2:lynxrows]      ; vector of dep. var.
  data      = lag1~y
  data      = log(data)
;                       estimation
  h         = 0.2*(max(data[,1])-min(data[,1])); crude bandwidth
  mh        = regxest(data,h)       ; N-W estimation
  mhlp      = lpregxest(data,h)     ; local linear estimation
;                       graphics
  mh        = setmask(mh,"line","blue","dashed")
  mhlp      = setmask(mhlp,"line","red")
  xy        = setmask(data,"cross","small")
  plot(xy,mh,mhlp)
  setgopt(plotdisplay,1,1,"title","Estimated NAR(1) mean function")
  setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")

flts03.xpl

Figure 4: Local linear estimates (solid line) and Nadaraya-Watson estimates (dotted line) of NAR(1) mean function for lynx data and scatter plot
\includegraphics[scale=0.6]{lynx2.ps}

Like Nadaraya-Watson estimation, local linear estimation may become slow for long time series. In this case, one may use the quantlet lpregest which uses the WARPing technique.


1.2 Bandwidth Selection


{hcrit, crit} = regxbwsel(x{, h, K})
interactive tool for bandwidth selection in univariate kernel regression estimation.
{hcrit, crit} = regbwsel(x{, h, K, d})
interactive tool for bandwidth selection in univariate kernel regression estimation using the WARPing method.

So far we have used a primitive way of selecting the bandwidth parameter $ h$. Of course, there are better methods for bandwidth choice. They are all based on minimizing some estimated distance measures. Since we are interested in one bandwidth for various $ y$, we look at ``global'' distances like, for instance, the integrated squared error (ISE)

$ d_I(h) = \int \left\{ f(y)-\widehat{f}(y,h)\right\}^2 w(y)\mu(y)dy.$ (6)

Here $ \mu(\cdot)$ denotes the density of the stationary distribution and $ w(\cdot)$ is a weight function with compact support. Note that the bandwidth which minimizes the ISE $ d_I(h)$ generally varies from sample to sample. In practice, one may want to avoid the integration and consider an approximation of the ISE, namely the average squared error (ASE)

$ d_A(h) = \frac{1}{T-1}\sum_{t=2}^T \left\{f(Y_{t-1})-\widehat{f}(Y_{t-1},h)\right\}^2 w(Y_{t-1}).$ (7)

Since the measure of accuracy $ d_A(h)$ involves the unknown autoregression function $ f(\cdot)$, it cannot be used directly. Instead, one may estimate $ f(Y_{t-1})$ by $ Y_t$. One then obtains the average squared error of prediction (ASEP)

$ d_{AP}(h) = \frac{1}{T-1}\sum_{t=2}^T \left\{Y_t-\widehat{f}(Y_{t-1},h)\right\}^2 w(Y_{t-1}).$ (8)

This, however, implies the new problem that $ d_{AP}(h)$ can be driven to zero by choosing $ h$ small enough. To see this consider the Nadaraya-Watson estimator (4) and imagine that the bandwidth $ h$ is chosen so small that (4) becomes $ \widehat{f}(Y_{t-1},h)=Y_t$. This implies $ d_{AP}(h)=0$. This estimation problem can easily be solved by always leaving out $ Y_t$ in computing (4) which leads to

$ \widehat{f}_{-t}(y) = \frac{\sum_{i=2,i\neq t}^T K_h(Y_{i-1}-y) Y_i} {\sum_{i=2,i\neq t}^T K_h(Y_{i-1}-y)}$ (9)

and is called the leave-one-out cross-validation estimate of the autoregression function. One therefore estimates $ d_{AP}(h)$ with the cross-validation function

$ CV(h) = \frac{1}{T-1}\sum_{t=2}^T \left\{Y_t-\widehat{f}_{-t}(Y_{t-1},h)\right\}^2 w(Y_{t-1}).$ (10)

Let $ \widehat{h}$ be the bandwidth that minimizes $ CV(h)$. Härdle (1990) and Härdle and Vieu (1992) proved that under an $ \alpha$-mixing condition,

$\displaystyle \frac{d_{A}(\widehat{h})}{\inf_h d_A(h)}\rightarrow 1\quad \textrm{in probability}. $
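
The criterion (10) is simple enough to implement directly. The following Python sketch (our own code, with the quartic kernel, weight function $ w \equiv 1$ and an ad hoc candidate grid) performs the leave-one-out computation of (9) for each bandwidth; up to grid resolution and implementation details it should behave like the cross-validation option of regxbwsel used below.

  import numpy as np

  def cv_score(y_lag, y, h):
      # CV(h) of eq. (10) with the leave-one-out estimates (9) and w = 1;
      # a very small h may leave empty neighborhoods (NaN score)
      u = (y_lag[None, :] - y_lag[:, None]) / h
      K = np.where(np.abs(u) <= 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0) / h
      np.fill_diagonal(K, 0.0)                   # leave out observation t
      fhat = K @ y / K.sum(axis=1)               # f_{-t}(Y_{t-1}, h)
      return np.mean((y - fhat) ** 2)

  Y = np.log(np.loadtxt("lynx.dat"))             # assumed data file, as above
  y_lag, y = Y[:-1], Y[1:]
  hs = np.linspace(0.3, 3.0, 28)                 # candidate bandwidths
  h_cv = hs[int(np.argmin([cv_score(y_lag, y, h) for h in hs]))]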

The interactive quantlet regxbwsel offers cross-validation and other bandwidth selection methods; the latter may be used in the case of independent data. It is called by


  {hcrit, crit} = regxbwsel(x{, h, K})

with the input variables:
x
$ (T-1) \times 2$ matrix of the data,
h
$ m \times 1$ vector of bandwidths,
K
string, kernel function on $ [-1,1]$ e.g. quartic kernel "qua" (default) or Gaussian kernel "gau".
The output variables are:
hcrit
$ p \times 1$ vector, selected bandwidths by the different criteria,
crit
$ p \times 1$ string vector, criteria considered for bandwidth selection.
If one wants to use WARPing one has to use the quantlet regbwsel. Using the following quantlet one may estimate the cross-validation bandwidth for the lynx data set; one obtains $ \widehat h=1.12085$.

  library("smoother")
  library("plot")
  setsize(640,480)
;                       data preparation
  lynx      = read("lynx.dat")
  lynxrows  = rows(lynx)
  lag1      = lynx[1:lynxrows-1]            ; vector of first lag
  y         = lynx[2:lynxrows]              ; vector of dep. var.
  data      = lag1~y
  data      = log(data)
;
  tmp       = regxbwsel(data)

flts04.xpl

It was already noted that the optimal bandwidth with respect to the ISE (6) or the ASE (7) may vary across samples. In order to obtain a sample-independent optimal bandwidth one may consider the mean integrated squared error (MISE)

$ d_M(h) = E\left[\int \left\{ f(y)-\widehat{f}(y,h)\right\}^2w(y)\mu(y)dy\right].$ (11)

Like $ d_I(h)$ or $ d_A(h)$, it cannot be used directly. It is, however, possible to derive the asymptotic expansion of $ d_M(h)$. This allows one to obtain an explicit formula for the asymptotically optimal bandwidth $ h_{opt}$ which, however, contains unknown constants. In Subsection 2.2 we show how one can estimate these unknown quantities in order to obtain a plug-in bandwidth $ \widehat{h}_{opt}$.


1.3 Diagnostics


acfplot(x)
generates plot of autocorrelation function of time series contained in vector x.
{jb, probjb, sk, k} = jarber(x, 1)
checks for normality of the data contained in vector x using the Jarque-Bera test.

It is well known that if a fitted model is misspecified, the resulting inference, for example confidence intervals or significance tests, can be misleading. One way to check whether a chosen model is correctly specified is to investigate the resulting residuals. Most importantly, one checks for autocorrelation remaining in the residuals. This can easily be done by inspecting the graph of the autocorrelation function using the quantlet acfplot. It only requires the $ (T-1) \times 1$ vector x with the estimated residuals as input variable. The quantlet also draws 95% confidence intervals for the case of no autocorrelation.

Another issue is to check the normality of the residuals. This is commonly done using the test suggested by Bera and Jarque (1982), commonly called the JB-test, which can be computed with the quantlet jarber; a standalone sketch of the statistic follows the parameter list below. The quantlet is called by


  {jb, probjb, sk, k} = jarber(resid, printout)

with input variables
resid
$ (T-1) \times 1$ matrix of residuals,
printout
scalar, 0 no printout, 1 printout,
and output variables
jb
scalar, test statistic of Jarque-Bera test,
probjb
scalar, $ p$-value of the test statistic,
sk
scalar, skewness,
k
scalar, kurtosis.
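
For reference, the statistic has the standard closed form $ JB = T\left\{S^2/6+(K-3)^2/24\right\}$, where $ S$ and $ K$ denote the sample skewness and kurtosis (the outputs sk and k above), and is asymptotically $ \chi^2(2)$-distributed under normality. The following Python sketch mirrors the four outputs of jarber; it is our own code, not the quantlet itself.

  import numpy as np
  from scipy.stats import chi2

  def jarque_bera(resid):
      # JB = T*(S^2/6 + (K-3)^2/24), asymptotically chi^2 with 2 df
      T = resid.size
      z = resid - resid.mean()
      s2 = np.mean(z ** 2)
      sk = np.mean(z ** 3) / s2 ** 1.5           # sample skewness
      k = np.mean(z ** 4) / s2 ** 2              # sample kurtosis
      jb = T * (sk ** 2 / 6 + (k - 3) ** 2 / 24)
      return jb, chi2.sf(jb, 2), sk, k           # statistic, p-value, S, K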
In the following quantlet these diagnostics are applied to the residuals of the NAR(1) model fitted to the lynx data using the Nadaraya-Watson estimator (4) with the cross-validation bandwidth $ \widehat h=1.12085$.

;               load required quantlets
  library("smoother")
  library("plot")
  func("acfplot")
  func("jarber")
  setsize(640,480)
;               data preparation
  lynx      = read("lynx.dat")
  lynxrows  = rows(lynx)
  lag1      = lynx[1:lynxrows-1]        ; vector of first lag
  y         = lynx[2:lynxrows]          ; vector of dep. var.
  data      = lag1~y
  data      = log(data)
  datain    = data~#(1:lynxrows-1)      ; add index to data
  dataso    = sort(datain,1)            ; sorted data
;               estimation
  h         = 1.12085               ; Cross-validation bandwidth
  mhlp      = regxest(dataso[,1|2],h)   ; local constant estimation
;               graphics
  mhlp      = setmask(mhlp,"line","red")
  xy        = setmask(data,"cross","small")
  plot(xy,mhlp)
  setgopt(plotdisplay,1,1,"title","Estimated NAR(1) mean function")
  setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")
;               diagnostics
  yhatso    = mhlp.data[,2]~dataso[,3]  ; sorted est. fct. values
  yhat      = sort(yhatso,2)            ; undo sorting
  eps       = data[,2] - yhat[,1]       ; compute residuals
  acfplot(eps)            ; plot autocorrelation function of res.
  setgopt(dacf,1,1,"title","Autocorrelation function of NAR(1) residuals")
;
  {jb,probjb,sk,k} = jarber(eps,1)
        ; compute Jarque-Bera test for normality of residuals

flts05.xpl

The plot of the resulting autocorrelation function of the residuals is shown in Figure 5. It clearly shows that the residuals are not white noise. This indicates that one should use a higher order nonlinear autoregressive process for modelling the dynamics of the lynx data, which will be discussed in Section 2. Moreover, normality is rejected even at the 1% significance level, since the JB-test statistic is 11.779, which implies a $ p$-value of 0.003.

Figure 5: Autocorrelation function of estimated residuals based on a NAR(1) model for the lynx data
\includegraphics[scale=0.6]{lynxresacf.ps}


1.4 Confidence Intervals


{mh, clo, cup} = regxci(x{, h, alpha, K, xv})
computes pointwise confidence intervals with prespecified confidence level for univariate regression using the Nadaraya-Watson estimator.
{mh, clo, cup} = regci(x{, h, alpha, K, d})
computes pointwise confidence intervals with prespecified confidence level for univariate regression using the Nadaraya-Watson estimator. The computation uses WARPing.

Once one has selected the bandwidth and checked the residuals, one often wants to investigate the variance of the estimated autoregression function. Under appropriate conditions, the variance of both the Nadaraya-Watson and the local linear estimator can be approximated by

$ \textrm{Var}(\widehat{f}(y,h)) \approx\frac{1}{Th}\frac{\sigma^2(y)}{\mu(y)}\Vert K\Vert _2^2$ (12)

as will be seen in Subsection 2.1. Equation (12) can be used for constructing confidence intervals for $ \widehat{f}(\cdot)$, since one can estimate the conditional variance $ \sigma^2(y)$ by the kernel estimate

$ \widehat{\sigma}^2(y,h) = \frac{\sum_{t=2}^T K_h(Y_{t-1}-y) Y_t^2} {\sum_{t=2}^T K_h(Y_{t-1}-y)} - \widehat{f}^2(y,h)$ (13)

and the density $ \mu(y)$ by the kernel estimate

$ \widehat{\mu}(y,h) = \frac{1}{T}\sum_{t=1}^T K_h(Y_{t}-y).$ (14)
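
Combining (4) with (12)-(14) yields the pointwise interval $ \widehat{f}(y,h)\pm z_{1-\alpha/2}\{\Vert K\Vert_2^2\,\widehat{\sigma}^2(y,h)/(Th\,\widehat{\mu}(y,h))\}^{1/2}$, where $ \Vert K\Vert_2^2=5/7$ for the quartic kernel. The following Python sketch is our own illustration of this recipe, not the regxci algorithm; in particular it estimates the density from the $ T-1$ lagged values.

  import numpy as np
  from scipy.stats import norm

  def nw_ci(y_lag, y, grid, h, alpha=0.05):
      T = y.size + 1                             # sample length
      u = (y_lag[None, :] - grid[:, None]) / h
      K = np.where(np.abs(u) <= 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0) / h
      fhat = K @ y / K.sum(axis=1)               # eq. (4)
      sig2 = K @ y ** 2 / K.sum(axis=1) - fhat ** 2   # eq. (13)
      mu = K.sum(axis=1) / (T - 1)               # density estimate, cf. (14)
      half = norm.ppf(1 - alpha / 2) * np.sqrt(5 / 7 * sig2 / (T * h * mu))
      return fhat, fhat - half, fhat + half      # estimate, lower, upper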

Based on these estimates, the quantlet regxci computes pointwise confidence intervals using the Nadaraya-Watson estimator. It is called by


  {mh, clo, cup} = regxci(x{, h, alpha, K, xv})

with input variables:
x
$ (T-1) \times 2$ matrix of the data with the independent and the dependent variable in the first and second column, respectively,
h
scalar, bandwidth; if not given, 20% of the range of the values in the first column of x is used,
alpha
confidence level with 0.05 as default value,
K
string, kernel function on $ [-1,1]$, with the quartic kernel "qua" as default,
xv
$ m \times 1$ matrix of the values of the independent variable on which to compute the regression, with x as default.
The output variables are:
mh
$ (T-1) \times 2$ or $ m \times 2$ matrix, the first column is the sorted first column of x or the sorted xv, the second column contains the regression estimate on the values of the first column,
clo
$ (T-1) \times 2$ or $ m \times 2$ matrix, the first column is the sorted first column of x or the sorted xv, the second column contains the lower confidence bounds on the values of the first column,
cup
$ (T-1) \times 2$ or $ m \times 2$ matrix, the first column is the sorted first column of x or the sorted xv, the second column contains the upper confidence bounds on the values of the first column.
If the WARPing technique is required, one uses the quantlet regci.

In Subsection 1.3 we found that the NAR(1) model for the lynx data is misspecified. Therefore, it is not appropriate for illustrating the computation of pointwise confidence intervals. Instead we will use a simulated time series. The quantlet below generates 150 observations of a stationary exponential AR(1) process

$ Y_t = 0.3 Y_{t-1} + 2.2 Y_{t-1}\exp\left(-0.1 Y_{t-1}^2\right) + \xi_t, \quad\xi_t \sim N(0,1),$ (15)

calls the interactive quantlet regxbwsel for bandwidth selection (where one chooses cross-validation at the first prompt and stop at the second), computes the confidence intervals, and plots the true and estimated function (solid and dashed lines) as well as the pointwise confidence intervals (dotted lines), as shown in Figure 6.


  library("smoother")
  library("plot")
  library("times")
  setsize(640,480)
;                   generate exponential AR(1) process
  phi1    = 0.3
  phi2    = 2.2
  g       = 0.1
  randomize(0)
  x       = genexpar(1,g,phi1,phi1+phi2,normal(150))
;                   data preparation
  xrows   = rows(x)
  lag1    = x[1:xrows-1]             ; vector of first lag
  y       = x[2:xrows]               ; vector of dep. var.
  data    = lag1~y
;                   true function
  f       = sort(lag1~(phi1*lag1 + phi2*lag1.*exp(-g*lag1^2)),1)
;                   estimation
  {hcrit,crit}    = regxbwsel(data)
  {mh, clo, cup}  = regxci(data,hcrit)
  f       = setmask(f,"line","solid","red")
  data    = setmask(data,"cross")
  mh      = setmask(mh,"line","dashed","blue")
  clo     = setmask(clo,"line","blue","thin","dotted")
  cup     = setmask(cup,"line","blue","thin","dotted")
  plot(data,f,mh,clo,cup)
  setgopt(plotdisplay,1,1,"title","Confidence intervals of estimated NAR(1) mean function")
  setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Y")

flts06.xpl
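
For readers without access to genexpar, the recursion (15) is easy to simulate directly. A minimal Python sketch follows; the function name and the seed are our own choices, so the generated path will differ from the XploRe one.

  import numpy as np

  def gen_expar1(T, phi1=0.3, phi2=2.2, g=0.1, seed=0):
      # Y_t = (phi1 + phi2*exp(-g*Y_{t-1}^2)) * Y_{t-1} + xi_t, cf. eq. (15)
      rng = np.random.default_rng(seed)
      Y = np.zeros(T)
      for t in range(1, T):
          Y[t] = (phi1 + phi2 * np.exp(-g * Y[t - 1] ** 2)) * Y[t - 1] \
                 + rng.standard_normal()
      return Y

  Y = gen_expar1(150)                            # 150 observations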

Figure 6: True and estimated mean function plus pointwise confidence intervals for a generated exponential AR(1) process
\includegraphics[scale=0.6]{earci.ps}


1.5 Derivative Estimation


mh = lpderxest(x, h{, q, p, K, v})
estimates the q-th derivative of a regression function using local polynomial kernel regression with quartic kernel.
mh = lpderest(x, h{, q, p, K, d})
estimates the q-th derivative of an autoregression function using local polynomial kernel regression. The computation uses WARPing.

When investigating the properties of a conditional mean function, one is often interested in its derivatives. The estimation of derivatives can be accomplished by using local polynomial estimation as long as the order $ p$ of the polynomial is at least as large as the order $ q$ of the derivative to be estimated. Using a local quadratic estimator

$\displaystyle \{\widehat{c}_{0},\widehat{c}_1,\widehat{c}_2\} =\textrm{arg min}_{\left\{ c_{0},c_1,c_2\right\}} \sum_{t=2}^{T}\left\{ Y_{t}-c_{0}-c_1({Y}_{t-1}-{y})-c_2({Y}_{t-1}-{y})^2\right\} ^{2}K_{h}({Y}_{t-1}-{y}) $

one estimates the first and second derivative of $ f(y)$ at $ y$ with

$\displaystyle \widehat{f}'(y,h) = \widehat{c}_1, \quad \widehat{f}''(y,h)=2\widehat{c}_2. $

In general, one uses a $ (q+1)$-th instead of a $ q$-th order polynomial for the estimation of the $ q$-th derivative since this reduces the complexity of the estimation bias, see e.g. Fan and Gijbels (1995). The estimated derivative is then obtained as $ \widehat{f}^{(q)}=q! \widehat{c}_q$. The quantlet lpderxest estimates first and second order derivatives, where at most a second order polynomial is used. It is called by

  mh = lpderxest(x, h{, q, p, K, v})

with input variables
x
$ (T-1) \times 2$ matrix of the data with the independent and dependent variable in the first and second column, respectively,
h
scalar, bandwidth; if not given, the rule-of-thumb bandwidth is computed with lpderrot,
q
integer $ \leq 2$, order of the derivative; if not given, q=1 (first derivative) is chosen,
p
integer, order of the polynomial; if not given, p=q+1 is used for q$ <2$ and p=q is used for q=2,
v
$ m \times 1$, values of the independent variable on which to compute the regression; if not given, x is used.
The output variable is
mh
$ (T-1) \times 2$ or $ m \times 2$ matrix where the first column is the sorted first column of x or the sorted v and the second column contains the derivative estimate on the values of the first column.
The quantlet lpderest, which applies the WARPing technique (Fan and Marron, 1994), allows for p $ \leq 5$ and q $ \leq 4$. We note, however, that WARPing may waste a lot of information. Bandwidth selection remains an important issue and can be done using the quantlet lpderrot. A standalone sketch of the local quadratic derivative estimator follows below.
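
As announced above, here is a Python sketch of the local quadratic derivative estimator, returning $ \widehat{f}'(y)=\widehat{c}_1$ and $ \widehat{f}''(y)=2\widehat{c}_2$ at each grid point; it is our own illustration, not the lpderxest algorithm.

  import numpy as np

  def local_quad_deriv(y_lag, y, grid, h):
      # fit c0 + c1*u + c2*u^2 by weighted least squares at each grid point
      f1, f2 = np.empty(grid.size), np.empty(grid.size)
      for j, g in enumerate(grid):
          u = y_lag - g
          w = np.where(np.abs(u / h) <= 1,
                       15 / 16 * (1 - (u / h) ** 2) ** 2, 0.0)
          X = np.column_stack((np.ones_like(u), u, u ** 2))
          sw = np.sqrt(w)
          c, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
          f1[j], f2[j] = c[1], 2 * c[2]          # first and second derivative
      return f1, f2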

In the following quantlet we estimate the first and second derivatives of the conditional mean function of the exponential AR(1) process (15) based on 150 observations. The true derivatives (solid lines) and their estimates (dashed lines) are shown in Figures 7 and 8.


  library("smoother")
  library("plot")
  library("times")
  setsize(640,480)
;                   generate exponential AR(1) process
  phi1    = 0.3
  phi2    = 2.2
  g       = 0.1
  randomize(0)
  x       = genexpar(1,g,phi1,phi1+phi2,normal(150))
;                       data preparation
  xrows   = rows(x)
  lag1    = x[1:xrows-1]             ; vector of first lag
  y       = x[2:xrows]               ; vector of dep. var.
  data    = lag1~y
  ffder   = sort(lag1~(phi1 + exp(-g*lag1^2).*phi2.*(1-2.*g.*lag1^2)),1)
  fsder   = sort(lag1~(exp(-g*lag1^2).*(-2*g.*lag1)*phi2.*(3-2.*g.*lag1^2)),1)
;                       estimate first derivative
  ffder   = setmask(ffder,"line","solid","red")
  mhfder  = lpderxest(data)
  mhfder  = setmask(mhfder, "line","dashed","blue")
  plotder = createdisplay(1,1)
  show(plotder,1,1,ffder,mhfder)
  setgopt(plotder,1,1,"title","Estimated first derivative of mean function")
  setgopt(plotder,1,1,"xlabel","First lag","ylabel","First derivative")
;                       estimate second derivative
  fsder   = setmask(fsder,"line","solid","red")
  hrot    = 2*lpderrot(data,2)
  mhsder  = lpderxest(data,hrot,2)
  mhsder  = setmask(mhsder, "line","dashed","blue")
  plot(fsder,mhsder)
  setgopt(plotdisplay,1,1,"title","Estimated second derivative of mean function")
  setgopt(plotdisplay,1,1,"xlabel","First lag","ylabel","Second derivative")

flts07.xpl

Figure 7: True and estimated first derivative for a generated exponential AR(1) process
\includegraphics[scale=0.6]{earffder.ps}

Figure 8: True and estimated second derivative of a generated exponential AR(1) process
\includegraphics[scale=0.6]{earfsder.ps}


