3. Noninteractive Quantlets for Estimation


m = intest(t, y, h, g, loc{, opt})
estimates an additive model (AM)
{m, b, const} = intestpl(x, t, y, h, g, loc{, opt})
estimates an additive partially linear model (APLM)
{m, b, const} = backfit(t, y, h, loc, kern{, opt})
estimates additive and additive partially linear models
m = gintest(code, t, y, h, g, loc{, opt})
estimates a generalized additive model (GAM)
{m, b, bv, const} = gintestpl(code, x, t, y, h, g{, opt})
estimates a generalized additive partially linear model (GAPLM)
m = intest2d(t, y, h, g, loc{, opt})
estimates a bivariate marginal influence function
{fh, c} = interact(t, y, h, g, loc, incl{, tg})
estimates an additive model with interaction terms
m = fastint(t, y, h1, h2, loc{, tg})
estimates an additive model using marginal integration
This is the list of all quantlets for noninteractive estimation; their use is described in the following subsections.


3.1 Estimating an Additive Model


m = intest(t, y, h, g, loc{, opt})
estimates an additive model (AM)



  library("gam")

  randomize(1234)

  t     = uniform(50,2)*2-1

  g1    = 2*t[,1]

  g2    = t[,2]^2

  g2    = g2 - mean(g2)

  y     = g1 + g2  + normal(50,1) * sqrt(0.25)

  h     = #(1.2, 1.0)

  g     = #(1.4, 1.2)

  loc   = 1

  gest  = intest(t,y,h,g,loc)

  gest

  bild  = createdisplay(1,2)

  dat11 = t[,1]~g1

  dat12 = t[,1]~gest[,1]

  dat21 = t[,2]~g2

  dat22 = t[,2]~gest[,2]

  setmaskp(dat12,4,4,8)

  setmaskp(dat22,4,4,8)

  show(bild,1,1,dat11,dat12)

  show(bild,1,2,dat21,dat22)

gam02.xpl

The quantlet intest provides a way to estimate the univariate additive functions and their derivatives in a separable additive model using Nadaraya-Watson, local linear or local quadratic estimation. Input parameters:
h
$ p'\times 1$ bandwidth vector for the directions of interest (see remarks). It can be $ p'=p$, $ p'=pg$ or $ p'=1$ for the same bandwidth in all directions.
g
$ p\times 1$ bandwidth vector for the directions not of interest
loc
scalar specifying the estimation procedure:
0
--Nadaraya-Watson (local constant)
1
--local linear
2
--local quadratic
Optional parameters (see Section 5):
opt.tg
$ ng \times pg$ matrix, a grid for the continuous part. If tg is given, the nonparametric functions are computed on this grid.
opt.shf
scalar (show-how-far). If set to 1, output is produced that indicates the progress of the iteration (additive function / point of estimation / iteration number).
Output value:
m
$ n\times p'\cdot(loc+1)\times q$ matrix (or $ ng\times p'\cdot(loc+1)\times q$ if a grid tg is given) containing the marginal integration estimates in the first $ p'$ columns, followed by the first and second derivatives if local linear or local quadratic estimation is used
Remarks: The grid may have fewer dimensions ($ p'$) than the explanatory data. The estimation will then be run on the first $ p'$ directions of interest. Consequently, it is possible to specify the bandwidth vector $ h$ for the directions of interest only.
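For orientation, here is a minimal sketch (not taken verbatim from the implementation) of the model behind intest and of the marginal integration idea, assuming the usual norming $ E f_\alpha(T_\alpha)=0$:
\begin{displaymath}
  E(Y \mid T=t) = c + \sum_{\alpha=1}^{p} f_\alpha(t_\alpha), \qquad
  f_\alpha(t_\alpha) + c = \int m(t_\alpha, t_{\underline{\alpha}})\, dF_{\underline{\alpha}}(t_{\underline{\alpha}})
  \approx \frac{1}{n}\sum_{i=1}^{n} \widehat{m}\bigl(t_\alpha, T_{i\underline{\alpha}}\bigr),
\end{displaymath}
where $ t_{\underline{\alpha}}$ collects the directions not of interest and $ \widehat{m}$ is a full-dimensional local polynomial pre-estimate of degree loc with bandwidth h in the direction of interest and g in the remaining directions.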


3.2 Estimating an Additive Partially Linear Model


{m, b, const} = intestpl(x, t, y, h, g, loc{, opt})
estimates an additive partially linear model (APLM)



  library("gam")

  randomize(1345)

  loc= 2

  x = matrix(50,2)

  t = uniform(50,2)*2-1

  xh = uniform(50,2)

  x[,1]= 3*(xh>=0.8)+2*((0.8>xh)&&(xh>=0.3))+(0.3>xh)

  x[,2]= (xh>(1/3))

  g1    = 2*t[,1]

  g2    = (2*t[,2])^2

  g2    = g2 -mean(g2)

  m     = g1 + g2 + x*(0.2|-1.0)

  y     = m + normal(50,1)*0.25

  h     = #(1.4, 1.4)

  g     = #(1.4, 1.4)

  {m,b,const} = intestpl(x,t,y,h,g,loc)

  b

  const

  bild =createdisplay(1,2)

  dat11= t[,1]~g1

  dat12= t[,1]~m[,1]

  setmaskp(dat12,4,4,8)

  show(bild,1,1,dat11,dat12)

  dat21= t[,2]~g2

  dat22= t[,2]~m[,2]

  setmaskp(dat22,4,4,8)

  show(bild,1,2,dat21,dat22)

gam03.xpl

The quantlet intestpl estimates the univariate additive functions and their derivatives in an additive partially linear model (APLM) using local linear or local quadratic estimation. Input parameters:
h
$ p\times 1$ vector or a scalar, the bandwidth for the directions of interest
g
$ p\times 1$ vector or a scalar, the bandwidth for the directions not of interest
loc
scalar indicating the estimation procedure:
1
--local linear
2
--local quadratic
Optional parameters (see Section 5):
opt.tg
$ ng \times pg$ matrix, a grid for the continuous part. If tg is given, the nonparametric functions are computed on this grid.
opt.shf
scalar; if shf=1, the progress of the estimation is displayed (default: shf=0)
Output values:
m
$ ng\times p\cdot(loc+1)\times q$ matrix, the marginal integration estimates in the first $ p$ columns, followed by the first and second derivatives if local linear or local quadratic estimation is used
b
$ d \times 1$ vector, the coefficients of the linear part
const
scalar, the constant in the additive partial linear model
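For reference, a sketch of the APLM that intestpl fits (with the norming $ E f_\alpha(T_\alpha)=0$), where the discrete or linear covariates enter through the coefficient vector $ \beta$:
\begin{displaymath}
  E(Y \mid X=x, T=t) = c + x^{\top}\beta + \sum_{\alpha=1}^{p} f_\alpha(t_\alpha).
\end{displaymath}
The constant $ c$, the coefficients $ \beta$ and the functions $ f_\alpha$ correspond to the returned const, b and the columns of m, respectively.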


3.3 Estimating Additive and Additive Partially Linear Models


{m, b, const} = backfit(t, y, h, loc, kern{, opt})
estimates additive and additive partially linear models



  library("gam")

  randomize(1)

  n   = 100

  t   = normal(n,2)             ; explanatory variable

  x   = normal(n,2)             ; the linear part

  f1  = - sin(2*t[,1])          ; estimated functions

  f2  = t[,2]^2

  eps = normal(n,1) * sqrt(0.75)

  y   = x[,1] - x[,2]/4 + f1 + f2 +eps      ; response variable

  h   = 0.5

  opt = gamopt("x",x,"shf",1)   ; the linear part is used

  ;                               and the iterations will be shown

  {m,b,const} = backfit(t,y,h,0,"qua",opt)

  ;

  b                             ; coefficients for the linear part

  ;                               ([1, -1/4] were used)

  const                         ; estimation of the constant 

  ;

  pic = createdisplay(1,2)      ; preparing the graphical output

  d1  = t[,1]~m[,1]

  d2  = t[,2]~m[,2]

  setmaskp(d1,4,4,4)

  setmaskp(d2,4,4,4)

  m1  = mean(f1)

  m2  = mean(f2)

  yy  = y - x*b - const

  x1  = t[,1]~(yy - m[,2])

  x2  = t[,2]~(yy - m[,1])

  setmaskp(x1,1,11,4)

  setmaskp(x2,1,11,4) 

  setmaskl(d1,(sort(d1~(1:rows(d1)))[,3])',4,1,1)

  setmaskl(d2,(sort(d2~(1:rows(d2)))[,3])',4,1,1)

  show(pic,1,1,d1,x1,t[,1]~(f1-m1))

  show(pic,1,2,d2,x2,t[,2]~(f2-m2))

gam04.xpl

The quantlet backfit estimates the univariate additive functions and their derivatives in an additive model (AM) or additive partially linear model (APLM) using the backfitting algorithm. It accepts only one-dimensional response variables y. Input parameters:
h
$ p\times 1$ vector or a scalar, the bandwidth
loc
scalar indicating the estimation procedure:
0
--Nadaraya-Watson (local constant)
1
--local linear
2
--local quadratic
kern
string indicating the kernel function:
"qua"
quartic kernel
"epa"
Epanechnikov kernel
"gau"
Gaussian kernel
Optional parameters:
opt.x
$ n\times d$ matrix, the explanatory variables for the linear part (at least the discrete variables)
opt.shf
shf=1 shows the progress of the iteration (default: shf=0)
opt.miter
scalar, the maximal number of iterations (default: miter=50)
opt.cnv
scalar, the convergence criterion (default: cnv=$ 0.000001$)
The quantlet returns
m
$ n\times p\cdot(loc+1)$ matrix, the estimates of the additive functions in columns 1 to $ p$, the first derivatives in columns $ (p+1)$ to $ 2p$ and the second derivatives in columns $ (2p+1)$ to $ 3p$
b
$ d \times 1$ vector, the coefficients of the linear part
const
scalar, the estimate of the constant in the model
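As a rough sketch of the backfitting idea (not the literal implementation), each additive component is updated in turn by smoothing the partial residuals, cycling over $ \alpha = 1,\ldots,p$ until the convergence criterion cnv or the iteration limit miter is reached:
\begin{displaymath}
  \widehat{f}_\alpha \;\leftarrow\; S_\alpha\Bigl( y - \widehat{c} - x\widehat{b} - \sum_{\beta \neq \alpha} \widehat{f}_\beta \Bigr),
\end{displaymath}
where $ S_\alpha$ denotes the univariate smoother in direction $ t_\alpha$ (of degree loc, with kernel kern and bandwidth h) and the term $ x\widehat{b}$ is present only if opt.x is supplied.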
The example for this quantlet ( gam04.xpl ) produces the following graphical output:
\includegraphics[scale=0.6]{gam_backfit.ps}
The plot shows the original data (crosses), the true functions (circles) and their estimates (small triangles connected by lines).


3.4 Estimating a Generalized Additive Model


m = gintest(code, t, y, h, g, loc{, opt})
estimates a generalized additive model (GAM)



  library("gam")

  randomize(1235)

  n     = 100

  p     = 2

  t     = uniform(n,p)*2-1

  g1    = 2*t[,1]

  g2    = t[,2]^2

  g2    = g2 - mean(g2)

  m     = g1 + g2

  y     = cdfn(m) .> uniform(n)    ; probit model

  h     = #(1.7, 1.5)

  g     = #(1.7, 1.5)

  tg    = grid(-0.8,0.1,19)

  opt   = gamopt("tg",tg~tg,"shf",1)

  loc   = 1

  code  = "bipro"

  m     = gintest(code,t,y,h,g,loc,opt)

  d1    = tg[,1]~m[,1]

  d2    = tg[,2]~m[,2]

  setmaskp(d1,4,4,8)

  setmaskp(d2,4,4,8)

  bild  = createdisplay(1,2)

  show(bild,1,1,d1,t[,1]~g1)

  show(bild,1,2,d2,t[,2]~g2)

gam05.xpl

The quantlet gintest estimates the univariate additive functions and their derivatives in a generalized additive model (GAM) using Nadaraya-Watson, local linear or local quadratic estimation. Input parameters:
code
string, specifies the distribution of y and the link function. Currently implemented codes are:
"bilo" 
binomial with logistic link (logit)
"bipro"
binomial with normal distribution link (probit)
"noid" 
normal with canonical (identity) link
h
$ p'\times 1$ bandwidth vector for the directions of interest (see remarks). It can be $ p'=p$, $ p'=pg$ or $ p'=1$ for the same bandwidth in all directions.
g
$ p\times 1$ bandwidth vector for the directions not of interest
loc
scalar specifying the estimation procedure:
0
--Nadaraya-Watson (local constant)
1
--local linear
2
--local quadratic
Optional parameters:
opt.tg
$ ng \times pg$ matrix, a grid for the continuous part (see remarks)
opt.shf
if shf=1, the progress of the estimation is displayed (default: shf=0)
The quantlet returns
m
$ ng \times p'\cdot (loc+1) \times q$ matrix, the marginal integration estimates in the first $ p'$ columns, followed by the first and second derivatives if local linear or local quadratic estimation is used
Remarks: The grid may have fewer dimensions ($ p'$) than the explanatory data. The estimation will then be run on the first $ p'$ directions of interest. Consequently, the bandwidth vector h needs to be specified only for the directions of interest.
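For orientation, a sketch of the GAM that gintest estimates, with the known link function $ G$ determined by code (e.g. the standard normal cdf for "bipro"):
\begin{displaymath}
  E(Y \mid T=t) = G\Bigl( c + \sum_{\alpha=1}^{p} f_\alpha(t_\alpha) \Bigr).
\end{displaymath}
The additive components are estimated and returned on the index scale, i.e. before applying $ G$, which is why the example above compares m directly with g1 and g2.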


3.5 Estimating a Generalized Additive Partially Linear Model


{m, b, bv, const} = gintestpl(code, x, t, y, h, g{, opt})
estimates a generalized additive partially linear model (GAPLM)



  library("gam")

  randomize(1235)

  n     = 100

  p     = 2

  d     = 2

  b     = 1|2

  t     = uniform(n,p)*2-1

  x     = 2.*uniform(n,d)-1

  g1    = 2*t[,1]

  g2    = t[,2]^2

  g2    = g2 - mean(g2)

  m     = g1 + g2

  y     = cdfn(m+x*b) .> uniform(n)    ; probit model

  h     = #(1.7, 1.5)

  g     = #(1.7, 1.5)

  tg    = grid(-0.8,0.1,18)

  opt   = gamopt("tg",tg~tg)

  opt   = gamopt("shf",1,opt)

  code  = "bipro"

  {m,b,bv,c} = gintestpl(code,x,t,y,h,g,opt)

  gamout(t,y,m,b,c,gamopt("pl",1,"x",x,"bv",bv,opt))

gam06.xpl

The quantlet gintestpl estimates the univariate additive functions in a generalized additive partially linear model (GAPLM) using the Newton-Raphson or Fisher scoring algorithm. Input parameters:
code
string specifying the distribution of y and the link function. The quantlet accepts only one-dimensional y. Currently implemented codes are:
binomial

"bilo" 
binomial with logistic link (logit)
"bipro"
binomial with normal distribution link (probit)
"bicll"
binomial with complementary log-log link
normal
"noid" 
normal with canonical=identity link
"nopow"
normal with power (inverse) link
gamma
"gacl" 
gamma with canonical=reciprocal (inverse) link
"gapow"
gamma with power (inverse) link
inverse gaussian
"igcl" 
inverse gaussian with canonical=squared reciprocal (inverse) link
"igpow"
inverse gaussian with power (inverse) link
negative binomial
"nbcl" 
negative binomial with canonical (inverse) link
"nbpow"
negative binomial with power (inverse) link
h
$ p\times 1$ bandwidth vector for the directions of interest
g
$ p\times 1$ bandwidth vector for the directions not of interest
Optional parameters:
opt.tg
$ ng \times p$ matrix to estimate on a grid
opt.shf
if shf=1, the progress of the estimation is displayed (default: shf=0)
opt.b0
$ d \times 1$ vector to provide initial coefficients for the linear part (default: GLM pre-estimation)
opt.nosort
nosort=1 indicates that t is already sorted by its first column (default: nosort=0). Sorting is required by the algorithm, hence you should switch it off only when the data are already sorted.
opt.miter
maximal number of iterations (default: miter=10)
opt.cnv
scalar to determine the convergence criterion (default: cnv=0.0001)
opt.fscor
fscor=1 to switch to the Fisher-Scoring algorithm (default: Newton-Raphson). This parameter is ignored for canonical links.
opt.wx
scalar or $ n\times 1$ vector to make use of prior weights. For binomial models this is usually the binomial index vector (default: 1).
opt.wt
$ n\times 1$ vector, weights for t (trimming factors) (default: all components set to 1)
opt.wtc
$ n\times 1$ vector to apply weights for the convergence criterion, w.r.t. $ m(t)$ (default: wt is used)
opt.off
scalar or $ n\times 1$ vector, offset, can be used for constrained estimation (default: off=0)
opt.pow
scalar, power for power link (default: pow=0)
opt.nbk
scalar, extra parameter k for the negative binomial distribution (default: nbk=1, i.e. the geometric distribution)
The quantlet returns
m
$ ng \times p$ matrix, the marginal integration estimates
b
$ d \times 1$ vector, the coefficients of the linear part
bv
$ d\times d$ covariance matrix for the estimated coefficients
const
scalar, the constant of the model
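A sketch of the GAPLM fitted by gintestpl, combining the linear part and the additive components inside the known link $ G$ selected by code:
\begin{displaymath}
  E(Y \mid X=x, T=t) = G\Bigl( c + x^{\top}\beta + \sum_{\alpha=1}^{p} f_\alpha(t_\alpha) \Bigr).
\end{displaymath}
The estimates of $ \beta$, their covariance matrix and the constant $ c$ are returned in b, bv and const; the additive components in m are, as for gintest, given on the index scale.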


3.6 Estimating a Bivariate Marginal Influence


m = intest2d(t, y, h, g, loc{, opt})
estimates a bivariate marginal influence function





  library("gam")

  randomize(12345)

  t     = grid(#(-0.9,-0.9),#(0.2,0.2),#(10,10))

  n     = rows(t)

  t     = t~(uniform(n)*2-1)

  g3    = sin(2*t[,3])

  g12   = t[,1].*t[,2]^2

  y     = g3 + g12 + normal(n)*sqrt(0.5)

  h     = #(1.0, 1.0)

  g     = #(1.1, 1.1, 1.2)

  loc   = 1

  gest  = intest2d(t,y,h,g,loc)

  library("graphic")

  pic  = createdisplay(1,2)

  dat11 = grsurface(t[,1:2]~g12)

  dat12 = grsurface(t[,1:2]~gest[,1])

  gc = grcube( dat11|dat12 )

  show(pic,1,1,dat11,gc.box,gc.x,gc.y,gc.z,gc.c)

  show(pic,1,2,dat12,gc.box,gc.x,gc.y,gc.z,gc.c)

  setheadline(pic, 1, 1, "Original function")

  setheadline(pic, 1, 2, "Estimated function")

gam07.xpl

The quantlet intest2d provides a way to estimate the bivariate marginal influence function of the explanatory variables $ t_{1}$ and $ t_{2}$. You can choose the Nadaraya-Watson, the local linear or the local quadratic kernel smoother. If the local linear smoother is chosen, the quantlet additionally returns the first derivative functions for both directions; with the local quadratic smoother you also get the mixed derivative function. This quantlet can be used, e.g., to explore the joint influence of two arbitrary explanatory variables in a multidimensional regression problem. Input parameters:
h
scalar or $ 2\times 1$ vector, the bandwidth for the directions of interest
g
scalar or $ p\times 1$ vector, the bandwidth for the directions not of interest
loc
scalar specifying the estimation procedure:
0
--Nadaraya-Watson (local constant)
1
--local linear
2
--local quadratic
Optionally it is possible to use:
opt.tg
$ ng \times 2$ matrix for estimating on a grid
opt.shf
shf=1 shows the progress of the estimation (default: shf=0)
The quantlet returns
m
$ ng \times p' \times q$ matrix, the bivariate marginal integration estimate in the first column, the derivatives in the following columns
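Loosely speaking, the bivariate marginal influence of $ t_1$ and $ t_2$ is the regression function integrated over all remaining directions (up to an additive constant); the estimator replaces the integral by an average over the observations:
\begin{displaymath}
  f_{12}(t_1,t_2) + \mathrm{const}
  = \int m(t_1,t_2,t_3,\ldots,t_p)\, dF(t_3,\ldots,t_p)
  \approx \frac{1}{n}\sum_{i=1}^{n} \widehat{m}(t_1,t_2,T_{i3},\ldots,T_{ip}),
\end{displaymath}
where $ \widehat{m}$ is the full-dimensional smoother with bandwidth h in the two directions of interest and g in the others.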
The example for this quantlet ( gam07.xpl ) gives the following picture:
\includegraphics[scale=0.6]{gam_intest2d.ps}
The original function is displayed on the left side, its estimate on the right side.


3.7 Estimating an Additive Model with Interaction Terms


{fh, c} = interact(t, y, h, g, loc, incl{, tg})
estimates an additive model with interaction terms



  library("gam")

  randomize(12345)

  t     = grid(#(-0.9,-0.9),#(0.2,0.2),#(10,10))

  n     = rows(t)

  t     = t~(uniform(n)*2-1)

  g1    = 2*t[,1]

  g2    = t[,2]^2 - mean(t[,2]^2)

  g3    = sin(3*t[,3])

  g12   = t[,1].*t[,2]

  y     = g1+g2+g3+g12+normal(n)*sqrt(0.5)

  h     = #(0.9, 0.9, 0.9)

  g     = #(1.0, 1.0, 1.0)

  incl  = 1~2

  f     = interact(t,y,h,g,1,incl)

  library("graphic")

  pic   = createdisplay(2,2)

  dat11 = sort(t[,2]~g2)

  datf1 = sort(t[,2]~f.fh[,2])

  dat12 = sort(t[,3]~g3)

  datf2 = sort(t[,3]~f.fh[,3])

  setmaskp(dat11,1,3,8)

  setmaskp(dat12,1,3,8)

  setmaskp(datf1,4,3,8)

  setmaskp(datf2,4,3,8)

  setmaskl(datf1,(1:rows(datf1))',4,1,1)

  setmaskl(datf2,(1:rows(datf2))',4,1,1)

  show(pic,1,1,dat11,datf1)

  show(pic,1,2,dat12,datf2)

  dat21 = grsurface(t[,1:2]~g12)

  dat22 = grsurface(t[,1:2]~f.fh[,4])

  gc = grcube( dat21|dat22 )

  show(pic,2,1,dat21,gc.box,gc.x,gc.y,gc.z,gc.c)

  show(pic,2,2,dat22,gc.box,gc.x,gc.y,gc.z,gc.c)

gam08.xpl

The quantlet interact estimates the constant of the model, the univariate functions and the bivariate interaction terms requested by the user, i.e., all functions $ f_{j}$ and $ f_{jk}$ of the model $ m = c + f_1 + \ldots + f_d + f_{12} + \ldots + f_{(d-1)d}$, see also Subsection 1.2. Again the marginal integration estimator is used, and you can choose between the Nadaraya-Watson, the local linear and the local quadratic smoother. Input parameters:
h
scalar or $ p\times 1$ vector, the bandwidth for the directions of interest
g
scalar or $ p\times 1$ vector, the bandwidth for the directions not of interest
loc
scalar specifying the estimation procedure:
0
--Nadaraya-Watson (local constant)
1
--local linear
2
--local quadratic
incl
$ pp \times 2$ matrix giving all pairs of indices $ j,k$ for which $ f_{jk}$ shall be included
Optional parameters:
tg
$ ng \times p$ matrix to estimate on a grid (see remarks)
The quantlet returns
fh
$ ng \times (p+pp)$ matrix, the marginal integration estimates of the univariate functions and the chosen interaction terms
c
scalar, the constant of the model
The example gam08.xpl gives the following picture:
\includegraphics[scale=0.6]{gam_interact.ps}
The upper plots display the second and third additive components, with the original functions in black and their estimates in blue. The lower plots show the original interaction term on the left and its estimate on the right. Remarks: Note that interact accepts only one-dimensional y. If a grid tg is chosen, the interaction functions can only be estimated up to a constant shift.
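Loosely speaking (and ignoring norming constants), integrating the regression function over all directions other than $ t_j$ and $ t_k$ leaves only the constant, the two univariate components and their interaction,
\begin{displaymath}
  \int m(t)\, dF_{\underline{jk}}(t_{\underline{jk}})
  = c + f_j(t_j) + f_k(t_k) + f_{jk}(t_j,t_k),
\end{displaymath}
so the interaction estimate is obtained by subtracting the marginal effects and the constant from this bivariate marginal influence. This also explains why, on a user-chosen grid tg, the interaction functions are identified only up to a constant shift.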


3.8 Estimating an Additive Model Using Marginal Integration


m = fastint(t, y, h1, h2, loc{, tg})
estimates an additive model using marginal integration



  library("gam")

  randomize(1234)

  n = 100

  d = 2

  ;               generate a correlated design:

  var = 1.0

  cov = 0.4  *(matrix(d,d)-unit(d)) + unit(d)*var

  {eval, evec} = eigsm(cov)

  t = normal(n,d)

  t = t*((evec.*sqrt(eval)')*evec')

  g1    = 2*t[,1]

  g2    = t[,2]^2 -mean(t[,2]^2)

  y     = g1 + g2  + normal(n,1) * sqrt(0.5)

  h1    = 0.5

  h2    = 0.7

  loc   = 0

  gest  = fastint(t,y,h1,h2,loc)

  library("graphic")

  pic   = createdisplay(1,2)

  dat11 = t[,1]~g1

  dat12 = t[,1]~gest[,1]

  dat21 = t[,2]~g2

  dat22 = t[,2]~gest[,2]

  setmaskp(dat12,4,4,8)

  setmaskp(dat22,4,4,8)

  show(pic,1,2,dat11,dat12)

  show(pic,1,1,dat21,dat22)

gam09.xpl

The quantlet fastint estimates the univariate additive components $ f_{j}$ under the assumption that the true model is of purely additive structure, i.e., the underlying model is $ m = c + f_1 + \ldots + f_d$. Here, the marginal integration estimator is applied, followed by a one-step backfit. For the backfit step you can choose between the Nadaraya-Watson, the local linear and the local quadratic smoother; consequently you obtain estimates of the first, or of the first and second, derivatives. For the integration step the fully internalized smoother is used, see Subsection 1.2. This estimation procedure is very fast compared to the integration procedures mentioned above, but we recommend using it only if the number of observations is large relative to the number of covariates and if the true model is indeed additive. It accepts only one-dimensional y variables.


Input parameters:

h1
scalar or a $ p\times 1$ vector, the bandwidth for the pilot estimation (marginal integration); it is recommended to undersmooth here
h2
scalar or a $ p\times 1$ vector, the bandwidth for the backfit step
loc
scalar specifying the estimation procedure:
0
--Nadaraya-Watson (local constant)
1
--local linear
2
--local quadratic
Optionally it is possible to use:
tg
$ ng \times pg$ matrix for estimating on a grid
The quantlet returns
m
$ ng \times (p+pp)$ matrix, the estimates of the univariate additive components and their derivatives on t or tg, respectively
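A rough sketch of the two stages, assuming the purely additive model above: an undersmoothed, internalized marginal integration pilot $ \widetilde{f}_\alpha$ computed with bandwidth h1 is followed by one backfitting step with bandwidth h2,
\begin{displaymath}
  \widetilde{f}_\alpha(t_\alpha) \approx \frac{1}{n}\sum_{i=1}^{n} \widehat{m}(t_\alpha, T_{i\underline{\alpha}}) - \widehat{c},
  \qquad
  \widehat{f}_\alpha = S_\alpha\Bigl( y - \widehat{c} - \sum_{\beta\neq\alpha} \widetilde{f}_\beta \Bigr),
\end{displaymath}
where $ S_\alpha$ is the univariate smoother of degree loc; the derivative estimates returned in m come from this final local linear or local quadratic step.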

