8. Application to Real Data
This section demonstrates the processing of real data.
We use two data sets derived from Wisconsin farm data of 1987, which originally
comprised 1000 observations. Only middle-sized animal farms were selected, and
outliers were removed. The first data set, animal.dat, contains 250 observations
(rows) of family labor, hired labor, miscellaneous inputs, animal inputs and
intermediate run assets. The response variable, livestock, is contained in the
second data set, goods.dat. A detailed description of the data, their source,
possible models of interest and some nonparametric analysis can be found in
Sperlich (1998).
In this example we work with the first four inputs, i.e. family labor,
hired labor, miscellaneous inputs and animal inputs. We store them in
the variable t and read the response variable into y:
data=read("animal.dat")
t1 = data[,1]
t2 = data[,2]
t3 = data[,3]
t4 = data[,4]
t=t1~t2~t3~t4
y=read("goods.dat")
Now we can calculate an approximate bandwidth h, taking half the standard
deviation of each input:
h1=0.5*sqrt(cov(t1))
h2=0.5*sqrt(cov(t2))
h3=0.5*sqrt(cov(t3))
h4=0.5*sqrt(cov(t4))
h=h1|h2|h3|h4
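The rule used in the statements above sets each bandwidth to half the standard deviation of the corresponding input. The following is a minimal Python sketch of that rule, not XploRe code. It assumes that cov applied to a single column returns the sample variance with the n-1 denominator; the function name and the toy columns are hypothetical and are not the farm data.

```python
import statistics


def rule_of_thumb_bandwidths(columns, factor=0.5):
    """Return factor * sample standard deviation for every column.

    Assumption: the n-1 denominator matches what cov() does in the text.
    """
    return [factor * statistics.stdev(col) for col in columns]


# Toy stand-ins for the input columns (hypothetical values).
toy_columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
]
h = rule_of_thumb_bandwidths(toy_columns)
print(h)
```

Because the rule scales with the spread of each input, doubling a column doubles its bandwidth.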
Finally we set up the parameters for the estimation and run the partial
integration procedure intest.
The progress of the computation will be displayed while it runs.
g=h
loc=0
opt=gamopt("shf",1)
m = intest(t,y,h,g,loc,opt)
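The internals of intest are not shown here, so the following Python sketch only illustrates the general idea of partial (marginal) integration under simplifying assumptions: a Nadaraya-Watson pilot estimator with a quartic product kernel and a single bandwidth vector, instead of separate bandwidths h and g. All function names are hypothetical. For the component of input j, the pilot estimate is evaluated with input j fixed and the remaining inputs set to their observed values, and the results are averaged.

```python
def quartic(u):
    """Quartic (biweight) kernel, zero outside [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def nw_estimate(point, t, y, h):
    """Nadaraya-Watson estimate at `point` using a product kernel."""
    num = 0.0
    den = 0.0
    for row, yi in zip(t, y):
        w = 1.0
        for xk, tk, hk in zip(point, row, h):
            w *= quartic((xk - tk) / hk)
        num += w * yi
        den += w
    # No observation near the point: the estimate is undefined there.
    return num / den if den > 0.0 else float("nan")


def marginal_integration(t, y, h, j):
    """Estimate the j-th additive component at every observed t[i][j]."""
    n = len(t)
    y_mean = sum(y) / n
    component = []
    for i in range(n):
        total = 0.0
        count = 0
        for l in range(n):
            # Fix direction j at t[i][j]; take the other inputs from row l.
            point = list(t[l])
            point[j] = t[i][j]
            est = nw_estimate(point, t, y, h)
            if est == est:  # skip undefined (NaN) pilot estimates
                total += est
                count += 1
        # count >= 1 because the point built from row l = i is an observation.
        component.append(total / count - y_mean)
    return component
```

The sketch skips evaluation points with no data nearby. Sparse regions of this kind are exactly where such estimators become unstable.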
For a visual assessment of the results we create the graphical output
shown in Figure 1.
Figure 1: Generalized additive model for animal.dat, partial integration.
It is produced by the following statements:
const=mean(y)*0.25
m1 = t[,1]~(m[,1]+const)
m2 = t[,2]~(m[,2]+const)
m3 = t[,3]~(m[,3]+const)
m4 = t[,4]~(m[,4]+const)
setmaskp(m1,4,4,4)
setmaskp(m2,4,4,4)
setmaskp(m3,4,4,4)
setmaskp(m4,4,4,4)
setmaskl(m1,(sort(m1~(1:rows(m1)))[,3])',4,1,1)
setmaskl(m2,(sort(m2~(1:rows(m2)))[,3])',4,1,1)
setmaskl(m3,(sort(m3~(1:rows(m3)))[,3])',4,1,1)
setmaskl(m4,(sort(m4~(1:rows(m4)))[,3])',4,1,1)
yy=y-mean(y)-sum(m,2)
d1=t[,1]~(yy+m[,1])
d2=t[,2]~(yy+m[,2])
d3=t[,3]~(yy+m[,3])
d4=t[,4]~(yy+m[,4])
setmaskp(d1,1,11,4)
setmaskp(d2,1,11,4)
setmaskp(d3,1,11,4)
setmaskp(d4,1,11,4)
pic = createdisplay(2,2)
show(pic,1,1,m1,d1)
show(pic,1,2,m2,d2)
show(pic,2,1,m3,d3)
show(pic,2,2,m4,d4)
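The points d1 to d4 plotted by these statements are partial residuals: the overall residual yy plus the estimated component of the respective input. The following Python sketch restates that arithmetic on plain lists. The function name and the layout of `components` (one list per input) are assumptions made for illustration.

```python
def partial_residuals(y, components, centre):
    """Return one list of partial residuals per additive component.

    components[j][i] is the j-th estimated component at observation i,
    centre is the constant subtracted from y (mean(y) in the text).
    """
    n = len(y)
    fitted_sum = [sum(comp[i] for comp in components) for i in range(n)]
    resid = [y[i] - centre - fitted_sum[i] for i in range(n)]
    # Adding component j back leaves y minus the constant and all others.
    return [[resid[i] + comp[i] for i in range(n)] for comp in components]
```

If the additive fit were exact, each list of partial residuals would coincide with its component, so the scatter around each plotted curve shows how well that component describes the data.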
Figure 1 reveals two properties of the data:
1. The bandwidth h was chosen quite well; the data seem to be neither
oversmoothed nor undersmoothed.
2. There are several outliers in the data; they can be seen in the right
part of the pictures.
If we try to use the quantlet intest with an inner grid for the computation
(optional variable opt.tg), the quantlet ends with an error message. The
reason is the outliers: in their region the data are too sparse.
For a better understanding of the data we can also estimate the model with
the backfitting algorithm (quantlet backfit) and compare the results.
kern="qua"
{mb,b,const} = backfit(t,y,h,loc,kern,opt)
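How backfit works internally is not shown here. As a rough illustration of the backfitting idea, the following Python sketch cycles through the inputs and smooths the partial residuals of each one against that input, using a Nadaraya-Watson smoother with a quartic kernel. The function names, the fixed number of iterations and the centring step are assumptions of this sketch, not properties of the quantlet.

```python
def quartic(u):
    """Quartic (biweight) kernel, zero outside [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def smooth(x, r, h):
    """Nadaraya-Watson smooth of r against x, evaluated at the observed x."""
    out = []
    for x0 in x:
        num = 0.0
        den = 0.0
        for xi, ri in zip(x, r):
            w = quartic((x0 - xi) / h)
            num += w * ri
            den += w
        # den > 0 because x0 is an observation and quartic(0) > 0.
        out.append(num / den)
    return out


def backfit_sketch(columns, y, h, n_iter=20):
    """Return (components, const) for an additive fit of y on the columns."""
    n = len(y)
    d = len(columns)
    const = sum(y) / n
    comps = [[0.0] * n for _ in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            # Partial residuals: remove the constant and all other components.
            partial = [
                y[i] - const - sum(comps[k][i] for k in range(d) if k != j)
                for i in range(n)
            ]
            fit = smooth(columns[j], partial, h[j])
            mean_fit = sum(fit) / n
            # Centre each component so the constant stays identifiable.
            comps[j] = [f - mean_fit for f in fit]
    return comps, const
```

A call such as `backfit_sketch([x1, x2], y, [h1, h2])` returns one centred component per input together with the mean of y, which plays the role of const in the statements above.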
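For the graphical output we can use a similar approach as above, with a few
differences: no constant is added to the estimated components, and the value
const returned by backfit replaces mean(y) in the residuals.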
m1 = t[,1]~mb[,1]
m2 = t[,2]~mb[,2]
m3 = t[,3]~mb[,3]
m4 = t[,4]~mb[,4]
setmaskp(m1,4,4,4)
setmaskp(m2,4,4,4)
setmaskp(m3,4,4,4)
setmaskp(m4,4,4,4)
setmaskl(m1,(sort(m1~(1:rows(m1)))[,3])',4,1,1)
setmaskl(m2,(sort(m2~(1:rows(m2)))[,3])',4,1,1)
setmaskl(m3,(sort(m3~(1:rows(m3)))[,3])',4,1,1)
setmaskl(m4,(sort(m4~(1:rows(m4)))[,3])',4,1,1)
yy=y-const-sum(mb,2)
d1=t[,1]~(yy+mb[,1])
d2=t[,2]~(yy+mb[,2])
d3=t[,3]~(yy+mb[,3])
d4=t[,4]~(yy+mb[,4])
setmaskp(d1,1,11,4)
setmaskp(d2,1,11,4)
setmaskp(d3,1,11,4)
setmaskp(d4,1,11,4)
pic2 = createdisplay(2,2)
show(pic2,1,1,m1,d1)
show(pic2,1,2,m2,d2)
show(pic2,2,1,m3,d3)
show(pic2,2,2,m4,d4)
Figure 2: Generalized additive model for animal.dat, backfitting.
The graphs of this estimate in Figure 2 resemble those in Figure 1 obtained
with intest; only a different scale factor was used. The dependence of the
response variable y on the miscellaneous inputs seems to be almost linear.
Unfortunately, the quantlet intestpl for the additive partially linear model
ends with an error because of the outliers. Likewise, the testing of
interactions (intertest1 or intertest2) aborts. To use these quantlets, the
outliers would first have to be removed from the data sets.
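The text does not say how the outliers should be identified. Since they appear in the right part of the pictures, one simple possibility is to drop observations whose inputs lie in the upper tail. The Python sketch below does this with a per-column cutoff. The function name, the cutoff rule and the default fraction are assumptions chosen for illustration, not a procedure taken from the source.

```python
def trim_upper_tail(t, y, keep_fraction=0.95):
    """Keep the rows whose every input is at or below its column cutoff.

    The cutoff of a column is its order statistic at position
    int(keep_fraction * n); this rule is an assumption of the sketch.
    """
    n = len(t)
    d = len(t[0])
    cutoffs = []
    for j in range(d):
        col = sorted(row[j] for row in t)
        idx = min(n - 1, max(0, int(keep_fraction * n) - 1))
        cutoffs.append(col[idx])
    kept = [
        (row, yi)
        for row, yi in zip(t, y)
        if all(row[j] <= cutoffs[j] for j in range(d))
    ]
    return [row for row, _ in kept], [yi for _, yi in kept]
```

The inputs and the response are filtered together so that their rows stay aligned. The bandwidths would then have to be recomputed on the trimmed data.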