Impact of new housing projects in New York State municipalities.
VARIABLES:
State Code
County Code
Expenditure per person
Wealth per person
Population
Percent intergovernmental
Density
Mean Income per person
id # (for matching)
Growth rate
VARIABLE NAMES:
"ST" "CO" "EXPEN" "WEALTH" "POP" "PINTERG" "DENS" "INCOME" "ID" "GROWR"
NY Municipalities data
TOWNS OF INTEREST:
          ST  CO  EXPEN  WEALTH    POP  PINTERG  DENS  INCOME    ID  GROWR
WARWICK   36  33    237   78908  16225     24.7   170   19044  8730   30.3
MONROE    36  33    159   55067   9338      8.8   599   16726  5420   30.0
TUXEDO    36  33    926  155034   2328      6.1    52   30610  8400    2.5
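QUESTION: PREDICT EXPENDITURE FOR YEARS 2005 AND 2025
The three blocks are Warwick, Monroe and Tuxedo, in that order (the 1992 rows match the table above on WEALTH, PINTERG, DENS and INCOME).
Town      Year    POP  WEALTH  PINTERG  DENS  INCOME
WARWICK   1992  16225   78908     24.7   170   19044
WARWICK   2005  20442   85000     24.7   214   19500
WARWICK   2025  31033   89000     26.0   325   20000
MONROE    1992   7750   55067      8.8   599   16726
MONROE    2005   8996   58000      8.8   695   17100
MONROE    2025  12413   60000     10.1   959   18000
TUXEDO    1992   2317  155034      6.1    52   30610
TUXEDO    2005  10685  116000      6.1   249   28300
TUXEDO    2025  29246  115000      7.0   656   25000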
What I expected of you for today is to have a lot of questions and to show us some basic attempts to fit the data. Writing the report is something that you should have in mind when you ask the questions. For example, what is it that we need to know...
You must follow the format for report writing that was described in last week's notes, but you still have plenty of freedom about style and content. The report could be organized around a group of questions that you have in mind and then answer. Of course you need an introduction with the basic facts. It should not be too long: you do not need to describe the problem in detail, because everybody should know it, but just the facts that are relevant to the analysis.
Variable=EXPEN
                        Histogram                       #   Boxplot
3300+*                                                  1      *
    .*                                                  1      *
    .
    .*                                                  1      *
    .*                                                  1      *
    .
    .*                                                  1      *
    .*                                                  2      *
1700+*                                                  1      *
    .*                                                  2      *
    .*                                                  1      *
    .*                                                  9      *
    .**                                                11      *
    .****                                              29      0
    .*********                                         80      0
    .***********************************************  421   +--+--+
 100+****************************************         353   +-----+
     ----+----+----+----+----+----+----+----+----+--
Variable=WEALTH
                          Histogram                       #   Boxplot
575000+*                                                  2      *
      .
      .*                                                  1      *
      .*                                                  2      *
      .*                                                  2      *
      .*                                                  6      *
      .*                                                  1      *
      .*                                                  6      *
      .**                                                25      *
      .***                                               33      0
      .*************                                    180   +--0--+
 25000+***********************************************  656   *-----*
       ----+----+----+----+----+----+----+----+----+--
Variable=POP
                          Histogram                       #   Boxplot
470000+*                                                  1      *
      .
      .
      .
      .*                                                  1      *
      .
      .
      .
      .
      .*                                                  1      *
      .
      .
      .*                                                  1      *
      .
      .
      .*                                                  1      *
      .*                                                  1      *
      .
      .*                                                  2      *
      .*                                                  2      *
      .*                                                  4      *
      .*                                                  6      *
      .***                                               37      *
 10000+************************************************ 857   +--0--+
       ----+----+----+----+----+----+----+----+----+---
Variable=PINTERG
                        Histogram                       #   Boxplot
67.5+*                                                  2      *
    .**                                                 6      *
    .**                                                 6      0
    .*                                                  3      0
    .**                                                 8      0
    .***                                               11      0
    .*****                                             23      |
    .**********                                        48      |
    .********************                              97      |
    .*****************************                    145   +-----+
    .*********************************************    221   *--+--*
    .******************************************       208   +-----+
    .*************************                        123      |
 2.5+***                                               13      |
     ----+----+----+----+----+----+----+----+----+
Variable=GROWR
                       Histogram                       #   Boxplot
290+*                                                  1      *
   .
   .
   .
   .
   .*                                                  1      *
   .
   .
   .
   .*                                                  2      *
   .
   .*                                                  2      *
   .**                                                13      0
   .********                                          93      0
   .***********************************************  563   +--+--+
   .*******************                              225   +-----+
   .**                                                13      0
-50+*                                                  1      *
    ----+----+----+----+----+----+----+----+----+--
Variable=LEX
                       Histogram                       #   Boxplot
8.1+*                                                  2      *
   .*                                                  1      *
   .*                                                  2      *
   .*                                                  3      0
   .*                                                  3      0
   .**                                                 6      0
   .**                                                 7      0
6.7+***                                               11      0
   .*******                                           25      |
   .*******                                           27      |
   .*************                                     52      |
   .*******************                               73      |
   .**************************                       103   +-----+
   .******************************************       167   *--+--*
5.3+*****************************************        161   |     |
   .*******************************                  124   +-----+
   .*********************                             82      |
   .**********                                        38      |
   .*****                                             17      |
   .**                                                 5      0
   .*                                                  4      0
3.9+*                                                  1      0
    ----+----+----+----+----+----+----+----+--
Variable=LW
                        Histogram                       #   Boxplot
13.3+*                                                  2      *
    .*                                                  3      0
    .*                                                  1      0
    .**                                                 7      0
    .*                                                  1      0
    .**                                                 6      0
    .*****                                             18      0
    .****                                              15      |
    .****                                              15      |
    .********                                          31      |
    .***********                                       44      |
11.1+************                                      47      |
    .*******************                               76   +-----+
    .*************************                         99   |     |
    .**********************************               135   *--+--*
    .*****************************************        161   |     |
    .***********************************              138   +-----+
    .*********************                             84      |
    .******                                            23      |
    .**                                                 5      |
    .*                                                  1      |
    .*                                                  1      |
 8.9+*                                                  1      0
     ----+----+----+----+----+----+----+----+-
Plot of EXPEN*GROWR. Legend: A = 1 obs, B = 2 obs, etc.
[Scatter plot: EXPEN (vertical axis, 0 to 4000) against GROWR (horizontal axis, -100 to 300).]
Plot of EXPEN*POP. Legend: A = 1 obs, B = 2 obs, etc.
[Scatter plot: EXPEN (vertical axis, 0 to 4000) against POP (horizontal axis, 0 to 500000).]
Plot of LEX*LGROWR. Legend: A = 1 obs, B = 2 obs, etc.
[Scatter plot: LEX (vertical axis, 4 to 9) against LGROWR (horizontal axis, -4 to 6).]
Plot of LEX*LPOP. Legend: A = 1 obs, B = 2 obs, etc.
[Scatter plot: LEX (vertical axis, 4 to 9) against LPOP (horizontal axis, 4 to 14).]
Plot of LEX*LDENS. Legend: A = 1 obs, B = 2 obs, etc.
[Scatter plot: LEX (vertical axis, 4 to 9) against LDENS (horizontal axis, 0 to 10).]
Model: MODEL1
Dependent Variable: LEX

                        Parameter Estimates

                   Parameter      Standard    T for H0:
Variable  DF        Estimate         Error  Parameter=0    Prob > |T|
INTERCEP   1        9.128891    1.84162410        4.957        0.0001
LW         1        0.326899    0.02870269       11.389        0.0001
LPOP       1       -2.269150    0.67760970       -3.349        0.0008
LPOP2      1        0.243843    0.07875894        3.096        0.0020
LPOP3      1       -0.008099    0.00296974       -2.727        0.0065
PINTERG    1       -0.083776    0.01017274       -8.235        0.0001
PINT2      1        0.002605    0.00037065        7.028        0.0001
PINT3      1    -0.000023080    0.00000387       -5.971        0.0001
LDENS      1       -0.181394    0.13947979       -1.301        0.1938
LDENS2     1       -0.056308    0.03251611       -1.732        0.0837
LDENS3     1        0.006721    0.00230544        2.915        0.0036
LINCOME    1        0.155568    0.07218570        2.155        0.0314
LGROWR     1       -0.015870    0.00658608       -2.410        0.0162
                         Cook's
  Obs    -2-1-0 1 2           D
  885   |      |****  |   0.034
  886   |      |***   |   0.011
  887   |******|      |   0.677
  888   |      |*     |   0.000
  889   |      |**    |   0.001
  890   |    **|      |   0.000
  891   |     *|      |   0.006
  892   |      |      |   0.000
  893   |    **|      |   0.000
  894   |      |**    |   0.001
  895   |      |*     |   0.000
  896   |     *|      |   0.000
  897   |      |**    |   0.004
  898   |      |      |   0.000
  899   .
  900   |      |**    |   0.001
  901   |    **|      |   0.001
  902   |      |      |   0.000
  903   |      |      |   0.000
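The Cook's D column can be screened mechanically. Here is a minimal Python sketch, assuming a cutoff of 0.5 (a common rule of thumb, not something taken from the SAS output). The values are copied from the listing above; the function name and the cutoff are illustrative choices.

```python
# Cook's D values copied from the SAS listing (observation 899 is missing there).
cooks_d = {
    885: 0.034, 886: 0.011, 887: 0.677, 888: 0.000, 889: 0.001,
    890: 0.000, 891: 0.006, 892: 0.000, 893: 0.000, 894: 0.001,
    895: 0.000, 896: 0.000, 897: 0.004, 898: 0.000, 900: 0.001,
    901: 0.001, 902: 0.000, 903: 0.000,
}


def influential(d_values, cutoff=0.5):
    """Return the observation numbers whose Cook's D exceeds the cutoff.

    The default cutoff of 0.5 is an assumed rule of thumb.
    """
    return sorted(obs for obs, d in d_values.items() if d > cutoff)


flagged = influential(cooks_d)
print("Flagged observations:", flagged)
```

In this stretch of the listing only observation 887 stands out, which is consistent with the SAS program given later in this handout dropping it (if _N_ ne 887).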
                        Parameter Estimates

                   Parameter      Standard    T for H0:
Variable  DF        Estimate         Error  Parameter=0    Prob > |T|
INTERCEP   1        8.364134    1.81187251        4.616        0.0001
LW         1        0.337906    0.02822854       11.970        0.0001
LPOP       1       -2.123000    0.66543637       -3.190        0.0015
LPOP2      1        0.236876    0.07730017        3.064        0.0022
LPOP3      1       -0.008398    0.00291483       -2.881        0.0041
PINTERG    1       -0.081084    0.00999342       -8.114        0.0001
PINT2      1        0.002560    0.00036382        7.035        0.0001
PINT3      1    -0.000022923    0.00000379       -6.043        0.0001
LDENS      1       -0.148868    0.13698980       -1.087        0.2775
LDENS2     1       -0.072503    0.03202610       -2.264        0.0238
LDENS3     1        0.008932    0.00229281        3.896        0.0001
LINCOME    1        0.158749    0.07084256        2.241        0.0253
LGROWR     1       -0.017031    0.00646630       -2.634        0.0086
                        Analysis of Variance

                         Sum of         Mean
Source        DF        Squares       Square      F Value     Prob>F
Model         12      176.48254     14.70688      123.144     0.0001
Error        897      107.12709      0.11943
C Total      909      283.60962

    Root MSE     0.34558     R-square     0.6223
    Dep Mean     5.49063     Adj R-sq     0.6172
    C.V.         6.29407
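The summary lines under the table all follow from the sums of squares and degrees of freedom. The Python sketch below recomputes them from the numbers printed above, using the standard definitions (which should reproduce the SAS figures up to rounding). The variable names are illustrative.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_model, df_model = 176.48254, 12
ss_error, df_error = 107.12709, 897
dep_mean = 5.49063  # mean of the dependent variable LEX

ss_total = ss_model + ss_error
df_total = df_model + df_error

ms_model = ss_model / df_model          # Mean Square, Model
ms_error = ss_error / df_error          # Mean Square, Error

f_value = ms_model / ms_error
r_square = ss_model / ss_total
adj_r_square = 1 - ms_error / (ss_total / df_total)
root_mse = math.sqrt(ms_error)
cv = 100 * root_mse / dep_mean          # coefficient of variation, in percent

print(f"F Value  {f_value:.3f}")
print(f"R-square {r_square:.4f}   Adj R-sq {adj_r_square:.4f}")
print(f"Root MSE {root_mse:.5f}   C.V. {cv:.5f}")
```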
                        Parameter Estimates

                   Parameter      Standard    T for H0:
Variable  DF        Estimate         Error  Parameter=0    Prob > |T|
INTERCEP   1        8.678552    1.79868778        4.825        0.0001
LW         1        0.338767    0.02799706       12.100        0.0001
LPOP       1       -2.355506    0.66252610       -3.555        0.0004
LPOP2      1        0.267774    0.07705389        3.475        0.0005
LPOP3      1       -0.009729    0.00291001       -3.343        0.0009
PINTERG    1       -0.080061    0.00991450       -8.075        0.0001
PINT2      1        0.002524    0.00036094        6.994        0.0001
PINT3      1    -0.000022594    0.00000376       -6.004        0.0001
LDENS      1       -0.050738    0.13806901       -0.367        0.7133
LDENS2     1       -0.103158    0.03267789       -3.157        0.0016
LDENS3     1        0.011627    0.00237202        4.902        0.0001
LINCOME    1        0.175452    0.07038409        2.493        0.0129
LGROWR     1       -0.016589    0.00641405       -2.586        0.0099
Stepwise Procedure for Dependent Variable LEX

Step 1   Variable LW Entered   R-square = 0.40240739   C(p) = 582.99726317

             DF   Sum of Squares    Mean Square        F   Prob>F
Regression    1     113.28220824   113.28220824   610.08   0.0001
Error       906     168.22904622     0.18568327
Total       907     281.51125446

             Parameter      Standard          Type II
Variable      Estimate         Error   Sum of Squares        F   Prob>F
INTERCEP   -0.49677927    0.24289341       0.77672675     4.18   0.0411
LW          0.56523742    0.02288424     113.28220824   610.08   0.0001

Bounds on condition number: 1, 1
--------------------------------------------------------------------------------
Step 2   Variable LDENS Entered   R-square = 0.48338211   C(p) = 383.50684314
--------------------------------------------------------------------------------
Step 3   Variable LDENS3 Entered   R-square = 0.59505421   C(p) = 107.63173004
--------------------------------------------------------------------------------
Step 4   Variable LGROWR Entered   R-square = 0.60075340   C(p) = 95.45036599
--------------------------------------------------------------------------------
Step 7   Variable PINT3 Entered   R-square = 0.63239624   C(p) = 22.71309905
--------------------------------------------------------------------------------
Step 8   Variable LINCOME Entered   R-square = 0.63449188   C(p) = 19.49848848

             DF   Sum of Squares    Mean Square        F   Prob>F
Regression    8     178.61660425    22.32707553   195.07   0.0001
Error       899     102.89465021     0.11445456
Total       907     281.51125446

             Parameter      Standard          Type II
Variable      Estimate         Error   Sum of Squares        F   Prob>F
INTERCEP    2.74449327    0.50441862       3.38824829    29.60   0.0001
LW          0.33493799    0.02780928      16.60284462   145.06   0.0001
PINTERG    -0.07904981    0.00973812       7.54196606    65.89   0.0001
PINT2       0.00252008    0.00035717       5.69795441    49.78   0.0001
PINT3      -0.00002269    0.00000374       4.21351271    36.81   0.0001
LDENS      -0.50367595    0.02842944      35.92518565   313.88   0.0001
LDENS3      0.00499558    0.00034247      24.35386781   212.78   0.0001
LINCOME     0.15976080    0.07036884       0.58994659     5.15   0.0234
LGROWR     -0.01742089    0.00637521       0.85464441     7.47   0.0064

Bounds on condition number: 350.2561, 4610.76
--------------------------------------------------------------------------------
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the model.
Summary of Stepwise Procedure for Dependent Variable LEX

       Variable            Number   Partial    Model
Step   Entered   Removed       In      R**2     R**2       C(p)          F   Prob>F
   1   LW                       1    0.4024   0.4024   582.9973   610.0830   0.0001
   2   LDENS                    2    0.0810   0.4834   383.5068   141.8498   0.0001
   3   LDENS3                   3    0.1117   0.5951   107.6317   249.2965   0.0001
   4   LGROWR                   4    0.0057   0.6008    95.4504    12.8902   0.0003
   5   PINTERG                  5    0.0016   0.6024    93.4558     3.6415   0.0567
   6   PINT2                    6    0.0151   0.6174    57.9492    35.4993   0.0001
   7   PINT3                    7    0.0150   0.6324    22.7131    36.6371   0.0001
   8   LINCOME                  8    0.0021   0.6345    19.4985     5.1544   0.0234
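The Partial R**2 and F columns of the summary can be recovered from the model R-squares of two consecutive steps. The Python sketch below uses the usual F-to-enter formula for adding one variable; that this matches what SAS prints is an assumption, checked here only against two rows. The R-square values come from the step headers of the stepwise listing and N = 908 from its Total DF of 907; the function name is illustrative.

```python
N_OBS = 908  # Total DF (907) plus one


def f_to_enter(r2_before, r2_after, n_vars_after, n_obs=N_OBS):
    """F statistic for the single variable added between two nested models.

    n_vars_after counts the predictors (without the intercept) after the step.
    """
    partial_r2 = r2_after - r2_before
    error_df = n_obs - n_vars_after - 1
    return partial_r2 / ((1.0 - r2_after) / error_df)


# Step 2: LDENS enters after LW.
partial_step2 = 0.48338211 - 0.40240739
f_step2 = f_to_enter(0.40240739, 0.48338211, n_vars_after=2)

# Step 8: LINCOME enters after PINT3 (step 7).
partial_step8 = 0.63449188 - 0.63239624
f_step8 = f_to_enter(0.63239624, 0.63449188, n_vars_after=8)

print(f"Step 2: partial R**2 = {partial_step2:.4f}, F = {f_step2:.4f}")
print(f"Step 8: partial R**2 = {partial_step8:.4f}, F = {f_step8:.4f}")
```

Both results agree with the summary rows for steps 2 and 8 to the printed precision.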
N = 908     Regression Models for Dependent Variable: LEX

Adjusted    R-square    In   Variables in Model
R-square
0.6358478   0.6402642   11   LW LPOP LPOP2 LPOP3 PINTERG PINT2 PINT3 LDENS2 LDENS3
                             LINCOME LGROWR
0.6354960   0.6403185   12   LW LPOP LPOP2 LPOP3 PINTERG PINT2 PINT3 LDENS LDENS2
                             LDENS3 LINCOME LGROWR
0.6337453   0.6377834   10   LW LPOP LPOP2 LPOP3 PINTERG PINT2 PINT3 LDENS2 LDENS3
                             LGROWR
0.6335406   0.6375810   10   LW LPOP LPOP2 LPOP3 PINTERG PINT2 PINT3 LDENS2 LDENS3
                             LINCOME
0.6333749   0.6378213   11   LW LPOP LPOP2 LPOP3 PINTERG PINT2 PINT3 LDENS LDENS2
                             LDENS3 LGROWR
0.6331814   0.6376302   11   LW LPOP LPOP2 LPOP3 PINTERG PINT2 PINT3 LDENS LDENS2
                             LDENS3 LINCOME
0.6318487   0.6363136   11   LW LPOP LPOP2 LPOP3 PINTERG PINT2 PINT3 LDENS LDENS3
                             LINCOME LGROWR
0.6313896   0.6350473    9   LW PINTERG PINT2 PINT3 LDENS LDENS2 LDENS3 LINCOME LGROWR
0.6313555   0.6358264   11   LW LPOP LPOP2 PINTERG PINT2 PINT3 LDENS LDENS2 LDENS3
                             LINCOME LGROWR
0.6312393   0.6344919    8   LW PINTERG PINT2 PINT3 LDENS LDENS3 LINCOME LGROWR
0.6311713   0.6352377   10   LW LPOP LPOP2 PINTERG PINT2 PINT3 LDENS LDENS3 LINCOME
                             LGROWR
0.6310451   0.6351129   10   LW LPOP3 PINTERG PINT2 PINT3 LDENS LDENS2 LDENS3 LINCOME
                             LGROWR
0.6310135   0.6350817   10   LW LPOP2 PINTERG PINT2 PINT3 LDENS LDENS2 LDENS3 LINCOME
                             LGROWR
0.6309898   0.6354651   11   LW LPOP LPOP3 PINTERG PINT2 PINT3 LDENS LDENS2 LDENS3
                             LINCOME LGROWR
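The two criteria used to rank models in this handout, adjusted R-square and Mallows' C(p), are simple functions of quantities already printed. The Python sketch below uses the textbook definitions; that SAS uses exactly these is an assumption, checked only against the printed numbers. The C(p) example takes the error sum of squares of the 8-variable stepwise model and the error mean square of the full 12-variable model from the stepwise and backward listings in this handout. The function names are illustrative.

```python
def adjusted_r_square(r_square, n_obs, n_vars):
    """Adjusted R-square for a model with n_vars predictors plus an intercept."""
    return 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_vars - 1)


def mallows_cp(sse_subset, mse_full, n_obs, n_vars):
    """Mallows' C(p); p counts the intercept as well as the n_vars predictors."""
    p = n_vars + 1
    return sse_subset / mse_full - (n_obs - 2 * p)


N_OBS = 908

# First row of the table above: 11 variables, R-square 0.6402642.
adj_11 = adjusted_r_square(0.6402642, N_OBS, 11)

# 8-variable stepwise model: error SS 102.89465021;
# full 12-variable model: error mean square 0.11313339.
cp_8 = mallows_cp(102.89465021, 0.11313339, N_OBS, 8)

print(f"Adjusted R-square (11 variables): {adj_11:.7f}")
print(f"C(p) (8-variable stepwise model): {cp_8:.4f}")
```

A C(p) close to the number of parameters p suggests little bias from the omitted terms. The 8-variable model has p = 9, so by this reading its C(p) is still somewhat high.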
Backward Elimination Procedure for Dependent Variable LEX

Step 0   All Variables Entered   R-square = 0.64031851   C(p) = 13.00000000

             DF   Sum of Squares    Mean Square        F   Prob>F
Regression   12     180.25686745    15.02140562   132.78   0.0001
Error       895     101.25438701     0.11313339
Total       907     281.51125446

             Parameter      Standard          Type II
Variable      Estimate         Error   Sum of Squares        F   Prob>F
INTERCEP    8.67855174    1.79868778       2.63374522    23.28   0.0001
LW          0.33876721    0.02799706      16.56412705   146.41   0.0001
LPOP       -2.35550556    0.66252610       1.43005618    12.64   0.0004
LPOP2       0.26777440    0.07705389       1.36628045    12.08   0.0005
LPOP3      -0.00972914    0.00291001       1.26458864    11.18   0.0009
PINTERG    -0.08006091    0.00991450       7.37717980    65.21   0.0001
PINT2       0.00252434    0.00036094       5.53380577    48.91   0.0001
PINT3      -0.00002259    0.00000376       4.07790415    36.05   0.0001
LDENS      -0.05073846    0.13806901       0.01527823     0.14   0.7133
LDENS2     -0.10315774    0.03267789       1.12742153     9.97   0.0016
LDENS3      0.01162658    0.00237202       2.71805055    24.03   0.0001
LINCOME     0.17545199    0.07038409       0.70300507     6.21   0.0129
LGROWR     -0.01658922    0.00641405       0.75679540     6.69   0.0099

Bounds on condition number: 17515.01, 345502.9
--------------------------------------------------------------------------------
Step 1   Variable LDENS Removed   R-square = 0.64026424   C(p) = 11.13504615

             DF   Sum of Squares    Mean Square        F   Prob>F
Regression   11     180.24158922    16.38559902   144.97   0.0001
Error       896     101.26966524     0.11302418
Total       907     281.51125446

             Parameter      Standard          Type II
Variable      Estimate         Error   Sum of Squares        F   Prob>F
INTERCEP    9.06958159    1.44948605       4.42504596    39.15   0.0001
LW          0.34097629    0.02733090      17.59186741   155.65   0.0001
LPOP       -2.52224957    0.48252866       3.08817037    27.32   0.0001
LPOP2       0.28686074    0.05689168       2.87353198    25.42   0.0001
LPOP3      -0.01043163    0.00219300       2.55739800    22.63   0.0001
PINTERG    -0.08043663    0.00985688       7.52662295    66.59   0.0001
PINT2       0.00253471    0.00035966       5.61367623    49.67   0.0001
PINT3      -0.00002268    0.00000375       4.12639603    36.51   0.0001
LDENS2     -0.11462431    0.00970272      15.77386911   139.56   0.0001
LDENS3      0.01239009    0.00114392      13.25961238   117.32   0.0001
LINCOME     0.17482331    0.07032932       0.69838867     6.18   0.0131
LGROWR     -0.01657313    0.00641080       0.75536390     6.68   0.0099

Bounds on condition number: 9557.367, 167198.3
--------------------------------------------------------------------------------
All variables left in the model are significant at the 0.1000 level.
Summary of Backward Elimination Procedure for Dependent Variable LEX

       Variable   Number   Partial    Model
Step   Removed        In      R**2     R**2      C(p)        F   Prob>F
   1   LDENS          11    0.0001   0.6403   11.1350   0.1350   0.7133
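As one possible way to answer the prediction question, the 11-variable model left after backward elimination can be evaluated at a town's values and transformed back from the log scale. The Python sketch below does this for Warwick under two loudly stated assumptions: the question table gives no GROWR for 2005, so the 1992 value (30.3) is reused as a placeholder, and exp(LEX) is used as a naive back-transformation without any bias correction. The function and variable names are illustrative, not part of the SAS output.

```python
import math

# Coefficients of the 11-variable model (backward elimination, Step 1).
COEF = {
    "INTERCEP": 9.06958159, "LW": 0.34097629,
    "LPOP": -2.52224957, "LPOP2": 0.28686074, "LPOP3": -0.01043163,
    "PINTERG": -0.08043663, "PINT2": 0.00253471, "PINT3": -0.00002268,
    "LDENS2": -0.11462431, "LDENS3": 0.01239009,
    "LINCOME": 0.17482331, "LGROWR": -0.01657313,
}


def predict_expen(wealth, pop, pinterg, dens, income, growr):
    """Predicted expenditure per person; assumes growr > 0 (so LGROWR = log(growr))."""
    lpop, ldens = math.log(pop), math.log(dens)
    x = {
        "INTERCEP": 1.0, "LW": math.log(wealth),
        "LPOP": lpop, "LPOP2": lpop ** 2, "LPOP3": lpop ** 3,
        "PINTERG": pinterg, "PINT2": pinterg ** 2, "PINT3": pinterg ** 3,
        "LDENS2": ldens ** 2, "LDENS3": ldens ** 3,
        "LINCOME": math.log(income), "LGROWR": math.log(growr),
    }
    lex = sum(COEF[name] * x[name] for name in COEF)
    return math.exp(lex)  # naive back-transformation from the log scale


# Warwick, 1992 values from the "towns of interest" table (observed EXPEN: 237).
warwick_1992 = predict_expen(78908, 16225, 24.7, 170, 19044, 30.3)
# Warwick, 2005 scenario; GROWR = 30.3 is an assumed placeholder.
warwick_2005 = predict_expen(85000, 20442, 24.7, 214, 19500, 30.3)

print(f"Warwick 1992 fitted EXPEN:    {warwick_1992:.1f}")
print(f"Warwick 2005 predicted EXPEN: {warwick_2005:.1f}")
```

For the 1992 values the fitted expenditure comes out at about 234, against the observed 237, which is a useful sanity check before trusting the 2005 and 2025 figures. Any report should still say how GROWR was chosen for the future years.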
OPTIONS PS=40 LS=80;

data a;
  infile 'nym';
  input nn ST CO EXPEN WEALTH POP PINTERG DENS INCOME ID GROWR;
  /* log transforms */
  lex     = log(expen);
  lw      = log(wealth);
  lpop    = log(pop);
  ldens   = log(dens);
  lincome = log(income);
  /* polynomial terms */
  pint2  = pinterg**2;
  pint3  = pinterg**3;
  lpop2  = lpop**2;
  lpop3  = lpop**3;
  ldens2 = ldens**2;
  ldens3 = ldens**3;
  /* signed log of the growth rate; lgrowr stays missing when growr = 0 */
  if growr < 0 then lgrowr = -log(-growr);
  if growr > 0 then lgrowr = log(growr);
  /* drop observations 887 and 475 */
  if _N_ ne 887;
  if _N_ ne 475;
run;

proc univariate plot;
  var ST CO EXPEN WEALTH POP PINTERG DENS INCOME ID GROWR
      lex lw lpop ldens lincome;
run;

proc plot;
  plot EXPEN*(growr wealth pop dens income PINTERG);
  plot lex*(lgrowr lw lpop ldens lincome);
run;

/* full model; P and R print predicted values and the residual analysis */
proc reg;
  model lEX = lW lPOP lpop2 lpop3 PINTERG pint2 pint3
              lDENS ldens2 ldens3 lINCOME lGROWR / P R;
run;

proc reg data = a;
  model lEX = lW lPOP lpop2 lpop3 PINTERG pint2 pint3
              lDENS ldens2 ldens3 lINCOME lGROWR / METHOD=stepwise;
run;

proc reg data = a;
  model lEX = lW lPOP lpop2 lpop3 PINTERG pint2 pint3
              lDENS ldens2 ldens3 lINCOME lGROWR / METHOD=backward;
run;

proc reg data = a;
  model lEX = lW lPOP lpop2 lpop3 PINTERG pint2 pint3
              lDENS ldens2 ldens3 lINCOME lGROWR / METHOD=adjrsq;
run;
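For readers who want to check the derived variables outside SAS, the data step can be mirrored in a few lines of Python. This is a sketch of the same transformations, not a replacement for the SAS program. The function names are illustrative, and a missing value is represented by None.

```python
import math


def signed_log(growr):
    """Mirror of the SAS rule: log(g) if g > 0, -log(-g) if g < 0, missing if g = 0."""
    if growr > 0:
        return math.log(growr)
    if growr < 0:
        return -math.log(-growr)
    return None  # SAS leaves lgrowr missing in this case


def derived_variables(expen, wealth, pop, pinterg, dens, income, growr):
    """Return the transformed variables used in the regression models."""
    lpop, ldens = math.log(pop), math.log(dens)
    return {
        "lex": math.log(expen), "lw": math.log(wealth),
        "lpop": lpop, "lpop2": lpop ** 2, "lpop3": lpop ** 3,
        "pinterg": pinterg, "pint2": pinterg ** 2, "pint3": pinterg ** 3,
        "ldens": ldens, "ldens2": ldens ** 2, "ldens3": ldens ** 3,
        "lincome": math.log(income), "lgrowr": signed_log(growr),
    }


# Warwick, from the "towns of interest" table.
row = derived_variables(237, 78908, 16225, 24.7, 170, 19044, 30.3)
for name, value in row.items():
    print(f"{name:8s} {value:.4f}")
```

One property of the signed log is worth keeping in mind: it is not monotone for growth rates between -1 and 1. For example, signed_log(0.5) is negative while signed_log(-0.5) is positive, so small positive and small negative growth rates swap order on the transformed scale.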
If you use Splus, here are some basic comments about modeling.
Example:
lm.ny <- lm(LEXPEN~LPOP+LDENS+LWEALTH+LINCOME+PINTERG+GROWR,data=x.s)
To try polynomial terms, add to the formula
+ I(POP^3)
or + I(POP^2) + I(POP^3)
(inside a formula ^ means crossing, so the power has to be wrapped in I()).
Look at the R^2 to decide if it improves much.
Plot methods are also defined for model objects:
plot(lm.ny)
will produce a plot of residuals vs predicted values.
Example (a condition for selecting part of the data):
LWEALTH > value or things like that.
================SOME OUTPUT===========
Let's say that high expenditures happen for different reasons and that their relation to population and to wealth or income is complicated. It is easier to just look at the part of the data where we have to do the prediction and hope that the complicated relationship has become simpler for that subset.
# Initial graphs of all the variables.
postscript("tmp")
pairs(~.,data=x.s[,c(3:8,10)],pch=".")
pairs(~.,data=x.s[,c(6,10:15)],pch=".")
nam <- names(x.s[,c(6,10:15)])
par(mfrow=c(2,2),pty="s")
plot( x.s$LINCOME, x.s$LEXPEN,pch=183)
# Mark two groups of points by hand; unique() drops points picked twice.
s1 <- identify( x.s$LINCOME, x.s$LEXPEN)
s1 <- unique(s1)
s2 <- identify( x.s$LINCOME, x.s$LEXPEN)
s2 <- unique(s2)
# Explore the relationship between LINCOME , LWEALTH , LEXPEN, LPOP
plot( x.s$LINCOME, x.s$LEXPEN,pch=".")
points( x.s$LINCOME[s1], x.s$LEXPEN[s1], pch="*")
points( x.s$LINCOME[s2], x.s$LEXPEN[s2], pch="O")
plot( x.s$LWEALTH, x.s$LEXPEN,pch=".")
points( x.s$LWEALTH[s1], x.s$LEXPEN[s1], pch="*")
points( x.s$LWEALTH[s2], x.s$LEXPEN[s2], pch="O")
plot( x.s$LPOP, x.s$LEXPEN,pch=".")
points( x.s$LPOP[s1], x.s$LEXPEN[s1], pch="*")
points( x.s$LPOP[s2], x.s$LEXPEN[s2], pch="O")
graphics.off()
# Fit the basic linear model with lm()
> lm.ny <- lm(LEXPEN~LPOP+LDENS+LWEALTH+LINCOME+PINTERG+GROWR,data=x.s)
> summary(lm.ny)
Call: lm(formula = LEXPEN ~ LPOP + LDENS + LWEALTH + LINCOME + PINTERG + GROWR, data = x.s)
Residuals:
    Min      1Q  Median    3Q   Max
 -1.579 -0.2359 0.01679 0.246 1.638

Coefficients:
                Value Std. Error t value
(Intercept)  0.790756  0.5902594  1.3397
       LPOP  0.087591  0.0291388  3.0060
      LDENS -0.192027  0.0283077 -6.7836
    LWEALTH  0.522807  0.0296278 17.6458
    LINCOME -0.071884  0.0824464 -0.8719
    PINTERG -0.001489  0.0014275 -1.0430
      GROWR -0.003827  0.0007794 -4.9103

Residual standard error: 0.3979 on 907 degrees of freedom
Multiple R-Squared: 0.494

Correlation of Coefficients:
        (Intercept)    LPOP   LDENS LWEALTH LINCOME PINTERG
   LPOP     -0.1718
  LDENS      0.4463 -0.8266
LWEALTH      0.2935 -0.1520  0.4362
LINCOME     -0.8885  0.0194 -0.4257 -0.6563
PINTERG     -0.1659  0.0848  0.0821  0.2120 -0.0324
  GROWR     -0.1846 -0.1248  0.0119 -0.0305  0.1798 -0.0018
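The Splus summary prints t values but no p-values. With 907 residual degrees of freedom the t distribution is very close to the standard normal, so a rough two-sided p-value can be obtained from the normal tail. The Python sketch below relies on that normal approximation (an assumption, not what Splus would print); the t values are copied from the coefficient table above and the names are illustrative.

```python
import math


def two_sided_p(t_value):
    """Two-sided p-value under a standard normal approximation to the t statistic."""
    return math.erfc(abs(t_value) / math.sqrt(2))


# t values copied from the coefficient table of summary(lm.ny).
t_values = {
    "(Intercept)": 1.3397, "LPOP": 3.0060, "LDENS": -6.7836,
    "LWEALTH": 17.6458, "LINCOME": -0.8719, "PINTERG": -1.0430,
    "GROWR": -4.9103,
}

for name, t in t_values.items():
    print(f"{name:12s} t = {t:8.4f}   approx. p = {two_sided_p(t):.4f}")

# Terms that would not be significant at the 5% level under this approximation.
not_significant = [name for name, t in t_values.items() if two_sided_p(t) > 0.05]
print("Not significant at 5%:", not_significant)
```

Besides the intercept, LINCOME and PINTERG are the terms with small t values in this purely linear model. The same two variables also contribute very little in the sequential analysis of variance table that follows.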
# Get the analysis of variance table related to this model.
> anova(lm.ny)
Analysis of Variance Table

Response: LEXPEN

Terms added sequentially (first to last)
           Df Sum of Sq  Mean Sq  F Value     Pr(F)
     LPOP   1   15.8017 15.80170  99.8112 0.0000000
    LDENS   1   25.8957 25.89574 163.5700 0.0000000
  LWEALTH   1   94.4967 94.49675 596.8872 0.0000000
  LINCOME   1    0.0001  0.00008   0.0005 0.9816068
  PINTERG   1    0.1752  0.17524   1.1069 0.2930425
    GROWR   1    3.8171  3.81711  24.1107 0.0000011
Residuals 907  143.5925  0.15832