Group: Nonparametric Methods
See also: cartsplitopt prune cartcv plotcarttree tree agglom

Function: cartsplit
Description: Computes a regression tree.


Usage: cs = cartsplit (x, y, type, {opt})
Input:
x n x p matrix: data matrix of regression variables
y n x 1 vector: values of the response variable
type p x 1 vector: types of the regression variables; 1 means that the corresponding variable is continuous and 0 that it is categorical.
opt list of scalars: Determines when the growing of the tree is stopped.
opt.minsize integer >= 1: a split is allowed only if each child node contains at least minsize observations. Default value is 1.
opt.mincut integer >= 2: only nodes with at least mincut observations are candidates for a split. Default value is 2.
opt.mindev scalar >= 0: if the deviance of a node (its sum of squared residuals, the value of cs.ssr) is less than or equal to mindev, the node is not split any further. Default value is 0.
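
The three stopping options can be read as two simple checks. The following is a minimal sketch in Python (not XploRe) of that reading; the function names may_split and split_allowed are hypothetical, and the way the options combine is an interpretation of the descriptions above, not the actual cartsplit source.

```python
def may_split(nelem, ssr, mincut=2, mindev=0.0):
    """Node-level check (assumed reading of opt.mincut and opt.mindev).

    A node stays a candidate for splitting only if it holds at least
    mincut observations and its deviance (ssr) exceeds mindev.
    """
    return nelem >= mincut and ssr > mindev


def split_allowed(n_left, n_right, minsize=1):
    """Split-level check (assumed reading of opt.minsize).

    A proposed split is accepted only if both children hold at least
    minsize observations.
    """
    return n_left >= minsize and n_right >= minsize
```

With the default values, a node of 10 observations and deviance 2.5 remains a candidate, while a node whose deviance is already 0 does not.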
Output:
cs.val list of character vectors: the length of the list equals the number of nodes in the tree, and the item cs.val.split(i-1) belongs to the i-th node. For the correspondence between the items of the list and the nodes of the tree, see the note. If the variable split at the i-th node is continuous, cs.val.split(i-1) gives the split point for that variable: observations whose value of that variable is less than or equal to cs.val.split(i-1) go to the left child of the node, all other observations go to the right child. If the variable split at the i-th node is categorical, cs.val.split(i-1) is a two-element vector: observations with the values listed in cs.val.split(i-1)[1] go to the left child, observations with the values listed in cs.val.split(i-1)[2] go to the right child. For leaf nodes the list contains +NAN.
cs.vec vector whose length equals the number of nodes in the tree: gives the number of the variable that is split at each node. Variable i is the one whose values are in the i-th column of the data matrix x. At leaf nodes the vector contains +NAN.
cs.mean vector whose length equals the number of nodes in the tree: gives the fitted value of the response at each node, that is, the mean of the response variable y over the observations in the node.
cs.ssr vector whose length equals the number of nodes in the tree: gives the deviance of each node, that is, the sum of squared differences between the values of the response variable in the node and their mean.
cs.nelem vector whose length equals the number of nodes in the tree: gives the number of observations in each node.
cs.endpoint vector whose length equals the number of nodes in the tree: gives the position of the last leaf in the subtree that starts at the given node.

Note:

Example:



; generate some data; y depends deterministically on x1:
; when 0 <= x1 < 0.5, then y=0; when 0.5 <= x1 <= 1, then y=1
x1=#(0.1,0.2,0.3,0.4,0.45,0.6,0.7,0.8,0.9,0.95)
x2=uniform(10,1)
x=x1~x2
y=#(0,0,0,0,0,1,1,1,1,1)
minsize=1
mincut=1
mindev=0
opt=list(minsize,mincut,mindev)
cs=cartsplit(x,y,#(1,1),opt)
; show the results
cs

Result:



Contents of cs.val.split0
[1,] "0.45" 

Contents of cs.val.split1
[1,] "+NAN" 

Contents of cs.val.split2
[1,] "+NAN" 

Contents of cs.vec
[1,]        1 
[2,]     +NAN 
[3,]     +NAN 

Contents of cs.mean
[1,]      0.5 
[2,]        0 
[3,]        1 

Contents of cs.ssr
[1,]      2.5 
[2,]        0 
[3,]        0 

Contents of cs.nelem
[1,]       10 
[2,]        5 
[3,]        5 

Contents of cs.endpoint
[1,]        3 
[2,]        2 
[3,]        3 
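
The cs.nelem, cs.mean and cs.ssr values of this first example can be checked by hand. The sketch below does so in Python (not XploRe), taking the split point 0.45 on x1 from the result as given; the helper node_summary is a hypothetical name, and the code only recomputes the node statistics, it does not search for the split.

```python
def node_summary(y):
    """Return (number of observations, mean, sum of squared residuals)."""
    n = len(y)
    mean = sum(y) / n
    ssr = sum((v - mean) ** 2 for v in y)
    return n, mean, ssr


x1 = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9, 0.95]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
split = 0.45  # split point reported in cs.val.split0

# Observations with x1 <= split go to the left child, the rest to the right.
left = [yi for xi, yi in zip(x1, y) if xi <= split]
right = [yi for xi, yi in zip(x1, y) if xi > split]

root_stats = node_summary(y)       # node 1
left_stats = node_summary(left)    # node 2
right_stats = node_summary(right)  # node 3
```

The three tuples correspond to rows 1 to 3 of cs.nelem, cs.mean and cs.ssr.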

Example:



x1=#(0,0,0,0,1,1,1,1,1,2)
x2=#(0,0,0,0,0,0,0,1,1,1)
x=x1~x2
y=#(0,0,0,0,100,100,100,120,120,120)
cs=cartsplit(x,y,#(0,1))
cs

Result:



Contents of cs.val.split0
[1,] "0" 
[2,] "1,2" 

Contents of cs.val.split1
[1,] "+NAN" 

Contents of cs.val.split2
[1,] "0" 

Contents of cs.val.split3
[1,] "+NAN" 

Contents of cs.val.split4
[1,] "+NAN" 

Contents of cs.vec
[1,]        1 
[2,]     +NAN 
[3,]        2 
[4,]     +NAN 
[5,]     +NAN 

Contents of cs.mean
[1,]       66 
[2,]        0 
[3,]      110 
[4,]      100 
[5,]      120 

Contents of cs.ssr
[1,]    29640 
[2,]        0 
[3,]      600 
[4,]        0 
[5,]        0 

Contents of cs.nelem
[1,]       10 
[2,]        4 
[3,]        6 
[4,]        3 
[5,]        3 

Contents of cs.endpoint
[1,]        5 
[2,]        2 
[3,]        5 
[4,]        4 
[5,]        5 
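
The output of this second example suggests how the node vectors fit together. The Python sketch below (not XploRe) walks the tree to obtain a fitted value for one observation. It rests on assumptions inferred from the two examples and not stated on this page: nodes are stored in depth-first order, the left child of node i is node i+1, and the right child is node cs.endpoint(i+1)+1. The split values were transcribed by hand from the strings in cs.val, and the function name predict is hypothetical.

```python
# Node vectors transcribed from the result above; None stands for +NAN.
vec = [1, None, 2, None, None]        # split variable per node
mean = [66, 0, 110, 100, 120]         # fitted value per node
endpoint = [5, 2, 5, 4, 5]            # last leaf of each subtree
types = [0, 1]                        # 0 = categorical, 1 = continuous
# Categorical split: (left values, right values); continuous: split point.
split = [({0}, {1, 2}), None, 0.0, None, None]


def predict(row):
    """Follow the splits from the root to a leaf and return its mean."""
    i = 1  # node numbers are 1-based, as in the output
    while vec[i - 1] is not None:
        var = vec[i - 1]
        value = row[var - 1]
        s = split[i - 1]
        if types[var - 1] == 1:
            go_left = value <= s      # continuous: compare with split point
        else:
            go_left = value in s[0]   # categorical: left-hand value set
        left = i + 1                  # assumed depth-first layout
        right = endpoint[left - 1] + 1
        i = left if go_left else right
    return mean[i - 1]


fit_a = predict([0, 0])  # x1 = 0 goes left at the root
fit_b = predict([1, 0])  # x1 = 1 goes right, then x2 <= 0 goes left
fit_c = predict([2, 1])  # x1 = 2 goes right, then x2 > 0 goes right
```

Under these assumptions the three observations end in nodes 2, 4 and 5, whose fitted values are 0, 100 and 120.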



Author: Jussi Klemelä 980223
(C) MD*TECH Method and Data Technologies, 21.9.2000