Group: | Nonparametric Methods |
See also: | cartsplitopt prune cartcv plotcarttree tree agglom |
Function: | cartsplit | |
Description: |
Computes a regression tree.
|
Usage: | cs = cartsplit (x, y, type, {opt}) | |
Input: | ||
x | n x p matrix: data matrix of regression variables | |
y | n x 1 vector: values of the response variable | |
type | p x 1 vector: types of the regression variables; 1 means that the corresponding variable is continuous and 0 that it is categorical. | |
opt | list of scalars: Determines when the growing of the tree is stopped. | |
opt.minsize | integer >= 1: A split is allowed only if each child node contains at least minsize observations. Default value is 1. | |
opt.mincut | integer >= 2: Only nodes with at least mincut observations are candidates for a split; growing continues as long as a node contains at least mincut observations. Default value is 2. | |
opt.mindev | scalar >= 0: If the deviance of a node (the sum of squared residuals, i.e. the value of cs.ssr) is less than or equal to mindev, the node is not split any further. Default value is 0. | |
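The three stopping rules can be read together as one predicate. The Python sketch below only restates the descriptions of opt.minsize, opt.mincut and opt.mindev; it is not the XploRe implementation, and the function name `split_allowed` and its arguments are hypothetical.

```python
def split_allowed(n_node, deviance, n_left, n_right,
                  minsize=1, mincut=2, mindev=0.0):
    """Return True if a candidate split passes the three stopping rules.

    Hypothetical helper: it mirrors the documented meaning of
    opt.minsize, opt.mincut and opt.mindev and is not part of XploRe.
    """
    if n_node < mincut:        # node too small to be a split candidate
        return False
    if deviance <= mindev:     # node is already homogeneous enough
        return False
    # both children must keep at least minsize observations
    return n_left >= minsize and n_right >= minsize


print(split_allowed(10, 2.5, 5, 5))             # True
print(split_allowed(5, 0.0, 2, 3))              # False
print(split_allowed(10, 2.5, 9, 1, minsize=2))  # False
```

With the default values, a node is split as long as it holds at least two observations and its deviance is positive.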
Output: | ||
cs.val | list of strings and vectors of strings; the length of the list equals the number of nodes in the tree. For the correspondence between the items of the list and the nodes of the tree, see the note. If the variable split at the i-th node is continuous, cs.val.split(i-1) gives the split point for that variable: observations whose value of that variable is less than or equal to cs.val.split(i-1) go to the left child of the node, and all other observations go to the right child. If the variable split at the i-th node is categorical, cs.val.split(i-1) is a two-element vector: observations with values listed in its first element go to the left child of the node, and observations with values listed in its second element go to the right child. At leaf nodes the list contains +NAN. | |
cs.vec | vector whose length equals the number of nodes in the tree: gives the number of the variable that is split at the given node. Variable "i" is the variable whose values are in the i-th column of the data matrix x. At leaf nodes the vector contains +NAN. | |
cs.mean | vector whose length equals the number of nodes in the tree: gives the fitted value of the response at each node, that is, gives the mean value of the response variable y for the observations in the given node. | |
cs.ssr | vector whose length equals the number of nodes in the tree: gives the deviance of each node, that is, the sum of squared residuals: the sum of squared differences between the values of the response variable in the node and their mean. | |
cs.nelem | vector whose length equals the number of nodes in the tree: gives the number of observations in the given node. | |
cs.endpoint | vector: gives, for each node, the position of the last node in the subtree rooted at that node; for a leaf this is the position of the leaf itself. |
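The per-node quantities cs.mean, cs.ssr and cs.nelem follow directly from the responses that fall into a node. The following Python sketch is an illustration of those definitions, not XploRe code; `node_summary` is a hypothetical name.

```python
def node_summary(y):
    """Return (mean, ssr, nelem) for the responses falling in one node."""
    n = len(y)
    mean = sum(y) / n
    ssr = sum((v - mean) ** 2 for v in y)  # sum of squared residuals
    return mean, ssr, n


# A node holding five responses equal to 0 and five equal to 1
print(node_summary([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]))      # (0.5, 2.5, 10)
# A node holding three responses of 100 and three of 120
print(node_summary([100, 100, 100, 120, 120, 120]))      # (110.0, 600.0, 6)
```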
Note: |
In the vectors representing a tree, nodes are stored in
preorder: the root node comes first, followed by its left
child, then the left child of that node, and so on, until
a leaf node is reached; next comes the right child that is
the sibling of the most recently listed left child, and so
on. The whole left subtree of a node is listed before its
right subtree.
|
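The ordering described in the note, together with the meaning of cs.endpoint, can be sketched as a recursive preorder walk. This Python example is an illustration under assumptions: the nested-tuple tree representation and the name `flatten` are hypothetical and are not part of XploRe.

```python
def flatten(tree):
    """Flatten a nested tree into preorder labels and endpoint positions.

    A tree is either a leaf label (str) or a tuple (label, left, right).
    Positions are 1-based, as in the vectors described above.
    """
    labels, endpoint = [], []

    def visit(node):
        pos = len(labels)              # 0-based slot of this node
        endpoint.append(None)          # filled in once the subtree is done
        if isinstance(node, tuple):
            label, left, right = node
            labels.append(label)
            visit(left)                # whole left subtree first
            visit(right)               # then the right subtree
        else:
            labels.append(node)
        endpoint[pos] = len(labels)    # position of last node in the subtree

    visit(tree)
    return labels, endpoint


tree = ("root", "leaf A", ("inner", "leaf B", "leaf C"))
labels, endpoint = flatten(tree)
print(labels)    # ['root', 'leaf A', 'inner', 'leaf B', 'leaf C']
print(endpoint)  # [5, 2, 5, 4, 5]
```

In this layout the left child of a non-leaf node at position i sits at position i+1, and the right child sits one position after the endpoint of the left child.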
Example: |
; generate some data, y depends deterministically on x1:
; when 0 <= x1 < 0.5, then y=0, when 0.5 <= x1 <= 1, then y=1
x1=#(0.1,0.2,0.3,0.4,0.45,0.6,0.7,0.8,0.9,0.95)
x2=uniform(10,1)
x=x1~x2
y=#(0,0,0,0,0,1,1,1,1,1)
minsize=1
mincut=1
mindev=0
opt=list(minsize,mincut,mindev)
cs=cartsplit(x,y,#(1,1),opt)
; show the results
cs
|
Result: |
Contents of cs.val.split0
[1,] "0.45"
Contents of cs.val.split1
[1,] "+NAN"
Contents of cs.val.split2
[1,] "+NAN"
Contents of cs.vec
[1,] 1
[2,] +NAN
[3,] +NAN
Contents of cs.mean
[1,] 0.5
[2,] 0
[3,] 1
Contents of cs.ssr
[1,] 2.5
[2,] 0
[3,] 0
Contents of cs.nelem
[1,] 10
[2,] 5
[3,] 5
Contents of cs.endpoint
[1,] 3
[2,] 2
[3,] 3
|
Example: |
x1=#(0,0,0,0,1,1,1,1,1,2)
x2=#(0,0,0,0,0,0,0,1,1,1)
x=x1~x2
y=#(0,0,0,0,100,100,100,120,120,120)
cs=cartsplit(x,y,#(0,1))
cs
|
Result: |
Contents of cs.val.split0
[1,] "0"
[2,] "1,2"
Contents of cs.val.split1
[1,] "+NAN"
Contents of cs.val.split2
[1,] "0"
Contents of cs.val.split3
[1,] "+NAN"
Contents of cs.val.split4
[1,] "+NAN"
Contents of cs.vec
[1,] 1
[2,] +NAN
[3,] 2
[4,] +NAN
[5,] +NAN
Contents of cs.mean
[1,] 66
[2,] 0
[3,] 110
[4,] 100
[5,] 120
Contents of cs.ssr
[1,] 29640
[2,] 0
[3,] 600
[4,] 0
[5,] 0
Contents of cs.nelem
[1,] 10
[2,] 4
[3,] 6
[4,] 3
[5,] 3
Contents of cs.endpoint
[1,] 5
[2,] 2
[3,] 5
[4,] 4
[5,] 5
|
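To show how the output vectors fit together, the following Python sketch drops a single observation down the tree from the second example. It is an illustration only, not XploRe code: the function `predict` is hypothetical, the values are transcribed from the printed result, and the strings in cs.val are assumed to have been parsed into numbers and sets beforehand. The rule used to locate the right child follows from the node ordering and the meaning of cs.endpoint.

```python
import math

NAN = math.nan

# Tree of the second example, transcribed from the printed output.
# Categorical splits are stored as (left_values, right_values); the
# parsing of the cs.val strings is assumed to have been done already.
vec = [1, NAN, 2, NAN, NAN]                    # cs.vec (1-based variable numbers)
val = [({0}, {1, 2}), None, 0.0, None, None]   # cs.val, parsed
mean = [66, 0, 110, 100, 120]                  # cs.mean
endpoint = [5, 2, 5, 4, 5]                     # cs.endpoint (1-based positions)
vtype = [0, 1]                                 # 0 = categorical, 1 = continuous


def predict(x):
    """Return the fitted value for one observation x (list of p values)."""
    i = 0                                      # 0-based index of the root
    while not math.isnan(vec[i]):              # +NAN marks a leaf
        var = int(vec[i]) - 1                  # column of the split variable
        left = i + 1                           # left child follows its parent
        right = endpoint[left]                 # node right after the left subtree
        if vtype[var] == 1:
            go_left = x[var] <= val[i]         # continuous: compare to split point
        else:
            go_left = x[var] in val[i][0]      # categorical: listed on the left?
        i = left if go_left else right
    return mean[i]


print(predict([0, 0]))  # 0
print(predict([1, 0]))  # 100
print(predict([2, 1]))  # 120
```

In this sketch a categorical value that is not listed for the left child is sent to the right child; how XploRe treats unseen categories is not stated in this help page.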