 Usage:  cs = cartsplit (x, y, type, {opt})   

 

 Input:



  x                      n x p matrix: data matrix of regression variables 

                         

                         

  y                      n x 1 vector: values of the response variable 

                         

                         

  type                   p x 1 vector: types of the regression variables;
                         1 means that the corresponding variable is continuous and
                         0 that it is categorical.

                         

                         

  opt                    list of scalars: determines when the growing of the
                         tree is stopped.

                         

                         

  opt.minsize            integer >= 1: The number of observations in each child node
                         must be greater than or equal to minsize for a split to be
                         allowed. Default value is 1.

                         

                         

  opt.mincut             integer >= 2: Nodes of size mincut or larger are candidates
                         for a split; the growing continues if there are
                         at least mincut observations in a node.
                         Default value is 2.

                         

                         

  opt.mindev             scalar >= 0: If the deviance of the node (sum of squared
                         residuals, value of cs.ssr) is less than or equal to
                         mindev, the node will not be split any further.
                         Default value is 0.
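
The three stopping criteria above can be combined into a single check.
The following Python sketch is only an illustration of the rules as they
are described here, not the code of cartsplit itself; the function name
split_allowed and its argument names are hypothetical.

```python
def split_allowed(n_node, n_left, n_right, node_ssr,
                  minsize=1, mincut=2, mindev=0.0):
    """Return True if a candidate split passes all three stopping criteria.

    n_node   -- number of observations in the node to be split
    n_left   -- number of observations the split sends to the left child
    n_right  -- number of observations the split sends to the right child
    node_ssr -- deviance (sum of squared residuals) of the node
    """
    # mincut: only nodes with at least mincut observations are candidates
    if n_node < mincut:
        return False
    # mindev: a node whose deviance is <= mindev is not split any further
    if node_ssr <= mindev:
        return False
    # minsize: each child must contain at least minsize observations
    return n_left >= minsize and n_right >= minsize
```

With the default values, a node is split as long as it holds at least two
observations and its deviance is strictly positive.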

                         

                         

                         

                         

 Output:



  cs.val                 list of characters and vectors of characters.
                         The length of the list equals the number of nodes in the tree.
                         For the correspondence between the items of the vector
                         and the leaves of the tree, see the note.

                         

                         If the variable that is split at the i-th node is
                         continuous, cs.val.split(i-1) gives the split point for
                         that variable: observations whose value of that variable
                         is less than or equal to cs.val.split(i-1) are placed
                         in the left child of this node; all other observations
                         are placed in the right child of this node.

                         

                         If the variable that is split at the i-th node is
                         categorical, cs.val.split(i-1) is a two-element
                         vector: observations having values enumerated in
                         cs.val.spliti[1] are placed in the left child of the given
                         node, and observations having values enumerated in
                         cs.val.spliti[2] are placed in the right child of the
                         given node.

                         

                         The list cs.val contains +NAN in the leaf nodes. 
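
The routing rule described for cs.val can be sketched as follows. This
Python fragment is an illustration under stated assumptions, not part of
cartsplit: the function name goes_left is hypothetical, and a categorical
split is assumed to be given as a pair (values for the left child, values
for the right child).

```python
def goes_left(value, split, continuous):
    """Decide whether an observation is sent to the left child of a node.

    value      -- the observation's value of the split variable
    split      -- split point (continuous case) or a pair
                  (left_values, right_values) (categorical case)
    continuous -- True for a continuous variable (type 1),
                  False for a categorical variable (type 0)
    """
    if continuous:
        # values less than or equal to the split point go to the left child
        return value <= split
    left_values, right_values = split
    if value in left_values:
        return True
    if value in right_values:
        return False
    # a category that is enumerated on neither side cannot be routed
    raise ValueError("value is not listed in either child")
```

For example, goes_left(2.5, 3.0, True) is True, and
goes_left("b", (["a"], ["b", "c"]), False) is False.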

                         

                         

  cs.vec                 vector whose length equals the number of nodes in the tree:
                         gives the number of the variable that is split at the
                         given node. Variables are numbered by denoting with "i" the
                         variable whose values are in the i-th column of the data
                         matrix x. At the leaf nodes, the vector contains +NAN.

                         

                         

  cs.mean                vector whose length equals the number of nodes in the tree:
                         gives the fitted value of the response at each node, that is,
                         the mean value of the response variable y for the
                         observations in the given node.

                         

                         

  cs.ssr                 vector whose length equals the number of nodes in the tree:
                         gives the deviance of each node, that is, the sum of
                         squared residuals of the node: the sum of squared
                         differences between the values of the response variable
                         in the node and the mean value of the response variable
                         in that node.
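
The quantities stored in cs.mean and cs.ssr for a single node can be
computed as in the following Python sketch. It only illustrates the
definitions given above; the function name node_mean_and_ssr is
hypothetical, and y is assumed to hold the response values of the
observations in one node.

```python
def node_mean_and_ssr(y):
    """Return (mean, ssr) for the response values y of a single node."""
    n = len(y)
    if n == 0:
        raise ValueError("a node must contain at least one observation")
    mean = sum(y) / n                       # fitted value of the node
    ssr = sum((v - mean) ** 2 for v in y)   # deviance of the node
    return mean, ssr


print(node_mean_and_ssr([1.0, 2.0, 3.0]))  # -> (2.0, 2.0)
```

A node in which all responses are equal has deviance 0 and is therefore
never split under the default value of opt.mindev.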

                         

                         

  cs.nelem               vector whose length equals the number of nodes in the tree:
                         gives the number of observations in the given node.

                         

                         

  cs.endpoint            vector: gives the position of the last leaf in the subtree
                         starting with the current node.

                         

                         

                         

                         

--------------------------------------------------------------

(C) MD*TECH Method and Data Technologies, 21.9.2000

