2. Pruning the Tree


subcs = prune(cs, alfa)
prunes a subtree from the given tree, given the complexity parameter alfa
subcs = prunetot(cs, lnum)
prunes a subtree from the given tree, given the desired number of leaves
subcs = maketr(cs, node)
forms a subtree by cutting the given tree at a given node
resu = prunecv(tr, alfaseq, x, y, type)
for a given tree, this quantlet calculates the mean of the squared residuals for the pruned subtrees of the tree
seq = pruneseq(cs)
for a given tree, this quantlet gives the sequence of $ \alpha$ values and the numbers of leaves of the pruned subtrees
One might think that the optimal way of choosing a regression tree is to stop growing the tree before it gets too large. For example, one could stop growing when the mean of the squared residuals of the regression estimator no longer decreases substantially. However, it may happen that the decrease in this quantity is momentarily slow, while some further splits again result in a considerable decrease. Let us denote

$\displaystyle
R (\hat{f} ) = \frac{1}{n} \sum_{i=1}^n \left( Y_i - \hat{f}(X_i) \right)^2
$

and for $ 0 \leq \alpha < \infty$,

$\displaystyle
R_{\alpha} ( \hat{f} ) = R(\hat{f}) + \alpha \, \textrm{Leaves} ( \hat{f} )
$
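To make the two criteria concrete, here is a small numerical sketch in plain Python (not part of the quantlet library; the helper `r_alpha` and the toy data are invented for illustration):

```python
# Hypothetical helper: compute R(f_hat) and the penalized criterion
# R_alpha(f_hat) = R(f_hat) + alpha * Leaves(f_hat).

def r_alpha(y, y_hat, n_leaves, alpha):
    """Penalized resubstitution error of a tree with n_leaves leaves."""
    n = len(y)
    r = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / n  # R(f_hat)
    return r + alpha * n_leaves

# Toy data: a 2-leaf tree predicting 1.0 on one rectangle, 3.0 on the other.
y     = [1.2, 0.8, 2.9, 3.1]
y_hat = [1.0, 1.0, 3.0, 3.0]
print(r_alpha(y, y_hat, n_leaves=2, alpha=0.0))  # plain R(f_hat), about 0.025
print(r_alpha(y, y_hat, n_leaves=2, alpha=0.1))  # penalized, about 0.225
```

With $ \alpha = 0$ the criterion reduces to the plain mean of squared residuals; increasing $ \alpha$ adds a fixed charge per leaf, so larger trees need a correspondingly larger drop in $ R(\hat{f})$ to remain preferable.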

where $ \textrm{Leaves} ( \hat{f} )$ is the number of leaves of the regression tree $ \hat{f}$, which can be viewed as the number of rectangles on which $ \hat{f}$ is constant. The criterion $ R_{\alpha} ( \hat{f} )$ takes into account not only the sum of the squared residuals but also penalizes the complexity of the tree, measured by the number of its leaves. The number $ \alpha$ plays a role similar to the smoothing parameter in kernel estimation.

Let $ \hat{f}$ be a large, overfitting tree and let $ \hat{f}_{\alpha}$ be a subtree of $ \hat{f}$ for which $ R_{\alpha} (\,)$ is minimal. Note that $ \hat{f}_0$ is $ \hat{f}$ itself, and when $ \alpha$ is sufficiently large, $ \hat{f}_{\alpha}$ is the constant estimator, that is, it consists of the root node only. When $ \alpha$ increases from 0 to infinity, there is only a finite number of values of $ \alpha$ at which $ \hat{f}_{\alpha}$ changes. Let us call those values $ 0=\alpha_1 < \cdots < \alpha_k$. The number $ k$ is less than or equal to the number of leaves in $ \hat{f}$. For $ \alpha_j \leq \alpha < \alpha_{j+1}$ (with the convention $ \alpha_{k+1}=\infty$), $ \hat{f}_{\alpha}$ is the smallest subtree of $ \hat{f}$ minimizing $ R_{\alpha} (\,)$, namely $ \hat{f}_{\alpha_j}$. The sequence $ \hat{f}_{\alpha_1},\ldots ,\hat{f}_{\alpha_k}$ thus forms a decreasing sequence of trees: $ \hat{f}_{\alpha_1}$ is the original tree $ \hat{f}$, and $ \hat{f}_{\alpha_k}$ consists of the root node only.
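The sequence $ \alpha_1 < \cdots < \alpha_k$ can be found by repeatedly collapsing the "weakest link": the internal node whose removal increases $ R$ least per leaf removed. The following is a hedged Python sketch of that single step, in the spirit of the pruneseq quantlet but not its implementation; the dictionary tree encoding and both function names are invented for illustration:

```python
# A tree is a nested dict. A leaf carries 'r' (its contribution to R);
# an internal node carries 'r' (its error if collapsed to a leaf) plus
# 'left' and 'right' subtrees.

def leaves_and_r(t):
    """Return (number of leaves, total leaf error R) of subtree t."""
    if 'left' not in t:                       # leaf node
        return 1, t['r']
    nl, rl = leaves_and_r(t['left'])
    nr, rr = leaves_and_r(t['right'])
    return nl + nr, rl + rr

def weakest_link_alpha(t):
    """Smallest alpha at which collapsing some internal node no longer
    increases R_alpha: g(t) = (R(t as leaf) - R(subtree)) / (leaves - 1)."""
    if 'left' not in t:
        return float('inf')                   # leaves cannot be collapsed
    n, r_sub = leaves_and_r(t)
    g = (t['r'] - r_sub) / (n - 1)            # error increase per leaf removed
    return min(g, weakest_link_alpha(t['left']),
               weakest_link_alpha(t['right']))

# Toy tree: the left split is cheap to undo, so it is the weakest link.
tree = {'r': 0.40,
        'left':  {'r': 0.20, 'left': {'r': 0.04}, 'right': {'r': 0.06}},
        'right': {'r': 0.08}}
print(weakest_link_alpha(tree))               # next alpha in the sequence
```

Collapsing the node that attains this minimum and repeating until only the root remains would generate exactly the finite sequence of $ \alpha$ values and nested subtrees described above.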

MD*TECH Method and Data Technologies
http://www.mdtech.de   mdtech@mdtech.de