The effects of pruning methods on the predictive accuracy of induced decision trees
ESPOSITO, Floriana; MALERBA, Donato; SEMERARO, Giovanni
1999-01-01
Abstract
Several methods have been proposed in the literature for decision tree (post)-pruning. This article presents a unifying framework according to which any pruning method can be defined as a four-tuple (Space, Operators, Evaluation function, Search strategy), and the pruning process can be cast as an optimization problem. Six well-known pruning methods are investigated by means of this framework and their common aspects, strengths and weaknesses are described. Furthermore, a new empirical analysis of the effect of post-pruning on both the predictive accuracy and the size of induced decision trees is reported. The experimental comparison of the pruning methods involves 14 datasets and is based on the cross-validation procedure. The results confirm most of the conclusions drawn in a previous comparison based on the holdout procedure.