This paper presents a study of one particular problem of decision tree induction, namely (post-)pruning, with the aim of finding a common framework for the plethora of pruning methods appeared in literature. Given a tree Tmax to prune, a state space is defined as the set of all subtrees of Tmax to which only one operator, called any-depth branch pruning operator, can be applied in several ways in order to move from one state to another. By introducing an evaluation function f defined on the set of subtrees, the problem of tree pruning can be cast as an optimization problem, and it is also possible to classify each post-pruning method according to both its search strategy and the kind of information exploited by f. Indeed, while some methods use only the training set in order to evaluate the accuracy of a decision tree, other methods exploit an additional pruning set that allows them to get less biased estimates of the predictive accuracy of apruned tree. The introduction of the state space shows that very simple search strategies are used by the postpruning methods considered. Finally, some empirical results allow theoretical observations on strengths and weaknesses of pruning methods to be better understood.
Decision Tree Pruning as a Search in the State Space
ESPOSITO, Floriana;MALERBA, Donato;SEMERARO, Giovanni
1993-01-01
Abstract
This paper presents a study of one particular problem of decision tree induction, namely (post-)pruning, with the aim of finding a common framework for the plethora of pruning methods appeared in literature. Given a tree Tmax to prune, a state space is defined as the set of all subtrees of Tmax to which only one operator, called any-depth branch pruning operator, can be applied in several ways in order to move from one state to another. By introducing an evaluation function f defined on the set of subtrees, the problem of tree pruning can be cast as an optimization problem, and it is also possible to classify each post-pruning method according to both its search strategy and the kind of information exploited by f. Indeed, while some methods use only the training set in order to evaluate the accuracy of a decision tree, other methods exploit an additional pruning set that allows them to get less biased estimates of the predictive accuracy of apruned tree. The introduction of the state space shows that very simple search strategies are used by the postpruning methods considered. Finally, some empirical results allow theoretical observations on strengths and weaknesses of pruning methods to be better understood.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.