Mining non redundant generalized association rules
Loglisci, Corrado; Malerba, Donato
2007-01-01
Abstract
Association rules are a class of regularities that express statistical information about co-occurrence relations among items. Generalized association rules are an important extension of traditional association rules that makes it possible to exploit a taxonomy over the items. However, when a taxonomy is used, several thousand rules may be discovered, and most of them can be redundant. In this paper we address the problem of eliminating redundancy in generalized association rules by introducing a framework that exploits the statistical properties of two well-known concepts in association rule mining, namely closed itemsets and minimal non-redundant rules. To this end, we define an algorithm that solves the problem of mining generalized closed frequent itemsets. We also provide an operational characterization for generating non-redundant generalized rules from the set of generalized closed frequent itemsets. An application of the framework to biomedical textual data analysis is reported.
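To make the two central notions concrete, the following is a minimal Python sketch of what a generalized closed frequent itemset is. It is not the algorithm defined in the paper: it is a brute-force illustration under stated assumptions. The taxonomy, the transactions, the support threshold, and all function names (`ancestors`, `extend`, `frequent_itemsets`, `closed_itemsets`) are hypothetical. Each transaction is first extended with the taxonomy ancestors of its items, and an itemset is then called closed when no proper superset has the same support. The sketch does not prune itemsets that contain both an item and one of its ancestors.

```python
from itertools import combinations

# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
TAXONOMY = {"aspirin": "analgesic", "ibuprofen": "analgesic", "analgesic": "drug"}

# Hypothetical toy transactions over leaf-level items.
RAW_TRANSACTIONS = [{"aspirin", "fever"}, {"ibuprofen", "fever"}, {"aspirin"}]


def ancestors(item, taxonomy):
    """Return every ancestor of an item by walking child -> parent links."""
    result = set()
    while item in taxonomy:
        item = taxonomy[item]
        result.add(item)
    return result


def extend(transaction, taxonomy):
    """Add the ancestors of each item so generalized items can be counted."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item, taxonomy)
    return frozenset(extended)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: itemset -> absolute support (>= min_support)."""
    items = sorted(set().union(*transactions))
    supports = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                supports[itemset] = count
    return supports


def closed_itemsets(supports):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        itemset: count
        for itemset, count in supports.items()
        if not any(itemset < other and count == other_count
                   for other, other_count in supports.items())
    }


extended = [extend(t, TAXONOMY) for t in RAW_TRANSACTIONS]
frequent = frequent_itemsets(extended, min_support=2)
closed = closed_itemsets(frequent)

for itemset, count in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

On this toy data the closed itemsets form a much smaller set than the frequent itemsets while still determining the support of each frequent itemset, which is the kind of compaction that closed itemsets are used for when removing redundant rules. The exhaustive enumeration is exponential in the number of items and is only meant for illustration.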