Most of works on text categorization have focused on classifying documents into a set of categories with no relationships among them (flat classification). However, due to the intrinsic structure that can be found in many domains, recent works are focusing on more complex tasks, such as multi-label classification, hierarchical classification and multidimensional classification. In this paper, we propose the hierarchical multidimensional classification task, where documents can be classified according to different dimensions/viewpoints (e.g., topic, geographic area, time period, etc.), where in each dimension categories can be organized hierarchically. In particular, we propose the system Multi- WebClass, a multidimensional variant of the system WebClassIII, which discovers correlations among categories belonging to different dimensions and exploits them, according to two different strategies, to refine the set of features used during the learning process. Experimental evaluation performed on both synthetic and real datasets confirms that the exploitation of correlations among categories can lead to better results in terms of classification accuracy, possibly reducing specialization error or generalization error, depending on the strategy adopted for the refinement of the feature sets.
Scheda prodotto non validato
Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo
|Titolo:||Hierarchical multidimensional classification of web documents with MultiWebClass|
|Data di pubblicazione:||2015|
|Appare nelle tipologie:||4.1 Contributo in Atti di convegno|