Discovering knowledge through multi-modal association rule mining for document image analysis


Ceci, Michelangelo; Loglisci, Corrado; Rudd, Lynn Margaret; Malerba, Donato

2015-01-01

Abstract

The paper introduces a descriptive data mining method for discovering knowledge to support automatic categorization in document image analysis. We argue that a document image is a multi-modal unit of analysis whose semantics is deduced from a combination of textual content, layout structure and logical structure. The method therefore considers several modalities of document representation simultaneously, and hence different types of information: spatial information derived from a complex document image analysis process (layout analysis), information extracted from the logical structure of the document (by means of document image classification and understanding), and textual information extracted by means of OCR. The proposed method is based on a relational data mining approach to discovering association rules; the relational setting is justified by its suitability for analyzing data available in more than one modality. Experimental results on a real-world dataset are reported.
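As a rough illustration of what an association rule over multi-modal document descriptions might look like, the sketch below computes support and confidence for simple one-to-one rules. It is not the method of the paper: the paper works in a relational setting, whereas this sketch flattens each document into a plain set of items. The item names, the toy documents, the thresholds and the helper functions `support` and `mine_rules` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each document is a set of items drawn from three
# modalities (layout, logical structure, OCR text). All names are invented.
documents = [
    {"layout:title_on_top", "logic:has_abstract", "text:mining"},
    {"layout:title_on_top", "logic:has_abstract", "text:learning"},
    {"layout:title_on_top", "logic:has_abstract", "text:mining"},
    {"layout:two_columns", "logic:has_table", "text:mining"},
]


def support(itemset, docs):
    """Fraction of documents that contain every item in itemset."""
    return sum(itemset <= d for d in docs) / len(docs)


def mine_rules(docs, min_support=0.5, min_confidence=0.8):
    """Enumerate rules A -> B with a single antecedent and consequent.

    This is a propositional simplification (an assumption of this sketch),
    not a relational association rule miner.
    """
    items = sorted(set().union(*docs))
    rules = []
    for a, b in combinations(items, 2):
        for ante, cons in ((a, b), (b, a)):
            s = support({ante, cons}, docs)
            if s < min_support:
                continue
            # Confidence: support of the whole rule over support of the antecedent.
            conf = s / support({ante}, docs)
            if conf >= min_confidence:
                rules.append((ante, cons, s, conf))
    return rules


rules = mine_rules(documents)
for ante, cons, s, c in rules:
    print(f"{ante} -> {cons} (support={s:.2f}, confidence={c:.2f})")
```

On this toy data the only rules that pass both thresholds link a layout item to a logical-structure item, which is the kind of cross-modal regularity the abstract refers to.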


Year

2015

Appears in types:

2.1 Contribution in volume (Chapter or Essay)


Use this identifier to cite or link to this document: https://hdl.handle.net/11586/178012
