Relational Data Mining Techniques for Historical Document Processing

Ceci, Michelangelo; Malerba, Donato
2006-01-01

Abstract

Document image understanding denotes the recognition of semantically relevant components in the layout extracted from a document image. Automatic approaches to document image understanding are in high demand among organizations involved in the preservation and valorisation of historical documents, which collect ever more document images whose effective use critically depends on fast and accurate indexing and cataloguing. In this context, Data Mining techniques can be profitably applied to support the user in recognizing semantically relevant components in historical document images. However, such an application is not straightforward, and two important aspects have to be considered. First, extracted models should take into account the inherent spatial nature of the layout of a document image and the spatial relations among the layout components of interest. Second, the low layout quality and standard of such material introduce a considerable amount of noise into its description. For these reasons, in this paper we investigate the application of a Statistical Relational Data Mining method, which allows relations between components to be represented effectively and naturally by resorting to the Relational Data Mining framework, and which guarantees robustness to noise by exploiting statistical methods. Experiments are performed on two historical document corpora from the '20s and '30s.
Year: 2006
ISBN: 88-6068-018-2

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/136712