Incremental Learning of First Order Logic Theories for the Automatic Annotations of Web Documents

Esposito, Floriana; Ferilli, Stefano; Di Mauro, Nicola; Basile, Teresa Maria
2007-01-01

Abstract

Organizing large repositories spread across highly diverse Web sites raises the problem of effective storage and efficient retrieval of documents. This can be achieved by selectively extracting the significant textual information they contain, which resides in specific layout components whose identification in turn depends on recognizing the correct document class. The continuous flow of new and different documents in a weakly structured environment like the Web calls for incrementality, i.e., the ability to continuously update or revise previously acquired knowledge when it proves faulty. At the same time, the need to express structural relations among layout components suggests using a powerful symbolic representation language. This paper proposes applying incremental first-order logic learning techniques to the document layout preprocessing steps, an approach supported by good results in experiments on a real dataset.
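As a rough illustration of the two ideas in the abstract (a first-order description of layout structure, and incremental revision of class definitions as new documents arrive), the Python sketch below is a toy, not the system described in the paper. Everything in it is a hypothetical assumption: the predicates (`block`, `on_top`, `left_of`, `centered`, `small`, `boxed`), the class names, the use of plain theta-subsumption as the coverage test, generalization by greedily dropping body literals (not a least general generalization), and specialization by discarding a clause that covers a negative example and re-covering its positives.

```python
# Toy incremental first-order learner: a minimal sketch, NOT the authors' system.
# Hypothetical assumptions: a layout is a list of ground facts (predicate, const, ...),
# a class is defined by a list of clause bodies, coverage is plain theta-subsumption,
# and variables are terms starting with an uppercase letter.
Fact = tuple[str, ...]


def is_var(term: str) -> bool:
    return term[:1].isupper()


def subsumes(body: list[Fact], facts: list[Fact]) -> bool:
    """True if one substitution maps every literal of `body` onto some fact."""
    def search(i: int, theta: dict[str, str]) -> bool:
        if i == len(body):
            return True
        pred, *args = body[i]
        for fact in facts:
            if fact[0] != pred or len(fact) != len(body[i]):
                continue
            new, ok = dict(theta), True
            for a, c in zip(args, fact[1:]):
                if is_var(a):
                    if new.setdefault(a, c) != c:  # variable already bound elsewhere
                        ok = False
                        break
                elif a != c:  # constant mismatch
                    ok = False
                    break
            if ok and search(i + 1, new):
                return True
        return False
    return search(0, {})


def variabilize(facts: list[Fact]) -> list[Fact]:
    """Turn a ground description into a clause body (constants -> X0, X1, ...)."""
    names: dict[str, str] = {}
    body = []
    for pred, *args in facts:
        new_args = []
        for a in args:
            if a not in names:
                names[a] = f"X{len(names)}"
            new_args.append(names[a])
        body.append((pred, *new_args))
    return body


class IncrementalTheory:
    def __init__(self) -> None:
        self.clauses: dict[str, list[list[Fact]]] = {}  # class -> clause bodies
        self.examples: list[tuple[list[Fact], str]] = []  # every example seen

    def classify(self, facts: list[Fact]) -> list[str]:
        return [c for c, bodies in self.clauses.items()
                if any(subsumes(b, facts) for b in bodies)]

    def _consistent(self, body: list[Fact], label: str) -> bool:
        # A clause for `label` must not cover any example of another class.
        return not any(subsumes(body, f) for f, l in self.examples if l != label)

    def _cover(self, facts: list[Fact], label: str) -> bool:
        bodies = self.clauses.setdefault(label, [])
        if any(subsumes(b, facts) for b in bodies):
            return True
        for i, body in enumerate(bodies):  # try to generalize an existing clause
            kept: list[Fact] = []
            for lit in body:  # greedily keep literals that still match the example
                if subsumes(kept + [lit], facts):
                    kept.append(lit)
            if kept and self._consistent(kept, label):
                bodies[i] = kept
                return True
        candidate = variabilize(facts)  # otherwise add a new, specific clause
        if self._consistent(candidate, label):
            bodies.append(candidate)
            return True
        return False  # not expressible without negation; left uncovered

    def observe(self, facts: list[Fact], label: str) -> None:
        self.examples.append((facts, label))
        for other in list(self.clauses):  # specialize clauses the new example refutes
            if other == label:
                continue
            bad = [b for b in self.clauses[other] if subsumes(b, facts)]
            if bad:
                self.clauses[other] = [b for b in self.clauses[other] if b not in bad]
                for f, l in self.examples:
                    if l == other:
                        self._cover(f, other)
        self._cover(facts, label)


letter1 = [("block", "a"), ("block", "b"), ("on_top", "a", "b"), ("centered", "a")]
letter2 = [("block", "c"), ("block", "e"), ("on_top", "c", "e"), ("centered", "c"), ("small", "e")]
invoice1 = [("block", "f"), ("block", "g"), ("left_of", "f", "g"), ("centered", "f")]
letter3 = [("block", "h"), ("block", "i"), ("on_top", "h", "i"), ("small", "i")]
form1 = [("block", "m"), ("block", "n"), ("on_top", "m", "n"), ("boxed", "n")]

t = IncrementalTheory()
for doc, cls in [(letter1, "letter"), (letter2, "letter"), (invoice1, "invoice"),
                 (letter3, "letter"), (form1, "form")]:
    t.observe(doc, cls)
print(t.classify(form1))  # ['form']
```

In this run, `letter3` first causes the `letter` clause to be generalized by dropping `centered(X0)`. `form1` then refutes that over-general clause, so it is discarded and the `letter` examples are re-covered by two more specific clauses. A real system would use a stronger generality ordering and richer revision operators than this greedy toy.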
2007
ISBN 978-0-7695-2822-9

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/112016

Citations
  • PubMed Central: N/A
  • Scopus: 7
  • Web of Science (ISI): 3