Mining Information Extraction Models for HmtDB annotation
MALERBA, Donato; ATTIMONELLI, Marcella
2006-01-01
Abstract
Advances in genome sequencing techniques have given rise to an overwhelming increase in the literature on discovered genes, proteins and their roles in biological processes. However, the biomedical literature remains a largely unexploited source of biological information. Information Extraction (IE) techniques are necessary to map this information into structured representations that allow facts relating domain-relevant entities to be automatically recognized. In this paper, we present a framework that supports biologists in the task of automatically extracting information from texts. The framework integrates a data mining module that discovers extraction rules from a set of manually labelled texts. The resulting extraction models are subsequently applied automatically to unseen texts. We report an application to a real-world dataset composed of publications selected to support biologists in the annotation of the HmtDB database.