A data mining approach to retrieve mitochondrial variability data associated to clinical phenotypes

Attimonelli, Marcella; Santamaria, M.; Ceci, Michelangelo; Loglisci, Corrado; Malerba, Donato
2005-01-01

Abstract

Maintaining biological databases is currently a problem of great interest, since progress in many experimental procedures has led to an ever-increasing amount of data. These data need to be structured, stored in databases, and made accessible to the biological community in user-friendly ways. Although both the interest in and the need for access to biological databases are high, the mechanisms for funding their maintenance are unclear. Funding agencies cannot cover the labour costs of data annotation, and hence the development of new tools based on “data mining” technologies could contribute greatly to keeping biological databases up to date. Here we present a new approach intended to support the annotation, in the HmtDB resource (http://www.hmdb.uniba.it/), of variability data associated with clinical phenotypes [1]. These data are available mainly in the literature, where they are reported in a completely free style. We therefore propose constructing a knowledge base, derived from browsing papers on the web, to be used in the retrieval phase. However, the problems in extracting data from the literature stem not only from the heterogeneity of presentation styles but mainly from the unstructured format (i.e., natural language) in which the data are represented. In this scenario, the goal is to feed the knowledge base by identifying occurrences of specific biological entities and their features, as well as the particular method and experimental setting adopted in the study reported in each publication. In this work, we describe some solutions to the problem of structuring information contained in scientific literature in digital (i.e., PDF) or paper format.
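To make the entity-identification goal concrete, the following is a minimal sketch of how variant mentions and co-occurring phenotype terms might be pulled out of free text. It is not the method described in the paper: the regular expression, the tiny phenotype lexicon, and the names `Mention` and `extract_mentions` are all assumptions introduced for illustration, and the sample sentence in the usage note is made up.

```python
import re
from dataclasses import dataclass

# Assumed notation: reference base, position, variant base (e.g. "A3243G").
VARIANT_RE = re.compile(r"\b([ACGT])(\d{1,5})([ACGT])\b")

# Length of the human mitochondrial reference sequence in base pairs.
MTDNA_LENGTH = 16569

# Hypothetical phenotype lexicon; a real one would be curated by annotators.
PHENOTYPES = ("MELAS", "LHON", "MERRF")


@dataclass(frozen=True)
class Mention:
    variant: str        # variant string as written in the text
    position: int       # nucleotide position parsed from the variant
    phenotypes: tuple   # lexicon terms found in the same sentence


def extract_mentions(text):
    """Return variant mentions paired with phenotype terms from the same sentence."""
    mentions = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        found = tuple(p for p in PHENOTYPES if re.search(rf"\b{p}\b", sentence))
        for match in VARIANT_RE.finditer(sentence):
            position = int(match.group(2))
            # Discard matches that cannot be mtDNA positions.
            if 1 <= position <= MTDNA_LENGTH:
                mentions.append(Mention(match.group(0), position, found))
    return mentions
```

For example, calling `extract_mentions("The A3243G mutation was found in patients with MELAS.")` yields a single `Mention` for `A3243G` paired with `MELAS`. Sentence-level co-occurrence is only a crude proxy for a real association, so the output of such a sketch would still need review by a curator before entering a knowledge base.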

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/136739