Beyond unstructured textual data for life science

MALERBA, Donato
2005-01-01

Abstract

Biomedical information contained in text repositories (e.g. Medline) represents the vast majority of the genomic information accumulated over the years. Methods that transform unstructured text into structured information are necessary to provide access to these resources. Regularities in the structured data can then be discovered by means of data mining techniques. Transforming unstructured text into structured information means turning texts into structured objects described by discrete data (e.g. words denoting biomedical entities such as genes or disease names), characterized by one or more attributes and possibly by relations between data (e.g. words denoting relations between genes and drug reactions or diseases). Entity and relation extraction is generally performed with Information Extraction techniques, which analyze natural language texts at different levels of complexity in order to extract features of entities and relations. The extracted features may express statistical (e.g. word frequency), lexical (e.g. alphanumeric or capitalized words), structural (e.g. the order of sentences in a text and of entities in a sentence), syntactic (e.g. singular/plural and proper/common nouns, base/conjugated verbs) or domain-specific knowledge (e.g. an entity belonging to a dictionary). Biomedical entities can also be described in terms of specialized taxonomies available in the life sciences (e.g. Gene Ontology, MeSH, UMLS). Association rule mining on biomedical literature that exploits the MeSH taxonomy to discover associations between entities at different levels of abstraction has already been investigated. While previous work ignores information on relations among objects, we propose to exploit object interactions by resorting to a first-order formalism and a multi-relational approach to association rule mining. In this case, the mining process is able to extract association rules involving objects and relations at different levels of granularity with respect to a hierarchy defined on the objects of interest.
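
The idea of mining rules over relations at different levels of granularity can be illustrated with a minimal Python sketch. It is not the first-order, multi-relational method proposed here: it is a simplified propositional approximation that treats each extracted fact as an item and does not track which entities are shared between facts. The taxonomy `PARENT`, the documents `DOCS`, the relation names and the function names are all hypothetical and invented for illustration; they are not taken from MeSH, Medline or any real resource.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy taxonomy (child -> parent); invented for illustration.
PARENT = {
    "BRCA1": "gene",
    "TP53": "gene",
    "tamoxifen": "drug",
    "cisplatin": "drug",
    "breast cancer": "disease",
    "lung cancer": "disease",
}

# Each document is a set of extracted facts (relation, subject, object).
# The facts are made up; they only stand in for Information Extraction output.
DOCS = [
    {("associated_with", "BRCA1", "breast cancer"),
     ("treats", "tamoxifen", "breast cancer")},
    {("associated_with", "TP53", "lung cancer"),
     ("treats", "cisplatin", "lung cancer")},
    {("associated_with", "BRCA1", "breast cancer"),
     ("treats", "cisplatin", "breast cancer")},
    {("associated_with", "TP53", "breast cancer")},
]


def generalize(fact, level):
    """Level 0 keeps the original entities; level 1 replaces them by their parent."""
    relation, subj, obj = fact
    if level == 0:
        return fact
    return (relation, PARENT.get(subj, subj), PARENT.get(obj, obj))


def mine_rules(docs, level, min_support, min_confidence):
    """Return rules (body, head, support, confidence) between pairs of facts."""
    transactions = [frozenset(generalize(f, level) for f in doc) for doc in docs]
    n = len(transactions)
    single = Counter(f for t in transactions for f in t)
    pair = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        # Try both directions of the pair as body -> head.
        for body, head in ((a, b), (b, a)):
            confidence = count / single[body]
            if confidence >= min_confidence:
                rules.append((body, head, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    for level in (0, 1):
        print("level", level, mine_rules(DOCS, level, 0.5, 0.8))
```

On this toy data no rule reaches the support threshold at the entity level, whereas after generalizing to the parent concepts a rule linking the `treats` and `associated_with` relations becomes frequent. A first-order formulation would additionally state that both relations refer to the same disease, which this propositional sketch cannot express.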

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/69240