Cross-lingual link discovery with TR-ESA

NARDUCCI, Fedelucio; SEMERARO, Giovanni
2017-01-01

Abstract

Cross-lingual data linking is the problem of establishing links between resources, such as places, services, or movies, which are described in different languages. In cross-lingual data linking it is often the case that very short descriptions have to be matched, which makes the problem even more challenging. This work presents a method named TRanslation-based Explicit Semantic Analysis (TR-ESA) to represent and match short textual descriptions available in different languages. TR-ESA translates short descriptions in any given language into a pivot language by exploiting a machine translation tool. Then, it generates a Wikipedia-based representation of the translated text by using the Explicit Semantic Analysis technique. The resulting representations are used to match short descriptions in different languages. The method is incorporated in CroSeR (Cross-lingual Service Retrieval), an interactive data linking tool that recommends potential matches to users. We compared results coming from an in-vitro evaluation on a gold standard consisting of five datasets in different languages, with an in-vivo experiment that involved human experts supported by CroSeR. The in-vivo evaluation confirmed the results of the in-vitro evaluation and the overall effectiveness of the proposed method.
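The pipeline described in the abstract (translate into a pivot language, build a Wikipedia-based ESA representation, match the representations) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three-entry `CONCEPTS` table stands in for Wikipedia, `TOY_DICTIONARY` and `translate_to_pivot` stand in for the machine translation tool, English is assumed as the pivot language, and the TF-IDF weighting with cosine similarity is one common way to formulate ESA rather than the exact configuration used in the paper.

```python
import math
from collections import Counter

# Toy stand-in for Wikipedia: concept title -> article text (hypothetical data).
CONCEPTS = {
    "Marriage": "marriage wedding spouse certificate civil ceremony",
    "Waste management": "waste garbage collection recycling disposal",
    "Public library": "library books loan reading public",
}

# Toy stand-in for a machine translation tool (hypothetical word table).
TOY_DICTIONARY = {
    "certificato": "certificate", "di": "of", "matrimonio": "marriage",
    "raccolta": "collection", "rifiuti": "waste",
}


def translate_to_pivot(text):
    """Word-by-word lookup; a real system would call an MT service here."""
    return " ".join(TOY_DICTIONARY.get(w, w) for w in text.lower().split())


def build_inverted_index(concepts):
    """Map each term to {concept: TF-IDF weight} over the concept articles."""
    n = len(concepts)
    tf = {c: Counter(t.lower().split()) for c, t in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for concept, counts in tf.items():
        for term, freq in counts.items():
            idf = math.log(n / df[term]) + 1.0  # +1 keeps shared terms non-zero
            index.setdefault(term, {})[concept] = freq * idf
    return index


def esa_vector(text, index):
    """Represent a text as a weighted vector over concepts."""
    vec = Counter()
    for term in text.lower().split():
        for concept, weight in index.get(term, {}).items():
            vec[concept] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tr_esa_similarity(text_a, text_b, index):
    """Translate both texts to the pivot language, then compare ESA vectors."""
    vec_a = esa_vector(translate_to_pivot(text_a), index)
    vec_b = esa_vector(translate_to_pivot(text_b), index)
    return cosine(vec_a, vec_b)


INDEX = build_inverted_index(CONCEPTS)
```

With this toy data, `tr_esa_similarity("certificato di matrimonio", "marriage certificate", INDEX)` scores higher than `tr_esa_similarity("certificato di matrimonio", "waste collection", INDEX)`, because the first pair maps onto the same concept while the second pair shares none.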
Files in this record:

1-s2.0-S0020025517304905-main FINAL.pdf
Availability: not available
Type: Publisher's version
License: NOT PUBLIC - Private/restricted access
Size: 1.67 MB
Format: Adobe PDF

1-s2.0-S0020025517304905-main(AcceptedManuscript).pdf
Availability: open access
Type: Pre-print
License: Creative Commons
Size: 1.79 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/187033