Community question answering (CQA) sites use a collaborative paradigm to satisfy complex information needs. Although the task of matching questions to their best answers has been tackled for more than a decade, the social question-answering practice is a complex process. The factors influencing the accuracy of question-answer matching are many and hard to disentangle. We approach the task from an applicationoriented perspective, probing the space of several dimensions relevant to this problem: features, algorithms, and topics. We gather under a learning to rank framework the most extensive feature set used in literature to date, including 225 features from five different families. We test the power of such features in predicting the best answer to a question on the largest dataset from Yahoo Answers used for this task so far (40M answers) and provide a faceted analysis of the results along different topical areas and question types. We propose a novel family of distributional semantics measures that most of the time can seamlessly replace widely used linguistic similarity features, being more than one order of magnitude faster to compute and providing greater predictive power. The best feature set reaches an improvement between 11% and 26% in P@1 compared to recent well-established state-of-the-art methods.

Social question answering: Textual, user, and network features for best answer prediction

MOLINO, PIERO;LOPS, PASQUALE
2016-01-01

Abstract

Community question answering (CQA) sites use a collaborative paradigm to satisfy complex information needs. Although the task of matching questions to their best answers has been tackled for more than a decade, the social question-answering practice is a complex process. The factors influencing the accuracy of question-answer matching are many and hard to disentangle. We approach the task from an applicationoriented perspective, probing the space of several dimensions relevant to this problem: features, algorithms, and topics. We gather under a learning to rank framework the most extensive feature set used in literature to date, including 225 features from five different families. We test the power of such features in predicting the best answer to a question on the largest dataset from Yahoo Answers used for this task so far (40M answers) and provide a faceted analysis of the results along different topical areas and question types. We propose a novel family of distributional semantics measures that most of the time can seamlessly replace widely used linguistic similarity features, being more than one order of magnitude faster to compute and providing greater predictive power. The best feature set reaches an improvement between 11% and 26% in P@1 compared to recent well-established state-of-the-art methods.
File in questo prodotto:
File Dimensione Formato  
a4-molino.pdf

non disponibili

Tipologia: Documento in Versione Editoriale
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 774.33 kB
Formato Adobe PDF
774.33 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
TOIS_open copia.pdf

non disponibili

Tipologia: Documento in Post-print
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 727.7 kB
Formato Adobe PDF
727.7 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11586/182310
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 25
  • ???jsp.display-item.citation.isi??? 21
social impact