An empirical assessment of best-answer prediction models in technical Q&A sites

Fabio Calefato; Filippo Lanubile; Nicole Novielli
2019-01-01

Abstract

Technical Q&A sites have become essential for software engineers, who constantly seek help from other experts to solve their work problems. Despite their success, many questions remain unresolved, sometimes because the asker does not acknowledge any helpful answer. In these cases, an information seeker can only browse all the answers within a question thread to assess their quality as potential solutions. We approach this time-consuming problem as a binary-classification task in which a best-answer prediction model is built to identify the accepted answer within a resolved question thread, as well as the candidate solutions to questions that have received answers but are still unresolved. In this paper, we report on a study aimed at assessing 26 best-answer prediction models in two steps. First, we study how the models perform when predicting best answers on Stack Overflow, the most popular Q&A site for software engineers. Then, we assess performance in a cross-platform setting where the prediction models are trained on Stack Overflow and tested on other technical Q&A sites. Our findings show that the choice of classifier and automated parameter tuning have a large impact on best-answer prediction. We also demonstrate that our approach to the best-answer prediction problem is generalizable across technical Q&A sites. Finally, we provide practical recommendations to Q&A platform designers for curating and preserving the crowdsourced knowledge shared through these sites.
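To make the framing above concrete, the following is a minimal sketch of the binary-classification setup the abstract describes: a classifier with automated parameter tuning is trained on one Q&A site and tested on another. The synthetic data, the three stand-in features, and the choice of RandomForestClassifier are illustrative assumptions, not the paper's actual features or models.

# A minimal sketch of best-answer prediction as binary classification:
# train with automated parameter tuning on "Stack Overflow" data, then
# evaluate cross-platform on data from another site. All data here is
# synthetic; real models use richer linguistic, vote, and time features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(42)

def make_answers(n):
    """Generate synthetic answer features (stand-ins for e.g. answer
    length, score, and age) and a binary accepted/not-accepted label."""
    X = rng.normal(size=(n, 3))
    y = (X[:, 1] + 0.5 * X[:, 0]
         + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

# Training data (one Q&A site) and cross-platform test data (another site).
X_train, y_train = make_answers(2000)
X_other, y_other = make_answers(500)

# Automated parameter tuning via grid search, since the study finds that
# classifier choice and tuning strongly affect prediction performance.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    scoring="f1",
    cv=5,
)
grid.fit(X_train, y_train)

# Cross-platform evaluation: the tuned model is tested on the other site.
print("Best params:", grid.best_params_)
print("Cross-platform F1:", f1_score(y_other, grid.predict(X_other)))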
Files in this item:

File: Calefato2019_Article_AnEmpiricalAssessmentOfBest-an.pdf
Description: Published version
Type: Document in Published Version
License: NOT PUBLIC - Private/restricted access
Size: 2.02 MB
Format: Adobe PDF
Availability: not available (copy on request)

File: EMSE author preprint.pdf
Description: Preprint version
Type: Pre-print document
License: Creative Commons
Size: 4.09 MB
Format: Adobe PDF
Availability: open access

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/224634
Citations
  • PubMed Central: n/a
  • Scopus: 22
  • Web of Science: 14