Predicting Bug-Fixing Time Using the Latent Dirichlet Allocation Model with Covariates

Ardimento P.; Boffoli N.
2023-01-01

Abstract

The expected bug-fixing time is one of the most important factors in bug triage: an accurate prediction of the fixing time of newly submitted bugs supports both resource allocation and the triage process. Our approach treats bug-fix time estimation as a text categorization problem. To address it, we use the Latent Dirichlet Allocation (LDA) model, a hierarchical statistical model based on what are called topics. Formally, a topic is a probability distribution over the terms in a vocabulary. Topic models provide useful descriptive statistics for a collection of documents, which facilitates tasks such as classification. Here we build a classification model on top of LDA. In LDA, the topic proportions of a bug report are treated as a draw from a Dirichlet distribution. The words of the bug report are obtained by repeatedly choosing a topic assignment from those proportions and then drawing a word from the corresponding topic. Supervised latent Dirichlet allocation (SLDA) adds to LDA a response variable associated with each document. Finally, we consider the supervised latent Dirichlet allocation with covariates (SLDAX) model, a generalization of SLDA that incorporates both manifest variables and latent topics as predictors of an outcome. We evaluated the proposed approach on a large dataset gathered from the defect tracking systems of five well-known open-source systems. Results show that SLDAX achieves better recall than the LDA-based topic models.
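
The generative process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy vocabulary, the two topics, the Dirichlet parameter `alpha`, the coefficients `eta` and `beta`, and the single covariate are all hypothetical values chosen for illustration. The response is computed as a linear combination of empirical topic proportions and covariates, which is one common way to write an SLDAX-style predictor; the abstract does not specify the exact response model used in the paper.

```python
import random


def sample_dirichlet(alpha, rng):
    # Normalised independent Gamma(alpha_k, 1) draws follow Dirichlet(alpha).
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    # Draw one index with probability proportional to probs.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_bug_report(alpha, topics, n_words, rng):
    # LDA: draw topic proportions, then for each word pick a topic
    # assignment and draw a word from that topic's distribution.
    theta = sample_dirichlet(alpha, rng)
    assignments, words = [], []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)
        w = sample_categorical(topics[z], rng)
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


def linear_predictor(assignments, n_topics, eta, covariates, beta):
    # Assumed SLDAX-style predictor: empirical topic proportions (latent)
    # plus manifest covariates, combined linearly.
    counts = [0] * n_topics
    for z in assignments:
        counts[z] += 1
    z_bar = [c / len(assignments) for c in counts]
    topic_part = sum(e * p for e, p in zip(eta, z_bar))
    covariate_part = sum(b * x for b, x in zip(beta, covariates))
    return topic_part + covariate_part


# Hypothetical toy setup: 4-term vocabulary, 2 topics.
rng = random.Random(0)
vocab = ["crash", "null", "button", "layout"]
topics = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]
theta, z, w = generate_bug_report([0.5, 0.5], topics, 20, rng)
report_text = " ".join(vocab[i] for i in w)
score = linear_predictor(z, 2, eta=[1.0, 3.0], covariates=[1.0], beta=[0.5])
```

Dropping the covariate term from `linear_predictor` gives an SLDA-style predictor that depends on the topics alone, which is the sense in which SLDAX generalizes SLDA.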
2023
978-3-031-36596-6
978-3-031-36597-3

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/451440