Impact of data quality for automatic issue classification using pre-trained language models

Giuseppe Colavito;Filippo Lanubile;Nicole Novielli;Luigi Quaranta
2024-01-01

Abstract

Issue classification aims to recognize whether an issue reports a bug, requests an enhancement, or asks for support. In this paper, we use pre-trained language models to classify issues automatically and investigate how data quality affects classifier performance. Despite applying data quality filters, none of our attempts had a significant effect on model quality. We identify the root cause as a threat to construct validity underlying the issue labeling.

Files in this item:

BERT_based_Issue_Classification_JSS__workshop_extension.pdf
Open access
Type: Post-print document
License: Creative Commons
Size: 1.01 MB
Format: Adobe PDF

Novielli et al. @JSS 2024.pdf
Open access
Type: Publisher's version
License: Publisher's copyright
Size: 1.62 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/454542