Impact of data quality for automatic issue classification using pre-trained language models

Colavito, Giuseppe; Lanubile, Filippo; Novielli, Nicole; Quaranta, Luigi

doi:10.1016/j.jss.2023.111838

Issue classification aims to recognize whether an issue reports a bug, a request for enhancement or support. In this paper, we use pre-trained models for the automatic classification of issues and investigate how the quality of data affects the performance of classifiers. Despite the application of data quality filters, none of our attempts had a significant effect on model quality. As the root cause, we identify a threat to construct validity underlying the issue labeling.

2024-01-01

Year

2024

Appears in categories:

1.1 Journal article

Files in this item:

File	Access	Type	License	Size	Format
BERT_based_Issue_Classification_JSS__workshop_extension.pdf	Open access	Pre-print	Creative Commons	1.01 MB	Adobe PDF
Novielli et al. @JSS 2024.pdf	Not available	Publisher's version	Publisher's copyright	1.62 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/454542
