Assessment of off-the-shelf SE-specific sentiment analysis tools: An extended replication study
Nicole Novielli; Fabio Calefato; Filippo Lanubile
2021-01-01
Abstract
Sentiment analysis methods have become popular for investigating human communication, including discussions related to software projects. Since general-purpose sentiment analysis tools do not fit well with the information exchanged by software developers, new tools specific to software engineering (SE) have been developed. We investigate to what extent off-the-shelf SE-specific tools for sentiment analysis mitigate the threats to conclusion validity of empirical studies in software engineering highlighted by previous research. First, we replicate two studies addressing the role of sentiment in security discussions on GitHub and in question-writing on Stack Overflow. Then, we extend the previous studies by assessing to what extent the tools agree with each other and with the manual annotation on a gold standard of 600 documents. We find that different SE-specific sentiment analysis tools might lead to contradictory results at a fine-grained level when used off-the-shelf. Accordingly, platform-specific tuning or retraining might be needed to take into account differences in platform conventions, jargon, or document lengths.
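The tool-agreement assessment described above is typically quantified with a chance-corrected statistic such as Cohen's kappa, computed over the polarity labels (positive, neutral, negative) that each tool assigns to the same set of documents. The following is a minimal sketch using scikit-learn; the tool outputs shown are illustrative placeholders, not the paper's data, and the unweighted kappa used here is one common choice rather than necessarily the paper's exact methodology:

```python
# Minimal sketch: chance-corrected agreement between two sentiment analysis
# tools on the same documents, via Cohen's kappa.
# The label sequences below are hypothetical placeholders, NOT the paper's data.
from sklearn.metrics import cohen_kappa_score

# Hypothetical polarity labels from two SE-specific tools, one per document
tool_a = ["positive", "neutral", "negative", "neutral", "positive", "neutral"]
tool_b = ["positive", "negative", "negative", "neutral", "neutral", "neutral"]

# Unweighted kappa over nominal polarity labels:
# 1.0 = perfect agreement, 0.0 = agreement expected by chance alone
kappa = cohen_kappa_score(tool_a, tool_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

The same call can compare a tool's labels against a manual gold standard; kappa is preferred over raw percent agreement because it discounts the agreement two annotators would reach by chance given their label distributions.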
File | Size | Format | |
---|---|---|---|
EMSE-2021-ReplicationStudy.pdf (open access; publisher's version; Creative Commons license) | 1.19 MB | Adobe PDF | View/Open |