The role of sentiment analysis is increasingly emerging to study software developers' emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, thus resulting in misclassifications of technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers' communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embedding. With respect to a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassifications of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines.

Sentiment Polarity Detection for Software Development

CALEFATO, FABIO;LANUBILE, Filippo;NOVIELLI, NICOLE
2018-01-01

Abstract

The role of sentiment analysis is increasingly emerging to study software developers' emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, thus resulting in misclassifications of technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers' communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embedding. With respect to a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassifications of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines.
File in questo prodotto:
File Dimensione Formato  
Calefato2018_Article_SentimentPolarityDetectionForS.pdf

non disponibili

Tipologia: Documento in Versione Editoriale
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 1.35 MB
Formato Adobe PDF
1.35 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
sentiment_polarity_preprint.pdf

accesso aperto

Descrizione: Pre-print
Tipologia: Documento in Pre-print
Licenza: Creative commons
Dimensione 2.19 MB
Formato Adobe PDF
2.19 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11586/199695
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 151
  • ???jsp.display-item.citation.isi??? 142
social impact