Hate speech detection through AlBERTo Italian language understanding model

Polignano M.; Basile P.; de Gemmis M.; Semeraro G.
2019-01-01

Abstract

The task of identifying hate speech in social networks has recently attracted considerable interest in the natural language processing community. This challenge is of great importance for identifying cyberattacks on minors, bullying, misogyny, and other forms of hateful discrimination that can cause illness. Identifying them quickly and accurately can therefore help to resolve situations that are dangerous for the health of the people attacked. Numerous national and international initiatives have addressed this problem by providing resources and solutions. In particular, we focus on the Hate Speech Detection (HaSpeeDe) evaluation campaign held at Evalita 2018, which aimed to develop strategies for identifying hate speech in Italian-language posts on Twitter and Facebook. We apply the classification approach proposed in this work to the dataset released for the task, demonstrating that the task can be solved efficiently and accurately. Our solution is based on AlBERTo, an Italian language understanding model trained with a BERT architecture on 200M Italian tweets. We fine-tuned AlBERTo into a hate speech classification model, obtaining state-of-the-art results compared with the best systems presented at the HaSpeeDe workshop. In this regard, we propose AlBERTo as one of the most versatile resources for classifying social media text in Italian. This claim is supported by the similar results that AlBERTo obtained in sentiment analysis and irony detection in previous work. The resources needed for fine-tuning AlBERTo on these classification tasks are available at: https://github.com/marcopoli/AlBERTo-it

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/273682