Contextualized BERT Sentence Embeddings for Author Profiling: The Cost of Performances

IRIS

The necessity to know information about the real identity of an online subject is a highly relevant issue in User Profiling, especially for analysis from digital sources such as social media. The digital identity of a user does not always present explicit data about her offline life such as age, gender, work, and more. This problem makes the task of user profiling complex and incomplete. For many years this issue has received a considerable amount of attention from the whole community, which has developed several solutions, also based on machine learning, to estimate user characteristics. The increasing diffusion of deep learning approaches has allowed, on the one hand, to obtain a considerable increase in predictive performance, but on the other hand, to have available models that cannot be interpreted and that require very high computational power. Considering the validity of new pre-trained language models on extensive data for resolving many natural language processing and classification tasks, we decided to propose a BERT-based approach (BERT-DNN) also for the author profiling task. In a first analysis, we compared the results obtained by our model with them of more classical approaches. As a follow, a critical analysis was carried out. We analyze the advantages and disadvantages of these approaches also in terms of resources needed to run them. The results obtained by our model are encouraging in terms of reliability but very disappointing if we consider the computational power required for running it.

Contextualized BERT Sentence Embeddings for Author Profiling: The Cost of Performances

Polignano M.;de Gemmis M.;Semeraro G.

2020-01-01

Abstract

The necessity to know information about the real identity of an online subject is a highly relevant issue in User Profiling, especially for analysis from digital sources such as social media. The digital identity of a user does not always present explicit data about her offline life such as age, gender, work, and more. This problem makes the task of user profiling complex and incomplete. For many years this issue has received a considerable amount of attention from the whole community, which has developed several solutions, also based on machine learning, to estimate user characteristics. The increasing diffusion of deep learning approaches has allowed, on the one hand, to obtain a considerable increase in predictive performance, but on the other hand, to have available models that cannot be interpreted and that require very high computational power. Considering the validity of new pre-trained language models on extensive data for resolving many natural language processing and classification tasks, we decided to propose a BERT-based approach (BERT-DNN) also for the author profiling task. In a first analysis, we compared the results obtained by our model with them of more classical approaches. As a follow, a critical analysis was carried out. We analyze the advantages and disadvantages of these approaches also in terms of resources needed to run them. The results obtained by our model are encouraging in terms of reliability but very disappointing if we consider the computational power required for running it.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2020
			
	Codice ISBN
	
				978-3-030-58810-6
978-3-030-58811-3
			
	Appare nelle tipologie:
	
				4.1 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
316-Polignano-Contextualized BERT Sentence Embeddings for Author Profiling.pdf non disponibili Tipologia: Documento in Versione Editoriale Licenza: NON PUBBLICO - Accesso privato/ristretto Dimensione 792.18 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	792.18 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11586/379384

Citazioni

ND

16

11

social impact