Contextualized BERT Sentence Embeddings for Author Profiling: The Cost of Performances

Polignano M.; de Gemmis M.; Semeraro G.
2020-01-01

Abstract

Knowing the real identity of an online subject is a highly relevant issue in user profiling, especially when the analysis draws on digital sources such as social media. The digital identity of a user does not always expose explicit data about her offline life, such as age, gender, or occupation. This gap makes user profiling complex and leaves profiles incomplete. The issue has received considerable attention from the research community for many years, and several solutions, including machine learning ones, have been developed to estimate user characteristics. The increasing diffusion of deep learning approaches has, on the one hand, brought a considerable increase in predictive performance but, on the other hand, produced models that cannot be interpreted and that require very high computational power. Given the effectiveness of language models pre-trained on extensive data for many natural language processing and classification tasks, we propose a BERT-based approach (BERT-DNN) for the author profiling task as well. We first compared the results obtained by our model with those of more classical approaches. We then carried out a critical analysis of the advantages and disadvantages of these approaches, including the resources needed to run them. The results obtained by our model are encouraging in terms of reliability but very disappointing in terms of the computational power required to run it.
2020
978-3-030-58810-6
978-3-030-58811-3

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/379384