In this paper we face the problem of intelligently analyze Twitter data. We propose a novel workflow based on Nonnegative Matrix Factorization (NMF) to collect, organize and analyze Twitter data. The proposed workflow firstly fetches tweets from Twitter (according to some search criteria) and processes them using text mining techniques; then it is able to extract latent features from tweets by using NMF, and finally it clusters tweets and extracts human-interpretable topics. We report some preliminary experiments demonstrating the effectiveness of the proposed workflow as a tool for Intelligent Data Analysis (IDA), indeed it is able to extract and visualize interpretable topics from some newly collected Twitter datasets, that are automatically grouped together according to these topics. Furthermore, we numerically investigate the influence of different initializations mechanisms for NMF algorithms on the factorization results when very sparse Twitter data are considered. The numerical comparisons confirm that NMF algorithms can be used as clustering method in place of the well known k-means.

Intelligent Twitter Data Analysis Based on Nonnegative Matrix Factorizations

CASALINO, GABRIELLA;CASTIELLO, CIRO;DEL BUONO, Nicoletta;MENCAR, CORRADO
2017-01-01

Abstract

In this paper we face the problem of intelligently analyze Twitter data. We propose a novel workflow based on Nonnegative Matrix Factorization (NMF) to collect, organize and analyze Twitter data. The proposed workflow firstly fetches tweets from Twitter (according to some search criteria) and processes them using text mining techniques; then it is able to extract latent features from tweets by using NMF, and finally it clusters tweets and extracts human-interpretable topics. We report some preliminary experiments demonstrating the effectiveness of the proposed workflow as a tool for Intelligent Data Analysis (IDA), indeed it is able to extract and visualize interpretable topics from some newly collected Twitter datasets, that are automatically grouped together according to these topics. Furthermore, we numerically investigate the influence of different initializations mechanisms for NMF algorithms on the factorization results when very sparse Twitter data are considered. The numerical comparisons confirm that NMF algorithms can be used as clustering method in place of the well known k-means.
2017
978-331962391-7
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11586/196939
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 8
  • ???jsp.display-item.citation.isi??? 7
social impact