A framework for intelligent Twitter data analysis with non-negative matrix factorization

Purpose: This paper proposes a framework for intelligent analysis of Twitter data. The framework lets users explore a collection of tweets by extracting semantically relevant topics. In this way, groups of tweets related to new technologies, events and other automatically discovered topics can be detected.

Design/methodology/approach: The framework is based on a three-stage process. The first stage creates the dataset by transforming a collection of tweets into a dataset according to the vector space model. The second stage, which is the core of the framework, uses non-negative matrix factorization (NMF) to extract human-interpretable topics from the tweets, which are eventually clustered. The number of topics can be user-defined or discovered automatically by applying subtractive clustering as a preliminary step before factorization. Cluster analysis and word-cloud visualization are used in the last stage to enable intelligent data analysis.

Findings: The authors applied the framework to a case study of three collections of Italian tweets, with both manual and automatic selection of the number of topics. Given the high sparsity of Twitter data, the authors also investigated the influence of different initialization mechanisms for NMF on the factorization results. Numerical comparisons confirm that NMF can be used for clustering, as it is comparable to classical clustering techniques such as spherical k-means. Visual inspection of the word clouds allowed a qualitative assessment of the results, which confirmed the expected outcomes.

Originality/value: The proposed framework enables a collaborative approach between users and computers for intelligent analysis of Twitter data. Users are presented with interpretable descriptions of tweet clusters, which can be interactively refined through a few adjustable parameters. The resulting clusters can be used for intelligent selection of tweets, as well as for further analytics concerning the impact of products, events, etc. in the social network.
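The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names (build_tfidf, nmf, describe), the whitespace tokenizer, the TF-IDF weighting, the random initialization, the multiplicative update rule for the Frobenius loss, and the English toy tweets (the paper works on Italian tweets). The number of topics is fixed by hand here, and the automatic selection through subtractive clustering is not shown.

```python
import numpy as np


def build_tfidf(tweets):
    """Stage 1 (assumed form): document-term matrix under the vector space model."""
    tokenized = [t.lower().split() for t in tweets]
    vocab = sorted({w for doc in tokenized for w in doc})
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((len(tweets), len(vocab)))
    for i, doc in enumerate(tokenized):
        for w in doc:
            counts[i, index[w]] += 1.0
    df = (counts > 0).sum(axis=0)            # document frequency of each term
    idf = np.log(len(tweets) / df) + 1.0     # shifted by 1 so no weight is zero
    X = counts * idf
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                  # leave empty tweets as zero rows
    return X / norms, vocab


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Stage 2 (assumed form): X ~ W @ H with non-negative W and H.

    Uses multiplicative updates for the Frobenius loss and a random
    start; other initialization schemes would replace the two rng lines.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps    # tweet-by-topic weights
    H = rng.random((k, X.shape[1])) + eps    # topic-by-term weights
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def describe(W, H, vocab, top=3):
    """Stage 3 (assumed form): cluster labels and top terms per topic."""
    labels = W.argmax(axis=1)                # each tweet joins its strongest topic
    topics = [[vocab[j] for j in np.argsort(row)[::-1][:top]] for row in H]
    return labels, topics


# Hypothetical toy collection; k = 2 is chosen by hand.
tweets = [
    "new phone launch today",
    "phone launch event",
    "football match tonight",
    "football match result",
]
X, vocab = build_tfidf(tweets)
W, H = nmf(X, k=2)
labels, topics = describe(W, H, vocab)

for tweet, label in zip(tweets, labels):
    print(label, tweet)
for t, words in enumerate(topics):
    print("topic", t, words)
```

The highest-weighted terms in each row of H stand in for the word-cloud view: in a word cloud these weights would drive font size. Because the start is random, a different seed or initialization scheme can change the factors, which is the sensitivity the abstract says was investigated.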

Gabriella Casalino; Ciro Castiello; Nicoletta Del Buono; Corrado Mencar

2018-01-01

Year

2018

Appears in types:

1.1 Journal article

Files in this product:

File	Access	Type	License	Size	Format
framework.pdf	Not available	Publisher's version	NOT PUBLIC - Private/restricted access	1.59 MB	Adobe PDF
framework-intelligent-twitter(con-DOI).pdf	Open access	Pre-print (pre-print version of the article with DOI)	Creative Commons	2.41 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/223791
