Non-negative matrix factorization is a multivariate analysis method which is proven to be useful in many areas such as bio-informatics, molecular pattern discovery, pattern recognition, document clustering and so on. It seeks a reduced representation of a multivariate data matrix into the product of basis and encoding matrices possessing only non-negative elements, in order to learn the so called part-based representations of data. All algorithms for computing non-negative matrix factorization are iterative, therefore particular emphasis must be placed on a proper initialization of NMF because of its local convergence. The problem of selecting appropriate starting matrices becomes more complex when data possess special meaning as in document clustering. In this paper, we propose the adoption of the subtractive clustering algorithm as a scheme to generate initial matrices for non-negative matrix factorization algorithms. Comparisons with other commonly adopted initializations of non-negative matrix factorization algorithms have been performed and the proposed scheme reveals to be a good trade-off between effectiveness and speed. Moreover, the effectiveness of the proposed initialization to suggest a number of basis for NMF, when data distances are estimated, is illustrated when NMF is used for solving clustering problems where the number of groups in which the data are grouped is not known a-priori. The influence of a proper rank factor on the interpretability and the effectiveness of the results are also discussed.

Subtractive clustering for seeding non-negative matrix factorizations

CASALINO, GABRIELLA;DEL BUONO, Nicoletta;MENCAR, CORRADO
2014-01-01

Abstract

Non-negative matrix factorization is a multivariate analysis method which is proven to be useful in many areas such as bio-informatics, molecular pattern discovery, pattern recognition, document clustering and so on. It seeks a reduced representation of a multivariate data matrix into the product of basis and encoding matrices possessing only non-negative elements, in order to learn the so called part-based representations of data. All algorithms for computing non-negative matrix factorization are iterative, therefore particular emphasis must be placed on a proper initialization of NMF because of its local convergence. The problem of selecting appropriate starting matrices becomes more complex when data possess special meaning as in document clustering. In this paper, we propose the adoption of the subtractive clustering algorithm as a scheme to generate initial matrices for non-negative matrix factorization algorithms. Comparisons with other commonly adopted initializations of non-negative matrix factorization algorithms have been performed and the proposed scheme reveals to be a good trade-off between effectiveness and speed. Moreover, the effectiveness of the proposed initialization to suggest a number of basis for NMF, when data distances are estimated, is illustrated when NMF is used for solving clustering problems where the number of groups in which the data are grouped is not known a-priori. The influence of a proper rank factor on the interpretability and the effectiveness of the results are also discussed.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11586/130807
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 50
  • ???jsp.display-item.citation.isi??? 43
social impact