An agglomerative hierarchical clustering method for text in Albanian

Najada Firza
2023-01-01

Abstract

Large data sets are now available in many areas of science, which makes it important to summarize them into groups and extract information from them. Cluster analysis explores such data by grouping observations according to their similarity, and the degree of similarity is represented quantitatively by a distance function. In this paper, we apply Ward’s method with cosine distance to a database of 100 Albanian texts, grouping them into 16 clusters based on word frequencies. Of these texts, 87 percent are well classified by author.
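As an illustration of the approach the abstract describes, the following is a minimal sketch and not the paper's implementation: this record does not give the tokenization, the preprocessing, or the software used. It assumes whitespace tokenization and raw word counts, and it stands in for "Ward's method with cosine distance" by scaling each frequency vector to unit length, so that the squared Euclidean distance between two texts equals 2(1 - cosine similarity). The function names `tf_vectors` and `ward_clusters` and the toy strings are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vectors(texts):
    """Word-frequency vectors over a shared vocabulary, scaled to unit length."""
    counts = [Counter(t.lower().split()) for t in texts]  # assumed tokenization
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [c[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def ward_clusters(vectors, k):
    """Agglomerate with Ward's criterion until k clusters remain."""
    n = len(vectors)
    clusters = {i: [i] for i in range(n)}
    # For unit vectors: squared Euclidean distance = 2 * (1 - cosine similarity).
    d = {(i, j): sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
         for i, j in combinations(range(n), 2)}
    next_id = n
    while len(clusters) > k:
        i, j = min(d, key=d.get)          # closest pair under Ward's criterion
        dij = d.pop((i, j))
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        for m, members in clusters.items():
            nm = len(members)
            dim = d.pop((min(i, m), max(i, m)))
            djm = d.pop((min(j, m), max(j, m)))
            # Lance-Williams update for Ward's method on squared distances.
            d[(m, next_id)] = ((ni + nm) * dim + (nj + nm) * djm
                               - nm * dij) / (ni + nj + nm)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


# Toy stand-ins for two "authors" with disjoint vocabularies (hypothetical data).
texts = ["mali mali deti", "mali deti mali mali",
         "libri shkolla libri", "shkolla libri shkolla"]
print(ward_clusters(tf_vectors(texts), 2))
```

On the toy input, the two texts that share the vocabulary "mali"/"deti" end up in one cluster and the two that share "libri"/"shkolla" in the other. For a corpus of the size described in the abstract, calling `ward_clusters(vectors, 16)` would cut the hierarchy at 16 clusters, and the share of texts grouped with their author could then be computed against known author labels.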
2023
978-9-92-880528-7
Use this identifier to cite or link to this document: https://hdl.handle.net/11586/475940