An agglomerative hierarchical clustering method for text in Albanian
Najada Firza
2023-01-01
Abstract
Big data is now available in many areas of science, which makes it important to summarize data sets into groups and extract information from them. Cluster analysis lets us explore such data on the basis of their similarity, and the degree of similarity is represented quantitatively by distance functions. In this paper, Ward's method with cosine distance is applied to a database of 100 Albanian texts, which are grouped into 16 clusters based on word frequencies; 87 percent of the texts are correctly classified by author.
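The procedure named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that "Ward's method with cosine distance" can be approximated by scaling each word-frequency vector to unit length and then applying Ward's merge criterion, since for unit vectors the squared Euclidean distance equals 2(1 - cosine similarity). The function names `tf_unit_vector` and `ward_clusters` and the four toy strings are hypothetical and are not taken from the paper's corpus.

```python
import math
from collections import Counter


def tf_unit_vector(text, vocab):
    """Word-frequency vector over `vocab`, scaled to unit L2 norm."""
    counts = Counter(text.lower().split())
    vec = [counts[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ward_clusters(vectors, k):
    """Naive agglomerative clustering with Ward's criterion, stopped at k clusters.

    Each cluster is stored as (member indices, sum of member vectors).
    Assumption: the inputs are unit vectors, so the squared Euclidean
    distance between two texts is 2 * (1 - cosine similarity).
    """
    clusters = [([i], list(v)) for i, v in enumerate(vectors)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        na, nb = len(a[0]), len(b[0])
        d2 = sum((x / na - y / nb) ** 2 for x, y in zip(a[1], b[1]))
        return na * nb / (na + nb) * d2

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        a, b = clusters[i], clusters[j]
        merged = (a[0] + b[0], [x + y for x, y in zip(a[1], b[1])])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)

    labels = [0] * len(vectors)
    for label, (members, _) in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


# Toy strings (hypothetical, for illustration only).
texts = [
    "mal mal det",
    "mal det mal mal",
    "libri shkolla libri",
    "shkolla libri shkolla",
]
vocab = sorted({w for t in texts for w in t.split()})
vectors = [tf_unit_vector(t, vocab) for t in texts]
labels = ward_clusters(vectors, k=2)
print(labels)  # texts 0 and 1 share one label, texts 2 and 3 the other
```

The pairwise search makes this sketch cubic in the number of texts, which is acceptable for a corpus of about 100 documents. A library routine would be the usual choice for larger collections.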