An agglomerative hierarchical clustering method for text in Albanian

Luela Prifti;Najada Firza;Denisa Salillari

2023-01-01

Abstract

Nowadays, big data is available in many areas of science, so it is important to summarize data sets into groups and extract information from them. Cluster analysis lets us explore such data based on their similarity, and the degree of similarity is represented quantitatively by distance functions. In this paper, we apply Ward’s method with cosine distance to a database of 100 Albanian texts, grouping them into 16 clusters based on word frequencies, with 87 percent of the texts well classified by author.
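The procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the four toy texts, the helper names `tf_vector` and `ward_clusters`, and the choice of two clusters are all assumptions made for illustration. The abstract does not say how Ward's method and cosine distance were combined; this sketch assumes one common approach, which is to scale each word-frequency vector to unit length and then apply Ward's criterion with Euclidean distance, since for unit vectors the squared Euclidean distance equals 2(1 - cosine similarity).

```python
import math
from collections import Counter

def tf_vector(text, vocab):
    """Word-frequency vector over vocab, scaled to unit length (hypothetical helper)."""
    counts = Counter(text.lower().split())
    vec = [counts[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def ward_clusters(vectors, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters."""
    # Each cluster stores (member indices, centroid, size).
    clusters = {i: ([i], list(v), 1) for i, v in enumerate(vectors)}
    next_id = len(vectors)
    while len(clusters) > k:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                _, ca, na = clusters[a]
                _, cb, nb = clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ma, ca, na = clusters.pop(a)
        mb, cb, nb = clusters.pop(b)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[next_id] = (ma + mb, centroid, na + nb)
        next_id += 1
    return sorted(sorted(members) for members, _, _ in clusters.values())

# Toy corpus, invented for illustration only (not the paper's data).
texts = [
    "mali mali deti",
    "mali deti mali mali",
    "qyteti rruga rruga",
    "rruga qyteti qyteti rruga",
]
vocab = sorted({w for t in texts for w in t.split()})
vectors = [tf_vector(t, vocab) for t in texts]
groups = ward_clusters(vectors, k=2)
print(groups)
```

In a study like the one described, `k` would be set to the number of clusters of interest (16 in the abstract), and the resulting groups would be compared with the known authors to estimate how many texts are well classified.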

Year

2023

ISBN

978-9-92-880528-7

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/475940
