Analyzing Gaussian distribution of semantic shifts in Lexical Semantic Change Models
Cassotti, Pierluigi; Basile, Pierpaolo; Gemmis, Marco; Semeraro, Giovanni
2020-01-01
Abstract
Interest in lexical semantic change detection has grown significantly in recent years, and many approaches, datasets, and evaluation strategies for detecting semantic shifts now exist. Classifying changed words against stable words requires a threshold on the degree of semantic change. In this work, we compare state-of-the-art computational historical linguistics approaches to evaluate the efficacy of thresholds based on the Gaussian distribution of semantic shifts. We present the results of an in-depth analysis conducted on both the SemEval-2020 Task 1 Subtask 1 and DIACR-Ita tasks. Specifically, we compare Temporal Random Indexing, Temporal Referencing, Orthogonal Procrustes Alignment, Dynamic Word Embeddings, and Temporal Word Embedding with a Compass. Although Gaussian thresholds achieve state-of-the-art performance in English, German, Swedish, and Italian, their results remain far from those obtained with the optimal threshold.
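To make the thresholding idea concrete, here is a minimal Python sketch, not the authors' code. It assumes each target word already has a semantic shift score. A common choice, used here for illustration, is the cosine distance between the word's vectors from two time periods, which must already live in a shared space (for example after alignment). The sketch also assumes that the Gaussian threshold is the mean plus k standard deviations of the scores, with k = 1 as an illustrative default. All function names, toy words, and scores are hypothetical.

```python
import math
import statistics


def cosine_distance(u, v):
    """Shift score: 1 - cosine similarity of a word's vectors from two periods.
    Assumes both vectors are non-zero and already in a shared (aligned) space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def gaussian_threshold(scores, k=1.0):
    """Assumed Gaussian threshold: mean + k * (population) standard deviation."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    return mu + k * sigma


def classify(shift_scores, k=1.0):
    """Label a word as changed (1) if its score exceeds the Gaussian threshold, else stable (0)."""
    t = gaussian_threshold(list(shift_scores.values()), k)
    return {w: int(s > t) for w, s in shift_scores.items()}


def accuracy(pred, gold):
    """Fraction of target words whose predicted label matches the gold label."""
    return sum(pred[w] == gold[w] for w in gold) / len(gold)


def optimal_threshold(shift_scores, gold):
    """Oracle threshold chosen with the gold labels (an upper bound, not usable in practice).
    Tries every observed score as a cut-off, plus one below all scores."""
    candidates = sorted(set(shift_scores.values()))
    candidates = [candidates[0] - 1.0] + candidates
    best_t, best_acc = None, -1.0
    for t in candidates:
        pred = {w: int(s > t) for w, s in shift_scores.items()}
        acc = accuracy(pred, gold)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


if __name__ == "__main__":
    # Hypothetical shift scores and gold labels for five target words.
    scores = {"plane": 0.9, "graft": 0.7, "tree": 0.1, "water": 0.05, "house": 0.15}
    gold = {"plane": 1, "graft": 1, "tree": 0, "water": 0, "house": 0}
    pred = classify(scores)
    print("Gaussian threshold:", round(gaussian_threshold(list(scores.values())), 3))
    print("Gaussian accuracy:", accuracy(pred, gold))
    print("Oracle threshold and accuracy:", optimal_threshold(scores, gold))
```

On this toy data, the Gaussian threshold (about 0.73) labels only "plane" as changed and reaches 0.8 accuracy. The oracle cut-off at 0.15 reaches 1.0. This mirrors, in miniature, the kind of gap between a distribution-based threshold and the optimal one that the abstract describes. The numbers are purely illustrative and are not results from the paper.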