A diachronic Italian corpus based on “L’Unità”

Basile, P.; Caputo, A.; Caselli, T.; Cassotti, P.; Varvara, R.

doi:10.4000/books.aaccademia.8245

In this paper, we describe the creation of a diachronic corpus for Italian by exploiting the digital archive of the newspaper “L’Unità”. We automatically clean and annotate the corpus with PoS tags, lemmas, named entities and syntactic dependencies. Moreover, we compute frequency-based time series for tokens, lemmas and entities. We show some interesting corpus statistics taking into account the temporal dimension and describe some examples of usage of time series.
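As a rough illustration of what a frequency-based time series for tokens could look like, the sketch below counts tokens per year and normalises by the yearly token total. It is not the authors' pipeline: the function name `frequency_time_series`, the `(year, tokens)` input format, the choice of relative frequency, and the toy data are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def frequency_time_series(documents):
    """Map each token to {year: relative frequency of the token in that year}.

    `documents` is an iterable of (year, list_of_tokens) pairs.
    This input format is a hypothetical choice for illustration only.
    """
    # Accumulate raw token counts separately for each year.
    counts_by_year = defaultdict(Counter)
    for year, tokens in documents:
        counts_by_year[year].update(tokens)

    # Normalise by the total number of tokens observed in each year.
    series = defaultdict(dict)
    for year in sorted(counts_by_year):
        counts = counts_by_year[year]
        total = sum(counts.values())
        for token, count in counts.items():
            series[token][year] = count / total
    return dict(series)


# Toy data (invented for the example, not taken from the corpus).
toy_documents = [
    (1950, ["partito", "lavoro", "partito"]),
    (1950, ["lavoro"]),
    (1990, ["partito", "europa", "europa", "mercato"]),
]

series = frequency_time_series(toy_documents)
print(series["partito"])
```

The same function would apply to lemmas or named entities by passing those in place of surface tokens. Years in which an item never occurs are simply absent from its series.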

2020-01-01

Year: 2020

Type: 4.1 Contribution in conference proceedings

Files in this record:

File	Size	Format
paper_44.pdf (open access; type: publisher's version; license: Creative Commons)	403.16 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/348725
