The Italian Public Administration (PA) relies on costly manual analyses to ensure the GDPR compliance of public documents and secure personal data. Despite recent advances in Artificial Intelligence (AI) have benefited many legal fields, the automation of workflows for data protection of public documents is still only marginally affected. The main aim of this work is to design a framework that can be effectively adopted to check whether PA documents written in Italian meet the GDPR requirements. The main outcome of our interdisciplinary research is INTREPID (artI ficial iNT elligence for gdpR compliancE of P ublic admI nistration D ocuments), an AI-based framework that can help the Italian PA to ensure GDPR compliance of public documents. INTREPID is realized by tuning some linguistic resources for Italian language processing (i.e. SpaCy and Tint) to the GDPR intelligence. In addition, we set the foundations for a text classification methodology to recognise the public documents published by the Italian PA, which perform data breaches. We show the effectiveness of the framework over a text corpus of public documents that were published online by the Italian PA. We also perform an inter-annotator study and analyse the agreement of the annotation predictions of the proposed methodology with the annotations by domain experts. Finally, we evaluate the accuracy of the proposed text classification model in detecting breaches of security.

An AI framework to support decisions on GDPR compliance

Basile, Pierpaolo;Appice, Annalisa
;
de Gemmis, Marco;Malerba, Donato;Semeraro, Giovanni
2023-01-01

Abstract

The Italian Public Administration (PA) relies on costly manual analyses to ensure the GDPR compliance of public documents and secure personal data. Despite recent advances in Artificial Intelligence (AI) have benefited many legal fields, the automation of workflows for data protection of public documents is still only marginally affected. The main aim of this work is to design a framework that can be effectively adopted to check whether PA documents written in Italian meet the GDPR requirements. The main outcome of our interdisciplinary research is INTREPID (artI ficial iNT elligence for gdpR compliancE of P ublic admI nistration D ocuments), an AI-based framework that can help the Italian PA to ensure GDPR compliance of public documents. INTREPID is realized by tuning some linguistic resources for Italian language processing (i.e. SpaCy and Tint) to the GDPR intelligence. In addition, we set the foundations for a text classification methodology to recognise the public documents published by the Italian PA, which perform data breaches. We show the effectiveness of the framework over a text corpus of public documents that were published online by the Italian PA. We also perform an inter-annotator study and analyse the agreement of the annotation predictions of the proposed methodology with the annotations by domain experts. Finally, we evaluate the accuracy of the proposed text classification model in detecting breaches of security.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11586/421491
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact