An Approach to Trade-off Privacy and Classification Accuracy in Machine Learning Processes
Desiato, Domenico (Methodology)
2023-01-01
Abstract
Machine learning techniques applied to large and distributed data archives might result in the disclosure of sensitive information. Data often contain sensitive identifiable information, and even when such information is protected, the powerful processing capabilities of current machine learning techniques might facilitate the identification of individuals. This discussion paper presents a decision-support framework for data anonymization. The framework relies on a novel approach that exploits data correlations, expressed in terms of relaxed functional dependencies (RFDs), to identify data anonymization strategies that provide suitable trade-offs between privacy and data utility. It also makes it possible to generate anonymization strategies that leverage multiple data correlations simultaneously, increasing the utility of anonymized datasets. In addition, the framework supports the selection of anonymization strategies by helping users understand the privacy/utility trade-offs offered by the obtained strategies. Experiments on real-life datasets show that the approach achieves promising results in data utility while guaranteeing the desired privacy level. Additionally, it allows data owners to select anonymization strategies that balance their privacy and data utility requirements.
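To make the privacy/utility trade-off described above concrete, the sketch below shows a minimal, self-contained example of generalization-based anonymization on toy records: quasi-identifiers are coarsened, the result is checked for k-anonymity, and a crude utility proxy is computed. The attribute names, generalization rules, and utility measure are illustrative assumptions for this sketch only; they are not the RFD-driven strategies proposed in the paper.

```python
from collections import Counter

# Toy records: (age, zipcode, diagnosis). Ages and zip codes act as
# quasi-identifiers; diagnosis is the sensitive attribute.
records = [
    (34, "84084", "flu"),
    (36, "84085", "flu"),
    (35, "84081", "cold"),
    (52, "84132", "cold"),
    (55, "84135", "flu"),
    (51, "84139", "asthma"),
]

def generalize(record):
    """Generalize the quasi-identifiers: age to a 10-year band,
    zipcode to its 3-digit prefix."""
    age, zipcode, diagnosis = record
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", zipcode[:3] + "**", diagnosis)

def is_k_anonymous(rows, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter((age, zipcode) for age, zipcode, _ in rows)
    return all(count >= k for count in groups.values())

def utility(original, anonymized):
    """Crude utility proxy: fraction of distinct quasi-identifier
    combinations preserved after generalization (higher is better)."""
    distinct_orig = len({(a, z) for a, z, _ in original})
    distinct_anon = len({(a, z) for a, z, _ in anonymized})
    return distinct_anon / distinct_orig

anonymized = [generalize(r) for r in records]
print(is_k_anonymous(records, 3))     # False: raw quasi-identifiers are unique
print(is_k_anonymous(anonymized, 3))  # True: each generalized group has >= 3 rows
print(round(utility(records, anonymized), 2))  # 0.33: coarsening costs utility
```

Choosing a coarser generalization strengthens privacy but lowers the utility score; a framework such as the one described here would explore many such strategies and surface their trade-offs to the data owner.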