Balancing Data Within Incremental Semi-supervised Fuzzy Clustering for Credit Card Fraud Detection
Casalino, Gabriella; Castellano, Giovanna
2021-01-01
Abstract
As the number of online financial transactions grows, credit card fraud detection has become an urgent problem. Machine learning methods, both supervised and unsupervised, have proven effective at detecting fraudulent activities. In our previous work, presented at EUSFLAT 2019, we proposed an incremental semi-supervised fuzzy clustering method that processes both labeled and unlabeled data as a stream to build a classification model for credit card fraud detection. However, we observed that the results of the method were affected by data imbalance: credit card fraud data are highly imbalanced, since fraudulent activities are far fewer than genuine ones. In this work, we investigate different resampling methods to deal with this imbalance and report an empirical comparison of them.