Undersampling with Support Vectors for Multi-Class Imbalanced Data Classification

Roberto Corizzo;
2021-01-01

Abstract

Learning from imbalanced data poses significant challenges for a classifier. The task becomes even more difficult in multi-class problems, where relationships among classes are no longer well defined and it is easy to lose performance on one class while gaining on another. In recent years this topic has attracted growing interest from the machine learning community; however, there is still a need for new and efficient algorithms to handle this challenge. In this paper we propose a new approach for balancing multi-class imbalanced problems, based on a two-step undersampling methodology. In the first step, a one-class classifier is trained on each class, yielding a skew-insensitive data description. The support vectors for each class are extracted and used as new class representatives, which significantly reduces the number of instances used. In the second step, an evolutionary undersampling approach is applied to these support vectors in order to further balance the training set. Applying this technique to the set of support vectors rather than to the full dataset yields a significant reduction in computational time and increased accuracy. Finally, a standard multi-class classifier is trained on the balanced data set. A thorough experimental study demonstrates the usefulness of the proposed approach in comparison with state-of-the-art approaches for handling multi-class imbalanced data.
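The second step of the methodology can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the per-class support vectors have already been extracted by the one-class classifiers and are given as a labelled pool. The binary-mask encoding, the 1-NN-based fitness, the `penalty` weight, the genetic operators, and the synthetic data are all illustrative assumptions.

```python
import random


def nearest_label(point, samples):
    """Label of the sample closest to `point` (squared Euclidean distance)."""
    closest = min(samples, key=lambda s: sum((a - b) ** 2 for a, b in zip(point, s[0])))
    return closest[1]


def fitness(mask, pool, validation, penalty=0.2):
    """Mean per-class recall of 1-NN on the kept items, minus an imbalance penalty.

    Hypothetical fitness chosen for illustration only.
    """
    selected = [s for s, keep in zip(pool, mask) if keep]
    counts = {}
    for _, label in selected:
        counts[label] = counts.get(label, 0) + 1
    classes = sorted({label for _, label in pool})
    if sorted(counts) != classes:
        return -1.0  # a subset that drops a whole class is unusable
    recalls = []
    for c in classes:
        members = [x for x, label in validation if label == c]
        hits = sum(nearest_label(x, selected) == c for x in members)
        recalls.append(hits / len(members))
    imbalance = 1 - min(counts.values()) / max(counts.values())
    return sum(recalls) / len(recalls) - penalty * imbalance


def evolve_mask(pool, validation, generations=30, population=20, seed=0):
    """Simple elitist genetic search over keep/drop masks for the pool."""
    rng = random.Random(seed)
    n = len(pool)

    def score(mask):
        return fitness(mask, pool, validation)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(population - 1)]
    # Seed with the keep-everything mask so the result is never worse than
    # applying no undersampling at all.
    pop.append([True] * n)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: population // 2]  # the best half survives unchanged
        children = []
        while len(elite) + len(children) < population:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)      # one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n)        # single-bit mutation
            child[flip] = not child[flip]
            children.append(child)
        pop = elite + children
    return max(pop, key=score)


def make_points(rng, center, count, label):
    """Synthetic 2-D points around `center`, standing in for support vectors."""
    return [((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)), label)
            for _ in range(count)]


rng = random.Random(42)
centers = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (0.0, 3.0)}
sizes = {"a": 12, "b": 6, "c": 3}  # imbalanced pool of class representatives
pool = [p for label, c in centers.items() for p in make_points(rng, c, sizes[label], label)]
validation = [p for label, c in centers.items() for p in make_points(rng, c, 5, label)]

best = evolve_mask(pool, validation)
kept = [label for (_, label), keep in zip(pool, best) if keep]
print({label: kept.count(label) for label in centers})
```

Because the search runs over the pool of representatives rather than the full training set, each fitness evaluation touches far fewer instances, which is the source of the computational saving the abstract describes. The instances selected by the final mask would then be passed to any standard multi-class classifier.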
Publication year: 2021
ISBN: 978-1-6654-3900-8
Use this identifier to cite or link to this document: https://hdl.handle.net/11586/373337