Undersampling with Support Vectors for Multi-Class Imbalanced Data Classification


Bartosz Krawczyk; Colin Bellinger; Roberto Corizzo; Nathalie Japkowicz

2021-01-01

Abstract

Learning from imbalanced data poses significant challenges for classifiers. The difficulty increases further for multi-class problems, where relationships among classes are no longer well defined and it is easy to lose performance on one class while gaining on another. In recent years, this topic has attracted growing interest from the machine learning community; however, there is still a need for new and efficient algorithms to handle this challenge. In this paper, we propose a new approach for balancing multi-class imbalanced problems, based on a two-step undersampling methodology. In the first step, a one-class classifier is trained on each class, yielding a skew-insensitive data description. The support vectors of each class are extracted and used as new class representatives, which significantly reduces the number of instances used. In the second step, an evolutionary undersampling approach is applied to these support vectors to further balance the training set. By applying this technique to a set of support vectors rather than to the full dataset, we achieve a significant reduction in computational time and increased accuracy. Finally, a standard multi-class classifier is trained on the balanced data set. A thorough experimental study demonstrates the usefulness of the proposed approach compared with state-of-the-art methods for handling multi-class imbalanced data.
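The two-step pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, not the authors' implementation. The one-class step is replaced by a crude stand-in (`boundary_candidates`, which keeps each class's points farthest from its centroid) rather than an actual support-vector one-class model; in practice, the candidate pool would be the support vectors of a per-class one-class SVM (for example, the indices exposed by scikit-learn's `OneClassSVM.support_`). The evolutionary step is a simple elitist genetic algorithm over keep/drop masks, and its fitness (G-mean of per-class recall of a 1-NN classifier, discounted by the class imbalance of the kept set) is an assumption, as are all function names and parameters (`alpha`, `pop_size`, `generations`, `p_mut`, `frac`).

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def sq_dist(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def gmean_recall(pX: Sequence[Vec], py: Sequence[int],
                 X: Sequence[Vec], y: Sequence[int]) -> float:
    """G-mean of per-class recall of a 1-NN classifier with prototypes (pX, py),
    using a simple resubstitution estimate on (X, y)."""
    hits, totals = Counter(), Counter(y)
    for xi, yi in zip(X, y):
        nearest = min(range(len(pX)), key=lambda j: sq_dist(pX[j], xi))
        if py[nearest] == yi:
            hits[yi] += 1
    prod = 1.0
    for c in totals:
        prod *= hits[c] / totals[c]
    return prod ** (1.0 / len(totals))


def boundary_candidates(X: Sequence[Vec], y: Sequence[int], frac: float = 0.4) -> List[int]:
    """Hypothetical stand-in for step 1, NOT a one-class SVM: per class, keep the
    points farthest from the class centroid as a crude proxy for support vectors."""
    chosen: List[int] = []
    for c in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == c]
        dim = len(X[idx[0]])
        centroid = tuple(sum(X[i][d] for i in idx) / len(idx) for d in range(dim))
        idx.sort(key=lambda i: sq_dist(X[i], centroid), reverse=True)
        chosen += idx[: max(2, int(frac * len(idx)))]
    return chosen


def fitness(mask: Sequence[int], cand_X, cand_y, X, y, alpha: float = 0.5) -> float:
    """Hypothetical fitness: G-mean on the full training set, discounted by the
    class imbalance of the kept candidates (0 if any class loses all members)."""
    kept_y = [cand_y[i] for i, g in enumerate(mask) if g]
    counts = Counter(kept_y)
    if len(counts) < len(set(y)):
        return 0.0
    kept_X = [cand_X[i] for i, g in enumerate(mask) if g]
    imbalance = 1.0 - min(counts.values()) / max(counts.values())
    return gmean_recall(kept_X, kept_y, X, y) * (1.0 - alpha * imbalance)


def evolutionary_undersample(cand_X, cand_y, X, y, pop_size: int = 20,
                             generations: int = 30, p_mut: float = 0.05,
                             seed: int = 0) -> Tuple[List[int], float]:
    """Step 2: an elitist genetic algorithm over keep/drop masks on the candidate pool."""
    rng = random.Random(seed)
    n = len(cand_X)

    def fit(m):
        return fitness(m, cand_X, cand_y, X, y)

    # Seed with the "keep everything" mask so the result is never worse than no reduction.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        elite = sorted(pop, key=fit, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)                      # one-point crossover
            child = [1 - g if rng.random() < p_mut else g  # bit-flip mutation
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fit)
    return [i for i, g in enumerate(best) if g], fit(best)


if __name__ == "__main__":
    rng = random.Random(42)
    X: List[Vec] = []
    y: List[int] = []
    # Toy imbalanced 3-class data: 30 / 8 / 4 points around three centers.
    for label, (mx, my), n in [(0, (0.0, 0.0), 30), (1, (4.0, 0.0), 8), (2, (0.0, 4.0), 4)]:
        for _ in range(n):
            X.append((rng.gauss(mx, 1.0), rng.gauss(my, 1.0)))
            y.append(label)
    pool = boundary_candidates(X, y)
    cand_X, cand_y = [X[i] for i in pool], [y[i] for i in pool]
    kept, score = evolutionary_undersample(cand_X, cand_y, X, y)
    print(f"{len(X)} -> {len(pool)} -> {len(kept)} instances, fitness {score:.3f}")
    print("kept per class:", dict(Counter(cand_y[i] for i in kept)))
```

Because the genetic algorithm evolves masks only over the reduced candidate pool rather than over the full training set, each individual is shorter, which is consistent with the abstract's point about reduced computational time. In this sketch, the kept instances would then be handed to any standard multi-class classifier, as in the final step described above.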


Year: 2021
ISBN: 978-1-6654-3900-8
Type: 4.1 Contribution in conference proceedings


Use this identifier to cite or link to this document: https://hdl.handle.net/11586/373337

Note: the data displayed have not been validated by the university.
