Cluster-based Diversity Over-sampling: A Density and Diversity Oriented Synthetic Over-sampling for Imbalanced Data

Yuxuan Yang, Hadi Khorshidi, Uwe Aickelin

2022

Abstract

Imbalanced data is common in real-life classification tasks. Mainstream machine learning algorithms typically assume that the classes in the underlying dataset are relatively well balanced. When this assumption fails, the reported performance of a model can be biased. This has encouraged the use of re-sampling techniques to generate more balanced datasets. However, mainstream re-sampling methods fail to account for the distribution of minority data and for the diversity of the generated instances. In this paper, we therefore propose a data-generation algorithm, Cluster-based Diversity Over-sampling (CDO), which takes the distribution of minority instances into account during data generation and uses diversity optimisation to promote diversity within the generated data. We conducted extensive experiments on synthetic and real-world datasets to compare CDO with SMOTE-based and diversity-based methods (DADO, DIWO, BL-SMOTE, DB-SMOTE, and MAHAKIL). In these experiments, CDO outperforms the compared methods.

Paper Citation


in Harvard Style

Yang Y., Khorshidi H. and Aickelin U. (2022). Cluster-based Diversity Over-sampling: A Density and Diversity Oriented Synthetic Over-sampling for Imbalanced Data. In Proceedings of the 14th International Joint Conference on Computational Intelligence (IJCCI 2022) - Volume 1: ECTA; ISBN 978-989-758-611-8, SciTePress, pages 17-28. DOI: 10.5220/0011381000003332


in BibTeX Style

@conference{ecta22,
author={Yuxuan Yang and Hadi Khorshidi and Uwe Aickelin},
title={Cluster-based Diversity Over-sampling: A Density and Diversity Oriented Synthetic Over-sampling for Imbalanced Data},
booktitle={Proceedings of the 14th International Joint Conference on Computational Intelligence (IJCCI 2022) - Volume 1: ECTA},
year={2022},
pages={17-28},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011381000003332},
isbn={978-989-758-611-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Joint Conference on Computational Intelligence (IJCCI 2022) - Volume 1: ECTA
TI - Cluster-based Diversity Over-sampling: A Density and Diversity Oriented Synthetic Over-sampling for Imbalanced Data
SN - 978-989-758-611-8
AU - Yang Y.
AU - Khorshidi H.
AU - Aickelin U.
PY - 2022
SP - 17
EP - 28
DO - 10.5220/0011381000003332
PB - SciTePress