Table 2: Experiments With Different Up-sampling Techniques in Learning By Oversampling.

XGBoost with Different Up-sampling Methods        TPR (%)        FPR (%)
XGBoost + SMOTE (Bunkhumpornpat et al., 2009)     77.1 ± 0.01    10.4 ± 0.01
XGBoost + AdaSyn-SMOTE (Gameng et al., 2019)      77.0 ± 0.01    10.6 ± 0.01
XGBoost + cGAN (Douzas and Bação, 2017)           78.5 ± 0.01     9.4 ± 0.01
XGBoost + DOS (Ando and Huang, 2017)              78.6 ± 0.01     9.3 ± 0.01
XGBoost + GAMO (Mullick et al., 2019)             78.9 ± 0.01     9.1 ± 0.01
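The simplest baseline in Table 2 is SMOTE-style interpolation between minority samples. As a minimal, self-contained numpy sketch of that core synthesis step (illustrative only; the function name and parameters are our own, and this does not reproduce the safe-level variant cited in the table):

```python
import numpy as np

def smote_oversample(X_min, n_synthetic, k=5, rng=None):
    """Generate synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours,
    which is the core idea behind SMOTE."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances among minority samples
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]      # k nearest neighbours per sample
    base = rng.integers(0, n, size=n_synthetic)
    neigh = nn[base, rng.integers(0, min(k, n - 1), size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))     # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```

Each synthetic point is a convex combination of two real minority points, so the new samples stay inside the minority class's local neighbourhood rather than being drawn from an arbitrary distribution.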
it takes a toll on the generalization error of the model. An intuitive reason is that inaccurate imputation of data with a high degree of sparsity significantly alters the distribution of the data after imputation; the model then learns a distribution that differs significantly from the ground-truth distribution. We observed that complex model-based imputation methods, such as MIWAE (Mattei and Frellsen, 2019), yield better true and false positive rates from the same model than simple mean or median imputation methods.
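The distributional distortion caused by naive imputation under high sparsity can be illustrated with a toy numpy experiment on mean imputation (MIWAE itself requires a trained deep generative model and is not reproduced here; the sparsity level and variable names below are illustrative):

```python
import numpy as np

def mean_impute(X):
    """Fill missing entries (NaN) with per-column means: simple, but it
    shrinks the variance of heavily imputed columns."""
    X = X.copy()
    col_mean = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = np.take(col_mean, cols)
    return X

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(10_000, 1))
mask = rng.random(x.shape) < 0.5           # 50% of entries missing
x_obs = np.where(mask, np.nan, x)
x_imp = mean_impute(x_obs)
# With half the entries replaced by the column mean, the variance of
# the imputed column is roughly halved relative to the true data.
print(x.var(), x_imp.var())
```

At 50% sparsity, half the column collapses onto a single value, so a downstream model sees a distribution markedly narrower than the ground truth, consistent with the generalization gap discussed above.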
9 CONCLUSION
In this paper, we introduce a human-centered-AI-based change risk assessment system that aims to bridge the gap between model-based assessment of change risks and assessment by domain experts. While designing the system, we faced many challenges, such as extreme class imbalance, gradual concept drift, model selection, explaining predictions for user adoption, and scaling at an industrial level. We also elaborate on how this system created business impact post-deployment. In the near future, we will explore an active-learning-based framework to leverage the experts' feedback more effectively.
REFERENCES
Ando, S. and Huang, C. (2017). Deep over-sampling
framework for classifying imbalanced data. In ECML
PKDD, volume 10534 of Lecture Notes in Computer
Science, pages 770–785. Springer.
Malinin, A., Prokhorenkova, L., and Ustimenko, A. (2021). Uncertainty
in gradient boosting via ensembles. In ICLR.
Bunkhumpornpat, C., Sinapiromsaran, K., and Lursinsap,
C. (2009). Safe-level-SMOTE: Safe-level-synthetic
minority over-sampling technique for handling the
class imbalance problem. In Advances in Knowledge
Discovery and Data Mining, pages 475–482.
Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree
boosting system. In KDD, pages 785–794.
David, D. B., Resheff, Y. S., and Tron, T. (2021). Explain-
able AI and adoption of financial algorithmic advi-
sors: An experimental study. In AIES, pages 390–400.
Douzas, G. and Bação, F. (2017). Effective data generation
for imbalanced learning using conditional generative
adversarial networks. Expert Systems with Applica-
tions, 91.
Gade, K., Geyik, S. C., Kenthapadi, K., Mithal, V., and Taly,
A. (2019). Explainable AI in industry. In KDD, pages
3203–3204.
Gal, Y. (2016). Uncertainty in Deep Learning. PhD thesis,
University of Cambridge.
Gameng, H. A., Gerardo, B. B., and Medina, R. P. (2019).
Modified adaptive synthetic SMOTE to improve clas-
sification performance in imbalanced datasets. In IC-
ETAS, pages 1–5.
Lai, Y., Shi, Y., Han, Y., Shao, Y., Qi, M., and Li, B. (2021).
Exploring uncertainty in deep learning for construc-
tion of prediction intervals. CoRR, abs/2104.12953.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone,
C. J. (1984). Classification and Regression Trees.
Chapman and Hall/CRC.
Lundberg, S. M. and Lee, S. (2017). A unified approach
to interpreting model predictions. In NeurIPS, pages
4765–4774.
Mattei, P.-A. and Frellsen, J. (2019). MIWAE: Deep gen-
erative modelling and imputation of incomplete data
sets. In ICML, volume 97 of Proceedings of Machine
Learning Research, pages 4413–4423. PMLR.
Molnar, C., Casalicchio, G., and Bischl, B. (2020). Inter-
pretable machine learning - A brief history, state-of-
the-art and challenges. In ECML PKDD, volume 1323
of Communications in Computer and Information Sci-
ence, pages 417–431.
Mullick, S. S., Datta, S., and Das, S. (2019). Generative
adversarial minority oversampling. In ICCV, pages
1695–1704.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). ”Why
should I trust you?”: Explaining the predictions of any
classifier. In KDD, pages 1135–1144.
Wu, J., Chen, X.-Y., Zhang, H., Xiong, L.-D., Lei, H., and
Deng, S.-H. (2019). Hyperparameter optimization for
machine learning models based on Bayesian optimiza-
tion. Journal of Electronic Science and Technology,
17(1):26–40.
Zaharia, M., Chen, A., Davidson, A., Ghodsi, A., Hong,
S. A., Konwinski, A., Murching, S., Nykodym, T.,
Ogilvie, P., Parkhe, M., Xie, F., and Zumar, C.
(2018). Accelerating the machine learning lifecycle
with mlflow. IEEE Data Eng. Bull., 41(4):39–45.
ICAART 2022 - 14th International Conference on Agents and Artificial Intelligence