Table 2: Summarised ranges of the negative impact of each partitioning strategy on the examined algorithms and datasets.
L, M and H indicate low (<5%), medium (5% to <10%) and high (≥10%) impact, respectively; 0 indicates no noticeable impact
or results that are better than those obtained for a uniform data distribution. For non-deterministic algorithms, the
differences in both the mean and the maximum values collected from multiple executions were taken into account. Bold cells
indicate potentially high impact.
Main non-IID category of
data partitioning strategy   Naive Bayes   DN-SVM   DK-means   Opt-DKM   LCT     DP-BIRCH
Covariate shift              0             0 - L    0 - M      0 - H     0 - H   0 - H
Concept shift                M - H         H        0          0 - L     0 - H   0 - L
Concept drift                0             0 - M    0 - L      0 - H     0 - H   L - H
Prior probability shift      0             0 - H    0 - L      0 - H     0 - H   0 - M
Unbalancedness               0             0        0 - L      0 - L     0 - H   L - H
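The impact labels used in Table 2 can be read as a simple thresholding rule. The sketch below is an illustration only. It assumes that the impact is expressed as the relative drop of a quality score (higher is better) with respect to the result for uniform data distribution; this formula is an assumption and is not taken from the paper. The names `impact_category`, `impact_range` and the `tolerance` parameter are hypothetical.

```python
# Ordering of the labels used in Table 2, from no impact to high impact.
ORDER = ["0", "L", "M", "H"]


def impact_category(baseline: float, observed: float, tolerance: float = 0.0) -> str:
    """Map a relative quality drop to the labels 0 / L / M / H.

    Assumption (not from the paper): impact = (baseline - observed) / baseline,
    where `baseline` is the score obtained for uniform data distribution.
    `tolerance` is a hypothetical cut-off for "no noticeable impact".
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    drop = (baseline - observed) / baseline
    if drop <= tolerance:   # equal to or better than the uniform baseline
        return "0"
    if drop < 0.05:         # low impact: below 5%
        return "L"
    if drop < 0.10:         # medium impact: below 10%
        return "M"
    return "H"              # high impact: 10% or more


def impact_range(baseline: float, observed_scores: list[float]) -> str:
    """Summarise several results as a range such as '0 - H'."""
    labels = [impact_category(baseline, score) for score in observed_scores]
    lowest = min(labels, key=ORDER.index)
    highest = max(labels, key=ORDER.index)
    return lowest if lowest == highest else f"{lowest} - {highest}"
```

Under these assumptions, a set of results for one algorithm and one partitioning category collapses into a single cell value of the kind shown in the table, for example `impact_range(0.90, [0.95, 0.70])` yields `"0 - H"`.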
REFERENCES
Aouad, L. M., Le-Khac, N.-A., and Kechadi, T. M. (2007).
Lightweight clustering technique for distributed data
mining applications. In Industrial Conference on Data
Mining, pages 120–134. Springer.
Arthur, D. and Vassilvitskii, S. (2006). k-means++: The
advantages of careful seeding. Technical report, Stanford.
Bouraqqadi, H., Berrag, A., Mhaouach, M., Bouhoute,
A., Fardousse, K., and Berrada, I. (2021).
Pyfed: extending PySyft with N-IID Federated
Learning Benchmark. Proceedings of the
Canadian Conference on Artificial Intelligence.
https://caiac.pubpub.org/pub/7yr5bkck.
Caldas, S., Duddu, S. M. K., Wu, P., Li, T., Konečný,
J., McMahan, H. B., Smith, V., and Talwalkar, A.
(2018). Leaf: A benchmark for federated settings.
arXiv preprint arXiv:1812.01097.
Grigorescu, S. M. (2018). Generative one-shot learning
(gol): A semi-parametric approach to one-shot
learning in autonomous vision. In 2018 IEEE
International Conference on Robotics and Automation
(ICRA), pages 7127–7134. IEEE.
Guha, S., Rastogi, R., and Shim, K. (1998). Cure: An
efficient clustering algorithm for large databases. ACM
Sigmod record, 27(2):73–84.
Hsieh, K., Phanishayee, A., Mutlu, O., and Gibbons, P.
(2020). The non-iid data quagmire of decentralized
machine learning. In International Conference on
Machine Learning, pages 4387–4398. PMLR.
Hu, S., Li, Y., Liu, X., Li, Q., Wu, Z., and He, B.
(2020). The oarf benchmark suite: Characterization
and implications for federated learning systems. arXiv
preprint arXiv:2006.07856.
Ji, G. and Ling, X. (2007). Ensemble learning based
distributed clustering. In Pacific-Asia Conference on
Knowledge Discovery and Data Mining, pages
312–321. Springer.
Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis,
M., Bhagoji, A. N., Bonawitz, K., Charles, Z.,
Cormode, G., Cummings, R., et al. (2019). Advances and
open problems in federated learning. arXiv preprint
arXiv:1912.04977.
Li, Q., Diao, Y., Chen, Q., and He, B. (2021a). Federated
learning on non-iid data silos: An experimental study.
arXiv preprint arXiv:2102.02079.
Li, Q., Wen, Z., Wu, Z., Hu, S., Wang, N., Li, Y., Liu,
X., and He, B. (2021b). A survey on federated learning
systems: vision, hype and reality for data privacy
and protection. IEEE Transactions on Knowledge and
Data Engineering.
Limón, X., Guerra-Hernández, A., Cruz-Ramírez, N., and
Grimaldo, F. (2019). Modeling and implementing
distributed data mining strategies in jaca-ddm.
Knowledge and Information Systems, 60(1):99–143.
Liu, L., Zhang, F., Xiao, J., and Wu, C. (2020). Evaluation
framework for large-scale federated learning. arXiv
preprint arXiv:2003.01575.
Luo, J., Wu, X., Luo, Y., Huang, A., Huang, Y., Liu, Y.,
and Yang, Q. (2019). Real-world image datasets for
federated learning. arXiv preprint arXiv:1910.11089.
Markiewicz, M. and Koperwas, J. (2019). Hybrid
partitioning-density algorithm for k-means clustering
of distributed data utilizing optics. International
Journal of Data Warehousing and Mining (IJDWM),
15(4):1–20.
Markiewicz, M. and Koperwas, J. (2022). Evaluation
platform for ddm algorithms with the usage of
non-uniform data distribution strategies. International
Journal of Information Technologies and Systems
Approach (IJITSA), 15(1):1–23.
Navia-Vázquez, A., Gutierrez-Gonzalez, D.,
Parrado-Hernández, E., and Navarro-Abellan, J. (2006).
Distributed support vector machines. IEEE Transactions
on Neural Networks, 17(4):1091.
Sattler, F., Wiedemann, S., Müller, K.-R., and Samek, W.
(2019). Robust and communication-efficient federated
learning from non-iid data. IEEE transactions on
neural networks and learning systems,
31(9):3400–3413.
Zhang, T., Ramakrishnan, R., and Livny, M. (1997). Birch:
A new data clustering algorithm and its applications.
Data Mining and Knowledge Discovery,
1(2):141–182.
Zhu, H., Xu, J., Liu, S., and Jin, Y. (2021). Federated
learning on non-iid data: A survey. arXiv preprint
arXiv:2106.06843.
ICSOFT 2022 - 17th International Conference on Software Technologies