Empowering Client Selection with Local Knowledge Distillation for Efficient Federated Learning in Non-IID Data
DOI: https://doi.org/10.5753/jisa.2025.5154
Keywords: Machine Learning, Federated Learning, Distributed Computing
Abstract
Federated Learning (FL) is a distributed approach in which multiple devices collaborate to train a shared global model (GM). During training, client devices must frequently communicate their gradients to the central server to update the GM weights, which incurs significant communication costs (bandwidth utilization and number of messages exchanged). The heterogeneous nature of the clients' local datasets poses an additional challenge to model training. To address these issues, we introduce FedSeleKDistill, the Federated Selection and Knowledge Distillation algorithm, which decreases the overall communication costs. FedSeleKDistill combines (i) client selection and (ii) knowledge distillation, with three main objectives: (i) reducing the number of devices that train in each round; (ii) decreasing the number of rounds until convergence; and (iii) mitigating the effect of the clients' heterogeneous data on the GM's effectiveness. In this paper, we extend the results of the initial paper presenting FedSeleKDistill. Additional experimental evaluations on the MNIST and German Traffic Sign Benchmark datasets demonstrate that FedSeleKDistill trains the GM to convergence efficiently in heterogeneous FL, reaching higher accuracy and faster convergence than state-of-the-art models. Our results also show higher accuracy on the clients' local datasets.
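To make the mechanism described above concrete, the sketch below runs a single federated round in the same spirit: the server selects a subset of clients, each selected client trains locally with a knowledge-distillation term that keeps its predictions close to the global model's soft targets, and the server aggregates the resulting weights FedAvg-style. This is an illustrative sketch only; the loss-based selection heuristic, the linear softmax model, and the temperature/weight values are our own simplifying assumptions, not the paper's implementation.

```python
# Minimal sketch of one federated round with client selection and a
# knowledge-distillation (KD) term in the local objective. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
NUM_CLIENTS, DIM, CLASSES = 10, 20, 3

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Synthetic non-IID client data: each client over-represents one class.
clients = []
for c in range(NUM_CLIENTS):
    y = rng.choice(CLASSES, size=64, p=np.roll([0.8, 0.1, 0.1], c % CLASSES))
    X = rng.normal(size=(64, DIM)) + y[:, None]   # class-shifted features
    clients.append((X, y))

global_W = rng.normal(scale=0.01, size=(DIM, CLASSES))  # global model weights

def local_update(W_global, X, y, T=2.0, alpha=0.5, lr=0.1, epochs=5):
    """Client-side training: cross-entropy plus a distillation term that pulls
    the local predictions toward the frozen global model's soft targets."""
    W = W_global.copy()
    teacher = softmax(X @ W_global, T)            # soft targets from the global model
    onehot = np.eye(CLASSES)[y]
    for _ in range(epochs):
        probs = softmax(X @ W)                    # hard-label branch
        probs_T = softmax(X @ W, T)               # distillation branch
        # Softmax cross-entropy gradient is X^T (p - target) / n for each branch
        # (the usual T^2 scaling of the KD term is omitted for brevity).
        grad = X.T @ ((1 - alpha) * (probs - onehot)
                      + alpha * (probs_T - teacher)) / len(y)
        W -= lr * grad
    return W

# Client selection: pick the clients whose data the current global model fits
# worst (an assumed stand-in for the paper's actual selection criterion).
losses = []
for X, y in clients:
    p = softmax(X @ global_W)
    losses.append(-np.log(p[np.arange(len(y)), y] + 1e-12).mean())
selected = np.argsort(losses)[-4:]                # 4 of 10 clients this round

# FedAvg-style aggregation over the selected subset only.
updates = [local_update(global_W, *clients[i]) for i in selected]
sizes = np.array([len(clients[i][1]) for i in selected], dtype=float)
global_W = sum(w * s for w, s in zip(updates, sizes)) / sizes.sum()
print("Selected clients this round:", selected.tolist())
```

In a full run this round would repeat until convergence, with only the selection criterion and the local KD objective changing to match the paper's actual design.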
References
Amiri, M. M., Gunduz, D., Kulkarni, S. R., and Poor, H. V. (2020). Federated learning with quantized global model updates. arXiv preprint arXiv:2006.10672. DOI: 10.48550/arxiv.2006.10672.
Bernstein, J., Wang, Y.-X., Azizzadenesheli, K., and Anandkumar, A. (2018). signSGD: Compressed optimisation for non-convex problems. In International Conference on Machine Learning, pages 560-569. PMLR. DOI: 10.48550/arXiv.1802.04434.
Bucila, C., Caruana, R., and Niculescu-Mizil, A. (2006). Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, pages 535-541. ACM. DOI: 10.1145/1150402.1150464.
Cho, Y. J., Wang, J., and Joshi, G. (2020). Client selection in federated learning: Convergence analysis and power-of-choice selection strategies. arXiv preprint arXiv:2010.01243. DOI: 10.48550/arxiv.2010.01243.
de Souza, A. M., Bittencourt, L. F., Cerqueira, E., Loureiro, A. A., and Villas, L. A. (2023). Dispositivos, eu escolho vocês: Seleção de clientes adaptativa para comunicação eficiente em aprendizado federado. In Anais do XLI Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos, pages 1-14. SBC. DOI: 10.5753/sbrc.2023.499.
de Souza, A. M., Maciel, F., da Costa, J. B., Bittencourt, L. F., Cerqueira, E., Loureiro, A. A., and Villas, L. A. (2024). Adaptive client selection with personalization for communication efficient federated learning. Ad Hoc Networks, page 103462. DOI: 10.2139/ssrn.4654118.
He, Y., Chen, Y., Yang, X., Zhang, Y., and Zeng, B. (2022). Class-wise adaptive self distillation for federated learning on non-IID data (student abstract). In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 12967-12968. DOI: 10.1609/aaai.v36i11.21620.
Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. DOI: 10.48550/arxiv.1503.02531.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324. DOI: 10.1109/5.726791.
Lee, G., Jeong, M., Shin, Y., Bae, S., and Yun, S.-Y. (2022). Preservation of the global knowledge by not-true distillation in federated learning. Advances in Neural Information Processing Systems, 35:38461-38474. DOI: 10.48550/arxiv.2106.03097.
Li, T., Sahu, A. K., Talwalkar, A., and Smith, V. (2020). Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3):50-60. DOI: 10.1109/msp.2020.2975749.
Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A., and Smith, V. (2018). Federated optimization in heterogeneous networks. CoRR, abs/1812.06127. Available at: [link].
Lim, W. Y. B., Luong, N. C., Hoang, D. T., Jiao, Y., Liang, Y.-C., Yang, Q., Niyato, D., and Miao, C. (2020). Federated learning in mobile edge networks: A comprehensive survey. IEEE Communications Surveys & Tutorials, 22(3):2031-2063. DOI: 10.1109/COMST.2020.2986024.
Lin, T., Kong, L., Stich, S. U., and Jaggi, M. (2020a). Ensemble distillation for robust model fusion in federated learning. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H., editors, Advances in Neural Information Processing Systems, volume 33, pages 2351-2363. Curran Associates, Inc. Available at: [link].
Lin, Y., Han, S., Mao, H., Wang, Y., and Dally, W. J. (2020b). Deep gradient compression: Reducing the communication bandwidth for distributed training. arXiv preprint arXiv:1712.01887. DOI: 10.48550/arxiv.1712.01887.
McMahan, B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. (2017). Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273-1282. PMLR. DOI: 10.48550/arxiv.1602.05629.
Mohamed, A., Souza, A., Costa, J., Villas, L., and Reis, J. (2024). FedSeleKDistill: Empoderando a escolha de clientes com a destilação do conhecimento para aprendizado federado em dados não-IID. In Anais do VIII Workshop de Computação Urbana, pages 71-84, Porto Alegre, RS, Brasil. SBC. DOI: 10.5753/courb.2024.3238.
Mohamed, A. H., de Souza, A. M., da Costa, J. B. D., Villas, L., and dos Reis, J. C. (2023). CCSF: Clustered client selection framework for federated learning in non-IID data. In Proceedings of the 16th IEEE/ACM Utility and Cloud Computing Conference (UCC), UCC '23, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/3603166.3632563.
Mora, A., Tenison, I., Bellavista, P., and Rish, I. (2022). Knowledge distillation for federated learning: a practical guide. arXiv preprint arXiv:2211.04742. DOI: 10.48550/arxiv.2211.04742.
Mothukuri, V., Parizi, R. M., Pouriyeh, S., Huang, Y., Dehghantanha, A., and Srivastava, G. (2021). A survey on security and privacy of federated learning. Future Generation Computer Systems, 115:619-640. DOI: 10.1016/j.future.2020.10.007.
Nguyen, H. T., Sehwag, V., Hosseinalipour, S., Brinton, C. G., Chiang, M., and Poor, H. V. (2020). Fast-convergent federated learning. IEEE Journal on Selected Areas in Communications, 39(1):201-218. DOI: 10.1109/jsac.2020.3036952.
Rothchild, D., Panda, A., Ullah, E., Ivkin, N., Stoica, I., Braverman, V., Gonzalez, J., and Arora, R. (2020). FetchSGD: Communication-efficient federated learning with sketching. In International Conference on Machine Learning, pages 8253-8265. PMLR. DOI: 10.48550/arxiv.2007.07682.
Sattler, F., Wiedemann, S., Müller, K.-R., and Samek, W. (2020). Robust and communication-efficient federated learning from non-i.i.d. data. IEEE Transactions on Neural Networks and Learning Systems, 31(9):3400-3413. DOI: 10.1109/TNNLS.2019.2944481.
Shahid, O., Pouriyeh, S., Parizi, R. M., Sheng, Q. Z., Srivastava, G., and Zhao, L. (2021). Communication efficiency in federated learning: Achievements and challenges. arXiv preprint arXiv:2107.10996. DOI: 10.48550/arxiv.2107.10996.
Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. (2012). Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32:323-332. DOI: 10.1016/j.neunet.2012.02.016.
Ström, N. (2015). Scalable distributed DNN training using commodity GPU cloud computing. In Interspeech 2015. DOI: 10.21437/interspeech.2015-354.
Wang, H., Kaplan, Z., Niu, D., and Li, B. (2020). Optimizing federated learning on non-iid data with reinforcement learning. In IEEE INFOCOM 2020-IEEE Conference on Computer Communications, pages 1698-1707. IEEE. DOI: 10.1109/INFOCOM41043.2020.9155494.
Yao, D., Pan, W., Dai, Y., Wan, Y., Ding, X., Jin, H., Xu, Z., and Sun, L. (2021). Local-global knowledge distillation in heterogeneous federated learning with non-iid data. arXiv preprint arXiv:2107.00051. DOI: 10.48550/arxiv.2107.00051.
Zhang, L., Shen, L., Ding, L., Tao, D., and Duan, L.-Y. (2022). Fine-tuning global model via data-free knowledge distillation for non-iid federated learning. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10164-10173. DOI: 10.1109/CVPR52688.2022.00993.
License
Copyright (c) 2025 Journal of Internet Services and Applications

This work is licensed under a Creative Commons Attribution 4.0 International License.

