Deep GreenAI: Effective Layer Pruning Method for Modern Neural Networks
DOI: https://doi.org/10.5753/reic.2025.6038
Keywords: Green AI, Layer Pruning, Efficient Deep Learning, Similarity Metric
Abstract
Deep neural networks have been the predominant paradigm in machine learning for solving cognitive tasks. Such models, however, incur high computational overhead, limiting their applicability and hindering advancements in the field. Extensive research has demonstrated that pruning structures from these models is a straightforward approach to reducing network complexity. In this direction, most efforts focus on removing weights or filters. Studies have also been devoted to layer pruning, as it yields superior computational gains. However, layer removal often hurts network predictive ability (i.e., accuracy) at high compression rates. This work introduces an effective layer-pruning strategy that meets all underlying properties pursued by pruning methods. Our method estimates the relative importance of a layer using the Centered Kernel Alignment (CKA) metric, employed to measure the similarity between representations of the unpruned model and a candidate subnetwork for pruning. We confirm the effectiveness of our method on standard architectures and benchmarks, where it outperforms existing layer-pruning strategies and other state-of-the-art pruning techniques. Specifically, we remove more than 75% of computation while improving predictive ability and reducing the CO2 emissions required for training by 80%, taking an important step towards Green AI. At higher compression regimes, our method exhibits a negligible accuracy drop, while competing methods degrade notably. Apart from these benefits, our pruned models exhibit robustness to adversarial and out-of-distribution samples.
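The core idea described in the abstract, scoring a layer by how similar a pruned candidate's representations remain to those of the unpruned model, can be illustrated with linear CKA. The sketch below is not the authors' implementation: it assumes the linear (rather than kernelized) form of CKA, and the feature matrices and the rank_layers_for_pruning helper are hypothetical stand-ins for whatever activation-extraction pipeline the method actually uses.

# Minimal sketch, assuming linear CKA over pre-extracted activations;
# NOT the authors' released implementation.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two representation matrices of shape (n_samples, dim)."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return float(num / den)

def rank_layers_for_pruning(reference_feats, candidate_feats):
    """Rank prunable layers by the CKA similarity between the unpruned model's
    representations (reference_feats) and the representations of each candidate
    subnetwork obtained by removing one layer (candidate_feats: {layer_id: features}).
    Higher similarity suggests the removed layer is more redundant."""
    scores = {layer: linear_cka(reference_feats, feats)
              for layer, feats in candidate_feats.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

Linear CKA is cheap to compute on a held-out batch of activations; an RBF-kernel variant exists but introduces a bandwidth hyperparameter. Whether the highest-scoring candidates are removed one at a time or in groups, and how the pruned network is subsequently fine-tuned, are design choices left open in this sketch.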
License
Copyright (c) 2025 The authors

This work is licensed under a Creative Commons Attribution 4.0 International License.
