Performance Evaluation for Scientific Applications Using the Roofline Model
Keywords:
Performance Evaluation, HPC, Roofline, Machine Learning, Scientific Applications
Abstract
Understanding the factors that limit application performance, together with an application's computational requirements, helps guide software optimization and the acquisition, development, and selection of hardware that best meets those requirements. This work proposes a methodology for evaluating and characterizing the performance of scientific applications using the Roofline model. A series of experiments on a set of applications was designed and analyzed following the proposed methodology. The results make it possible to identify the aspects that limit application performance and to suggest software optimizations and hardware choices that improve it.
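For readers unfamiliar with the model, the basic Roofline bound can be illustrated with a minimal sketch. It is not the methodology proposed in this work: the function names and the machine figures (1000 GFLOP/s peak compute, 100 GB/s peak memory bandwidth) are hypothetical values chosen only for illustration. Attainable performance is taken as the minimum of peak compute and arithmetic intensity times memory bandwidth, and a kernel is classified by comparing its arithmetic intensity with the ridge point.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Basic Roofline bound: min(peak compute, AI * memory bandwidth).

    arithmetic_intensity is in FLOP/byte, peak_gflops in GFLOP/s,
    peak_bandwidth_gbs in GB/s.
    """
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which the two roofs meet."""
    return peak_gflops / peak_bandwidth_gbs


def classify(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Label a kernel as memory-bound or compute-bound under the basic model."""
    if arithmetic_intensity < ridge_point(peak_gflops, peak_bandwidth_gbs):
        return "memory-bound"
    return "compute-bound"


# Hypothetical machine, for illustration only (not a system from this work).
PEAK_GFLOPS = 1000.0
PEAK_BANDWIDTH_GBS = 100.0

if __name__ == "__main__":
    for ai in (0.25, 20.0):
        bound = attainable_gflops(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
        label = classify(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
        print(f"AI={ai} FLOP/byte: at most {bound} GFLOP/s ({label})")
```

On this hypothetical machine, a kernel at 0.25 FLOP/byte is limited by memory bandwidth, so reducing data movement is the more promising optimization, while a kernel at 20 FLOP/byte is limited by peak compute.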