Performance Evaluation for Scientific Applications Using the Roofline Model
Keywords:
Performance Evaluation, HPC, Roofline, Machine Learning, Scientific Applications
Abstract
Understanding the factors that limit application performance, and the computational requirements of those applications, can guide software optimization as well as the acquisition, development, and selection of hardware that best meets an application's performance needs. This work proposes a methodology for evaluating and characterizing the performance of scientific applications using the Roofline model. A series of experiments was carried out with different applications, which are analyzed following the proposed methodology. The results make it possible to identify the aspects that limit performance and to suggest software and hardware optimizations that increase the execution efficiency of the evaluated applications.
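For readers unfamiliar with the model, the basic Roofline bound can be sketched in a few lines. This is only an illustration of the general model, not the methodology proposed in this work. The function names and the peak values used in the example are hypothetical and chosen for readability.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Basic Roofline bound: the lower of the compute roof and the memory roof.

    arithmetic_intensity: FLOPs performed per byte moved (FLOP/byte)
    peak_gflops:          peak compute throughput of the machine (GFLOP/s)
    peak_bandwidth_gbs:   peak memory bandwidth of the machine (GB/s)
    """
    memory_roof = arithmetic_intensity * peak_bandwidth_gbs
    return min(peak_gflops, memory_roof)


def limiting_factor(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Classify a kernel by comparing its intensity with the ridge point."""
    ridge_point = peak_gflops / peak_bandwidth_gbs  # FLOP/byte where the roofs meet
    if arithmetic_intensity < ridge_point:
        return "memory-bound"
    return "compute-bound"


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak compute, 100 GB/s peak bandwidth.
    for ai in (0.5, 20.0):
        bound = attainable_gflops(ai, 1000.0, 100.0)
        kind = limiting_factor(ai, 1000.0, 100.0)
        print(f"AI={ai} FLOP/byte: at most {bound} GFLOP/s ({kind})")
```

A kernel whose arithmetic intensity falls below the ridge point is limited by memory bandwidth, so reducing data movement is likely to help more than adding compute capacity. Above the ridge point the opposite tends to hold.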