Challenges in High-Performance Computing




High-Performance Computing, Supercomputers, Exascale, Computer Architecture, Parallel Programming


High-Performance Computing (HPC) has become one of the most active fields in computer science. It is driven mainly by the processing demands of algorithms from many areas, such as Big Data, Artificial Intelligence, Data Science, and subjects related to chemistry, physics, and biology. State-of-the-art algorithms from these fields are notoriously demanding of computing resources, so choosing the right computer system to optimize their performance is paramount. This article presents the main challenges of future supercomputer systems, highlighting the areas that demand the most of HPC servers; the new architectures, including heterogeneous processors composed of artificial intelligence chips, quantum processors, and the adoption of HPC on cloud servers; and the challenges software developers face when parallelizing applications. We also discuss challenges regarding non-functional requirements, such as energy consumption and resilience.








How to Cite

Navaux, P. O. A., Lorenzon, A. F., & Serpa, M. da S. (2023). Challenges in High-Performance Computing. Journal of the Brazilian Computer Society, 29(1), 51–62.