Enhancing Red Team Agent Learning with the Kill Chain Catalyst Algorithm in Capture the Flag Scenarios
DOI: https://doi.org/10.5753/jbcs.2026.5365

Keywords: Autonomous Red Team Agents, Reinforcement Learning, Kill Chain, Genetic Align, Random Forest

Abstract
With the advancement of technology, tasks once performed by humans have increasingly been delegated to machines or agents equipped with artificial intelligence, including tasks across various cybersecurity domains. From the perspective of real-world cyber attacks, executing actions with minimal failures and steps is critical to reducing the likelihood of exposure. Although research on autonomous cyber attacks predominantly employs Reinforcement Learning (RL), this approach still presents gaps: poor performance in scenarios with limited training data, low resilience in dynamic environments, and limited interpretability of decision-making policies. To address these gaps, the Kill Chain Catalyst (KCC) is introduced: an RL algorithm based on a Gini Impurity-Based Weighted Random Forest that prioritizes interpretability, efficiency in scenarios with limited experience, and resilience in the dynamic environments explored by RL agents. KCC leverages decision-tree logic for enhanced interpretability and employs a catalyst module, inspired by genetic alignment, to optimize the search for efficient attack sequences. More than 150 attack experiments were conducted to evaluate learning in terms of offset, speed, and generalization. The analysis focused on the steps, rewards, and failures of agents using the RL algorithms KCC, PPO, DQN, TRPO, and A2C within a Capture the Flag tournament setting, considering both static and dynamic scenarios with limited learning experiences. These experiments demonstrate the superior performance of KCC, revealing differences of up to 198.69% in steps, 129.43% in rewards, and 1096.39% in failures when performing attacks with KCC compared with the other algorithms.
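To make the catalyst idea more concrete, the sketch below illustrates, purely as an assumption-laden example and not as the authors' KCC implementation, how a Needleman–Wunsch-style global alignment score (the sequence-alignment technique cited in the references) could rank candidate attack sequences against a reference kill-chain sequence. The function names, scoring parameters, and action labels are hypothetical.

# Hypothetical sketch: rank candidate attack sequences by how well they align
# with a reference kill-chain sequence, using a Needleman-Wunsch-style global
# alignment score. This is NOT the authors' KCC implementation; scoring values
# and action labels are illustrative assumptions only.

def needleman_wunsch_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return the global alignment score between two action sequences."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # dp[i][j] = best score aligning seq_a[:i] with seq_b[:j]
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if seq_a[i - 1] == seq_b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]


def rank_candidates(reference_chain, candidate_sequences):
    """Order candidate attack sequences by alignment with the reference chain."""
    return sorted(candidate_sequences,
                  key=lambda seq: needleman_wunsch_score(reference_chain, seq),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical kill-chain stages and candidate sequences (illustrative only).
    reference = ["recon", "exploit", "privilege_escalation", "exfiltration"]
    candidates = [
        ["recon", "exploit", "exfiltration"],
        ["exploit", "recon", "lateral_movement", "exfiltration"],
        ["recon", "exploit", "privilege_escalation", "exfiltration"],
    ]
    for seq in rank_candidates(reference, candidates):
        print(needleman_wunsch_score(reference, seq), seq)

In a scheme of this kind, higher alignment scores would flag candidate sequences that follow the reference chain more closely, which is one plausible way a catalyst module could bias exploration toward efficient attack paths under limited experience.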
References
Al-Azzawi, M., Doan, D., Sipola, T., Hautamäki, J., and Kokkonen, T. (2024). Artificial intelligence cyberattacks in red teaming: A scoping review. In World Conference on Information Systems and Technologies, pages 129-138. Springer. DOI: 10.1007/978-3-031-60215-3_13.
Aspland, E., Harper, P. R., Gartner, D., Webb, P., and Barrett-Lee, P. (2021). Modified needleman–wunsch algorithm for clinical pathway clustering. Journal of Biomedical Informatics, 115:103668. DOI: 10.1016/j.jbi.2020.103668.
Breiman, L. (2001). Random forests. Machine learning, 45:5-32. DOI: 10.1023/a:1010933404324.
Che Mat, N. I., Jamil, N., Yusoff, Y., and Mat Kiah, M. L. (2024). A systematic literature review on advanced persistent threat behaviors and its detection strategy. Journal of Cybersecurity, 10(1):tyad023. DOI: 10.1093/cybsec/tyad023.
Chen, J., Hu, S., Zheng, H., Xing, C., and Zhang, G. (2023). Gail-pt: An intelligent penetration testing framework with generative adversarial imitation learning. Computers & Security, 126:103055. DOI: 10.1016/j.cose.2022.103055.
Da Silva, F. L. and Costa, A. H. R. (2019). A survey on transfer learning for multiagent reinforcement learning systems. Journal of Artificial Intelligence Research, 64:645-703. DOI: 10.1613/jair.1.11396.
Disha, R. A. and Waheed, S. (2022). Performance analysis of machine learning models for intrusion detection system using gini impurity-based weighted random forest (giwrf) feature selection technique. Cybersecurity, 5(1):1. DOI: 10.1186/s42400-021-00103-8.
Farouk, M., Sakr, R. H., and Hikal, N. (2024). Identifying the most accurate machine learning classification technique to detect network threats. Neural Computing and Applications, 36(16):8977-8994. DOI: 10.1007/s00521-024-09562-9.
Gancheva, V. and Stoev, H. (2023). An algorithm for pairwise dna sequences alignment. In International Work-Conference on Bioinformatics and Biomedical Engineering, pages 48-61. Springer. DOI: 10.1007/978-3-031-34953-9_4.
Gangupantulu, R., Cody, T., Rahma, A., Redino, C., Clark, R., and Park, P. (2021). Crown jewels analysis using reinforcement learning with attack graphs. In 2021 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1-6. DOI: 10.1109/SSCI50451.2021.9659947.
Holm, H. (2022). Lore a red team emulation tool. IEEE Transactions on Dependable and Secure Computing, 1:1-1. DOI: 10.1109/TDSC.2022.3160792.
Horta, A., dos Santos, A. F. P., and Goldschmidt, R. R. (2024a). Evaluating the stealth of reinforcement learning-based cyber attacks against unknown scenarios using knowledge transfer techniques. Journal of Computer Security, (Preprint):1-19. DOI: 10.3233/jcs-230145.
Horta, A., Santos, A., and Goldschmidt, R. (2024b). Kill chain catalyst for autonomous red team operations in dynamic attack scenarios. In Anais do XXIV Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais, pages 415-430, Porto Alegre, RS, Brasil. SBC. DOI: 10.5753/sbseg.2024.241371.
Ibrahim, M. K., Yusof, U. K., Eisa, T. A. E., and Nasser, M. (2024). Bioinspired algorithms for multiple sequence alignment: A systematic review and roadmap. Applied Sciences, 14(6):2433. DOI: 10.3390/app14062433.
Janisch, J., Pevnỳ, T., and Lisỳ, V. (2023). Nasimemu: Network attack simulator & emulator for training agents generalizing to novel scenarios. In European Symposium on Research in Computer Security, pages 589-608. Springer. DOI: 10.48550/arXiv.2305.17246.
Li, L., El Rami, J.-P. S., Taylor, A., Rao, J. H., and Kunz, T. (2022). Enabling a network ai gym for autonomous cyber agents. In 2022 International Conference on Computational Science and Computational Intelligence (CSCI), pages 172-177. IEEE. DOI: 10.1109/csci58124.2022.00034.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International conference on machine learning, pages 1928-1937. PMLR. DOI: 10.48550/arxiv.1602.01783.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529-533. DOI: 10.1038/nature14236.
Ortiz-Garces, I., Gutierrez, R., Guerra, D., Sanchez-Viteri, S., and Villegas-Ch., W. (2023). Development of a platform for learning cybersecurity using capturing the flag competitions. Electronics, 12(7). DOI: 10.3390/electronics12071753.
Paudel, B. and Amariucai, G. (2023). Reinforcement learning approach to generate zero-dynamics attacks on control systems without state space models. In European Symposium on Research in Computer Security, pages 3-22. Springer. DOI: 10.1007/978-3-031-51482-1_1.
Poinsignon, T., Poulain, P., Gallopin, M., and Lelandais, G. (2023). Working with omics data: An interdisciplinary challenge at the crossroads of biology and computer science. In Machine Learning for Brain Disorders, pages 313-330. Springer. DOI: 10.1007/978-1-0716-3195-9_10.
Pozdniakov, K., Alonso, E., Stankovic, V., Tam, K., and Jones, K. (2020). Smart security audit: Reinforcement learning with a deep neural network approximator. In 2020 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), pages 1-8. DOI: 10.1109/CyberSA49311.2020.9139683.
Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. (2015). Trust region policy optimization. In International conference on machine learning, pages 1889-1897. PMLR. DOI: 10.48550/arxiv.1502.05477.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. DOI: 10.48550/arXiv.1707.06347.
Sharma, P. and Rana, C. (2024). Artificial intelligence based object detection and traffic prediction by autonomous vehicles-a review. Expert Systems with Applications, page 124664. DOI: 10.1016/j.eswa.2024.124664.
Standen, M., Lucas, M., Bowman, D., Richer, T. J., Kim, J., and Marriott, D. (2021). Cyborg: A gym for the development of autonomous cyber agents. In IJCAI-21 1st International Workshop on Adaptive Cyber Defense. arXiv. DOI: 10.48550/ARXIV.2108.09118.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press, second edition. DOI: 10.1109/tnn.1998.712192.
Thangavel, K., Sabatini, R., Gardi, A., Ranasinghe, K., Hilton, S., Servidia, P., and Spiller, D. (2024). Artificial intelligence for trusted autonomous satellite operations. Progress in Aerospace Sciences, 144:100960. DOI: 10.1016/j.paerosci.2023.100960.
Tran, K., Akella, A., Standen, M., Kim, J., Bowman, D., Richer, T., and Lin, C.-T. (2021). Deep hierarchical reinforcement agents for automated penetration testing. In IJCAI-21 1st International Workshop on Adaptive Cyber Defense. arXiv. DOI: 10.48550/ARXIV.2109.06449.
Viering, T. and Loog, M. (2023). The shape of learning curves: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6):7799-7819. DOI: 10.1109/TPAMI.2022.3220744.
Yang, Y. and Liu, X. (2022). Behaviour-diverse automatic penetration testing: A curiosity-driven multi-objective deep reinforcement learning approach. DOI: 10.48550/ARXIV.2202.10630.
Zhou, S., Liu, J., Hou, D., Zhong, X., and Zhang, Y. (2021). Autonomous penetration testing based on improved deep q-network. Applied Sciences, 11(19). DOI: 10.3390/app11198823.
License
Copyright (c) 2026 Antonio Horta, Anderson dos Santos, Ronaldo Goldschmidt

This work is licensed under a Creative Commons Attribution 4.0 International License.

