Portfolio-based Active Learning with Gaussian Processes for Vulnerabilities Risk Classification

Davyson S. Ribeiro; Rafael S. Lemos; Francisco R. P. da Ponte; César Lincoln C. Mattos; Emanuel B. Rodrigues

doi:10.5753/jbcs.2026.5567

Authors

Davyson S. Ribeiro Federal University of Ceará https://orcid.org/0000-0002-7375-0684
Rafael S. Lemos Federal University of Ceará https://orcid.org/0009-0002-3114-9580
Francisco R. P. da Ponte Federal University of Ceará https://orcid.org/0009-0004-4580-2063
César Lincoln C. Mattos Federal University of Ceará https://orcid.org/0000-0002-2404-3625
Emanuel B. Rodrigues Federal University of Ceará https://orcid.org/0000-0002-6613-9502

Keywords:

Vulnerability Risk Classification, Machine Learning, Cybersecurity, Active Learning, Gaussian Processes

Abstract

Effective vulnerability management is essential for cybersecurity, particularly because the demand for skilled professionals often exceeds the supply. This paper investigates the use of Gaussian Processes (GPs) combined with Active Learning (AL) techniques to classify security vulnerabilities according to their risk of exploitation. The main objective is to optimize the labeling process by reducing the amount of labeled data needed to train an effective classifier. The proposed methodology combines the predictive uncertainty provided by GP models with five established data selection strategies in a portfolio-based approach. The portfolio removes the need to choose a single strategy and leverages the strengths of each technique. This improves adaptability and balances exploration and exploitation in complex optimization scenarios, ultimately increasing the diversity of the labeled samples and helping to produce better classifiers trained with fewer examples. Experiments were conducted on the CVEjoin dataset, which contains over 200,000 vulnerabilities, across three evaluation scenarios. The scenarios use equivalent volumes of labeled data but different numbers of Active Learning iterations. When a single strategy is used, the BSB (best and second best) method consistently outperformed the others in accuracy and F1 score, particularly as the number of labeling iterations increased. When multiple strategies are combined in a portfolio, the results show gains in all evaluation metrics. This study underscores the usefulness of portfolio-based Active Learning for optimizing the labeling procedure and, ultimately, for prioritizing vulnerabilities for remediation.
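The BSB criterion mentioned above ranks unlabeled samples by the gap between the two largest predicted class probabilities and queries the samples with the smallest gap, i.e. those on which the classifier is least decided between its top two classes (the best-versus-second-best rule described by Joshi et al., 2009, for multi-class active learning). The following is a minimal sketch, assuming the GP classifier already returns a class-probability vector for every unlabeled vulnerability; the function names and the toy probabilities are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def bsb_margin(probs: Sequence[float]) -> float:
    """Gap between the largest and second-largest class probability.

    Assumes at least two classes (e.g. risk levels) per probability vector.
    """
    best, second = sorted(probs, reverse=True)[:2]
    return best - second


def select_bsb(pool_probs: Sequence[Sequence[float]], batch_size: int = 1) -> list[int]:
    """Indices of the pool samples with the smallest BSB margin (most ambiguous first)."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: bsb_margin(pool_probs[i]))
    return ranked[:batch_size]


if __name__ == "__main__":
    # Hypothetical predictive probabilities (e.g. from a GP classifier) for four
    # unlabeled vulnerabilities over three made-up risk classes.
    pool = [
        [0.90, 0.05, 0.05],  # confident prediction, margin 0.85
        [0.40, 0.38, 0.22],  # two classes nearly tied, margin 0.02
        [0.34, 0.33, 0.33],  # almost uniform, margin 0.01
        [0.60, 0.30, 0.10],  # margin 0.30
    ]
    print(select_bsb(pool, batch_size=2))  # prints [2, 1]
```

The selected indices would then be sent to an analyst for labeling and the GP retrained; how the paper batches queries per iteration is not stated in this abstract.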
This research lays the groundwork for extending the framework to other areas of cybersecurity, such as vulnerabilities in web applications and cloud environments, with the broader goal of strengthening security in those settings.
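The portfolio idea in the abstract can be illustrated with the Hedge rule used by GP-Hedge (Hoffman et al., 2011) in Bayesian optimization: each strategy keeps a cumulative gain, a strategy is sampled with probability proportional to exp(eta * gain), and every strategy's gain is then updated. This is a sketch of one common portfolio rule, not necessarily the allocation scheme used in the paper. The `reward` callback, the placeholder strategy names `S2` to `S5`, and the toy scores are all hypothetical assumptions.

```python
import math
import random
from typing import Callable, Sequence


def hedge_probabilities(gains: Sequence[float], eta: float) -> list[float]:
    """Hedge weights: softmax of eta * cumulative gain, shifted for numerical stability."""
    top = max(gains)
    weights = [math.exp(eta * (g - top)) for g in gains]
    total = sum(weights)
    return [w / total for w in weights]


def run_portfolio(
    strategies: Sequence[str],
    reward: Callable[[str], float],
    iterations: int,
    eta: float = 1.0,
    seed: int = 0,
) -> tuple[list[str], list[float]]:
    """Choose one selection strategy per active-learning iteration with Hedge.

    `reward(name)` is a hypothetical callback scoring the sample nominated by
    strategy `name` in the current iteration. As in GP-Hedge, every strategy's
    cumulative gain is updated, not only the gain of the strategy that was picked.
    """
    rng = random.Random(seed)
    gains = [0.0] * len(strategies)
    chosen: list[str] = []
    for _ in range(iterations):
        probs = hedge_probabilities(gains, eta)
        pick = rng.choices(range(len(strategies)), weights=probs, k=1)[0]
        chosen.append(strategies[pick])  # this strategy's nominee would be labeled
        for i, name in enumerate(strategies):
            gains[i] += reward(name)
    return chosen, hedge_probabilities(gains, eta)


if __name__ == "__main__":
    # "BSB" comes from the abstract; S2..S5 are placeholders for the other four
    # strategies, and these fixed scores are made-up toy rewards.
    toy_scores = {"BSB": 0.8, "S2": 0.5, "S3": 0.4, "S4": 0.3, "S5": 0.1}
    picks, final_probs = run_portfolio(list(toy_scores), toy_scores.__getitem__, iterations=20)
    print(picks[:5])
    print([round(p, 3) for p in final_probs])
```

The parameter `eta` sets the exploration/exploitation balance: a small value keeps the selection probabilities close to uniform, while a large value concentrates them on the strategy with the highest cumulative gain. In a real loop the reward would depend on the current model state rather than being fixed per strategy.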
