Investigating Vulnerability-Fixing Commits

Authors

V. Almeida and R. Andrade

DOI:

https://doi.org/10.5753/jbcs.2025.4675

Keywords:

Vulnerability, Datasets, Commits, Common Vulnerabilities and Exposures

Abstract

Insecure software can severely harm user experience and privacy, so developers should prevent software vulnerabilities. However, detecting such problems is expensive and time-consuming. To address this issue, researchers have proposed vulnerability datasets that facilitate the investigation of vulnerability properties. In this paper, we investigate one of these datasets to better understand the vulnerabilities, their corrections, the authors involved, and the properties of the correction commits. Our results indicate that some vulnerabilities require many patches to fix. Furthermore, among the projects included in the target dataset, the Chromium project is the most affected by these vulnerabilities. We also find that correction commits are usually small in terms of the number of files and lines they change. Additionally, the authors of the corrections are mostly not new to the files that need fixing. Finally, we find that most corrections involve changes that affect other developers and rarely affect the developer who introduced the problem. In other words, corrections are usually made by developers other than those who introduced the problem. We believe that our findings can help developers resolve vulnerabilities with fewer resources, such as time, budget, and personnel.
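The commit properties summarized in the abstract (fix size in files and lines, whether the fixing author had touched the fixed files before, and whether the fix changes lines the fixing author wrote) can be approximated from ordinary git output. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the abstract does not define these metrics precisely, so it assumes that size comes from `git show --numstat --format= <sha>`, that "not new to a file" means the author committed to that file before the fix, and that line ownership comes from `git blame` on the fix's parent commit. The names `FixSize`, `parse_numstat`, `is_new_to_files`, and `share_of_own_lines` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FixSize:
    """Size of one vulnerability-fixing commit (hypothetical structure)."""
    files: int
    lines_added: int
    lines_deleted: int

    @property
    def lines_changed(self) -> int:
        return self.lines_added + self.lines_deleted


def parse_numstat(numstat: str) -> FixSize:
    """Summarize `git show --numstat --format= <sha>` output (assumed input).

    Each entry is three tab-separated fields: added, deleted, path.
    Binary files report "-" for both counts, so they add a file but no lines.
    """
    files = added = deleted = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        a, d, _path = parts
        files += 1
        if a != "-":
            added += int(a)
        if d != "-":
            deleted += int(d)
    return FixSize(files, added, deleted)


def is_new_to_files(fix_author: str, fixed_files: Iterable[str],
                    history: Iterable[tuple[str, str]]) -> bool:
    """True if the fixing author never committed to any fixed file before.

    `history` holds (author, path) pairs from commits preceding the fix
    (an assumed definition of "new to the file").
    """
    touched = {path for author, path in history if author == fix_author}
    return not any(path in touched for path in fixed_files)


def share_of_own_lines(fix_author: str, blamed_authors: list[str]) -> float:
    """Fraction of lines changed by the fix that the fixing author wrote.

    `blamed_authors` lists, for each deleted or modified line, the author
    reported by `git blame` on the fix's parent commit (assumed input).
    """
    if not blamed_authors:
        return 0.0
    own = sum(1 for author in blamed_authors if author == fix_author)
    return own / len(blamed_authors)


if __name__ == "__main__":
    # Toy numstat text: two source files and one binary file.
    sample = "3\t1\tsrc/a.c\n-\t-\tdocs/logo.png\n10\t0\tsrc/b.h\n"
    size = parse_numstat(sample)
    print(size.files, size.lines_changed)  # 3 14
    print(is_new_to_files("alice", ["src/a.c"],
                          [("bob", "src/a.c"), ("alice", "src/b.h")]))  # True
    print(share_of_own_lines("alice", ["bob", "bob", "alice", "carol"]))  # 0.25
```

A low `share_of_own_lines` value across many fixes would correspond to the abstract's observation that fixes rarely touch code by the developer who introduced the problem; the toy data here is invented purely to exercise the functions.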




Published

2025-05-15

How to Cite

Almeida, V., & Andrade, R. (2025). Investigating Vulnerability-Fixing Commits. Journal of the Brazilian Computer Society, 31(1), 294–309. https://doi.org/10.5753/jbcs.2025.4675

Issue

Vol. 31 No. 1 (2025)

Section

Articles