Extending the Comparative Study of Anomaly Detection Tools in Software Requirements with ChatGPT

Fábio Rodrigues Pereira; Heitor Augustus Xavier Costa; Paulo Afonso Parreira Júnior

doi:10.5753/jserd.2026.5919

Authors

Fábio Rodrigues Pereira, Federal University of Lavras, https://orcid.org/0009-0006-4733-9484
Heitor Augustus Xavier Costa, Federal University of Lavras, https://orcid.org/0000-0002-9903-7414
Paulo Afonso Parreira Júnior, Federal University of Lavras, https://orcid.org/0000-0002-8877-2931

Keywords:

Requirements Engineering, Requirements Anomalies, Generative AI, ChatGPT

Abstract

A software requirement specifies a capability or characteristic that a software system must possess to provide value to its stakeholders. Requirement descriptions must be unambiguous so that they can be properly understood and evolved. However, because most software requirements are written in natural language, their descriptions may contain subjectivity and inconsistencies, conventionally referred to as "Software Requirements Anomalies". Several studies propose tools to help detect requirements anomalies, but few of them evaluate the effectiveness (recall and precision) of the proposed tools. This work therefore presents a comparative study of three anomaly detection tools (RETA, Tactile Check, and Tiger Pro) and the ChatGPT model, evaluated on requirements documents from different domains containing over 85 anomalies. The results show that Tactile Check achieved the best performance. Although ChatGPT offers advantages in information visualization and flexibility of interaction, its performance was unsatisfactory compared with tools designed specifically for anomaly detection in software requirements. All analyzed tools, including ChatGPT, showed unsatisfactory levels of recall and precision, averaging below 66% and 57%, respectively. These results highlight the need for further contributions in this research area.
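As an illustration of the two effectiveness measures named in the abstract, the sketch below computes precision and recall for one tool against a reference list of anomalies. It is a minimal example under stated assumptions, not the procedure used in the study: the function name `precision_recall`, the representation of an anomaly as a (requirement id, anomaly type) pair, and all sample data are hypothetical.

```python
def precision_recall(detected, expected):
    """Return (precision, recall) for reported anomalies against a reference set.

    precision = correctly reported anomalies / all reported anomalies
    recall    = correctly reported anomalies / all anomalies in the reference set
    """
    detected, expected = set(detected), set(expected)
    true_positives = len(detected & expected)
    # Guard against division by zero when a set is empty.
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical reference set: anomalies known to be present in a document.
expected = {
    ("R1", "ambiguity"),
    ("R2", "vagueness"),
    ("R3", "ambiguity"),
    ("R4", "incompleteness"),
}

# Hypothetical output of one tool: two correct reports and one false alarm.
detected = {
    ("R1", "ambiguity"),
    ("R3", "ambiguity"),
    ("R5", "vagueness"),
}

precision, recall = precision_recall(detected, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the tool reports three anomalies, two of which are in the reference set of four, so precision is 2/3 and recall is 2/4. Averaging such values over several documents is one plausible way to arrive at per-tool figures like those quoted above.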
