Turbocharging Brazilian Mergers and Acquisitions: Questions & Answers Evaluation
DOI: https://doi.org/10.5753/jbcs.2026.5703
Keywords: Retrieval-Augmented Generation, Questions & Answers Evaluation, Large Language Models, Natural Language Processing
Abstract
Economic power abuse is a concern in Brazil, where the Administrative Council for Economic Defense (CADE) combats anti-competitive behavior to ensure fair competition. Artificial intelligence (AI) can aid CADE by identifying and extracting relevant information from technical reports published in Brazilian Portuguese, improving the detection and prevention of economic abuse. This paper presents a case study that uses AI to improve regulatory reviews of CADE documents via a Retrieval-Augmented Generation (RAG) pipeline architecture. Our key contribution is a specialized Questions & Answers benchmark dataset and a pipeline evaluation methodology, providing a standardized framework for Portuguese-language regulatory document analysis. A chain-of-thought (CoT) approach was adopted for problem solving: it leverages the RAG retrieval mechanism to access relevant information and applies the sequential reasoning of the CoT framework to generate responses that follow a logical flow of ideas, thus enhancing response accuracy. A vector database employing cosine similarity, combined with metadata filters, was used to retrieve the main arguments, reducing hallucinations and improving Large Language Model (LLM) performance. RAG metrics were then combined with a robust human fact-checking assessment to validate the pipeline. Our findings establish a new benchmark for Questions & Answers evaluation in Brazilian mergers and acquisitions, demonstrating that the proposed strategy effectively enhances the analysis of organizational merger and acquisition reports, unlocking substantial benefits for society.
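The retrieval step described above (cosine-similarity ranking over a vector store, restricted by metadata filters) can be sketched as follows. This is a minimal, self-contained illustration, not the paper's implementation: the embeddings, chunk texts, and the `doc_type` metadata field are invented for the example, and a real pipeline would use a vector database and learned embeddings rather than hand-written vectors.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, chunks, doc_type=None, top_k=2):
    # Metadata filter first: keep only chunks from the requested report type,
    # narrowing the candidate set before similarity ranking.
    candidates = [c for c in chunks
                  if doc_type is None or c["meta"]["doc_type"] == doc_type]
    # Then rank the surviving chunks by cosine similarity to the query.
    ranked = sorted(candidates,
                    key=lambda c: cosine(query_vec, c["vec"]),
                    reverse=True)
    return ranked[:top_k]

# Toy corpus: three chunks with hypothetical 3-d embeddings and metadata.
chunks = [
    {"text": "Merger raises concentration in sector X",
     "vec": [0.9, 0.1, 0.0], "meta": {"doc_type": "merger_report"}},
    {"text": "Cartel fine upheld on appeal",
     "vec": [0.1, 0.9, 0.0], "meta": {"doc_type": "cartel_ruling"}},
    {"text": "Remedies imposed on acquiring firm",
     "vec": [0.8, 0.2, 0.1], "meta": {"doc_type": "merger_report"}},
]

hits = retrieve([1.0, 0.0, 0.0], chunks, doc_type="merger_report")
print([h["text"] for h in hits])
```

Filtering on metadata before ranking is what keeps the LLM's context grounded in the right document class, which is the mechanism the abstract credits with reducing hallucinations.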
License
Copyright (c) 2026 Francis Spiegel Rubin, Pedro Nuno de Souza Moura, Adriana Cesario de Faria Alvim

This work is licensed under a Creative Commons Attribution 4.0 International License.

