STELLAR: A Structured, Trustworthy, and Explainable LLM-Led Architecture for Reliable Customer Support
DOI: https://doi.org/10.5753/jbcs.2026.6044

Keywords: Large Language Models (LLMs), Intelligent Customer Support Systems, Structured LLM Architectures, Reliable and Trustworthy AI, Retrieval-Augmented Generation (RAG)

Abstract
While Large Language Models (LLMs) offer transformative potential for automating customer support, significant hurdles remain concerning their reliability, explainability, and consistent performance in complex, sensitive interactions. This paper introduces STELLAR (Structured, Trustworthy, and Explainable LLM-Led Architecture for Reliable Customer Support), a novel architectural blueprint designed to address these issues. STELLAR uses a Directed Acyclic Graph (DAG) composed of nine specialized modules and eleven predefined workflows to orchestrate support interactions in a structured and predictable manner. This design promotes enhanced traceability, reliability, and control compared to less constrained systems. The architecture integrates components for few-shot classification, Retrieval-Augmented Generation (RAG), urgency-aware human escalation, compliance verification, user interaction validation, and knowledge base refinement through a semi-automated loop. This modular design deliberately balances LLM-driven innovation with operational requirements such as human-in-the-loop integration and ethical safeguards through embedded checks. We evaluated STELLAR's core modules on three key tasks (classification, retrieval, and compliance), demonstrating strong performance and reliability. Together, these features position STELLAR as a robust and transparent foundation for the next generation of intelligent, reliable customer support systems.
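To make the architectural idea concrete, the sketch below shows a minimal DAG-style orchestration in which specialized modules process a support ticket in a fixed, traceable order and record a per-module trace for explainability. This is an illustrative sketch only, not the paper's implementation: the module names, keyword heuristics, and canned knowledge base are hypothetical stand-ins for the few-shot classifier, RAG, compliance, and escalation components described in the abstract, and the DAG here degenerates to a single chain where a real system would branch across the eleven workflows.

```python
# Hypothetical stand-ins for STELLAR-style modules; names and logic
# are illustrative assumptions, not the paper's implementation.

def classify(ticket):
    # Few-shot intent classification stand-in: a keyword heuristic.
    text = ticket["text"].lower()
    ticket["intent"] = "billing" if "charge" in text else "general"
    return ticket

def retrieve(ticket):
    # RAG stand-in: look up a canned draft answer for the intent.
    kb = {
        "billing": "Refunds are processed within 5 business days.",
        "general": "Please see our help center for common questions.",
    }
    ticket["draft"] = kb[ticket["intent"]]
    return ticket

def check_compliance(ticket):
    # Compliance-verification stand-in: flag drafts with banned phrasing.
    ticket["compliant"] = "guarantee" not in ticket["draft"].lower()
    return ticket

def escalate_if_needed(ticket):
    # Urgency-aware human-escalation stand-in.
    ticket["escalate"] = (
        "urgent" in ticket["text"].lower() or not ticket["compliant"]
    )
    return ticket

# A single linear workflow; a full DAG would route tickets through
# different module sequences depending on the classified intent.
PIPELINE = [classify, retrieve, check_compliance, escalate_if_needed]

def run(ticket):
    trace = []
    for module in PIPELINE:
        ticket = module(ticket)
        trace.append(module.__name__)  # audit trail for explainability
    ticket["trace"] = trace
    return ticket
```

The fixed module order is what yields the predictability and traceability the abstract emphasizes: every response can be attributed to the exact sequence of modules that produced it, and escalation to a human is an explicit, inspectable decision rather than an emergent model behavior.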
License
Copyright (c) 2026 Matheus Ferracciú Scatolin, Helio Pedrini

This work is licensed under a Creative Commons Attribution 4.0 International License.

