Proposing Usability-UX technologies for the design and evaluation of text-based chatbots

Malu Gabriele Silva Mafra; Kennedy Nunes; Simara Rocha; Geraldo Braz Junior; Aristofanes Silva; Davi Viana; Williamson Silva; Luis Rivero

doi:10.5753/jis.2024.3856

Authors

Malu Gabriele Silva Mafra Federal University of Maranhão https://orcid.org/0000-0002-2956-4649
Kennedy Nunes Federal University of Maranhão https://orcid.org/0000-0003-0826-8207
Simara Rocha Federal University of Maranhão https://orcid.org/0000-0003-3318-7281
Geraldo Braz Junior Federal University of Maranhão https://orcid.org/0000-0003-3731-6431
Aristofanes Silva Federal University of Maranhão https://orcid.org/0000-0003-0423-2514
Davi Viana Federal University of Maranhão https://orcid.org/0000-0003-0470-549X
Williamson Silva Federal University of Pampa https://orcid.org/0000-0003-1849-2675
Luis Rivero Federal University of Maranhão https://orcid.org/0000-0001-6008-6537


Keywords:

Inspection Checklist, Chatbots Evaluation, Usability, User Experience, Design Patterns

Abstract

Chatbots are interactive systems that communicate with human users in natural language, through a textual interface or voice activation. These tools are useful in many business domains, such as Customer Service, Sales, Education and Learning, Health, and Entertainment. Chatbots have recently become popular, with significant growth in the software industry, especially for text-based chatbots. This growth is encouraging developers to create their own tools and attracting research effort to the area. Despite this prominence, technologies for ensuring chatbot quality and user satisfaction are not keeping up with the growing demand for these tools. There is therefore a need for technologies that support developers and development teams in building and evaluating chatbots. This research proposes artifacts applicable to the design and evaluation of chatbots, based on quality attributes identified in systematic literature reviews on Usability and User Experience (UX), given the importance and impact of these aspects on user satisfaction and the perceived quality of the system. The first artifact is the U2Chatbot inspection checklist, developed to assist development teams in identifying defects in text-based chatbots. The second artifact is DP-U2Chatbot, a set of interface design patterns containing useful examples to support developers in building chatbots. Both technologies were evaluated. The results of the empirical study of the U2Chatbot inspection checklist indicated that participants considered the technology useful for discovering defects in chatbots, although its ease of use could be improved. The participants' experience slightly influenced the effectiveness and efficiency of the technique, which suggests that professionals with some inspection experience may benefit more from the checklist. Regarding the DP-U2Chatbot design patterns, the results generally indicated that the technology is easy to understand and useful in supporting chatbot design, helping to build better tools.

