Using Model Cards for ethical reflection on machine learning models: an interview-based study

Authors

J. L. Nunes, G. D. J. Barbosa, C. S. de Souza, S. D. J. Barbosa

DOI:

https://doi.org/10.5753/jis.2024.3444

Keywords:

Ethical Reasoning, Model Cards, Model Reporting, Reflective Practice

Abstract

How do tools designed for documenting machine learning models contribute to developers’ ethical reflection? We set out to answer this question regarding Model Cards, a tool proposed for this purpose. We conducted a thematic analysis of eight semi-structured interviews based on speculative design sessions. Each participant assumed the role of developer of an artificial intelligence model in one of two scenarios: loan applications or university admissions. We found evidence that designers may have been selective about which ethical issues – from among those they had reflected on – they recorded in the Model Cards. While participants were hesitant to grant full autonomy to the artifact to be developed, we found that they still tended to rely on a third party (outside the design team) to mediate the relationship between the system and other stakeholders. Our findings contribute to our understanding of documentation tools, their epistemic value, and how they can be leveraged to engage in a more ethically informed design process of artificial intelligence systems.

References

Arnold, M., Bellamy, R. K. E., Hind, M., Houde, S., Mehta, S., Mojsilović, A., Nair, R., Ramamurthy, K. N., Olteanu, A., Piorkowski, D., Reimer, D., Richards, J., Tsay, J., and Varshney, K. R. (2019). FactSheets: Increasing trust in AI services through supplier’s declarations of conformity. IBM Journal of Research and Development, 63(4/5):6:1–6:13.

Auger, J. (2013). Speculative design: Crafting the speculation. Digital Creativity, 24(1):11–35.

Barbosa, G. D. J., Nunes, J. L., de Souza, C. S., and Barbosa, S. D. J. (forthcoming). Investigating the extended metacommunication template: How a semiotic tool may encourage reflective ethical practice in the development of machine learning systems. In Proceedings of the 22nd Brazilian Symposium on Human Factors in Computing Systems, IHC ’23, pages 1–12, New York, NY, USA. Association for Computing Machinery.

Barbosa, S. D. J., Barbosa, G. D. J., de Souza, C. S., and Leitão, C. F. (2021). A Semiotics-based epistemic tool to reason about ethical issues in digital technology design and development. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT’21, pages 363–374, New York, NY, USA. Association for Computing Machinery.

Beauchamp, T. L. and Childress, J. F. (2019). Principles of Biomedical Ethics. Oxford University Press, New York, 8th edition.

Bellamy, R. K. E., Dey, K., Hind, M., Hoffman, S. C., Houde, S., Kannan, K., Lohia, P., Martino, J., Mehta, S., Mojsilovic, A., Nagar, S., Ramamurthy, K. N., Richards, J., Saha, D., Sattigeri, P., Singh, M., Varshney, K. R., and Zhang, Y. (2018). AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias. arXiv:1810.01943 [cs].

Bender, E. M. and Friedman, B. (2018). Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. Transactions of the Association for Computational Linguistics, 6:587–604.

Brandão, R., Carbonera, J., de Souza, C., Ferreira, J., Gonçalves, B., and Leitão, C. (2019). Mediation Challenges and Socio-Technical Gaps for Explainable Deep Learning Applications. arXiv: 1907.07178.

Braun, V. and Clarke, V. (2012). Thematic analysis. In Cooper, H., Camic, P. M., Long, D. L., Panter, A. T., Rindskopf, D., and Sher, K. J., editors, APA handbook of research methods in psychology, Vol 2: Research designs: Quantitative, qualitative, neuropsychological, and biological., pages 57–71. American Psychological Association, Washington.

Bryant, A. (2002). Re-grounding grounded theory. The Journal of Information Technology Theory and Application (JITTA), 4(1):25–42.

Bryant, A. (2017). Grounded Theory and Grounded Theorizing: Pragmatism in Research Practice. Oxford University Press, New York.

Bryant, A. (2021). Continual permutations of misunderstanding: The curious incidents of the grounded theory method. Qualitative Inquiry, 27(3-4):397–411.

Bryant, A. and Charmaz, K., editors (2019). The SAGE Handbook of Current Developments in Grounded Theory. Sage, Thousand Oaks, California.

Campbell, J. L., Quincy, C., Osserman, J., and Pedersen, O. K. (2013). Coding In-depth Semistructured Interviews: Problems of Unitization and Intercoder Reliability and Agreement. Sociological Methods & Research, 42(3):294–320.

Choi, E., He, H., Iyyer, M., Yatskar, M., Yih, W.-t., Choi, Y., Liang, P., and Zettlemoyer, L. (2018). QuAC: Question Answering in Context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2174–2184, Brussels, Belgium. Association for Computational Linguistics.

Crisan, A., Drouhard, M., Vig, J., and Rajani, N. (2022). Interactive Model Cards: A Human-Centered Approach to Model Documentation. In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’22, pages 427–439, New York, NY, USA. Association for Computing Machinery.

de Souza, C. S. (2005). The Semiotic Engineering of Human-Computer Interaction. MIT Press, Cambridge, MA.

de Souza, C. S., de Gusmão Cerqueira, R. F., Afonso, L. M., and Ferreira, J. S. J. (2016). Software Developers as Users: Semiotic Investigations on Human-Centered Software Development. Springer International, Cham, Switzerland.

Deng, W. H., Nagireddy, M., Lee, M. S. A., Singh, J., Wu, Z. S., Holstein, K., and Zhu, H. (2022). Exploring How Machine Learning Practitioners (Try To) Use Fairness Toolkits. In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’22, pages 473–484, New York, NY, USA. Association for Computing Machinery.

Eubanks, V. (2018). Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin’s Press, Inc., USA.

Floridi, L. and Cowls, J. (2019). A Unified Framework of Five Principles for AI in Society. Harvard Data Science Review, 1(1):1–15.

Gabriel, Y. (2018). Interpretation, reflexivity and imagination in qualitative research. In Ciesielska, M. and Jemielniak, D., editors, Qualitative Methodologies in Organization Studies: Volume I: Theories and New Approaches, pages 137–157. Springer International Publishing, Cham.

Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., and Crawford, K. (2018). Datasheets for Datasets. arXiv: 1803.09010.

Goel, K., Rajani, N. F., Vig, J., Taschdjian, Z., Bansal, M., and Ré, C. (2021). Robustness Gym: Unifying the NLP Evaluation Landscape. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, pages 42–55, Online. Association for Computational Linguistics.

Guillemin, M. and Gillam, L. (2004). Ethics, reflexivity, and “ethically important moments” in research. Qualitative Inquiry, 10(2):261–280.

Hind, M., Houde, S., Martino, J., Mojsilovic, A., Piorkowski, D., Richards, J., and Varshney, K. R. (2020). Experiences with Improving the Transparency of AI Models and Services. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, CHI EA ’20, pages 1–8, New York, NY, USA. Association for Computing Machinery.

Holland, S., Hosny, A., Newman, S., Joseph, J., and Chmielinski, K. (2018). The dataset nutrition label: A framework to drive higher data quality standards. arXiv: 1805.03677.

Hutchinson, B., Smart, A., Hanna, A., Denton, E., Greer, C., Kjartansson, O., Barnes, P., and Mitchell, M. (2021). Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, pages 560–575, New York, NY, USA. Association for Computing Machinery.

Lakoff, R. T. (2001). The Language War. University of California Press, Berkeley, 1st edition.

Loukides, M., Mason, H., and Patil, D. (2018). Of oaths and checklists. O’Reilly Radar. [link].

Miceli, M., Yang, T., Naudts, L., Schuessler, M., Serbanescu, D., and Hanna, A. (2021). Documenting Computer Vision Datasets: An Invitation to Reflexive Data Practices. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, pages 161–172, New York, NY, USA. Association for Computing Machinery.

Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., and Gebru, T. (2019). Model Cards for Model Reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* ’19, pages 220–229, New York, NY, USA. Association for Computing Machinery.

Nunes, J. L., Barbosa, G. D. J., de Souza, C. S., Lopes, H., and Barbosa, S. D. J. (2022). Using model cards for ethical reflection: A qualitative exploration. In Proceedings of the 21st Brazilian Symposium on Human Factors in Computing Systems, IHC ’22, pages 1–11, New York, NY, USA. Association for Computing Machinery.

O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Crown Publishing Group, USA.

Pushkarna, M., Zaldivar, A., and Kjartansson, O. (2022). Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI. In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’22, pages 1776–1826, New York, NY, USA. Association for Computing Machinery.

Raji, I. D., Smart, A., White, R. N., Mitchell, M., Gebru, T., Hutchinson, B., Smith-Loud, J., Theron, D., and Barnes, P. (2020). Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* ’20, pages 33–44, New York, NY, USA. Association for Computing Machinery.

Richards, J., Piorkowski, D., Hind, M., Houde, S., and Mojsilović, A. (2020). A Methodology for Creating AI FactSheets. arXiv:2006.13796 [cs].

Rostamzadeh, N., Mincu, D., Roy, S., Smart, A., Wilcox, L., Pushkarna, M., Schrouff, J., Amironesei, R., Moorosi, N., and Heller, K. (2022). Healthsheet: Development of a Transparency Artifact for Health Datasets. In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’22, pages 1943–1961, New York, NY, USA. Association for Computing Machinery.

Seck, I., Dahmane, K., Duthon, P., and Loosli, G. (2018). Baselines and a datasheet for the Cerema AWP dataset. arXiv: 1806.04016.

Shen, H., Deng, W. H., Chattopadhyay, A., Wu, Z. S., Wang, X., and Zhu, H. (2021). Value Cards: An Educational Toolkit for Teaching Social Impacts of Machine Learning through Deliberation. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, pages 850–861, New York, NY, USA. Association for Computing Machinery.

Shen, H., Wang, L., Deng, W. H., Brusse, C., Velgersdijk, R., and Zhu, H. (2022). The Model Card Authoring Toolkit: Toward Community-centered, Deliberation-driven AI Design. In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’22, pages 440–451, New York, NY, USA. Association for Computing Machinery.

Siqueira De Cerqueira, J. A., Pinheiro De Azevedo, A., Acco Tives, H., and Dias Canedo, E. (2022). Guide for Artificial Intelligence Ethical Requirements Elicitation - RE4AI Ethical Guide. In Hawaii International Conference on System Sciences.

Talbert, M. (2019). Moral Responsibility. In Zalta, E. N., editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Online, winter 2019 edition.

Tavory, I. and Timmermans, S. (2014). Abductive Analysis: Theorizing Qualitative Research. University of Chicago Press, Chicago, illustrated edition.

Vakkuri, V., Kemell, K.-K., Jantunen, M., Halme, E., and Abrahamsson, P. (2021). ECCOLA — A method for implementing ethically aligned AI systems. Journal of Systems and Software, 182:111067.

Wexler, J., Pushkarna, M., Bolukbasi, T., Wattenberg, M., Viégas, F., and Wilson, J. (2019). The What-If Tool: Interactive Probing of Machine Learning Models. IEEE Transactions on Visualization and Computer Graphics, pages 1–1.

Wieringa, M. (2020). What to account for when accounting for algorithms: a systematic literature review on algorithmic accountability. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* ’20, pages 1–18, Barcelona, Spain. Association for Computing Machinery.

Published

2024-01-01

How to Cite

NUNES, J. L.; BARBOSA, G. D. J.; DE SOUZA, C. S.; BARBOSA, S. D. J. Using Model Cards for ethical reflection on machine learning models: an interview-based study. Journal on Interactive Systems, Porto Alegre, RS, v. 15, n. 1, p. 1–19, 2024. DOI: 10.5753/jis.2024.3444. Available at: https://journals-sol.sbc.org.br/index.php/jis/article/view/3444. Accessed: 22 Dec. 2024.

Issue

Section

Regular Paper