KoopaML, a Machine Learning platform for medical data analysis
DOI:
https://doi.org/10.5753/jis.2022.2574
Keywords:
Machine Learning, Data Analysis, Machine Learning Pipelines, Learning Platform, Health
Abstract
Machine Learning makes it possible to tackle complex data analysis tasks on large datasets. This branch of Artificial Intelligence lets non-technical fields benefit from data processing and analysis. In medicine in particular, professionals are increasingly interested in using Machine Learning to identify patterns in clinical cases and to make predictions about health issues. However, many lack the programming or technological skills needed to perform these tasks. Many tools support the development of Machine Learning pipelines, ranging from libraries for developers and data scientists to visual tools for experts and platforms for learning. Nevertheless, we have identified requirements in the medical context that call for a customized platform adapted to the end users found there. This work describes the design process and the first version of KoopaML, an ML platform that aims to bridge physicians’ data science gaps while automating Machine Learning pipelines. The platform focuses on enhanced interactivity to improve physicians’ engagement while still providing the benefits of introducing Machine Learning pipelines in medical departments, and it integrates ongoing training into the use of the tool’s features.
License
Copyright (c) 2022 Alicia García-Holgado, Andrea Vázquez-Ingelmo, Julia Alonso-Sánchez, Francisco José García-Peñalvo, Roberto Therón, Jesús Sampedro-Gómez, Antonio Sánchez-Puente, Víctor Vicente-Palacios, P. Ignacio Dorado-Díaz, Pedro L. Sánchez
This work is licensed under a Creative Commons Attribution 4.0 International License.
JIS is free of charge for authors and readers, and all papers published by JIS follow the Creative Commons Attribution 4.0 International (CC BY 4.0) license.