A heuristic Data-Centric AI approach to predict non-contact injuries in elite football players
DOI:
https://doi.org/10.5753/jidm.2026.5855
Keywords:
Professional soccer, Injury prediction, Data Science, Sports injuries, Machine learning
Abstract
Preventing non-contact injuries in professional soccer is critical for safeguarding athlete health and minimizing disruptions to team performance and financial stability. This study investigates predictive modeling strategies for forecasting non-contact traumatic injuries within a microcycle among male professional players from Fluminense Football Club, integrating Data-Centric AI (DCAI) principles with machine learning algorithms. Building upon previous work, we extend the Regressive Multi-dimensional Model Selection (RMMS) methodology through new experiments that incorporate alternative class balancing strategies, hyperparameter tuning, and feature selection methods in place of Principal Component Analysis (PCA). Among the tested models, tree-based algorithms, particularly XGBoost, achieved the highest AUC-ROC (74.1%), though this result remained below the 79.8% baseline obtained with a Decision Tree in earlier research. Undersampling to a 70/30 ratio of non-injury to injury cases emerged as the most effective balancing approach, reinforcing prior findings. SHAP (SHapley Additive exPlanations) analysis identified AdaBoost as the model with the most positive impact, while feature selection and hyperparameter optimization had adverse effects on performance. These results suggest that PCA remains the more effective dimensionality reduction technique for this dataset. Future research should incorporate additional training seasons, match-related data, and broader athlete characteristics beyond GPS metrics, such as biochemical markers and perceived exertion, to improve model robustness and predictive accuracy.
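To make the 70/30 balancing step concrete, the following is a minimal sketch of random undersampling and not the authors' implementation. It assumes binary labels (1 = injury, 0 = non-injury); the function name `undersample`, its parameters, and the toy label counts are hypothetical and chosen only for illustration.

```python
import random


def undersample(labels, majority_share=0.7, seed=0):
    """Return sorted row indices that keep every injury row (label 1) and a
    random subset of non-injury rows (label 0), so that non-injury rows make
    up roughly `majority_share` of the result."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Majority rows needed for the target ratio, e.g. 70 per 30 injuries.
    target = round(len(minority) * majority_share / (1 - majority_share))
    # Never request more majority rows than actually exist.
    target = min(target, len(majority))
    kept = rng.sample(majority, target)
    return sorted(minority + kept)


# Hypothetical toy labels: 30 injury rows among 1000 observations.
labels = [1] * 30 + [0] * 970
idx = undersample(labels)
n_injury = sum(labels[i] for i in idx)
print(len(idx), n_injury, len(idx) - n_injury)
```

In a sketch like this, the resampling would normally be applied to the training split only, so that evaluation metrics such as AUC-ROC are computed on data with the original class distribution.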

