A Machine Learning Classification Model for Identifying College Students with Depression Based on Digital Phenotyping
DOI:
https://doi.org/10.5753/jbcs.2026.5939
Keywords:
College students, depression, digital phenotyping, machine learning, mobile sensors
Abstract
Depression is a serious mental health condition worldwide that causes significant suffering and social impairment for those affected. Its prevalence is higher among college students than in the general population. With recent advances in analyzing digital phenotyping data to infer depressive symptoms, machine learning (ML) techniques have been increasingly used to identify behaviors associated with potential depressive profiles (PDP). However, despite the growing body of work on ML-based depression detection, few studies have examined data preprocessing approaches for handling missing values that go beyond common data imputation. In this study, we conducted a series of experiments to evaluate combinations of data preprocessing methods and ML algorithms for classifying PDP and non-PDP students using data from the Amive project. The primary challenges were implementing a data processing workflow that addresses missing values and class imbalance, both common issues in digital phenotyping datasets, and selecting algorithms capable of handling such data. The experimental results were promising: individual classification models based on Random Forest, XGBoost, and SVM (RBF) achieved accuracies of 77%, 75%, and 76%, respectively. The best performance was obtained by training on datasets that went through outlier filtering, specifically removing rows with four or more missing values. Combined with this preprocessing, Random Forest produced the best classification model, reaching 77% accuracy with mean AUC and MCC values above 0.5.
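The row-filtering rule and two of the reported metrics can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the six-feature rows, and the labels below are hypothetical, and missing values are assumed to be encoded as `None`. Only the threshold (dropping rows with four or more missing values) and the choice of accuracy and MCC come from the abstract.

```python
import math

# Threshold taken from the abstract: rows with four or more missing values are removed.
MAX_MISSING = 4


def drop_sparse_rows(rows, threshold=MAX_MISSING):
    """Keep only rows with fewer than `threshold` missing (None) values."""
    return [row for row in rows if sum(v is None for v in row) < threshold]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = PDP, 0 = non-PDP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator is 0.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical sensor-derived feature rows (None marks a missing value).
rows = [
    [0.4, 7.1, None, 120, 3, 0.9],      # 1 missing: kept
    [None, None, None, None, 2, 0.1],   # 4 missing: dropped
    [0.2, None, None, None, 5, 0.3],    # 3 missing: kept
    [None, None, None, None, None, None],  # 6 missing: dropped
]
kept = drop_sparse_rows(rows)

# Hypothetical true and predicted labels for eight students.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
acc = accuracy(y_true, y_pred)
score = mcc(y_true, y_pred)
```

MCC is worth reporting alongside accuracy on imbalanced data because it uses all four cells of the confusion matrix, so a model that mostly predicts the majority class scores close to 0 even when its accuracy looks high.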
License
Copyright (c) 2026 Evandro Y. A. Ribeiro, Franco E. Garcia, Conrado dos S. Alves Saud, Helena de M. Caseli, Vivian G. Motti, Taís Bleicher, Jair B. Neto, Heloisa C. Figueiredo Frizzo, Larissa C. Martini, Luciano de O. Neris, Anderson Ara, Alan D. Baria Valejo, Vânia P. Almeida

This work is licensed under a Creative Commons Attribution 4.0 International License.

