Machine Learning methods and models to predict food insecurity levels for families in Ceará, Brazil, based on employment, housing and other social indicators
DOI:
https://doi.org/10.5753/jbcs.2025.3100
Keywords:
Food insecurity, machine learning, feature importance, classification model
Abstract
Many nations still struggle to provide their populations with access to food and balanced nutrition. The Food and Agriculture Organization of the United Nations (FAO) included Brazil in its 2022 Hunger Map, highlighting that 61 million Brazilians face difficulties in feeding themselves. Although food-insecure countries operate various food security alert and monitoring systems, the data and methodologies these systems rely on capture only a fraction of the issue’s complexity, so further research is needed to understand this multifaceted problem. In response, the Secretariat for Social Protection (SPS - Secretaria de Proteção Social) of Ceará, a state in Brazil’s northeast, conducted a survey to collect data on the social and economic characteristics of extremely vulnerable families. This dataset, analyzed in our study, represents a concerted effort by the government of Ceará to evaluate the needs of low-income households, particularly those with children who lack access to essential services. We used the Brazilian Food Insecurity Scale (EBIA), a tool validated by the Brazilian Ministry, to measure food insecurity levels by assigning scores to families’ responses. This paper presents a machine learning model that examines the collected data to identify which factors related to Food Access, Employment and Income, Housing, and Public Services can predict levels of food insecurity. Our best model achieves an accuracy of approximately 0.75 and an F1-score of 0.80, and it can distinguish between severe and non-severe food insecurity levels. We suggest that our model could be applied to other datasets that lack nutrition-specific questions in order to gauge a family’s food insecurity level. Additionally, our research sheds light on the key factors influencing food insecurity levels in Brazil, notably income and housing conditions, providing insights for addressing this issue.
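To make the two reported metrics concrete, the following is a minimal sketch of how accuracy and F1-score are computed for a binary severe versus non-severe split. Everything in it is an assumption for illustration: the function name `binary_metrics`, the choice of "severe" as the positive class, and the eight toy labels, which were constructed so that the results land near the reported values. They are not the study's data or the authors' code.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical labels (assumption): 1 = severe, 0 = non-severe.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0]

m = binary_metrics(y_true, y_pred)
print(round(m.accuracy, 2), round(m.f1, 2))  # prints: 0.75 0.8
```

In this toy case the F1-score exceeds the accuracy because F1 ignores true negatives: four of the five severe cases are found and four of the five severe predictions are correct, while the two mistakes count fully against the accuracy over only eight families.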
License
Copyright (c) 2025 Ticiana L. Coelho da Silva, Lara Sucupira Furtado, Guilherme Sales Fernandes, José A. Fernandes de Macêdo, Lívia Almada Cruz, Régis Pires Magalhães, Laecia Gretha Amorim Gomes

This work is licensed under a Creative Commons Attribution 4.0 International License.

