Performance Evaluation of Classifiers Based on Rainfall-Related Data Collected by an Automatic Weather Station

DOI:

https://doi.org/10.5753/reic.2025.5420

Keywords:

Machine Learning, Precipitation, Explainable Artificial Intelligence, Classical Models, Water Resources

Abstract

Precipitation forecasting is essential for the planning and management of water resources, affecting areas such as agriculture, energy generation, urban planning, and water conservation. In agriculture, accurate forecasts of rainfall distribution and quantity help optimise water usage and increase productivity. This study applies classical machine learning approaches to precipitation forecasting using a pluviometric dataset from Tianguá (CE). The work comprises pre-processing, data balancing, evaluation of nine models, and local explanation of one model. The objective is to compare the performance of Logistic Regression (LR), Naive Bayes (NB), Extra Tree (ET), Extreme Learning Machine (ELM), k-nearest neighbours (k-NN), Multilayer Perceptron (MLP), Random Forest (RF), Support Vector Machines (SVM), and an ensemble model, in order to identify the most suitable one. Additionally, the Local Interpretable Model-Agnostic Explanations (LIME) method was applied to understand which attributes most influence the classifications, an analysis that can guide improvements in future studies, such as the application of feature selection techniques. Among the tested models, the ensemble based on the majority vote of four algorithms showed the best overall performance, with an accuracy of 85% and a well-balanced F1-Score, making it a robust choice for applications requiring reliable predictions. Although the NB model showed lower accuracy (76%), it stood out for its high precision (91%), indicating that when it predicted rain it was almost always correct, and for its sensitivity in detecting the absence of precipitation (recall of 88%). The other models recorded accuracies between 83% and 84%, making them viable alternatives depending on the needs of each application. When the ET model was explained with LIME, analysis of the first ten predictions from the test set showed that the most influential features for predicting rain included maximum and instantaneous humidity and minimum and instantaneous temperature, whereas maximum temperature and minimum humidity were more relevant for predicting the absence of rain.
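
The majority-vote ensemble and the metrics reported above (accuracy, precision, recall, F1-Score) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the four prediction lists, and the tie-breaking rule (ties go to the smallest label, here "no rain") are assumptions made for illustration, and the toy labels are invented rather than taken from the Tianguá dataset.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label list by hard voting.

    predictions: list of equal-length lists, one list per model.
    Ties are resolved in favour of the smallest label (an assumption;
    the abstract does not say how the ensemble breaks ties).
    """
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(min(label for label, n in counts.items() if n == top))
    return combined


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


# Hypothetical toy labels: 1 = rain, 0 = no rain.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 0, 1, 1]
model_b = [1, 0, 0, 1, 0, 1, 1, 0]
model_c = [1, 1, 1, 1, 0, 0, 0, 0]
model_d = [0, 0, 1, 1, 0, 0, 1, 0]

ensemble = majority_vote([model_a, model_b, model_c, model_d])
for name, pred in [("model_a", model_a), ("ensemble", ensemble)]:
    print(name, binary_metrics(y_true, pred))
```

Passing positive=0 to binary_metrics gives the precision and recall for the "no rain" class, which is the kind of per-class figure quoted for the NB model. With an even number of voters, the tie-breaking rule matters and should be stated explicitly in any real experiment.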

Published

2025-04-14

How to Cite

Oliveira, A. C. de, de Brito, R. X., Chaves, M. A. de O., Sousa, R. N. de, Oliveira, A. C. de, & Almeida Júnior, P. C. de. (2025). Performance Evaluation of Classifiers Based on Rainfall-Related Data Collected by an Automatic Weather Station. Electronic Journal of Undergraduate Research on Computing, 23(1), 24–29. https://doi.org/10.5753/reic.2025.5420

Section

Full Papers