Enhancing Infrastructure Observability: Machine Learning for Proactive Monitoring and Anomaly Detection

Authors

DOI:

https://doi.org/10.5753/jisa.2024.4509

Keywords:

Machine Learning, Infrastructure Monitoring, Anomaly Detection, Proactive Maintenance

Abstract

This study addresses the challenge of proactive anomaly detection and efficient resource management in infrastructure observability. It integrates machine learning models into an observability platform to improve the precision of real-time monitoring. Built on a microservices architecture, the proposed system detects anomalies proactively, addressing a limitation of traditional monitoring methods, which often fail to predict issues before they escalate. At its core are predictive models based on Random Forest, Gradient Boosting, and Support Vector Machine algorithms that forecast the behavior of key metrics such as CPU usage and memory allocation. In the empirical evaluation, the GradientBoostingRegressor model achieved an R² score of 0.86 when predicting request rates, and the RandomForestRegressor model reduced the Mean Squared Error of memory usage predictions by 2.06% compared to traditional monitoring methods. These findings demonstrate the potential of machine learning to enhance observability and to support more resilient and adaptive infrastructure management.
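
The abstract reports two evaluation metrics, the R² score and a relative reduction in Mean Squared Error. As a minimal sketch of how such figures are typically computed, the following uses the standard textbook definitions on small synthetic data. The readings, the baseline forecast, and the helper names are illustrative assumptions, not the authors' data or code, and this page does not state the exact formula behind the reported 2.06% reduction.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared residuals; lower is better."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def relative_mse_reduction(baseline_mse, model_mse):
    """Percentage by which model_mse falls below baseline_mse (assumed definition)."""
    return 100.0 * (baseline_mse - model_mse) / baseline_mse


# Synthetic memory-usage readings in percent; illustrative values only.
actual = [40.0, 42.0, 45.0, 50.0, 48.0]
baseline = [41.0, 41.0, 46.0, 48.0, 49.0]  # hypothetical baseline forecast
model = [40.0, 42.0, 46.0, 49.0, 48.0]     # hypothetical ML forecast

baseline_mse = mean_squared_error(actual, baseline)
model_mse = mean_squared_error(actual, model)
model_r2 = r2_score(actual, model)
reduction = relative_mse_reduction(baseline_mse, model_mse)

print(f"baseline MSE={baseline_mse:.2f}, model MSE={model_mse:.2f}")
print(f"model R2={model_r2:.3f}, MSE reduction={reduction:.1f}%")
```

An R² close to 1 means the forecast explains most of the variance in the observed metric, while the relative reduction expresses how much lower the model's error is than the baseline's. In practice, scikit-learn's `sklearn.metrics.r2_score` and `sklearn.metrics.mean_squared_error` compute the same two quantities.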

Author Biography

Valderi R. Q. Leithardt, Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR, Lisboa, Portugal

VALDERI REIS QUIETINHO LEITHARDT (Senior Member, IEEE) received the Ph.D. degree in computer science from INF-UFRGS, Brazil, in 2015. He is currently a Professor with the ISEL - Polytechnic University of Lisbon and a Researcher integrated with the CTS UNINOVA, Universidade Nova de Lisboa. He is also a Collaborating Researcher at the Expert Systems and Applications Laboratory (ESALab), University of Salamanca, Spain. His main research interests include distributed systems with a focus on data privacy, communication, and programming protocols, involving scenarios and applications for the Internet of Things, smart cities, big data, cloud computing, and blockchain.

https://www.isel.pt/docente/valderi-reis-quietinho-leithardt

Published

2024-10-28

How to Cite

Noetzold, D., Rossetto, A. G. D. M., Leithardt, V. R. Q., & Costa, H. J. de M. (2024). Enhancing Infrastructure Observability: Machine Learning for Proactive Monitoring and Anomaly Detection. Journal of Internet Services and Applications, 15(1), 508–522. https://doi.org/10.5753/jisa.2024.4509

Section

Research article