Log parsers' performance on raw logs from Android devices

Authors

DOI:

https://doi.org/10.5753/jisa.2025.5049

Keywords:

Data mining, embedded systems, log analysis, parsing, regular expression

Abstract

Enhancing the structure of log files for improved analysis, commonly referred to as "log parsing", is important for extracting relevant insights from software-generated records. This study compares ten parsing tools and models available in the Logpai collection (AEL, Brain, Drain, LFA, LogCluster, Logram, NuLog, SHISO, SLCT, and ULP) on raw logs sourced from Android devices, extending a previous work. Our findings show a notable precision deficit in models that lack preprocessing steps, as existing tools have considerable difficulty handling untreated logs. Consequently, these tools perform poorly even on raw Android logs of the same origin as the reference logs. When other log blocks were analyzed, such as those from Wi-Fi networks, the difficulty of handling small variations in format persisted.
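
The abstract does not describe the preprocessing step itself, so the following is only a minimal Python sketch of what regular-expression preprocessing of a raw Android log line could look like. It assumes the lines follow the logcat "threadtime" layout (date, time, PID, TID, level, component, message); the pattern, the masking rules, the function name `preprocess`, and the sample line are all illustrative assumptions, not the authors' method or data.

```python
import re

# Assumed layout (logcat "threadtime"), for example:
# 03-17 16:13:38.811  1702  2395 D WindowManager: relayout window 42 took 15 ms
LOGCAT_LINE = re.compile(
    r"^(?P<date>\d{2}-\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<pid>\d+)\s+(?P<tid>\d+)\s+"
    r"(?P<level>[VDIWEF])\s+"
    r"(?P<component>.*?):\s(?P<content>.*)$"
)

# Hypothetical masking rules: replace variable tokens with a wildcard.
MASKS = [
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<*>"),  # hexadecimal values
    (re.compile(r"\b\d+\b"), "<*>"),             # standalone decimal numbers
]


def preprocess(line):
    """Split one raw line into header fields and a masked message.

    Returns a dict of fields, or None when the line does not match
    the assumed layout (for instance a buffer separator line).
    """
    match = LOGCAT_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    template = fields["content"]
    for pattern, token in MASKS:
        template = pattern.sub(token, template)
    fields["template"] = template
    return fields


if __name__ == "__main__":
    sample = "03-17 16:13:38.811  1702  2395 D WindowManager: relayout window 42 took 15 ms"
    print(preprocess(sample))
    print(preprocess("--------- beginning of main"))
```

Lines that do not fit the assumed layout return `None` rather than a guessed structure, which makes the share of untreated lines easy to count before they reach a parser.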

Published

2025-05-01

How to Cite

Bessa, J. A., Miranda Filho, R., Souza, G., Barreto, R., & de Freitas, R. (2025). Log parsers’ performance on raw logs from Android devices. Journal of Internet Services and Applications, 16(1), 105–116. https://doi.org/10.5753/jisa.2025.5049

Section

Research article