Lightweight Malware Classification with FORTUNATE: Precision Meets Computational Efficiency

DOI:

https://doi.org/10.5753/jisa.2025.4905

Keywords:

Malware Classification, Opcode, Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), Natural Language Processing (NLP), Cybersecurity

Abstract

Once an artifact has been detected as malicious, classifying it into a specific malware family is an essential step toward understanding the threat's behavior, implementing mitigation strategies, and developing proactive defenses. The task is challenging because of the diversity of malware formats, the rapid evolution of obfuscation and packing techniques, and the scarcity of labeled data for training robust models. In addition, the high volume of samples generated daily demands solutions that combine high accuracy with computational efficiency. Although transformer-based models are widely recognized as the state of the art for sequence processing tasks, their high computational demands limit their practical application in resource-constrained environments. In this work, we present FORTUNATE, a lightweight framework that uses LSTM networks with one-hot encoding to classify malware from variable-length opcode sequences. The framework adopts an optimized opcode extraction process that reduces redundancies and represents data in compact vectors, minimizing computational cost. Experimental results indicate that FORTUNATE achieves accuracies of 99.82% for active malware and 99.81% for inactive malware, with an average classification time of only 56 ms per sample, significantly outperforming related works. These results demonstrate that lightweight artificial intelligence approaches can deliver competitive performance in malware classification, especially in computationally constrained scenarios. FORTUNATE not only fills an important gap in malware classification but also establishes a foundation for future research on balancing accuracy, efficiency, and scalability.
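To make the input representation concrete, the following is a minimal sketch of one-hot encoding variable-length opcode sequences into a padded batch, the kind of tensor an LSTM layer could consume. It is not the authors' implementation: the abstract does not specify the extraction or redundancy-reduction steps, so the helper names (`build_vocab`, `collapse_repeats`, `one_hot`), the choice to collapse runs of identical opcodes, the reserved padding index 0, and the toy opcode samples are all assumptions made for illustration.

```python
from itertools import groupby


def build_vocab(sequences):
    """Map each distinct opcode to an integer index; index 0 is reserved for padding."""
    opcodes = sorted({op for seq in sequences for op in seq})
    return {op: i + 1 for i, op in enumerate(opcodes)}


def collapse_repeats(seq):
    """Assumed redundancy reduction: keep one opcode per run of identical opcodes."""
    return [op for op, _ in groupby(seq)]


def one_hot(seq, vocab, length):
    """Encode a sequence as `length` one-hot rows; padding rows activate index 0."""
    width = len(vocab) + 1
    indices = [vocab[op] for op in seq][:length]
    indices += [0] * (length - len(indices))
    return [[1 if j == i else 0 for j in range(width)] for i in indices]


# Toy opcode sequences of different lengths (hypothetical, not from the paper's dataset).
samples = [
    ["push", "mov", "mov", "call", "ret"],
    ["xor", "xor", "jmp"],
]

reduced = [collapse_repeats(s) for s in samples]
vocab = build_vocab(reduced)
max_len = max(len(s) for s in reduced)

# batch[sample][time_step] is a one-hot vector of width len(vocab) + 1.
batch = [one_hot(s, vocab, max_len) for s in reduced]
```

Reserving a dedicated padding index lets a downstream recurrent model mask padded time steps, so that sequences of different lengths can share one batch without the padding influencing the prediction.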

Published

2025-04-14

How to Cite

de Andrade, C. A. B., Filho, G. P. R., Meneguette, R. I., Maranhão, J. P. A., Sant’Ana, R., Duarte, J. C., Serrano, A. L. M., & Gonçalves, V. P. (2025). Lightweight Malware Classification with FORTUNATE: Precision Meets Computational Efficiency. Journal of Internet Services and Applications, 16(1), 87–104. https://doi.org/10.5753/jisa.2025.4905

Section

Research article