FORENSICS: Deciphering and Detecting Malware Through Variable-Length Instruction Sequences

César Augusto Borges de Andrade; Geraldo Pereira Rocha Filho; Rodolfo I. Meneguette; Ricardo Sant'Ana; Julio Cesar Duarte; André Luiz Marques Serrano; Clóvis Neumann; Vinícius P. Gonçalves

doi:10.5753/jisa.2025.5054

Authors

César Augusto Borges de Andrade University of Brasília https://orcid.org/0000-0001-5776-2119
Geraldo Pereira Rocha Filho State University of Southwest Bahia https://orcid.org/0000-0001-6795-2768
Rodolfo I. Meneguette University of São Paulo https://orcid.org/0000-0003-2982-4006
Ricardo Sant'Ana Military Institute of Engineering https://orcid.org/0000-0002-4629-6877
Julio Cesar Duarte Military Institute of Engineering https://orcid.org/0000-0001-6656-1247
André Luiz Marques Serrano University of Brasília https://orcid.org/0000-0001-5182-0496
Clóvis Neumann University of Brasília https://orcid.org/0000-0003-4320-8795
Vinícius P. Gonçalves University of Brasília https://orcid.org/0000-0002-3771-2605

Keywords:

Malware Detection, Opcode, Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), Natural Language Processing (NLP), Cybersecurity

Abstract

Contemporary malware is growing more complex: it relies on advanced evasion and obfuscation techniques, and new variants emerge continuously, which undermines traditional signature-based detection. In response, this work proposes FORENSICS (Framework fOr malwaRe dEtectioN baSed on InstruCtion Sequences), a deep learning-based framework that uses variable-length instruction sequences to detect malware efficiently and accurately. By integrating Natural Language Processing (NLP) techniques with Long Short-Term Memory (LSTM) neural networks, FORENSICS analyzes opcode sequences extracted from real-world malware and benign software artifacts. The framework introduces optimized methods for opcode extraction and representation that significantly reduce computational overhead while preserving detection performance. FORENSICS achieved 99.91% accuracy, 99.99% precision, and detection times ranging from 8 to 17 milliseconds, outperforming several state-of-the-art approaches across multiple metrics. The framework was also robust in identifying zero-day malware samples, which supports its effectiveness in real-world cybersecurity scenarios. A new balanced dataset comprising over 40,000 labeled samples was created and made publicly available to facilitate reproducibility and encourage further research. These results position FORENSICS as a robust, scalable, and effective solution for malware detection in modern threat landscapes.
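
To make the idea of treating opcode sequences as text more concrete, the following minimal sketch shows one common NLP-style preprocessing step: mapping opcode mnemonics to integer ids and padding variable-length sequences into a batch that an embedding layer followed by an LSTM could consume. It is an illustration under stated assumptions, not the FORENSICS implementation. The function names (`build_vocab`, `encode`, `pad_batch`), the reserved ids for padding and unknown opcodes, and the toy opcode samples are all hypothetical.

```python
from collections import Counter

# Reserved ids: a hypothetical convention, not taken from the paper.
PAD, UNK = 0, 1


def build_vocab(sequences, min_count=1):
    """Map each opcode mnemonic to an integer id, most frequent first."""
    counts = Counter(op for seq in sequences for op in seq)
    vocab = {"<pad>": PAD, "<unk>": UNK}
    # Sort by descending frequency, then alphabetically for determinism.
    for op, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[op] = len(vocab)
    return vocab


def encode(seq, vocab):
    """Replace each opcode with its id; unseen opcodes map to UNK."""
    return [vocab.get(op, UNK) for op in seq]


def pad_batch(encoded):
    """Right-pad sequences to the longest one and keep the true lengths."""
    lengths = [len(s) for s in encoded]
    max_len = max(lengths)
    padded = [s + [PAD] * (max_len - len(s)) for s in encoded]
    return padded, lengths


# Toy opcode sequences (illustrative only, not real samples).
samples = [
    ["push", "mov", "call", "ret"],
    ["mov", "xor", "jmp"],
]
vocab = build_vocab(samples)
batch, lengths = pad_batch([encode(s, vocab) for s in samples])
print(batch)    # [[5, 2, 3, 6], [2, 7, 4, 0]]
print(lengths)  # [4, 3]
```

Keeping the true lengths alongside the padded batch lets a recurrent model ignore the padding positions, which is one way to handle sequences of different lengths without truncating them to a fixed size.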
