Generalizing Feature Selection in Android Malware Detection: The SigAPI AutoCraft Approach

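Authors

V. Rocha, L. Tschiedel, D. Kreutz, H. Bragança, J. Assolin, R. B. Mansilha, S. E. Quincozes, A. G. D. Nogueira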

DOI:

https://doi.org/10.5753/jbcs.2026.6043

Keywords:

SigAPI, AutoCraft, Android, Malware, Feature Selection

Abstract

Feature selection methods are widely employed in Android malware detection to improve accuracy and efficiency by identifying the most relevant features. However, their generalizability often remains limited, as approaches like SigAPI are typically developed and evaluated on a small number of datasets, reducing their effectiveness across diverse scenarios. The practical use of SigAPI is further hindered by the need to predefine a minimum number of features, the instability of its evaluation metrics, and its inability to adapt efficiently to the heterogeneity commonly present in Android datasets. To address these limitations, we developed SigAPI AutoCraft, an enhanced and fully automated version of the original method. SigAPI AutoCraft achieves consistent and robust performance across ten Android malware datasets, substantially improving generalization. The results demonstrate a 5–15% increase in Matthews Correlation Coefficient (MCC) and up to a 7.6-fold improvement in feature reduction, underscoring its effectiveness and adaptability to complex and heterogeneous data environments.
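
The gains above are reported in terms of the Matthews Correlation Coefficient (MCC). For a binary malware/benign classifier, MCC is computed from the confusion-matrix counts as (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It ranges from −1 (total disagreement) through 0 (chance level) to +1 (perfect prediction). The Python sketch below is illustrative only and is not taken from the SigAPI AutoCraft code. The helper name `mcc` and the example counts are hypothetical. Returning 0 when any marginal total is zero is an assumed convention, and it is the same one scikit-learn's `matthews_corrcoef` follows.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Hypothetical helper for illustration; not part of SigAPI AutoCraft.
    """
    numerator = tp * tn - fp * fn
    # Product of the four marginal totals: predicted positives, actual positives,
    # actual negatives, and predicted negatives.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when a row or column of the matrix is empty.
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Made-up counts: 100 malware samples (tp + fn) and 100 benign samples (tn + fp).
    print(round(mcc(tp=90, tn=80, fp=20, fn=10), 4))  # prints 0.7035
```

MCC uses all four cells of the confusion matrix, so it remains informative on imbalanced benign/malware splits. Plain accuracy can look high on such splits even when one class is mostly misclassified.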

Published

2026-03-09

How to Cite

Rocha, V., Tschiedel, L., Kreutz, D., Bragança, H., Assolin, J., Mansilha, R. B., Quincozes, S. E., & Nogueira, A. G. D. (2026). Generalizing Feature Selection in Android Malware Detection: The SigAPI AutoCraft Approach. Journal of the Brazilian Computer Society, 32(1), 250–263. https://doi.org/10.5753/jbcs.2026.6043

Issue

Vol. 32 No. 1 (2026)

Section

Regular Issue