On the Limits of Genetically-Optimized Homogeneous Ensembles for Credit Risk Classification

Ricardo Franceli da Silva; Leandro dos Santos Maciel

doi:10.5753/jbcs.2026.6415

Authors

Ricardo Franceli da Silva University of Sao Paulo https://orcid.org/0009-0005-5753-6628
Leandro dos Santos Maciel University of Sao Paulo https://orcid.org/0000-0002-1900-7179

Keywords:

Credit Risk, Machine Learning, Ensemble, Genetic Algorithm, Optimization

Abstract

Credit risk assessment is a challenging task with significant economic and financial impacts. It requires capturing complex nonlinear patterns and interactions between variables to accurately predict creditworthiness and minimize the risk of default. This study investigates the performance of a genetically-optimized homogeneous ensemble composed of five Multilayer Perceptron (MLP) models applied to a large-scale peer-to-peer lending dataset. Individual models achieved competitive precision scores (up to 80.57%); however, the optimized ensemble failed to surpass the best-performing individual model under economically viable conditions. Ensemble scenarios operating under a stricter classification threshold achieved higher precision but yielded a negative Expected Profit per loan, rendering them impractical for real-world credit granting. A robustness check, in which the experiment was repeated after removing the top model, confirmed this finding. A pairwise error correlation analysis revealed consistently high correlations among base learners (0.762-0.918), with co-occurring error rates between 79.43% and 93.93%, providing empirical evidence that the base classifiers lack the predictive diversity necessary for synergistic ensemble gains. The results reveal a critical boundary condition for ensemble methods: when base classifiers share the same underlying learning algorithm and therefore lack conceptual diversity, synergistic gains are unattainable; instead, the optimization process converges on weighting the strongest component. The study concludes that classifier diversity is a fundamental requirement for an ensemble to deliver superior performance, regardless of the strength of its individual learners. This challenges the assumption that optimized ensembles universally outperform their strongest individual components in machine learning.
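The pairwise error correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy labels, the three hypothetical classifiers A, B, and C, the helper names, and in particular the definition of the co-occurring error rate (errors made by both models divided by errors made by either) are assumptions chosen for illustration, since the abstract does not define the measure.

```python
import numpy as np


def error_indicators(y_true, predictions):
    """Return an (n_models, n_samples) 0/1 matrix; 1 marks a misclassified sample."""
    y_true = np.asarray(y_true)
    predictions = np.asarray(predictions)
    return (predictions != y_true[None, :]).astype(int)


def co_error_rate(errors, i, j):
    """Assumed definition: among samples that model i or model j gets wrong,
    the share that both get wrong."""
    both = np.sum((errors[i] == 1) & (errors[j] == 1))
    either = np.sum((errors[i] == 1) | (errors[j] == 1))
    return float(both / either) if either else 0.0


# Toy labels and three hypothetical classifiers (not the paper's data).
y_true = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
wrong_at = {"A": [0, 1, 2], "B": [0, 1, 3], "C": [7, 8, 9]}

predictions = []
for name, idx in wrong_at.items():
    pred = y_true.copy()
    pred[idx] = 1 - pred[idx]  # flip the label to simulate an error
    predictions.append(pred)

errors = error_indicators(y_true, predictions)

# Pearson correlation between the error indicators of every model pair.
corr = np.corrcoef(errors)

print(np.round(corr, 3))
print("co-error A/B:", co_error_rate(errors, 0, 1))
print("co-error A/C:", co_error_rate(errors, 0, 2))
```

In this toy setup, A and B share two of their three errors, so their error indicators correlate positively (about 0.52), while A and C never fail on the same sample and correlate negatively (about -0.43). Correlations close to 1, such as the 0.762-0.918 range reported above, indicate that the base learners fail on largely the same loans, which leaves a weighted combination little to correct.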
