Hybrid Artificial Intelligence Model for Educational Efficiency Analysis: Integration of Unsupervised Learning and Data Envelopment Analysis

Authors

F. J. C. da Conceição, C. C. C. das Dores, G. A. L. de Campos, R. L. Gomes

DOI:

https://doi.org/10.5753/rbie.2026.6708

Keywords:

Artificial Intelligence, Efficiency Analysis, Unsupervised Learning, Educational Systems, Data Envelopment Analysis

Abstract

Low efficiency in Brazil’s public education systems remains a critical challenge that calls for innovative, AI-driven solutions. This paper proposes a hybrid framework that integrates unsupervised learning, dimensionality reduction via Principal Component Analysis (PCA), and Data Envelopment Analysis (DEA) to assess efficiency in complex educational systems. Using data from the 2021 Basic Education Assessment System (SAEB), the model was applied to 16,664 public high schools and outperformed traditional approaches in uncovering latent patterns. In our sample, the proposed methodology improves clustering accuracy by 23% and identifies actionable improvement opportunities for 78% of schools. Beyond producing intra-cluster performance targets and peer benchmarks, the framework enables the mapping of performance asymmetries among comparable contexts, thereby surfacing inequities and guiding targeted interventions (e.g., peer-support networks, prioritized resource allocation, and focused teacher development). The main contribution is a scalable pipeline that combines multiple AI techniques to generate actionable insights for education managers, with an emphasis on equity and continuous efficiency improvement in complex systems.
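The pipeline described above (dimensionality reduction, grouping of comparable schools, then efficiency scoring within each group) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the clustering algorithm or the DEA variant, so the sketch assumes k-means with k-means++ seeding and an input-oriented, constant-returns-to-scale (CCR) envelopment model. All function names, parameters, and data below are hypothetical; the data are synthetic and are not SAEB records.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.optimize import linprog


def pca_scores(X, n_components):
    """Project standardized columns of X onto the leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def dea_input_efficiency(inputs, outputs):
    """Input-oriented, constant-returns DEA score per unit (rows are units)."""
    n = inputs.shape[0]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n], all >= 0 by default.
        c = np.r_[1.0, np.zeros(n)]
        # sum_j lambda_j * x_ij - theta * x_io <= 0 for every input i
        A_in = np.c_[-inputs[o], inputs.T]
        # -sum_j lambda_j * y_rj <= -y_ro for every output r
        A_out = np.c_[np.zeros(outputs.shape[1]), -outputs.T]
        res = linprog(c, A_ub=np.r_[A_in, A_out],
                      b_ub=np.r_[np.zeros(inputs.shape[1]), -outputs[o]],
                      method="highs")
        scores[o] = res.fun  # optimal theta: 1 means efficient among its peers
    return scores


def efficiency_by_cluster(context, inputs, outputs, k, n_components=2, seed=0):
    """Cluster units on PCA scores of context, then run DEA inside each cluster."""
    scores = pca_scores(context, n_components)
    _, labels = kmeans2(scores, k, minit="++", seed=seed)
    eff = np.empty(len(context))
    for g in np.unique(labels):
        members = labels == g
        eff[members] = dea_input_efficiency(inputs[members], outputs[members])
    return labels, eff


# Synthetic example: 12 schools, 3 contextual indicators, 1 input, 1 output.
rng = np.random.default_rng(0)
context = np.r_[rng.normal(0, 1, (6, 3)), rng.normal(5, 1, (6, 3))]
inputs = rng.uniform(1, 2, (12, 1))
outputs = rng.uniform(1, 2, (12, 1))

labels, eff = efficiency_by_cluster(context, inputs, outputs, k=2)
for g in np.unique(labels):
    print(g, np.round(eff[labels == g], 2))
```

Because each school is scored only against the schools in its own cluster, a score below 1 indicates room for improvement relative to peers in a comparable context, which is the kind of intra-cluster benchmark the abstract refers to. A variable-returns (BCC) variant would add the constraint that the lambda weights sum to 1.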

Published

2026-04-07

How to Cite

CONCEIÇÃO, F. J. C. da; DORES, C. C. C. das; CAMPOS, G. A. L. de; GOMES, R. L. Hybrid Artificial Intelligence Model for Educational Efficiency Analysis: Integration of Unsupervised Learning and Data Envelopment Analysis. Brazilian Journal of Computers in Education, [S. l.], v. 34, p. 429–457, 2026. DOI: 10.5753/rbie.2026.6708. Available at: https://journals-sol.sbc.org.br/index.php/rbie/article/view/6708. Accessed: 30 May 2026.

Section

Articles
