Explainable Clustering: A solution to interpret and describe clusters

Authors

G. S. de Oliveira, F. A. Silva, and R. V. Ferreira

DOI:

https://doi.org/10.5753/jidm.2025.4663

Keywords:

clustering explainability, unsupervised learning, Explainable Artificial Intelligence, XAI

Abstract

Unsupervised learning algorithms are techniques for finding hidden patterns or characteristics in data without previously defined labels. One such technique is clustering, which groups data with similar characteristics into the same cluster and places data with different characteristics in other clusters. Although clustering has many applications, its output is hard to interpret: it carries little information about what characterizes each cluster, so understanding the groups requires extensive manual analysis. This article therefore proposes MAACLI (Model and Algorithm Agnostic CLustering Interpretability), a technique for generating user-friendly descriptions that help interpret the groups produced by unsupervised clustering algorithms. The solution consists of two components that generate these descriptions, and it was tested on two types of datasets, one of which was provided by a partner company. It produced simple, user-friendly descriptions of the groups that retain only the important attributes.
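
The abstract does not specify how MAACLI's components decide which attributes are "important". Purely as an illustration of the general idea of describing clusters from their attributes, the sketch below uses one common model-agnostic approach that is an assumption here, not the authors' method: for each cluster, every numeric attribute inside the cluster is compared against the rest of the data (one-vs-rest) with a two-sided Mann-Whitney U test, and only attributes with a p-value below a threshold are kept in the description. The function names `rank_with_ties`, `mann_whitney_p`, and `describe_clusters`, the threshold `alpha=0.05`, and the toy `income`/`age` data are all hypothetical.

```python
import math
from statistics import mean


def rank_with_ties(values):
    """Return 1-based ranks of `values`, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_p(inside, outside):
    """Two-sided Mann-Whitney U p-value via the normal approximation.

    Simplification (assumption): no tie correction and no continuity correction.
    """
    n1, n2 = len(inside), len(outside)
    ranks = rank_with_ties(list(inside) + list(outside))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic of the first sample
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    if sigma == 0:
        return 1.0
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def describe_clusters(rows, labels, attributes, alpha=0.05):
    """Hypothetical one-vs-rest describer: keep only the attributes whose
    distribution inside a cluster differs significantly from the other clusters."""
    descriptions = {}
    for cluster in sorted(set(labels)):
        phrases = []
        for attr in attributes:
            inside = [r[attr] for r, lab in zip(rows, labels) if lab == cluster]
            outside = [r[attr] for r, lab in zip(rows, labels) if lab != cluster]
            if not inside or not outside:
                continue
            if mann_whitney_p(inside, outside) < alpha:
                direction = "higher" if mean(inside) > mean(outside) else "lower"
                phrases.append(f"{attr} is {direction} than in other clusters")
        descriptions[cluster] = phrases
    return descriptions


if __name__ == "__main__":
    # Toy data: the clusters differ in income but share the same age distribution.
    rows = [{"income": 10 + i, "age": 30 if i % 2 == 0 else 40} for i in range(10)]
    rows += [{"income": 50 + i, "age": 30 if i % 2 == 0 else 40} for i in range(10)]
    labels = [0] * 10 + [1] * 10
    print(describe_clusters(rows, labels, ["income", "age"]))
    # {0: ['income is lower than in other clusters'], 1: ['income is higher than in other clusters']}
```

Because only the data and the cluster labels are used, this kind of describer does not depend on which clustering algorithm produced the labels. In practice, a library routine such as SciPy's `scipy.stats.mannwhitneyu` could replace the hand-written test, which applies tie and continuity corrections that this sketch omits.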

Published

2025-06-20

How to Cite

de Oliveira, G. S., Silva, F. A., & Ferreira, R. V. (2025). Explainable Clustering: A solution to interpret and describe clusters. Journal of Information and Data Management, 16(1), 170–180. https://doi.org/10.5753/jidm.2025.4663

Issue

Vol. 16 No. 1 (2025)

Section

Best Papers of KDMiLe 2023 - Extended Papers