New Metrics for Assessing the Quality of Hierarchical Topic Modeling Strategies

Authors

  • Antônio Pereira UFSJ
  • Leonardo Rocha UFSJ
  • Felipe Viegas UFSJ

Keywords:

Topic modeling, Automatic evaluation, Word embeddings, Hierarchical topic modeling

Abstract

Hierarchical Topic Modeling (HTM) strategies aim to automatically extract consistent semantic topics from textual documents while respecting the hierarchy in which the information is structured. Current evaluation metrics for these approaches typically measure the quality of each topic individually. In HTM, other issues also need to be considered: (i) redundancy of topics; (ii) semantic diversity of the constructed topics; and (iii) topological consistency. In this work, we propose and evaluate three new evaluation metrics that address these issues, complementing the methodology for evaluating HTM approaches from the perspective of the hierarchical structure in which the topics are constructed.
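The abstract does not define the three metrics, so the following is only an illustrative sketch of the kind of structure-aware measurement it motivates, not the metrics proposed in the paper. It assumes each topic is represented by a list of top words and uses Jaccard overlap as a stand-in similarity; the function names (`jaccard`, `mean_pairwise_overlap`, `parent_child_overlap`) and the dictionary-based hierarchy format are hypothetical choices made for this example.

```python
from itertools import combinations


def jaccard(words_a, words_b):
    """Jaccard overlap between two topics given as lists of top words."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(topics):
    """Average Jaccard overlap over all topic pairs.

    Assumption: a higher value suggests more redundant (less diverse) topics.
    """
    pairs = list(combinations(topics, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def parent_child_overlap(hierarchy, topic_words):
    """Average Jaccard overlap between each parent topic and its children.

    Assumption: `hierarchy` maps a parent topic id to a list of child ids,
    and `topic_words` maps every topic id to its list of top words.
    """
    scores = [
        jaccard(topic_words[parent], topic_words[child])
        for parent, children in hierarchy.items()
        for child in children
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with made-up topics.
sibling_topics = [["game", "team", "score"], ["team", "score", "coach"], ["stock", "market", "bank"]]
print(mean_pairwise_overlap(sibling_topics))

toy_words = {"sports": ["game", "team"], "soccer": ["game", "goal"], "finance": ["stock", "bank"]}
toy_hierarchy = {"sports": ["soccer", "finance"]}
print(parent_child_overlap(toy_hierarchy, toy_words))
```

A word-embedding similarity could replace the Jaccard overlap in the same structure; the set-based version is used here only because it needs no external resources.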

Published

2022-07-21

How to Cite

Pereira, A., Rocha, L., & Viegas, F. (2022). New Metrics for Assessing the Quality of Hierarchical Topic Modeling Strategies. Electronic Journal of Undergraduate Research on Computing, 20(3). Retrieved from https://journals-sol.sbc.org.br/index.php/reic/article/view/2689

Section

Special Issue: CTIC/CSBC