Topic Modeling of Discussions in the Standing Committees of the Brazilian Chamber of Deputies

Authors

  • Matheus A. dos Santos, Universidade Federal de Campina Grande
  • Nazareno Andrade, Universidade Federal de Campina Grande
  • Fabio Morais, Universidade Federal de Campina Grande

DOI:

https://doi.org/10.5753/jidm.2022.2705

Keywords:

Brazilian Chamber of Deputies, Latent Dirichlet Allocation, Natural Language Processing, Politics

Abstract

In order to establish and reinforce democracy, civil society must be able to oversee and keep track of the actions of its representatives. Despite substantial progress in transparency, monitoring the committees of the Brazilian National Congress remains a challenge, primarily because of the large volume of activity in these committees and the lack of structured data on their discussions. This work makes two contributions in this context. The first is an open dataset, which we created and made publicly available, containing structured speeches from the 25 standing committees of the Brazilian Chamber of Deputies over the past two decades. The second is the application of Natural Language Processing techniques, particularly Latent Dirichlet Allocation (LDA), to identify the topics addressed in these committees. Based on these latent topics, we studied the similarities and differences between the standing committees, their relations, and how their debates changed over time. We also explored how the speeches of parliamentarians from different political parties and states relate differently to these latent topics. Our results show that the topics discussed in the standing committees echo external events, and that these committees accommodate debates that connect their main topics with opposing agendas.
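The abstract describes the method only at a high level. As a rough illustration of the general idea, here is a minimal Python sketch, not the authors' pipeline: it fits LDA with collapsed Gibbs sampling (one standard inference method for LDA; the abstract does not say which inference procedure, library, or preprocessing the authors used), then averages the topic distributions of each committee's speeches and compares committees with cosine similarity. The function names (`lda_gibbs`, `committee_profiles`, `cosine`), the hyperparameters, the toy speeches, and the committee labels are all hypothetical, as is the choice of averaging plus cosine similarity as the way to compare committees.

```python
import math
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative, not the authors' code).

    docs: list of speeches, each a list of tokens.
    Returns theta (per-speech topic distributions), phi (per-topic word
    distributions) and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    ndk = [[0] * K for _ in docs]      # tokens in speech d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab


def committee_profiles(theta, committees):
    """Average the topic distributions of all speeches made in each committee."""
    n_topics = len(theta[0])
    sums = defaultdict(lambda: [0.0] * n_topics)
    counts = defaultdict(int)
    for dist, committee in zip(theta, committees):
        sums[committee] = [s + p for s, p in zip(sums[committee], dist)]
        counts[committee] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def cosine(u, v):
    """Cosine similarity between two topic profiles (1.0 = same topic mix)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical toy corpus: already tokenized speeches and their committees.
speeches = [
    "hospital vaccine doctor hospital vaccine".split(),
    "doctor hospital vaccine patient".split(),
    "school teacher student school".split(),
    "teacher student school university".split(),
]
committees = ["health", "health", "education", "education"]

theta, phi, vocab = lda_gibbs(speeches, n_topics=2, n_iter=300)
for k, row in enumerate(phi):
    top_words = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
    print(f"topic {k}:", [vocab[v] for v in top_words])

profiles = committee_profiles(theta, committees)
print("health vs education:", round(cosine(profiles["health"], profiles["education"]), 3))
```

Grouping the same `theta` rows by party, state, or year instead of by committee, or replacing cosine similarity with a divergence such as Jensen-Shannon, follows the same pattern.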

Published

2023-01-17

How to Cite

dos Santos, M. A., Andrade, N., & Morais, F. (2023). Topic Modeling of Discussions in the Standing Committees of the Brazilian Chamber of Deputies. Journal of Information and Data Management, 13(6). https://doi.org/10.5753/jidm.2022.2705

Issue

Vol. 13 No. 6

Section

KDMiLe 2021