Hate Speech Detection Against Women in Brazilian Portuguese Texts: Construction of the MINA-BR Database and Classification Model

Authors

  • Hannah O. Plath State University of Campinas
  • Maria Estela O. Paiva State University of Campinas
  • Danielle L. Pinto State University of Campinas
  • Paula D. P. Costa State University of Campinas

Keywords

Hate Speech, Database, Misogyny, Machine Learning

Abstract

The widespread use of social networks, among other factors, has given hate speech greater prominence; it is sometimes motivated by impunity and sometimes associated with freedom of expression. Hate speech recognition is a difficult task in part because adequate databases are scarce, especially for languages other than English and for specific domains such as misogyny. This article describes MINA-BR, a database in Brazilian Portuguese that can be used to classify hate speech against women. It also reports a preliminary study in which established hate speech classification algorithms were applied to set a baseline for the dataset. The highest F1-score, 0.57, was obtained with the SVM algorithm.

References

Alfina, I., Mulia, R., Fanany, M. I., and Ekanata, Y. (2017). Hate speech detection in the indonesian language: A dataset and preliminary study. In 2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS), pages 233–238.

Badjatiya, P., Gupta, S., Gupta, M., and Varma, V. (2017). Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion, pages 759–760.

Basile, V., Bosco, C., Fersini, E., Debora, N., Patti, V., Pardo, F. M. R., Rosso, P., Sanguinetti, M., et al. (2019). Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter. In 13th International Workshop on Semantic Evaluation, pages 54–63. Association for Computational Linguistics.

Bishop, C. M. (2006). Pattern recognition. Machine learning, 128(9).

Citron, D. K. (2011). Misogynistic cyber hate speech.

Corazza, M., Menini, S., Arslan, P., Sprugnoli, R., Cabrio, E., Tonelli, S., and Villata, S. (2018). Comparing different supervised approaches to hate speech detection.

Corazza, M., Menini, S., Cabrio, E., Tonelli, S., and Villata, S. (2020). A multilingual evaluation for online hate speech detection. ACM Transactions on Internet Technology (TOIT), 20(2):1–22.

Davidson, T., Warmsley, D., Macy, M., and Weber, I. (2017). Automated hate speech detection and the problem of offensive language. In Proceedings of the International AAAI Conference on Web and Social Media, volume 11.

de Gibert, O., Perez, N., García-Pablos, A., and Cuadros, M. (2018). Hate speech dataset from a white supremacy forum. arXiv preprint arXiv:1809.04444.

de Pelle, R. and Moreira, V. (2017). Offensive comments in the brazilian web: a dataset and baseline results. In Anais do VI Brazilian Workshop on Social Network Analysis and Mining, Porto Alegre, RS, Brasil. SBC.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

ElSherief, M., Nilizadeh, S., Nguyen, D., Vigna, G., and Belding, E. (2018). Peer to peer hate: Hate speech instigators and their targets. In Proceedings of the International AAAI Conference on Web and Social Media, volume 12.

EPTV (2021). Jornal da eptv segunda edição - campinas/piracicaba - pesquisa da unicamp busca desenvolver detector de discursos de ódio na internet.

Fortuna, P., Rocha da Silva, J., Soler-Company, J., Wanner, L., and Nunes, S. (2019). A hierarchically-labeled Portuguese hate speech dataset. In Proceedings of the Third Workshop on Abusive Language Online, pages 94–104, Florence, Italy. Association for Computational Linguistics.

Huyck, C. and Orengo, V. (2001). A stemming algorithm for the portuguese language. In String Processing and Information Retrieval, International Symposium on, page 0186, Los Alamitos, CA, USA. IEEE Computer Society.

Koushik, G., Rajeswari, K., and Muthusamy, S. K. (2019). Automated hate speech detection on twitter. In 2019 5th International Conference On Computing, Communication, Control And Automation (ICCUBEA), pages 1–4.

Kwok, I. and Wang, Y. (2013). Locate the hate: Detecting tweets against blacks. In Twenty-seventh AAAI conference on artificial intelligence.

Moura, M. A. (2016). O discurso do ódio em redes sociais. Lura Editorial (Lura Editoração Eletrônica LTDA-ME).

Plath, H. O., Paiva, M. E. O., Pinto, D. L., and Costa, P. D. P. (2021). Base de comentários de discurso de ódio contra mulheres mina-br: da concepção aos ataques por robôs. In XXIX Congresso de Iniciação Científica da UNICAMP, Campinas, Brasil.

Poletto, F., Basile, V., Sanguinetti, M., Bosco, C., and Patti, V. (2020). Resources and benchmark corpora for hate speech detection: a systematic review. Language Resources and Evaluation, pages 1–47.

Sohn, H. and Lee, H. (2019). Mc-bert4hate: Hate speech detection using multi-channel bert for different languages and translations. In 2019 International Conference on Data Mining Workshops (ICDMW), pages 551–559.

Tontodimamma, A., Nissi, E., Sarra, A., and Fontanella, L. (2021). Thirty years of research into hate speech: topics of interest and their evolution. Scientometrics, 126(1):157–179.

Warner, W. and Hirschberg, J. (2012). Detecting hate speech on the world wide web. In Proceedings of the second workshop on language in social media, pages 19–26. Association for Computational Linguistics.

Watanabe, H., Bouazizi, M., and Ohtsuki, T. (2018). Hate speech on twitter: A pragmatic approach to collect hateful and offensive expressions and perform hate speech detection. IEEE access, 6:13825–13835.

Published

2022-07-21

How to Cite

Plath, H. O., Paiva, M. E. O., Pinto, D. L., & Costa, P. D. P. (2022). Hate Speech Detection Against Women in Brazilian Portuguese Texts: Construction of the MINA-BR Database and Classification Model. Electronic Journal of Undergraduate Research on Computing, 20(3). Retrieved from https://journals-sol.sbc.org.br/index.php/reic/article/view/2696

Section

Special Issue: CTIC/CSBC