From Pampas to Pixels: Fine-Tuning Diffusion Models for Gaúcho Heritage
DOI: https://doi.org/10.5753/jbcs.2025.4166
Keywords: Diffusion Models, Text-to-Image Generation, Generative Artificial Intelligence, Computer Vision
Abstract
Generative Artificial Intelligence has become pervasive in society, with significant advances across various domains. In particular, among Text-to-Image (TTI) models, Latent Diffusion Models (LDMs) showcase remarkable capabilities in generating visual content from textual prompts. This paper addresses the potential of LDMs for representing local cultural concepts, historical figures, and endangered species. In this study, we use the cultural heritage of Rio Grande do Sul (RS), Brazil, as an illustrative case. Our objective is to contribute to a broader understanding of how generative models can help capture and preserve regional culture and historical identity. The article outlines the methodology, including subject selection, dataset creation, and the fine-tuning process. The results present the generated images alongside the challenges and feasibility of each concept. In conclusion, this work demonstrates the power of these models to represent and preserve unique aspects of diverse regions and communities.
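As a minimal sketch of the subject-driven generation workflow the abstract describes, the snippet below composes a DreamBooth-style prompt, where a rare identifier token binds the fine-tuned subject to its coarse class, and shows (in comments) how such a prompt might drive a fine-tuned pipeline via the Hugging Face diffusers library. The model name, LoRA path, and the "sks chimarrão gourd" subject are illustrative assumptions, not the paper's actual configuration.

```python
# Hedged sketch: prompt composition for subject-driven generation with a
# fine-tuned Latent Diffusion Model. The pipeline calls below are commented
# out because they download large model weights; paths are hypothetical.

def build_prompt(identifier: str, concept_class: str, context: str) -> str:
    """Compose a DreamBooth-style prompt: a rare identifier token (e.g. 'sks')
    binds the fine-tuned subject, followed by its coarse class and a scene."""
    return f"a photo of {identifier} {concept_class}, {context}"


if __name__ == "__main__":
    prompt = build_prompt("sks", "chimarrão gourd",
                          "on a wooden table in the pampas at sunset")
    print(prompt)

    # Illustrative use with diffusers (assumed checkpoint and LoRA path):
    # import torch
    # from diffusers import StableDiffusionPipeline
    # pipe = StableDiffusionPipeline.from_pretrained(
    #     "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    # ).to("cuda")
    # pipe.load_lora_weights("path/to/gaucho-concept-lora")  # hypothetical
    # image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
    # image.save("chimarrao.png")
```

The identifier token is deliberately meaningless so the fine-tuning step can attach the new visual concept to it without overwriting an existing word's semantics.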
License
Copyright (c) 2025 William Alberto Cruz-Castañeda, Marcellus Amadeus, André Felipe Zanella, Felipe Rodrigues Perche Mahlow

This work is licensed under a Creative Commons Attribution 4.0 International License.

