Exploring Few-Shot Approaches to Automatic Text Complexity Assessment in European Portuguese

Authors

Ribeiro, E., Antunes, D., Mamede, N., & Baptista, J.

DOI:

https://doi.org/10.5753/jbcs.2025.5820

Keywords:

Text Complexity, Readability, Few-Shot Prompting, Large Language Models

Abstract

The automatic assessment of text complexity plays an important role in language education. In this study, we shift the focus from L2 learners to adult native speakers with low literacy by exploring the new iRead4Skills dataset in European Portuguese. Furthermore, instead of relying on classical machine learning approaches or fine-tuning a pre-trained language model, we leverage the capabilities of prompt-based Large Language Models (LLMs), with a special focus on few-shot prompting approaches. We explore prompts with varying degrees of information, as well as different example selection strategies. Overall, the results of our experiments reveal that even a single example significantly improves model performance and that few-shot approaches generalize better than fine-tuned models. However, automatic complexity assessment remains a difficult and highly subjective task that is still far from solved.
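
To make the few-shot setup concrete, the sketch below illustrates one way such a prompt can be assembled: labelled examples are ranked by TF-IDF cosine similarity to the target text, and the most similar ones are prepended to the instruction. This is a minimal illustration under stated assumptions, not the pipeline used in the paper: the complexity labels, the instruction wording, the EXAMPLE_POOL and build_few_shot_prompt names, and the TF-IDF-based selection are all placeholders standing in for the example selection strategies compared in the study.

```python
# Illustrative sketch only: labels, instruction text, and the selection
# strategy are placeholders, not the exact setup used in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical labelled pool of (text, complexity level) pairs.
EXAMPLE_POOL = [
    ("O autocarro chega às nove horas.", "simple"),
    ("O presente regulamento estabelece as condições de acesso ao apoio.", "complex"),
]

def build_few_shot_prompt(target_text: str, k: int = 1) -> str:
    """Select the k most similar labelled examples and assemble a few-shot prompt."""
    texts = [text for text, _ in EXAMPLE_POOL]
    matrix = TfidfVectorizer().fit_transform(texts + [target_text])
    # Cosine similarity between the target text (last row) and each candidate example.
    similarities = cosine_similarity(matrix[len(texts)], matrix[:len(texts)]).ravel()
    top_indices = similarities.argsort()[::-1][:k]

    lines = ["Classify the complexity level of the following Portuguese text."]
    for i in top_indices:
        example_text, label = EXAMPLE_POOL[i]
        lines.append(f"Text: {example_text}\nLevel: {label}")
    lines.append(f"Text: {target_text}\nLevel:")
    return "\n\n".join(lines)

if __name__ == "__main__":
    # The resulting prompt would then be sent to the chosen LLM.
    print(build_few_shot_prompt("O horário dos comboios mudou esta semana.", k=1))
```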

Published

2025-08-21

How to Cite

Ribeiro, E., Antunes, D., Mamede, N., & Baptista, J. (2025). Exploring Few-Shot Approaches to Automatic Text Complexity Assessment in European Portuguese. Journal of the Brazilian Computer Society, 31(1), 690–710. https://doi.org/10.5753/jbcs.2025.5820

Issue

Vol. 31 No. 1 (2025)

Section

Articles