https://journals-sol.sbc.org.br/index.php/jbcs/issue/feedJournal of the Brazilian Computer Society2026-01-20T16:18:16+00:00Soraia Mussesoraia.musse@pucrs.brOpen Journal Systems<div class="cms-item cms-collection cms-collection--split cms-collection--untitled" data-fragment="784856"> <div class="cms-collection__row"> <div class="cms-collection__column"> <div class="cms-collection__column-inner"> <div class="cms-item cms-collection" data-fragment="784854"> <div id="aimsAndScope" class="cms-item placeholder placeholder-aimsAndScope"> <div class="placeholder-aimsAndScope_content"> <p>The <em>Journal of the Brazilian Computer Society</em> (JBCS) is an international journal that serves as a forum for disseminating innovative research in all fields of computer science and related subjects. Contents include theoretical, practical and experimental papers reporting original research contributions, as well as high-quality survey papers. Coverage extends to all computer science topics, computer systems development and formal and theoretical aspects of computing, including computer architecture; high-performance computing; database management and information retrieval; computational biology; computer graphics; data visualization; image and video processing; VLSI design and software-hardware codesign; embedded systems; geoinformatics; artificial intelligence; games, entertainment and virtual reality; natural language processing and much more.</p> <p>The JBCS team is committed to publishing all high-quality articles regardless of the authors' ability to pay. Authors who are unable to pay the article processing charge (APC) are encouraged to contact the editors (editorial@journal-bcs.com); the JBCS team will provide support in finding alternative funding. 
In particular, a grant from the Brazilian Internet Steering Committee (http://nic.br/) helps sponsor the publication of many JBCS articles.</p> </div> </div> </div> </div> </div> </div> </div>https://journals-sol.sbc.org.br/index.php/jbcs/article/view/4636OneTrack-M: A Multitask Approach for Transformer-Based MOT Models2025-11-07T12:55:00+00:00Luiz Carlos Silva de Araujoluiz.clssss.a@gmail.comCarlos Mauricio Seródio Figueiredocfigueiredo@uea.edu.br<p>Multi-Object Tracking (MOT) is a critical problem in computer vision, essential for understanding how objects move and interact in videos. This field faces significant challenges such as occlusions and complex environmental dynamics, impacting model accuracy and efficiency. While traditional approaches have relied on Convolutional Neural Networks (CNNs), the introduction of transformers has brought substantial advancements. This work introduces OneTrack-M, a transformer-based MOT model that enhances tracking computational efficiency and accuracy. Our approach introduces the transformer encoder as the model backbone, significantly reducing processing time and increasing inference speed. Additionally, we employ innovative data preprocessing and multitask training techniques to address occlusion and diverse objective challenges within a single set of weights. Experimental results demonstrate that OneTrack-M achieves at least 25% faster inference times compared to state-of-the-art models in the literature while maintaining or improving tracking accuracy metrics. 
These improvements highlight the potential of the proposed solution for real-time applications such as autonomous vehicles, surveillance systems, and robotics, where rapid responses are crucial for system effectiveness.</p>2026-03-27T00:00:00+00:00Copyright (c) 2026 Luiz Carlos Silva de Araujo, Carlos Mauricio Seródio Figueiredohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5309Multiclass Classification for Detection of GPS Spoofing and Jamming Attacks on UAVs2025-09-02T16:16:48+00:00Gustavo Gualberto Rocha de Lemosgustavo.gualberto@aluno.ufabc.edu.brRodrigo Augusto Cardoso da Silvacardoso.rodrigo@ufabc.edu.br<p>Unmanned Aerial Vehicles (UAVs) are increasingly being employed across various domains, making them more vulnerable to a range of attacks, particularly cyber threats. These vehicles usually rely on a global navigation satellite system (GNSS), such as the Global Positioning System (GPS) satellites, for location and navigation data, which can be exploited by adversaries launching attacks using fake GPS signals. To safeguard UAVs from GPS Jamming and GPS Spoofing attacks, this paper proposes an Intrusion Detection System (IDS) that utilizes machine learning techniques for detecting and identifying such attacks. The IDS analyzes GPS signal samples representing normal operation, GPS Jamming, and three types of GPS Spoofing attacks. It relies on machine learning, with models trained and tested for binary class and multiclass classification. The binary class version aims to identify an occurrence of any attack, irrespective of type, as suggested by previous literature. However, the novelty of this work lies in the multiclass version, which enables the identification of attack types — an essential factor in determining the most effective protective measures and providing data for forensic investigations. Stacking, an ensemble machine learning method, yielded the best results, achieving an accuracy rate of 96.91%. 
Furthermore, the proposed multiclass IDS reduced false negatives to 0.71%, making it less likely than the binary class version to overlook attacks, which is crucial in real UAV deployments.</p>2026-03-17T00:00:00+00:00Copyright (c) 2026 Gustavo Gualberto Rocha de Lemos, Rodrigo Augusto Cardoso da Silvahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5365Enhancing Red Team Agent Learning with the Kill Chain Catalyst Algorithm in Capture the Flag Scenarios2025-06-10T16:22:17+00:00Antonio Hortaantonio@horta.net.brAnderson dos Santosanderson@ime.eb.brRonaldo Goldschmidtronaldo.rgold@ime.eb.br<p>With the advancement of technology, tasks once performed by humans have increasingly transitioned to machines or agents equipped with artificial intelligence, including various cyber security domains. From the perspective of real-world cyber attacks, executing actions with minimal failures and steps is critical to reducing the likelihood of exposure. Although research on autonomous cyber attacks predominantly employs Reinforcement Learning (RL), this approach has gaps, such as poor performance with limited training data, low resilience in dynamic environments, and limited interpretability of decision-making policies. Therefore, this paper introduces Kill Chain Catalyst (KCC), an <em>RL</em> algorithm based on a Gini impurity-based weighted random forest that prioritizes interpretability, efficiency in scenarios with limited experience, and resilience in the dynamic environments explored by <em>RL</em> agents. <em>KCC</em> leverages decision tree logic for enhanced interpretability and employs a catalyst module inspired by genetic alignment to optimize the search for efficient attack sequences. More than 150 attack experiments were conducted to evaluate learning in terms of offset, speed, and generalization. 
The analysis focused on the steps, rewards, and failures of agents using the RL algorithms <em>KCC</em>, <em>PPO</em>, <em>DQN</em>, <em>TRPO</em>, and <em>A2C</em>, within a <em>Capture the Flag</em> tournament setting. Both static and dynamic scenarios with limited learning experiences were considered. These experiments demonstrate the superior performance of <em>KCC</em>, revealing differences of up to 198.69% for steps, 129.43% for rewards, and 1096.39% for failures when performing attacks using <em>KCC</em> compared with the other algorithms.</p>2026-03-16T00:00:00+00:00Copyright (c) 2026 Antonio Horta, Anderson dos Santos, Ronaldo Goldschmidthttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5376Evaluation of explainable artificial intelligence techniques in the context of credit card fraud detection2025-11-21T17:16:20+00:00Gabriel Mendes de Limamendes.gabriel@ufabc.edu.brPaulo Henrique Pisanipaulo.pisani@ufabc.edu.br<p>Artificial intelligence has been employed in several applications in the financial sector. This paper deals with one of these applications: fraud detection in credit card transactions. In this context, a number of machine learning algorithms can be used to obtain models which automate the classification of a transaction as fraudulent or genuine. However, some of these machine learning algorithms are not directly interpretable. The current paper presents an evaluation of explainable artificial intelligence techniques SHAP and LIME applied to models for fraud detection in credit card transactions. Along with the results of the evaluation, the paper discusses the effectiveness and need for explainable artificial intelligence techniques. This paper extends a previous paper by including hyperparameter tuning, new results and an evaluation of the processing time to obtain explanations. 
The reported results suggest that SHAP obtained better results than LIME, although LIME required less processing time once the LIME explainer had been built.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Gabriel Mendes de Lima, Paulo Henrique Pisanihttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5390Improved Biclique Cryptanalysis of the Lightweight Cipher FUTURE2025-07-05T17:25:20+00:00Gabriel de Carvalhogabrielc@eng.uerj.brLuis Kowadaluis@ic.uff.br<p>In the past decade, lightweight cryptography has been of much interest in academia, especially regarding the cryptanalysis of such ciphers. The National Institute of Standards and Technology (NIST) is one of the entities responsible for this interest, given that in 2019 it promoted a public process to choose the American standard for lightweight cryptography. In 2022, the FUTURE cipher was published and has since been the target of much cryptanalysis in a very short period of time, including integral, meet-in-the-middle and differential attacks. The objective of this paper is to present four biclique attacks, obtained through semi-automatic search, that improve on the previously published one in terms of time, memory and data complexities. Our fastest attack requires 2<sup>124.38</sup> full computations of the cipher to run, while requiring only 2<sup>24</sup> data pairs and negligible memory. We also present the fastest unbalanced biclique attack and star attack to our knowledge. Only one published attack on FUTURE is faster than our attacks: an integral attack taking 2<sup>123.70</sup> computations that does not use the full codebook of data, i.e. it requires 2<sup>63</sup> of the 2<sup>64</sup> possible plaintext/ciphertext pairs. 
Still, when compared to it, our attacks use much less data while being only slightly slower, which presents a good trade-off.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Gabriel de Carvalho, Luis Kowadahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5424Building flexible databases by using web services for computer-aided diagnosis of cardiomyopathies: from conceptual definition to usability evaluation2025-10-16T07:07:54+00:00Larissa Terto Alvimlterto17@gmail.comVagner Mendonça Gonçalvesvagner.goncalves@usp.brFátima L. S. Nunesfatima.nunes@usp.br<p>Computer-aided diagnosis (CAD) systems based on medical images and records apply computational techniques to process data and extract features from them to provide a second opinion to the health professional. A diverse and organized set of images and records is necessary to develop and validate such systems. However, medical data are generally obtained in a non-standardized way. With each new research and development project in this area, specific data models need to be built to organize and standardize these data and enable their use in the construction of models and computational systems. This article presents a flexible and generic database modeled and implemented to persist Cardiac Magnetic Resonance exams aiming to support the development of CAD schemes of cardiomyopathies. Furthermore, a web application was developed to enable data search and retrieval from the database. An experiment was carried out to evaluate the interface usability of the web application. Results showed that it is possible to develop a generic and flexible DB model, which can be used in several CAD applications. Additionally, the implemented interface received positive evaluations on its functionalities and usability, and users were capable of performing the intended tasks with correct outcomes.</p>2026-03-20T00:00:00+00:00Copyright (c) 2026 Larissa Terto Alvim, Vagner Mendonça Gonçalves, Fátima L. S. 
Nuneshttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5449Survey of Brazilian Open Budget Data Portals: Query Interfaces and Dashboards2025-07-17T14:21:39+00:00Kaline B. F. Mesquitamesquitabfkaline@gmail.comDennis G. Balreiradgbalreira@inf.ufrgs.brAndre S. Spritzerspritzer@gmail.comCarla M. D. S. Freitascarla@inf.ufrgs.br<p>To promote transparency, the Brazilian government provides access to public data through web portals featuring query interfaces and dashboards. While query interfaces are used by more experienced users to gather data for further analyses, dashboards that include visualizations help a broader audience consult and explore data. A domain of particular complexity that benefits from the use of these interfaces is government spending and budgets. This study analyzes dashboards and query interfaces of government budget data through qualitative research based on a survey. Focusing on Brazil's budget transparency initiative, we examined 83 interfaces in total: 30 dashboards and 53 query interfaces from federal, state, and major city governments. This survey assesses these interfaces using design patterns for general-purpose dashboards and design principles for open government data dashboards. Our findings reveal a critical weakness: while most portals provide access to budget data, they largely neglect user-centered design, failing to provide the necessary context or consider the data literacy of their audience. This creates a significant "transparency gap" that undermines genuine accountability and demonstrates the need for a fundamental shift in the design of these essential public tools.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Kaline B. F. Mesquita, Dennis G. Balreira, Andre S. Spritzer, Carla M. D. S. 
Freitashttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5482Subspace representations in deep neural networks: A survey2025-09-04T20:27:10+00:00Stéfane Rêgo Gandrastefanerego@gmail.comBernardo Bentes Gattogatto_b@mti.co.jpEulanda Miranda dos Santosemsantos@icomp.ufam.edu.br<p>Computer vision applications often involve processing large-scale multidimensional data, requiring methods that are both efficient and accurate. Traditional pattern recognition methods based on subspace representations offer low computational complexity but typically underperform compared to deep learning models in terms of recognition accuracy. This study aims to explore and analyze the integration of subspace representations within deep learning frameworks to leverage the advantages of both approaches. We conducted a comprehensive survey of existing methods that combine subspace representation techniques with deep neural networks. We propose a taxonomy to categorize these methods into three distinct groups based on their integration strategies. The reviewed methods demonstrate that incorporating subspace representations can enhance the performance and efficiency of deep learning models. The taxonomy helps to clarify the landscape of these hybrid approaches and identifies trends in methodological development. The surveyed approaches demonstrate a clear methodological evolution, contributing to enhanced outcomes in various real-world applications.</p>2026-03-27T00:00:00+00:00Copyright (c) 2026 Stéfane Rêgo Gandra, Bernardo Bentes Gatto, Eulanda Miranda dos Santoshttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5545EasyGuard: A Gamified App for Generating Strong and Memorable Passwords2025-05-14T07:37:41+00:00Hugo L. Romãohugo8romao@gmail.comMarcelo H. O. Henklainmarcelo.henklain@ufrr.brFelipe L. Lobofelipe.lobo@ufrr.brEduardo L. 
Feitosaefeitosa@icomp.ufam.edu.br<p>Although the use of online services has increased substantially over the past decade, the strength of user-created passwords has remained at concerning levels. This study aimed to develop and evaluate the efficiency of a gamified application in promoting the behavior of designing strong passwords. Two rounds of experiments were conducted, each lasting nine days. In the first experiment (<em>n</em> = 10), we evaluated the passwords generated based on user inputs compared to random passwords. Our findings showed that our app generated passwords with an improvement of 68.43 percentage points in the memorization test, 4.87 p.p. in the typing test, and 60.38 p.p. in the combined memorization and typing test. In the second experiment (<em>n</em> = 15), we incorporated a dictionary-based password generation policy into the evaluation and applied an automated tool for data collection. User input-based passwords outperformed random ones by 87.26 p.p. in the memorization test, 2.75 p.p. in the typing test, and 85.92 p.p. in the combined test. Meanwhile, dictionary-based passwords showed improvements of 54.32 p.p., 1.69 p.p., and 69.70 p.p., respectively. Our approach proved promising in promoting strong and memorable passwords. Nonetheless, EasyGuard requires further development and should be further investigated in future studies.</p>2026-03-17T00:00:00+00:00Copyright (c) 2026 Hugo L. Romão, Marcelo H. O. Henklain, Felipe L. Lobo, Eduardo L. Feitosahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5548High-Performance Elliptic Curve Cryptography: A SIMD Approach to Modern Curves (Thesis Distillation)2025-11-21T17:18:03+00:00Armando Faz-Hernandezarmfazh@gmail.comJulio Lópezjlopez@ic.unicamp.br<p>Cryptography based on elliptic curves is endowed with efficient methods for public-key cryptography. 
Recent research has shown the superiority of the Montgomery and Edwards curves over the Weierstrass curves as they require fewer arithmetic operations. Using these modern curves has, however, introduced several challenges to the cryptographic algorithm's design, opening up new opportunities for optimization. Our main objective is to propose algorithmic optimizations and implementation techniques for cryptographic algorithms based on elliptic curves. In order to speed up the execution of these algorithms, our approach relies on the use of extensions to the instruction set architecture. In addition to those specific for cryptography, we use extensions that follow the Single Instruction, Multiple Data (SIMD) parallel computing paradigm. In this model, the processor executes the same operation over a set of data in parallel. We investigated how to apply SIMD to the implementation of elliptic curve algorithms. As part of our contributions, we design parallel algorithms for prime field and elliptic curve arithmetic. We also design a new three-point ladder algorithm for the scalar multiplication <em>P+kQ</em>, and a faster formula for calculating <em>3P</em> on Montgomery curves. These algorithms have found applicability in isogeny-based cryptography. Using SIMD extensions such as SSE, AVX, and AVX2, we develop optimized implementations of the following cryptographic algorithms: X25519, X448, SIDH, ECDH, ECDSA, EdDSA, and qDSA. Performance benchmarks show that these implementations are faster than existing implementations in the state of the art. Our study confirms that using extensions to the instruction set architecture is an effective tool for optimizing implementations of cryptographic algorithms based on elliptic curves. 
May this be an incentive not only for those seeking to speed up programs in general but also for computer manufacturers to include more advanced extensions that support the increasing demand for cryptography.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Armando Faz-Hernandez, Julio Lópezhttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5558A Triad of Defenses to Mitigate Poisoning Attacks in Federated Learning2025-05-19T14:52:35+00:00Blenda Oliveira Mazettoblenda.mazetto@uel.brBruno Bogaz Zarpelãobrunozarpelao@uel.br<p>Federated learning (FL) enables the training of machine learning models on decentralized data, potentially improving data privacy. However, the FL distributed architecture is vulnerable to poisoning attacks. In this paper, we propose an FL method capable of mitigating these attacks through a triad of defense strategies: organizing clients into groups, evaluating the local performance of global models during training, and using a voting scheme during the inference phase. The proposed approach first divides the clients into randomly sampled groups, each generating a distinct global model. Each client trains a local model on their private data and submits it to the central server. The central server aggregates the local models within each group to generate the global models. Then, each client receives all global models, selects the best performing one as their new local model, and the process repeats until training is complete. During the inference phase, each client classifies its inputs according to a majority-based voting scheme among the global models. 
Our experiments using the HAR and MNIST datasets demonstrate that our method can effectively mitigate poisoning attacks without compromising the global model's performance.</p>2026-03-16T00:00:00+00:00Copyright (c) 2026 Blenda Oliveira Mazetto, Bruno Bogaz Zarpelãohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5565Partial integrity, authenticity and belongingness using modification-tolerant signature schemes2025-06-19T07:18:13+00:00Anthony Bernardo Kamersanthony.kamers@posgrad.ufsc.brGustavo Zamboningustavo.zambonin@posgrad.ufsc.brThaís Bardini Idalinothais.bardini@ufsc.brPaola de Oliveira Abelpaola.abel@grad.ufsc.brJean Everson Martinajean.martina@ufsc.br<p>Digital signatures allow us to ensure that the signed digital data is authentic and has not been modified. However, even a single bit modification in the data invalidates the entire signature. In INDOCRYPT '19, Idalino et al. presented an efficient modification-tolerant signature scheme (MTSS) framework using combinatorial group testing techniques, allowing the location and correction of modified parts of the signed data. In this work, we implement their framework and discuss the practical performance of the solution. We also propose various necessary auxiliary algorithms not explored in the initial work, such as the division of data into blocks and the generation of the underlying combinatorial structure needed for the signature generation. Moreover, we propose a novel use case of the framework, which we call the <em>belongingness framework</em>. This scheme allows the verification of the integrity and authenticity of a subset of the signed data without having access to the whole data. 
This is particularly interesting in big data applications, where access to the whole signed data is prohibitive due to storage limitations.</p>2026-03-16T00:00:00+00:00Copyright (c) 2026 Anthony Bernardo Kamers, Gustavo Zambonin, Thaís Bardini Idalino, Paola de Oliveira Abel, Jean Everson Martinahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5687A survey of social media stance detection using non-textual features2025-07-12T19:47:37+00:00Laís Carraro Leme Cavalheirolaiscarraro@usp.brIvandré Paraboniivandre.paraboni@gmail.com<p>Stance detection is the computational task of estimating an individual's attitude towards a given target topic, which is often of a political or moral nature. In traditional NLP fashion, models of this kind have relied mainly on learning features extracted from social media text. However, social media may provide many other types of non-content information in conjunction with text, such as friend networks, interactions with other users, etc. These knowledge sources, despite being potentially useful for stance prediction, remain relatively little discussed in existing surveys of the field. To fill this gap in the literature, this article presents a survey of stance detection research focusing on the use of network-related features and on how these are combined with more standard text models.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Laís Carraro Leme Cavalheiro, Ivandré Parabonihttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5703Turbocharging Brazilian Mergers and Acquisitions: Questions & Answers Evaluation2025-10-02T11:07:48+00:00Francis Spiegel Rubinfran.spiegel@edu.unirio.brPedro Nuno de Souza Mourapedro.moura@uniriotec.brAdriana Cesario de Faria Alvimadriana@uniriotec.br<p>Economic power abuse is a concern in Brazil, where CADE (the Administrative Council for Economic Defense) combats anti-competitive behaviors to ensure fair competition. 
Artificial intelligence (AI) can aid CADE by identifying and extracting relevant information from technical reports published in Brazilian Portuguese, improving the detection and prevention of economic abuse. This paper presents a case study using AI to improve regulatory reviews of CADE documents via a Retrieval-Augmented Generation (RAG) pipeline architecture. Our key contribution is the creation of a specialized Questions & Answers benchmark dataset and a pipeline evaluation methodology, providing a standardized framework for Portuguese-language regulatory document analysis. A chain-of-thought (CoT) approach was used for problem solving: it leverages the RAG retrieval mechanism to access relevant information and incorporates the sequential reasoning of the CoT framework to generate responses that follow a logical flow of ideas, thus enhancing response accuracy. A vector database employing cosine similarity, combined with metadata filters, was used to retrieve the main arguments, reducing hallucinations and improving Large Language Model (LLM) performance. RAG metrics were then combined with a robust human fact-check assessment to validate the pipeline. 
Our findings establish a new benchmark for Questions & Answers evaluation in Brazilian Mergers and Acquisitions, demonstrating that the proposed strategy effectively enhances the analysis of organizational merger and acquisition reports, unlocking substantial benefits for society.</p>2026-03-17T00:00:00+00:00Copyright (c) 2026 Francis Spiegel Rubin, Pedro Nuno de Souza Moura, Adriana Cesario de Faria Alvimhttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5851Visually Comparing Graph Vertex Ordering Algorithms through Geometrical and Topological Approaches2025-10-12T13:25:13+00:00Karelia Vilca Salinaskareliavs17@gmail.comVictor Barellavictorhb@icmc.usp.brThales Vieirathales@ic.ufal.brLuis Gustavo Nonatognonato@icmc.usp.br<p>Graph vertex ordering is a resource widely employed in spatial data analysis, particularly in the urban analytics context, where street graphs are frequently used as spatial discretization for modeling and simulation. Vertex ordering is also important for visualization purposes, as many methods require the vertices to be arranged and displayed in a well-defined order to enable the visual identification of non-trivial patterns. The primary goal of vertex ordering methods is to find an ordering that preserves neighborhood relations. However, the structural complexity of graphs employed in real-world applications leads to unavoidable distortions in the ordering process. Therefore, comparing different vertex ordering methods is fundamental to enable effective analysis and selection of the most appropriate method in each application. Although several metrics have been proposed to assess spatial vertex ordering, they typically focus on measuring the quality of the ordering globally. Global ordering assessment does not enable the analysis and identification of locations where distortions are more pronounced, hampering the analytical process. 
Visual evaluation of the vertex ordering mechanisms is particularly valuable in this context, as it allows analysts to distinguish between methods based on their performance within a single visualization, assess distortions, identify regions with anomalous behavior, and, in urban contexts, explain spatial inconsistencies in the ordering. This work introduces a visualization-assisted tool to assess vertex ordering techniques, having urban analytics as the application focus. Specifically, we evaluate geometric and topological vertex ordering approaches using urban street graphs as the basis for comparisons. The visual tool builds upon existing and newly proposed metrics, which are validated through experiments on urban data from multiple cities, demonstrating that the proposed methodology is effective in assisting users in selecting a suitable vertex ordering technique, fine-tuning hyperparameters, and identifying regions with high ordering distortions.</p>2026-03-16T00:00:00+00:00Copyright (c) 2026 Karelia Vilca Salinas, Victor Barella, Thales Vieira, Luis Gustavo Nonatohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5871LiwTERM-r: A Revised Lightweight Transformer-based Model for Multimodal Skin Lesion Detection Robust to Incomplete Input2025-09-04T20:21:29+00:00Luis Antonio de Souza Júniorsouzajr.la@gmail.comAndré Georghton Cardoso Pachecoapacheco@inf.ufes.brThiago Oliveira dos Santostodsantos@inf.ufes.brWyctor Fogos da Rochawyctor.rocha@edu.ufes.brPedro Henrique Bouzonpedro.bouzon@edu.ufes.brChristoph Palmchristoph.palm@oth-regensburg.deJoão Paulo Papajoao.papa@unesp.br<p>As the most common type of cancer in the world, skin cancer accounts for approximately 30% of all diagnosed tumor-based lesions. Early diagnosis can reduce mortality and prevent disfiguring in different skin regions. 
In recent years, machine learning techniques, especially deep learning, have achieved promising results in this task, with studies demonstrating that combining patients' clinical anamneses with images of the lesion is essential for improving the correct classification of skin lesions. Even so, the meaningful use of anamneses together with multiple collected images of the same skin lesion remains little explored and requires further investigation. Thus, this project aims to contribute to developing multimodal machine learning-based models to solve the skin lesion classification problem by employing a lightweight transformer model that is robust to missing clinical information input. As a main hypothesis, models can be fed by multiple images from different sources as input along with clinical anamneses from the patient's historical evaluations, leading to a more factual and trustworthy diagnosis. Our model deals with the non-trivial task of combining images and clinical information concerning the skin lesions in a lightweight transformer architecture that does not demand high computation resources or even all the information from the anamneses but still presents competitive classification results.</p>2026-03-16T00:00:00+00:00Copyright (c) 2026 Luis Antonio de Souza Júnior, André Georghton Cardoso Pacheco, Thiago Oliveira dos Santos, Wyctor Fogos da Rocha, Pedro Henrique Bouzon, Christoph Palm, João Paulo Papahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5888Crowd-Powered Sampling for Machine Learning: Leveraging Citizen Scientist Response Patterns in AutoML Workflows2025-10-12T13:27:25+00:00Hugo Resendehresende@unifesp.brEduardo B. Netoebneto@unifesp.brFabio A. M. Cappabiancocappabianco@unifesp.brÁlvaro L. Fazendaalvaro.fazenda@unifesp.brFabio A. Fariafabio.faria@tecnico.ulisboa.pt<p>Defining effective models for data classification is challenging, especially in complex contexts. 
Automated Machine Learning (AutoML) tools can assist in this process by generating rankings tailored to the nature of the data and the problem. In this work, we investigate the performance of five classifiers applied to the task of deforestation segment classification, using data labeled through a citizen science campaign from the ForestEyes project. We selected SVM, Ridge, AdaBoost, KNN, and MLP models based on a ranking generated with the PyCaret AutoML library, prioritizing diverse modeling approaches. Initially, the performance of the models is assessed using an incremental training strategy based on the entropy of the volunteers' classifications. Then, a new training strategy is proposed based on the median response time of volunteers when evaluating each segment, exploring three ordering strategies: ascending, descending, and edge-based. Experimental results aligned with the PyCaret ranking, with SVM achieving the best performance, followed by Ridge and AdaBoost, especially when trained on smaller and more reliable data subsets. Both the entropy-based approach and the new strategy using median response time demonstrated strong potential to efficiently train machine learning models in scenarios with scarce data, typical of citizen science campaigns.</p>2026-03-16T00:00:00+00:00Copyright (c) 2026 Hugo Resende, Eduardo B. Neto, Fabio A. M. Cappabianco, Álvaro L. Fazenda, Fabio A. Fariahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/6448An approach to Data Literacy through a Personalized Interactive LGPD Guide using LLM for Educators2025-10-15T13:39:48+00:00César Murilo da Silva Juniorcesar.junior@serpro.gov.brSilvio E. Quincozessilvioquincozes@unipampa.edu.brJuliana Saraivajulianajags@dcx.ufpb.brRafael D. Araújorafael.araujo@ufu.br<p>The Brazilian General Data Protection Law (LGPD) was created to protect the fundamental rights of freedom and privacy of Brazilian citizens.
Since its implementation, it has brought new challenges to all institutions established in Brazil, whether public or private, requiring them to adapt their personal data processing practices. In the context of higher education, many professors face difficulties in understanding and properly applying the guidelines of this legislation in their daily activities. This work proposes the development of an approach to data literacy through an interactive guide, based on practical scenarios, to support educators in the process of complying with the LGPD. The proposed system uses the OpenAI API to offer personalized support in real time. Ten representative academic scenarios were implemented, in which users can interact through multiple-choice questions followed by a chat with the guide. The results showed that, despite initial usability limitations, the system represents a promising tool to promote the comprehension of the LGPD among teachers. We observed that our approach can facilitate compliance with the legislation, but it requires accessibility and usability improvements to ensure broader and easier adoption.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 César Murilo da Silva Junior, Silvio E. Quincozes, Juliana Saraiva, Rafael D. Araújohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/6635AI-Driven Hierarchical Taxonomy Generation from Emergency Call Transcripts2025-10-27T14:16:00+00:00Juan Gabriel Flores Sanchezjuanfloressanchez@es.uazuay.edu.ecMarcos Orellanamarore@uazuay.edu.ecPatricio Santiago García-Monterosantyg20@est.uazuay.edu.ecJorge Luis Zambrano-Martinezjorge.zambrano@uazuay.edu.ec<p class="p1">This article presents a case study on hierarchical topic modeling for emergency call transcripts from Ecuador's ECU 911 service. We introduce a hybrid methodology that first generates a taxonomy from unlabeled data using <em>BERTopic</em> and agglomerative clustering, and then employs embedding-based similarity for multi-label classification.
By leveraging multilingual embeddings (<em>LaBSE</em>) together with dimensionality reduction and clustering algorithms (<em>UMAP</em> and <em>HDBSCAN</em>), we identified 23 coherent topics, demonstrating a practical balance between accuracy and operational applicability. The key result is a significant reduction in Hamming Loss and an F1-score of 0.4951, achieved without the need for pre-labeled data. This underscores the method's primary practical significance: offering a scalable, automated solution for emergency management centers to rapidly categorize complex incidents, thereby enhancing situational awareness and resource allocation. The integration of <em>LLaMA 3</em> for automated label generation further optimized semantic interpretation, highlighting the potential of language models in critical, resource-constrained domains.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Juan Gabriel Flores Sanchez, Marcos Orellana, Patricio Santiago García-Montero, Jorge Luis Zambrano-Martinezhttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5787Comparing Explainable AI Techniques In Language Models: A Case Study For Fake News Detection in Portuguese2025-07-25T20:33:45+00:00Jéssica Vicentinijvicentini99@gmail.comRafael Bezerra de Menezes Rodriguesrafael.rodrigues@unesp.brArnaldo Candido Juniorarnaldo.candido@unesp.brIvan Rizzo Guilhermeivan.guilherme@unesp.br<p>Language models are widely used in natural language processing, but their complexity makes interpretation difficult, limiting their adoption in critical decision-making. This work explores Explainable Artificial Intelligence (XAI) techniques, such as LIME and Integrated Gradients (IG), to understand these models. The study evaluates the effectiveness of BERTimbau in classifying Portuguese news as true or fake, using the FakeRecogna and Fake.Br Corpus datasets.
In the experiments, LIME proved easier to interpret than IG, and both methods showed limitations when applied to texts, as they focus only on the morphological and lexical levels, ignoring other important linguistic levels.</p>2026-01-21T00:00:00+00:00Copyright (c) 2026 Jéssica Vicentini, Rafael Bezerra de Menezes Rodrigues, Arnaldo Candido Junior, Ivan Rizzo Guilhermehttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5354A Coding-Efficiency Analysis of HEVC Encoder Embedded in High-End Mobile Chipsets2025-06-06T13:35:56+00:00Vítor Costavscosta@inf.ufpel.edu.brMurilo Perlebergmrperleberg@inf.ufpel.edu.brLuciano Agostiniagostini@inf.ufpel.edu.brMarcelo Portoporto@inf.ufpel.edu.br<p>High-end mobile devices require dedicated hardware for real-time video encoding and decoding processes. However, the inherent complexity of the video encoding process, combined with the physical limitations imposed by hardware design such as energy consumption, encoding time, memory usage, and heat dissipation, demands the implementation of various constraints and limitations in commercial hardware to simplify and make them feasible for general use. The High Efficiency Video Coding (HEVC) standard is the main targeted video encoder for processing high-resolution videos in high-end chipsets. This paper aims to analyze the HEVC encoder implemented in three commercial chipsets found in high-end smartphones (Apple iPhone 14 Pro, Samsung Galaxy S23 Plus, and Redmi Note 10S) from three major mobile chip manufacturers (Apple, Qualcomm, and MediaTek), considering the impacts of video encoder limitations on encoding efficiency (BD-Rate) and encoding time.
The results in this paper may serve as a comparative foundation for hardware designers and future work in the field, as they expose the encoding-efficiency drawbacks and the encoding-time gains that commercial chipsets exhibit in their HEVC encoders.</p>2026-01-22T00:00:00+00:00Copyright (c) 2026 Vítor Costa, Murilo Perleberg, Luciano Agostini, Marcelo Portohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/4242Learning on hierarchical trees with Random Forest2025-08-26T13:23:42+00:00Raquel Almeidaraquel1908@gmail.comLaurent Amsaleglaurent.amsaleg@irisa.frZenilton Kleber G. do Patrocínio Júniorzenilton@pucminas.brEwa Kijakewa.kijak@irisa.frSimon Malinowskisimon.malinowski@irisa.frSilvio Jamil Ferzoli Guimarãessjamil@pucminas.br<p style="font-weight: 400;">Hierarchies, as described in mathematical morphology, represent nested regions of interest and provide mechanisms to create coherent data organization. They facilitate high-level analysis and management of large amounts of data. Represented as hierarchical trees, they have formalisms intersecting with graph theory and generalizable applications. Due to the deterministic algorithms, the multiform representations, and the absence of a direct quality evaluation, it is hard to insert hierarchical information into a learning framework and benefit from recent advances. Researchers usually tackle this problem by refining the hierarchies for a specific media and assessing their quality for a particular task. The downside of this approach is that it depends on the application, and the formulations limit the generalization to similar data. This work aims to create a learning framework that can operate with hierarchical data and is agnostic to the input and application. The idea is to transform the data into a regular representation required by most learning models while preserving the rich information in the hierarchical structure.
The proposed methods take edge-weighted image graphs and hierarchical trees as input, and different proposals are evaluated on the edge detection and segmentation tasks. The learning model is the Random Forest, a fast and scalable method for working with high-dimensional data. Results demonstrate that it is possible to create a learning framework that depends only on the hierarchical data and achieves state-of-the-art performance in multiple tasks.</p>2026-01-26T00:00:00+00:00Copyright (c) 2026 Raquel Almeida, Laurent Amsaleg, Zenilton Kleber G. do Patrocínio Júnior, Ewa Kijak, Simon Malinowski, Silvio Jamil Ferzoli Guimarãeshttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5961Statistical Invariance vs. AI Safety: Why Prompt Filtering Fails Against Contextual Attacks2025-07-28T12:52:52+00:00Aline Iosteioste@ime.usp.brSarajane Marques Peressarajane@usp.brMarcelo Fingermfinger@ime.usp.br<p>Large Language Models (LLMs) are increasingly deployed in high-stakes applications, yet their alignment with ethical standards remains fragile and poorly understood. To investigate the probabilistic and dynamic nature of this alignment, we conducted a black-box evaluation of nine widely used LLM platforms, anonymized to emphasize the underlying mechanisms of ethical alignment rather than model benchmarking. We introduce the Semantic Hijacking Method (SHM) as an experimental framework, formally defined and grounded in probabilistic modeling, designed to reveal how ethical alignment can erode gradually, even when all user inputs remain policy-compliant. Across three experimental rounds (324 total executions), SHM achieved a 97.8% success rate in eliciting harmful content, with failure rates progressing from 93.5% (multi-turn conversations) to 100% (both refined sequences and single-turn interactions), demonstrating that vulnerabilities are inherent to semantic processing rather than conversational memory.
A qualitative cross-linguistic analysis revealed cultural variations in harmful narratives, with Brazilian Portuguese responses frequently echoing historical and socio-cultural biases, making them more persuasive to local users. Overall, our findings demonstrate that ethical alignment is not a static barrier but a dynamic and fragile property that challenges binary safety metrics. Due to potential risks of misuse, all prompts and outputs are made available exclusively to authorized reviewers under ethical approval, and this publication focuses solely on reporting the research findings.</p>2026-01-27T00:00:00+00:00Copyright (c) 2026 Aline Ioste, Sarajane Marques Peres, Marcelo Fingerhttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5684An Autonomous Hybrid Data Partitioning Approach for NewSQL Databases2025-07-22T14:04:23+00:00Geomar A. Schreinergschreiner@uffs.edu.brRafael de Santiagor.santiago@ufsc.brDenio Duarteduarte@uffs.edu.brRonaldo dos Santos Mellor.mello@ufsc.br<p>Several applications, such as online games and the financial market, require specific data management features, including large data volume support, data streaming, and the processing of thousands of OLTP transactions per second. In general, traditional relational databases are not suitable for these requirements. NewSQL is a new generation of databases that combines high scalability and availability with ACID support, being a promising solution for these kinds of applications. Although data partitioning is an essential feature for tuning relational databases, it is still an open issue for NewSQL databases. This paper proposes an automated approach for hybrid data partitioning that minimizes the number of distributed transactions and keeps the system well-balanced. In order to demonstrate its efficacy, we compare our solution with an optimal partitioning solution generated by a solver and a state-of-the-art baseline.
The experiments show that the quality of the partitioning scheme is similar to that of the optimal solution and surpasses the state-of-the-art approach in the number of distributed transactions.</p>2026-02-02T00:00:00+00:00Copyright (c) 2026 Geomar A. Schreiner, Rafael de Santiago, Denio Duarte, Ronaldo dos Santos Mellohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5646Limitless Feature Selection: Revolutionizing Evaluation with MH-FSF2025-05-13T14:38:37+00:00Vanderson Rochavanderson@ufam.edu.brDiego Kreutzdiegokreutz@unipampa.edu.brHendrio Bragançahendrio.luis@icomp.ufam.edu.brEduardo Feitosaefeitosa@icomp.ufam.edu.br<p>Feature selection plays a crucial role in developing effective predictive models by reducing dimensionality and emphasizing the most relevant attributes. However, current research in this area often lacks comprehensive benchmarking and frequently depends on proprietary datasets. These limitations hinder reproducibility and may lead to inconsistent or suboptimal model performance. To address these issues, we introduce the MH-FSF framework, a comprehensive, modular, and extensible platform designed to facilitate the reproduction and implementation of feature selection methods. Developed through collaborative research, MH-FSF provides implementations of 17 methods (11 classical, 6 domain-specific) and enables systematic evaluation on 10 publicly available Android malware datasets. Our results reveal performance variations across both balanced and imbalanced datasets, highlighting the critical need for data preprocessing and selection criteria that account for these asymmetries. We demonstrate the importance of a unified platform for comparing diverse feature selection techniques, fostering methodological consistency and rigor.
By providing this framework, we aim to significantly broaden the existing literature and pave the way for new research directions in feature selection, particularly within the context of Android malware detection.</p>2026-02-06T00:00:00+00:00Copyright (c) 2026 Vanderson Rocha, Diego Kreutz, Hendrio Bragança, Eduardo Feitosahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5809BENCH4T3: A Framework to Create Benchmarks for Text-to-Triples Alignment Generation2025-08-21T13:19:38+00:00Victor Jesus Sotelo Chicov265173@dac.unicamp.brAndré Gomes Reginoaregino@cti.gov.brJulio Cesar dos Reisjreis@ic.unicamp.br<p>Integrating Large Language Models (LLMs) with Knowledge Graphs (KGs) can significantly enhance their capabilities, leveraging LLMs' text generation skills with KGs' explanatory power. However, establishing this connection is challenging and demands proper alignment between unstructured texts and triples. Building benchmarks demands massive human effort in data curation and translation for non-English languages. The resulting lack of adequate benchmarks for validation purposes hampers research advancements. This study proposes an end-to-end framework to guide the automatic construction of text-to-triple alignment benchmarks for any language, using KGs as input. Our solution extracts relations from input triples and processes them to create accurately mapped texts. The proposed pipeline utilizes data curation through prompt engineering and data augmentation to enhance diversity in the generated examples. We experimentally evaluate our framework for creating a bimodal representation of RDF triples and natural language texts, assessing its ability to generate natural language from these triples. A key focus is on developing a benchmark for the underrepresented Portuguese language, facilitating the construction of models that connect structured data (triples) with text.
Our solution is suited to creating benchmarks that improve the alignment between KG triples and text data. The results indicate that the generated benchmark outperforms existing solutions. The generative approach benefits from our Portuguese benchmark, achieving competitive results compared to established literature benchmarks. Our solution enables the automatic generation of benchmarks for aligning triples and text.</p>2026-02-06T00:00:00+00:00Copyright (c) 2026 Victor Jesus Sotelo Chico, André Gomes Regino, Julio Cesar dos Reishttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5620Semiotic Engineering Theory for Human-Computer Integration: An Applicability and Usefulness Evaluation2025-09-03T18:04:29+00:00Glívia Angélica Rodrigues Barbosagliviaangelica@gmail.comRaquel Oliveira Pratesrprates@dcc.ufmg.br<p>The relationship between users and autonomous technologies is evolving towards integration (in the sense of partnership), transcending the stimulus-response interaction between these two agents. To follow this evolution, Human-Computer Interaction (HCI) researchers have defined and characterized a new interaction paradigm, Human-Computer Integration (HInt), which extends the focus of the HCI area to cover this new relationship of partnership between humans and autonomous technologies. As HInt is an emerging paradigm, the concepts and ontology of Semiotic Engineering Theory have been extended to address HInt as an extension of traditional HCI interaction. Thus, this paper aims to evaluate and discuss the applicability and usefulness of the extension of Semiotic Engineering to define, explore, and explain the phenomena involved in HInt.
Our findings provide useful insights and reflections on the benefits and limits of Semiotic Engineering for HInt to support the study, design, and evaluation of the partnership between humans and autonomous technologies.</p>2026-02-21T00:00:00+00:00Copyright (c) 2026 Glívia Angélica Rodrigues Barbosa, Raquel Oliveira Prateshttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/6044STELLAR: A Structured, Trustworthy, and Explainable LLM-Led Architecture for Reliable Customer Support2025-08-11T00:33:37+00:00Matheus Ferracciú Scatolinm252099@dac.unicamp.brHelio Pedrinihelio@ic.unicamp.br<p>While Large Language Models (LLMs) offer transformative potential for automating customer support, significant hurdles remain concerning their reliability, explainability, and consistent performance in complex, sensitive interactions. This paper introduces <strong>STELLAR (Structured, Trustworthy, and Explainable LLM-Led Architecture for Reliable Customer Support)</strong>, a novel architectural blueprint designed to address these issues. STELLAR utilizes a <strong>Directed Acyclic Graph (DAG) structure</strong> composed of nine specialized modules and eleven predefined workflows to orchestrate support interactions in a structured and predictable manner. This design promotes enhanced traceability, reliability, and control compared to less constrained systems. The architecture integrates components for few-shot classification, Retrieval-Augmented Generation (RAG), urgency-aware human escalation, compliance verification, user interaction validation, and knowledge base refinement through a semi-automated loop. This modular design deliberately balances LLM-driven innovation with operational requirements such as human-in-the-loop integration and ethical safeguards through embedded checks. We evaluated the core modules of STELLAR in key tasks - classification, retrieval, and compliance - demonstrating strong performance and reliability. 
Together, these features position STELLAR as a robust and transparent foundation for the next generation of intelligent, reliable customer support systems.</p>2026-02-21T00:00:00+00:00Copyright (c) 2026 Matheus Ferracciú Scatolin, Helio Pedrinihttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5511HelBERT: A BERT-Based Pretraining Model for Public Procurement Tasks in Portuguese2025-07-12T19:55:53+00:00Weslley Emmanuel Martins Limaweslley@ufpi.edu.brVictor Ribeiro da Silvavictor.silva@ufpi.edu.brJasson Carvalho da Silvajasson_jcs@ufpi.edu.brRicardo de Andrade Lira Rabêloricardoalr@ufpi.edu.brAnselmo Cardoso de Paivapaiva@nca.ufma.br<p>Deep learning models excel in various tasks but require extensive annotated data for supervised learning. In NLP, limited annotated data hinders deep learning. Self-supervised pretraining addresses this by training models on unlabeled text to learn useful representations. Domain-specific pretraining is crucial for good performance in downstream tasks. Although pretrained BERT models exist for legal documents in some languages, none target public procurement documents in Portuguese. Public procurement documents have terminology that is not found in existing models. In this paper, we propose HelBERT, a BERT-based model pretrained on a large corpus of public procurement documents in the Brazilian Portuguese language, including laws, tender notices, and contracts. The experimental results demonstrate that HelBERT outperforms other models in all analyses. HelBERT surpasses models such as BERTimbau and JurisBERT in classification tasks by achieving improvements of 5% and 4% in the F1 Score, respectively. Furthermore, the model achieves gains that exceed 3% in semantic similarity tasks compared to the baseline models. Moreover, despite using a GPU with reduced memory and processing resources, the proposed approach achieves superior results with fewer and more efficient training epochs than the baseline models. 
These findings underscore the effectiveness of the proposed model in addressing NLP tasks within the public procurement domain.</p>2026-02-21T00:00:00+00:00Copyright (c) 2026 Weslley Emmanuel Martins Lima, Victor Ribeiro da Silva, Jasson Carvalho da Silva, Ricardo de Andrade Lira Rabêlo, Anselmo Cardoso de Paivahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5457RecSys-Fairness: A Framework for Reducing Group Unfairness in Recommendations2025-08-07T10:30:08+00:00Rafael Vargas Mesquita dos Santosrafaelv@ifes.edu.brGiovanni Ventorim Comarelagc@inf.ufes.br<p>In this study, we address the importance of promoting fairness in recommendation systems, which are highly susceptible to biases that can lead to unfair outcomes for different user groups. We developed a fairness algorithm aimed at mitigating these injustices, which was applied to the MovieLens dataset and analyzed based on the recommendations produced by the ALS (Alternating Least Squares) and NCF (Neural Collaborative Filtering) methods. Users were grouped by activity level, gender, and age, and the results demonstrated the effectiveness of the fairness algorithm in substantially reducing group unfairness (R_{grp}) across all tested configurations, without causing significant losses in recommendation accuracy, measured by the Root Mean Squared Error (RMSE). In particular, a reduction in group unfairness of up to 65.57% was observed in the ALS method. Additionally, we identified an optimal convergence of the fairness algorithm for an estimated number of matrices (h) between 10 and 15, suggesting an effective balance point between promoting fairness and maintaining precision in recommendations. 
In comparison with the available benchmarks, under identical experimental conditions, we managed to improve group unfairness reductions by approximately 6 percentage points (from 59.77% to 65.57%).</p>2026-02-21T00:00:00+00:00Copyright (c) 2026 Rafael Vargas Mesquita dos Santos, Giovanni Ventorim Comarelahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5608A Reliable Stream Learning Model for Network Intrusion Detection Systems2025-05-15T06:48:45+00:00Pedro Horchulhackpedro.horchulhack@ppgia.pucpr.brEduardo Kugler Viegaseduardo.viegas@ppgia.pucpr.brAltair Olivo Santinsantin@ppgia.pucpr.br<p>Developing a reliable Network Intrusion Detection System (NIDS) remains a complex task due to the non-stationary nature of network traffic and the need for frequent updates to maintain high classification performance. Many existing approaches assume a stationary network environment, which overlooks the challenges associated with periodic model updates, such as the need for large amounts of properly labeled data and significant computational resources. This issue is particularly challenging for real-time applications, where minimizing delays and ensuring accuracy is crucial. This paper analyzes how changes in network behavior negatively affect the long-term performance of ML-based NIDS. To address this problem, we propose a new NIDS approach integrating stream learning with a reject-option technique to simplify the model update process while ensuring consistent classification accuracy over time. The proposal uses stream learning classifiers to incrementally incorporate new data, while the reject option allows the system to evaluate the reliability of classifications before they are used for updates. The scheme operates with minimal intervention, with rejected instances stored for future updates and used to fine-tune the model over time, ensuring adaptation to evolving network conditions.
Experimental results demonstrate that the proposed approach maintains high classification accuracy over a year, even without recurrent updates, and achieves significant improvements in true positive rates compared to traditional methods. The system can operate for up to three months without updates, with no significant degradation in performance.</p>2026-03-02T00:00:00+00:00Copyright (c) 2026 Pedro Horchulhack, Eduardo Kugler Viegas, Altair Olivo Santinhttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5873CNNs for JPEGs: Designing Cost-Efficient Stems2025-07-04T21:33:55+00:00Samuel Felipe dos Santossamuel.felipe@ufscar.brNicu Sebeniculae.sebe@unitn.itJurandy Almeidajurandy.almeida@ufscar.br<p>Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade, pushing the state of the art in several computer vision tasks. CNNs are capable of learning robust representations of the data directly from RGB pixels. However, most image data is available in compressed format, of which JPEG is the most widely used for transmission and storage purposes. Consequently, a preliminary decoding process with a high computational load and memory usage is required. Image decoding can be a performance bottleneck for devices with limited computational resources, such as embedded devices, even when hardware accelerators are used. For this reason, deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years. These methods usually extract a frequency-domain representation of the image, such as the DCT, by partial decoding, and then adapt typical CNN architectures to work with it. In this paper, we perform an in-depth study of the computational cost of deep models designed for the frequency domain, evaluating the cost of decoding and passing images through the network.
We observe that previous work increased the model's computational complexity to accommodate the compressed input, nullifying the speed-up gained by not decoding images. We propose to remove the changes to the model that increase the computational cost, replacing them with our lightweight stems. This way, we can take full advantage of the speed-up obtained by avoiding the decoding. Our strategies were successful in generating models that balance efficiency and effectiveness, allowing deep models to be deployed on a wider array of devices. We achieve up to a 25.91% reduction in computational complexity (FLOPs), while decreasing accuracy by at most 2.97%. We also propose the efficiency-effectiveness score S<sub>E</sub> to highlight models with favorable trade-offs between accuracy, computational cost, and number of parameters.</p>2026-03-02T00:00:00+00:00Copyright (c) 2026 Samuel Felipe dos Santos, Nicu Sebe, Jurandy Almeidahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/6043Generalizing Feature Selection in Android Malware Detection: The SigAPI AutoCraft Approach2025-07-30T08:24:50+00:00Vanderson Rochavanderson.rocha@gmail.comLaura Tschiedeldiegokreutz@unipampa.edu.brDiego Kreutzdiegokreutz@unipampa.edu.brHendrio Bragançahendrio.luis@icomp.ufam.edu.brJoner Assolinjoner.assolin@icomp.ufam.edu.brRodrigo Brandão Mansilharodrigomansilha@unipampa.edu.brSilvio E. Quincozessilvioquincozes@unipampa.edu.brAngelo Gaspar Diniz Nogueiraangelonogueira.aluno@unipampa.edu.br<p>Feature selection methods are widely employed in Android malware detection to improve accuracy and efficiency by identifying the most relevant features. However, their generalizability often remains limited, as approaches like SigAPI are typically developed and evaluated on a small number of datasets, reducing their effectiveness across diverse scenarios.
The practical use of SigAPI is further hindered by the need to predefine a minimum number of features, the instability of its evaluation metrics, and its inability to adapt efficiently to the heterogeneity commonly present in Android datasets. To address these limitations, we developed SigAPI AutoCraft, an enhanced and fully automated version of the original method. SigAPI AutoCraft achieves consistent and robust performance across ten Android malware datasets, substantially improving generalization. The results demonstrate a 5–15% increase in Matthews Correlation Coefficient (MCC) and up to a 7.6-fold improvement in feature reduction, underscoring its effectiveness and adaptability to complex and heterogeneous data environments.</p>2026-03-09T00:00:00+00:00Copyright (c) 2026 Vanderson Rocha, Laura Tschiedel, Diego Kreutz, Hendrio Bragança, Joner Assolin, Rodrigo Brandão Mansilha, Silvio E. Quincozes, Angelo Gaspar Diniz Nogueirahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5378Towards a Lightweight Multi-View Android Malware Detection Model with Multi-Objective Feature Selection2025-06-28T08:16:04+00:00Philipe Fransoziphilipe.hfransozi@ppgia.pucpr.brJhonatan Geremiasjgeremias@ppgia.pucpr.brEduardo K. Viegaseduardo.viegas@ppgia.pucpr.brAltair O. Santinsantin@ppgia.pucpr.br<p>In recent years, a wide range of new Machine Learning (ML) techniques with high accuracy have been developed for Android malware detection. Despite their high accuracy, these techniques are seldom implemented in production environments due to their limited generalization capabilities, leading to reduced performance when applied to real-world scenarios. In light of this, this paper introduces a novel multi-view Android malware detection model implemented in two stages. 
The first stage involves extracting multiple feature sets from the analyzed Android application package, offering complementary behavioral representations that improve the system's generalization in the classification process. In the second stage, a multi-objective optimization is conducted to identify the optimal feature subset from each view and fine-tune the hyperparameters of individual classifiers, enabling an ensemble-based classification approach. The core innovation of our approach lies in the proactive selection of feature subsets and the optimization of hyperparameters that together enhance classification accuracy while minimizing processing overhead within a multi-view framework. Experiments conducted on a newly developed dataset, consisting of over 40 thousand Android application samples, validate the effectiveness of our proposal. The results indicate that our model can increase true-positive rates by up to 18% while reducing inference processing costs by as much as 72%.</p>2026-03-09T00:00:00+00:00Copyright (c) 2026 Philipe Fransozi, Jhonatan Geremias, Eduardo K. Viegas, Altair O. Santinhttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5370Intelligent Emotion Tracking System VIRE: Evaluation of Neural Network Architectures in Facial Emotion Recognition2025-08-27T17:03:57+00:00Nathan Ferraz da Silvanathansilva@usp.brGeraldo Pereira Rocha Filhogeraldo.rocha@uesb.edu.brRoger Immichroger@imd.ufrn.brVinícius Pereira Gonçalvesvpgvinicius@unb.brRodolfo Ipolito Meneguettemeneguette@icmc.usp.br<p>This work proposes an emotional monitoring system called Visual Identification of Recognition of Emotions (VIRE), based on convolutional neural networks (CNNs) to analyze facial expressions. Using the six basic emotions proposed by Paul Ekman as a reference, which can be identified from the composition of various facial muscle states, VIRE aims to assist in the diagnosis of mental health conditions. 
While emotional expressions are communicated in various ways, this research focuses primarily on facial expressions due to their expressiveness resulting from the mobility of facial muscles. The methodology involved collecting data from the FER2013 dataset, preprocessing the images, hyperparameter tuning, and training three different architectures: AlexNet, DenseNet, and a custom CNN. The research classifies expressions into basic emotions and evaluates the models' performance in terms of accuracy and other metrics. VIRE has demonstrated potential, achieving an accuracy of about 60%, although improvements are needed for practical application. The ultimate goal is to create a tool that integrates technology and health, facilitating the identification of emotional states that may indicate mental health issues, thereby contributing to more accurate and effective diagnoses.</p>2026-03-09T00:00:00+00:00Copyright (c) 2026 Nathan Ferraz da Silva, Geraldo Pereira Rocha Filho, Roger Immich, Rodolfo Ipolito Meneguette, Vinícius Pereira Gonçalveshttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5904Memorizing Features Efficiently for Self-supervised Video Object Segmentation2026-01-12T14:06:16+00:00Marcelo Mendonçaeng.marcelo.mendonca@gmail.comLuciano Oliveiraluciano.reboucas@gmail.com<p>Video object segmentation (VOS) involves consistently identifying and classifying object pixels in video sequences, a task that traditionally depends on extensive, manually annotated datasets. In this work, we present SHLS (Superfeatures in a Highly Compressed Latent Space), a self-supervised VOS method that reduces reliance on both annotations and large training datasets. SHLS employs a metric learning framework combining superpixels and deep learning features, enabling effective training with just 10,000 unlabeled still images. 
Utilizing an efficient memory clustering mechanism, SHLS generates ultra-compact representations called superfeatures, which efficiently store and classify object information across video sequences. Experiments on the DAVIS dataset demonstrate SHLS's strong performance in multi-object scenarios, underscoring its potential as a robust and efficient alternative in self-supervised VOS.</p>2026-03-15T00:00:00+00:00Copyright (c) 2026 Marcelo Mendonça, Luciano Oliveirahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5551Sapo-boi: Bypassing Linux Kernel Network Stack in the Implementation of an XDP-based NIDS2025-06-28T08:13:27+00:00Raphael Kaviak Machnickirkmach17@gmail.comJoão Ribeiro Andreottijrandreotti@inf.ufpr.brUlisses Penteadoulisses@bluepex.comJorge Pires Correiajpcorreia@inf.ufpr.brVinicius Fulber-Garciavinicius@inf.ufpr.brAndré Grégiogregio@inf.ufpr.br<p>Network intrusion detection systems (NIDS) must inspect multiple parts of a packet to detect patterns of known attacks. With the advent of XDP, it has become feasible to implement such a system within the kernel's own network stack for the evaluation of ingress traffic. In this work, we propose Sapo-boi, an NIDS solution consisting of two modules: (i) the Suspicion Module, an XDP program capable of processing packets in parallel, discarding packets considered safe, and redirecting suspicious packets for verdict in user space through XDP sockets (AF_XDP); and (ii) the Evaluation Module, a user-level process capable of finding, in constant time, the rule against which the suspicious packet should be analyzed and triggering notifications if the suspicion is confirmed. 
The system demonstrated superior results in terms of packet analysis rates and CPU usage compared to traditional NIDS alternatives (Snort and Suricata).</p>2026-03-02T00:00:00+00:00Copyright (c) 2026 Raphael Kaviak Machnicki, João Ribeiro Andreotti, Ulisses Penteado, Jorge Pires Correia, Vinicius Fulber-Garcia, André Grégiohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5567Portfolio-based Active Learning with Gaussian Processes for Vulnerabilities Risk Classification2025-06-28T08:12:00+00:00Davyson S. Ribeirodavysonribeiro@alu.ufc.brRafael S. Lemosrafael.lemos@alu.ufc.brFrancisco R. P. da Pontefco.rparente@gmail.comCésar Lincoln C. Mattoscesarlincoln@dc.ufc.brEmanuel B. Rodriguesemanuel@dc.ufc.br<p>Effective vulnerability management is essential for cybersecurity, particularly as the demand for skilled professionals often exceeds supply. This paper investigates the application of Gaussian Processes (GPs) integrated with Active Learning (AL) techniques to classify security vulnerabilities based on their risk of exploitation. The main objective is to optimize the labeling process, thereby reducing the amount of labeled data necessary for training an effective classifier. The proposed methodology combines the uncertainty predictions provided by GP models with five established data selection strategies, utilizing a portfolio-based approach. The portfolio avoids the need to choose a single strategy and leverages the strengths of each technique. This approach enhances adaptability and balances exploration versus exploitation in complex optimization scenarios, ultimately improving the diversity of labeled samples and contributing to the development of better classifiers trained with fewer examples. Experiments were conducted using the CVEjoin dataset, which encompasses over 200,000 vulnerabilities, across three distinct evaluation scenarios. The different setups consider equivalent volumes of labeled data, but varying numbers of Active Learning iterations. 
When considering a single strategy, the results indicate that the BSB (best and second best) method consistently outperformed the others in terms of accuracy and F1 score, particularly with an increased number of labeling iterations. In the scenario where multiple strategies are used in a portfolio, the results indicate gains in all evaluation metrics. This study underscores the usefulness of a portfolio-based Active Learning approach in optimizing the labeling procedure and, ultimately, prioritizing vulnerabilities for remediation. This research lays the groundwork for extending the framework to other areas of cybersecurity, such as vulnerabilities in web applications and cloud environments, thereby improving overall security measures in the digital landscape.</p>2026-03-02T00:00:00+00:00Copyright (c) 2026 Davyson S. Ribeiro, Rafael S. Lemos, Francisco R. P. da Ponte, César Lincoln C. Mattos, Emanuel B. Rodrigueshttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5625Implementation and evaluation of the Forro stream cipher in Tofino programmable hardware for remote attestation in datacenters2025-06-28T08:11:42+00:00Rodrigo Alexander de Andrade Pierinirpierini@unicamp.brCaio Teixeiracaio@dca.fee.unicamp.brChristian Rodolfo Esteve Rothenbergchesteve@unicamp.brMarco Aurélio Amaral Henriquesmaah@unicamp.br<p>The software-defined networking (SDN) paradigm has enabled several innovations in computer networking, especially in programmable packet processing. This paper shows the feasibility of implementing the Forro stream cipher algorithm in the Tofino programmable hardware switch and its impact on computing resources. For comparison purposes, the ChaCha algorithm was also analyzed in terms of its performance and impact on the same device. It was observed that the Forro algorithm performs better and uses fewer resources than ChaCha in sequential implementations. 
However, when parallelization techniques are adopted, ChaCha performs better for higher data rates, but uses more ternary matching resources than Forro. For the use case of remote attestation in programmable data planes, the Forro cipher seems more promising, as it uses fewer of the hardware's limited resources and can achieve sufficient throughput rates for this scenario. We then propose P4DRA, a distributed remote attestation solution based on the programmable data plane that can offload the verification process of remote devices to the data plane, freeing resources from a central verifier based on an x86 server and improving the attestation proof verification speed by around 150 times.</p>2026-02-24T00:00:00+00:00Copyright (c) 2026 Rodrigo Alexander de Andrade Pierini, Caio Teixeira, Christian Rodolfo Esteve Rothenberg, Marco Aurélio Amaral Henriques
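As background to the Forro/ChaCha comparison in the last abstract above, the sketch below shows the standard ChaCha quarter-round from RFC 8439 in plain Python. This is purely illustrative of the add-xor-rotate (ARX) primitive that both ciphers build on; the paper itself implements such rounds in P4 on Tofino hardware, not in Python, and Forro's own round function differs in its constants and structure.

```python
# Minimal sketch of the ChaCha quarter-round (RFC 8439).
# Illustrative only: shows the add/xor/rotate (ARX) steps that make
# these ciphers attractive for match-action hardware pipelines.

MASK32 = 0xFFFFFFFF  # all arithmetic is modulo 2**32


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a: int, b: int, c: int, d: int):
    """One ChaCha quarter-round over four 32-bit state words."""
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d


# Test vector from RFC 8439, Section 2.1.1
out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([hex(w) for w in out])
# → ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```

Each quarter-round uses only 32-bit additions, XORs, and fixed rotations, which is why ARX ciphers map comparatively well onto programmable switch pipelines with no multipliers.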