https://journals-sol.sbc.org.br/index.php/jbcs/issue/feedJournal of the Brazilian Computer Society2026-01-20T16:18:16+00:00Soraia Mussesoraia.musse@pucrs.brOpen Journal Systems<div class="cms-item cms-collection cms-collection--split cms-collection--untitled" data-fragment="784856"> <div class="cms-collection__row"> <div class="cms-collection__column"> <div class="cms-collection__column-inner"> <div class="cms-item cms-collection" data-fragment="784854"> <div id="aimsAndScope" class="cms-item placeholder placeholder-aimsAndScope"> <div class="placeholder-aimsAndScope_content"> <p>The <em>Journal of the Brazilian Computer Society</em> (JBCS) is an international journal that serves as a forum for disseminating innovative research in all fields of computer science and related subjects. Contents include theoretical, practical and experimental papers reporting original research contributions, as well as high-quality survey papers. Coverage extends to all computer science topics, computer systems development and formal and theoretical aspects of computing, including computer architecture; high-performance computing; database management and information retrieval; computational biology; computer graphics; data visualization; image and video processing; VLSI design and software-hardware codesign; embedded systems; geoinformatics; artificial intelligence; games, entertainment and virtual reality; natural language processing and much more.</p> <p>The JBCS team is committed to publishing all high-quality articles regardless of the authors' ability to pay. Authors who are unable to pay the article processing charge (APC) are encouraged to contact the editors (editorial@journal-bcs.com); the JBCS team will provide support in finding alternative funding. 
In particular, a grant from the Brazilian Internet Steering Committee (http://nic.br/) helps sponsor the publication of many JBCS articles.</p> </div> </div> </div> </div> </div> </div> </div>https://journals-sol.sbc.org.br/index.php/jbcs/article/view/4636OneTrack-M: A Multitask Approach for Transformer-Based MOT Models2025-11-07T12:55:00+00:00Luiz Carlos Silva de Araujoluiz.clssss.a@gmail.comCarlos Mauricio Seródio Figueiredocfigueiredo@uea.edu.br<p>Multi-Object Tracking (MOT) is a critical problem in computer vision, essential for understanding how objects move and interact in videos. This field faces significant challenges such as occlusions and complex environmental dynamics, impacting model accuracy and efficiency. While traditional approaches have relied on Convolutional Neural Networks (CNNs), the introduction of transformers has brought substantial advancements. This work introduces OneTrack-M, a transformer-based MOT model that enhances tracking computational efficiency and accuracy. Our approach introduces the transformer encoder as the model backbone, significantly reducing processing time and increasing inference speed. Additionally, we employ innovative data preprocessing and multitask training techniques to address occlusion and diverse objective challenges within a single set of weights. Experimental results demonstrate that OneTrack-M achieves at least 25% faster inference times compared to state-of-the-art models in the literature while maintaining or improving tracking accuracy metrics. 
These improvements highlight the potential of the proposed solution for real-time applications such as autonomous vehicles, surveillance systems, and robotics, where rapid responses are crucial for system effectiveness.</p>2026-03-27T00:00:00+00:00Copyright (c) 2026 Luiz Carlos Silva de Araujo, Carlos Mauricio Seródio Figueiredohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5309Multiclass Classification for Detection of GPS Spoofing and Jamming Attacks on UAVs2025-09-02T16:16:48+00:00Gustavo Gualberto Rocha de Lemosgustavo.gualberto@aluno.ufabc.edu.brRodrigo Augusto Cardoso da Silvacardoso.rodrigo@ufabc.edu.br<p>Unmanned Aerial Vehicles (UAVs) are increasingly being employed across various domains, making them more vulnerable to a range of attacks, particularly cyber threats. These vehicles usually rely on a global navigation satellite system (GNSS), such as the Global Positioning System (GPS) satellites, for location and navigation data, which can be exploited by adversaries launching attacks using fake GPS signals. To safeguard UAVs from GPS Jamming and GPS Spoofing attacks, this paper proposes an Intrusion Detection System (IDS) that utilizes machine learning techniques for detecting and identifying such attacks. The IDS analyzes GPS signal samples representing normal operation, GPS Jamming, and three types of GPS Spoofing attacks. It relies on machine learning, with models trained and tested for binary class and multiclass classification. The binary class version aims to identify an occurrence of any attack, irrespective of type, as suggested by previous literature. However, the novelty of this work lies in the multiclass version, which enables the identification of attack types — an essential factor in determining the most effective protective measures and providing data for forensic investigations. Stacking, an ensemble machine learning method, yielded the best results, achieving an accuracy rate of 96.91%. 
Furthermore, the proposed multiclass IDS reduced false negatives to 0.71%, making it less likely than the binary class version to overlook attacks, which is crucial in real UAV deployments.</p>2026-03-17T00:00:00+00:00Copyright (c) 2026 Gustavo Gualberto Rocha de Lemos, Rodrigo Augusto Cardoso da Silvahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5365Enhancing Red Team Agent Learning with the Kill Chain Catalyst Algorithm in Capture the Flag Scenarios2025-06-10T16:22:17+00:00Antonio Hortaantonio@horta.net.brAnderson dos Santosanderson@ime.eb.brRonaldo Goldschmidtronaldo.rgold@ime.eb.br<p>With the advancement of technology, tasks once performed by humans have increasingly transitioned to machines or agents equipped with artificial intelligence, including various cyber security domains. From the perspective of real-world cyber attacks, executing actions with minimal failures and steps is critical to reducing the likelihood of exposure. Although research on autonomous cyber attacks predominantly employs Reinforcement Learning (RL), this approach has gaps, such as poor performance with limited training data, low resilience in dynamic environments, and limited interpretability of decision-making policies. Therefore, this paper introduces Kill Chain Catalyst (KCC), an <em>RL</em> algorithm based on a Gini impurity-based weighted random forest that prioritizes interpretability, efficiency in scenarios with limited experience, and resilience in the dynamic environments explored by <em>RL</em> agents. <em>KCC</em> leverages decision tree logic for enhanced interpretability and employs a catalyst module inspired by genetic alignment to optimize the search for efficient attack sequences. More than 150 attack experiments were conducted to evaluate learning in terms of offset, speed, and generalization. 
The analysis focused on the steps, rewards, and failures of agents using the RL algorithms <em>KCC</em>, <em>PPO</em>, <em>DQN</em>, <em>TRPO</em>, and <em>A2C</em>, within a <em>Capture the Flag</em> tournament setting. Both static and dynamic scenarios with limited learning experiences were considered. These experiments demonstrate the superior performance of <em>KCC</em>, revealing differences of up to 198.69% for steps, 129.43% for rewards, and 1096.39% for failures when performing attacks using <em>KCC</em> compared with the other algorithms.</p>2026-03-16T00:00:00+00:00Copyright (c) 2026 Antonio Horta, Anderson dos Santos, Ronaldo Goldschmidthttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5376Evaluation of explainable artificial intelligence techniques in the context of credit card fraud detection2025-11-21T17:16:20+00:00Gabriel Mendes de Limamendes.gabriel@ufabc.edu.brPaulo Henrique Pisanipaulo.pisani@ufabc.edu.br<p>Artificial intelligence has been employed in several applications in the financial sector. This paper deals with one of these applications: fraud detection in credit card transactions. In this context, a number of machine learning algorithms can be used to obtain models which automate the classification of a transaction as fraudulent or genuine. However, some of these machine learning algorithms are not directly interpretable. The current paper presents an evaluation of explainable artificial intelligence techniques SHAP and LIME applied to models for fraud detection in credit card transactions. Along with the results of the evaluation, the paper discusses the effectiveness and need for explainable artificial intelligence techniques. This paper extends a previous paper by including hyperparameter tuning, new results and an evaluation of the processing time to obtain explanations. 
The reported results suggest that SHAP obtained better results than LIME, although LIME required less processing time once the LIME explainer had been built.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Gabriel Mendes de Lima, Paulo Henrique Pisanihttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5390Improved Biclique Cryptanalysis of the Lightweight Cipher FUTURE2025-07-05T17:25:20+00:00Gabriel de Carvalhogabrielc@eng.uerj.brLuis Kowadaluis@ic.uff.br<p>In the past decade, lightweight cryptography has been of much interest in academia, especially regarding the cryptanalysis of such ciphers. The National Institute of Standards and Technology (NIST) is one of the entities responsible for this interest, given that in 2019 it promoted a public process to choose the American standard for lightweight cryptography. In 2022, the FUTURE cipher was published and has since been the target of much cryptanalysis in a very short period of time, including integral, meet-in-the-middle and differential attacks. The objective of this paper is to present four biclique attacks, obtained through semi-automatic search, that improve on the previously published one in terms of time, memory and data complexities. Our fastest attack requires 2<sup>124.38</sup> full computations of the cipher to run, while requiring only 2<sup>24</sup> data pairs and negligible memory. We also present the fastest unbalanced biclique attack and star attack to our knowledge. Only one published attack on FUTURE is faster than our attacks: an integral attack taking 2<sup>123.70</sup> computations that does not use the full codebook of data, i.e. it requires 2<sup>63</sup> of the 2<sup>64</sup> possible plaintext/ciphertext pairs. 
Still, when compared to it, our attacks use much less data while being only slightly slower, which presents a good trade-off.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Gabriel de Carvalho, Luis Kowadahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5424Building flexible databases by using web services for computer-aided diagnosis of cardiomyopathies: from conceptual definition to usability evaluation2025-10-16T07:07:54+00:00Larissa Terto Alvimlterto17@gmail.comVagner Mendonça Gonçalvesvagner.goncalves@usp.brFátima L. S. Nunesfatima.nunes@usp.br<p>Computer-aided diagnosis (CAD) systems based on medical images and records apply computational techniques to process data and extract features from them to provide a second opinion to the health professional. A diverse and organized set of images and records is necessary to develop and validate such systems. However, medical data are generally obtained in a non-standardized way. With each new research and development project in this area, specific data models need to be built to organize and standardize these data and enable their use in the construction of models and computational systems. This article presents a flexible and generic database modeled and implemented to persist Cardiac Magnetic Resonance exams aiming to support the development of CAD schemes of cardiomyopathies. Furthermore, a web application was developed to enable data search and retrieval from the database. An experiment was carried out to evaluate the interface usability of the web application. Results showed that it is possible to develop a generic and flexible DB model, which can be used in several CAD applications. Additionally, the implemented interface received positive evaluations on its functionalities and usability, and users were capable of performing the intended tasks with correct outcomes.</p>2026-03-20T00:00:00+00:00Copyright (c) 2026 Larissa Terto Alvim, Vagner Mendonça Gonçalves, Fátima L. S. 
Nuneshttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5449Survey of Brazilian Open Budget Data Portals: Query Interfaces and Dashboards2025-07-17T14:21:39+00:00Kaline B. F. Mesquitamesquitabfkaline@gmail.comDennis G. Balreiradgbalreira@inf.ufrgs.brAndre S. Spritzerspritzer@gmail.comCarla M. D. S. Freitascarla@inf.ufrgs.br<p>To promote transparency, the Brazilian government provides access to public data through web portals featuring query interfaces and dashboards. While query interfaces are used by more experienced users to gather data for further analyses, dashboards that include visualizations help a broader audience consult and explore data. A domain of particular complexity that benefits from the use of these interfaces is government spending and budgets. This study analyzes dashboards and query interfaces of government budget data through qualitative research based on a survey. Focusing on Brazil's budget transparency initiative, we examined 83 interfaces in total: 30 dashboards and 53 query interfaces from federal, state, and major city governments. This survey assesses these interfaces using design patterns for general-purpose dashboards and design principles for open government data dashboards. Our findings reveal a critical weakness: while most portals provide access to budget data, they largely neglect user-centered design, failing to provide the necessary context or consider the data literacy of their audience. This creates a significant "transparency gap" that undermines genuine accountability and demonstrates the need for a fundamental shift in the design of these essential public tools.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Kaline B. F. Mesquita, Dennis G. Balreira, Andre S. Spritzer, Carla M. D. S. 
Freitashttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5482Subspace representations in deep neural networks: A survey2025-09-04T20:27:10+00:00Stéfane Rêgo Gandrastefanerego@gmail.comBernardo Bentes Gattogatto_b@mti.co.jpEulanda Miranda dos Santosemsantos@icomp.ufam.edu.br<p>Computer vision applications often involve processing large-scale multidimensional data, requiring methods that are both efficient and accurate. Traditional pattern recognition methods based on subspace representations offer low computational complexity but typically underperform compared to deep learning models in terms of recognition accuracy. This study aims to explore and analyze the integration of subspace representations within deep learning frameworks to leverage the advantages of both approaches. We conducted a comprehensive survey of existing methods that combine subspace representation techniques with deep neural networks. We propose a taxonomy to categorize these methods into three distinct groups based on their integration strategies. The reviewed methods demonstrate that incorporating subspace representations can enhance the performance and efficiency of deep learning models. The taxonomy helps to clarify the landscape of these hybrid approaches and identifies trends in methodological development. The surveyed approaches demonstrate a clear methodological evolution, contributing to enhanced outcomes in various real-world applications.</p>2026-03-27T00:00:00+00:00Copyright (c) 2026 Stéfane Rêgo Gandra, Bernardo Bentes Gatto, Eulanda Miranda dos Santoshttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5545EasyGuard: A Gamified App for Generating Strong and Memorable Passwords2025-05-14T07:37:41+00:00Hugo L. Romãohugo8romao@gmail.comMarcelo H. O. Henklainmarcelo.henklain@ufrr.brFelipe L. Lobofelipe.lobo@ufrr.brEduardo L. 
Feitosaefeitosa@icomp.ufam.edu.br<p>Although the use of online services has increased substantially over the past decade, the strength of user-created passwords has remained at concerning levels. This study aimed to develop and evaluate the efficiency of a gamified application in promoting the behavior of designing strong passwords. Two rounds of experiments were conducted, each lasting nine days. In the first experiment (<em>n</em> = 10), we evaluated the passwords generated based on user inputs compared to random passwords. Our findings showed that our app generated passwords with an improvement of 68.43 percentage points in the memorization test, 4.87 p.p. in the typing test, and 60.38 p.p. in the combined memorization and typing test. In the second experiment (<em>n</em> = 15), we incorporated a dictionary-based password generation policy into the evaluation and applied an automated tool for data collection. User input-based passwords outperformed random ones by 87.26 p.p. in the memorization test, 2.75 p.p. in the typing test, and 85.92 p.p. in the combined test. Meanwhile, dictionary-based passwords showed improvements of 54.32 p.p., 1.69 p.p., and 69.70 p.p., respectively. Our approach proved promising in promoting strong and memorable passwords. Nonetheless, EasyGuard requires further development and should be further investigated in future studies.</p>2026-03-17T00:00:00+00:00Copyright (c) 2026 Hugo L. Romão, Marcelo H. O. Henklain, Felipe L. Lobo, Eduardo L. Feitosahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5548High-Performance Elliptic Curve Cryptography: A SIMD Approach to Modern Curves (Thesis Distillation)2025-11-21T17:18:03+00:00Armando Faz-Hernandezarmfazh@gmail.comJulio Lópezjlopez@ic.unicamp.br<p>Cryptography based on elliptic curves is endowed with efficient methods for public-key cryptography. 
Recent research has shown the superiority of the Montgomery and Edwards curves over the Weierstrass curves as they require fewer arithmetic operations. Using these modern curves has, however, introduced several challenges to the cryptographic algorithm's design, opening up new opportunities for optimization. Our main objective is to propose algorithmic optimizations and implementation techniques for cryptographic algorithms based on elliptic curves. In order to speed up the execution of these algorithms, our approach relies on the use of extensions to the instruction set architecture. In addition to those specific for cryptography, we use extensions that follow the Single Instruction, Multiple Data (SIMD) parallel computing paradigm. In this model, the processor executes the same operation over a set of data in parallel. We investigated how to apply SIMD to the implementation of elliptic curve algorithms. As part of our contributions, we design parallel algorithms for prime field and elliptic curve arithmetic. We also design a new three-point ladder algorithm for the scalar multiplication <em>P+kQ</em>, and a faster formula for calculating <em>3P</em> on Montgomery curves. These algorithms have found applicability in isogeny-based cryptography. Using SIMD extensions such as SSE, AVX, and AVX2, we develop optimized implementations of the following cryptographic algorithms: X25519, X448, SIDH, ECDH, ECDSA, EdDSA, and qDSA. Performance benchmarks show that these implementations are faster than existing implementations in the state of the art. Our study confirms that using extensions to the instruction set architecture is an effective tool for optimizing implementations of cryptographic algorithms based on elliptic curves. 
May this be an incentive not only for those seeking to speed up programs in general but also for computer manufacturers to include more advanced extensions that support the increasing demand for cryptography.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Armando Faz-Hernandez, Julio Lópezhttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5558A Triad of Defenses to Mitigate Poisoning Attacks in Federated Learning2025-05-19T14:52:35+00:00Blenda Oliveira Mazettoblenda.mazetto@uel.brBruno Bogaz Zarpelãobrunozarpelao@uel.br<p>Federated learning (FL) enables the training of machine learning models on decentralized data, potentially improving data privacy. However, the FL distributed architecture is vulnerable to poisoning attacks. In this paper, we propose an FL method capable of mitigating these attacks through a triad of defense strategies: organizing clients into groups, evaluating the local performance of global models during training, and using a voting scheme during the inference phase. The proposed approach first divides the clients into randomly sampled groups, each generating a distinct global model. Each client trains a local model on their private data and submits it to the central server. The central server aggregates the local models within each group to generate the global models. Then, each client receives all global models, selects the best performing one as their new local model, and the process repeats until training is complete. During the inference phase, each client classifies its inputs according to a majority-based voting scheme among the global models. 
Our experiments using the HAR and MNIST datasets demonstrate that our method can effectively mitigate poisoning attacks without compromising the global model's performance.</p>2026-03-16T00:00:00+00:00Copyright (c) 2026 Blenda Oliveira Mazetto, Bruno Bogaz Zarpelãohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5565Partial integrity, authenticity and belongingness using modification-tolerant signature schemes2025-06-19T07:18:13+00:00Anthony Bernardo Kamersanthony.kamers@posgrad.ufsc.brGustavo Zamboningustavo.zambonin@posgrad.ufsc.brThaís Bardini Idalinothais.bardini@ufsc.brPaola de Oliveira Abelpaola.abel@grad.ufsc.brJean Everson Martinajean.martina@ufsc.br<p>Digital signatures allow us to ensure that the signed digital data is authentic and has not been modified. However, even a single bit modification in the data invalidates the entire signature. In INDOCRYPT '19, Idalino et al. presented an efficient modification-tolerant signature scheme (MTSS) framework using combinatorial group testing techniques, allowing the location and correction of modified parts of the signed data. In this work, we implement their framework and discuss the practical performance of the solution. We also propose various necessary auxiliary algorithms not explored in the initial work, such as the division of data into blocks and the generation of the underlying combinatorial structure needed for the signature generation. Moreover, we propose a novel use case of the framework, which we call the <em>belongingness framework</em>. This scheme allows the verification of the integrity and authenticity of a subset of the signed data without having access to the whole data. 
This is particularly interesting in big data applications, where access to the whole signed data is prohibitive due to storage limitations.</p>2026-03-16T00:00:00+00:00Copyright (c) 2026 Anthony Bernardo Kamers, Gustavo Zambonin, Thaís Bardini Idalino, Paola de Oliveira Abel, Jean Everson Martinahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5687A survey of social media stance detection using non-textual features2025-07-12T19:47:37+00:00Laís Carraro Leme Cavalheirolaiscarraro@usp.brIvandré Paraboniivandre.paraboni@gmail.com<p>Stance detection is the computational task of estimating an individual's attitude towards a given target topic, which is often of a political or moral nature. In traditional NLP fashion, models of this kind have relied mainly on learning features extracted from social media text. However, social media may provide many other types of non-content information in conjunction with text, such as friend networks, interactions with other users, etc. These knowledge sources, despite being potentially useful for stance prediction, remain relatively little discussed in existing surveys of the field. To fill this gap in the literature, this article presents a survey of stance detection research focusing on the use of network-related features and on how these are combined with more standard text models.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Laís Carraro Leme Cavalheiro, Ivandré Parabonihttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5703Turbocharging Brazilian Mergers and Acquisitions: Questions & Answers Evaluation2025-10-02T11:07:48+00:00Francis Spiegel Rubinfran.spiegel@edu.unirio.brPedro Nuno de Souza Mourapedro.moura@uniriotec.brAdriana Cesario de Faria Alvimadriana@uniriotec.br<p>Economic power abuse is a concern in Brazil, where CADE (the Administrative Council for Economic Defense) combats anti-competitive behaviors to ensure fair competition. 
Artificial intelligence (AI) can aid CADE by identifying and extracting relevant information from technical reports published in Brazilian Portuguese, improving the detection and prevention of economic abuse. This paper presents a case study using AI to improve regulatory reviews of CADE documents via a Retrieval-Augmented Generation (RAG) pipeline architecture. Our key contribution is the creation of a specialized Questions & Answers benchmark dataset and a pipeline evaluation methodology, providing a standardized framework for Portuguese-language regulatory document analysis. A chain-of-thought (CoT) approach was used for problem solving: it leverages the RAG retrieval mechanism to access relevant information and incorporates the sequential reasoning of the CoT framework to generate responses that follow a logical flow of ideas, thus enhancing response accuracy. A vector database employing cosine similarity, combined with metadata filters, was used to retrieve the main arguments, reducing hallucinations and improving Large Language Model (LLM) performance. RAG metrics were then combined with a robust human fact-check assessment to validate the pipeline. 
Our findings establish a new benchmark for Questions & Answers evaluation in Brazilian Mergers and Acquisitions, demonstrating that the proposed strategy effectively enhances the analysis of organizational merger and acquisition reports, unlocking substantial benefits for society.</p>2026-03-17T00:00:00+00:00Copyright (c) 2026 Francis Spiegel Rubin, Pedro Nuno de Souza Moura, Adriana Cesario de Faria Alvimhttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5851Visually Comparing Graph Vertex Ordering Algorithms through Geometrical and Topological Approaches2025-10-12T13:25:13+00:00Karelia Vilca Salinaskareliavs17@gmail.comVictor Barellavictorhb@icmc.usp.brThales Vieirathales@ic.ufal.brLuis Gustavo Nonatognonato@icmc.usp.br<p>Graph vertex ordering is a resource widely employed in spatial data analysis, particularly in the urban analytics context, where street graphs are frequently used as spatial discretization for modeling and simulation. Vertex ordering is also important for visualization purposes, as many methods require the vertices to be arranged and displayed in a well-defined order to enable the visual identification of non-trivial patterns. The primary goal of vertex ordering methods is to find an ordering that preserves neighborhood relations. However, the structural complexity of graphs employed in real-world applications leads to unavoidable distortions in the ordering process. Therefore, comparing different vertex ordering methods is fundamental to enable effective analysis and selection of the most appropriate method in each application. Although several metrics have been proposed to assess spatial vertex ordering, they typically focus on measuring the quality of the ordering globally. Global ordering assessment does not enable the analysis and identification of locations where distortions are more pronounced, hampering the analytical process. 
Visual evaluation of the vertex ordering mechanisms is particularly valuable in this context, as it allows analysts to distinguish between methods based on their performance within a single visualization, assess distortions, identify regions with anomalous behavior, and, in urban contexts, explain spatial inconsistencies in the ordering. This work introduces a visualization-assisted tool to assess vertex ordering techniques, having urban analytics as the application focus. Specifically, we evaluate geometric and topological vertex ordering approaches using urban street graphs as the basis for comparisons. The visual tool builds upon existing and newly proposed metrics, which are validated through experiments on urban data from multiple cities, demonstrating that the proposed methodology is effective in assisting users in selecting a suitable vertex ordering technique, fine-tuning hyperparameters, and identifying regions with high ordering distortions.</p>2026-03-16T00:00:00+00:00Copyright (c) 2026 Karelia Vilca Salinas, Victor Barella, Thales Vieira, Luis Gustavo Nonatohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5871LiwTERM-r: A Revised Lightweight Transformer-based Model for Multimodal Skin Lesion Detection Robust to Incomplete Input2025-09-04T20:21:29+00:00Luis Antonio de Souza Júniorsouzajr.la@gmail.comAndré Georghton Cardoso Pachecoapacheco@inf.ufes.brThiago Oliveira dos Santostodsantos@inf.ufes.brWyctor Fogos da Rochawyctor.rocha@edu.ufes.brPedro Henrique Bouzonpedro.bouzon@edu.ufes.brChristoph Palmchristoph.palm@oth-regensburg.deJoão Paulo Papajoao.papa@unesp.br<p>As the most common type of cancer in the world, skin cancer accounts for approximately 30% of all diagnosed tumor-based lesions. Early diagnosis can reduce mortality and prevent disfiguring in different skin regions. 
In recent years, machine learning techniques, especially deep learning, have achieved promising results in this task, with studies demonstrating that combining patients' clinical anamneses with images of the lesion is essential for improving the correct classification of skin lesions. Even so, the meaningful use of anamneses together with multiple collected images of the same skin lesion remains little explored and requires further investigation. Thus, this project aims to contribute to developing multimodal machine learning-based models to solve the skin lesion classification problem by employing a lightweight transformer model that is robust to missing clinical information input. As a main hypothesis, models can be fed by multiple images from different sources as input along with clinical anamneses from the patient's historical evaluations, leading to a more factual and trustworthy diagnosis. Our model deals with the non-trivial task of combining images and clinical information concerning the skin lesions in a lightweight transformer architecture that does not demand high computation resources or even all the information from the anamneses but still presents competitive classification results.</p>2026-03-16T00:00:00+00:00Copyright (c) 2026 Luis Antonio de Souza Júnior, André Georghton Cardoso Pacheco, Thiago Oliveira dos Santos, Wyctor Fogos da Rocha, Pedro Henrique Bouzon, Christoph Palm, João Paulo Papahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5888Crowd-Powered Sampling for Machine Learning: Leveraging Citizen Scientist Response Patterns in AutoML Workflows2025-10-12T13:27:25+00:00Hugo Resendehresende@unifesp.brEduardo B. Netoebneto@unifesp.brFabio A. M. Cappabiancocappabianco@unifesp.brÁlvaro L. Fazendaalvaro.fazenda@unifesp.brFabio A. Fariafabio.faria@tecnico.ulisboa.pt<p>Defining effective models for data classification is challenging, especially in complex contexts. 
Automated Machine Learning (AutoML) tools can assist in this process by generating rankings tailored to the nature of the data and the problem. In this work, we investigate the performance of five classifiers applied to the task of deforestation segment classification, using data labeled through a citizen science campaign from the ForestEyes project. We selected SVM, Ridge, AdaBoost, KNN, and MLP models based on a ranking generated with the PyCaret AutoML library, prioritizing diverse modeling approaches. Initially, the performance of the models is assessed using an incremental training strategy based on the entropy of the volunteers' classifications. Then, a new training strategy is proposed based on the median response time of volunteers when evaluating each segment, exploring three ordering strategies: ascending, descending, and edge-based. Experimental results aligned with the PyCaret ranking, with SVM achieving the best performance, followed by Ridge and AdaBoost, especially when trained on smaller and more reliable data subsets. Both the entropy-based approach and the new strategy using median response time demonstrated strong potential to efficiently train machine learning models in scenarios with scarce data, typical of citizen science campaigns.</p>2026-03-16T00:00:00+00:00Copyright (c) 2026 Hugo Resende, Eduardo B. Neto, Fabio A. M. Cappabianco, Álvaro L. Fazenda, Fabio A. Fariahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/6448An approach to Data Literacy through a Personalized Interactive LGPD Guide using LLM for Educators2025-10-15T13:39:48+00:00César Murilo da Silva Juniorcesar.junior@serpro.gov.brSilvio E. Quincozessilvioquincozes@unipampa.edu.brJuliana Saraivajulianajags@dcx.ufpb.brRafael D. Araújorafael.araujo@ufu.br<p>The Brazilian General Data Protection Law (LGPD) was created to protect the fundamental rights of freedom and privacy of Brazilian citizens.
Since its implementation, it has brought new challenges to all institutions established in Brazil, whether public or private, requiring them to adapt their personal data processing practices. In the context of higher education, many professors face difficulties in understanding and properly applying the guidelines of this legislation in their daily activities. This work proposes the development of an approach to data literacy through an interactive guide, based on practical scenarios, to support educators in the process of complying with the LGPD. The proposed system uses the OpenAI API to offer personalized support in real time. Ten representative academic scenarios were implemented, in which users can interact through multiple-choice questions followed by a chat with the guide. The results showed that, despite initial usability limitations, the system represents a promising tool to promote the comprehension of the LGPD among teachers. We observed that our approach can facilitate compliance with the legislation, but it requires accessibility and usability improvements to ensure broader and easier adoption.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 César Murilo da Silva Junior, Silvio E. Quincozes, Juliana Saraiva, Rafael D. Araújohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/6635AI-Driven Hierarchical Taxonomy Generation from Emergency Call Transcripts2025-10-27T14:16:00+00:00Juan Gabriel Flores Sanchezjuanfloressanchez@es.uazuay.edu.ecMarcos Orellanamarore@uazuay.edu.ecPatricio Santiago García-Monterosantyg20@est.uazuay.edu.ecJorge Luis Zambrano-Martinezjorge.zambrano@uazuay.edu.ec<p class="p1">This article presents a case study on hierarchical topic modeling for emergency call transcripts from Ecuador's ECU 911 service. We introduce a hybrid methodology that first generates a taxonomy from unlabeled data using <em>BERTopic</em> and agglomerative clustering, and then employs embedding-based similarity for multi-label classification.
By leveraging multilingual embeddings (<em>LaBSE</em>) together with dimensionality reduction and clustering algorithms (<em>UMAP</em> and <em>HDBSCAN</em>), we identified 23 coherent topics, demonstrating a practical balance between accuracy and operational applicability. The key result is a significant reduction in Hamming Loss and an F1-score of 0.4951, achieved without the need for pre-labeled data. This underscores the method's primary practical significance: offering a scalable, automated solution for emergency management centers to rapidly categorize complex incidents, thereby enhancing situational awareness and resource allocation. The integration of <em>LLaMA 3</em> for automated label generation further optimized semantic interpretation, highlighting the potential of language models in critical, resource-constrained domains.</p>2026-03-25T00:00:00+00:00Copyright (c) 2026 Juan Gabriel Flores Sanchez, Marcos Orellana, Patricio Santiago García-Montero, Jorge Luis Zambrano-Martinezhttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5787Comparing Explainable AI Techniques In Language Models: A Case Study For Fake News Detection in Portuguese2025-07-25T20:33:45+00:00Jéssica Vicentinijvicentini99@gmail.comRafael Bezerra de Menezes Rodriguesrafael.rodrigues@unesp.brArnaldo Candido Juniorarnaldo.candido@unesp.brIvan Rizzo Guilhermeivan.guilherme@unesp.br<p>Language models are widely used in natural language processing, but their complexity makes interpretation difficult, limiting their adoption in critical decision-making. This work explores Explainable Artificial Intelligence (XAI) techniques, such as LIME and Integrated Gradients (IG), to understand these models. The study evaluates the effectiveness of BERTimbau in classifying Portuguese news as true or fake, using the FakeRecogna and Fake.Br Corpus datasets.
In the experiments, LIME proved easier to interpret than IG, and both methods showed limitations when applied to texts, as they focus only on the morphological and lexical levels, ignoring other important linguistic levels.</p>2026-01-21T00:00:00+00:00Copyright (c) 2026 Jéssica Vicentini, Rafael Bezerra de Menezes Rodrigues, Arnaldo Candido Junior, Ivan Rizzo Guilhermehttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5354A Coding-Efficiency Analysis of HEVC Encoder Embedded in High-End Mobile Chipsets2025-06-06T13:35:56+00:00Vítor Costavscosta@inf.ufpel.edu.brMurilo Perlebergmrperleberg@inf.ufpel.edu.brLuciano Agostiniagostini@inf.ufpel.edu.brMarcelo Portoporto@inf.ufpel.edu.br<p>High-end mobile devices require dedicated hardware for real-time video encoding and decoding processes. However, the inherent complexity of the video encoding process, combined with the physical limitations imposed by hardware design such as energy consumption, encoding time, memory usage, and heat dissipation, demands the implementation of various constraints and limitations in commercial hardware to simplify and make them feasible for general use. The High Efficiency Video Coding (HEVC) standard is the main targeted video encoder for processing high-resolution videos in high-end chipsets. This paper aims to analyze the HEVC encoder implemented in three commercial chipsets found in high-end smartphones (Apple iPhone 14 Pro, Samsung Galaxy S23 Plus, and Redmi Note 10S) from three major mobile chip manufacturers (Apple, Qualcomm, and MediaTek), considering the impacts of video encoder limitations on encoding efficiency (BD-Rate) and encoding time.
The results in this paper may serve as a comparative foundation for hardware designers and future work in the field, as they expose the encoding-efficiency drawbacks and the encoding-time gains that commercial chipsets exhibit in their HEVC encoders.</p>2026-01-22T00:00:00+00:00Copyright (c) 2026 Vítor Costa, Murilo Perleberg, Luciano Agostini, Marcelo Portohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/4242Learning on hierarchical trees with Random Forest2025-08-26T13:23:42+00:00Raquel Almeidaraquel1908@gmail.comLaurent Amsaleglaurent.amsaleg@irisa.frZenilton Kleber G. do Patrocínio Júniorzenilton@pucminas.brEwa Kijakewa.kijak@irisa.frSimon Malinowskisimon.malinowski@irisa.frSilvio Jamil Ferzoli Guimarãessjamil@pucminas.br<p style="font-weight: 400;">Hierarchies, as described in mathematical morphology, represent nested regions of interest and provide mechanisms to create coherent data organization. They facilitate high-level analysis and management of large amounts of data. Represented as hierarchical trees, they have formalisms intersecting with graph theory and generalizable applications. Due to the deterministic algorithms, the multiform representations, and the absence of a direct quality evaluation, it is hard to insert hierarchical information into a learning framework and benefit from recent advances. Researchers usually tackle this problem by refining the hierarchies for a specific media and assessing their quality for a particular task. The downside of this approach is that it depends on the application, and the formulations limit the generalization to similar data. This work aims to create a learning framework that can operate with hierarchical data and is agnostic to the input and application. The idea is to transform the data into a regular representation required by most learning models while preserving the rich information in the hierarchical structure.
The proposed methods take edge-weighted image graphs and hierarchical trees as input, and different proposals are evaluated on the edge detection and segmentation tasks. The learning model is the Random Forest, a fast and scalable method for working with high-dimensional data. Results demonstrate that it is possible to create a learning framework that depends only on the hierarchical data and achieves state-of-the-art performance in multiple tasks.</p>2026-01-26T00:00:00+00:00Copyright (c) 2026 Raquel Almeida, Laurent Amsaleg, Zenilton Kleber G. do Patrocínio Júnior, Ewa Kijak, Simon Malinowski, Silvio Jamil Ferzoli Guimarãeshttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5961Statistical Invariance vs. AI Safety: Why Prompt Filtering Fails Against Contextual Attacks2025-07-28T12:52:52+00:00Aline Iosteioste@ime.usp.brSarajane Marques Peressarajane@usp.brMarcelo Fingermfinger@ime.usp.br<p>Large Language Models (LLMs) are increasingly deployed in high-stakes applications, yet their alignment with ethical standards remains fragile and poorly understood. To investigate the probabilistic and dynamic nature of this alignment, we conducted a black-box evaluation of nine widely used LLM platforms, anonymized to emphasize the underlying mechanisms of ethical alignment rather than model benchmarking. We introduce the Semantic Hijacking Method (SHM) as an experimental framework, formally defined and grounded in probabilistic modeling, designed to reveal how ethical alignment can erode gradually, even when all user inputs remain policy-compliant. Across three experimental rounds (324 total executions), SHM achieved a 97.8% success rate in eliciting harmful content, with failure rates progressing from 93.5% (multi-turn conversations) to 100% (both refined sequences and single-turn interactions), demonstrating that vulnerabilities are inherent to semantic processing rather than conversational memory.
A qualitative cross-linguistic analysis revealed cultural variations in harmful narratives, with Brazilian Portuguese responses frequently echoing historical and socio-cultural biases, making them more persuasive to local users. Overall, our findings demonstrate that ethical alignment is not a static barrier but a dynamic and fragile property that challenges binary safety metrics. Due to potential risks of misuse, all prompts and outputs are made available exclusively to authorized reviewers under ethical approval, and this publication focuses solely on reporting the research findings.</p>2026-01-27T00:00:00+00:00Copyright (c) 2026 Aline Ioste, Sarajane Marques Peres, Marcelo Fingerhttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5684An Autonomous Hybrid Data Partitioning Approach for NewSQL Databases2025-07-22T14:04:23+00:00Geomar A. Schreinergschreiner@uffs.edu.brRafael de Santiagor.santiago@ufsc.brDenio Duarteduarte@uffs.edu.brRonaldo dos Santos Mellor.mello@ufsc.br<p>Several applications, such as online games and the financial market, require specific data management features, including large data volume support, data streaming, and the processing of thousands of OLTP transactions per second. In general, traditional relational databases are not suitable for these requirements. NewSQL is a new generation of databases that combines high scalability and availability with ACID support, being a promising solution for these kinds of applications. Although data partitioning is an essential feature for tuning relational databases, it is still an open issue for NewSQL databases. This paper proposes an automated approach for hybrid data partitioning that minimizes the number of distributed transactions and keeps the system well-balanced. In order to demonstrate its efficacy, we compare our solution with an optimal partitioning solution generated by a solver and a state-of-the-art baseline.
The experiments show that the quality of the partitioning scheme is similar to that of the optimal solution and surpasses the state-of-the-art approach in the number of distributed transactions.</p>2026-02-02T00:00:00+00:00Copyright (c) 2026 Geomar A. Schreiner, Rafael de Santiago, Denio Duarte, Ronaldo dos Santos Mellohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5646Limitless Feature Selection: Revolutionizing Evaluation with MH-FSF2025-05-13T14:38:37+00:00Vanderson Rochavanderson@ufam.edu.brDiego Kreutzdiegokreutz@unipampa.edu.brHendrio Bragançahendrio.luis@icomp.ufam.edu.brEduardo Feitosaefeitosa@icomp.ufam.edu.br<p>Feature selection plays a crucial role in developing effective predictive models by reducing dimensionality and emphasizing the most relevant attributes. However, current research in this area often lacks comprehensive benchmarking and frequently depends on proprietary datasets. These limitations hinder reproducibility and may lead to inconsistent or suboptimal model performance. To address these issues, we introduce the MH-FSF framework, a comprehensive, modular, and extensible platform designed to facilitate the reproduction and implementation of feature selection methods. Developed through collaborative research, MH-FSF provides implementations of 17 methods (11 classical, 6 domain-specific) and enables systematic evaluation on 10 publicly available Android malware datasets. Our results reveal performance variations across both balanced and imbalanced datasets, highlighting the critical need for data preprocessing and selection criteria that account for these asymmetries. We demonstrate the importance of a unified platform for comparing diverse feature selection techniques, fostering methodological consistency and rigor.
By providing this framework, we aim to significantly broaden the existing literature and pave the way for new research directions in feature selection, particularly within the context of Android malware detection.</p>2026-02-06T00:00:00+00:00Copyright (c) 2026 Vanderson Rocha, Diego Kreutz, Hendrio Bragança, Eduardo Feitosahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5809BENCH4T3: A Framework to Create Benchmarks for Text-to-Triples Alignment Generation2025-08-21T13:19:38+00:00Victor Jesus Sotelo Chicov265173@dac.unicamp.brAndré Gomes Reginoaregino@cti.gov.brJulio Cesar dos Reisjreis@ic.unicamp.br<p>Integrating Large Language Models (LLMs) with Knowledge Graphs (KGs) can significantly enhance their capabilities, leveraging LLMs' text generation skills with KGs' explanatory power. However, establishing this connection is challenging and demands proper alignment between unstructured texts and triples. Building benchmarks demands massive human effort in data curation and translation for non-English languages. The resulting lack of adequate benchmarks for validation purposes hampers research advancements. This study proposes an end-to-end framework to guide the automatic construction of text-to-triple alignment benchmarks for any language, using KGs as input. Our solution extracts relations from input triples and processes them to create accurately mapped texts. The proposed pipeline utilizes data curation through prompt engineering and data augmentation to enhance diversity in the generated examples. We experimentally evaluate our framework for creating a bimodal representation of RDF triples and natural language texts, assessing its ability to generate natural language from these triples. A key focus is on developing a benchmark for the underrepresented Portuguese language, facilitating the construction of models that connect structured data (triples) with text.
Our solution is suited to creating benchmarks that improve the alignment between KG triples and text data. The results indicate that the generated benchmark outperforms existing solutions. The generative approach benefits from our Portuguese benchmark, achieving competitive results compared to established literature benchmarks. Our solution enables the automatic generation of benchmarks for aligning triples and text.</p>2026-02-06T00:00:00+00:00Copyright (c) 2026 Victor Jesus Sotelo Chico, André Gomes Regino, Julio Cesar dos Reishttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5620Semiotic Engineering Theory for Human-Computer Integration: An Applicability and Usefulness Evaluation2025-09-03T18:04:29+00:00Glívia Angélica Rodrigues Barbosagliviaangelica@gmail.comRaquel Oliveira Pratesrprates@dcc.ufmg.br<p>The relationship between users and autonomous technologies is evolving towards integration (in the sense of partnership), transcending the stimulus-response interaction between these two agents. To follow this evolution, Human-Computer Interaction (HCI) researchers have defined and characterized a new interaction paradigm, Human-Computer Integration (HInt), which extends the focus of the HCI area to cover this new relationship of partnership between humans and autonomous technologies. As HInt is an emerging paradigm, the concepts and ontology of Semiotic Engineering Theory have been extended to address HInt as an extension of traditional HCI interaction. Thus, this paper aims to evaluate and discuss the applicability and usefulness of the extension of Semiotic Engineering to define, explore, and explain the phenomena involved in HInt.
Our findings provide useful insights and reflections on the benefits and limits of Semiotic Engineering for HInt to support the study, design, and evaluation of the partnership between humans and autonomous technologies.</p>2026-02-21T00:00:00+00:00Copyright (c) 2026 Glívia Angélica Rodrigues Barbosa, Raquel Oliveira Prateshttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/6044STELLAR: A Structured, Trustworthy, and Explainable LLM-Led Architecture for Reliable Customer Support2025-08-11T00:33:37+00:00Matheus Ferracciú Scatolinm252099@dac.unicamp.brHelio Pedrinihelio@ic.unicamp.br<p>While Large Language Models (LLMs) offer transformative potential for automating customer support, significant hurdles remain concerning their reliability, explainability, and consistent performance in complex, sensitive interactions. This paper introduces <strong>STELLAR (Structured, Trustworthy, and Explainable LLM-Led Architecture for Reliable Customer Support)</strong>, a novel architectural blueprint designed to address these issues. STELLAR utilizes a <strong>Directed Acyclic Graph (DAG) structure</strong> composed of nine specialized modules and eleven predefined workflows to orchestrate support interactions in a structured and predictable manner. This design promotes enhanced traceability, reliability, and control compared to less constrained systems. The architecture integrates components for few-shot classification, Retrieval-Augmented Generation (RAG), urgency-aware human escalation, compliance verification, user interaction validation, and knowledge base refinement through a semi-automated loop. This modular design deliberately balances LLM-driven innovation with operational requirements such as human-in-the-loop integration and ethical safeguards through embedded checks. We evaluated the core modules of STELLAR in key tasks - classification, retrieval, and compliance - demonstrating strong performance and reliability. 
Together, these features position STELLAR as a robust and transparent foundation for the next generation of intelligent, reliable customer support systems.</p>2026-02-21T00:00:00+00:00Copyright (c) 2026 Matheus Ferracciú Scatolin, Helio Pedrinihttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5511HelBERT: A BERT-Based Pretraining Model for Public Procurement Tasks in Portuguese2025-07-12T19:55:53+00:00Weslley Emmanuel Martins Limaweslley@ufpi.edu.brVictor Ribeiro da Silvavictor.silva@ufpi.edu.brJasson Carvalho da Silvajasson_jcs@ufpi.edu.brRicardo de Andrade Lira Rabêloricardoalr@ufpi.edu.brAnselmo Cardoso de Paivapaiva@nca.ufma.br<p>Deep learning models excel in various tasks but require extensive annotated data for supervised learning. In NLP, limited annotated data hinders deep learning. Self-supervised pretraining addresses this by training models on unlabeled text to learn useful representations. Domain-specific pretraining is crucial for good performance in downstream tasks. Although pretrained BERT models exist for legal documents in some languages, none target public procurement documents in Portuguese. Public procurement documents have terminology that is not found in existing models. In this paper, we propose HelBERT, a BERT-based model pretrained on a large corpus of public procurement documents in the Brazilian Portuguese language, including laws, tender notices, and contracts. The experimental results demonstrate that HelBERT outperforms other models in all analyses. HelBERT surpasses models such as BERTimbau and JurisBERT in classification tasks by achieving improvements of 5% and 4% in the F1 Score, respectively. Furthermore, the model achieves gains that exceed 3% in semantic similarity tasks compared to the baseline models. Moreover, despite using a GPU with reduced memory and processing resources, the proposed approach achieves superior results with fewer and more efficient training epochs than the baseline models. 
These findings underscore the effectiveness of the proposed model in addressing NLP tasks within the public procurement domain.</p>2026-02-21T00:00:00+00:00Copyright (c) 2026 Weslley Emmanuel Martins Lima, Victor Ribeiro da Silva, Jasson Carvalho da Silva, Ricardo de Andrade Lira Rabêlo, Anselmo Cardoso de Paivahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5457RecSys-Fairness: A Framework for Reducing Group Unfairness in Recommendations2025-08-07T10:30:08+00:00Rafael Vargas Mesquita dos Santosrafaelv@ifes.edu.brGiovanni Ventorim Comarelagc@inf.ufes.br<p>In this study, we address the importance of promoting fairness in recommendation systems, which are highly susceptible to biases that can lead to unfair outcomes for different user groups. We developed a fairness algorithm aimed at mitigating these injustices, which was applied to the MovieLens dataset and analyzed based on the recommendations produced by the ALS (Alternating Least Squares) and NCF (Neural Collaborative Filtering) methods. Users were grouped by activity level, gender, and age, and the results demonstrated the effectiveness of the fairness algorithm in substantially reducing group unfairness (R_{grp}) across all tested configurations, without causing significant losses in recommendation accuracy, measured by the Root Mean Squared Error (RMSE). In particular, a reduction in group unfairness of up to 65.57% was observed in the ALS method. Additionally, we identified an optimal convergence of the fairness algorithm for an estimated number of matrices (h) between 10 and 15, suggesting an effective balance point between promoting fairness and maintaining precision in recommendations. 
In comparison with the available benchmarks, under identical experimental conditions, we managed to improve group unfairness reductions by approximately 6 percentage points (from 59.77% to 65.57%).</p>2026-02-21T00:00:00+00:00Copyright (c) 2026 Rafael Vargas Mesquita dos Santos, Giovanni Ventorim Comarelahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5608A Reliable Stream Learning Model for Network Intrusion Detection Systems2025-05-15T06:48:45+00:00Pedro Horchulhackpedro.horchulhack@ppgia.pucpr.brEduardo Kugler Viegaseduardo.viegas@ppgia.pucpr.brAltair Olivo Santinsantin@ppgia.pucpr.br<p>Developing a reliable Network Intrusion Detection System (NIDS) remains a complex task due to the non-stationary nature of network traffic and the need for frequent updates to maintain high classification performance. Many existing approaches assume a stationary network environment, which overlooks the challenges associated with periodic model updates, such as the need for large amounts of properly labeled data and significant computational resources. This issue is particularly challenging for real-time applications, where minimizing delays and ensuring accuracy is crucial. This paper analyzes how changes in network behavior negatively affect the long-term performance of ML-based NIDS. To address this problem, we propose a new NIDS approach integrating stream learning with a reject-option technique to simplify the model update process while ensuring consistent classification accuracy over time. The proposal uses stream learning classifiers to incrementally incorporate new data, while the reject option allows the system to evaluate the reliability of classifications before they are used for updates. The scheme operates with minimal intervention, with rejected instances stored for future updates and used to fine-tune the model over time, ensuring adaptation to evolving network conditions.
Experimental results demonstrate that the proposed approach maintains high classification accuracy over a year, even without recurrent updates, and achieves significant improvements in true positive rates compared to traditional methods. The system can operate for up to three months without updates, with no significant degradation in performance.</p>2026-03-02T00:00:00+00:00Copyright (c) 2026 Pedro Horchulhack, Eduardo Kugler Viegas, Altair Olivo Santinhttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5873CNNs for JPEGs: Designing Cost-Efficient Stems2025-07-04T21:33:55+00:00Samuel Felipe dos Santossamuel.felipe@ufscar.brNicu Sebeniculae.sebe@unitn.itJurandy Almeidajurandy.almeida@ufscar.br<p>Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade, pushing the state of the art in several computer vision tasks. CNNs are capable of learning robust representations of the data directly from RGB pixels. However, most image data is available in compressed format, of which JPEG is the most widely used for transmission and storage purposes. Consequently, a preliminary decoding process with a high computational load and memory usage is required. Image decoding can be a performance bottleneck for devices with limited computational resources, such as embedded devices, even when hardware accelerators are used. For this reason, deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years. These methods usually extract a frequency-domain representation of the image, such as the DCT, by partial decoding, and then adapt typical CNN architectures to work with it. In this paper, we perform an in-depth study of the computational cost of deep models designed for the frequency domain, evaluating the cost of decoding and passing images through the network.
We observe that previous work increased the model's computational complexity to accommodate the compressed input, nullifying the speed-up gained by not decoding images. We propose to remove the changes to the model that increase the computational cost, replacing them with our lightweight stems. This way, we can take full advantage of the speed-up obtained by avoiding the decoding. Our strategies were successful in generating models that balance efficiency and effectiveness, allowing deep models to be deployed on a wider array of devices. We achieve up to a 25.91% reduction in computational complexity (FLOPs), while decreasing accuracy by at most 2.97%. We also propose the efficiency-effectiveness score S<sub>E</sub> to highlight models with favorable trade-offs between accuracy, computational cost, and number of parameters.</p>2026-03-02T00:00:00+00:00Copyright (c) 2026 Samuel Felipe dos Santos, Nicu Sebe, Jurandy Almeidahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/6043Generalizing Feature Selection in Android Malware Detection: The SigAPI AutoCraft Approach2025-07-30T08:24:50+00:00Vanderson Rochavanderson.rocha@gmail.comLaura Tschiedeldiegokreutz@unipampa.edu.brDiego Kreutzdiegokreutz@unipampa.edu.brHendrio Bragançahendrio.luis@icomp.ufam.edu.brJoner Assolinjoner.assolin@icomp.ufam.edu.brRodrigo Brandão Mansilharodrigomansilha@unipampa.edu.brSilvio E. Quincozessilvioquincozes@unipampa.edu.brAngelo Gaspar Diniz Nogueiraangelonogueira.aluno@unipampa.edu.br<p>Feature selection methods are widely employed in Android malware detection to improve accuracy and efficiency by identifying the most relevant features. However, their generalizability often remains limited, as approaches like SigAPI are typically developed and evaluated on a small number of datasets, reducing their effectiveness across diverse scenarios.
The practical use of SigAPI is further hindered by the need to predefine a minimum number of features, the instability of its evaluation metrics, and its inability to adapt efficiently to the heterogeneity commonly present in Android datasets. To address these limitations, we developed SigAPI AutoCraft, an enhanced and fully automated version of the original method. SigAPI AutoCraft achieves consistent and robust performance across ten Android malware datasets, substantially improving generalization. The results demonstrate a 5–15% increase in Matthews Correlation Coefficient (MCC) and up to a 7.6-fold improvement in feature reduction, underscoring its effectiveness and adaptability to complex and heterogeneous data environments.</p>2026-03-09T00:00:00+00:00Copyright (c) 2026 Vanderson Rocha, Laura Tschiedel, Diego Kreutz, Hendrio Bragança, Joner Assolin, Rodrigo Brandão Mansilha, Silvio E. Quincozes, Angelo Gaspar Diniz Nogueirahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5378Towards a Lightweight Multi-View Android Malware Detection Model with Multi-Objective Feature Selection2025-06-28T08:16:04+00:00Philipe Fransoziphilipe.hfransozi@ppgia.pucpr.brJhonatan Geremiasjgeremias@ppgia.pucpr.brEduardo K. Viegaseduardo.viegas@ppgia.pucpr.brAltair O. Santinsantin@ppgia.pucpr.br<p>In recent years, a wide range of new Machine Learning (ML) techniques with high accuracy have been developed for Android malware detection. Despite their high accuracy, these techniques are seldom implemented in production environments due to their limited generalization capabilities, leading to reduced performance when applied to real-world scenarios. In light of this, this paper introduces a novel multi-view Android malware detection model implemented in two stages. 
The first stage involves extracting multiple feature sets from the analyzed Android application package, offering complementary behavioral representations that improve the system's generalization in the classification process. In the second stage, a multi-objective optimization is conducted to identify the optimal feature subset from each view and fine-tune the hyperparameters of individual classifiers, enabling an ensemble-based classification approach. The core innovation of our approach lies in the proactive selection of feature subsets and the optimization of hyperparameters that together enhance classification accuracy while minimizing processing overhead within a multi-view framework. Experiments conducted on a newly developed dataset, consisting of over 40 thousand Android application samples, validate the effectiveness of our proposal. The results indicate that our model can increase true-positive rates by up to 18% while reducing inference processing costs by as much as 72%.</p>2026-03-09T00:00:00+00:00Copyright (c) 2026 Philipe Fransozi, Jhonatan Geremias, Eduardo K. Viegas, Altair O. Santinhttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5370Intelligent Emotion Tracking System VIRE: Evaluation of Neural Network Architectures in Facial Emotion Recognition2025-08-27T17:03:57+00:00Nathan Ferraz da Silvanathansilva@usp.brGeraldo Pereira Rocha Filhogeraldo.rocha@uesb.edu.brRoger Immichroger@imd.ufrn.brVinícius Pereira Gonçalvesvpgvinicius@unb.brRodolfo Ipolito Meneguettemeneguette@icmc.usp.br<p>This work proposes an emotional monitoring system called Visual Identification of Recognition of Emotions (VIRE), based on convolutional neural networks (CNNs) to analyze facial expressions. Using the six basic emotions proposed by Paul Ekman as a reference, which can be identified from the composition of various facial muscle states, VIRE aims to assist in the diagnosis of mental health conditions. 
While emotional expressions are communicated in various ways, this research focuses primarily on facial expressions due to their expressiveness resulting from the mobility of facial muscles. The methodology involved collecting data from the FER2013 dataset, preprocessing the images, hyperparameter tuning, and training three different architectures: AlexNet, DenseNet, and a custom CNN. The research classifies expressions into basic emotions and evaluates the models' performance in terms of accuracy and other metrics. VIRE has demonstrated potential, achieving an accuracy of about 60%, although improvements are needed for practical application. The ultimate goal is to create a tool that integrates technology and health, facilitating the identification of emotional states that may indicate mental health issues, thereby contributing to more accurate and effective diagnoses.</p>2026-03-09T00:00:00+00:00Copyright (c) 2026 Nathan Ferraz da Silva, Geraldo Pereira Rocha Filho, Roger Immich, Rodolfo Ipolito Meneguette, Vinícius Pereira Gonçalveshttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5904Memorizing Features Efficiently for Self-supervised Video Object Segmentation2026-01-12T14:06:16+00:00Marcelo Mendonçaeng.marcelo.mendonca@gmail.comLuciano Oliveiraluciano.reboucas@gmail.com<p>Video object segmentation (VOS) involves consistently identifying and classifying object pixels in video sequences, a task that traditionally depends on extensive, manually annotated datasets. In this work, we present SHLS (Superfeatures in a Highly Compressed Latent Space), a self-supervised VOS method that reduces reliance on both annotations and large training datasets. SHLS employs a metric learning framework combining superpixels and deep learning features, enabling effective training with just 10,000 unlabeled still images. 
Utilizing an efficient memory clustering mechanism, SHLS generates ultra-compact representations called superfeatures, which efficiently store and classify object information across video sequences. Experiments on the DAVIS dataset demonstrate SHLS's strong performance in multi-object scenarios, underscoring its potential as a robust and efficient alternative in self-supervised VOS.</p>2026-03-15T00:00:00+00:00Copyright (c) 2026 Marcelo Mendonça, Luciano Oliveirahttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5551Sapo-boi: Bypassing Linux Kernel Network Stack in the Implementation of an XDP-based NIDS2025-06-28T08:13:27+00:00Raphael Kaviak Machnickirkmach17@gmail.comJoão Ribeiro Andreottijrandreotti@inf.ufpr.brUlisses Penteadoulisses@bluepex.comJorge Pires Correiajpcorreia@inf.ufpr.brVinicius Fulber-Garciavinicius@inf.ufpr.brAndré Grégiogregio@inf.ufpr.br<p>Network intrusion detection systems (NIDS) must inspect multiple parts of a packet to detect patterns of known attacks. With the advent of XDP, it has become feasible to implement such a system within the kernel's own network stack for the evaluation of ingress traffic. In this work, we propose Sapo-boi, an NIDS solution consisting of two modules: (i) the Suspicion Module, an XDP program capable of processing packets in parallel, discarding packets considered safe, and redirecting suspicious packets for verdict in user space through XDP sockets (AF_XDP); and (ii) the Evaluation Module, a user-level process capable of finding, in constant time, the rule against which the suspicious packet should be analyzed and triggering notifications if the suspicion is confirmed. 
The system demonstrated superior results in terms of packet analysis rates and CPU usage compared to traditional NIDS alternatives (Snort and Suricata).</p>2026-03-02T00:00:00+00:00Copyright (c) 2026 Raphael Kaviak Machnicki, João Ribeiro Andreotti, Ulisses Penteado, Jorge Pires Correia, Vinicius Fulber-Garcia, André Grégiohttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5567Portfolio-based Active Learning with Gaussian Processes for Vulnerabilities Risk Classification2025-06-28T08:12:00+00:00Davyson S. Ribeirodavysonribeiro@alu.ufc.brRafael S. Lemosrafael.lemos@alu.ufc.brFrancisco R. P. da Pontefco.rparente@gmail.comCésar Lincoln C. Mattoscesarlincoln@dc.ufc.brEmanuel B. Rodriguesemanuel@dc.ufc.br<p>Effective vulnerability management is essential for cybersecurity, particularly as the demand for skilled professionals often exceeds supply. This paper investigates the application of Gaussian Processes (GPs) integrated with Active Learning (AL) techniques to classify security vulnerabilities based on their risk of exploitation. The main objective is to optimize the labeling process, thereby reducing the amount of labeled data necessary for training an effective classifier. The proposed methodology combines the uncertainty predictions provided by GP models with five established data selection strategies, utilizing a portfolio-based approach. The portfolio avoids the need to choose a single strategy and leverages the strengths of each technique. This approach enhances adaptability and balances exploration versus exploitation in complex optimization scenarios, ultimately improving the diversity of labeled samples and contributing to the development of better classifiers trained with fewer examples. Experiments were conducted using the CVEjoin dataset, which encompasses over 200,000 vulnerabilities, across three distinct evaluation scenarios. The different setups consider equivalent volumes of labeled data, but varying numbers of Active Learning iterations. 
When considering a single strategy, the results indicate that the BSB (best and second best) method consistently outperformed the others in terms of accuracy and F1 score, particularly with an increased number of labeling iterations. In the scenario where multiple strategies are used in a portfolio, the results indicate gains in all evaluation metrics. This study underscores the usefulness of a portfolio-based Active Learning approach in optimizing the labeling procedure and, ultimately, prioritizing vulnerabilities for remediation. This research lays the groundwork for extending the framework to other areas of cybersecurity, such as vulnerabilities in web applications and cloud environments, thereby improving overall security measures in the digital landscape.</p>2026-03-02T00:00:00+00:00Copyright (c) 2026 Davyson S. Ribeiro, Rafael S. Lemos, Francisco R. P. da Ponte, César Lincoln C. Mattos, Emanuel B. Rodrigueshttps://journals-sol.sbc.org.br/index.php/jbcs/article/view/5625Implementation and evaluation of the Forro stream cipher in Tofino programmable hardware for remote attestation in datacenters2025-06-28T08:11:42+00:00Rodrigo Alexander de Andrade Pierinirpierini@unicamp.brCaio Teixeiracaio@dca.fee.unicamp.brChristian Rodolfo Esteve Rothenbergchesteve@unicamp.brMarco Aurélio Amaral Henriquesmaah@unicamp.br<p>The software-defined networking (SDN) paradigm has enabled several innovations in computer networking, especially in programmable packet processing. This paper shows the feasibility of implementing the Forro stream cipher algorithm in the Tofino programmable hardware switch and its impact on computing resources. For comparison purposes, the ChaCha algorithm was also analyzed in terms of its performance and impact on the same device. It was observed that the Forro algorithm performs better and uses fewer resources than ChaCha in sequential implementations. 
However, when parallelization techniques are adopted, ChaCha performs better for higher data rates, but uses more ternary matching resources than Forro. For the use case of remote attestation in programmable data planes, the Forro cipher seems more promising, as it uses fewer of the hardware's limited resources and can achieve sufficient throughput rates for this scenario. We then propose P4DRA, a distributed remote attestation solution based on the programmable data plane that can offload the verification process of remote devices to the data plane, freeing resources from a central verifier based on an x86 server and improving the attestation proof verification speed by around 150 times.</p>2026-02-24T00:00:00+00:00Copyright (c) 2026 Rodrigo Alexander de Andrade Pierini, Caio Teixeira, Christian Rodolfo Esteve Rothenberg, Marco Aurélio Amaral Henriques
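As background to the Forro/ChaCha comparison in the last abstract above, the sketch below shows the standard ChaCha quarter-round from RFC 8439 in plain Python. This is purely illustrative of the add-xor-rotate (ARX) primitive that both ciphers build on; the paper itself implements such rounds in P4 on Tofino hardware, not in Python, and Forro's own round function differs in its constants and structure.

```python
# Minimal sketch of the ChaCha quarter-round (RFC 8439).
# Illustrative only: shows the add/xor/rotate (ARX) steps that make
# these ciphers attractive for match-action hardware pipelines.

MASK32 = 0xFFFFFFFF  # all arithmetic is modulo 2**32


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a: int, b: int, c: int, d: int):
    """One ChaCha quarter-round over four 32-bit state words."""
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d


# Test vector from RFC 8439, Section 2.1.1
out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([hex(w) for w in out])
# → ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```

Each quarter-round uses only 32-bit additions, XORs, and fixed rotations, which is why ARX ciphers map comparatively well onto programmable switch pipelines with no multipliers.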