Journal of Information and Data Management
https://journals-sol.sbc.org.br/index.php/jidm
<p>JIDM is an electronic journal published three times a year. Submissions are received continuously, and the first phase of the reviewing process usually takes 6 to 7 months. Sponsored by the Brazilian Computer Society, JIDM focuses on information and data management in large repositories and document collections. It relates to several areas of Computer Science, including databases, information retrieval, digital libraries, knowledge discovery, data mining, and geographical information systems.</p>
Brazilian Computer Society | en-US | ISSN 2178-7107

HDP+: Leveraging Anomaly and Change Point Detection for Pump-and-Dump in Cryptocurrency Exchanges
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5728
<p>Cryptocurrencies are increasingly gaining relevance in financial markets, attracting both retail and institutional investors. As a result, assessing the risks associated with these new assets has become more important. Unlike traditional market assets, which are regulated and centrally managed, cryptocurrencies were designed with an egalitarian nature, enabling peer-to-peer transactions and providing varying levels of anonymity. These characteristics contribute to a favorable environment for illicit activities, including market manipulation. One such manipulation technique is the pump-and-dump (PD) scheme, which exploits the decentralized and anonymous nature of cryptocurrency markets. Typically orchestrated through public groups on social media platforms, these schemes involve coordinated surges in buy orders to artificially inflate a coin’s price, followed by rapid sell-offs that leave unsuspecting investors with losses. The detection of PD schemes is important because they compromise the integrity and reputation of cryptocurrency markets, undermining investor confidence and contributing to market instability. Most prior research on PD detection has focused on anomaly detection techniques. In this study, we investigate whether combining anomaly detection with change point detection in a hybrid framework can enhance detection performance. We propose HDP+, an improved version of the original HDP method, which processes raw trading records from exchanges and applies an ensemble of anomaly detection and change point detection algorithms. The primary distinction between HDP and HDP+ lies in the anomaly detection component: while HDP relies on traditional volatility-based methods, HDP+ analyzes the time series of rush orders. Experiments conducted on a dataset of confirmed PD events yielded results of 96.4% precision, 89.3% recall, and a 92.7% F1-score, surpassing previous statistical approaches to PD detection.</p>Matheus S. Moura, Laís Baroni, Eduardo Ogasawara, Diogo S. Mendonça
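As a rough illustration of the change point component this abstract describes, the sketch below implements a minimal CUSUM detector that flags an abrupt upward shift in a price series. This is a generic textbook method, not the HDP+ algorithm itself; the threshold, drift parameter, and price values are illustrative assumptions.

```python
# Minimal CUSUM change point detector: flags the first index where the
# cumulative positive deviation from the running mean crosses a threshold.
# Generic sketch for illustration only, not the HDP+ ensemble.

def cusum_change_point(series, threshold=5.0, drift=0.0):
    """Return the first index where an upward mean shift is detected,
    or None if the cumulative statistic never crosses the threshold."""
    mean = series[0]
    s_pos = 0.0
    for i, x in enumerate(series[1:], start=1):
        # Incrementally update the running mean of the series so far.
        mean += (x - mean) / (i + 1)
        # Accumulate positive deviations beyond the allowed drift.
        s_pos = max(0.0, s_pos + (x - mean) - drift)
        if s_pos > threshold:
            return i
    return None

# Prices flat around 10, then an abrupt pump to ~15 at index 5.
prices = [10.0, 10.1, 9.9, 10.0, 10.2, 15.0, 15.2, 15.1, 15.3]
print(cusum_change_point(prices, threshold=3.0))  # -> 5
```

In a PD setting, a detector of this kind would run over per-window trade aggregates; the actual HDP+ method combines several such detectors with anomaly detection over rush-order time series.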
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 194-203 | DOI: 10.5753/jidm.2026.5728

FFT-Based Anomaly Detectors: Cutoff Frequency Adjustment and SMA-Based Approach
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5732
<p>This article presents a method for anomaly detection in time series based on the Fast Fourier Transform (FFT) using high-pass filtering. In addition to five existing strategies for determining the cutoff frequency (TF, AF, CAF, BSF, CBSF), a novel approach called SMAF is introduced. SMAF combines spectral analysis with adaptive smoothing using the Simple Moving Average, enabling the detection of high-frequency anomalies without requiring the inverse transform. The experiments employ the Yahoo Webscope dataset and the Numenta Anomaly Benchmark (NAB), providing a comprehensive evaluation. FFT-based approaches are compared to traditional statistical techniques (FBIAD and ARIMA) and machine learning methods (LSTM, ELM, and SVM). The results show that FFT-based methods outperform both statistical and machine learning techniques in terms of F1 score, precision, accuracy, and execution time. Among them, SMAF achieves the highest precision and the lowest execution time, reinforcing the potential of FFT-based filtering for efficient and accurate anomaly detection in time series.</p>Ellen Paixão Silva, Helga Balbi, Esther Pacitti, Fabio Porto, Joel Santos, Eduardo Ogasawara
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 204-209 | DOI: 10.5753/jidm.2026.5732

From Legacy to Modern Systems: An Enterprise-ready Workflow for Database Migrations
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5730
<p>Data migration across different database management systems (DBMSs) is a critical task for companies aiming to modernize their infrastructure, reduce risks, and improve operational efficiency. This paper presents a workflow to assist in conducting migrations, from planning to post-migration. The objective is to provide a roadmap for migrations, reducing execution time, mitigating risks, and ensuring data integrity and security. The proposed approach was validated in a real company scenario, demonstrating its feasibility and effectiveness. This article makes a practical contribution to the data migration area, providing a validated roadmap that assists companies in conducting migrations efficiently and safely.</p>Gustavo Moraes, Victor Misael, Angelo Brayner
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 210-218 | DOI: 10.5753/jidm.2026.5730

JFUSE: Json FUll Schema Extractor
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5748
<p>Lately, we have witnessed a flood of data generated by several data-centric applications, and these data are available in a wide variety of formats. However, those data are mostly weakly structured, irregular, and incomplete; they do not follow a predefined schema. A challenging task is to understand how those data are organized and structured. JSON has become a popular format for data-centric applications to store and share data. Its success is due to embodying structure and data in the same representation. This makes JSON documents loosely coupled with schemas. Still, schemas are essential for applications to deal with the data more efficiently. In this paper, we propose JFUSE, a tool that addresses the problem of discovering a schema from JSON collections. Besides inferring basic types (e.g., atomic types, arrays, and objects), JFUSE also discovers fields that represent keys in the collection, fields' minimum/maximum constraint values, enumerations, tagged unions, metadata as data, objects as collections, and arrays as tuples. We propose a metamodel that can be easily transformed into any schema language (e.g., JSON Schema). Our experiments show that the proposed approach infers concise and correct schemas from (huge) JSON collections.</p>Natália Benhara, Denio Duarte, Geomar A. Schreiner, Samuel Feitosa
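To give a flavor of the schema-discovery problem the abstract describes, here is a minimal recursive type-inference sketch for a single JSON document. Real extractors such as JFUSE merge types across a whole collection and additionally detect keys, enumerations, constraints, and tagged unions; none of that is shown here, and the document and field names are made up for illustration.

```python
import json

# Minimal recursive JSON type inference: walk one document and produce
# a nested type description. Single-document sketch only; collection
# merging, key detection, and constraints are out of scope.

def infer_type(value):
    if isinstance(value, dict):
        return {"type": "object",
                "fields": {k: infer_type(v) for k, v in value.items()}}
    if isinstance(value, list):
        return {"type": "array", "items": [infer_type(v) for v in value]}
    if isinstance(value, bool):      # bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    return {"type": "null"}

doc = json.loads('{"id": 1, "name": "Ada", "tags": ["a", "b"]}')
schema = infer_type(doc)
print(schema["fields"]["id"]["type"])  # -> integer
```

A collection-level extractor would fold such per-document types together, which is where the interesting design choices (e.g., unions vs. widened types) arise.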
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 1-9 | DOI: 10.5753/jidm.2026.5748

MasterMobilityDB: A Storage and Processing Layer for Multiple Aspect Trajectory Data at Scale
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5890
<p>Spatiotemporal data capturing the movement of real-world entities, gathered from sensors and GPS devices, along with its integration with other contextual georeferenced data, has led to the generation of large and complex trajectory datasets. These datasets, referred to as multiple aspect trajectories (MATs), present new challenges for moving object databases. This work introduces MasterMobilityDB, a persistence layer for MATs based on the Master representation model, built on top of the MobilityDB database through a dedicated API. A comparison with the state-of-the-art SecondoDB shows that MasterMobilityDB enables more natural query expressions and delivers better performance for MATs. It is a pioneering solution for MAT persistence and management.</p>Flaris Feller, Ronaldo dos Santos Mello
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 10-16 | DOI: 10.5753/jidm.2026.5890

Evaluating Heterogeneous Node Embedding Compositions Using Diversity Metrics
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5727
<p>This paper evaluates the impact of different embedding composition strategies on classification performance, analyzing local node features, neighboring node features, and metapaths. We conduct a comprehensive experimental evaluation using an authorial Person Relationships heterogeneous graph, incorporating diversity metrics to assess dataset balance and structural complexity. This approach provides deeper insights into their influence on model effectiveness and extends prior research by comparing new results against an established baseline. The experimental findings reaffirm the effectiveness of embedding compositions, with Aggregated Features + Metapaths achieving a Micro-F1 score of 94.04%, demonstrating highly accurate results, validated by diversity metrics. This outcome highlights the importance of embedding compositions in heterogeneous graph representations, reinforcing their potential to improve predictive performance in real-world graph structures.</p>Silvio Fernando Angonese, Renata Galante
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 17-25 | DOI: 10.5753/jidm.2026.5727

Beyond Species: Enhancing Botanical Data Integrity Using Similarity Metrics in Authorship Attribution
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5847
<p>This extended study builds upon prior work exploring the deduplication of botanical authorship records governed by the International Code of Nomenclature (ICN). We introduce new datasets (Sargassaceae and Agaricaceae) to evaluate the performance of multiple text similarity algorithms, including Jaccard, Levenshtein, Jaro-Winkler, Metaphone, N-grams, Smith-Waterman, and Fingerprinting. Our updated methodology incorporates enhanced preprocessing strategies, new threshold calibration techniques, and comprehensive metric-based evaluations (precision, recall, and F1 score). The results reaffirm the robustness of Smith-Waterman and highlight the dataset-dependent behavior of Metaphone and Fingerprinting. This expanded analysis contributes to a more generalized understanding of text similarity challenges in biological databases and reinforces the importance of tailored algorithm selection based on taxonomic structure and data quality.</p>Luma Rios Delponte, Carina Friedrich Dorneles, Simone Silmara Werner
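Of the similarity measures this study compares, character n-gram Jaccard is the simplest to sketch: tokenize each string into overlapping bigrams and divide the shared bigrams by the total distinct bigrams. The example strings and any matching threshold one would pick are illustrative assumptions, not values from the study.

```python
# Character-bigram Jaccard similarity, one of the simpler measures
# compared in deduplication studies. Example strings are illustrative.

def ngrams(text, n=2):
    """Set of lowercase character n-grams of a string."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a, b, n=2):
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)

# Two spellings of the same botanical author name.
print(round(jaccard("Linnaeus", "Linaeus"), 2))  # -> 0.86
```

Measures like Smith-Waterman instead compute an alignment score, which is why they behave differently on abbreviation-heavy authorship strings.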
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 26-35 | DOI: 10.5753/jidm.2026.5847

Provenance Support for Containerized Workflow Analyses in High-Performance Computing Environments
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5723
<p>Deploying scientific workflows in High-Performance Computing (HPC) environments presents several challenges due to variations in computational infrastructure, execution environments, and resource availability. Containers offer a way to ease workflow deployment and foster reproducibility. However, effective use of containers requires more than just access to container images. Understanding container provenance is essential, as it provides detailed information on image creation, configuration, and execution history, which is critical when deploying workflows across different architectures and container engines. Existing provenance support focuses on tracking container actions and standalone processes, but does not relate this information to the provenance of workflows. To address this limitation, we represent container metadata as provenance and relate it to the provenance captured by the workflow execution. This approach enables workflow deployment with multiple container configurations in HPC environments while being compliant with the W3C-PROV standard for structured container provenance. The proposed model was evaluated in a real scientific machine-learning workflow. The evaluation assessed how provenance data can improve traceability, support workflow reproducibility, and facilitate containerized workflow analyses.</p>Liliane Kunstmann, Débora Pina, Daniel de Oliveira, Marta Mattoso
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 36-45 | DOI: 10.5753/jidm.2026.5723

How Effectively Do LLMs Automate Data Analysis? A Comparative Study with ChatGPT's Data Analyst, Grok, and Qwen
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5963
<p>Artificial Intelligence (AI) tools are increasingly becoming integral to analytical processes. This paper evaluates the potential of Large Language Models (LLMs), specifically OpenAI’s ChatGPT Data Analyst, Grok 3, and Qwen2.5-Max, in data analysis. We conducted a structured experiment applying these tools to 108 questions spanning descriptive, diagnostic, predictive, and prescriptive analyses to assess their effectiveness. The study revealed an overall efficiency rate of 72.22% for ChatGPT's Data Analyst, outperforming Grok 3 at 45.37% and Qwen2.5-Max at 8.33%. By discussing the strengths and limitations of state-of-the-art LLM-based tools in aiding data scientists, this study aims to mark a critical milestone for future developments in the field, particularly as a reference for the open-source community.</p>Carlos D. S. Nogueira, Darlan S. Almeida, Beatriz A. de Miranda, Claudio E. C. Campelo
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 46-55 | DOI: 10.5753/jidm.2026.5963

ActivEOn: An ontology for human activity modelling in smart spaces
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5691
<p>One of the key challenges in smart environments is representing user context, as most applications need to collect and interpret data about their users. A crucial category of contextual information in these environments is user activity. This paper presents a literature mapping of recent research on human activity modelling using ontologies. Based on this analysis, we propose an ontology called ActivEOn, designed to represent human activities in smart spaces. This ontology provides high-level modelling of activities and related concepts, allowing for extensions into specific domains. The potential of the developed ontology was demonstrated through use cases in different smart environments modelled using Protégé software.</p>Leonardo Viana do Nascimento, José Palazzo Moreira de Oliveira
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 56-66 | DOI: 10.5753/jidm.2026.5691

Detecting and Analysing Duplicate Consumer Complaints and Collective Demands Across Multiple Platforms
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5959
<p>The increasing volume of data in consumer complaint repositories poses considerable challenges for the effective management and analysis of this information. A primary issue is the prevalence of duplicate complaints, often submitted by the same consumer across different platforms as a strategy to exert pressure on service providers. Furthermore, the identification of collective consumer demands embedded within these complaints is essential for revealing systemic issues affecting broader consumer groups. This study proposes a computational framework to address these dual challenges: (i) the detection of duplicate complaints through temporal correlation and cross-platform matching of key attributes, such as consumer identity, service provider, and complaint subject, and (ii) the identification of collective demands via clustering techniques based on semantic similarity. To this end, natural language processing (NLP) methods are employed to extract and represent semantic content from unstructured complaint texts. Empirical results indicate that 95% of duplicate complaints are submitted within a 30-day window from the original entry. Additionally, the proposed clustering approach demonstrates validated effectiveness to enhance the management of unstructured consumer complaint data, facilitating more efficient conflict resolution and informed decision-making for regulatory agencies and service providers.</p>Gestefane Rabbi, Júlia Viterbo, Gabriel Kakizaki, Zilton Cordeiro Junior, Raquel O. Prates, Julio C. S. Reis, Marcos André Gonçalves
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 67-81 | DOI: 10.5753/jidm.2026.5959

Enhancing Automatic Speech Recognition Medical Transcriptions
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5737
<p>Automated Speech Recognition (ASR) systems can reduce cognitive load and improve efficiency in medical documentation. This study evaluates Whisper and Wav2Vec2 PT for transcribing medical histories in Brazilian Portuguese. Using real audio-text pairs recorded by specialists and non-specialists, we assess model performance across speaker profiles. We explore decoding with n-gram language models and post-processing with a BERT-based classifier to correct common spelling errors. Additionally, we apply large language models (LLMs) for text style transfer (TST), converting transcriptions into structured medical anamneses through prompt-based methods. Results show that Whisper outperforms Wav2Vec2 PT overall. The BERT-based correction model improves transcription accuracy, especially when applied after normalization. Among the LLMs tested, Mistral produced the most consistent and structured outputs. These findings demonstrate the potential of combining ASR with language model enhancements for medical documentation, while also highlighting ongoing challenges in clinical ASR.</p>Yanna Torres Gonçalves, João Victor B Alves, Breno Alef Dourado Sá, José A Fernandes de Macedo, Ticiana L Coelho da Silva
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 82-91 | DOI: 10.5753/jidm.2026.5737

A heuristic Data-Centric AI approach to predict non-contact injuries in elite football players
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5855
<p>Preventing non-contact injuries in professional soccer is critical for safeguarding athlete health and minimizing disruptions to team performance and financial stability. This study investigates predictive modeling strategies for forecasting non-contact traumatic injuries during a microcycle of male professional players from Fluminense Football Club, integrating Data-Centric AI (DCAI) principles with machine learning algorithms. Building upon previous work, we extend the Regressive Multi-dimensional Model Selection (RMMS) methodology through new experiments that incorporate alternative class balancing strategies, hyperparameter tuning, and feature selection methods in place of Principal Component Analysis (PCA). Among the tested models, tree-based algorithms, particularly XGBoost, achieved the highest AUC-ROC (74.1%), though this result remained below the 79.8% baseline obtained with a Decision Tree in earlier research. Undersampling with a 70/30 ratio of non-injury to injury cases emerged as the most effective balancing approach, reinforcing prior findings. SHAP (SHapley Additive exPlanations) analysis identified Adaboost as the most positively impactful model, while feature selection and hyperparameter optimization yielded adverse effects on performance. These results suggest that PCA continues to be a more effective dimensionality reduction technique for this dataset. Future research should incorporate additional training seasons, match-related data, and broader athlete characteristics beyond GPS metrics, such as biochemical markers and perceived exertion, to improve model robustness and predictive accuracy.</p>Matheus Santos Melo, Juliano Spineti, Diego Nunes Brandão, Lucas Giusti Tavares, Jorge de Abreu Soares
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 92-101 | DOI: 10.5753/jidm.2026.5855

Characterizing the Socioenvironmental and Behavioral Profile of Individuals with OCD Using the PNS 2019 Database
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5780
<p>The objective of this study is to characterize the profile of individuals diagnosed with Obsessive-Compulsive Disorder (OCD) in the Brazilian population, considering socioenvironmental and behavioral aspects. For this purpose, the 2019 National Health Survey (PNS) database is considered. Based on a knowledge discovery process, including conceptual modeling of the domain for conceptual selection of attributes, the Explainable Boosting Machine (EBM) and Decision Tree algorithms are applied, aiming to identify relevant attributes for the classification of OCD. The results indicate that both aspects improve the model's performance, reaching an average F1-score of 63% (59% for OCD = Yes, and 66% for OCD = No). Results consistent with the literature were also found, such as the relationship between OCD and poor sleep quality, diet quality, and mental disorders such as anxiety and depression, among other factors. This study has limitations, such as the use of data that may not accurately reflect socioeconomic and behavioral conditions during the development of OCD. Thus, this study serves as an exploratory guide, capable of identifying profiles more vulnerable to triggers of the disorder, but without the intention of replacing medical or psychological evaluation.</p>Anna Puga Campos Rodrigues, Luis Enrique Zarate
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 102-111 | DOI: 10.5753/jidm.2026.5780

Evaluating Data Drift Detection and Its Effects on Machine Learning System Performance
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5738
<p>Software systems incorporating machine learning (ML) components are being increasingly deployed across various domains. Unlike traditional systems, ML systems are highly dependent on the quality of their input data, making their performance susceptible to changes in that data. This work explores the potential for improving ML systems by actively monitoring data flow and retraining models in response to drift detection. We begin by evaluating several widely used statistical and distance-based methods for detecting data drift, highlighting their advantages and limitations. Subsequently, we present experimental results using these methods on datasets exhibiting concept drift from the literature, as well as synthetic datasets with data drift. Our findings demonstrate how these techniques can enhance the robustness of ML systems, offering automatic adaptation regardless of the type of drift encountered.</p>Lucas Helfstein Rocha, Kelly Braghetto
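One of the distance-based checks of the kind this study evaluates is the two-sample Kolmogorov-Smirnov statistic: compare a reference window of feature values against a live window and flag drift when the maximum gap between their empirical CDFs is large. The sketch below uses a fixed decision threshold rather than a proper p-value test, and the window values are illustrative.

```python
# Two-sample Kolmogorov-Smirnov statistic as a simple drift check.
# Fixed threshold is an illustrative assumption, not a calibrated test.

def ks_statistic(reference, current):
    """Max absolute gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    for v in sorted(set(ref) | set(cur)):
        f_ref = sum(1 for x in ref if x <= v) / len(ref)
        f_cur = sum(1 for x in cur if x <= v) / len(cur)
        d = max(d, abs(f_ref - f_cur))
    return d

def drifted(reference, current, threshold=0.5):
    return ks_statistic(reference, current) > threshold

baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
shifted = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
print(drifted(baseline, shifted))   # disjoint distributions -> True
print(drifted(baseline, baseline))  # identical windows -> False
```

In a monitoring pipeline, a positive result from a check like this would trigger the model retraining step the abstract describes.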
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 112-121 | DOI: 10.5753/jidm.2026.5738

POI Type Embedding Techniques for Recommendation Systems: A Comparative Analysis
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5929
<p>Modern Point of Interest (POI) Recommendation Systems (RSs) employ diverse technologies to enhance user experience and engagement. These systems utilize various techniques to predict user preferences and recommend the next POI, including machine learning algorithms that integrate user behavior with category and time preferences from other users, as well as state-of-the-art embedding techniques that leverage geographic features around POIs to assess user preference likelihood. This study addresses key gaps in POI Recommendation Systems by evaluating algorithms that enhance POI representation through the combined use of check-in data, geographic features, and advanced embedding techniques.</p>Diogo Alves Silveira, Salatiel Dantas Silva, Nícolas Moreira Nobre Leite, Claudio E. C. Campelo
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 122-132 | DOI: 10.5753/jidm.2026.5929

Incorporating LGPD Requirements and Restrictions into Database Design
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5859
<p>The Brazilian General Data Protection Law (LGPD) specifies how personal data processing, storage, and disposal should be conducted, conditioning it to the prior authorization of the data subject. On the other hand, current information systems rely heavily on personal data and, therefore, must comply with the LGPD. In this context, the database system becomes an even more critical component in software development, as it is responsible for storing, updating, and retrieving data. However, the methodologies and tools used for database design do not incorporate the requirements and constraints of the LGPD, making it difficult to ensure compliance between databases and current legislation. This article presents a methodology, called LGPDbyD, to incorporate the impositions and principles of the LGPD into the database design process. To achieve this, we extend the ER model, the Relational model, and the CREATE TABLE command. Additionally, we discuss how to model, design, and implement the concepts of purpose, consent, and personal data retention period. Finally, we extend the brModelo tool to provide support for the requirements and constraints of the LGPD. LGPDbyD aims to facilitate the processes of database design and auditing in compliance with the LGPD.</p>Patricia Vieira, José Maria Monteiro, Javam Machado, Angelo Brayner
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 133-145 | DOI: 10.5753/jidm.2026.5859

Incremental Schema Evolution in Document-oriented Databases
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5700
<p>Document-oriented databases offer high flexibility for storing structured, semi-structured, and unstructured data. However, the absence or obsolescence of predefined schemas can hinder the interpretation and analysis of the stored information. This study proposes an approach for the incremental evolution of schemas in document-oriented databases, aiming to continuously update the schema as new documents are inserted. The proposed solution is fully automated, requiring no manual intervention, and eliminates the need to reprocess the entire dataset. Experimental results, obtained from metadata collections derived from books and Twitter posts, show that the proposed approach achieves performance gains ranging from 1.93 to 2.85 times compared to a traditional batch-based approach.</p>Eleonilia Monteiro Rodrigues, Carlos Eduardo S. Pires, Dimas Cassimiro do N. Filho
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 146-155 | DOI: 10.5753/jidm.2026.5700

Leveraging LLMs for Topic Modeling and Classification in Brazilian Funk Lyrics
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5578
<p>Song lyrics present unique challenges for topic modeling and classification due to their implicit discourse, reliance on figurative and poetic language, and use of slang. As a cultural expression of urban peripheries, Brazilian funk provides a rich social narrative. This work proposes LLMusic, an end-to-end framework for topic extraction and classification of song lyrics, using Brazilian funk as a case study. LLMusic synergistically combines prompt-based Large Language Models (LLMs) with advanced topic modeling techniques such as BERTopic, aiming to address the limitations of traditional methods for identifying subjectively represented topics in texts. Zero-shot prompting is also deployed for unsupervised classification of new lyrics based on the identified topics. Our assessments demonstrate that LLMusic outperforms BERTopic in identifying subjectively expressed topics while achieving strong performance in unsupervised topic classification. The paper describes the components of LLMusic for topic identification and topic classification and illustrates its effectiveness by analyzing the discourse in the most popular funk songs, highlighting its potential for large-scale lyrical analysis.</p>Jesus Daniel Yepez Rojas, Bruno Tavares Santos, Fabíola de Carvalho Leite Peres, Karin Becker
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 156-170 | DOI: 10.5753/jidm.2026.5578

Locally Differentially Private Applications with Longitudinal Data
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5735
<p>Local differential privacy (LDP) was developed as a version of differential privacy (DP) that does not require a trusted curator or server. Frequency oracles, a class of LDP protocols for frequency estimation, function as the building blocks for diverse applications with LDP guarantees developed for tackling specific tasks such as answering range queries, and frequent item and itemset mining. However, these applications often build on frequency oracles with no adjustments for longitudinal data, and therefore cannot provide LDP. In this paper, we investigate the practical effectiveness of state-of-the-art frequency oracle (FO) protocols designed for longitudinal data in various data analysis tasks. Specifically, we implement these protocols to perform three key tasks: answering range queries, identifying frequent items, and detecting frequent itemsets. Additionally, we incorporate post-processing techniques to enhance utility and improve overall performance. Our experimental evaluation includes four real-world datasets from diverse domains, allowing us to systematically measure and compare the utility of longitudinal LDP protocols.</p>Antonio A. Marreiras Neto, Eduardo R. Duarte Neto, José S. Costa Filho, Javam C. Machado
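To make the "frequency oracle" building block concrete, here is a sketch of Generalized Randomized Response (GRR), the simplest such protocol: each user reports their true value with probability p and a uniformly random other value otherwise, and the server debiases the observed counts. This is the single-report version only; the longitudinal adjustments the paper studies are not shown, and the domain, epsilon, and data are illustrative.

```python
import math
import random
from collections import Counter

# Generalized Randomized Response (GRR) frequency oracle sketch.
# Single-report LDP only; longitudinal variants need extra machinery.

def grr_perturb(value, domain, epsilon, rng):
    """Report the true value w.p. p, else a uniform other value."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])

def grr_estimate(reports, domain, epsilon):
    """Unbiased frequency estimates from perturbed reports."""
    n, k = len(reports), len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1 - p) / (k - 1)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}

rng = random.Random(7)
domain = ["A", "B", "C"]
true_data = ["A"] * 700 + ["B"] * 200 + ["C"] * 100
reports = [grr_perturb(v, domain, epsilon=1.0, rng=rng) for v in true_data]
est = grr_estimate(reports, domain, epsilon=1.0)
print({v: round(f, 2) for v, f in est.items()})
```

The estimates come out close to the true 0.7/0.2/0.1 split but are noisy; applying the same perturbation independently at every report over time is exactly the longitudinal pitfall the abstract points out.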
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 171-180 | DOI: 10.5753/jidm.2026.5735

A Conceptual Framework for Building and Exploring Semantic Views of Enterprise Knowledge Graphs
https://journals-sol.sbc.org.br/index.php/jidm/article/view/5971
<p>An Enterprise Knowledge Graph (EKG) provides a powerful foundation for knowledge management, data integration, and analytics within organizations. It achieves this by offering a semantic view that integrates diverse data sources from the organization’s data lake. This paper introduces a novel Data Design Pattern for constructing semantic views, referred to as DDPSV, specifically designed to support the creation of semantic views within an EKG. The proposed DDPSV organizes both data and metadata into four hierarchical layers, providing a standardized structure that facilitates the development, maintenance, and reuse of semantic views across different contexts. Building upon this foundation, a second key contribution is a novel incremental methodology for constructing the semantic view of an EKG, grounded in the proposed data design pattern. This methodology adopts a “pay-as-you-go” data integration strategy, allowing organizations to progressively build, refine, and evolve their knowledge graphs while ensuring semantic consistency, scalability, and adaptability throughout the integration process. In addition, the paper presents an interactive graphical interface designed to support context-sensitive navigation of the semantic view. This tool enhances user interaction by enabling intuitive exploration and deeper utilization of resources within the EKG.</p>Vania Maria Ponte Vidal, José Renato S. Freitas, Tulio Vidal Rolim, Narciso Arruda, Marco A. Casanova, Chiara Renso
Copyright (c) 2026 Journal of Information and Data Management
Published 2026-03-13 | Vol. 17, No. 1, pp. 181-193 | DOI: 10.5753/jidm.2026.5971