Built-up Integration: A New Terminology and Taxonomy for Managing Information On-the-fly

Authors

  • Maria Helena Franciscatto Federal University of Paraná
  • Luis Carlos Erpen de Bona Federal University of Paraná
  • Celio Trois Federal University of Santa Maria
  • Marcos Didonet Del Fabro Université Paris-Saclay, CEA, List

DOI:

https://doi.org/10.5753/jidm.2024.3079

Keywords:

Data Integration, On-the-fly Integration, Taxonomy

Abstract

Obtaining useful data to meet specific query requirements usually requires integrating data sources at query time, a process known as on-the-fly integration. Currently, many studies address this concept by discovering useful data sources in an ad hoc manner and merging them to provide actionable information to the end user. These steps, however, lack standardized identification, since they are described in the literature under many different names. Without a unified nomenclature and knowledge organization, development in the area may be considerably impaired. This paper proposes a novel term, Built-up Integration, aimed at knowledge regulation, and a taxonomy covering a set of common tasks observed in studies that select and integrate sources on-the-fly. Based on the taxonomy, we demonstrate how Built-up Integration features can be found in the literature, through examples from related studies. We also highlight research opportunities regarding Built-up Integration, as a way to guide future development in a subdomain of Data Integration.

Published

2024-02-19

How to Cite

Franciscatto, M. H., Erpen de Bona, L. C., Trois, C., & Didonet Del Fabro, M. (2024). Built-up Integration: A New Terminology and Taxonomy for Managing Information On-the-fly. Journal of Information and Data Management, 15(1), 80–92. https://doi.org/10.5753/jidm.2024.3079

Section

Regular Papers