BWBEV: A Bitwise Query Processing Algorithm for Approximate Prefix Search
DOI:
https://doi.org/10.5753/jbcs.2024.4236Keywords:
bit-parallelism, autocomplete, trie, error toleranceAbstract
We tackle the challenge of conducting an approximate prefix search within datasets of strings. We explore using a bit-parallelism technique to compute the edit distance between distinct strings and illustrate its adaptation for an approximate prefix search procedure referred to as BWBEV. This technique employs a unary representation of edit vectors alongside bitwise operations to efficiently update these vectors during the edit distance computation. We also show how to apply our new bit-parallelism technique strategy to online edit distance computation between strings without index structure. Our experiments with BWBEV applied to approximate prefix search for a query autocompletion task revealed a substantial acceleration of over 36% when contrasted against state-of-the-art methods.
Downloads
References
Alaofi, M., Gallagher, L., Mckay, D., Saling, L. L., Sanderson, M., Scholer, F., Spina, D., and White, R. W. (2022). Where do queries come from? In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '22, page 2850–2862, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/3477495.3531711.
Baeza-Yates, R. (1999). Faster approximate string matching. Algorithmica, 23:127-158. DOI: 10.1007/PL00009253.
Baeza-Yates, R. and Gonnet, G. H. (1992). A new approach to text searching. Commun. ACM, 35(10):74–82. DOI: 10.1145/135239.135243.
Bar-Yossef, Z. and Kraus, N. (2011). Context-sensitive query auto-completion. In Proceedings of the 20th International Conference on World Wide Web, WWW '11, page 107–116, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/1963405.1963424.
Cai, F. and de Rijke, M. (2016). A survey of query auto completion in information retrieval. Foundations and Trends® in Information Retrieval, 10(4):273-363. DOI: 10.1561/1500000055.
Chaudhuri, S. and Kaushik, R. (2009). Extending autocompletion to tolerate errors. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, SIGMOD '09, pages 707–-718, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/1559845.1559919.
Cucerzan, S. and Brill, E. (2004). Spelling correction as an iterative process that exploits the collective knowledge of web users. In Conference on Empirical Methods in Natural Language Processing, volume 4, pages 293-300. [link].
Deng, D., Li, G., Wen, H., Jagadish, H. V., and Feng, J. (2016). Meta: An efficient matching-based method for error-tolerant autocompletion. Proc. VLDB Endow., 9(10):828–-839. DOI: 10.14778/2977797.2977808.
Durian, B., Holub, J., Peltola, H., and Tarhio, J. (2009). Tuning bndm with q-grams. In Proceedings of the Meeting on Algorithm Engineering & Experiments, page 29–37, USA. Society for Industrial and Applied Mathematics. DOI: 10.1137/1.9781611972894.3.
Ferreira, B., de Moura, E. S., and Silva, A. d. (2022). Applying burst-tries for error-tolerant prefix search. Inf. Retr., 25(4):481–518. DOI: 10.1007/s10791-022-09416-9.
Fredkin, E. (1960). Trie memory. Commun. ACM, 3(9):490–-499. DOI: 10.1145/367390.367400.
Heinz, S., Zobel, J., and Williams, H. E. (2002). Burst tries: A fast, efficient data structure for string keys. ACM Trans. Inf. Syst., 20(2):192–223. DOI: 10.1145/506309.506312.
Hu, S., Xiao, C., and Ishikawa, Y. (2018). An efficient algorithm for location-aware query autocompletion. IEICE TRANSACTIONS on Information and Systems, 101(1):181-192. DOI: 10.1587/transinf.2017EDP7152.
Hyyrö, H. (2003). A bit-vector algorithm for computing levenshtein and damerau edit distances. Nord. J. Comput., 10(1):29-39. Available online [link].
Ji, S., Li, G., Li, C., and Feng, J. (2009). Efficient interactive fuzzy keyword search. In Proceedings of the 18th International Conference on World Wide Web, WWW 09, pages 371–-380, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/1526709.1526760.
Krishnan, U., Moffat, A., and Zobel, J. (2017). A taxonomy of query auto completion modes. In Proceedings of the 22nd Australasian Document Computing Symposium, ADCS 2017, pages 2271-2280, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/3166072.3166081.
Levenshtein, V. (1966). Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, 10:707. Available online [link].
Li, G., Ji, S., Li, C., and Feng, J. (2011). Efficient fuzzy full-text type-ahead search. VLDB J., 20:617-640. DOI: 10.1007/s00778-011-0218-x.
Miller, R. B. (1968). Response time in man-computer conversational transactions. In Proceedings of the December 9-11, 1968, Fall Joint Computer Conference, Part I, AFIPS '68 (Fall, part I), pages 267–-277, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/1476589.1476628.
Myers, G. (1999). A fast bit-vector algorithm for approximate string matching based on dynamic programming. J. ACM, 46(3):395–415. DOI: 10.1145/316542.316550.
Navarro, G. (2001). A guided tour to approximate string matching. ACM Comput. Surv., 33(1):31–88. DOI: 10.1145/375360.375365.
Navarro, G. and Raffinot, M. (2001). Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM J. Exp. Algorithmics, 5. DOI: 10.1145/351827.384246.
Peltola, H. and Tarhio, J. (2003). Alternative algorithms for bit-parallel string matching. In Nascimento, M. A., de Moura, E. S., and Oliveira, A. L., editors, String Processing and Information Retrieval, pages 80-93, Berlin, Heidelberg. Springer Berlin Heidelberg. DOI: 10.1007/978-3-540-39984-1_7.
Phophalia, A. (2011). A survey on learning to rank (letor) approaches in information retrieval. In 2011 Nirma University International Conference on Engineering, pages 1-6. DOI: 10.1109/NUiConE.2011.6153228.
Qin, J., Xiao, C., Hu, S., Zhang, J., Wang, W., Ishikawa, Y., Tsuda, K., and Sadakane, K. (2020). Efficient query autocompletion with edit distance-based error tolerance. The VLDB Journal, 29:1-25. DOI: 10.1007/s00778-019-00595-4.
Silva de Moura, E., Navarro, G., Ziviani, N., and Baeza-Yates, R. (2000). Fast and flexible word searching on compressed text. ACM Transactions on Information Systems (TOIS), 18(2):113-139. DOI: 10.1145/348751.348754.
Ukkonen, E. (1985). Algorithms for approximate string matching. Information and Control, 64(1):100-118. International Conference on Foundations of Computation Theory. DOI: https://doi.org/10.1016/S0019-9958(85)80046-2.
Wang, J. and Lin, C. (2020). Fast error-tolerant location-aware query autocompletion. In 2020 IEEE 36th International Conference on Data Engineering (ICDE), pages 1998-2001. IEEE. DOI: 10.1109/ICDE48307.2020.00223.
Wang, P.-W., Kolter, J. Z., Mohan, V., and Dhillon, I. S. (2018). Realtime query completion via deep language models. CEUR Workshop Proceedings. Available online [link].
Wright, A. H. (1994). Approximate string matching using within-word parallelism. Softw. Pract. Exper., 24(4):337–362. DOI: 10.1002/spe.4380240402.
Xiao, C., Qin, J., Wang, W., Ishikawa, Y., Tsuda, K., and Sadakane, K. (2013). Efficient error-tolerant query autocompletion. Proc. VLDB Endow., 6(6):373–-384. DOI: 10.14778/2536336.2536339.
Zhong, Q., Zhi, J., and Guo, G. (2022). Dynamic is optimal: Effect of three alternative auto-complete on the usability of in-vehicle dialing displays and driver distraction. Traffic injury prevention, 23(1):51-56. DOI: 10.1080/15389588.2021.2010052.
Zhou, X., Qin, J., Xiao, C., Wang, W., Lin, X., and Ishikawa, Y. (2016). Beva: An efficient query processing algorithm for error-tolerant autocompletion. ACM Trans. Database Syst., 41(1). DOI: 10.1145/2877201.
Downloads
Published
How to Cite
Issue
Section
License
Copyright (c) 2024 Edleno S. de Moura, Berg Ferreira, Altigran da Silva, Ricardo Baeza-Yates
This work is licensed under a Creative Commons Attribution 4.0 International License.