Sentence-ITDL: Generating POI type Embeddings based on Variable Length Sentences

Authors

  • Salatiel Dantas Silva, Universidade Federal de Campina Grande
  • Cláudio Campelo, Universidade Federal de Campina Grande
  • Maxwell Guimarães de Oliveira, Universidade Federal de Campina Grande

DOI:

https://doi.org/10.5753/jidm.2024.3209

Keywords:

Points of Interest, Machine Learning, Similarity, Geo-Semantics, Vector Embeddings

Abstract

Point of Interest (POI) types are one of the most researched aspects of urban data. Methods that capture the semantics and similarity of POI types enable computational mechanisms that can assist in many tasks, such as POI recommendation and urban planning. Several works have successfully modeled POI types by combining POI co-occurrences in different spatial regions with statistical models based on the Word2Vec technique from Natural Language Processing (NLP). In state-of-the-art approaches, co-occurrence is indicated by binary relations between the POIs in a region. These relations are used to generate a set of two-word sentences from the POI types, and the sentences feed a Word2Vec model that produces POI type embeddings. Although these works have presented good results, they do not consider the spatial distance between related POIs as a feature for representing POI types. In this context, we present Sentence-ITDL, an approach that uses this distance to generate variable-length Word2Vec sentences and thereby provides an improved POI type representation. We define ranges of distances and map them to word positions in a sentence. Under this mapping, nearby POIs have their types placed at close positions in the sentences. Word2Vec's architecture uses the word position in a sentence to adjust the training weights of each POI type. In this manner, POI type embeddings can incorporate the distance. Experiments based on similarity assessments between POI types revealed that our representation provides values close to human judgment.
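The difference between two-word sentences and distance-based variable-length sentences can be illustrated with a minimal sketch. It is not the authors' implementation: the toy POIs, the range bounds in RANGE_BOUNDS, the helper names, and the choice to put the center POI first and order neighbours by range and then by distance are all assumptions made for illustration. The resulting token lists are the kind of input a Word2Vec implementation would be trained on.

```python
from math import hypot

# Hypothetical toy data: (poi_type, x, y), coordinates in metres.
POIS = [
    ("cafe", 0.0, 0.0),
    ("bakery", 30.0, 40.0),     # 50 m from the cafe
    ("bank", 120.0, 0.0),       # 120 m from the cafe
    ("pharmacy", 0.0, 260.0),   # 260 m from the cafe
    ("stadium", 900.0, 0.0),    # 900 m from the cafe, beyond every range
]

# Assumed upper bounds of the distance ranges (metres); illustrative only.
RANGE_BOUNDS = [100.0, 200.0, 300.0]


def range_index(distance, bounds=RANGE_BOUNDS):
    """Return the index of the first range containing distance, or None."""
    for i, upper in enumerate(bounds):
        if distance <= upper:
            return i
    return None


def pairwise_sentences(center, pois, max_dist):
    """Baseline: one two-word sentence per neighbour within max_dist."""
    ctype, cx, cy = center
    return [
        [ctype, ptype]
        for (ptype, x, y) in pois
        if (ptype, x, y) != center and hypot(x - cx, y - cy) <= max_dist
    ]


def distance_sentence(center, pois, bounds=RANGE_BOUNDS):
    """One variable-length sentence: neighbours ordered by distance range."""
    ctype, cx, cy = center
    ranked = []
    for poi in pois:
        if poi == center:
            continue
        ptype, x, y = poi
        d = hypot(x - cx, y - cy)
        idx = range_index(d, bounds)
        if idx is not None:          # neighbours beyond the last range are dropped
            ranked.append((idx, d, ptype))
    ranked.sort()                    # closer ranges land at closer word positions
    return [ctype] + [ptype for _, _, ptype in ranked]


if __name__ == "__main__":
    print(pairwise_sentences(POIS[0], POIS, 300.0))
    for poi in POIS:
        print(distance_sentence(poi, POIS))
```

In the baseline, every neighbour of the cafe yields an identical kind of pair regardless of how far away it is. In the distance-ordered sentence, the bakery sits next to the cafe and the pharmacy sits two positions further along, so a window-based model sees closer POIs as closer context. Sentence length varies with the number of neighbours that fall inside the ranges.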

Published

2024-07-17

How to Cite

Dantas Silva, S., E. C. Campelo, C., & Guimarães de Oliveira, M. (2024). Sentence-ITDL: Generating POI type Embeddings based on Variable Length Sentences. Journal of Information and Data Management, 15(1), 276–284. https://doi.org/10.5753/jidm.2024.3209

Issue

Section

GEOINFO 2022 - Extended Papers