Using Statistical Features to Find Phrasal Terms in Text Collections
DOI:
https://doi.org/10.5753/jidm.2010.1296
Keywords:
phrasal terms, phrase queries
Abstract
In this work we investigate alternatives for automatically detecting phrasal terms, defined here as phrasal verbs, phrasal nouns, phrasal adjectives, or phrasal adverbs found in a text. The automatic identification of phrasal terms may have several applications in text processing systems. We present a novel approach for detecting phrasal terms in a collection of documents. Our solution is based on machine learning and uses statistical features of the word n-grams found in the documents. We also investigate the impact of adding phrasal terms to the retrieval model of a search engine when processing queries on several data sets. Our results show that we are able to discover valid phrasal terms with a small error rate, achieving detection results ranging from 70% to 94% in terms of F1. Furthermore, the discovered phrasal terms, when used to enhance search tasks, improve retrieval performance by up to 11% in terms of MAP when considering all queries, and by up to 36% in terms of MAP when considering only the queries that contain the detected phrasal terms.
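
As a rough illustration of what "statistical features of the word n-grams" can look like, the sketch below computes two common statistics for each bigram in a token list: its raw count and its pointwise mutual information (PMI). The abstract does not list the features actually used, so the choice of count and PMI, the restriction to bigrams, and the function name `bigram_features` are all assumptions made for this example, not the authors' method.

```python
import math
from collections import Counter


def bigram_features(tokens):
    """Return {(w1, w2): (count, pmi)} for every adjacent word pair.

    Hypothetical helper: count and PMI are illustrative features only;
    the paper's actual feature set is not given in the abstract.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    features = {}
    for (w1, w2), count in bigrams.items():
        p_bigram = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the
        # probability expected if the two words were independent.
        pmi = math.log2(p_bigram / (p_w1 * p_w2))
        features[(w1, w2)] = (count, pmi)
    return features


tokens = "new york is big and new york is busy".split()
feats = bigram_features(tokens)
count, pmi = feats[("new", "york")]
print(count, round(pmi, 2))
```

Feature vectors of this kind could then be passed to any standard classifier that labels each candidate n-gram as a phrasal term or not. PMI alone tends to favor rare pairs, which is one reason to combine several statistics rather than rely on a single one.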
Published
2010-10-06
How to Cite
Carvalho, A. L. da C., Moura, E. S. de, & Calado, P. (2010). Using Statistical Features to Find Phrasal Terms in Text Collections. Journal of Information and Data Management, 1(3), 583. https://doi.org/10.5753/jidm.2010.1296
Section
Regular Papers