FPCluster: An Efficient Out-of-core Clustering Strategy without a Similarity Metric
DOI:
https://doi.org/10.5753/jidm.2012.1442
Keywords:
Clustering, out-of-core, protein families, spam detection
Abstract
Clustering is one of the most popular and relevant data mining tasks. Two challenges in determining clusters are the volume of data to be grouped and the difficulty of defining a similarity metric applicable to the entire data set. In this work we present FPCluster, a new clustering algorithm that addresses both problems. The algorithm is based on building out-of-core frequent pattern trees, a data structure originally proposed for mining patterns. Additionally, the algorithm transparently handles missing features, a common constraint in real-world scenarios. We applied FPCluster to two real scenarios: characterization of spam campaigns and clustering of protein families. We evaluated both the quality of the obtained groups and the computational efficiency of the proposed strategy. In particular, we achieved precision above 90% while the storage demand increased sub-linearly.
Published
2012-09-27
How to Cite
Pires, D. E., Totti, L. C., Moreira, R. E., Fazzion, E. C., Fonseca, O. L., Meira Jr, W., Melo-Minardi, R. C. de, & Neto, D. G. (2012). FPCluster: An Efficient Out-of-core Clustering Strategy without a Similarity Metric. Journal of Information and Data Management, 3(2), 132. https://doi.org/10.5753/jidm.2012.1442
Issue
Vol. 3 No. 2 (2012)
Section
SBBD 2011 Short Papers