Overcoming Bias in Community Detection Evaluation
DOI:
https://doi.org/10.5753/jidm.2020.2018Keywords:
Community Structure, Quality Evaluation, Bias, Ensemble Approach, Triangulation MethodAbstract
Community detection is a key task to further understand the function and the structure of complex networks. Therefore, a strategy used to assess this task must be able to avoid biased and incorrect results that might invalidate further analyses or applications that rely on such communities. Two widely used strategies to assess this task are generally known as structural and functional. The structural strategy basically consists in detecting and assessing such communities by using multiple methods and structural metrics. On the other hand, the functional strategy might be used when ground truth data are available to assess the detected communities. However, the evaluation of communities based on such strategies is usually done in experimental configurations that are largely susceptible to biases, a situation that is inherent to algorithms, metrics and network data used in this task. Furthermore, such strategies are not systematically combined in a way that allows for the identification and mitigation of bias in the algorithms, metrics or network data to converge into more consistent results. In this context, the main contribution of this article is an approach that supports a robust quality evaluation when detecting communities in real-world networks. In our approach, we measure the quality of a community by applying the structural and functional strategies, and the combination of both, to obtain different pieces of evidence. Then, we consider the divergences and the consensus among the pieces of evidence to identify and overcome possible sources of bias in community detection algorithms, evaluation metrics, and network data. Experiments conducted with several real and synthetic networks provided results that show the effectiveness of our approach to obtain more consistent conclusions about the quality of the detected communities.
Downloads
References
Abrahao, B., Soundarajan, S., Hopcroft, J., and Kleinberg, R. On the Separability of Structural Classes of Communities. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA, pp. 624–632, 2012.
Almeida, H., Guedes, D., Meira Jr, W., and Zaki, M. J. Towards a Better Quality Metric for Graph Cluster Evaluation. Journal of Information and Data Management 3 (3): 378–393, 2012.
Amelio, A. and Pizzuti, C. Is Normalized Mutual Information a Fair Measure for Comparing Community Detection Methods? In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015. Paris, France, pp. 1584–1585, 2015.
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008 (10): P10008, 2008.
Brandão, M. A. and Moro, M. M. The strength of co-authorship ties through different topological properties. Journal of the Brazilian Computer Society 23 (1): 5, 2017.
Brender, J. Framework for Meta-Assessment of Assessment Studies. In Handbook of Evaluation Methods for Health Informatics. Burlington, pp. 253–320, 2006.
Clauset, A., Newman, M. E. J., and Moore, C. Finding community structure in very large networks. Physical Review E vol. 70, pp. 066111, 2004.
Coscia, M. Discovering Communities of Community Discovery. In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. New York, NY, USA, pp. 1–8, 2019.
Coscia, M., Giannotti, F., and Pedreschi, D. A classification for community discovery methods in complex networks. Statistical Analysis and Data Mining: The ASA Data Science Journal 4 (5): 512–546, 2011.
Creswell, J. W. and Creswell, J. D. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. SAGE Publications, Thousand Oaks, California, USA, 2018.
Dao, V. L., Bothorel, C., and Lenca, P. Community structure: A comparative evaluation of community detection methods. Network Science 8 (1): 1–41, 2020.
Fortunato, S. Community detection in graphs. Physics Reports 486 (3–5): 75–174, 2010.
Gandica, Y., Decuyper, A., Cloquet, C., Thomas, I., and Delvenne, J.-C. Measuring the effect of node aggregation on community detection. EPJ Data Science 9 (1): 6, 2020.
Gemmetto, V., Barrat, A., and Cattuto, C. Mitigation of infectious disease at school: targeted class closure vs school closure. BMC Infectious Diseases 14 (1): 695, 2014.
Génois, M. and Barrat, A. Can co-location be used as a proxy for face-to-face contacts? EPJ Data Science 7 (1): 11, 2018.
Génois, M., Vestergaard, C. L., Fournet, J., Panisson, A., Bonmarin, I., and Barrat, A. Data on face-to-face contacts in an office building suggest a low-cost vaccination strategy based on community linkers. Network Science 3 (3): 326–347, 2015.
Ghasemian, A., Hosseinmardi, H., and Clauset, A. Evaluating Overfit and Underfit in Models of Network Community Structure. IEEE Transactions on Knowledge and Data Engineering 32 (9): 1722–1735, 2020.
Gösgens, M., Tikhonov, A., and Prokhorenkova, L. Systematic analysis of cluster similarity indices: How to validate validation measures. CoRR vol. arXiv:1911.04773, 2020.
Hric, D., Darst, R. K., and Fortunato, S. Community detection in networks: structural communities versus ground truth. Physical Review E 90 (6): 62805, 2014.
Isella, L., Stehlé, J., Barrat, A., Cattuto, C., Pinton, J.-F., and den Broeck, W. V. What’s in a crowd? Analysis of face-to-face behavioral networks. Journal of Theoretical Biology 271 (1): 166 – 180, 2011.
Jebabli, M., Cherifi, H., Cherifi, C., and Hamouda, A. Community detection algorithm evaluation with groundtruth data. Physica A: Statistical Mechanics and its Applications vol. 492, pp. 651 – 706, 2018.
Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y., and Porter, M. A. Multilayer networks. Journal of Complex Networks 2 (3): 203–271, 2014.
Labatut, V. Generalised measures for the evaluation of community detection methods. International Journal of Social Network Mining 2 (1): 44–63, 2015.
Lancichinetti, A. and Fortunato, S. Consensus clustering in complex networks. Scientific Reports vol. 2, pp. 336, 2012.
Lei, Y., Bezdek, J. C., Romano, S., Vinh, N. X., Chan, J., and Bailey, J. Ground truth bias in external cluster validity indices. Pattern Recognition vol. 65, pp. 58 – 70, 2017.
Leskovec, J., Lang, K. J., and Mahoney, M. Empirical Comparison of Algorithms for Network Community Detection. In Proceedings of the 19th International Conference on World Wide Web. New York, NY, USA, pp. 631–640, 2010.
Leão, J. C. An Approach for Detecting Communities from Sequences of Social Interactions. M.S. thesis, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil (in Portuguese), 2018.
Leão, J. C., Brandão, M. A., Vaz de Melo, P. O. S., and Laender, A. H. F. Mineração de Perfis Sociais em Redes Temporais. In Anais do 32o Simpósio Brasileiro de Bancos de Dados, SBBD 2017. Uberlândia, MG, pp. 264–269, 2017.
Leão, J. C., Brandão, M. A., Vaz de Melo, P. O. S., and Laender, A. H. F. Who is really in my social circle? Mining social relationships to improve detection of real communities. Journal of Internet Services and Applications 9 (1): 20:1–20:17, 2018.
Leão, J. C., Cardoso, R. J. S., and Santos, A. B. Uma análise temporal da rede de colaboração cientıf́ica do IFNMG: 10 anos de iniciação cientıf́ica e orientação acadêmica. Anais dos Simpósios de Informática do IFNMG - Campus Januária vol. 11, pp. 7, 2019.
Leão, J. C., Laender, A. H. F., and Vaz de Melo, P. O. S. A Multi-Strategy Approach to Overcoming Bias in Community Detection Evaluation. In Anais do 34o Simpósio Brasileiro de Bancos de Dados, SBBD 2019. Fortaleza, CE, pp. 13–24, 2019.
Liu, X., Cheng, H., and Zhang, Z. Evaluation of community detection methods. IEEE Transactions on Knowledge and Data Engineering 32 (9): 736 – 1746, 2020.
Newman, M. E. J. Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103 (23): 8577–8582, 2006.
Newman, M. E. J. and Girvan, M. Finding and evaluating community structure in networks. Physical Review E 69 (2): 26113, 2004.
Nunes, I. O., Celes, C., D., M. S., Vaz de Melo, P. O. S., and Loureiro, A. A. F. GRM: Group Regularity Mobility Model. In Proceedings of the 20th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems. New York, NY, USA, pp. 85–89, 2017.
O’Donoghue, T. and K., P. Qualitative Educational Research in Action: Doing and Reflecting. Routledge, Abingdon, UK, 2003.
Peel, L., Larremore, D. B., and Clauset, A. The ground truth about metadata and community detection in networks. Science Advances 3 (5): 1–8, 2017.
Pons, P. and Latapy, M. Computing communities in large networks using random walks. In Computer and Information Sciences - ISCIS 2005: 20th International Symposium, Istanbul, Turkey, October 26-28, 2005. Proceedings, p. Yolum, T. Güngör, F. Gürgen, and C. Özturan (Eds.). Berlin, Heidelberg, pp. 284–293, 2005.
Raghavan, U. N., Albert, R., and Kumara, S. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E 76 (3): 036106, 2007.
Rand, W. M. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66 (336): 846–850, 1971.
Reichardt, J. and Bornholdt, S. Statistical mechanics of community detection. Physical Review E vol. 74, pp. 016110, 2006.
Rocha, L. E. C., Masuda, N., and Holme, P. Sampling of temporal networks: Methods and biases. Physical Review E 96 (5): 52302, 2017.
Rosvall, M. and Bergstrom, C. T. Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLOS ONE 6 (4): 1–10, 2011.
Stehlé, J., Voirin, N., Barrat, A., Cattuto, C., Isella, L., Pinton, J.-F., Quaggiotto, M., Van den Broeck, W., Régis, C., Lina, B., and Vanhems, P. High-resolution measurements of face-to-face contact patterns in a primary school. PLOS ONE 6 (8): 1–13, 08, 2011.
Vanhems, P., Barrat, A., Cattuto, C., Pinton, J.-F., Khanafer, N., Régis, C., Kim, B.-a., Comte, B., and Voirin, N. Estimating potential infection transmission routes in hospital wards using wearable proximity sensors. PLOS ONE 8 (9): 1–9, 2013.
Vieira, V. F., Xavier, C. R., and Evsukoff, A. G. Comparing the Community Structure Identified by Overlapping Methods. In Complex Networks and Their Applications VIII, H. Cherifi, S. Gaito, J. F. Mendes, E. Moro, and L. M. Rocha (Eds.). Cham, pp. 262–273, 2020.
Yang, J. and Leskovec, J. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems 42 (1): 181–213, 2015.
Zachary, W. W. An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33 (4): 452–473, 1977.
Zaki, M. J. and Meira Jr., W. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, New York, NY, USA, 2014.
Zhao, Y. A survey on theoretical advances of community detection in networks. Computational Statistics 9 (5): e1403, 2017.