research-article

Scalable and axiomatic ranking of network role similarity

Published: 01 February 2014

Abstract

A key task in analyzing social networks and other complex networks is role analysis: describing and categorizing nodes according to how they interact with other nodes. Two nodes have the same role if they interact with equivalent sets of neighbors. The most fundamental role equivalence is automorphic equivalence. Unfortunately, the fastest algorithms known for graph automorphism are nonpolynomial. Moreover, since exact equivalence is rare, a more meaningful task is measuring the role similarity between any two nodes. This task is closely related to the structural or link-based similarity problem that SimRank addresses. However, SimRank and other existing similarity measures are not sufficient because they are not guaranteed to recognize automorphically or structurally equivalent nodes. This article makes two contributions. First, we present and justify several axiomatic properties necessary for a role similarity measure or metric. Second, we present RoleSim, a new similarity metric that satisfies these axioms and can be computed with a simple iterative algorithm. We rigorously prove that RoleSim satisfies all of these axiomatic properties. We also introduce Iceberg RoleSim, a scalable algorithm that discovers all pairs with RoleSim scores above a user-defined threshold θ. We demonstrate the interpretative power of RoleSim on both synthetic and real datasets.
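
The abstract describes RoleSim only as "a simple iterative algorithm" and does not give the update rule. The sketch below is a rough illustration, assuming the commonly cited form of the rule: every pair starts at 1, and each round sets the score of (u, v) to (1 - β) times the best neighbor-matching weight divided by the larger degree, plus β. The names `rolesim`, `best_matching_weight`, and `pairs_above`, the default `beta=0.15`, the handling of isolated nodes, and the brute-force matching are all assumptions made for this example. In particular, `pairs_above` is a naive filter over a full score table, not the Iceberg RoleSim algorithm itself.

```python
from itertools import permutations


def best_matching_weight(small, large, sim):
    """Exact maximum-weight matching by brute force.

    Assumption: only suitable for toy graphs, since the cost grows
    factorially with node degree. `small` must be the shorter list.
    """
    if not small:
        return 0.0
    return max(
        sum(sim[(x, y)] for x, y in zip(small, perm))
        for perm in permutations(large, len(small))
    )


def rolesim(adj, beta=0.15, iterations=10):
    """Iterate the assumed RoleSim update on an undirected graph.

    adj: dict mapping each node to the set of its neighbors.
    Returns a dict mapping (u, v) to a similarity score.
    """
    nodes = sorted(adj)
    # Start from the all-ones similarity table.
    sim = {(u, v): 1.0 for u in nodes for v in nodes}
    for _ in range(iterations):
        new = {}
        for i, u in enumerate(nodes):
            for v in nodes[i:]:
                nu, nv = sorted(adj[u]), sorted(adj[v])
                small, large = (nu, nv) if len(nu) <= len(nv) else (nv, nu)
                if not large:
                    # Assumption: two isolated nodes count as equivalent.
                    ratio = 1.0
                else:
                    # sim is kept symmetric, so key order does not matter.
                    ratio = best_matching_weight(small, large, sim) / len(large)
                score = (1.0 - beta) * ratio + beta
                new[(u, v)] = new[(v, u)] = score
        sim = new
    return sim


def pairs_above(sim, theta):
    """Naive threshold filter: distinct pairs scoring at least theta."""
    return sorted((u, v) for (u, v), s in sim.items() if u < v and s >= theta)


# Three-node path a - b - c: the two end points are automorphically equivalent.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = rolesim(path)
similar = pairs_above(scores, 0.9)
```

On this path the end points `a` and `c` keep a score of 1 (up to floating-point rounding), while the score between an end point and the middle node drops well below 1. A practical implementation would replace the brute-force matching with a polynomial-time or greedy matching routine.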

              Reviews

              Hector Zenil

              The thorough survey of measures for network similarity makes for a fine paper. This paper aims to assess and dissect the main assumptions of a network role similarity metric. Such a metric does not just match the topological properties of one network onto another or achieve the same goal through graph theory; it finds nodes whose roles are similar to those of other nodes in the way they are connected. Think of two families where the roles are clear: there are almost always two parents, yet the two families may have a different number of children and/or related family members. When comparing two networks, one may want to find the nodes playing the role of parent in both networks.

              Little by little, the authors explore and build the intuition for defining ranking metrics beyond SimRank that are up to the task. The paper is technically precise, highly understandable, and well written. Accessible to readers with only a little background in network science, this is essential reading for network scientists, whether they are new to the field or in need of similarity metrics beyond traditional ones such as the Jaccard index or link similarity.

              The authors go on to introduce their own axiomatic role similarity metric, with full understanding of the current and previous literature on the subject. They aim to provide a sound measure with nothing but the essential properties for optimal network role similarity. Then they return to more established algorithms and test them against their axiomatization. They show, for example, that SimRank is not admissible because automorphism confirmation does not hold; this is also true for MatchSim. The authors claim that their RoleSim similarity measure, however, is optimal. They later proceed to experimental evaluation and to concerns such as the time complexity of the algorithms, even testing on real-world networks (the Internet). The appendix is full of details, including theorems and proofs that support the claims in the main text.

              Online Computing Reviews Service
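
To illustrate why a role measure has to go beyond neighbor-overlap metrics such as the Jaccard index mentioned in the review, the small sketch below computes Jaccard similarity of neighbor sets on a four-node path. The helper name `jaccard` and the example graph are assumptions made for this illustration and do not come from the paper.

```python
def jaccard(adj, u, v):
    """Jaccard index of the neighbor sets of u and v."""
    nu, nv = adj[u], adj[v]
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


# Path a - b - c - d: reversing the path maps a to d, so the two
# end points are automorphically equivalent (same role).
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

# The end points share no neighbors, so their Jaccard index is 0
# even though they play the same role.
end_points = jaccard(path, "a", "d")

# Nodes a and c share neighbor b, so they score higher despite
# playing different roles.
mixed_roles = jaccard(path, "a", "c")
```

Here the two equivalent end points score 0, while an end point and an interior node score 0.5, which is the kind of mismatch an automorphism-aware measure is meant to avoid.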

              • Published in

                ACM Transactions on Knowledge Discovery from Data  Volume 8, Issue 1
                Casin special issue
                February 2014
                157 pages
                ISSN:1556-4681
                EISSN:1556-472X
                DOI:10.1145/2582178
                Copyright © 2014 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 1 February 2014
                • Accepted: 1 August 2013
                • Revised: 1 April 2013
                • Received: 1 September 2012

                Qualifiers

                • research-article
                • Research
                • Refereed
