Abstract
Given a graph G, a source node s, and a target node t, the personalized PageRank (PPR) of t with respect to s is the probability that a random walk starting from s terminates at t. An important variant of the PPR query is single-source PPR (SSPPR), which enumerates all nodes in G and returns the top-k nodes with the highest PPR values with respect to a given source s. PPR in general and SSPPR in particular have important applications in web search and social networks, e.g., in Twitter’s Who-To-Follow recommendation service. However, PPR computation is known to be expensive on large graphs and resistant to indexing. Consequently, previous solutions either use heuristics, which do not guarantee result quality, or rely on the strong computing power of modern data centers, which is costly.
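The random-walk definition above can be made concrete with a minimal Monte Carlo sketch. It is an illustration under stated assumptions, not the method proposed in this paper: the termination probability `alpha`, the adjacency-list representation, the function names, and the rule that a walk stops when it reaches a node with no out-neighbors are all choices made here for the example.

```python
import random
from collections import Counter


def monte_carlo_ppr(graph, source, alpha=0.2, num_walks=20000, seed=0):
    """Estimate the PPR of every node with respect to `source`.

    graph: dict mapping each node to a list of its out-neighbors.
    alpha: assumed per-step termination probability (not given in the text).
    """
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(num_walks):
        node = source
        # At each step the walk stops with probability alpha; it also stops
        # at a node without out-neighbors (an assumption of this sketch).
        while rng.random() >= alpha and graph[node]:
            node = rng.choice(graph[node])
        hits[node] += 1
    # The fraction of walks ending at t estimates the PPR of t w.r.t. source.
    return {node: count / num_walks for node, count in hits.items()}


def top_k(ppr, k):
    """Return the k (node, estimate) pairs with the highest estimates."""
    return sorted(ppr.items(), key=lambda item: (-item[1], item[0]))[:k]
```

Calling `top_k(monte_carlo_ppr(graph, s), k)` gives a naive answer to the top-k query described above. Its cost grows with the number of walks, which is the inefficiency that motivates faster approaches.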
Motivated by this, we propose effective index-free and index-based algorithms for approximate PPR processing, with rigorous guarantees on result quality. We first present FORA, an approximate SSPPR solution that combines two existing methods, Forward Push (fast but without quality guarantees) and Monte Carlo Random Walk (accurate but slow), in a simple yet non-trivial way, achieving both high accuracy and efficiency. Further, FORA includes a simple and effective indexing scheme, as well as a module for top-k selection with high pruning power. Extensive experiments demonstrate that the proposed solutions are orders of magnitude more efficient than their respective competitors. Notably, on a billion-edge Twitter dataset, FORA answers a top-500 approximate SSPPR query within 1 second, using a single commodity server.
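One way to picture how Forward Push and random walks can be combined is sketched below. This is a simplified illustration, not the authors' implementation: it omits the indexing scheme and the top-k module, it does not derive parameters from an error guarantee, and the names `alpha`, `r_max`, and `walks_per_unit`, along with the handling of nodes without out-neighbors, are assumptions made for the example. Forward Push first moves most of the probability mass into per-node reserves; random walks are then started only from nodes that still hold residue, in proportion to that residue.

```python
import math
import random
from collections import defaultdict, deque


def forward_push(graph, source, alpha, r_max):
    """Push residue until every node satisfies residue <= r_max * out-degree."""
    reserve = defaultdict(float)
    residue = defaultdict(float)
    residue[source] = 1.0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        r, out = residue[v], graph[v]
        if not out:
            # Assumption: a walk stops at a node without out-neighbors,
            # so all of its residue can be settled exactly.
            reserve[v] += r
            residue[v] = 0.0
            continue
        if r <= r_max * len(out):
            continue  # below the push threshold; left for the random walks
        reserve[v] += alpha * r
        residue[v] = 0.0
        share = (1.0 - alpha) * r / len(out)
        for u in out:
            residue[u] += share
            if not graph[u] or residue[u] > r_max * len(graph[u]):
                queue.append(u)
    return reserve, residue


def random_walk(graph, start, alpha, rng):
    """Return the node at which a walk from `start` terminates."""
    node = start
    while rng.random() >= alpha and graph[node]:
        node = rng.choice(graph[node])
    return node


def fora_sketch(graph, source, alpha=0.2, r_max=0.01,
                walks_per_unit=20000, seed=0):
    """Combine push reserves with random walks from the leftover residues."""
    rng = random.Random(seed)
    reserve, residue = forward_push(graph, source, alpha, r_max)
    estimate = dict(reserve)
    for v, r in residue.items():
        if r <= 0.0:
            continue
        # Hypothetical budget rule: walks proportional to remaining residue.
        n_walks = math.ceil(r * walks_per_unit)
        for _ in range(n_walks):
            t = random_walk(graph, v, alpha, rng)
            estimate[t] = estimate.get(t, 0.0) + r / n_walks
    return estimate
```

In this sketch, a smaller `r_max` shifts work toward the deterministic push phase, while a larger one leaves more residue to be resolved by sampling, so the two parameters trade push cost against the number of walks.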