**Parallelizing approximate single-source personalized PageRank queries on shared memory**

*The VLDB Journal* (IF 2.904)

**Pub Date : 2019-10-08**

*DOI: 10.1007/s00778-019-00576-7*

Runhui Wang, Sibo Wang, Xiaofang Zhou

### Abstract

Given a directed graph *G*, a source node *s*, and a target node *t*, the personalized PageRank (PPR) \(\pi(s,t)\) measures the importance of node *t* with respect to node *s*. In this work, we study the single-source PPR query, which takes a source node *s* as input and outputs the PPR values of all nodes in *G* with respect to *s*. The single-source PPR query has many important applications, e.g., community detection and recommendation. Deriving exact answers for single-source PPR queries is prohibitively expensive, so most existing work focuses on approximate solutions. Nevertheless, existing approximate solutions are still inefficient, and it remains challenging to answer single-source PPR queries efficiently for online applications. This motivates us to devise efficient parallel algorithms running on shared-memory multi-core systems. We show how to efficiently parallelize the state-of-the-art index-based solution *FORA* and analyze the complexity of the resulting parallel algorithms. We prove that our proposed algorithm achieves a time complexity of \(O(W/P+\log^2{n})\), where *W* is the time complexity of the sequential *FORA* algorithm, *P* is the number of processors used, and *n* is the number of nodes in the graph. *FORA* consists of a forward push phase and a random walk phase, and we present optimization techniques for both phases, including effective maintenance of active nodes, more efficient memory access, and cache-aware scheduling. Extensive experimental evaluation demonstrates that our solution achieves up to 37\(\times\) speedup on 40 cores and is 3.3\(\times\) faster than alternatives on 40 cores. Moreover, forward push alone can be used for local graph clustering, and our parallel algorithm for forward push is 4.8\(\times\) faster than existing parallel alternatives.
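To make the two phases named in the abstract concrete, the following is a minimal sequential sketch of a forward push followed by random walks from the remaining residues. It is not the authors' parallel algorithm or their index-based implementation. The names `forward_push`, `random_walk`, and `fora_estimate`, the parameters `alpha` (stop probability), `r_max` (push threshold), and `omega` (walk budget), and the treatment of nodes without out-neighbors are all assumptions made for illustration.

```python
import math
import random
from collections import deque


def forward_push(graph, source, alpha, r_max):
    """Push residue from `source` until no node v has residue/d_out(v) > r_max.

    `graph` maps every node to a list of its out-neighbors (assumed format).
    Returns (reserve, residue); their combined mass is always 1.
    """
    reserve = {v: 0.0 for v in graph}
    residue = {v: 0.0 for v in graph}
    residue[source] = 1.0
    queue, queued = deque([source]), {source}
    while queue:
        v = queue.popleft()
        queued.discard(v)
        out, r_v = graph[v], residue[v]
        if not out:
            # Assumption: a walk reaching a dead end stops there.
            reserve[v] += r_v
            residue[v] = 0.0
            continue
        if r_v / len(out) <= r_max:
            continue  # v is not active
        reserve[v] += alpha * r_v
        residue[v] = 0.0
        share = (1.0 - alpha) * r_v / len(out)
        for u in out:
            residue[u] += share
            active = not graph[u] or residue[u] / len(graph[u]) > r_max
            if active and u not in queued:
                queue.append(u)
                queued.add(u)
    return reserve, residue


def random_walk(graph, start, alpha, rng):
    """Walk from `start`, stopping with probability alpha at each step."""
    v = start
    while graph[v] and rng.random() >= alpha:
        v = rng.choice(graph[v])
    return v


def fora_estimate(graph, source, alpha=0.2, r_max=1e-2, omega=1000, seed=0):
    """Combine forward-push reserves with random walks from leftover residues."""
    rng = random.Random(seed)
    estimate, residue = forward_push(graph, source, alpha, r_max)
    for v, r_v in residue.items():
        if r_v <= 0.0:
            continue
        n_walks = math.ceil(r_v * omega)
        for _ in range(n_walks):
            # Each walk carries an equal slice of v's residue to its endpoint.
            estimate[random_walk(graph, v, alpha, rng)] += r_v / n_walks
    return estimate
```

In this sketch the forward push loop and the per-node batches of random walks are the two places where work could be split across cores, which are the two phases the abstract says the paper parallelizes and optimizes.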