Publications
Papers marked with "†" are from theoretical venues, where the authors have equal contributions and are ordered alphabetically.
2024
- IEEE Transactions on Knowledge and Data Engineering, 2024
This survey paper provides a systematic review of several basic techniques and recent algorithms for PPR computation.
Personalized PageRank (PPR) is a traditional measure for node proximity on large graphs. For a pair of nodes \(s\) and \(t\), the PPR value \(\pi_s(t)\) equals the probability that an \(\alpha\)-discounted random walk from \(s\) terminates at \(t\) and reflects the importance between \(s\) and \(t\) in a bidirectional way. As a generalization of Google's celebrated PageRank centrality, PPR has been extensively studied and has found multifaceted applications in many fields, such as network analysis, graph mining, and graph machine learning. Despite numerous studies devoted to PPR over the decades, efficient computation of PPR remains a challenging problem, and there is a dearth of systematic summaries and comparisons of existing algorithms. In this paper, we recap several frequently used techniques for PPR computation and conduct a comprehensive survey of various recent PPR algorithms from an algorithmic perspective. We classify these approaches based on the types of queries they address and review their methodologies and contributions. We also discuss some representative algorithms for computing PPR on dynamic graphs and in parallel or distributed environments.
- In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, Jun 2024
This paper uses simple techniques and analyses to give matching upper and lower bounds for estimating PageRank contributions and single-node PageRank. Our results for the upper bounds are derived by revisiting the known algorithms of ApproxContributions (a.k.a. Backward Push) and BiPPR.
We revisit the classic local graph exploration algorithm ApproxContributions proposed by Andersen, Borgs, Chayes, Hopcroft, Mirrokni, and Teng (WAW '07, Internet Math. '08) for computing an \(\epsilon\)-approximation of the PageRank contribution vector for a target node \(t\) on a graph with \(n\) nodes and \(m\) edges. We give a worst-case complexity bound of ApproxContributions as \(O\left(n\pi(t)/\epsilon\cdot\min\big(\Delta_{\mathrm{in}},\Delta_{\mathrm{out}},\sqrt{m}\big)\right)\), where \(\pi(t)\) is the PageRank score of \(t\), and \(\Delta_{\mathrm{in}}\) and \(\Delta_{\mathrm{out}}\) are the maximum in-degree and out-degree of the graph, resp. We also give a lower bound of \(\Omega\left(\min\big(\Delta_{\mathrm{in}}/\delta,\Delta_{\mathrm{out}}/\delta,\sqrt{m}/\delta,m\big)\right)\) for detecting the \(\delta\)-contributing set of \(t\), showing that the simple ApproxContributions algorithm is already optimal.
As ApproxContributions has become a cornerstone for computing random-walk probabilities, our results and techniques can be applied to derive better bounds for various relevant problems. In particular, we investigate the computational complexity of locally estimating a node's PageRank centrality. We improve the best-known upper bound of \(\widetilde{O}\left(n^{2/3}\cdot\min\left(\Delta_{\mathrm{out}}^{1/3},m^{1/6}\right)\right)\) given by Bressan, Peserico, and Pretto (SICOMP '23) to \(O\left(n^{1/2}\cdot\min\left(\Delta_{\mathrm{in}}^{1/2},\Delta_{\mathrm{out}}^{1/2},m^{1/4}\right)\right)\) by simply combining ApproxContributions with the Monte Carlo simulation method. We also improve their lower bound of \(\Omega\left(\min\left(n^{1/2}\Delta_{\mathrm{out}}^{1/2},n^{1/3}m^{1/3}\right)\right)\) to \(\Omega\left(n^{1/2}\cdot\min\left(\Delta_{\mathrm{in}}^{1/2},\Delta_{\mathrm{out}}^{1/2},m^{1/4}\right)\right)\) if \(\min(\Delta_{\mathrm{in}},\Delta_{\mathrm{out}})=\Omega\left(n^{1/3}\right)\), and to \(\Omega\left(n^{1/2-\gamma}\big(\min(\Delta_{\mathrm{in}},\Delta_{\mathrm{out}})\big)^{1/2+\gamma}\right)\) if \(\min(\Delta_{\mathrm{in}},\Delta_{\mathrm{out}})=o\left(n^{1/3}\right)\), where \(\gamma>0\) is an arbitrarily small constant. Our matching upper and lower bounds resolve the open problem of whether one can tighten the bounds given by Bressan, Peserico, and Pretto (FOCS '18, SICOMP '23). Remarkably, the techniques and analyses for proving all our results are surprisingly simple. - Zhewei Wei*, Ji-Rong Wen, and Mingji YangIn Proceedings of the 27th International Conference on Database Theory, Mar 2024
This paper proposes new algorithms to improve the upper bounds for approximating Single-Source Personalized PageRank with (degree-normalized) absolute error guarantees on various graphs.
Personalized PageRank (PPR) is an extensively studied and applied node proximity measure in graphs. For a pair of nodes \(s\) and \(t\) on a graph \(G=(V,E)\), the PPR value \(\pi(s,t)\) is defined as the probability that an \(\alpha\)-discounted random walk from \(s\) terminates at \(t\), where the walk terminates with probability \(\alpha\) at each step. We study the classic Single-Source PPR query, which asks for PPR approximations from a given source node \(s\) to all nodes in the graph. Specifically, we aim to provide approximations with absolute error guarantees, ensuring that the resultant PPR estimates \(\hat{\pi}(s,t)\) satisfy \(\max_{t\in V}\big|\hat{\pi}(s,t)-\pi(s,t)\big|\le\varepsilon\) for a given error bound \(\varepsilon\). We propose an algorithm that achieves this with high probability, with an expected running time of
- \(\widetilde{O}\big(\sqrt{m}/\varepsilon\big)\) for directed graphs, where \(m=|E|\);
- \(\widetilde{O}\big(\sqrt{d_{\mathrm{max}}}/\varepsilon\big)\) for undirected graphs, where \(d_{\mathrm{max}}\) is the maximum node degree in the graph;
- \(\widetilde{O}\left(n^{\gamma-1/2}/\varepsilon\right)\) for power-law graphs, where \(n=|V|\) and \(\gamma\in\left(\frac{1}{2},1\right)\) is the extent of the power law.
These sublinear bounds improve upon existing results. We also study the case when degree-normalized absolute error guarantees are desired, requiring \(\max_{t\in V}\big|\hat{\pi}(s,t)/d(t)-\pi(s,t)/d(t)\big|\le\varepsilon_d\) for a given error bound \(\varepsilon_d\), where the graph is undirected and \(d(t)\) is the degree of node \(t\). We give an algorithm that provides this error guarantee with high probability, achieving an expected complexity of \(\widetilde{O}\left(\sqrt{\sum_{t\in V}\pi(s,t)/d(t)}\big/\varepsilon_d\right)\). This improves over the previously known \(O(1/\varepsilon_d)\) complexity.