Publications | Mingji Yang

2024

TKDE 2024
Efficient Algorithms for Personalized PageRank Computation: A Survey

Mingji Yang, Hanzhi Wang, Zhewei Wei^*, Sibo Wang, and Ji-Rong Wen

IEEE Transactions on Knowledge and Data Engineering, 2024

TLDR Abstract BibTex arXiv Comments by Seshadhri

This survey paper provides a systematic review of several basic techniques and recent algorithms for PPR computation.

Personalized PageRank (PPR) is a traditional measure for node proximity on large graphs. For a pair of nodes \(s\) and \(t\), the PPR value \(\pi_s(t)\) equals the probability that an \(\alpha\)-discounted random walk from \(s\) terminates at \(t\) and reflects the importance between \(s\) and \(t\) in a bidirectional way. As a generalization of Google's celebrated PageRank centrality, PPR has been extensively studied and has found multifaceted applications in many fields, such as network analysis, graph mining, and graph machine learning. Despite numerous studies devoted to PPR over the decades, efficient computation of PPR remains a challenging problem, and there is a dearth of systematic summaries and comparisons of existing algorithms. In this paper, we recap several frequently used techniques for PPR computation and conduct a comprehensive survey of various recent PPR algorithms from an algorithmic perspective. We classify these approaches based on the types of queries they address and review their methodologies and contributions. We also discuss some representative algorithms for computing PPR on dynamic graphs and in parallel or distributed environments.
@article{yang2024efficient, author = {Mingji Yang and Hanzhi Wang and Zhewei Wei and Sibo Wang and Ji{-}Rong Wen}, title = {Efficient algorithms for {Personalized} {PageRank} computation: a survey}, journal = {{IEEE} Transactions on Knowledge and Data Engineering}, volume = {36}, number = {9}, pages = {4582--4602}, year = {2024}, doi = {10.1109/TKDE.2024.3376000} }
STOC 2024
Revisiting Local Computation of PageRank: Simple and Optimal †

Hanzhi Wang, Zhewei Wei^*, Ji-Rong Wen, and Mingji Yang

In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, Jun 2024

TLDR Abstract BibTex arXiv Video Comments by Seshadhri

This paper uses simple techniques and analyses to give matching upper and lower bounds for estimating PageRank contributions and single-node PageRank. Our results for the upper bounds are derived by revisiting the known algorithms of ApproxContributions (a.k.a. Backward Push) and BiPPR.

We revisit the classic local graph exploration algorithm ApproxContributions proposed by Andersen, Borgs, Chayes, Hopcroft, Mirrokni, and Teng (WAW '07, Internet Math. '08) for computing an \(\epsilon\)-approximation of the PageRank contribution vector for a target node \(t\) on a graph with \(n\) nodes and \(m\) edges. We give a worst-case complexity bound of ApproxContributions as \(O\left(n\pi(t)/\epsilon\cdot\min\big(\Delta_{\mathrm{in}},\Delta_{\mathrm{out}},\sqrt{m}\big)\right)\), where \(\pi(t)\) is the PageRank score of \(t\), and \(\Delta_{\mathrm{in}}\) and \(\Delta_{\mathrm{out}}\) are the maximum in-degree and out-degree of the graph, resp. We also give a lower bound of \(\Omega\left(\min\big(\Delta_{\mathrm{in}}/\delta,\Delta_{\mathrm{out}}/\delta,\sqrt{m}/\delta,m\big)\right)\) for detecting the \(\delta\)-contributing set of \(t\), showing that the simple ApproxContributions algorithm is already optimal.
As ApproxContributions has become a cornerstone for computing random-walk probabilities, our results and techniques can be applied to derive better bounds for various relevant problems. In particular, we investigate the computational complexity of locally estimating a node's PageRank centrality. We improve the best-known upper bound of \(\widetilde{O}\left(n^{2/3}\cdot\min\left(\Delta_{\mathrm{out}}^{1/3},m^{1/6}\right)\right)\) given by Bressan, Peserico, and Pretto (SICOMP '23) to \(O\left(n^{1/2}\cdot\min\left(\Delta_{\mathrm{in}}^{1/2},\Delta_{\mathrm{out}}^{1/2},m^{1/4}\right)\right)\) by simply combining ApproxContributions with the Monte Carlo simulation method. We also improve their lower bound of \(\Omega\left(\min\left(n^{1/2}\Delta_{\mathrm{out}}^{1/2},n^{1/3}m^{1/3}\right)\right)\) to \(\Omega\left(n^{1/2}\cdot\min\left(\Delta_{\mathrm{in}}^{1/2},\Delta_{\mathrm{out}}^{1/2},m^{1/4}\right)\right)\) if \(\min(\Delta_{\mathrm{in}},\Delta_{\mathrm{out}})=\Omega\left(n^{1/3}\right)\), and to \(\Omega\left(n^{1/2-\gamma}\big(\min(\Delta_{\mathrm{in}},\Delta_{\mathrm{out}})\big)^{1/2+\gamma}\right)\) if \(\min(\Delta_{\mathrm{in}},\Delta_{\mathrm{out}})=o\left(n^{1/3}\right)\), where \(\gamma>0\) is an arbitrarily small constant. Our matching upper and lower bounds resolve the open problem of whether one can tighten the bounds given by Bressan, Peserico, and Pretto (FOCS '18, SICOMP '23). Remarkably, the techniques and analyses for proving all our results are surprisingly simple.
@inproceedings{wang2024revisiting, author = {Hanzhi Wang and Zhewei Wei and Ji{-}Rong Wen and Mingji Yang}, title = {Revisiting local computation of {PageRank}: Simple and optimal}, booktitle = {Proceedings of the 56th Annual {ACM} Symposium on Theory of Computing}, pages = {911--922}, year = {2024}, doi = {10.1145/3618260.3649661} }
ICDT 2024
Approximating Single-Source Personalized PageRank with Absolute Error Guarantees †

Zhewei Wei^*, Ji-Rong Wen, and Mingji Yang

In Proceedings of the 27th International Conference on Database Theory, Mar 2024

TLDR Abstract BibTex arXiv Slides

This paper proposes new algorithms to improve the upper bounds for approximating Single-Source Personalized PageRank with (degree-normalized) absolute error guarantees on various graphs.

Personalized PageRank (PPR) is an extensively studied and applied node proximity measure in graphs. For a pair of nodes \(s\) and \(t\) on a graph \(G=(V,E)\), the PPR value \(\pi(s,t)\) is defined as the probability that an \(\alpha\)-discounted random walk from \(s\) terminates at \(t\), where the walk terminates with probability \(\alpha\) at each step. We study the classic Single-Source PPR query, which asks for PPR approximations from a given source node \(s\) to all nodes in the graph. Specifically, we aim to provide approximations with absolute error guarantees, ensuring that the resultant PPR estimates \(\hat{\pi}(s,t)\) satisfy \(\max_{t\in V}\big|\hat{\pi}(s,t)-\pi(s,t)\big|\le\varepsilon\) for a given error bound \(\varepsilon\). We propose an algorithm that achieves this with high probability, with an expected running time of
- \(\widetilde{O}\big(\sqrt{m}/\varepsilon\big)\) for directed graphs, where \(m=|E|\);
- \(\widetilde{O}\big(\sqrt{d_{\mathrm{max}}}/\varepsilon\big)\) for undirected graphs, where \(d_{\mathrm{max}}\) is the maximum node degree in the graph;
- \(\widetilde{O}\left(n^{\gamma-1/2}/\varepsilon\right)\) for power-law graphs, where \(n=|V|\) and \(\gamma\in\left(\frac{1}{2},1\right)\) is the extent of the power law.
These sublinear bounds improve upon existing results. We also study the case when degree-normalized absolute error guarantees are desired, requiring \(\max_{t\in V}\big|\hat{\pi}(s,t)/d(t)-\pi(s,t)/d(t)\big|\le\varepsilon_d\) for a given error bound \(\varepsilon_d\), where the graph is undirected and \(d(t)\) is the degree of node \(t\). We give an algorithm that provides this error guarantee with high probability, achieving an expected complexity of \(\widetilde{O}\left(\sqrt{\sum_{t\in V}\pi(s,t)/d(t)}\big/\varepsilon_d\right)\). This improves over the previously known \(O(1/\varepsilon_d)\) complexity.
@inproceedings{wei2024approximating, author = {Zhewei Wei and Ji{-}Rong Wen and Mingji Yang}, title = {Approximating single-source {Personalized} {PageRank} with absolute error guarantees}, booktitle = {Proceedings of the 27th International Conference on Database Theory}, volume = {290}, pages = {9:1--9:19}, year = {2024}, doi = {10.4230/LIPIcs.ICDT.2024.9} }