Publications
Authors are ordered alphabetically with equal contributions by default. For papers marked with "†", authors are ordered by contribution.
2026
- On Solving Asymmetric Diagonally Dominant Linear Systems in Sublinear TimeTsz Chiu Kwok, Zhewei Wei, and Mingji YangIn Proceedings of the 17th Innovations in Theoretical Computer Science Conference, to appear, 2026
We initiate a study of solving row/column diagonally dominant (RDD/CDD) linear systems in sublinear time, which includes estimating (Personalized) PageRank and effective resistance on graphs as special cases. We characterize the problem's mathematical structure via a novel quantity called the maximum \(p\)-norm gap, and develop a collection of algorithmic results by adapting techniques from local graph algorithms.
We initiate a study of solving a row/column diagonally dominant (RDD/CDD) linear system \(\mathbf{M}\boldsymbol{x} = \boldsymbol{b}\) in sublinear time, with the goal of estimating \(\boldsymbol{t}^{\top}\boldsymbol{x}^{\ast}\) for a given vector \(\boldsymbol{t} \in \mathbb{R}^n\) and a specific solution \(\boldsymbol{x}^{\ast}\). This setting naturally generalizes the study of sublinear-time solvers for symmetric diagonally dominant (SDD) systems [Andoni-Krauthgamer-Pogrow, ITCS 2019] to the asymmetric case, which has remained underexplored despite extensive work on nearly-linear-time solvers for RDD/CDD systems.
Our first contributions are characterizations of the problem's mathematical structure. We express a solution \(\boldsymbol{x}^{\ast}\) via a Neumann series, prove its convergence, and upper bound the truncation error on this series through a novel quantity of \(\mathbf{M}\), termed the maximum \(p\)-norm gap. This quantity generalizes the spectral gap of symmetric matrices and captures how the structure of \(\mathbf{M}\) governs the problem's computational difficulty.
For systems with bounded maximum \(p\)-norm gap, we develop a collection of algorithmic results for locally approximating \(\boldsymbol{t}^{\top}\boldsymbol{x}^{\ast}\) under various scenarios and error measures. We derive these results by adapting the techniques of random-walk sampling, local push, and their bidirectional combination, which have proved powerful for special cases of solving RDD/CDD systems, particularly estimating PageRank and effective resistance on graphs. Our general framework yields deeper insights, extended results, and improved complexity bounds for these problems. Notably, our perspective provides a unified understanding of Forward Push and Backward Push, two fundamental approaches for estimating random-walk probabilities on graphs.
Our framework also inherits the hardness results for sublinear-time SDD solvers and local PageRank computation, establishing lower bounds on the maximum \(p\)-norm gap or the accuracy parameter. We hope that our work opens the door for further study into sublinear solvers, local graph algorithms, and directed spectral graph theory. - PageRank Centrality in Directed Graphs with Bounded In-DegreeIn Proceedings of the 2026 ACM-SIAM Symposium on Discrete Algorithms, to appear, 2026
For estimating single-node PageRank on directed graphs, if we parameterize the complexity by the maximum in-degree \(\Delta_{\mathrm{in}}\) of the graph, the upper bound in our STOC 2024 paper is optimal only when \(\Delta_{\mathrm{in}} = \Omega\left(n^{\Omega(1)}\right)\). In this paper, we improve the upper bound to match the lower bound (up to polylog factors) for all regimes of \(\Delta_{\mathrm{in}}\). Our key technique is a bidirectional algorithm that employs a novel randomized local exploration method.
We study the computational complexity of locally estimating a node's PageRank centrality in a directed graph \(G\). For any node \(t\), its PageRank centrality \(\pi(t)\) is defined as the probability that a random walk in \(G\), starting from a uniformly chosen node, terminates at \(t\), where each step terminates with a constant probability \(\alpha \in (0,1)\).
To obtain a multiplicative \(\big(1 \pm O(1)\big)\)-approximation of \(\pi(t)\) with probability \(\Omega(1)\), the previously best upper bound is \(O\left(n^{1/2}\min\left\{ \Delta_{\mathrm{in}}^{1/2},\ \Delta_{\mathrm{out}}^{1/2},\ m^{1/4} \right\}\right)\) from [Wang, Wei, Wen, Yang, STOC '24], where \(n\) and \(m\) denote the number of nodes and edges in \(G\), and \(\Delta_{\mathrm{in}}\) and \(\Delta_{\mathrm{out}}\) upper bound the in-degrees and out-degrees of \(G\), respectively. Using a refinement of the proof in the same paper, we establish a lower bound of \(\Omega\left(n^{1/2}\min\left\{ \Delta_{\mathrm{in}}^{1/2} \big/ n^{\gamma},\Delta_{\mathrm{out}}^{1/2} \big/ n^{\gamma},m^{1/4}\right\}\right)\), where \(\gamma = \frac{1}{2} \left(2\max\left\{\log_{1/(1-\alpha)}\Delta_{\mathrm{in}},1\right\}-1\right)^{-1}\). As \(\gamma\) only depends on \(\Delta_\mathrm{in}\) and \(n^{\gamma} = O(1)\) for \(\Delta_{\mathrm{in}} = \Omega\left(n^{\Omega(1)}\right)\), the known upper bound is tight if we only parameterize the complexity by \(n\), \(m\), and \(\Delta_{\mathrm{out}}\). However, there remains a gap of \(\Omega\left(n^{\gamma}\right)\) when considering the maximum in-degree \(\Delta_{\mathrm{in}}\), and this gap is large when \(\Delta_{\mathrm{in}}\) is small. In the extreme case where \(\Delta_{\mathrm{in}} \le 1/(1 - \alpha)\), we have \(\gamma = 1/2\), leading to a gap of \(\Omega\left(n^{1/2}\right)\) between the bounds \(O\left(n^{1/2}\right)\) and \(\Omega(1)\).
In this paper, we present a new algorithm that achieves the above lower bound (up to logarithmic factors). The algorithm assumes that \(n\) and the bounds \(\Delta_{\mathrm{in}}\) and \(\Delta_{\mathrm{out}}\) are known in advance. Our key technique is a novel randomized backwards propagation process that only propagates selectively based on Monte Carlo estimated PageRank scores.
2024
- IEEE Transactions on Knowledge and Data Engineering, Sep 2024
We systematically survey basic techniques and recent algorithms for PPR computation.
Personalized PageRank (PPR) is a traditional measure for node proximity on large graphs. For a pair of nodes \(s\) and \(t\), the PPR value \(\pi_s(t)\) equals the probability that an \(\alpha\)-discounted random walk from \(s\) terminates at \(t\) and reflects the importance between \(s\) and \(t\) in a bidirectional way. As a generalization of Google's celebrated PageRank centrality, PPR has been extensively studied and has found multifaceted applications in many fields, such as network analysis, graph mining, and graph machine learning. Despite numerous studies devoted to PPR over the decades, efficient computation of PPR remains a challenging problem, and there is a dearth of systematic summaries and comparisons of existing algorithms. In this paper, we recap several frequently used techniques for PPR computation and conduct a comprehensive survey of various recent PPR algorithms from an algorithmic perspective. We classify these approaches based on the types of queries they address and review their methodologies and contributions. We also discuss some representative algorithms for computing PPR on dynamic graphs and in parallel or distributed environments.
@article{yang2024efficient, author = {Mingji Yang and Hanzhi Wang and Zhewei Wei and Sibo Wang and Ji{-}Rong Wen}, title = {Efficient algorithms for {Personalized} {PageRank} computation: a survey}, journal = {{IEEE} Transactions on Knowledge and Data Engineering}, volume = {36}, number = {9}, pages = {4582--4602}, year = {2024}, doi = {10.1109/TKDE.2024.3376000} } - In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, Jun 2024
We establish nearly-tight complexity upper and lower bounds on estimating PageRank contributions and single-node PageRank on directed graphs. Our techniques and analyses are surprisingly simple, where the upper bounds are derived by revisiting classic algorithms.
We revisit the classic local graph exploration algorithm ApproxContributions proposed by Andersen, Borgs, Chayes, Hopcroft, Mirrokni, and Teng (WAW '07, Internet Math. '08) for computing an \(\epsilon\)-approximation of the PageRank contribution vector for a target node \(t\) on a graph with \(n\) nodes and \(m\) edges. We give a worst-case complexity bound of ApproxContributions as \(O\left(n\pi(t)/\epsilon\cdot\min\big(\Delta_{\mathrm{in}},\Delta_{\mathrm{out}},\sqrt{m}\big)\right)\), where \(\pi(t)\) is the PageRank score of \(t\), and \(\Delta_{\mathrm{in}}\) and \(\Delta_{\mathrm{out}}\) are the maximum in-degree and out-degree of the graph, resp. We also give a lower bound of \(\Omega\left(\min\big(\Delta_{\mathrm{in}}/\delta,\Delta_{\mathrm{out}}/\delta,\sqrt{m}/\delta,m\big)\right)\) for detecting the \(\delta\)-contributing set of \(t\), showing that the simple ApproxContributions algorithm is already optimal.
As ApproxContributions has become a cornerstone for computing random-walk probabilities, our results and techniques can be applied to derive better bounds for various relevant problems. In particular, we investigate the computational complexity of locally estimating a node's PageRank centrality. We improve the best-known upper bound of \(\widetilde{O}\left(n^{2/3}\cdot\min\left(\Delta_{\mathrm{out}}^{1/3},m^{1/6}\right)\right)\) given by Bressan, Peserico, and Pretto (SICOMP '23) to \(O\left(n^{1/2}\cdot\min\left(\Delta_{\mathrm{in}}^{1/2},\Delta_{\mathrm{out}}^{1/2},m^{1/4}\right)\right)\) by simply combining ApproxContributions with Monte Carlo simulation. We also improve their \(\Omega\left(\min\left(n^{1/2}\Delta_{\mathrm{out}}^{1/2},n^{1/3}m^{1/3}\right)\right)\) lower bound to \(\Omega\left(n^{1/2}\cdot\min\left(\Delta_{\mathrm{in}}^{1/2},\Delta_{\mathrm{out}}^{1/2},m^{1/4}\right)\right)\) if \(\min(\Delta_{\mathrm{in}},\Delta_{\mathrm{out}})=\Omega\left(n^{1/3}\right)\). Our matching upper and lower bounds resolve the open problem of whether one can tighten the bounds given by Bressan, Peserico, and Pretto (FOCS '18, SICOMP '23). Remarkably, the techniques and analyses for proving all our results are surprisingly simple.@inproceedings{wang2024revisiting, author = {Hanzhi Wang and Zhewei Wei and Ji{-}Rong Wen and Mingji Yang}, title = {Revisiting local computation of {PageRank}: Simple and optimal}, booktitle = {Proceedings of the 56th Annual {ACM} Symposium on Theory of Computing}, pages = {911--922}, year = {2024}, doi = {10.1145/3618260.3649661} } - Zhewei Wei*, Ji-Rong Wen, and Mingji YangIn Proceedings of the 27th International Conference on Database Theory, Mar 2024
We improve the complexity upper bounds on approximating single-source Personalized PageRank with (degree-normalized) absolute error guarantees on various graphs.
Personalized PageRank (PPR) is an extensively studied and applied node proximity measure in graphs. For a pair of nodes \(s\) and \(t\) on a graph \(G=(V,E)\), the PPR value \(\pi(s,t)\) is defined as the probability that an \(\alpha\)-discounted random walk from \(s\) terminates at \(t\), where the walk terminates with probability \(\alpha\) at each step. We study the classic Single-Source PPR query, which asks for PPR approximations from a given source node \(s\) to all nodes in the graph. Specifically, we aim to provide approximations with absolute error guarantees, ensuring that the resultant PPR estimates \(\hat{\pi}(s,t)\) satisfy \(\max_{t\in V}\big|\hat{\pi}(s,t)-\pi(s,t)\big|\le\varepsilon\) for a given error bound \(\varepsilon\). We propose an algorithm that achieves this with high probability, with an expected running time of
- \(\widetilde{O}\big(\sqrt{m}/\varepsilon\big)\) for directed graphs, where \(m=|E|\);
- \(\widetilde{O}\big(\sqrt{d_{\mathrm{max}}}/\varepsilon\big)\) for undirected graphs, where \(d_{\mathrm{max}}\) is the maximum node degree in the graph;
- \(\widetilde{O}\left(n^{\gamma-1/2}/\varepsilon\right)\) for power-law graphs, where \(n=|V|\) and \(\gamma\in\left(\frac{1}{2},1\right)\) is the extent of the power law.
These sublinear bounds improve upon existing results. We also study the case when degree-normalized absolute error guarantees are desired, requiring \(\max_{t\in V}\big|\hat{\pi}(s,t)/d(t)-\pi(s,t)/d(t)\big|\le\varepsilon_d\) for a given error bound \(\varepsilon_d\), where the graph is undirected and \(d(t)\) is the degree of node \(t\). We give an algorithm that provides this error guarantee with high probability, achieving an expected complexity of \(\widetilde{O}\left(\sqrt{\sum_{t\in V}\pi(s,t)/d(t)}\big/\varepsilon_d\right)\). This improves over the previously known \(O(1/\varepsilon_d)\) complexity.@inproceedings{wei2024approximating, author = {Zhewei Wei and Ji{-}Rong Wen and Mingji Yang}, title = {Approximating single-source {Personalized} {PageRank} with absolute error guarantees}, booktitle = {Proceedings of the 27th International Conference on Database Theory}, volume = {290}, pages = {9:1--9:19}, year = {2024}, doi = {10.4230/LIPIcs.ICDT.2024.9} }