
Communication Lower Bounds for Nested Bilinear Algorithms via Rank Expansion of Kronecker Products

Foundations of Computational Mathematics

Abstract

We develop lower bounds on communication in the memory hierarchy or between processors for nested bilinear algorithms, such as Strassen’s algorithm for matrix multiplication. We build on a previous framework that establishes communication lower bounds by use of the rank expansion, or the minimum rank of any fixed size subset of columns of a matrix, for each of the three matrices encoding a bilinear algorithm. This framework provides lower bounds for a class of dependency directed acyclic graphs (DAGs) corresponding to the execution of a given bilinear algorithm, in contrast to other approaches that yield bounds for specific DAGs. However, our lower bounds only apply to executions that do not compute the same DAG node multiple times. Two bilinear algorithms can be nested by taking Kronecker products between their encoding matrices. Our main result is a lower bound on the rank expansion of a matrix constructed by a Kronecker product derived from lower bounds on the rank expansion of the Kronecker product’s operands. We apply the rank expansion lower bounds to obtain novel communication lower bounds for nested Toom-Cook convolution, Strassen’s algorithm, and fast algorithms for contraction of partially symmetric tensors.

References

  1. Agarwal, R., Cooley, J.: New algorithms for digital convolution. IEEE Transactions on Acoustics, Speech, and Signal Processing 25(5), 392–410 (1977)

  2. Agrawal, A., Diamond, S., Boyd, S.: Disciplined geometric programming. Optimization Letters 13(5), 961–976 (2019)

  3. Ballard, G., Buluc, A., Demmel, J., Grigori, L., Lipshitz, B., Schwartz, O., Toledo, S.: Communication optimal parallel multiplication of sparse random matrices. In: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures, pp. 222–231 (2013)

  4. Ballard, G., Demmel, J., Holtz, O., Lipshitz, B., Schwartz, O.: Communication-optimal parallel algorithm for Strassen’s matrix multiplication. In: Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures, pp. 193–204 (2012)

  5. Ballard, G., Demmel, J., Holtz, O., Schwartz, O.: Minimizing communication in numerical linear algebra. SIAM Journal on Matrix Analysis and Applications 32(3), 866–901 (2011)

  6. Ballard, G., Demmel, J., Holtz, O., Schwartz, O.: Graph expansion and communication costs of fast matrix multiplication. Journal of the ACM (JACM) 59(6), 1–23 (2013)

  7. Ballard, G., Druinsky, A., Knight, N., Schwartz, O.: Hypergraph partitioning for sparse matrix-matrix multiplication. ACM Transactions on Parallel Computing (TOPC) 3(3), 1–34 (2016)

  8. Ballard, G., Knight, N., Rouse, K.: Communication lower bounds for matricized tensor times Khatri-Rao product. In: 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 557–567. IEEE (2018)

  9. Bilardi, G., De Stefani, L.: The I/O complexity of Strassen’s matrix multiplication with recomputation. In: Workshop on Algorithms and Data Structures, pp. 181–192. Springer (2017)

  10. Bilardi, G., De Stefani, L.: The I/O complexity of Toom-Cook integer multiplication. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 2034–2052. SIAM (2019)

  11. Bilardi, G., Preparata, F.P.: Horizons of parallel computation. Journal of Parallel and Distributed Computing 27(2), 172–182 (1995)

  12. Bilardi, G., Preparata, F.P.: Processor-time tradeoffs under bounded-speed message propagation: Part II, lower bounds. Theory of Computing Systems 32(5), 531–559 (1999)

  13. Brascamp, H.J., Lieb, E.H.: Best constants in Young’s inequality, its converse, and its generalization to more than three functions. Advances in Mathematics 20(2), 151–173 (1976)

  14. Christ, M., Demmel, J., Knight, N., Scanlon, T., Yelick, K.: Communication lower bounds and optimal algorithms for programs that reference arrays–part 1. arXiv:1308.0068 (2013)

  15. De Stefani, L.: On the I/O complexity of hybrid algorithms for integer multiplication. arXiv:1912.08045 (2020)

  16. Demmel, J., Dinh, G.: Communication-optimal convolutional neural nets. arXiv:1802.06905 (2018)

  17. Dinh, G., Demmel, J.: Communication-optimal tilings for projective nested loops with arbitrary bounds. In: Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, pp. 523–525 (2020)

  18. Golub, G.H., Van Loan, C.F.: Matrix Computations. The Johns Hopkins University Press (2013)

  19. Halmos, P.R.: Finite-dimensional vector spaces. Springer (1958)

  20. Hirata, S.: Tensor Contraction Engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories. The Journal of Physical Chemistry A 107(46), 9887–9897 (2003)

  21. Hölder, O.: Über einen mittelwertssatz. Nachr. Acad. Wiss. Göttingen Math.-Phys. K pp. 38–47 (1889)

  22. Hong, J.W., Kung, H.T.: I/O complexity: The red-blue pebble game. In: Proceedings of the thirteenth annual ACM symposium on Theory of computing, pp. 326–333 (1981)

  23. Irony, D., Toledo, S., Tiskin, A.: Communication lower bounds for distributed-memory matrix multiplication. Journal of Parallel and Distributed Computing 64(9), 1017–1026 (2004)

  24. Jain, S., Zaharia, M.: Spectral lower bounds on the I/O complexity of computation graphs. In: Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, pp. 329–338 (2020)

  25. Ju, C., Solomonik, E.: Derivation and analysis of fast bilinear algorithms for convolution. SIAM Review 62(4), 743–777 (2020)

  26. Kogge, P., Shalf, J.: Exascale computing trends: Adjusting to the “new normal” for computer architecture. Computing in Science & Engineering 15(6), 16–26 (2013)

  27. Kruskal, J.B.: Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and its Applications 18(2), 95–138 (1977)

  28. Lavin, A., Gray, S.: Fast algorithms for convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4013–4021 (2016)

  29. Loomis, L.H., Whitney, H.: An inequality related to the isoperimetric inequality. Bulletin of the American Mathematical Society 55(10), 961–962 (1949)

  30. Nissim, R., Schwartz, O.: Revisiting the I/O-complexity of fast matrix multiplication with recomputations. In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 482–490. IEEE (2019)

  31. Pan, V.: How can we speed up matrix multiplication? SIAM Review 26(3), 393–415 (1984)

  32. Pitas, I., Strintzis, M.: Multidimensional cyclic convolution algorithms with minimal multiplicative complexity. IEEE Transactions on Acoustics, Speech, and Signal Processing 35(3), 384–390 (1987)

  33. Selesnick, I.W., Burrus, C.S.: Extending Winograd’s small convolution algorithm to longer lengths. In: Proceedings of IEEE International Symposium on Circuits and Systems-ISCAS’94, vol. 2, pp. 449–452. IEEE (1994)

  34. Solomonik, E., Demmel, J.: Fast bilinear algorithms for symmetric tensor contractions. Computational Methods in Applied Mathematics 21(1), 211–231 (2021)

  35. Solomonik, E., Demmel, J., Hoefler, T.: Communication lower bounds of bilinear algorithms for symmetric tensor contractions. SIAM Journal on Scientific Computing 43(5), A3328–A3356 (2021)

  36. Strassen, V.: Gaussian elimination is not optimal. Numerische Mathematik 13(4), 354–356 (1969)

  37. Yao, A.C.C.: Some complexity questions related to distributive computing. In: Proceedings of the eleventh annual ACM symposium on Theory of computing, pp. 209–213 (1979)

Acknowledgements

The authors would like to thank the anonymous reviewers for the comments that significantly improved the presentation and clarity of this work. CJ is supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Department of Energy Computational Science Graduate Fellowship under Award Number DE-SC0022158.

Author information

Corresponding author

Correspondence to Edgar Solomonik.

Additional information

Communicated by Peter Buergisser.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Disclaimer

This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Appendix: Improved Lower Bounds by Limiting the Grid Expansion

Let \(\varvec{A}\in {\mathbb {C}}^{m_A \times n_A}\), \(\varvec{B}\in {\mathbb {C}}^{m_B \times n_B}\), and \(\varvec{C}= \varvec{A}\otimes \varvec{B}\). As mentioned after the main theorems in Sect. 3, the bounds \(\sigma _C\) in those results require \(\sigma _A\) and \(\sigma _B\) to be defined beyond \(n_A\) and \(n_B\). This extrapolation can lead to a loose lower bound \(\sigma _C\) even if \(\sigma _A\) and \(\sigma _B\) are tight. We illustrate this phenomenon in the next example.

Example 3

Consider the case \(\varvec{A}= \varvec{B}\in {{\,\mathrm{{\mathbb {R}}}\,}}^{4 \times 7}\) with rank and Kruskal rank (the maximum k such that any k distinct columns are linearly independent [27]) both equal to 4. For example,

$$\begin{aligned} \varvec{A}= \varvec{B}= \begin{bmatrix} 1&{}0&{}0&{}0&{}1&{}1&{}1\\ 0&{}1&{}0&{}0&{}1&{}2&{}3\\ 0&{}0&{}1&{}0&{}1&{}4&{}9\\ 0&{}0&{}0&{}1&{}1&{}8&{}27 \end{bmatrix}. \end{aligned}$$

Denote the ith column of \(\varvec{A}\) by \(\varvec{a}_i\) and the jth column of \(\varvec{B}\) by \(\varvec{b}_j\). Now \({\tilde{\sigma }}_A(x) = {\tilde{\sigma }}_B(x) = \min \left\{ x, 4\right\} \). Let \(\varvec{C}= \varvec{A}\otimes \varvec{B}\); we seek a rank expansion lower bound \(\sigma _C(k)\) for \(\varvec{C}\).

When \(k = 13\), it is not hard to check that \({\tilde{\sigma }}_C(k) = 7\), which is attained by the submatrix \(\varvec{C}\varvec{P}\) whose columns are \(\left\{ \varvec{a}_i \otimes \varvec{b}_j: i = 1 \text { or } j = 1\right\} \). If we take \(\sigma _A(x) = \sigma _B(x) = \min \left\{ x, 4\right\} \) (on \({{\,\mathrm{{\mathbb {R}}}\,}}\)), Theorem 9 gives \(\sigma _C(13) = 4\). If we instead take \(\sigma _A(x) = \sigma _B(x) = x^{\ln 4 / \ln 7}\), Theorem 9 gives \(\sigma _C(13) \approx 6.2\), which is optimal after rounding up to an integer. Although \(x^{\ln 4/\ln 7}\) is not as tight as \(\min \left\{ x, 4\right\} \) on the range \(x \in [0,7]\), its extrapolation for \(x \geqslant 7\) is greater than that of \(\min \left\{ x, 4\right\} \).
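
For readers who wish to verify these two evaluations, here is a short sketch, assuming the bound from Theorem 9 takes the product form \(\sigma _C(k) = \min _{k_Ak_B \geqslant k,~k_A, k_B \geqslant 1} \sigma _A(k_A)\, \sigma _B(k_B)\) suggested by the inverse of (22):

$$\begin{aligned} \min _{\begin{array}{c} k_Ak_B \geqslant 13,\\ k_A, k_B \geqslant 1 \end{array}} \min \left\{ k_A, 4\right\} \cdot \min \left\{ k_B, 4\right\} = 4, \qquad \min _{\begin{array}{c} k_Ak_B \geqslant 13,\\ k_A, k_B \geqslant 1 \end{array}} (k_Ak_B)^{\ln 4/\ln 7} = 13^{\ln 4/\ln 7} \approx 6.2, \end{aligned}$$

where the first minimum is attained at \((k_A, k_B) = (13, 1)\). Note that this minimizer lies beyond \(n_A = 7\), which is precisely the extrapolation at issue.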

Indeed, when \(x \leqslant 7\), using \(\min \left\{ x, 4\right\} \) offers a tighter bound than \(x^{\ln 4/ \ln 7}\). For example, when \(k = 5\), with \(\sigma _A(x) = \sigma _B(x) = \min \left\{ x, 4\right\} \), \(\sigma _C(5) = 4 = {\tilde{\sigma }}_C(5)\), but with \(\sigma _A(x) = \sigma _B(x) = x^{\ln 4 / \ln 7}\), \(\sigma _C(5) \approx 3.1\).
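
Under the same product form, the computation for \(k = 5\) reads

$$\begin{aligned} \min _{\begin{array}{c} k_Ak_B \geqslant 5,\\ k_A, k_B \geqslant 1 \end{array}} \min \left\{ k_A, 4\right\} \cdot \min \left\{ k_B, 4\right\} = 4, \qquad \min _{\begin{array}{c} k_Ak_B \geqslant 5,\\ k_A, k_B \geqslant 1 \end{array}} (k_Ak_B)^{\ln 4/\ln 7} = 5^{\ln 4/\ln 7} \approx 3.1. \end{aligned}$$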

We see from the above that tighter rank expansion lower bounds on \(\varvec{A}\) and \(\varvec{B}\) may lead to a looser rank expansion lower bound \(\sigma _C\). Thus, there is an opportunity to improve the bound established in Theorem 9. To achieve this improvement and avoid the behavior in the example above, we derive a lower bound \(\sigma _C\) that evaluates \(\sigma _A\) and \(\sigma _B\) only on their intended domains \([0,n_A]\) and \([0,n_B]\), respectively.

1.1 The L-Shaped Bound

The derivation of the new rank expansion lower bound \(\sigma _C\) differs little from that of the main theorems. The only difference is in the continuous relaxation step (Sect. 5). Recall that in the discrete step (see Sect. 4.2.1 and the definitions therein), we constructed the pre-CDG S as a subgrid of the basis of \(D = {{\,\textrm{VCollapse}\,}}(G)\). Thus, we know

$$\begin{aligned} {{\,\textrm{CDG}\,}}(S) \subseteq [0, \sigma _A(n_A)] \times [0, \sigma _B(n_B)]. \end{aligned}$$

To simplify the notation, we will denote

$$\begin{aligned} r_A:= \sigma _A(n_A),\ \ r_B:= \sigma _B(n_B),\ \ S:= {{\,\textrm{CDG}\,}}(S). \end{aligned}$$

In the continuous relaxation step, we carried out \(\mathop {\textrm{Merge}}\limits (2, 3)\) (Definition 24) until S became a 2-step CDG, and then \(\mathop {\textrm{Merge}}\limits (1, 2)\) to make it a rectangle. In fact, before the last step \(\mathop {\textrm{Merge}}\limits (1, 2)\), the entire grid remains a subgrid of \([0, r_A] \times [0, r_B]\). To avoid leaving this domain (so that \({{\,\textrm{GridExp}\,}}(S)\) remains in \([0, n_A] \times [0, n_B]\)), we stop \(\mathop {\textrm{Merge}}\limits (1,2)\) if either the first or the second step hits the boundary of the domain, which leads to an L-shaped grid (a 2-step CDG). We refer to this stopping of \(\mathop {\textrm{Merge}}\limits (1,2)\) as early stopping.

Definition 42

The L-shaped CDG \(S = L(x_1, y_1; x_2, y_2)\) is the 2-step CDG with horizontal edges at \(y_1,~ y_2\) and vertical edges at \(x_1,~ x_2\). By convention, we require \(0< x_1 < x_2\), \(0< y_1 < y_2\).

Definition 43

Fix the values \(r_A = \sigma _A(n_A)\) and \(r_B = \sigma _B(n_B)\). We denote the collection of L-shaped grids of size t that touch the boundaries as

$$\begin{aligned} {{\,\mathrm{{\mathcal {L}}}\,}}(t) = \left\{ L = L(x_1, y_1; r_A, r_B): \vert L \vert = t\right\} . \end{aligned}$$

Denote the collection of rectangle grids of size t within \([0, r_A]\times [0, r_B]\) as

$$\begin{aligned} {{\,\mathrm{{\mathcal {R}}}\,}}(t) = \{R = [0, x] \times [0, y]: x \in [1, r_A],\ y \in [1, r_B],\ \vert R \vert = t\}. \end{aligned}$$
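
For later reference, we record the size of such an L-shaped grid (a sketch of the bookkeeping, reading Definition 42 so that the first step spans \([0, x_1]\) with height \(r_B\) and the second spans \([x_1, r_A]\) with height \(y_1\)):

$$\begin{aligned} \vert L(x_1, y_1; r_A, r_B) \vert = x_1 r_B + (r_A - x_1) y_1 = r_B x_1 + r_A y_1 - x_1 y_1, \end{aligned}$$

and the analogous expression with \(\sigma _A^{-1}(x_1)\), \(\sigma _B^{-1}(y_1)\), \(n_A\), \(n_B\) in place of \(x_1\), \(y_1\), \(r_A\), \(r_B\) gives the size of its grid expansion. Both expressions appear in the optimization problems below.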

The L-shaped bound we derive in this section extends the bounds in the main theorems. Thus, we will work under the same assumptions on \(\sigma _A\) and \(\sigma _B\). See equations (6) and (7).

It is not hard to check that the early-stopped merge also yields an upper bound on \(\langle S \rangle \), as stated in the lemma below, which serves as the analogue of Lemma 25.

Lemma 44

Let S be a CDG in \([0, r_A] \times [0, r_B]\) of size \(|S| = t\). Then

$$\begin{aligned} |{{\,\textrm{GridExp}\,}}(S)| \leqslant \max _{M \in {{\,\mathrm{{\mathcal {L}}}\,}}(t) \cup {{\,\mathrm{{\mathcal {R}}}\,}}(t)} \langle M \rangle . \end{aligned}$$
(19)

Proof

The proof is similar to that of Lemma 25: as long as the grid has at least three steps, we apply \(\mathop {\textrm{Merge}}\limits (2, 3)\) repeatedly. This produces an L-shaped grid within \([0, r_A] \times [0, r_B]\). However, in the last merge operation, \(\mathop {\textrm{Merge}}\limits (1, 2)\), we stop increasing u, the height difference between steps 1 and 2, if step 1 reaches height \(r_B\) before step 2 reaches the ground. This may give an L-shaped grid \(L(x_1, y_1; x_2, r_B)\). By Lemma 25, the expansion of either this L-shaped grid or of a rectangle grid upper-bounds \(\langle S \rangle \). If the rectangle provides the upper bound, then we are done, since it is in \({{\,\mathrm{{\mathcal {R}}}\,}}(t)\).

Next, consider the case where the upper bound comes from \(L(x_1, y_1; x_2, r_B)\). Similar to how we increased or decreased the heights of the steps, we can horizontally merge the two horizontal layers of the 2-step stair, and we similarly disallow the width of the lower layer from exceeding \(r_A\). This may result in an L-shaped grid \(L(x_1', y_1; r_A, r_B)\). After these two merges, the resulting grid is either a rectangle from \({{\,\mathrm{{\mathcal {R}}}\,}}(t)\) or the L-shaped grid above, which is in \({{\,\mathrm{{\mathcal {L}}}\,}}(t)\). The proof is thus complete. \(\square \)

With this new continuous relaxation step, we analogously derive the corresponding bound \(\sigma _C\), which includes the L-shaped bound, in the following theorem.

Theorem 45

Suppose functions \(\sigma _A\) and \(\sigma _B\) are concave rank expansion lower bounds of \(\varvec{A}\in {\mathbb {C}}^{m_A \times n_A}\) and \(\varvec{B}\in {\mathbb {C}}^{m_B \times n_B}\), respectively, with \(\sigma _A(0) = \sigma _B(0) = 0\). Let \(r_A = \sigma _A(n_A)\), \(r_B = \sigma _B(n_B)\), \(d_A = \sigma _A^{\dagger }(1)\), and \(d_B = \sigma _B^{\dagger }(1)\). Define

$$\begin{aligned} R_C(k) = \min _{\begin{array}{c} k_A\in [d_A, n_A],~k_B \in [d_B, n_B],\\ k_A k_B \geqslant k \end{array}} \sigma _A(k_A) \cdot \sigma _B(k_B), \end{aligned}$$

and

$$\begin{aligned} L_C(k) = \min _{\begin{array}{c} k_A \in [0, n_A],~k_B \in [0, n_B],\\ k_{A} n_B + k_B n_A - k_A k_B = k \end{array}} \sigma _A(k_{A})r_B + \sigma _B({k_{B}}) r_A - \sigma _A(k_{A}) \cdot \sigma _B(k_{B}). \end{aligned}$$
(20)

Then \(\sigma _C(k) = \min \left\{ L_C(k), R_C(k)\right\} \) is a rank expansion lower bound of \(\varvec{C}= \varvec{A}\otimes \varvec{B}\). When \(R_C(k) \leqslant \max \left\{ r_A, r_B\right\} \), \(R_C\) is a rank expansion lower bound for \(\varvec{C}\).

Proof

Again, using a density argument on the functions, we hereafter assume \(\sigma _A\) and \(\sigma _B\) are strictly increasing smooth functions so that they are invertible. We start by showing \(\sigma _C\) is continuous and increasing, as required by the definition of a rank expansion lower bound. Since \(R_C\) and \(L_C\) are continuous, so is \(\sigma _C\). Clearly, \(R_C\) is increasing. As for \(L_C\), we have the equivalent definition,

$$\begin{aligned} L_C(k) = \min _{\begin{array}{c} k_A \in [0, n_A],~ k_B \in [0, n_B],\\ (n_A - k_A)(n_B - k_B)\, = \,n_A n_B - k \end{array}} r_A r_B - \left( r_A - \sigma _A(k_A)\right) \left( r_B - \sigma _B(k_B)\right) . \end{aligned}$$
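
Both forms describe the same optimization; expanding the products shows that the constraint and the objective coincide with those in (20):

$$\begin{aligned} (n_A - k_A)(n_B - k_B) = n_An_B - k \iff k_An_B + k_Bn_A - k_Ak_B = k, \end{aligned}$$

and

$$\begin{aligned} r_Ar_B - \left( r_A - \sigma _A(k_A)\right) \left( r_B - \sigma _B(k_B)\right) = \sigma _A(k_A)r_B + \sigma _B(k_B)r_A - \sigma _A(k_A)\sigma _B(k_B). \end{aligned}$$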

Thus, as k increases, one can increase \(k_A\) or \(k_B\) so that \(L_C\) increases. Hence, \(L_C\) is also an increasing function. Consequently, \(\sigma _C\) is an increasing function.

To prove \(\sigma _C\) is a lower bound of the rank, we express the right-hand side of (19) as a function \(\phi (t)\) as in the proof of Theorem 9. We compute the maximal expansion size of stairs in \({{\,\mathrm{{\mathcal {R}}}\,}}(t)\) and \({{\,\mathrm{{\mathcal {L}}}\,}}(t)\) below:

$$\begin{aligned} \begin{aligned} \phi _R(t)&= \max _{\begin{array}{c} t_A \in [1, r_A],~ t_B \in [1, r_B],\\ t_A t_B = t \end{array}} \sigma _A^{-1}(t_A)\sigma _B^{-1}(t_B);\\ \phi _L(t)&= \max _{\begin{array}{c} x_1 \in [0, r_A],~ y_1 \in [0, r_B],\\ r_Bx_1 + r_Ay_1 - x_1y_1 = t \end{array}} n_A\sigma _B^{-1}(y_1) + n_B\sigma _A^{-1}(x_1) - \sigma _A^{-1}(x_1)\sigma _B^{-1}(y_1). \end{aligned} \end{aligned}$$
(21)
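
These expressions record the grid expansion sizes of the two families (a sketch of where (21) comes from, using the invertibility assumption above so that \(\sigma _A^{-1}(r_A) = n_A\) and \(\sigma _B^{-1}(r_B) = n_B\)): a rectangle \([0, t_A] \times [0, t_B]\) expands to \([0, \sigma _A^{-1}(t_A)] \times [0, \sigma _B^{-1}(t_B)]\), and an L-shaped grid \(L(x_1, y_1; r_A, r_B)\) expands to \(L(\sigma _A^{-1}(x_1), \sigma _B^{-1}(y_1); n_A, n_B)\), so that

$$\begin{aligned} \langle [0, t_A] \times [0, t_B] \rangle = \sigma _A^{-1}(t_A)\, \sigma _B^{-1}(t_B), \qquad \langle L(x_1, y_1; r_A, r_B) \rangle = n_B\sigma _A^{-1}(x_1) + n_A\sigma _B^{-1}(y_1) - \sigma _A^{-1}(x_1)\sigma _B^{-1}(y_1). \end{aligned}$$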

Then we have

$$\begin{aligned} \phi (t):= \max \left\{ \phi _R(t), \phi _L(t)\right\} = \max _{M \in {{\,\mathrm{{\mathcal {L}}}\,}}(t) \cup {{\,\mathrm{{\mathcal {R}}}\,}}(t)} \langle M \rangle . \end{aligned}$$

When \(k \leqslant d_Ad_B\), \(\sigma _C(k) \leqslant R_C(k) = 1\), which is clearly a lower bound of the rank of a nonzero matrix. It remains to show the proposed function \(\sigma _C\) satisfies \(\sigma _C = \phi ^{-1}\) on \([d_Ad_B, +\infty )\).

Step 1. \(R_C = \phi _R^{-1}\) on \([d_Ad_B, +\infty )\). This proof is the same as the proof for Lemma 26. All arguments carry through with the new constraints \(t_A \leqslant r_A\), \(t_B \leqslant r_B\), and \(k_A \leqslant n_A\), \(k_B \leqslant n_B\).

Step 2. \(L_C = \phi _L^{-1}\). We repeat the proof of Lemma 26 for this L-shaped case. First, we show \(L_C \leqslant \phi _L^{-1}\). Assume by contradiction that \(\phi _L(L_C(k)) > k\) for some k. Then there exist \(x_1 \in [0, r_A],~ y_1 \in [0, r_B]\) with \(r_B x_1 + r_A y_1 - x_1 y_1 = L_C(k)\) such that

$$\begin{aligned} k' = n_A\sigma _B^{-1}(y_1) + n_B\sigma _A^{-1}(x_1) - \sigma _A^{-1}(x_1)\sigma _B^{-1}(y_1) > k. \end{aligned}$$

Consequently, as \(L_C\) is strictly increasing,

$$\begin{aligned} L_C(k)&< L_C(k')\\&= \min _{\begin{array}{c} k_A\in [0, n_A],~k_B \in [0, n_B],\\ n_B k_A + n_A k_B - k_A k_B = k' \end{array}} r_B \sigma _A(k_A) + r_A \sigma _B(k_B) - \sigma _A(k_A) \cdot \sigma _B(k_B) \\&\leqslant r_B \sigma _A\left( \sigma _A^{-1}(x_1)\right) + r_A \sigma _B\left( \sigma _B^{-1}(y_1)\right) - \sigma _A\left( \sigma _A^{-1}(x_1)\right) \cdot \sigma _B\left( \sigma _B^{-1}(y_1)\right) \\&= r_B x_1 + r_A y_1 - x_1 y_1 \\&= L_C(k). \end{aligned}$$

This is a contradiction. Next, we show the other direction, \(\phi _L^{-1} \leqslant L_C\), by showing that if \(L_C(k) = t\), then \(\phi _L(t) \geqslant k\). Indeed, if \(L_C(k) = t\) is attained at \((k_A, k_B)\), let \(x_1 = \sigma _A(k_A) \in [0, r_A]\) and \(y_1 = \sigma _B(k_B) \in [0, r_B]\). Then we have both

$$\begin{aligned} \sigma _A^{-1}(x_1) n_B + \sigma _B^{-1}(y_1) n_A - \sigma ^{-1}_A(x_1) \sigma _B^{-1}(y_1)&= k, \\ r_Bx_1 + r_Ay_1 - x_1y_1&= t. \end{aligned}$$

Therefore,

$$\begin{aligned} \phi _L(t) \geqslant \sigma _A^{-1}(x_1) n_B + \sigma _B^{-1}(y_1) n_A - \sigma ^{-1}_A(x_1) \sigma _B^{-1}(y_1) = k. \end{aligned}$$

Hence, \(L_C = \phi _L^{-1}\).

We now conclude that \(\sigma _C = \min \left\{ R_C, L_C\right\} \) is the inverse of \(\phi = \max \left\{ \phi _R, \phi _L\right\} \). This is straightforward from steps 1 and 2, since all four functions here are positive increasing functions on \({{\,\mathrm{{\mathbb {R}}}\,}}_+\). We have established that \(\sigma _C\) is a valid rank expansion lower bound for \(\varvec{C}\).

Finally, to see why \(R_C(k) \leqslant \max \left\{ r_A, r_B\right\} \) implies that \(R_C\) is a rank expansion lower bound, we consider the final merge operation \(\mathop {\textrm{Merge}}\limits (1, 2)\). The L-shaped bound only comes into play if, while increasing the height difference u, step 1 reaches height \(r_B\) before step 2 goes down to the ground. However, each of the two steps has width at least 1, since the original stair S corresponds to a CDG. This means step 1 can never reach \(r_B\) before step 2 reaches the ground if \(|S| \leqslant r_B\). Similarly, since \(r_A\) and \(r_B\) are interchangeable (by considering \(\varvec{C}= \varvec{B}\otimes \varvec{A}\)), we see that early stopping of the merge, and hence the L-shaped grid, never comes into play if \(|S| \leqslant \max \left\{ r_A, r_B\right\} \).

Suppose G is a grid and it produces a (pre-) CDG S with size \(|S| \leqslant \max \left\{ r_A, r_B\right\} \). Then since we do not need to early stop the merge,

$$\begin{aligned} \phi _R({{\,\textrm{rank}\,}}(G)) \geqslant \phi _R(|S|) \geqslant |{{\,\textrm{GridExp}\,}}(S)| \geqslant |G|. \end{aligned}$$

Hence, \({{\,\textrm{rank}\,}}(G) \geqslant \phi _R^{-1}(|G|) = R_C(|G|)\), which establishes that \(R_C\) is a rank expansion lower bound in this case. \(\square \)

The new bound in Theorem 45 is more complicated than that of Theorem 9, which does not involve L-shaped grids. However, the new bound derived in this section is often tighter. Indeed, let \(\sigma _C^{R}\) and \(\sigma _C^{R+L}\) be, respectively, the rank expansion lower bounds derived from Theorem 9 and Theorem 45. Recall the \(\phi \) function used when proving Theorem 9, shown below:

$$\begin{aligned} \phi ^{\text {prev}}(t) = \max _{\begin{array}{c} t_A,~ t_B \geqslant 1,\\ t_A t_B = t \end{array}} \sigma _A^{-1}(t_A)\sigma _B^{-1}(t_B). \end{aligned}$$
(22)

Denote by \(\phi ^{\text {new}} = \max \left\{ \phi _R, \phi _L\right\} \) the new \(\phi \) function used in the proof of Theorem 45 above. Since \(\sigma _C^{R} = \left( \phi ^{\text {prev}}\right) ^{-1}\) and \(\sigma _C^{R+L} = \left( \phi ^{\text {new}}\right) ^{-1}\), if \(\phi ^{\text {prev}} \geqslant \phi ^{\text {new}}\), then the new bound \(\sigma _C^{R+L}\) is tighter (greater) than \(\sigma _C^{R}\).

When \(\phi _L \leqslant \phi _R\), it is clear that \(\phi ^{\text {prev}} \geqslant \phi ^{\text {new}}\): \(\phi _R\) and \(\phi ^{\text {prev}}\) maximize the same function and \(\phi _R\) has a smaller feasible region, so \(\phi ^{\text {prev}} \geqslant \phi _R = \phi ^{\text {new}}\). When \(\phi _L > \phi _R\), this is not clearly true. When the maximizer \(x_1\) in \(\phi _L\) is at least 1, we can prove \(\phi ^{\text {prev}} \geqslant \phi _L\), so \(\sigma _C^{R+L} \geqslant \sigma _C^{R}\). We can enforce \(x_1 \geqslant 1\) by early stopping the horizontal merge step in the proof of Lemma 44, but this would make the bound too complicated to state.

The bound in Theorem 45 does not require defining \(\sigma _A\), \(\sigma _B\) beyond \(n_A\) and \(n_B\). Thus, it also resolves the undesirable phenomenon in Example 3. It provides a tighter rank expansion lower bound for \(\varvec{C}= \varvec{A}\otimes \varvec{B}\) when given tighter rank expansion lower bounds for \(\varvec{A}\) and \(\varvec{B}\). This is not always the case with Theorem 9 as shown in Example 3.

Suppose we have two rank expansion lower bounds \(\sigma _A,~{\hat{\sigma }}_A\) for \(\varvec{A}\) and \(\sigma _B,~{\hat{\sigma }}_B\) for \(\varvec{B}\). If \(\sigma _A \geqslant {\hat{\sigma }}_A\) on \([0, n_A]\) and \(\sigma _B \geqslant {\hat{\sigma }}_B\) on \([0, n_B]\), then the corresponding functions satisfy \(R_C \geqslant {\hat{R}}_C\) on \([0, n_An_B]\). If in addition \(\sigma _A(n_A) = {\hat{\sigma }}_A(n_A)\) and \(\sigma _B(n_B) = {\hat{\sigma }}_B(n_B)\), for example, when both pairs equal the true ranks of \(\varvec{A}\) and \(\varvec{B}\), then also \(L_C \geqslant {\hat{L}}_C\) on \([0, n_An_B]\).
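
A sketch of the second claim, using the equivalent form of \(L_C\) from the proof of Theorem 45: the two pairs of bounds share the same values \(r_A = \sigma _A(n_A) = {\hat{\sigma }}_A(n_A)\) and \(r_B = \sigma _B(n_B) = {\hat{\sigma }}_B(n_B)\), and \(\sigma _A(k_A) \leqslant r_A\), \(\sigma _B(k_B) \leqslant r_B\), so

$$\begin{aligned} \left( r_A - \sigma _A(k_A)\right) \left( r_B - \sigma _B(k_B)\right) \leqslant \left( r_A - {\hat{\sigma }}_A(k_A)\right) \left( r_B - {\hat{\sigma }}_B(k_B)\right) , \end{aligned}$$

and hence the objective \(r_Ar_B - \left( r_A - \sigma _A(k_A)\right) \left( r_B - \sigma _B(k_B)\right) \) dominates its counterpart pointwise over the common feasible set, giving \(L_C \geqslant {\hat{L}}_C\).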

Despite these properties of the new bound in Theorem 45, Theorem 9 and Theorem 11 give much simpler bounds and can be easily applied recursively to derive a lower bound on the rank expansion for \(\varvec{C}= \bigotimes _{i=1}^p \varvec{A}_i,~p\geqslant 3\).

The log-log convexity assumption simplifies Theorem 9. Similar assumptions can also simplify the bound in Theorem 45. Under such assumptions, we can show \(R_C \leqslant L_C\), which removes the need for the \(L_C\) function. The resulting rank expansion lower bound is thus tighter than the one in Theorem 9, as discussed above. This is the main topic of the next section.

1.2 Simplifying the L-Shaped Bound

The main goal is to understand when \(R_C \leqslant L_C\), and thus \(\sigma _C = R_C\). We will show this is the case when \(\sigma _A\) and \(\sigma _B\) satisfy an appropriate log-log convexity condition (see Definition 10 for the definition and Sect. 5.4 for the basic properties). We will continue using the notations introduced in the previous section. Recall that \(r_A = \sigma _A(n_A)\) and \(r_B = \sigma _B(n_B)\).

Theorem 46

Let everything be defined as in Theorem 45. Suppose, in addition, that the functions \(r_A - \sigma _A(n_A - x)\) and \(r_B - \sigma _B(n_B - x)\) are log-log convex on \((0, n_A)\) and \((0, n_B)\), respectively. Then \(R_C(k) \leqslant L_C(k)\), and consequently,

$$\begin{aligned} \sigma _C(k) = \min _{\begin{array}{c} k_A\in [d_A, n_A],~k_B \in [d_B, n_B],\\ k_A k_B \geqslant k \end{array}} \sigma _A(k_A) \cdot \sigma _B(k_B) \end{aligned}$$
(23)

is a rank expansion lower bound of \(\varvec{C}= \varvec{A}\otimes \varvec{B}\). This bound is tighter than the one in Theorem 9.

Proof

As defined in (21) in the proof of Theorem 45, the size of the grid expansion of an L-shaped CDG of size t is bounded by

$$\begin{aligned} \phi _L(t) = \max _{\begin{array}{c} x_1 \in [0, r_A],~ y_1 \in [0, r_B],\\ r_Bx_1 + r_Ay_1 - x_1y_1 = t \end{array}} n_A\sigma _B^{-1}(y_1) + n_B\sigma _A^{-1}(x_1) - \sigma _A^{-1}(x_1)\sigma _B^{-1}(y_1). \end{aligned}$$

Using a density argument, we assume that \(\sigma _A^{-1}\) and \(\sigma _B^{-1}\) are smooth and strictly increasing on \([0, r_A]\) and \([0, r_B]\). According to Theorem 45, when \(R_C(k) \leqslant \max \left\{ r_A, r_B\right\} \), \(R_C\) is indeed a rank expansion lower bound of \(\varvec{C}\). Thus, hereafter we only consider the case \(R_C(k) = t \geqslant \max \left\{ r_A, r_B\right\} \). To show \(R_C \leqslant L_C\), it suffices to show \(\phi _R(t) \geqslant \phi _L(t)\) when \(t \geqslant \max \left\{ r_A, r_B\right\} \). Hence, we assume \(t \geqslant \max \left\{ r_A, r_B\right\} \) in the proof below.

We show that when \(\sigma _A^{-1}\) and \(\sigma _B^{-1}\) satisfy the log-log convexity assumption, \(\phi _L(t)\) is maximized at either \(x_1 = 0\) or \(y_1 = 0\). To begin with, note that when \(x_1 = 0\), the objective in \(\phi _L\) evaluates to

$$\begin{aligned} \phi _L(t) = \sigma _A^{-1}(r_A) \sigma _B^{-1}(t/r_A), \end{aligned}$$

and when \(y_1 = 0\), it evaluates to

$$\begin{aligned} \phi _L(t) = \sigma _A^{-1}(t/r_B) \sigma _B^{-1}(r_B). \end{aligned}$$

Since \(t \leqslant r_Ar_B\), \(t/r_A \geqslant 1\), and \(t/r_B \geqslant 1\), the pairs \((r_A, t/r_A)\) and \((t/r_B, r_B)\) are feasible in the optimization defining \(\phi _R\), and so

$$\begin{aligned} \phi _R(t) \geqslant \max \left\{ \sigma _A^{-1}(r_A)\sigma _B^{-1}(t/r_A),~ \sigma _A^{-1}(t/r_B)\sigma _B^{-1}(r_B)\right\} . \end{aligned}$$

Thus, if \(\phi _L(t)\) is indeed maximized at \(x_1 = 0\) or \(y_1 = 0\), then we have \(\phi _L \leqslant \phi _R\), and \(R_C = \phi _R^{-1}\) is a valid rank expansion lower bound.

To that end, let us reuse the notation \(f = \sigma _A^{-1}\) and \(g = \sigma _B^{-1}\) for the sake of simplicity. We introduce the following functions on \((0, r_A)\) and \((0, r_B)\), respectively:

$$\begin{aligned} {\hat{f}}(x) = \frac{f(r_A) - f(r_A - x)}{xf'(r_A - x)} \text {~~and~~} {\hat{g}}(x) = \frac{g(r_B) - g(r_B - x)}{xg'(r_B - x)}. \end{aligned}$$

Since \(r_A - \sigma _A(n_A - x)\) is log-log convex and increasing on \([0, n_A]\), its inverse, \(p(x) \equiv f(r_A) - f(r_A - x)\), is log-log concave and increasing on \([0, r_A]\). Therefore,

$$\begin{aligned} \frac{d}{dx}\ln p(e^x) = \frac{e^xp'(e^x)}{p(e^x)} \end{aligned}$$

is positive and decreasing in x; equivalently, \(xp'(x)/p(x)\) is positive and decreasing in x on \((0, r_A)\). Thus, its reciprocal,

$$\begin{aligned} {\hat{f}}(x) = \frac{p(x)}{xp'(x)} \end{aligned}$$

is increasing in x. Similarly, \({\hat{g}}(x)\) is also increasing.

Writing \(x = x_1\), we can rewrite \(\phi _L\) as

$$\begin{aligned}&\phi _L(t) \\&= \max _{\begin{array}{c} x_1 \in [0, r_A],~ y_1 \in [0, r_B],\\ (r_A - x_1)(r_B - y_1)\, = \,r_A r_B - t \end{array}} n_A n_B - \left( n_A - f(x_1)\right) \left( n_B - g(y_1)\right) \\&= n_A n_B -\min _{x \in \left[ 0, r_A - \frac{r_Ar_B - t}{r_B}\right] } \left( f(r_A) - f(x)\right) \left( g(r_B) - g\left( r_B - \frac{r_Ar_B - t}{r_A - x}\right) \right) . \end{aligned}$$

In the trivial case \(t = r_Ar_B\), one can directly verify \(\phi _R = \phi _L\). Now assume \(t < r_Ar_B\), and thus \(x < r_A\). For simplicity, we write \(r_Ar_B - t = c > 0\) and \(\frac{c}{r_A - x} = s_x \in [\frac{c}{r_A}, r_B]\). It suffices to show

$$\begin{aligned} \mathop {\textrm{argmin}}\limits _{x \in [0, r_A - \frac{c}{r_B}]} \bigg \{ h(x):= \left( f(r_A) - f(x)\right) \left( g(r_B) - g(r_B - s_x)\right) \bigg \} \in \big \{ 0, r_A - \frac{c}{r_B} \big \}. \end{aligned}$$

The function h is \(C^1\) on \([0, r_A - \frac{c}{r_B}]\), and

$$\begin{aligned} h'(x)&= [f(r_A) - f(x)]\cdot \frac{ds_x}{dx}\cdot g'(r_B - s_x) - f'(x) [g(r_B) - g(r_B - s_x)]\\&= s_x\left[ \frac{f(r_A) - f(x)}{r_A - x}g'(r_B - s_x) - f'(x) \frac{g(r_B) - g(r_B - s_x)}{s_x}\right] . \end{aligned}$$

Since f and g are convex and strictly increasing, \(f' > 0\) and \(g' > 0\) on the interval \(x \in (0, r_A - \frac{c}{r_B})\), so on this interval,

$$\begin{aligned} h'(x)&= s_xf'(x)g'(r_B - s_x) \left( {\hat{f}}(r_A - x) - {\hat{g}}\left( \frac{c}{r_A - x}\right) \right) \\&\quad \qquad \propto _+ {\hat{f}}(r_A - x) - {\hat{g}}\left( \frac{c}{r_A - x}\right) . \end{aligned}$$

By the monotonicity of \({\hat{f}}\) and \({\hat{g}}\), h can only be increasing, increasing then decreasing, or decreasing on \((0, r_A - \frac{c}{r_B})\). In any case, h can only attain its minimum on the boundary \(x \in \left\{ 0, r_A - c/r_B\right\} \). The proof is then complete. \(\square \)

In the example below, we show that when \(r_A - \sigma _A(n_A - x)\) and \(r_B - \sigma _B(n_B - x)\) are not log-log convex, i.e., when \(f(r_A) - f(r_A - x)\) and \(g(r_B) - g(r_B - x)\) are not log-log concave, it is possible to have \(L_C < R_C\). Thus, Theorem 45 cannot always be simplified to Theorem 46.

Example 4

In general, given that \(f = \sigma _A^{-1}\) is convex, strictly increasing, and \(f(0) = 0\), we cannot ignore the L-shaped grids in \({{\,\mathrm{{\mathcal {L}}}\,}}(t)\) in the maximization. A counterexample is given below.

Consider \(r_A = 5\) and

$$\begin{aligned} \sigma _A^{-1}(x) = f(x) = {\left\{ \begin{array}{ll} x &{} x \leqslant 3\\ \frac{5}{2} + \frac{1}{2}e^{2(x - 3)} &{} 3< x \leqslant 4\\ \frac{5}{2} + \frac{1}{2}e^{2} + e^2 (x - 4) &{} 4 < x \leqslant 5 \end{array}\right. }. \end{aligned}$$

Then one can check that \(f(r_A) - f(r_A - x)\) is not log-log concave and \({\hat{f}}\) is not monotonically increasing.

We demonstrate that in this case the L-shaped grids cannot be discarded. Let us take \(g = f\). Consider the grid \(S = L(1, 1; 5, 5)\) of size 9. We have \(\langle S \rangle \approx 26.17\). With a rectangle of area 9 inside the \(5 \times 5\) region, the maximum expansion size, rounded up to an integer, is \(\lceil \phi _R(9) \rceil = \lceil f(5)\cdot f(9/5) \rceil = 25 < \lfloor \langle S \rangle \rfloor \leqslant \lfloor \phi _L(9) \rfloor \). Thus, in this case we have to return to Theorem 45 or Theorem 9.
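
A quick check of these numbers (a sketch, using the expansion formula from (21) with \(f = g\), so that \(n_A = n_B = f(5) = \tfrac{5}{2} + \tfrac{3}{2}e^{2} \approx 13.58\)):

$$\begin{aligned} \langle S \rangle = f(5)f(1) + f(5)f(1) - f(1)^2 = 2f(5) - 1 \approx 26.17, \qquad f(5)\cdot f(9/5) = 1.8\, f(5) \approx 24.45. \end{aligned}$$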

To see how Theorem 46 improves on the bound in Theorem 9, we give the following example. However, although the bound in Theorem 46 is tighter, it is often no longer concave, and it is in general not possible to apply it recursively as in Theorem 11.

Example 5

First consider \(\varvec{C}= \varvec{A}\otimes \varvec{B}\), where \(\sigma _A(k) = k^{1/2}\) and \(\sigma _B(k) = k^{1/4}\). If we apply Theorem 9 or equivalently Theorem 11, we find a rank expansion lower bound,

$$\begin{aligned} \sigma _C^{\text {prev}}(k) = k ^{1/4}. \end{aligned}$$

Now if we turn to Theorem 46 (or Corollary 47 stated after this example), for \(k > n_B\), we get a different rank expansion lower bound,

$$\begin{aligned} \sigma _C^{\text {new}}(k) = \left( \frac{k}{n_B}\right) ^{1/2} \cdot n_B^{1/4} = n_B^{-1/4}\cdot k^{1/2}, \end{aligned}$$

and when \(k \leqslant n_B\), \(\sigma _C^{\text {new}}(k) = \sigma _C^{\text {prev}}(k)\). The bound is improved by a factor of \((k/n_B)^{1/4}\). Numerically, with \(n_A = n_B = 100\) and \(k = 10 n_A\), we have \(\lceil \sigma _C^{\text {prev}}(k)\rceil = 6\), whereas \(\sigma _C^{\text {new}}(k) = 10\). However, note that although \(\sigma _C^{\text {new}}\) is continuous, it is no longer concave. Hence, it is not possible to apply the new bound recursively.
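
A quick numerical check of these figures: with \(n_B = 100\) and \(k = 1000\),

$$\begin{aligned} \sigma _C^{\text {prev}}(1000) = 1000^{1/4} \approx 5.6, \qquad \sigma _C^{\text {new}}(1000) = 100^{-1/4} \cdot 1000^{1/2} = \frac{31.6}{3.16} = 10, \end{aligned}$$

and the improvement factor is \((k/n_B)^{1/4} = 10^{1/4} \approx 1.78\).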

For another illustration, let us use a logarithmic rank expansion lower bound. Let \(\varvec{C}= \varvec{A}\otimes \varvec{A}\) with \(\sigma _A(k) = \ln (k + 1)\). From Theorem 11 (or Theorem 9), we get

$$\begin{aligned} \sigma _C^{\text {prev}}(k) = \ln \left( \frac{k}{e-1} + 1\right) . \end{aligned}$$

Using the new bound by Theorem 46 (or Corollary 47), when \(k/n_A \geqslant e - 1\), we have

$$\begin{aligned} \sigma _C^{\text {new}}(k) = \ln (n_A + 1) \cdot \ln (k/n_A + 1). \end{aligned}$$

Thus in this regime,

$$\begin{aligned} \exp \left( \sigma _C^{\text {prev}}(k)\right)&= \frac{k}{e-1} + 1 ; \end{aligned}$$
(24)
$$\begin{aligned} \exp \left( \sigma _C^{\text {new}}(k)\right)&= \left( \frac{k}{n_A} + 1\right) ^{\ln (n_A + 1)} . \end{aligned}$$
(25)

If we choose k to be proportional to \(n_A\), (25) eventually surpasses (24) as \(n_A \rightarrow \infty \), and thus we obtain a tighter lower bound, \(\sigma _C^{\text {new}}\), on the rank expansion. Numerically, with \(n_A = 100\), \(k = 10 n_A\), we have (24) \(\approx 583\), (25) \(\approx 63996\). Thus, \(\lceil \sigma _C^{\text {prev}}(k)\rceil = 7\), whereas \(\lceil \sigma _C^{\text {new}}(k)\rceil = 12\).
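
The arithmetic behind these figures: with \(n_A = 100\) and \(k = 1000\),

$$\begin{aligned} \frac{1000}{e-1} + 1 \approx 583, \qquad \left( \frac{1000}{100} + 1\right) ^{\ln (101)} = e^{\ln (11) \ln (101)} \approx e^{11.1} \approx 6.4 \times 10^{4}, \end{aligned}$$

so \(\sigma _C^{\text {prev}}(k) = \ln (583) \approx 6.4\) and \(\sigma _C^{\text {new}}(k) = \ln (101) \ln (11) \approx 11.1\).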

Finally, recall that we can further simplify the optimization problem in \(R_C\) using Lemma 31 when \(\sigma _A\) and \(\sigma _B\) are log-log concave. Combining Theorem 46 with Lemma 31, we obtain an even simpler expression for the rank expansion lower bound.

Corollary 47

Let everything be defined as in Theorem 45. If \(\sigma _A(x)\) and \(\sigma _B(x)\) are log-log concave and \(r_A - \sigma _A(n_A - x)\) and \(r_B - \sigma _B(n_B - x)\) are log-log convex on \((0, n_A)\) and \((0, n_B)\), respectively (e.g., the polynomial and logarithm functions listed in Proposition 30), then

$$\begin{aligned} R_C(k) = \min _{\begin{array}{c} k_A\in [d_A, n_A],~k_B \in [d_B, n_B],\\ k_A\in \left\{ d_A, n_A\right\} \text { or } k_B \in \left\{ d_B, n_B\right\} ,\\ k_A k_B \geqslant k \end{array}} \sigma _A(k_A) \cdot \sigma _B(k_B) \end{aligned}$$

is a rank expansion lower bound for \(\varvec{C}\).
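
As an illustration, here is a sketch reusing the setting of Example 5, with \(\sigma _A(k) = k^{1/2}\), \(\sigma _B(k) = k^{1/4}\), \(n_A = n_B = 100\), \(d_A = d_B = 1\), and \(k = 1000\). The candidates with \(k_A = d_A\) or \(k_B = d_B\) are infeasible, since the other variable would have to exceed \(n_B\) or \(n_A\), and the two remaining boundary candidates give

$$\begin{aligned} k_B = n_B:~ \min _{k_A \geqslant 10} k_A^{1/2} \cdot 100^{1/4} = 10, \qquad k_A = n_A:~ \min _{k_B \geqslant 10} 100^{1/2} \cdot k_B^{1/4} = 10 \cdot 10^{1/4} \approx 17.8, \end{aligned}$$

so \(R_C(1000) = 10\), in agreement with \(\sigma _C^{\text {new}}\) in Example 5.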

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ju, C., Zhang, Y. & Solomonik, E. Communication Lower Bounds for Nested Bilinear Algorithms via Rank Expansion of Kronecker Products. Found Comput Math (2023). https://doi.org/10.1007/s10208-023-09633-8
