
An alternating nonmonotone projected Barzilai–Borwein algorithm of nonnegative factorization of big matrices

Data Mining and Knowledge Discovery

Abstract

In this paper, a new alternating nonmonotone projected Barzilai–Borwein (BB) algorithm is developed for solving large-scale nonnegative matrix factorization problems. Unlike existing algorithms in the literature, it uses a nonmonotone line search strategy to find suitable step lengths and an adaptive BB spectral parameter to generate search directions, so that the constructed subproblems are solved efficiently. In addition to establishing global convergence of the algorithm, we conduct numerical tests on three synthetic datasets, four public face image datasets and a real-world transcriptomic dataset to show its advantages. We conclude that, compared with state-of-the-art algorithms, our algorithm is promising in terms of numerical efficiency, noise robustness and quality of matrix factorization, and is applicable to face image reconstruction and to deep mining of transcriptomic profiles of the sub-genomes in a hybrid fish lineage.

Availability of data and material

The data used to support the findings of this study are available from the corresponding author upon request.

Notes

  1. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.

  2. http://cbcl.mit.edu/software-datasets/FaceData2.html.

  3. http://cvc.cs.yale.edu/cvc/projects/yalefaces/yalefaces.html.

  4. http://www.kasrl.org/jaffe.html.

  5. http://www.ece.uwaterloo.ca/~z70wang/research/ssim/.

  6. https://github.com/TJY0622/TJY.

Funding

This research is supported by the National Natural Science Foundation of China (Grant No. 71671190), the Fundamental Research Funds for the Central Universities of Central South University (Grant No. 206021706) and the Hunan Provincial Innovation Foundation For Postgraduate (Grant No. 150110022).

Author information

Contributions

Z.W. conceived and designed the research plan and wrote the paper. T.L. and J.T. performed the mathematical modelling, developed the algorithm, carried out the experiments and wrote the paper.

Corresponding author

Correspondence to Zhong Wan.

Ethics declarations

Conflict of interest

We declare that none of the authors has any conflict of interest regarding the submission and publication of this paper.

Code availability

All the computer codes used in this study are available from the corresponding author upon request.

Additional information

Responsible editor: Evangelos Papalexakis.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of Remark 4

Proof

From (Lin 2007b) and (Gillis and Glineur 2008), the KKT conditions of Problem (1.1) are

$$\begin{aligned} \begin{array}{lll} W_{ia}\ge 0, &{}H_{bj}\ge 0, &{}\\ \nabla _W F(W, H)_{ia}\ge 0, &{}\nabla _H F(W, H)_{bj}\ge 0,&{} \\ W_{ia}\cdot \nabla _W F(W, H)_{ia}=0, &{}H_{bj} \cdot \nabla _H F(W, H)_{bj}=0, &{}\forall ~i, a, b, j. \end{array} \end{aligned}$$
(A.1)

Equivalently, (A.1) can be rewritten as

$$\begin{aligned} \begin{array}{lll} \min \{W_{ia}, \nabla _W F(W, H)_{ia}\}=0,&\min \{H_{bj}, \nabla _H F(W, H)_{bj}\}=0,&\forall ~i, a, b, j. \end{array} \end{aligned}$$
(A.2)

By the following definitions,

$$\begin{aligned} \nabla ^{P}_{W}F(W, H)_{ia}= & {} \left\{ \begin{array}{ll} \nabla _W F(W,H)_{ia}, &{} W_{ia}>0, \\ \text {min}\{0, \nabla _W F(W,H)_{ia}\},&{} W_{ia}=0, \end{array} \right. \end{aligned}$$
(A.3)
$$\begin{aligned} \nabla ^{P}_{H}F(W, H)_{bj}= & {} \left\{ \begin{array}{ll} \nabla _H F(W,H)_{bj}, &{} H_{bj}>0, \\ \text {min}\{0, \nabla _H F(W,H)_{bj}\},&{} H_{bj}=0, \end{array} \right. \end{aligned}$$
(A.4)

we can prove that the KKT conditions (A.1) and equations (2.20) are equivalent. This completes the proof of Remark 4. \(\square \)
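The equivalence can be checked numerically on a small instance. The sketch below assumes that Problem (1.1) uses the standard least-squares objective \(F(W,H)=\frac{1}{2}\Vert V-WH\Vert _F^2\), which is not restated in this appendix; the function names `nmf_gradients`, `projected_gradient` and `kkt_residual` are illustrative and not taken from the paper.

```python
import numpy as np

def nmf_gradients(V, W, H):
    """Gradients of the ASSUMED objective F(W, H) = 0.5 * ||V - W H||_F^2."""
    R = W @ H - V
    return R @ H.T, W.T @ R

def projected_gradient(X, G):
    """Projected gradient as in (A.3)/(A.4): G where X > 0, min{0, G} where X = 0."""
    return np.where(X > 0, G, np.minimum(G, 0.0))

def kkt_residual(X, G):
    """Entrywise left-hand side of (A.2): min{X, G}."""
    return np.minimum(X, G)

# An exact factorization is a KKT point: both measures vanish.
W = np.eye(2)
H = np.array([[2.0, 0.0], [0.0, 3.0]])
V = W @ H
_, gH = nmf_gradients(V, W, H)
pg_at_kkt = projected_gradient(H, gH)
res_at_kkt = kkt_residual(H, gH)

# Perturbing a strictly positive entry of H breaks stationarity,
# and both measures become nonzero together.
H_bad = np.array([[2.0, 0.0], [0.0, 4.0]])
_, gH_bad = nmf_gradients(V, W, H_bad)
pg_off_kkt = projected_gradient(H_bad, gH_bad)
res_off_kkt = kkt_residual(H_bad, gH_bad)
```

For nonnegative iterates, the projected gradient is zero exactly when the residual of (A.2) is zero, which is the content of Remark 4.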

Appendix B: Proof of Theorem 1

Proof

We prove Theorem 1 by mathematical induction.

For all \(t\ge 0\), it follows from the first result in Lemma 3 that \(D_t\) is a sufficient descent direction at \(Z_t\). Then, similarly to Lemma 1 in (Huang et al. 2018), there is a step length \(\lambda _t \in \{1, \rho , \rho ^2, \ldots \}\) such that the following inequality holds:

$$\begin{aligned} f(Z_t+\lambda _t D_t) \le f(Z_t) + \delta _t {\bar{Q}}_{t+1} \lambda _t \langle \nabla f(Z_t), D_t\rangle . \end{aligned}$$
(B.1)

It is easy to see that for \(t=0\), (3.1) and (3.2) hold.

Suppose that (3.1) and (3.2) also hold for \(t\in \{1, 2, \ldots , r-1\}\). Then, from Algorithm 1, it follows that \(f(Z_{r}) \le C_{\ell (r)}\) if \({\bar{f}}_1 \le {\bar{C}}_1\). Otherwise, \(f(Z_{r}) \le f(Z_{r-1}+{\hat{\lambda }}_2 D_{r-1}) \le C_{\ell (r-1)}\) and \(\eta _{r-1}=0\), which implies \(f(Z_{r}) \le \max \{C_{\ell (r-1)}, f(Z_r)\}= C_{\ell (r)}\). Consequently, \(f(Z_{t})\le C_{\ell {(t)}}\) holds for all \(t \in \{1, 2, \ldots , r\}\).

For \(t=r\), take \({\hat{\lambda }}_1= \lambda _t\) in (B.1). Then,

$$\begin{aligned} f(Z_t+{\hat{\lambda }}_1 D_t)\le & {} f(Z_t) + \delta _t {\bar{Q}}_{t+1}{\hat{\lambda }}_1 \langle \nabla f(Z_t), D_t\rangle \nonumber \\\le & {} C_{\ell (t)}+\delta _t {\bar{Q}}_{t+1} {\hat{\lambda }}_1 \langle \nabla f(Z_t), D_t\rangle \nonumber \\\le & {} {\bar{\eta }}_t Q_t (C_{\ell (t)}-C_t)+C_{\ell (t)}+\delta _t {\bar{Q}}_{t+1} {\hat{\lambda }}_1 \langle \nabla f(Z_t), D_t\rangle . \end{aligned}$$
(B.2)

The last inequality in (B.2) is equivalent to (3.1).

Take \({\hat{\lambda }}_2=\lambda _t\). Then, it follows from (B.1) and \({\bar{Q}}_{t+1}\ge 1\) that

$$\begin{aligned} f(Z_t+{\hat{\lambda }}_2 D_t)\le & {} f(Z_t) + \delta _t {\bar{Q}}_{t+1}{\hat{\lambda }}_2 \langle \nabla f(Z_t), D_t\rangle \nonumber \\\le & {} C_{\ell (t)}+\delta _t {\bar{Q}}_{t+1} {\hat{\lambda }}_2 \langle \nabla f(Z_t), D_t\rangle \nonumber \\\le & {} C_{\ell (t)}+\delta _t {\hat{\lambda }}_2 \langle \nabla f(Z_t), D_t\rangle , \end{aligned}$$
(B.3)

where the last inequality in (B.3) indicates that (3.2) holds. Therefore, Algorithm 1 is well defined. \(\square \)
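To illustrate why a nonmonotone acceptance test can admit longer steps than a monotone Armijo test, the following sketch implements a simplified backtracking rule over \(\{1, \rho , \rho ^2, \ldots \}\). It is not Algorithm 1: the exact definitions of \(C_t\), \(Q_t\), \({\bar{\eta }}_t\) and \({\bar{Q}}_{t+1}\) are not given in this appendix, so the reference value is assumed here to be the maximum of recent objective values, and the function name and default parameters are hypothetical.

```python
import numpy as np

def nonmonotone_backtracking(f, Z, D, slope, history,
                             rho=0.5, delta=1e-4, max_backtracks=50):
    """Return the first lam in {1, rho, rho^2, ...} with
    f(Z + lam*D) <= max(history) + delta * lam * slope.

    slope is <grad f(Z), D> and must be negative (descent direction).
    history holds recent objective values; a single entry f(Z) gives
    the usual monotone Armijo rule.
    """
    reference = max(history)  # simplified stand-in for C_{l(t)}
    lam = 1.0
    for _ in range(max_backtracks):
        if f(Z + lam * D) <= reference + delta * lam * slope:
            return lam
        lam *= rho
    raise RuntimeError("no acceptable step length found")

# Toy problem: f(z) = 0.5 * ||z||^2 with a deliberately overlong direction.
f = lambda z: 0.5 * float(np.dot(z, z))
Z = np.array([1.0, 1.0])
D = -4.0 * Z                    # descent direction, four times too long
slope = float(np.dot(Z, D))     # <grad f(Z), D> = -8

lam_monotone = nonmonotone_backtracking(f, Z, D, slope, history=[f(Z)])
lam_nonmonotone = nonmonotone_backtracking(f, Z, D, slope, history=[f(Z), 10.0])
```

With only the current value in `history`, the unit step is rejected and the step length is halved twice; with a larger earlier value in the window, the unit step is accepted even though it increases the objective temporarily.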

Appendix C: Proof of Lemma 1

Proof

From the definition of \(C_{\ell (t)}\) and the inequality (3.3), it follows that

$$\begin{aligned} C_{\ell (t+1)}= & {} \displaystyle {\max _{\max \{0, t-M+2\}\le j\le t+1} \{C_j\}}\\\le & {} \max \{C_{\ell (t)}, C_{t+1}\}\\= & {} C_{\ell (t)}. \end{aligned}$$

Thus, the sequence \(\{C_{\ell (t)}\}\) is nonincreasing. It directly follows that

$$\begin{aligned} f(Z_t)\le C_{\ell (t)} \le C_{\ell (0)}= f(Z_0)\le f(H_0). \end{aligned}$$

Consequently, we have \(\{Z_t\} \subset L(H_0)=\{H\ge 0: f(H)\le f(H_0)\}\). The first result has been proved.

Since f is bounded below, from Theorem 2 in (Huang et al. 2018), it directly follows that the inequalities (3.4) and (3.5) hold. This ends the proof. \(\square \)

Appendix D: Proof of Lemma 2

Proof

Since f is continuously differentiable and \(\nabla f\) is Lipschitz continuous with the Lipschitz constant L, it holds that for any \(\lambda >0\),

$$\begin{aligned} f(Z_t + \lambda D_t )-f(Z_t)= & {} \int _0^{\lambda } \langle \nabla f(Z_t +s D_t), D_t\rangle ds \\= & {} \lambda \langle \nabla f(Z_t), D_t\rangle +\int _0^{\lambda } \langle \nabla f(Z_t +s D_t)-\nabla f(Z_t), D_t\rangle ds \\\le & {} \lambda \langle \nabla f(Z_t), D_t\rangle + \int _0^{\lambda } Ls\Vert D_t\Vert _F^2 ds\\= & {} \lambda \langle \nabla f(Z_t), D_t\rangle +\dfrac{L}{2}\lambda ^2 \Vert D_t\Vert _F^2. \end{aligned}$$

Consequently, by taking \(\lambda =\dfrac{\lambda _t}{\rho }\), we have

$$\begin{aligned} f\left( Z_t + \dfrac{\lambda _t}{\rho } D_t \right) -f(Z_t)\le \dfrac{\lambda _t}{\rho } \langle \nabla f(Z_t), D_t\rangle +\dfrac{L}{2}\left( \dfrac{\lambda _t}{\rho }\right) ^2 \Vert D_t\Vert _F^2. \end{aligned}$$
(D.1)

On the other hand, for the obtained step length \(\lambda _t\), either \(\lambda _t=1\), or (3.1) or (3.2) fails at least once, in particular at the preceding trial step \(\lambda _t/\rho \). That is to say,

$$\begin{aligned} f\left( Z_t + \dfrac{\lambda _t}{\rho } D_t\right)> & {} {{\bar{\eta }}}_t Q_t (C_{\ell (t)}-C_t) +C_{\ell (t)}+\delta _t {\bar{Q}}_{t+1} \dfrac{\lambda _t}{\rho } \langle \nabla f(Z_t), D_t\rangle \\\ge & {} C_{\ell (t)}+\delta _t {\bar{Q}}_{t+1} \dfrac{\lambda _t}{\rho } \langle \nabla f(Z_t), D_t\rangle \\\ge & {} f(Z_t) +\delta _t {\bar{Q}}_{t+1} \dfrac{\lambda _t}{\rho } \langle \nabla f(Z_t), D_t\rangle , \end{aligned}$$

or

$$\begin{aligned} f\left( Z_t + \dfrac{\lambda _t}{\rho } D_t\right)> & {} C_{\ell (t)}+\delta _t \dfrac{\lambda _t}{\rho } \langle \nabla f(Z_t), D_t\rangle \\\ge & {} f(Z_t)+\delta _t \dfrac{\lambda _t}{\rho } \langle \nabla f(Z_t), D_t\rangle \\\ge & {} f(Z_t) +\delta _t {\bar{Q}}_{t+1} \dfrac{\lambda _t}{\rho } \langle \nabla f(Z_t), D_t\rangle . \end{aligned}$$

Therefore,

$$\begin{aligned} f\left( Z_t + \dfrac{\lambda _t}{\rho } D_t\right) -f\left( Z_t\right) \ge \delta _t {\bar{Q}}_{t+1} \dfrac{\lambda _t}{\rho } \langle \nabla f\left( Z_t\right) , D_t\rangle . \end{aligned}$$
(D.2)

Directly from (D.1) and (D.2), it follows that

$$\begin{aligned} \dfrac{\lambda _t}{\rho } \langle \nabla f(Z_t), D_t\rangle +\dfrac{L}{2}\left( \dfrac{\lambda _t}{\rho }\right) ^2 \Vert D_t\Vert _F^2 \ge \delta _t {\bar{Q}}_{t+1} \dfrac{\lambda _t}{\rho } \langle \nabla f(Z_t), D_t\rangle . \end{aligned}$$

By positivity of \(\lambda _t\) and \(\rho \), we know

$$\begin{aligned} \lambda _t \ge \dfrac{2(\delta _t {\bar{Q}}_{t+1}-1)\rho \langle \nabla f(Z_t), D_t\rangle }{L\Vert D_t\Vert _F^2}. \end{aligned}$$

Combining the conditions \(\delta _t {\bar{Q}}_{t+1} \le \delta _{\max }\), \(0< \delta _{\max }<1\), and the first result in Lemma 3, we have proved the inequality (3.6). \(\square \)
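The quadratic upper bound used above, and the step-length threshold it yields, can be verified numerically. The sketch below assumes that the subproblem objective is \(f(H)=\frac{1}{2}\Vert V-WH\Vert _F^2\) with \(W\) fixed, so that \(L=\Vert W\Vert _2^2\) is a Lipschitz constant of \(\nabla f\); this form of \(f\), the random data and the value of `delta` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 3))
V = rng.random((6, 4))
H = rng.random((3, 4))

def f(H):
    # ASSUMED subproblem objective with W held fixed.
    return 0.5 * np.linalg.norm(V - W @ H, "fro") ** 2

def grad_f(H):
    return W.T @ (W @ H - V)

L = np.linalg.norm(W, 2) ** 2            # largest eigenvalue of W^T W
D = -grad_f(H)                           # a descent direction
slope = float(np.sum(grad_f(H) * D))     # <grad f(H), D> < 0
D_sq = np.linalg.norm(D, "fro") ** 2

def bound_gap(lam):
    """Quadratic upper bound from the proof minus the true value; should be >= 0."""
    upper = f(H) + lam * slope + 0.5 * L * lam ** 2 * D_sq
    return upper - f(H + lam * D)

gaps = [bound_gap(lam) for lam in (1.0, 0.5, 0.25, 0.125)]

# Any lam up to 2*(1 - delta)*|slope| / (L*||D||_F^2) passes an Armijo test,
# so backtracking by rho cannot fall below rho times this value.
delta, rho = 0.1, 0.5
lam_safe = 2.0 * (1.0 - delta) * abs(slope) / (L * D_sq)
armijo_holds = f(H + lam_safe * D) <= f(H) + delta * lam_safe * slope
lam_lower_bound = rho * lam_safe
```

This mirrors the structure of the bound derived in the proof, with \(\delta _t {\bar{Q}}_{t+1}\) replaced by a single constant `delta`.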

Appendix E: Proof of Theorem 2

Proof

By the definition of \(D^{\alpha }(H)\), we have

$$\begin{aligned} D^1 (Z_t)=P(Z_t-\nabla f(Z_t))-Z_t. \end{aligned}$$

Then, \(\Vert D^1 (Z_t)\Vert _F=0\) is equivalent to the termination condition of Algorithm 2.

From the second result in Lemma 3, Algorithm 2 terminates after finitely many iterations with \(\epsilon =0\) if \(Z_t\) is a stationary point of (2.3).

In the case that \(\{Z_t\}\) is an infinite sequence, we can prove that there exists an infinite subsequence \(l_1 \le l_2 \le \dots \) such that \(\Vert D^1 (Z_{l_t})\Vert _F\) approaches zero as t tends to \(\infty \). Taking \(l_t=\ell (tM)-1\), it follows from (3.5) in Lemma 1 that

$$\begin{aligned} \sum _{t=2}^{\infty } \lambda _{l_t} \left| \langle \nabla f(Z_{l_t}), D_{l_t}\rangle \right| <\infty . \end{aligned}$$

From Lemma 2, for all t, there exists a positive constant \({{\hat{\lambda }}}\) such that \({{\hat{\lambda }}} \le \lambda _{l_t} \le 1\). Consequently,

$$\begin{aligned} \lim _{t \rightarrow \infty } \left| \langle \nabla f(Z_{l_t}), D_{l_t}\rangle \right| =0. \end{aligned}$$

From the first result in Lemma 3 and \(0<\alpha _{\min }\le \alpha _{l_t}\le \alpha _{\max }\), we have

$$\begin{aligned} \lim _{t \rightarrow \infty } \left\| D_{l_t}\right\| _F \le \lim _{t \rightarrow \infty } \alpha _{l_t} \left| \langle \nabla f(Z_{l_t}), D_{l_t}\rangle \right| =0, \end{aligned}$$

which implies

$$\begin{aligned} \lim _{t \rightarrow \infty } \left\| D_{l_t}\right\| _F =0. \end{aligned}$$

From P4 and P5 of Proposition 2.1 in (Hager and Zhang 2006), it follows that

$$\begin{aligned} \Vert D_{l_t}\Vert _F\ge \min \left\{ \alpha _{\min }, 1\right\} \Vert D^1 (Z_{l_t})\Vert _F. \end{aligned}$$

Then, \(\Vert D_{l_t}\Vert _F\rightarrow 0\) implies \(\Vert D^1 (Z_{l_t})\Vert _F\rightarrow 0\). Therefore, (3.8) holds. The proof has been completed. \(\square \)
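The scaling inequality between \(D_{l_t}\) and \(D^1(Z_{l_t})\) can be illustrated with a few lines of NumPy. The sketch assumes that \(D^{\alpha }(H)=P(H-\alpha \nabla f(H))-H\), consistent with the expression for \(D^1\) above, and that \(P\) is the entrywise projection onto the nonnegative orthant; the random matrix `G` merely stands in for a gradient.

```python
import numpy as np

def projected_direction(H, G, alpha):
    """D^alpha(H) = P(H - alpha * G) - H with P the projection onto {H >= 0}."""
    return np.maximum(H - alpha * G, 0.0) - H

rng = np.random.default_rng(1)
H = np.maximum(rng.standard_normal((4, 5)), 0.0)   # feasible point with zero entries
G = rng.standard_normal((4, 5))                    # stand-in for grad f(H)

d1_norm = np.linalg.norm(projected_direction(H, G, 1.0), "fro")

# Compare ||D^alpha||_F with min{alpha, 1} * ||D^1||_F for several alphas.
alphas = (0.01, 0.1, 0.5, 1.0, 2.0, 10.0)
margins = {
    alpha: np.linalg.norm(projected_direction(H, G, alpha), "fro")
    - min(alpha, 1.0) * d1_norm
    for alpha in alphas
}
```

Every margin is nonnegative, so a spectral parameter bounded below by \(\alpha _{\min }>0\) cannot make \(\Vert D_{l_t}\Vert _F\) small unless \(\Vert D^1(Z_{l_t})\Vert _F\) is small as well.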

Appendix F: Proof of Theorem 3

Proof

Clearly, as a modification of the block coordinate descent method with two blocks for solving nonlinear optimization problems, the search direction and the step length computed by Algorithm 3 have the same nice properties as those by the algorithms developed in (Bertsekas 1999; Grippo and Sciandrone 2000). Therefore, the convergence result of Algorithm 3 is obtained directly from Corollary 2 in (Grippo and Sciandrone 2000). \(\square \)
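For readers unfamiliar with the two-block structure, the sketch below alternates between the \(H\)-block and the \(W\)-block of an NMF problem. It is a deliberately simplified stand-in, not Algorithm 3: each block takes a single projected gradient step with the fixed step length \(1/L\) instead of running the nonmonotone projected BB inner solver, and the objective \(\frac{1}{2}\Vert V-WH\Vert _F^2\), the function names and the test data are assumptions.

```python
import numpy as np

def objective(V, W, H):
    return 0.5 * np.linalg.norm(V - W @ H, "fro") ** 2

def alternating_projected_gradient(V, rank, iters=200, seed=0):
    """Two-block alternation with one fixed-step projected gradient step per block."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    history = [objective(V, W, H)]
    for _ in range(iters):
        # H-block: W fixed, step 1/L_H with L_H = ||W^T W||_2 (guarded against 0).
        L_H = max(np.linalg.norm(W.T @ W, 2), 1e-12)
        H = np.maximum(H - (W.T @ (W @ H - V)) / L_H, 0.0)
        # W-block: H fixed, step 1/L_W with L_W = ||H H^T||_2.
        L_W = max(np.linalg.norm(H @ H.T, 2), 1e-12)
        W = np.maximum(W - ((W @ H - V) @ H.T) / L_W, 0.0)
        history.append(objective(V, W, H))
    return W, H, history

# Synthetic nonnegative matrix of rank 2.
rng = np.random.default_rng(42)
V = rng.random((8, 2)) @ rng.random((2, 6))
W, H, history = alternating_projected_gradient(V, rank=2)
```

Because each block step uses the step length \(1/L\) for a convex quadratic, the objective does not increase from one outer iteration to the next; the algorithm in the paper replaces these conservative steps with BB step lengths safeguarded by the nonmonotone line search.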

Cite this article

Li, T., Tang, J. & Wan, Z. An alternating nonmonotone projected Barzilai–Borwein algorithm of nonnegative factorization of big matrices. Data Min Knowl Disc 35, 1972–2008 (2021). https://doi.org/10.1007/s10618-021-00773-5

  • DOI: https://doi.org/10.1007/s10618-021-00773-5
