Elsevier

Signal Processing

Volume 188, November 2021, 108178

Computation of low-rank tensor approximation under existence constraint via a forward-backward algorithm

https://doi.org/10.1016/j.sigpro.2021.108178

Highlights

  • Performance analysis of iterative algorithms for the calculation of the CP decomposition.

  • A novel problem formulation for the CP tensor decomposition via logarithmic regularized barriers.

  • Forward-backward algorithm to compute the CP tensor decomposition under coherence constraints.

  • Accelerated proximal gradient remains slightly faster than forward-backward for difficult cases.

  • Better performance of forward-backward compared to other algorithms for less difficult cases.

Abstract

The Canonical Polyadic (CP) tensor decomposition has become an attractive mathematical tool in several fields during the last ten years. This decomposition is very powerful for representing and analyzing multidimensional data. The most attractive feature of the CP decomposition is its uniqueness, contrary to rank-revealing matrix decompositions, where the problem of rotational invariance remains. This paper presents a performance analysis of iterative descent algorithms for calculating the CP decomposition of tensors when columns of factor matrices are almost collinear, i.e. when swamp problems arise. We propose a new and efficient proximal algorithm based on the Forward Backward splitting method. More precisely, the existence of the best low-rank tensor approximation is ensured thanks to a coherence constraint implemented via a logarithmic regularized barrier. Computer experiments demonstrate the efficiency and stability of the proposed algorithm compared to other iterative algorithms in the literature in the normal case, and show that it produces significant results even in difficult situations.

Introduction

In numerous applications, signals or data may depend on several quantities such as spatial coordinates, velocity, time, frequency, temperature, etc., and are therefore naturally represented by higher-order arrays of numerical values, generally known as tensors. Basically, a tensor is a mathematical object that possesses the property of multi-linearity under changes of the coordinate system [1]. For our purposes, it will be sufficient to see a tensor of order N as a multi-dimensional array in which every element is accessed via N indices. For instance, a first-order tensor is a vector, which is simply a column of numbers; a second-order tensor is a matrix; a third-order tensor appears as numbers arranged in a rectangular box (or a cube, if all modes have the same dimension); etc. Tensors of order higher than two possess properties that are not enjoyed by matrices and vectors.

We shall mainly focus on the so-called Canonical Polyadic (CP) decomposition of tensors [2]. Because it was rediscovered forty years later, it received other names such as Parafac [3], [4] or Candecomp [5]. The acronym “CP” can smartly stand for either “Canonical Polyadic” or “Candecomp/Parafac”, as pointed out in Kiers [6], Comon et al. [7]. We shall adopt this terminology. The CP decomposition has already been used in various fields [8], [9], [10], [11], [12], [13]. The main interest in using the CP decomposition lies in its uniqueness, under rather mild hypotheses [14], [15]. Other tensor decompositions exist, but permit only compression and not parameter identification, since they are not unique [16], [17]. Various CP decomposition algorithms can encounter problems of slowness or sometimes lack of convergence; such cases may be due to tensor degeneracies. These situations have been well categorized by Richard Harshman [18] into the following three cases: bottleneck, when two or more factors in one of the modes are almost collinear [19]; swamp, when all modes have at least two quasi-collinear factors [19], [20], [21] (a general case of bottleneck); and CP-degeneracies, when some of the factors diverge to infinity while tending to cancel each other, producing a better data fit [22], [23].

The traditional algorithm to compute the CP decomposition is ALS (Alternating Least Squares), initially proposed in Carroll and Chang [5]. Basically, ALS is an iterative algorithm that progressively updates each unknown parameter individually, in an alternating fashion, starting from an initial guess. ALS continues until it can no longer improve the solution, or until it reaches the allowed maximum number of iterations. As a result, the system to be solved is turned into three simple least squares (LS) problems. The ALS algorithm converges towards a local minimum under mild conditions [24]. However, its convergence towards the global minimum can sometimes be slow, if the latter is ever reached. Furthermore, the algorithm may, in certain cases, fall into swamps [20], where the convergence rate is very low and the error between two successive iterations remains nearly unchanged. Various solutions have been proposed to address the slow convergence of the ALS algorithm, for instance [11], [25], [26]. The idea is to produce a good initialization via a Generalized Eigenvalue Decomposition (GEVD) of two tensor slices. But such an initialization assumes that two of the factor matrices are of full rank, and that the third one does not contain zero elements. Another way to increase the ALS convergence speed is to compress the tensor via a Tucker3 decomposition [27], [28]. In [11], the Tucker3 compression is applied together with an initialization based on the proper analysis; this process has become a common practice to reduce the computational burden [7]. Other tricks to speed up convergence include the Enhanced Line Search (ELS) method [19], which has demonstrated its effectiveness for tensors affected by degenerate factors (factor degeneracies), or to reduce ALS sensitivity to initialization [29]. The above algorithms enhance the convergence speed of the ALS algorithm but are still unable to handle swamp phenomena.
Besides, a partitioned ALS algorithm was proposed in Tichavský et al. [30], Phan et al. [31], in which factor matrices are partitioned into blocks, and blocks of different factor matrices are updated jointly. In contrast to alternating algorithms, all-at-once optimization algorithms [7], [32], [33], [34] estimate all matrix factors simultaneously. These algorithms have shown good performance in dealing with swamp or bottleneck phenomena when certain required conditions are satisfied.
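To make the alternating scheme concrete, the following is a minimal textbook sketch of plain ALS for a third-order tensor, written with NumPy. It is an illustration of the general principle described above, not the implementation used in any of the cited works; the helper `khatri_rao` and the unfolding conventions are choices made here for the sketch.

```python
import numpy as np

def khatri_rao(U, V):
    # column-wise Kronecker product of U (m x R) and V (n x R) -> (m*n x R)
    R = U.shape[1]
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, R)

def als_cp(X, R, n_iter=200, seed=0):
    # Plain ALS for a rank-R CP model of a third-order tensor X (I x J x K).
    # Each pass solves three ordinary linear least-squares problems.
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    X1 = X.reshape(I, J * K)                      # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # mode-3 unfolding
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
    # relative reconstruction error of the current model
    err = np.linalg.norm(X1 - A @ khatri_rao(B, C).T) / np.linalg.norm(X1)
    return A, B, C, err
```

On a generic exactly low-rank tensor this loop typically converges quickly; the swamp behavior discussed above appears precisely when the columns of the true factors are close to collinear.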

Recent studies [12], [34], [35] have shown that introducing coherence constraints overcomes these phenomena. The solution proposed in Farias et al. [35] is a direct adaptation of ALS using the Dykstra projection algorithm over correlation matrices. The methods proposed in [12], [34], which estimate the factor matrices simultaneously, have proved efficient and stable for calculating the CP decomposition not only in the normal case, but also in difficult cases, where the estimated factors are close to collinear.

This document presents a performance evaluation of iterative descent algorithms for calculating the CP decomposition, namely the unconstrained gradient descent algorithm [10], the constrained gradient descent algorithm [12], the proximal gradient descent algorithm [34] and the accelerated proximal gradient descent algorithm [34]. We also propose a new algorithm that links the logarithmic barrier function with proximal methods [36], which are now reliable and powerful optimization tools, leading to a variety of proximal algorithms such as the Proximal Point algorithm, the Proximal Gradient algorithm, the Forward Backward algorithm, and several others including linearization and/or splitting [37]. Since the CP decomposition problem in its simultaneous-estimation version is non-convex [38], it is necessary to develop efficient and fast algorithms for such a problem. Among non-convex splitting methods, one finds the works [39], [40], where a monotonous descent of the objective value is imposed to ensure convergence, as well as more recent works [41], [42] that introduce a generic method for non-convex non-smooth problems based on Kurdyka–Lojasiewicz theory. We shall be particularly interested in the Forward Backward proximal algorithm, following the convergence proof of [40], as it satisfies the assumptions of our novel CP decomposition formulation.
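The Forward Backward splitting scheme itself is simple to state: at each iteration, a gradient (forward) step on the smooth term f is followed by a proximal (backward) step on the non-smooth term g. The sketch below illustrates the generic scheme on a standard toy problem with the well-known soft-thresholding proximal operator of the l1 norm; this is purely to show the iteration structure, not the barrier-based g used in this paper, and all names in it are local to the example.

```python
import numpy as np

def forward_backward(grad_f, prox_g, x0, step, n_iter=500):
    # Generic forward-backward splitting:
    #   x_{k+1} = prox_{step*g}( x_k - step * grad_f(x_k) )
    x = x0
    for _ in range(n_iter):
        x = prox_g(x - step * grad_f(x), step)
    return x

# toy instance: min 0.5*||M x - y||^2 + lam*||x||_1 (soft-threshold prox)
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 5))
x_true = np.array([1.0, 0.0, 0.0, -2.0, 0.0])
y = M @ x_true
lam = 0.1
grad_f = lambda x: M.T @ (M @ x - y)
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)
step = 1.0 / np.linalg.norm(M, 2) ** 2   # 1/L, L = Lipschitz constant of grad_f
x_hat = forward_backward(grad_f, prox_g, np.zeros(5), step)
# x_hat recovers x_true up to a small l1-induced bias
```

The step size is the usual 1/L choice, with L the Lipschitz constant of the gradient of the smooth term; in the non-convex CP setting, the convergence guarantees of [40] additionally require the assumptions discussed in Section 4.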

The paper is organized as follows. Section 2 states notation and basic properties of tensors. In Section 3, the formulation and conditions of CP approximation under a coherence constraint are explained. In Section 4 we present the novel formulation for the CP decomposition as well as our new optimization algorithm. In Section 5 we report computer experiments, and we conclude in Section 6.

Section snippets

Preliminaries

First, we introduce some basic notations that will be used in this article. Scalars are denoted by lower-case letters, e.g., a; vectors by lower-case bold letters, e.g., a; matrices by upper-case bold letters, e.g., M; and tensors by calligraphic letters, e.g., T. Also, the conjugate of an element a will be denoted by a^*, and its modulus by |a|. The conjugate of the matrix M will be denoted by M^*, its transpose by M^T and its Hermitian transpose by M^H.

Low rank

The problem we want to solve is the following: given a third-order tensor $\mathcal{X} \in \mathbb{C}^{I\times J\times K}$, compute its best approximation of rank $R$. In other words, for a given $R$, the goal is to minimize an objective function $f$ of the form:
$$f(\mathbf{A},\mathbf{B},\mathbf{C};\boldsymbol{\lambda}) = \big\|\mathcal{X} - \widehat{\mathcal{X}}\big\|_F^2 = \Big\|\mathcal{X} - \sum_{r=1}^{R} \lambda_r\,(\mathbf{a}_r \otimes \mathbf{b}_r \otimes \mathbf{c}_r)\Big\|_F^2.$$
As an alternative, we can also write the minimization of (8) using the vectorization defined in (3) with $\mathbf{x} = \operatorname{vec}\{\mathcal{X}\}$ as:
$$\min_{\widehat{\mathbf{x}}} f(\widehat{\mathbf{x}}) = \min_{\widehat{\mathbf{x}}} \big\|\mathbf{x} - \widehat{\mathbf{x}}\big\|_F^2 = \min_{\mathbf{A},\mathbf{B},\mathbf{C},\boldsymbol{\lambda}} \Big\|\mathbf{x} - \sum_{r=1}^{R} \lambda_r\,(\mathbf{a}_r \otimes \mathbf{b}_r \otimes \mathbf{c}_r)\Big\|_F^2,$$
where symbol $\otimes$ represents the Kronecker product [49]. The
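As a numerical sanity check (a hypothetical sketch written for this note, not taken from the paper), the tensor form of the objective and its vectorized Kronecker form can be verified to agree for random factors; the dimensions and variable names below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K, R = 4, 5, 6, 3
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
lam = rng.standard_normal(R)
X = rng.standard_normal((I, J, K))

# tensor form: X_hat = sum_r lam_r (a_r outer b_r outer c_r)
X_hat = np.einsum('r,ir,jr,kr->ijk', lam, A, B, C)
f_tensor = np.linalg.norm(X - X_hat) ** 2

# vectorized form: vec{X_hat} = sum_r lam_r (a_r kron b_r kron c_r)
x = X.reshape(-1)                       # row-major vectorization of X
x_hat = sum(lam[r] * np.kron(np.kron(A[:, r], B[:, r]), C[:, r])
            for r in range(R))
f_vec = np.linalg.norm(x - x_hat) ** 2

print(np.allclose(f_tensor, f_vec))     # the two objectives coincide
```

The agreement holds because row-major vectorization orders the entries of the rank-one tensor exactly as the nested Kronecker product does.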

Problem formulation

For the sake of simplicity of writing, we will stack the factor matrices in a single vector defined as $\mathbf{x} = \operatorname{vec}\{[\mathbf{A}^T, \mathbf{B}^T, \mathbf{C}^T]\}$, and note that in this article, we have chosen to define the function $g$ as the logarithmic term $\ln(C_p(\mathbf{x}))$ of the coherence constraint $C_p(\mathbf{x})$. Formally, we face a constrained optimization problem of the form:
$$\min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad g(\mathbf{x}) \leq 0.$$

This optimization problem with inequality constraint can be solved by converting it to an unconstrained optimization problem with an appropriate penalty,
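The conversion can be illustrated with the classic logarithmic interior barrier, which replaces the constraint by a penalty that blows up at the boundary of the feasible region. The sketch below shows this textbook mechanism on a one-dimensional toy problem; it is a generic illustration, and the paper's regularized barrier may differ in detail from this plain form.

```python
import numpy as np

def barrier_objective(f, g, x, mu):
    # Classic logarithmic interior barrier for: min f(x) s.t. g(x) <= 0.
    # Outside the strictly feasible region the penalty is infinite.
    gx = g(x)
    if gx >= 0:
        return np.inf
    return f(x) - mu * np.log(-gx)

# toy instance: min (x-2)^2 s.t. x - 1 <= 0; constrained optimum is x* = 1
f = lambda x: (x - 2.0) ** 2
g = lambda x: x - 1.0
grid = np.linspace(-1.0, 0.999, 10001)
best = min(grid, key=lambda x: barrier_objective(f, g, x, mu=1e-3))
# as mu -> 0, the barrier minimizer approaches the constrained optimum x* = 1
```

The barrier parameter mu controls the trade-off: large mu keeps iterates well inside the feasible region, while decreasing mu lets the unconstrained minimizer approach the constrained optimum.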

Results and discussions

In this section, we compare various algorithms based on computer experiments performed under two scenarios: (i) the first one is a concrete example in which it is necessary to deal with the swamp phenomenon, i.e. where the factor matrices are highly correlated in all three modes and the constraint is not satisfied ($C_p(\mathbf{x}) > 0$) at convergence, and (ii) the second scenario addresses the normal case of the CP decomposition, where the limit point lies in the admissible region, i.e. when the

Conclusions

We have provided a novel formulation for the CP decomposition related to the logarithmic barrier function. We also described a new method to compute this decomposition for both normal and difficult cases, i.e. when the problem of swamp arises. This method is based on the Forward Backward splitting algorithm, with a simple and effective monitoring strategy capable of calculating the minimal CP decomposition of three-way arrays. We performed a complete comparison based on computer experiments,

CRediT authorship contribution statement

Marouane Nazih: Conceptualization, Methodology, Funding acquisition, Data curation, Formal analysis, Writing - original draft, Validation. Khalid Minaoui: Conceptualization, Methodology, Funding acquisition, Data curation, Formal analysis, Writing - review & editing, Validation. Elaheh Sobhani: Conceptualization, Methodology, Funding acquisition, Data curation, Formal analysis, Writing - original draft, Validation. Pierre Comon: Conceptualization, Methodology, Funding acquisition, Data

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

References (60)

  • J.-L. Starck et al.

    Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity

    (2010)
  • M. Nazih et al.

    A progression strategy of proximal algorithm for the unconstrained optimization

    2018 4th International Conference on Optimization and Applications (ICOA)

    (2018)
  • P. Comon

    Tensor decompositions-state of the art and applications

    Keynote Address in IMA Conf, Mathematics in Signal Processing, Warwick, UK

    (2000)
  • F.L. Hitchcock

    The expression of a tensor or a polyadic as a sum of products

    J. Math. Phys.

    (1927)
  • R.A. Harshman, et al., Foundations of the PARAFAC procedure: models and conditions for an “explanatory” multimodal...
  • R.A. Harshman

    Determination and proof of minimum uniqueness conditions for PARAFAC1

    UCLA Work. Pap. Phonetics

    (1972)
  • J.D. Carroll et al.

    Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart–Young” decomposition

    Psychometrika

    (1970)
  • H.A.L. Kiers

    Towards a standardized notation and terminology in multiway analysis

    J. Chemom.

    (2000)
  • P. Comon et al.

    Tensor decompositions, alternating least squares and other tales

    J. Chemom.

    (2009)
  • K.R. Murphy et al.

    Fluorescence spectroscopy and multi-way techniques. PARAFAC

    Anal. Methods

    (2013)
  • A. Rouijel et al.

    CP decomposition approach to blind separation for DS-CDMA system using a new performance index

    EURASIP J. Adv. Signal Process.

    (2014)
  • N.D. Sidiropoulos et al.

    Blind PARAFAC receivers for DS-CDMA systems

    IEEE Trans. Signal Process.

    (2000)
  • S. Sahnoun et al.

    Joint source estimation and localization

    IEEE Trans. Signal Process.

    (2015)
  • X. Liu et al.

    Denoising of hyperspectral images using the PARAFAC model and statistical performance analysis

    IEEE Trans. Geosci. Remote Sens.

    (2012)
  • L.R. Tucker

    Some mathematical notes on three-mode factor analysis

    Psychometrika

    (1966)
  • T.G. Kolda

    Orthogonal tensor decompositions

    SIAM J. Matrix Anal. Appl.

    (2001)
  • J. Kruskal et al.

    How 3-MFA data can cause degenerate PARAFAC solutions, among other relationships

    Multiway Anal.

    (1989)
  • M. Rajih et al.

Enhanced line search: a novel method to accelerate PARAFAC

    SIAM J. Matrix Anal. Appl.

    (2008)
  • B.C. Mitchell et al.

    Slowly converging PARAFAC sequences: swamps and two-factor degeneracies

    J. Chemom.

    (1994)
  • R.A. Harshman

    The problem and nature of degenerate solutions or decompositions of 3-way arrays

    Talk at the Tensor Decompositions Workshop, Palo Alto, CA

    (2004)