Computation of low-rank tensor approximation under existence constraint via a forward-backward algorithm
Introduction
In numerous applications, signals or data may depend on several quantities such as spatial coordinates, velocity, time, frequency, temperature, etc., and are therefore naturally represented by higher-order arrays of numerical values, generally known as tensors. Basically, a tensor is a mathematical object that enjoys multilinearity properties under changes of the coordinate system [1]. For our purposes, it will be sufficient to view a tensor of order N as a multi-dimensional array in which every element is accessed via N indices. For instance, a first-order tensor is a vector, which is simply a column of numbers; a second-order tensor is a matrix; a third-order tensor appears as numbers arranged in a rectangular box (or a cube, if all three modes have the same dimension); and so on. Tensors of order higher than two possess properties that are not enjoyed by matrices and vectors.
We shall mainly focus on the so-called Canonical Polyadic (CP) decomposition of tensors [2]. Because of its rediscovery forty years later, it received other names such as Parafac [3], [4] or Candecomp [5]. The acronym "CP" can smartly stand for either "Canonical Polyadic" or "Candecomp/Parafac", as pointed out in Kiers [6], Comon et al. [7]. We shall adopt this terminology. The CP decomposition has already been used in various fields [8], [9], [10], [11], [12], [13]. The main interest in using the CP decomposition lies in its uniqueness, under rather mild hypotheses [14], [15]. Other tensor decompositions exist, but permit only compression and not parameter identification since they are not unique [16], [17]. Various CP decomposition algorithms can encounter problems of slowness or sometimes lack of convergence; such cases may be due to tensor degeneracies. These situations have been well categorized by Richard Harshman [18] into the following three cases: bottleneck, when two or more factors in one of the modes are almost collinear [19]; swamp, when all modes have at least two quasi-collinear factors [19], [20], [21], a generalization of the bottleneck; and CP-degeneracy, when some of the factors diverge to infinity while tending to cancel each other, producing a better data fit [22], [23].
The traditional algorithm to compute the CP decomposition is ALS (Alternating Least Squares), which was initially proposed in Carroll and Chang [5]. Basically, ALS is an iterative algorithm that progressively updates each unknown parameter individually, in an alternating fashion, starting from an initial guess. ALS continues until it can no longer improve the solution, or until it reaches the allowed maximum number of iterations. As a result, the system to be solved is turned into three simple least squares (LS) problems. The ALS algorithm converges towards a local minimum under mild conditions [24]. However, its convergence towards the global minimum can sometimes be slow, if it is ever reached. Furthermore, the algorithm may, in certain cases, fall into swamps [20], where the convergence rate is very low and the error between two successive iterations remains almost unchanged. Various solutions have been proposed to address the slow convergence of the ALS algorithm, for instance [11], [25], [26]. The idea is to produce a good initialization via a Generalized Eigenvalue Decomposition (GEVD) of two tensor slices. But such an initialization assumes that two of the factor matrices are of full rank, and that the third one does not contain zero elements. Another way to increase the ALS convergence speed is to compress the tensor via a Tucker3 decomposition [27], [28]. In [11], Tucker3 compression is applied together with an initialization based on proper analysis; this process has become a common practice to reduce the computational burden [7]. Other tricks to speed up convergence include the Enhanced Line Search (ELS) method [19], which has demonstrated its effectiveness for tensors affected by degenerate factors (factor degeneracies), or a reduction of ALS sensitivity to initialization [29]. The above algorithms enhance the convergence speed of the ALS algorithm but are still unable to handle swamp phenomena.
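The three LS subproblems mentioned above can be sketched as follows in NumPy. This is an illustration of the generic ALS scheme for a third-order tensor, not the implementation evaluated in this paper; the unfolding conventions, stopping rule and tolerance are assumptions made for the sketch.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker (Khatri-Rao) product of U (J x R) and V (K x R)."""
    return np.einsum("jr,kr->jkr", U, V).reshape(-1, U.shape[1])

def als_cp(T, R, n_iter=500, tol=1e-8, seed=0):
    """Alternating Least Squares for a rank-R CP model of a third-order tensor T."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
    T1 = T.reshape(I, -1)                     # mode-1 unfolding (I x JK)
    T2 = np.moveaxis(T, 1, 0).reshape(J, -1)  # mode-2 unfolding (J x IK)
    T3 = np.moveaxis(T, 2, 0).reshape(K, -1)  # mode-3 unfolding (K x IJ)
    err_prev = np.inf
    for _ in range(n_iter):
        # each update is a plain linear least-squares problem in one factor
        A = T1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = T2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = T3 @ np.linalg.pinv(khatri_rao(A, B).T)
        err = np.linalg.norm(T1 - A @ khatri_rao(B, C).T)
        if abs(err_prev - err) < tol:         # stop when no more progress is made
            break
        err_prev = err
    return A, B, C, err
```

In a swamp, the loop above exits on the progress test while the residual `err` is still large, which is precisely the behavior the constrained and proximal variants discussed below try to avoid.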
Besides, a partitioned ALS algorithm was proposed in Tichavskỳ et al. [30], Phan et al. [31], in which the factor matrices are partitioned into blocks, and blocks of different factor matrices are updated jointly. In contrast to alternating algorithms, all-at-once optimization algorithms [7], [32], [33], [34] allow simultaneous estimation of the matrix factors. These algorithms have shown good performance in dealing with swamp or bottleneck phenomena when certain required conditions are satisfied.
Recent studies [12], [34], [35] have shown that the introduction of coherence constraints overcomes these phenomena. The solution proposed in Farias et al. [35] is a direct adaptation of ALS using the Dykstra projection algorithm over all correlation matrices. The proposals [12], [34], which estimate the factor matrices simultaneously, have proved efficient and stable for computing the CP decomposition not only in the normal case, but also in difficult cases, where the estimated factors are close to collinear.
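The coherence of a factor matrix, the quantity these constraints bound, is the largest absolute normalized inner product between distinct columns. A minimal NumPy sketch of this standard definition (an illustration, not code from [12], [34], [35]):

```python
import numpy as np

def coherence(M):
    """Largest |<m_i, m_j>| / (||m_i|| ||m_j||) over distinct columns i != j of M."""
    G = M.conj().T @ M                  # Gram matrix of the columns
    n = np.linalg.norm(M, axis=0)
    C = np.abs(G) / np.outer(n, n)      # normalized column correlations
    np.fill_diagonal(C, 0.0)            # discard the trivial i == j terms
    return C.max()
```

Orthonormal columns give coherence 0, duplicated columns give coherence 1, and a swamp corresponds to factor matrices whose coherence approaches 1 in every mode.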
This document presents a performance evaluation of iterative descent algorithms for computing the CP decomposition: the unconstrained gradient descent algorithm [10], the constrained gradient descent algorithm [12], the proximal gradient descent algorithm [34] and the accelerated proximal gradient descent algorithm [34]. We also propose a new algorithm that links the logarithmic barrier function with proximal methods [36], which are now reliable and powerful optimization tools, leading to a variety of proximal algorithms, such as the Proximal Point algorithm, the Proximal Gradient algorithm, the Forward-Backward algorithm, and several others involving linearization and/or splitting [37]. Since the CP decomposition problem in its simultaneous-estimation version is a non-convex problem [38], it is necessary to develop efficient and fast algorithms for it. Among non-convex splitting methods, one can find the works [39], [40], where a monotone descent of the objective value is imposed to ensure convergence, while some recent works [41], [42] introduce a generic method for non-convex non-smooth problems based on Kurdyka-Łojasiewicz theory. We shall be particularly interested in the Forward-Backward proximal algorithm, following the convergence proof of [40], as it satisfies the assumptions of our novel CP decomposition formulation.
The paper is organized as follows. Section 2 states notation and basic properties of tensors. In Section 3, the formulation and conditions of CP approximation under a coherence constraint are explained. In Section 4 we present the novel formulation of the CP decomposition as well as our new optimization algorithm. In Section 5 we report computer experiments, and we conclude in Section 6.
Section snippets
Preliminaries
First, we start by introducing some basic notation that will be used in this article. Scalars are denoted by lower-case letters, e.g., a; vectors are denoted by lower-case bold letters, e.g., a; matrices are denoted by upper-case bold letters, e.g., M; and tensors are denoted by calligraphic letters, e.g., T. Also, the conjugate of a scalar a will be denoted a*, and its modulus |a|. The conjugate of the matrix M will be denoted M*, its transpose M^T and its Hermitian transpose M^H.
Low rank
The problem we want to solve is the following: given a third-order tensor, compute its best approximation of rank R. In other words, for a given rank R, the goal is to minimize an objective function of the form given in (8). As an alternative, we can also write the minimization of (8) using the vectorization defined in (3), where the symbol ⊗ represents the Kronecker product [49]. …
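As a small numerical sanity check of the vectorized viewpoint, the following NumPy snippet (an illustration under standard conventions, not the paper's code) builds a rank-R tensor from its factor matrices and verifies that its vectorization equals the column-wise Kronecker (Khatri-Rao) product of the factors applied to a vector of ones, i.e. with all scaling factors absorbed into the columns:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 3, 4, 5, 2
A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))

# rank-R tensor built from the factor matrices
T = np.einsum("ir,jr,kr->ijk", A, B, C)

# column-wise Kronecker (Khatri-Rao) product of the three factors
KR = np.einsum("ir,jr,kr->ijkr", A, B, C).reshape(-1, R)

# vectorized model: vec(T) = KR @ 1, so the LS objective can be written
# on the vectorized tensor exactly as on the tensor itself
assert np.allclose(T.reshape(-1), KR @ np.ones(R))
```

This identity is what lets the rank-R fitting problem be rewritten as an ordinary least-squares problem in the stacked unknowns.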
Problem formulation
For the sake of simplicity of writing, we will stack the factor matrices in a single vector of unknowns. Note that, in this article, we have chosen to define the penalty function as the logarithmic term of the coherence constraint. Formally, we face a constrained optimization problem of the form:
This optimization problem with inequality constraint can be solved by converting it to an unconstrained optimization problem with an appropriate penalty,
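The resulting penalized, unconstrained problem is then amenable to forward-backward (proximal gradient) splitting: a gradient step on the smooth term alternates with a proximal step on the remaining one. The following sketch illustrates the iteration on a generic toy splitting, an l1-penalized least-squares problem, not the paper's CP functional with its logarithmic barrier; step size and iteration count are assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (the 'backward' step for this toy g)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def forward_backward(A, b, lam=0.1, step=None, n_iter=2000):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by forward-backward splitting."""
    if step is None:
        # 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                         # forward (explicit gradient) step
        x = soft_threshold(x - step * grad, step * lam)  # backward (proximal) step
    return x
```

Only the proximal operator changes from one penalty to another; with the logarithmic barrier of the coherence constraint, the same forward-backward template applies once the corresponding proximal (or gradient) step is available.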
Results and discussions
In this section, we compare various algorithms based on computer experiments performed under two scenarios: (i) the first one is a concrete example in which it is necessary to deal with the swamp phenomenon, i.e. where the factor matrices are highly correlated in all three modes and the constraint is not satisfied at convergence, and (ii) the second scenario addresses the normal case of the CP decomposition, where the limit point lies in the admissible region, i.e. when the
Conclusions
We have provided a novel formulation for the CP decomposition related to the logarithmic barrier function. We also described a new method to compute this decomposition for both normal and difficult cases, i.e. when the problem of swamp arises. This method is based on the Forward Backward splitting algorithm, with a simple and effective monitoring strategy capable of calculating the minimal CP decomposition of three-way arrays. We performed a complete comparison based on computer experiments,
CRediT authorship contribution statement
Marouane Nazih: Conceptualization, Methodology, Funding acquisition, Data curation, Formal analysis, Writing - original draft, Validation. Khalid Minaoui: Conceptualization, Methodology, Funding acquisition, Data curation, Formal analysis, Writing - review & editing, Validation. Elaheh Sobhani: Conceptualization, Methodology, Funding acquisition, Data curation, Formal analysis, Writing - original draft, Validation. Pierre Comon: Conceptualization, Methodology, Funding acquisition, Data
Declaration of Competing Interest
The authors declare that they have no conflict of interest.
References (60)
- et al., Blind separation of sources, part I: an adaptive algorithm based on neuromimetic architecture, Signal Process. (1991)
- Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, Linear Algebra Appl. (1977)
- et al., On Kruskal's uniqueness condition for the Candecomp/Parafac decomposition, Linear Algebra Appl. (2007)
- et al., Two-factor degeneracies and a stabilization of PARAFAC, Chemom. Intell. Lab. Syst. (1997)
- et al., Some convergence results on the regularized alternating least-squares method for tensor decomposition, Linear Algebra Appl. (2013)
- et al., Improving the speed of multi-way algorithms: Part I. Tucker3, Chemom. Intell. Lab. Syst. (1998)
- et al., Improving the speed of multiway algorithms: Part II. Compression, Chemom. Intell. Lab. Syst. (1998)
- et al., Low complexity damped Gauss–Newton algorithms for Candecomp/Parafac, SIAM J. Matrix Anal. Appl. (2013)
- et al., Coherence constrained alternating least squares, 2018 26th European Signal Processing Conference (EUSIPCO) (2018)
- et al., Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM J. Matrix Anal. Appl. (2008)
- Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity
- A progression strategy of proximal algorithm for the unconstrained optimization, 2018 4th International Conference on Optimization and Applications (ICOA)
- Tensor decompositions: state of the art and applications, Keynote Address in IMA Conf. Mathematics in Signal Processing, Warwick, UK
- The expression of a tensor or a polyadic as a sum of products, J. Math. Phys.
- Determination and proof of minimum uniqueness conditions for PARAFAC1, UCLA Work. Pap. Phonetics
- Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart–Young" decomposition, Psychometrika
- Towards a standardized notation and terminology in multiway analysis, J. Chemom.
- Tensor decompositions, alternating least squares and other tales, J. Chemom.
- Fluorescence spectroscopy and multi-way techniques. PARAFAC, Anal. Methods
- CP decomposition approach to blind separation for DS-CDMA system using a new performance index, EURASIP J. Adv. Signal Process.
- Blind PARAFAC receivers for DS-CDMA systems, IEEE Trans. Signal Process.
- Joint source estimation and localization, IEEE Trans. Signal Process.
- Denoising of hyperspectral images using the PARAFAC model and statistical performance analysis, IEEE Trans. Geosci. Remote Sens.
- Some mathematical notes on three-mode factor analysis, Psychometrika
- Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl.
- How 3-MFA data can cause degenerate PARAFAC solutions, among other relationships, Multiway Anal.
- Enhanced line search: a novel method to accelerate PARAFAC, SIAM J. Matrix Anal. Appl.
- Slowly converging PARAFAC sequences: swamps and two-factor degeneracies, J. Chemom.
- The problem and nature of degenerate solutions or decompositions of 3-way arrays, Talk at the Tensor Decompositions Workshop, Palo Alto, CA
Cited by (1)
- Computation of the nonnegative canonical tensor decomposition with two accelerated proximal gradient algorithms, Digital Signal Processing: A Review Journal (2022). Citation excerpt: "To the best of our knowledge, proximal algorithms have not yet been provided in the literature to simultaneously estimate factor matrices of the NCP. This all-at-once computation was considered for the CP decomposition in [29,56] but under a coherence constraint. One such approach has also been used under the nonnegative constraint in [51] for which the minimization was performed using the Enhanced Line Search (ELS) and the alternating backtracking with ELS for the gradient and quasi-Newton algorithms."