Abstract
Computing sparse solutions to overdetermined linear systems is a ubiquitous problem in fields such as regression analysis, signal and image processing, information theory and machine learning. Additional non-negativity constraints on the solution are useful for interpretability. Most previous research efforts aimed at approximating the sparsity-constrained linear least squares problem, and/or at finding local solutions by means of descent algorithms. The objective of the present paper is to report on an efficient and modular implicit enumeration algorithm that finds provably optimal solutions to the NP-hard problem of sparsity-constrained non-negative least squares. We focus on the case where the system is overdetermined and the matrix has full column rank. Numerical results with real test data, comparisons with competing methods, and an application to hyperspectral imaging are reported. Finally, we present a Python library implementation of our algorithm.
Notes
As alluded to in Sect. 2, in machine learning and statistics an alternative notation is commonly used:
$$\begin{aligned} \min _{\beta }\; \left\Vert y - X\beta \right\Vert _2^2 \quad \text {s.t.}\;\; \beta \ge 0 \end{aligned}$$(3.2)where \(\beta \in {\mathbb {R}}^{k}\), \(y \in {\mathbb {R}}^{n}\), \(X \in {\mathbb {R}}^{n\times k}\), and the \(\beta\)s are referred to as coefficients. Unless otherwise indicated, we prefer to stick to the usual A, b-notation of linear systems of equations.
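As an aside, problem (3.2) without a sparsity constraint can be solved with off-the-shelf tools; the following minimal sketch uses scipy.optimize.nnls on synthetic data (this is an illustration of the plain NNLS subproblem, not the implicit enumeration method of this paper):

```python
import numpy as np
from scipy.optimize import nnls

# Synthetic overdetermined system y = X @ beta_true with non-negative coefficients.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5))
beta_true = np.array([1.0, 0.0, 2.0, 0.0, 0.5])
y = X @ beta_true

# nnls solves min ||y - X beta||_2 subject to beta >= 0;
# rnorm is the 2-norm (not squared) of the residual y - X @ beta.
beta, rnorm = nnls(X, y)
```

Since the generating coefficients are themselves non-negative and the data are noise-free, the NNLS solution recovers them and the residual is essentially zero.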
References
Aktaş FS, Ekmekçioğlu O, Pınar M Ç (2020) User's guide to NNLSSPAR, https://github.com/Fatih-S-AKTAS/LSSPAR
Atamturk A, Gomez A (2020) Safe screening rules for l0-regression from perspective relaxations. In: Daumé H III, Singh A (eds) Proceedings of the 37th international conference on machine learning, vol 119 of Proceedings of machine learning research, PMLR, 13–18, pp. 421–430, http://proceedings.mlr.press/v119/atamturk20a.html
Beck A, Eldar YC (2013) Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM J Optim 23:1480–1509. https://doi.org/10.1137/120869778
Bertsimas D, King A, Mazumder R (2016) Best subset selection via a modern optimization lens. Ann Stat 44:813–852. https://doi.org/10.1214/15-aos1388
Bhatia K, Jain P, Kar P (2015) Robust regression via hard thresholding. arXiv:1506.02428
Bourguignon S, Ninin J, Carfantan H, Mongeau M (2016) Exact sparse approximation problems via mixed-integer programming: formulations and computational performance. IEEE Trans Signal Proc 64:1405–1419
Breiman L, Friedman JH (1985) Estimating optimal transformations for multiple regression and correlation. J Am Stat Assoc 80:580–598
Chandrasekaran V, Recht B, Parrilo PA, Willsky AS (2012) The convex geometry of linear inverse problems. Found Comput Math 12:805–849. https://doi.org/10.1007/s10208-012-9135-7
IBM ILOG CPLEX (2009) V12.1: User's manual for CPLEX. International Business Machines Corporation, vol 46
Diamond S, Boyd S (2016) CVXPY: a Python-embedded modeling language for convex optimization. J Mach Learn Res 17:1–5
Donne DD, Kowalski M, Liberti L (2020) Mip and set covering approaches for sparse approximation. In: iTWIST annual conference
Drumetz L, Meyer TR, Chanussot J, Bertozzi AL, Jutten C (2019) Hyperspectral image unmixing with endmember bundles and group sparsity inducing mixed norms. IEEE Trans Image Process 28:3435–3450. https://doi.org/10.1109/tip.2019.2897254
Eftekhari A, Esfahani PM (2021) The nonconvex geometry of linear inverse problems. arXiv:2101.02776
Eftekhari A, Tanner J, Thompson A, Toader B, Tyagi H (2021) Sparse non-negative super-resolution–simplified and stabilised. Appl Comput Harmon Anal 50:216–280. https://doi.org/10.1016/j.acha.2019.08.004, https://www.sciencedirect.com/science/article/pii/S1063520319300193
Engl H, Hanke M, Neubauer A (2000) Regularization of inverse problems. Mathematics and its applications, Springer, Netherlands https://books.google.com.tr/books?id=VuEV-Gj1GZcC
Ghaoui LE, Lebret H (1997) Robust solutions to least-squares problems with uncertain data. SIAM J Matrix Anal Appl 18:1035–1064. https://doi.org/10.1137/S0895479896298130
Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
Guo Y, Berman M (2012) A comparison between subset selection and l1 regularisation with an application in spectroscopy. Chemometric Intell Lab Syst 118:127–138. https://doi.org/10.1016/j.chemolab.2012.08.010, http://www.sciencedirect.com/science/article/pii/S0169743912001657
Gurobi Optimization, LLC (2020) Gurobi optimizer reference manual. http://www.gurobi.com
Hamidieh K (2018) A data-driven statistical model for predicting the critical temperature of a superconductor. Comput Mater Sci 154:346–354. https://doi.org/10.1016/j.commatsci.2018.07.052
Heylen R, Burazerovic D, Scheunders P (2011) Fully constrained least squares spectral unmixing by simplex projection. IEEE Trans Geosci Remote Sens 49:4112–4122. https://doi.org/10.1109/tgrs.2011.2155070
Fernandes K, Vinagre P, Cortez P (2015) A proactive intelligent decision support system for predicting the popularity of online news. In: Proceedings of the 17th EPIA, Portuguese conference on artificial intelligence
Lawson CL, Hanson RJ (1995) Solving least squares problems. Classics in applied mathematics, SIAM, Philadelphia
Candanedo LM, Feldheim V, Deramaix D (2017) Data driven prediction models of energy use of appliances in a low-energy house. Energy Build 140:81–97. https://doi.org/10.1016/j.enbuild.2017.01.083
Menon V, Du Q, Fowler JE (2016) Random-projection-based nonnegative least squares for hyperspectral image unmixing. In: 2016 8th workshop on hyperspectral image and signal processing: evolution in remote sensing (WHISPERS), https://doi.org/10.1109/whispers.2016.8071796
Miller A (2002) Subset selection in regression. CRC Press, Chapman & Hall/CRC Monographs on Statistics & Applied Probability
Nadisic N, Vandaele A, Gillis N, Cohen JE (2020) Exact sparse nonnegative least squares. In: ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing (ICASSP). https://doi.org/10.1109/icassp40776.2020.9053295
Natarajan B (1995) Sparse approximate solutions to linear systems. SIAM J Comp 24:227–234
Rockafellar RT (1970) Convex analysis. Princeton mathematical series, Princeton University Press, Princeton
Slawski M, Hein M (2011) Sparse recovery by thresholded non-negative least squares. In: Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ (eds) Advances in neural information processing systems, vol 24, Curran Associates, Inc., https://proceedings.neurips.cc/paper/2011/file/d6723e7cd6735df68d1ce4c704c29a04-Paper.pdf
Slawski M, Hein M (2013) Non-negative least squares for high-dimensional linear models: Consistency and sparse recovery without regularization. Electron J Stat 7:3004–3056. https://doi.org/10.1214/13-ejs868
Tikhonov AN, Goncharsky AV, Stepanov VV, Yagola AG (1995) Numerical methods for the solution of ill-posed problems. Mathematics and its applications, Springer, Dordrecht. https://doi.org/10.1007/978-94-015-8480-7, https://cds.cern.ch/record/1620560
Vidyasagar M (2019) An introduction to compressed sensing. SIAM, Philadelphia
Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, van der Walt SJ, Brett M, Wilson J, Millman KJ, Mayorov N, Nelson ARJ, Jones E, Kern R, Larson E, Carey CJ, Polat İ, Feng Y, Moore EW, VanderPlas J, Laxalde D, Perktold J, Cimrman R, Henriksen I, Quintero EA, Harris CR, Archibald AM, Ribeiro AH, Pedregosa F, van Mulbregt P, SciPy 1.0 Contributors, (2020) SciPy 1.0: fundamental algorithms for scientific computing in python. Nature Methods 17:261–272. https://doi.org/10.1038/s41592-019-0686-2
Winter ME (1999) N-findr: an algorithm for fast autonomous spectral end-member determination in hyperspectral data. Imaging Spectrometry V. https://doi.org/10.1117/12.366289
Zhu Y, Nilsson S (2019) Guppy 3: a Python programming environment & heap analysis toolset. GitHub repository, https://github.com/zhuyifei1999/guppy3
Appendices
1.1 Details of parameters and variables
Both the matrix A and the vector b should be registered as numpy arrays, and the data type should be specified as float64 even if all values are integers, to avoid potential numerical issues. When registering and working with matrices or vectors in Python, the shape definition is very important: the shape of a vector determines how linear algebra functions behave. numpy has two different representations of a vector. If the user registers a one-dimensional vector of ones of length n, e.g. with numpy.ones(n), it is stored with shape (n,). If instead numpy.ones([1, n]) is used, the result expresses the same vector mathematically, but it is stored with shape (1, n). Matrix operations applied to these two representations give different results. The vector b should be registered as a column vector of shape (m, 1).
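To make the distinction concrete, the following minimal numpy session (illustrative only, not part of NNLSSPAR) shows the two shapes and how they interact with a matrix product:

```python
import numpy as np

n = 3
v1 = np.ones(n)         # one-dimensional vector: shape (n,)
v2 = np.ones([1, n])    # two-dimensional row vector: shape (1, n)
M = np.arange(float(n * n)).reshape(n, n)

r1 = M @ v1             # result is one-dimensional, shape (n,)
r2 = M @ v2.T           # result is a column vector, shape (n, 1),
                        # the form NNLSSPAR expects for b
```

Both products contain the same numbers, but downstream linear algebra code behaves differently depending on which shape it receives; this is why b must be registered with shape (m, 1).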
NNLSSPAR calls the guppy3 package of Zhu and Nilsson (2019) to track the memory usage of the algorithm; consequently, guppy3 must be installed in order to use NNLSSPAR.
A description of the parameters of the NNLSSPAR library is given below.
Parameters:

1. A (input) = \(m\, \times \,n\) matrix storing the values of the independent variables, where m must be greater than n

2. b (input) = \(m\, \times \, 1\) vector storing the values of the dependent variables

3. s (input) = an integer indicating the sparsity level (maximum number of nonzero elements in the solution)

4. out (input, 1 by default) = an integer controlling the detail level of the output:

   (a) out = 0: nothing is printed

   (b) out = 1: the results and the hardware usage of the algorithm are printed

   (c) out = 2: at every iteration, the lower bound of the current node and the number of nodes searched so far are printed; after the algorithm terminates, NNLSSPAR reports the results and the hardware usage. Although this is useful for observing the progress of the algorithm, it slows the algorithm down significantly.

5. C (input, \(\emptyset\) by default) = array of integers storing the indices of the chosen variables

6. many (input, 4 by default) = an integer specifying the number of best subsets to be found

7. residual_squared (output) = a list of length many (\(s * many\) if the all-subsets function is used) containing the residuals of the subsets found for each sparsity level. These are the residuals of the system after the QR decomposition; the true residual squared of the system is residual_squared + permanent_residual

8. indexes (output) = a list of length many (\(s * many\) if the all-subsets function is used) containing the indices of the independent variables forming each subset for each sparsity level

9. coefficients (output) = a list of length many (\(s * many\) if the all-subsets function is used) containing the least squares solution of each subset for each sparsity level

10. permanent_residual (output) = a float recording the residual squared of the system after the orthogonal transformation using the QR decomposition of the matrix A

11. cpu (output) = a float giving the CPU time of the algorithm in seconds; it is printed if out is not 0

12. memory (output) = a float giving the total memory usage of the algorithm in bytes; a detailed table of memory usage is printed if out is not 0

13. real (output) = a float giving the wall time of the algorithm in seconds

14. nodes (hidden) = a list of integers giving the number of visited nodes in the graph, reported separately for each sparsity level. If only one solution is found, it equals the number of branchings performed by the best first search

15. rem_qsize (hidden) = a list of integers giving the number of unvisited nodes in the graph, reported separately for each sparsity level. If only one solution is found, it is equal to nodes
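The relationship between the residual after the QR decomposition and permanent_residual can be illustrated with plain numpy. This is a sketch of the underlying identity, not NNLSSPAR's internal code: for a full-column-rank A, a complete QR factorization splits \(\left\Vert Ax - b\right\Vert _2^2\) into a part depending on x and a fixed part \(\left\Vert Q_2^T b\right\Vert _2^2\), which corresponds to the permanent residual.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))   # full column rank with probability 1
b = rng.standard_normal((m, 1))

# Complete QR: Q is m x m orthogonal, R is m x n upper triangular.
Q, R = np.linalg.qr(A, mode="complete")
Q2 = Q[:, n:]                     # basis of the orthogonal complement of range(A)
permanent_residual = float(np.linalg.norm(Q2.T @ b) ** 2)

# At the unconstrained least squares optimum, the residual that remains
# is exactly this fixed part.
x, res, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Here res[0], the squared residual returned by lstsq, coincides with permanent_residual: the x-dependent part of the split vanishes at the optimum, leaving only the fixed term.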
1.2 An example
Assume we are trying to solve an instance of a problem where A, b and s are specified for input into NNLSSPAR in Python syntax as follows:
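The original listing is not reproduced here; a minimal sketch of registering the inputs according to the conventions of Appendix 1.1 might look as follows (the data values are hypothetical):

```python
import numpy as np

# Hypothetical data: any overdetermined system with m > n will do.
A = np.array([[1, 2, 3, 4],
              [2, 1, 0, 1],
              [4, 5, 1, 3],
              [1, 0, 2, 2],
              [3, 3, 1, 0]], dtype="float64")   # shape (5, 4), float64 as required

b = np.array([[1], [2], [3], [4], [5]], dtype="float64")  # column vector, shape (5, 1)

s = 2  # sparsity level: at most two nonzero coefficients in the solution
```

Note that b is registered as a two-dimensional column vector of shape (m, 1) and both arrays use float64, as Appendix 1.1 prescribes.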
In the output, the values of the variables are listed in the order given by indexes. Hence, 6.64783762 is the value of the fourth independent variable, 1.62440717 is the value of the third independent variable, and so on.
1.3 A special case
If the solution of multiple subsets for each sparsity level \(i = 1, 2, \ldots , s-1, s\) is desired, then the following function should be called:
Cite this article
Aktaş, F.S., Ekmekcioglu, Ö. & Pinar, M.Ç. Provably optimal sparse solutions to overdetermined linear systems with non-negativity constraints in a least-squares sense by implicit enumeration. Optim Eng 22, 2505–2535 (2021). https://doi.org/10.1007/s11081-021-09676-2
Keywords
- Inverse problems
- Sparse approximation
- Overdetermined linear systems
- Sparse solutions
- Branch and bound
- Implicit enumeration
- Non-negative least squares