Abstract
Computing sparse solutions to overdetermined linear systems is a ubiquitous problem in fields such as regression analysis, signal and image processing, information theory and machine learning. Additional non-negativity constraints on the solution are useful for interpretability. Most previous research efforts aimed at approximating the sparsity-constrained linear least squares problem, and/or at finding local solutions by means of descent algorithms. The objective of the present paper is to report on an efficient and modular implicit enumeration algorithm that finds provably optimal solutions to the NP-hard problem of sparsity-constrained non-negative least squares. We focus on the case where the system is overdetermined and the matrix has full column rank. Numerical results with real test data, comparisons with competing methods, and an application to hyperspectral imaging are reported. Finally, we present a Python library implementation of our algorithm.
Notes
As alluded to in Sect. 2, in machine learning and statistics an alternative notation is commonly used:
$$\begin{aligned} \min _{\beta }\; \left\Vert y - X\beta \right\Vert _2^2 \quad \text {s.t.}\;\; \beta \ge 0 \end{aligned}$$(3.2)where \(\beta \in {\mathbb {R}}^{k}\), \(y \in {\mathbb {R}}^{n}\), \(X \in {\mathbb {R}}^{n\times k}\), and the \(\beta\)s are referred to as coefficients. Unless otherwise indicated, we prefer to stick to the usual A, b-notation of linear systems of equations.
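As an aside, problem (3.2) without a sparsity constraint can be solved with off-the-shelf tools; the following minimal sketch uses scipy.optimize.nnls on synthetic data (this is an illustration of the plain NNLS subproblem, not the implicit enumeration method of this paper):

```python
import numpy as np
from scipy.optimize import nnls

# Synthetic overdetermined system y = X @ beta_true with non-negative coefficients.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5))
beta_true = np.array([1.0, 0.0, 2.0, 0.0, 0.5])
y = X @ beta_true

# nnls solves min ||y - X beta||_2 subject to beta >= 0;
# rnorm is the 2-norm (not squared) of the residual y - X @ beta.
beta, rnorm = nnls(X, y)
```

Since the generating coefficients are themselves non-negative and the data are noise-free, the NNLS solution recovers them and the residual is essentially zero.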
References
Aktaş FS, Ekmekçioğlu O, Pınar M Ç (2020) User's guide to NNLSSPAR, https://github.com/Fatih-S-AKTAS/LSSPAR
Atamturk A, Gomez A (2020) Safe screening rules for l0-regression from perspective relaxations. In: Daumé H III, Singh A (eds) Proceedings of the 37th international conference on machine learning, vol 119 of Proceedings of machine learning research, PMLR, 13–18, pp. 421–430, http://proceedings.mlr.press/v119/atamturk20a.html
Beck A, Eldar YC (2013) Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM J Optim 23:1480–1509. https://doi.org/10.1137/120869778
Bertsimas D, King A, Mazumder R (2016) Best subset selection via a modern optimization lens. Ann Stat 44:813–852. https://doi.org/10.1214/15-aos1388
Bhatia K, Jain P, Kar P (2015) Robust regression via hard thresholding. arXiv:1506.02428
Bourguignon S, Ninin J, Carfantan H, Mongeau M (2016) Exact sparse approximation problems via mixed-integer programming: formulations and computational performance. IEEE Trans Signal Proc 64:1405–1419
Breiman L, Friedman JH (1985) Estimating optimal transformations for multiple regression and correlation. J Am Stat Assoc 80:580–598
Chandrasekaran V, Recht B, Parrilo PA, Willsky AS (2012) The convex geometry of linear inverse problems. Found Comput Math 12:805–849. https://doi.org/10.1007/s10208-012-9135-7
IBM ILOG CPLEX (2009) V12.1: User's manual for CPLEX. International Business Machines Corporation, vol 46
Diamond S, Boyd S (2016) CVXPY: a Python-embedded modeling language for convex optimization. J Mach Learn Res 17:1–5
Donne DD, Kowalski M, Liberti L (2020) Mip and set covering approaches for sparse approximation. In: iTWIST annual conference
Drumetz L, Meyer TR, Chanussot J, Bertozzi AL, Jutten C (2019) Hyperspectral image unmixing with endmember bundles and group sparsity inducing mixed norms. IEEE Trans Image Process 28:3435–3450. https://doi.org/10.1109/tip.2019.2897254
Eftekhari A, Esfahani PM (2021) The nonconvex geometry of linear inverse problems. arXiv:2101.02776
Eftekhari A, Tanner J, Thompson A, Toader B, Tyagi H (2021) Sparse non-negative super-resolution–simplified and stabilised. Appl Comput Harmon Anal 50:216–280. https://doi.org/10.1016/j.acha.2019.08.004, https://www.sciencedirect.com/science/article/pii/S1063520319300193
Engl H, Hanke M, Neubauer A (2000) Regularization of inverse problems. Mathematics and its applications, Springer, Netherlands https://books.google.com.tr/books?id=VuEV-Gj1GZcC
Ghaoui LE, Lebret H (1997) Robust solutions to least-squares problems with uncertain data. SIAM J Matrix Anal Appl 18:1035–1064. https://doi.org/10.1137/S0895479896298130
Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
Guo Y, Berman M (2012) A comparison between subset selection and l1 regularisation with an application in spectroscopy. Chemometric Intell Lab Syst 118:127–138. https://doi.org/10.1016/j.chemolab.2012.08.010, http://www.sciencedirect.com/science/article/pii/S0169743912001657
Gurobi Optimization, LLC (2020) Gurobi optimizer reference manual. http://www.gurobi.com
Hamidieh K (2018) A data-driven statistical model for predicting the critical temperature of a superconductor. Comput Mater Sci 154:346–354. https://doi.org/10.1016/j.commatsci.2018.07.052
Heylen R, Burazerovic D, Scheunders P (2011) Fully constrained least squares spectral unmixing by simplex projection. IEEE Trans Geosci Remote Sens 49:4112–4122. https://doi.org/10.1109/tgrs.2011.2155070
Fernandes K, Vinagre P, Cortez P (2015) A proactive intelligent decision support system for predicting the popularity of online news. In: Proceedings of the 17th EPIA, Portuguese conference on artificial intelligence
Lawson CL, Hanson RJ (1995) Solving least squares problems. Classics in applied mathematics, SIAM, Philadelphia
Candanedo LM, Feldheim V, Deramaix D (2017) Data driven prediction models of energy use of appliances in a low-energy house. Energy Build 140:81–97. https://doi.org/10.1016/j.enbuild.2017.01.083
Menon V, Du Q, Fowler JE (2016) Random-projection-based nonnegative least squares for hyperspectral image unmixing. In: 2016 8th workshop on hyperspectral image and signal processing: evolution in remote sensing (WHISPERS), https://doi.org/10.1109/whispers.2016.8071796
Miller A (2002) Subset selection in regression. CRC Press, Chapman & Hall/CRC Monographs on Statistics & Applied Probability
Nadisic N, Vandaele A, Gillis N, Cohen JE (2020) Exact sparse nonnegative least squares. In: ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing (ICASSP). https://doi.org/10.1109/icassp40776.2020.9053295
Natarajan B (1995) Sparse approximate solutions to linear systems. SIAM J Comp 24:227–234
Rockafellar RT (1970) Convex analysis. Princeton mathematical series, Princeton University Press, Princeton
Slawski M, Hein M (2011) Sparse recovery by thresholded non-negative least squares. In: Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ (eds) Advances in neural information processing systems, vol 24, Curran Associates, Inc., https://proceedings.neurips.cc/paper/2011/file/d6723e7cd6735df68d1ce4c704c29a04-Paper.pdf
Slawski M, Hein M (2013) Non-negative least squares for high-dimensional linear models: Consistency and sparse recovery without regularization. Electron J Stat 7:3004–3056. https://doi.org/10.1214/13-ejs868
Tikhonov AN, Goncharsky AV, Stepanov VV, Yagola AG (1995) Numerical methods for the solution of ill-posed problems. Mathematics and its applications, Springer, Dordrecht. https://doi.org/10.1007/978-94-015-8480-7, https://cds.cern.ch/record/1620560
Vidyasagar M (2019) An introduction to compressed sensing. SIAM, Philadelphia
Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, van der Walt SJ, Brett M, Wilson J, Millman KJ, Mayorov N, Nelson ARJ, Jones E, Kern R, Larson E, Carey CJ, Polat İ, Feng Y, Moore EW, VanderPlas J, Laxalde D, Perktold J, Cimrman R, Henriksen I, Quintero EA, Harris CR, Archibald AM, Ribeiro AH, Pedregosa F, van Mulbregt P, SciPy 1.0 Contributors, (2020) SciPy 1.0: fundamental algorithms for scientific computing in python. Nature Methods 17:261–272. https://doi.org/10.1038/s41592-019-0686-2
Winter ME (1999) N-findr: an algorithm for fast autonomous spectral end-member determination in hyperspectral data. Imaging Spectrometry V. https://doi.org/10.1117/12.366289
Zhu Y, Nilsson S (2019) Guppy 3: a Python programming environment & heap analysis toolset. GitHub repository, https://github.com/zhuyifei1999/guppy3
Appendices
1.1 Details of parameters and variables
Both the matrix A and the vector b should be registered as numpy arrays, and the data type should be specified as float64 even if all values are integers, to avoid potential numerical issues. When registering and working with matrices or vectors in Python, the shape definition is very important: the shape of a vector determines how linear algebra functions behave. numpy has two different representations of a vector. If the user registers a one-dimensional vector of ones of length n, e.g. with numpy.ones(n), it is stored with shape (n,). If instead numpy.ones([1, n]) is used, the result expresses the same vector mathematically, but it is stored with shape (1, n). Matrix operations applied to these two representations give different results. The vector b should be registered as a column vector of shape (m, 1).
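To make the distinction concrete, the following minimal numpy session (illustrative only, not part of NNLSSPAR) shows the two shapes and how they interact with a matrix product:

```python
import numpy as np

n = 3
v1 = np.ones(n)         # one-dimensional vector: shape (n,)
v2 = np.ones([1, n])    # two-dimensional row vector: shape (1, n)
M = np.arange(float(n * n)).reshape(n, n)

r1 = M @ v1             # result is one-dimensional, shape (n,)
r2 = M @ v2.T           # result is a column vector, shape (n, 1),
                        # the form NNLSSPAR expects for b
```

Both products contain the same numbers, but downstream linear algebra code behaves differently depending on which shape it receives; this is why b must be registered with shape (m, 1).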
NNLSSPAR calls the guppy3 package of Zhu and Nilsson (2019) to track the memory usage of the algorithm; consequently, guppy3 must be installed in order to use NNLSSPAR.
A description of the parameters of the NNLSSPAR library is given below.
Parameters:

1. A (input) = \(m\, \times \,n\) matrix storing the values of the independent variables, where m must be greater than n

2. b (input) = \(m\, \times \, 1\) vector storing the values of the dependent variables

3. s (input) = an integer indicating the sparsity level (maximum number of nonzero elements in the solution)

4. out (input, 1 by default) = an integer controlling the detail level of the output:

   (a) out = 0: nothing is printed

   (b) out = 1: the results and the hardware usage of the algorithm are printed

   (c) out = 2: at every iteration, the lower bound of the current node and the number of nodes searched so far are printed; after the algorithm terminates, NNLSSPAR reports the results and the hardware usage. Although this is useful for observing the progress of the algorithm, it slows the algorithm down significantly.

5. C (input, \(\emptyset\) by default) = array of integers storing the indices of the chosen variables

6. many (input, 4 by default) = an integer specifying the number of best subsets to be found

7. residual_squared (output) = a list of length many (\(s * many\) if the all-subsets function is used) containing the residuals of the subsets found for each sparsity level. These are the residuals of the system after the QR decomposition; the true residual squared of the system is residual_squared + permanent_residual

8. indexes (output) = a list of length many (\(s * many\) if the all-subsets function is used) containing the indices of the independent variables forming each subset for each sparsity level

9. coefficients (output) = a list of length many (\(s * many\) if the all-subsets function is used) containing the least squares solution of each subset for each sparsity level

10. permanent_residual (output) = a float recording the residual squared of the system after the orthogonal transformation using the QR decomposition of the matrix A

11. cpu (output) = a float giving the CPU time of the algorithm in seconds; it is printed if out is not 0

12. memory (output) = a float giving the total memory usage of the algorithm in bytes; a detailed table of memory usage is printed if out is not 0

13. real (output) = a float giving the wall time of the algorithm in seconds

14. nodes (hidden) = a list of integers giving the number of visited nodes in the graph, reported separately for each sparsity level. If only one solution is found, it equals the number of branchings performed by the best first search

15. rem_qsize (hidden) = a list of integers giving the number of unvisited nodes in the graph, reported separately for each sparsity level. If only one solution is found, it is equal to nodes
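The relationship between the residual after the QR decomposition and permanent_residual can be illustrated with plain numpy. This is a sketch of the underlying identity, not NNLSSPAR's internal code: for a full-column-rank A, a complete QR factorization splits \(\left\Vert Ax - b\right\Vert _2^2\) into a part depending on x and a fixed part \(\left\Vert Q_2^T b\right\Vert _2^2\), which corresponds to the permanent residual.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))   # full column rank with probability 1
b = rng.standard_normal((m, 1))

# Complete QR: Q is m x m orthogonal, R is m x n upper triangular.
Q, R = np.linalg.qr(A, mode="complete")
Q2 = Q[:, n:]                     # basis of the orthogonal complement of range(A)
permanent_residual = float(np.linalg.norm(Q2.T @ b) ** 2)

# At the unconstrained least squares optimum, the residual that remains
# is exactly this fixed part.
x, res, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Here res[0], the squared residual returned by lstsq, coincides with permanent_residual: the x-dependent part of the split vanishes at the optimum, leaving only the fixed term.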
1.2 An example
Assume we are trying to solve an instance of a problem where A, b and s are specified for input into NNLSSPAR in Python syntax as follows:
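The original listing is not reproduced here; a minimal sketch of registering the inputs according to the conventions of Appendix 1.1 might look as follows (the data values are hypothetical):

```python
import numpy as np

# Hypothetical data: any overdetermined system with m > n will do.
A = np.array([[1, 2, 3, 4],
              [2, 1, 0, 1],
              [4, 5, 1, 3],
              [1, 0, 2, 2],
              [3, 3, 1, 0]], dtype="float64")   # shape (5, 4), float64 as required

b = np.array([[1], [2], [3], [4], [5]], dtype="float64")  # column vector, shape (5, 1)

s = 2  # sparsity level: at most two nonzero coefficients in the solution
```

Note that b is registered as a two-dimensional column vector of shape (m, 1) and both arrays use float64, as Appendix 1.1 prescribes.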
In the output, the values of the variables are listed in the order given by indexes. Hence, 6.64783762 is the value of the fourth independent variable, 1.62440717 is the value of the third independent variable, and so on.
1.3 A special case
If the solution of multiple subsets for each sparsity level \(i = 1, 2, \ldots , s-1, s\) is desired, then the following function should be called:
Cite this article
Aktaş, F.S., Ekmekcioglu, Ö. & Pinar, M.Ç. Provably optimal sparse solutions to overdetermined linear systems with non-negativity constraints in a least-squares sense by implicit enumeration. Optim Eng 22, 2505–2535 (2021). https://doi.org/10.1007/s11081-021-09676-2
Keywords
- Inverse problems
- Sparse approximation
- Overdetermined linear systems
- Sparse solutions
- Branch and bound
- Implicit enumeration
- Non-negative least squares