Abstract
Choosing a subset of representative items from a set of alternatives is an important problem in many scientific fields, such as environmental science and statistics. For most practical instances, however, no computationally efficient solution method is known. Although the problem has attracted considerable attention, most purpose-built algorithms either scale poorly with problem size or lack a usable open-source implementation. In this study, we show that any global continuous optimization technique can be used to solve the representative subset selection problem. This is achieved through a simple transformation that embeds the problem’s discrete search space into a larger continuous space. The proposed methodology is applied to design problems in the environmental and statistical domains. We evaluate it using several open-source global optimization packages and show that it compares favorably with existing direct methods.
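The abstract says that a simple transformation embeds the discrete subset space into a continuous one, so that any global continuous optimizer applies, but it does not give the transformation itself. The Python sketch below illustrates the general idea under our own assumptions and is not the author's construction. A vector x in [0, 1]^n is decoded into a size-k subset by keeping the indices of its k largest entries (a random-key decoding). A hypothetical max-min distance criterion stands in for a network-design objective, and a small hand-written differential evolution loop plays the role of an off-the-shelf global optimizer. The names `decode`, `min_pairwise_distance`, `differential_evolution`, and `select_subset` are illustrative only.

```python
import math
import random


def decode(x, k):
    """Map x in [0, 1]^n to a k-subset of {0, ..., n-1}.

    Assumed random-key decoding: keep the indices of the k largest
    components of x (ties are broken in favour of the lower index).
    """
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    return sorted(order[:k])


def min_pairwise_distance(points, subset):
    """Hypothetical design criterion: smallest Euclidean distance between
    any two chosen sites (requires len(subset) >= 2)."""
    return min(
        math.dist(points[i], points[j])
        for a, i in enumerate(subset)
        for j in subset[a + 1:]
    )


def differential_evolution(f, dim, pop_size=20, gens=200, F=0.8, CR=0.9, seed=0):
    """Minimal DE/rand/1/bin that maximizes f over the box [0, 1]^dim."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(p) for p in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Three distinct donor vectors, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == j_rand or rng.random() < CR:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    trial.append(min(1.0, max(0.0, v)))  # clip back into the box
                else:
                    trial.append(pop[i][d])
            f_trial = f(trial)
            if f_trial >= fit[i]:  # greedy one-to-one replacement
                pop[i], fit[i] = trial, f_trial
    best = max(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]


def select_subset(points, k, **de_options):
    """Solve the discrete selection problem through the continuous embedding."""
    def objective(x):
        return min_pairwise_distance(points, decode(x, k))

    x_best, value = differential_evolution(objective, len(points), **de_options)
    return decode(x_best, k), value


if __name__ == "__main__":
    rng = random.Random(1)
    sites = [(rng.random(), rng.random()) for _ in range(15)]
    subset, spread = select_subset(sites, k=4)
    print(subset, round(spread, 3))
```

Because every point of the box decodes to a feasible subset, the optimizer needs no penalty terms or repair steps, and any box-constrained global optimizer could replace the hand-written loop. The price is that many continuous points map to the same subset, so the objective is piecewise constant in x.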
Acknowledgments
I am deeply grateful to the anonymous reviewers and the editor for their valuable and constructive remarks and suggestions.
Funding
This work was supported by the Australian Research Council Centre of Excellence for Mathematical & Statistical Frontiers under grant number CE140100049.
Ethics declarations
This article does not contain any studies with human or animal subjects performed by any of the authors.
Conflict of interest
The author declares that there are no conflicts of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Vaisman, R. Subset selection via continuous optimization with applications to network design. Environ Monit Assess 192, 361 (2020). https://doi.org/10.1007/s10661-019-7938-6