Abstract
This study compares sixteen filter ranking methods applied to a real air pollution problem. Among them are adaptations of the Minimum-Redundancy-Maximum-Relevance (mRMR) algorithm that use Spearman's rank correlation, kernel canonical correlation analysis, the Hilbert–Schmidt independence criterion, correntropy, Pearson's correlation and distance correlation. The methods were compared by estimating hourly NO2 concentrations at three monitoring stations located in the Bay of Algeciras (Spain), with estimation models built using Bayesian regularized artificial neural networks. Several estimation cases were tested for each ranking method, and the results were then compared statistically to determine which filter ranking strategy produced the best-performing model in each case. In the proposed estimation scenarios, the mRMR methods outperformed all the remaining methods when a small number of features was selected, although their advantage became less evident as the number of selected features increased. The proposed mRMR variants gave promising results, especially distance correlation mRMR, kernel canonical correlation analysis mRMR and Spearman's rank correlation mRMR. These ranking methods performed better than the original mRMR algorithm, which uses mutual information internally.
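The mRMR variants described above replace mutual information with another dependence measure. The following is a minimal sketch, assuming the incremental difference criterion of the original mRMR (relevance to the target minus mean redundancy with the features already selected) and using either the absolute Spearman's rank correlation or the sample distance correlation as the pluggable measure. It is not the authors' implementation; the function names `mrmr_rank`, `spearman_abs` and `distance_correlation` are hypothetical, and the input is assumed to be a NumPy matrix with one column per candidate feature.

```python
import numpy as np


def _average_ranks(v: np.ndarray) -> np.ndarray:
    """Ranks 1..n, with tied values sharing their average rank."""
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def _pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return 0.0 if denom == 0 else float(a @ b / denom)


def spearman_abs(x, y) -> float:
    """|Spearman's rho|, i.e. Pearson correlation computed on the ranks."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return abs(_pearson(_average_ranks(x), _average_ranks(y)))


def distance_correlation(x, y) -> float:
    """Sample distance correlation (Szekely et al. 2007) of two 1-D samples."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)

    def centred_distances(v):
        d = np.abs(v[:, None] - v[None, :])  # pairwise |v_k - v_l|
        # Double centring: subtract row and column means, add the grand mean.
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    A, B = centred_distances(x), centred_distances(y)
    dcov2 = (A * B).mean()
    dvar2 = (A * A).mean() * (B * B).mean()
    if dvar2 <= 0:  # a constant sample carries no dependence information
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


def mrmr_rank(X, y, dependence, k=None):
    """Greedy mRMR ranking with the difference criterion
    relevance(j) - mean dependence(j, already selected features)."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    k = n_features if k is None else min(k, n_features)
    relevance = np.array([dependence(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    remaining = [j for j in range(n_features) if j != selected[0]]
    redundancy_sum = np.zeros(n_features)  # running sums over selected features
    while len(selected) < k:
        newest = selected[-1]
        for j in remaining:  # only the newest selection adds redundancy terms
            redundancy_sum[j] += dependence(X[:, j], X[:, newest])
        scores = [relevance[j] - redundancy_sum[j] / len(selected) for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 300
    x0 = rng.normal(size=n)
    X = np.column_stack([x0, x0 + 0.01 * rng.normal(size=n),
                         rng.normal(size=n), rng.normal(size=n)])
    y = 2 * X[:, 0] + X[:, 2] + 0.1 * rng.normal(size=n)
    for dep in (spearman_abs, distance_correlation):
        print(dep.__name__, mrmr_rank(X, y, dep))
```

Passing a different callable swaps the dependence measure without touching the greedy loop, and the redundancy sums are updated incrementally, so each step only evaluates dependence against the newest selected feature. In the synthetic demo, feature 1 is a near copy of feature 0, so both measures should rank it behind the weaker but non-redundant feature 2.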
Acknowledgements
This work is supported through grant RTI2018-098160-B-I00 from MICINN-SPAIN. Monitoring data has been kindly provided by the Environmental Agency of the Andalusian Government.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
About this article
Cite this article
González-Enrique, J., Ruiz-Aguilar, J.J., Moscoso-López, J.A. et al. A comparison of ranking filter methods applied to the estimation of NO2 concentrations in the Bay of Algeciras (Spain). Stoch Environ Res Risk Assess 35, 1999–2019 (2021). https://doi.org/10.1007/s00477-021-01992-4
DOI: https://doi.org/10.1007/s00477-021-01992-4