
Deep Kernel machines: a survey

Survey · Published in Pattern Analysis and Applications

Abstract

The emergence of deep learning frameworks paves the way for higher-level data abstractions and offers the potential to consolidate both supervised and unsupervised learning paradigms. Researchers have successfully applied deep learning to face recognition, text mining, language translation, image prediction, and action recognition. Kernel machines act as a bridge between linearity and nonlinearity for many machine learning algorithms, including support vector machines, extreme learning machines, and core vector machines. They map data from the input space to a Kernel-induced high-dimensional feature space in which the distribution of the data points is more amenable to the classification problem under consideration. The Kernel trick turns any machine learning algorithm that requires only inner product computations between data vectors into a Kernel-based approach: by selecting an appropriate Kernel function, the inner products between the transformed data vectors are computed in an implicitly defined Kernel-induced feature space, without ever constructing that space explicitly. Unlike neural networks, Kernel machines guarantee structural risk minimization and globally optimal solutions, and they combine theoretical tractability with excellent performance in practical applications. These properties have motivated researchers to combine the emerging trends of deep learning with Kernel methods to build deep Kernel machines: architectures that retain the advantages of both paradigms, compensate for their respective limitations, and improve the performance of learning algorithms across applications. Deep Kernel machines can be built in several ways, including using a Kernel machine as the final classifier of a deep learning network, Kernelizing deep neural networks for richer feature representations, and employing deep or multiple Kernels for different tasks. This survey provides an overview of these approaches to building deep Kernel learning architectures and of how they enhance the properties and practical performance of learning algorithms.
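To make the two central ideas above concrete, the following is a minimal sketch of the Kernel trick and of layered Kernel composition, in Python with NumPy only. It is illustrative rather than drawn from any specific work surveyed here: the names `poly2_features` and `deep_kernel`, the degree-2 polynomial feature map, and the RBF-composition scheme are assumptions introduced for exposition.

```python
# A minimal, illustrative sketch (not from the surveyed papers):
# (1) the Kernel trick, and (2) a layered ("deep") Kernel composition.
import numpy as np

def poly2_features(x):
    """Explicit degree-2 polynomial feature map phi for 2-D input,
    chosen so that <phi(x), phi(z)> == (x . z)**2."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2.0) * x1 * x2])

def poly2_kernel(x, z):
    """The same inner product, computed implicitly via the Kernel trick."""
    return np.dot(x, z) ** 2

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) Kernel: an inner product in an implicitly defined,
    infinite-dimensional feature space."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def deep_kernel(x, z, depth=3, gamma=1.0):
    """An assumed multi-layer Kernel obtained by re-applying an RBF in the
    feature space induced by the previous layer; one illustrative way to
    build the deep Kernels discussed in this survey."""
    k_xz = rbf_kernel(x, z, gamma)
    for _ in range(depth - 1):
        # Squared distance in the induced space, using k(x,x) = k(z,z) = 1
        # for the RBF Kernel: ||phi(x) - phi(z)||^2 = 2 - 2 k(x, z).
        k_xz = np.exp(-gamma * (2.0 - 2.0 * k_xz))
    return k_xz

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
assert np.isclose(np.dot(poly2_features(x), poly2_features(z)),
                  poly2_kernel(x, z))
print(rbf_kernel(x, z), deep_kernel(x, z, depth=3))
```

The assertion checks the Kernel trick numerically: the implicit value (x·z)² equals the explicit inner product ⟨φ(x), φ(z)⟩. For the RBF Kernel the induced space is infinite-dimensional, so only the implicit route is available; `deep_kernel` then treats each layer's Kernel values as inner products and re-applies the RBF in that induced space.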



Acknowledgements

This work has been supported by Kerala State Council for Science, Technology and Environment (KSCSTE) under the Fellowship No. 48/FSHP/2016/KSCSTE.

Author information

Corresponding author

Correspondence to Nair K. Nikhitha.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Nikhitha, N.K., Afzal, A.L. & Asharaf, S. Deep Kernel machines: a survey. Pattern Anal Applic 24, 537–556 (2021). https://doi.org/10.1007/s10044-020-00933-1

