Abstract
Stacking is a model combination technique for improving prediction accuracy. Regularization is usually necessary in stacking because some of the predictions used in the combination are similar to one another. Cross-validation is generally used to select the regularization parameter, but it incurs a high computational cost. This paper proposes two simple methods with low computational cost for selecting the regularization parameter. The effectiveness of the methods is examined in numerical experiments. Asymptotic results in a particular setting are also presented.
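As a point of reference for the computational issue described above, the following is a minimal sketch of a standard formulation of stacked regression: out-of-fold predictions from the base models are combined with ridge-regularized weights, and the regularization parameter is chosen by cross-validating the stacking stage itself, which is the expensive step this paper seeks to avoid. This is not the paper's proposed method, and all names (`cv_predictions`, `ridge_weights`, `select_lambda_by_cv`) are illustrative assumptions.

```python
# Minimal sketch (not the paper's proposed method): ridge-regularized
# stacking, with the regularization parameter lambda selected by an extra
# layer of cross-validation. All names are illustrative assumptions.
import numpy as np

def cv_predictions(models, X, y, K=10, seed=0):
    """Out-of-fold predictions of each base model: an (N, M) matrix Z."""
    rng = np.random.default_rng(seed)
    N = len(y)
    folds = np.array_split(rng.permutation(N), K)
    Z = np.empty((N, len(models)))
    for fold in folds:
        train = np.setdiff1d(np.arange(N), fold)
        for m, fit in enumerate(models):
            # each `fit` trains on (X_train, y_train) and returns a predictor
            predict = fit(X[train], y[train])
            Z[fold, m] = predict(X[fold])
    return Z

def ridge_weights(Z, y, lam):
    """Stacking weights w = (Z'Z + lam I)^{-1} Z'y."""
    M = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(M), Z.T @ y)

def select_lambda_by_cv(Z, y, lambdas, K=10, seed=0):
    """The usual but computationally expensive choice: cross-validate the stacking stage itself."""
    rng = np.random.default_rng(seed)
    N = len(y)
    folds = np.array_split(rng.permutation(N), K)
    cv_error = []
    for lam in lambdas:
        sse = 0.0
        for fold in folds:
            train = np.setdiff1d(np.arange(N), fold)
            w = ridge_weights(Z[train], y[train], lam)
            sse += np.sum((y[fold] - Z[fold] @ w) ** 2)
        cv_error.append(sse / N)
    return lambdas[int(np.argmin(cv_error))]
```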
Asymptotic Results
We assume that N is divisible by \(K^2\), and that \(L_{\alpha }=L=N/K\). In this paper, K is fixed and L tends to infinity as N tends to infinity, but we assume that K is taken to be large enough in advance; typically, \(K=10\) or 20. We also assume that the regularization parameter \(\lambda \) is chosen from (0, EN) for a fixed \(E>0\). Each model is parameterized by a finite-dimensional parameter \(\theta _m\): \(\{ f_1(x;\theta _1)\} ,\dots ,\{ f_M(x;\theta _M)\}\). The parameters are estimated by M-estimation:
where \(\Psi _m(\mathcal{D};\theta _m)=\sum _{i=1}^N\Psi _m((x_i,y_i);\theta _m)\). For example,
In this section, we use the following notation:
Accordingly,
We assume regularity conditions that make the first-order approximation valid. We will use the following abbreviation:
For asymptotic calculations, we assume that \(\mathcal{D}^{(-\alpha )}\) is divided into \(\mathcal{D}^{(-\alpha ,1)},\dots ,\mathcal{D}^{(-\alpha ,K)}\) as follows. First, each \(\mathcal{D}^{(\alpha )}\) is divided into \(\mathcal{D}^{(\alpha ,1)},\dots ,\mathcal{D}^{(\alpha ,K)}\). Second, \(\mathcal{D}^{(-\alpha ,\beta )}=\cup _{\gamma =1,\gamma \ne \alpha }^K \mathcal{D}^{(\gamma ,\beta )}\). Third, \(\mathcal{D}^{(-\alpha ,-\beta )}= \mathcal{D}^{(-\alpha )}\backslash \mathcal{D}^{(-\alpha ,\beta )}\).
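To make the nested splitting concrete, the following index-level sketch assumes, as in the usual K-fold construction, that \(\mathcal{D}^{(\alpha )}\) denotes the \(\alpha \)-th fold of \(\mathcal{D}\) and \(\mathcal{D}^{(-\alpha )}\) its complement; the function names are illustrative and not from the paper.

```python
# Index-level sketch of the nested split above (names are illustrative):
# D is cut into K folds D^(alpha); each D^(alpha) is cut into K sub-folds
# D^(alpha, beta); D^(-alpha, beta) collects the beta-th sub-folds of all
# folds other than alpha; D^(-alpha, -beta) is the rest of D^(-alpha).
import numpy as np

def nested_folds(N, K):
    idx = np.arange(N)                                   # assumes K^2 divides N
    D = np.array_split(idx, K)                           # D^(1), ..., D^(K)
    sub = [np.array_split(D[a], K) for a in range(K)]    # sub[a][b] = D^(a, b)

    def D_minus(alpha):                                  # D^(-alpha)
        return np.concatenate([D[g] for g in range(K) if g != alpha])

    def D_minus_beta(alpha, beta):                       # D^(-alpha, beta)
        return np.concatenate([sub[g][beta] for g in range(K) if g != alpha])

    def D_minus_minus(alpha, beta):                      # D^(-alpha, -beta)
        return np.setdiff1d(D_minus(alpha), D_minus_beta(alpha, beta))

    return D, sub, D_minus, D_minus_beta, D_minus_minus
```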
Let
Then,
By the Taylor expansion, we can obtain
The right-hand side of (8) is written as \(h_{m,x}^{(-\alpha )}\).
We denote the elements of \(\mathcal{D}^{(-\alpha )}\) by \((x_1^{(-\alpha )},y_1^{(-\alpha )}),...,(x_{N^{\prime }}^{(-\alpha )},y_{N^{\prime }}^{(-\alpha )})\), where \(N^{\prime }=(K-1)L\). We define \(N^{\prime }\times M\) matrices \(U^{(-\alpha )}, X^{(-\alpha )}, X_0^{(-\alpha )}\) and \(\Delta _0^{(-\alpha )}\) whose (i, j)-th elements are
Let \(y^{(-\alpha )}\) be a vector \((y_1^{(-\alpha )},..., y_{N^{\prime }}^{(-\alpha )})^T\). Then,
Here, \(A_0\) is the \(M\times M\) matrix whose (i, j)-th element is \(\text{ E }( f_i(x;\theta _i^0)f_j(x;\theta _j^0))\), \(A(\lambda ,N^{\prime }) = A_0+\lambda /N^{\prime }I\), and \(w(\lambda ,N^{\prime })=A(\lambda ,N^{\prime })^{-1}b\), where b is the M-dimensional vector whose i-th element is \(\text{ E }( yf_i(x;\theta _i^0))\).
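A convenient way to read these definitions (a standard ridge-regression identity, stated here only for orientation) is that \(w(\lambda ,N^{\prime })\) minimizes the population criterion \(\text{ E }\{ ( y-\sum _{m=1}^M w_m f_m(x;\theta _m^0))^2\} +(\lambda /N^{\prime })\Vert w\Vert ^2\): setting the gradient with respect to w to zero gives \((A_0+\lambda /N^{\prime }I)w=b\), that is, \(w(\lambda ,N^{\prime })=A(\lambda ,N^{\prime })^{-1}b\).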
By expanding \(\text{ ACV}_2\), we can obtain
The first term of (9) is \(\text{ CV }\).
The second term of (9) is
Here, we consider the following expectation:
where \(x_1,...,x_N\) are independent, \(\text{ E }( b_j(x_j)) =0\) for \(j=1,...,N\), and \(\text{ E }( d_l(x_l)) =0\) for \(l=1,...,N\). Then,
The second term of (10) is bounded by
By using (11), the expectation of the second term of (9) is bounded by
where \(c_1\) is a constant that does not depend on K and \(C_1(K)=\min [ K^2(K-1)^3/(K^2-K+1)^2,K^2(K-1)^2/\{ (K+1)^2(K-2)\} ]\).
By calculating a bound of the expectation of the third term, we can obtain
where \(C_2(K)=(K-1)^3/(K^2-K+1)\) and \(c_2\) is a constant that does not depend on K. Thus, by taking K large in advance, the bias of \(\text{ ACV}_2\) can be made close to the bias of \(\text{ CV }\).
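For concreteness, both constants grow roughly linearly in K: for example, \(C_1(10)\approx 8.4\) and \(C_2(10)\approx 8.0\), while \(C_1(20)\approx 18.2\) and \(C_2(20)\approx 18.0\). Fixing \(K=10\) or 20 in advance, as assumed at the beginning of this section, therefore already corresponds to moderately large values of these constants.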
Next, we consider \(\text{ ACV}_1\). By using
we can calculate \(\hat{w}_m(\lambda ;\mathcal{D}^{(-\alpha )}) -\hat{v}_m^{(-\alpha )}(\lambda ;\mathcal{D})\). Expanding \(\text{ ACV}_1\) as in (9), we can obtain
where \(c_1\) is a constant that does not depend on K. Thus, the bias cannot be made small simply by taking K large. This result comes from the fact that the coefficient of the term \(\Psi _m^{\prime }(\mathcal{D}^{(\alpha )};\theta _m^0)\) in \(f_m(x;\hat{\theta }_m^{(-\alpha ,-\beta _{(\alpha ,x,y)})})-f_m(x;\theta _m^{(-\alpha _{(x,y)})})\) is \(1/\{ L(K-1)\}\), while the coefficient of the term \(\Psi _m^{\prime }(\mathcal{D}^{(\alpha )};\theta _m^0)\) in \(f_m(x;\hat{\theta }_m^{(-\alpha ,-\beta _{(\alpha ,x,y)})})-g_m^{(-\alpha )}(x)\) is \(1/\{ LK(K-1)\}\).
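To see the difference in orders, note that with \(L=N/K\) the first coefficient is \(1/\{ L(K-1)\} =K/\{ N(K-1)\}\), which stays of order 1/N however large K is taken, whereas the second is \(1/\{ LK(K-1)\} =1/\{ N(K-1)\}\), which decreases as K grows; this is why a large K helps \(\text{ ACV}_2\) but not \(\text{ ACV}_1\).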