Optimal ratio for data splitting
Statistical Analysis and Data Mining (IF 1.3), Pub Date: 2022-04-04, DOI: 10.1002/sam.11583
V. Roshan Joseph

It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for training and testing. In this article, we show that the optimal training/testing splitting ratio is $\sqrt{p}:1$, where $p$ is the number of parameters in a linear regression model that explains the data well.
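
As a rough illustration of the arithmetic behind this ratio, the sketch below turns $\sqrt{p}:1$ into training and testing set sizes, using the training fraction $\sqrt{p}/(\sqrt{p}+1)$. The function names `split_sizes` and `random_split`, the rounding rule, and the uniform random shuffle are assumptions made for this example; the abstract does not say how observations should be assigned to each set.

```python
import math
import random


def split_sizes(n, p):
    """Return (n_train, n_test) for a sqrt(p):1 training/testing ratio.

    n: total number of observations.
    p: number of parameters in a linear regression model that
       explains the data well (as described in the abstract).
    """
    if n < 2:
        raise ValueError("need at least 2 observations to split")
    if p < 1:
        raise ValueError("p must be at least 1")
    root = math.sqrt(p)
    train_fraction = root / (root + 1.0)
    # Rounding to the nearest integer is an assumption of this sketch.
    n_train = int(round(n * train_fraction))
    # Keep at least one observation on each side of the split.
    n_train = min(max(n_train, 1), n - 1)
    return n_train, n - n_train


def random_split(n, p, seed=0):
    """Return (train_indices, test_indices) from a uniform random shuffle.

    The random assignment is illustrative only, not the paper's method.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train, _ = split_sizes(n, p)
    return indices[:n_train], indices[n_train:]


if __name__ == "__main__":
    print(split_sizes(1000, 9))
    train_idx, test_idx = random_split(1000, 9)
    print(len(train_idx), len(test_idx))
```

For example, with $p = 9$ the ratio is 3:1, so 75% of the data goes to training; with $p = 1$ the split is even.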
