A sparse linear regression model for incomplete datasets
Pattern Analysis and Applications (IF 3.7), Pub Date: 2019-12-04, DOI: 10.1007/s10044-019-00859-3
Marcelo B. A. Veras, Diego P. P. Mesquita, Cesar L. C. Mattos, João P. P. Gomes

Incomplete data are often neglected when designing machine learning methods. A popular workaround adopted by practitioners is to add a preprocessing step that fills in the missing components. These preprocessing algorithms are designed independently of the machine learning method applied afterwards, which may lead to sub-optimal results. An alternative solution is to redesign classical machine learning methods to handle missing data directly. In this paper, we propose a variant of the forward stagewise regression (FSR) algorithm for incomplete data. The original FSR is an iterative procedure to estimate the parameters of sparse linear models. The proposed method, named forward stagewise regression for incomplete datasets with GMM (FSIG), models the missing components as random variables following a Gaussian mixture distribution. In FSIG, the main steps of FSR are adapted to deal with the intrinsic uncertainty of incomplete samples. The performance of FSIG was evaluated in an extensive set of experiments, in which our model outperformed classical methods in most of the tested cases.
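For reference, the classical FSR procedure that FSIG builds on can be sketched as follows. This is a minimal sketch of the textbook incremental forward stagewise variant on complete data only, assuming standardized predictors, a centered response, and a fixed step size `eps`; the function name `forward_stagewise` and its parameters are hypothetical. It does not implement FSIG's Gaussian-mixture treatment of missing entries, which the abstract does not detail.

```python
import numpy as np


def forward_stagewise(X, y, eps=0.01, n_steps=5000, tol=1e-10):
    """Incremental forward stagewise regression (complete data only).

    Hypothetical helper for illustration. Assumes the columns of X are
    standardized (zero mean, unit variance) and y is centered.
    """
    n, p = X.shape
    beta = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()  # current residual, starts at y
    for _ in range(n_steps):
        c = X.T @ r                        # (scaled) correlation of each predictor with the residual
        j = int(np.argmax(np.abs(c)))      # predictor most correlated with the residual
        if abs(c[j]) < tol:                # residual is (numerically) uncorrelated with all predictors
            break
        delta = eps * np.sign(c[j])        # take a tiny step in the direction of that correlation
        beta[j] += delta
        r -= delta * X[:, j]               # update the residual accordingly
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    beta_true = np.zeros(10)
    beta_true[:2] = [3.0, -2.0]            # only two active predictors
    y = X @ beta_true + 0.1 * rng.standard_normal(200)
    y -= y.mean()
    print(np.round(forward_stagewise(X, y, eps=0.01, n_steps=2000), 2))
```

Because each iteration changes a single coefficient by only `eps`, the coefficients grow gradually and predictors that never become the most correlated with the residual stay near zero, which is what makes the method suitable for sparse linear models. With small `eps` the resulting coefficient path is known to closely resemble the lasso path.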

Updated: 2019-12-04