An extended DEIM algorithm for subset selection and class identification
Machine Learning (IF 7.5), Pub Date: 2021-03-21, DOI: 10.1007/s10994-021-05954-3
Emily P Hendryx, Béatrice M Rivière, Craig G Rusin

The discrete empirical interpolation method (DEIM) has been shown to be a viable index-selection technique for identifying representative subsets in data. Having gained some popularity in reducing dimensionality of physical models involving differential equations, its use in subset-/pattern-identification tasks is not yet broadly known within the machine learning community. While it has much to offer as is, the DEIM algorithm is limited in that the number of selected indices cannot exceed the rank of the corresponding data matrix. Although this is not an issue for many data sets, there are cases in which the number of classes represented in a given data set is greater than the rank of the data matrix; in such cases, it is impossible for the standard DEIM algorithm to identify all classes. To overcome this issue, we present a novel extension of DEIM, called E-DEIM. With the proposed algorithm, we also provide some theoretical results for using extensions of DEIM to form the CUR matrix factorization in identifying both rows and columns to approximate the original data matrix. Results from applying variations of E-DEIM to two different data sets indicate that the presented extension can indeed allow for the identification of additional classes along with those selected by standard DEIM. In addition, comparing these results to those of some more familiar methods demonstrates that the proposed deterministic E-DEIM approach including coherence performs comparably to or better than the other evaluated methods and should be considered in future class-identification tasks.
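
To make the rank limitation concrete, the following is a minimal sketch of standard DEIM index selection and a DEIM-based CUR factorization, as these are commonly described in the literature. It is not the E-DEIM extension proposed in the paper, whose details the abstract does not give. The function names `deim_indices` and `deim_cur`, the choice of the leading singular vectors as the basis, and the middle factor computed with pseudoinverses are assumptions made for illustration.

```python
import numpy as np


def deim_indices(V):
    """Standard DEIM: pick one index per column of the n-by-k basis V.

    Assumes the columns of V are linearly independent (for example,
    leading singular vectors), so at most rank-many indices can be chosen.
    """
    n, k = V.shape
    p = np.empty(k, dtype=int)
    # First index: largest-magnitude entry of the first basis vector.
    p[0] = np.argmax(np.abs(V[:, 0]))
    for j in range(1, k):
        # Interpolate column j at the already selected rows ...
        c = np.linalg.solve(V[p[:j], :j], V[p[:j], j])
        # ... and select the row where the interpolation residual is largest.
        r = V[:, j] - V[:, :j] @ c
        p[j] = np.argmax(np.abs(r))
    return p


def deim_cur(A, k):
    """Illustrative DEIM-based CUR: A is approximated by C @ M @ R.

    Rows are chosen from the left singular vectors and columns from the
    right singular vectors; k must not exceed the rank of A.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    p = deim_indices(U[:, :k])     # selected row indices
    q = deim_indices(Vt[:k, :].T)  # selected column indices
    C = A[:, q]
    R = A[p, :]
    # One common choice of middle factor (an assumption here).
    M = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return p, q, C, M, R


# Small demonstration on a synthetic rank-3 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))
p, q, C, M, R = deim_cur(A, k=3)
print("selected rows:", sorted(p.tolist()))
print("selected columns:", sorted(q.tolist()))
print("relative error:", np.linalg.norm(A - C @ M @ R) / np.linalg.norm(A))
```

Because each step consumes one basis vector, this procedure returns exactly `k` indices and `k` cannot exceed the rank of the data matrix. This is the restriction that the abstract says E-DEIM is designed to overcome when the number of classes is larger than the rank.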

Updated: 2021-03-22
