A neural network for determination of latent dimensionality in non-negative matrix factorization
Machine Learning: Science and Technology ( IF 6.3 ) Pub Date : 2021-02-09 , DOI: 10.1088/2632-2153/aba372
Benjamin Nebgen , Raviteja Vangara , Miguel A. Hombrados-Herrera , Svetlana Kuksova , Boian Alexandrov

Non-negative matrix factorization (NMF) has proven to be a powerful unsupervised learning method for uncovering hidden features in complex and noisy data sets with applications in data mining, text recognition, dimension reduction, face recognition, anomaly detection, blind source separation, and many other fields. An important input for NMF is the latent dimensionality of the data, that is, the number of hidden features, K, present in the explored data set. Unfortunately, this quantity is rarely known a priori. The existing methods for determining latent dimensionality, such as automatic relevance determination (ARD), are mostly heuristic and utilize different characteristics to estimate the number of hidden features. However, all of them require human presence to make a final determination of K. Here we utilize a supervised machine learning approach in combination with a recent method for model determination, called NMFk, to determine the number of hidden features automatically. NMFk performs a set of NMF simulations on an ensemble of matrices, obtained by bootstrapping the initial data set, and determines which K produces stable groups of latent features that reconstruct the initial data set well. We then train a multi-layer perceptron (MLP) classifier network to determine the correct number of latent features utilizing the statistics and characteristics of the NMF solutions, obtained from NMFk. In order to train the MLP classifier, a training set of 58 660 matrices with predetermined latent features was factorized with NMFk. The MLP classifier in conjunction with NMFk maintains a greater than 95% success rate when applied to a held-out test set. Additionally, when applied to two well-known benchmark data sets, the swimmer and MIT face data, NMFk/MLP correctly recovered the established number of hidden features. Finally, we compared the accuracy of our method to ARD, AIC and stability-based methods.
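To make the procedure concrete, below is a minimal sketch (not the authors' implementation) of an NMFk-style stability scan over candidate values of K, assuming scikit-learn's NMF and a simple multiplicative-noise bootstrap; the function name nmfk_scan, the k-means clustering of ensemble features, and all parameter choices are illustrative assumptions. In the paper, per-K statistics of this kind are what the trained MLP classifier consumes to select K automatically.

```python
# Minimal NMFk-style sketch, assuming scikit-learn; nmfk_scan and the
# multiplicative-noise bootstrap are illustrative, not the authors' code.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def nmfk_scan(V, k_range, n_boot=10, noise=0.03, seed=0):
    """For each candidate K, factorize bootstrapped copies of V and report
    the mean reconstruction error and the silhouette of the clustered
    latent features (columns of W) pooled across the ensemble."""
    rng = np.random.default_rng(seed)
    stats = {}
    for k in k_range:
        w_cols, errors = [], []
        for b in range(n_boot):
            # Bootstrap by multiplicative resampling of the non-negative entries.
            Vb = V * rng.normal(1.0, noise, size=V.shape).clip(min=0)
            model = NMF(n_components=k, init="nndsvda", max_iter=500,
                        random_state=b)
            W = model.fit_transform(Vb)
            errors.append(model.reconstruction_err_)
            # Normalize columns so features from different runs are comparable.
            w_cols.append(W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-12))
        features = np.hstack(w_cols).T            # shape: (n_boot * k, n_rows)
        if k > 1:
            # Cluster the pooled features into k groups; a tight, well-separated
            # clustering (high silhouette) indicates a stable choice of K.
            labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
            sil = float(silhouette_score(features, labels))
        else:
            sil = 0.0
        stats[k] = {"silhouette": sil, "recon_err": float(np.mean(errors))}
    return stats
```

In this sketch the trade-off described in the abstract is visible directly: reconstruction error keeps decreasing as K grows, while the feature-cluster silhouette drops once K exceeds the true latent dimensionality, and the paper's contribution is to replace the human inspection of such curves with an MLP classifier trained on them.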




Updated: 2021-02-09