Optimization and expansion of non-negative matrix factorization.
BMC Bioinformatics ( IF 3 ) Pub Date : 2020-01-06 , DOI: 10.1186/s12859-019-3312-5
Xihui Lin 1 , Paul C Boutros 1, 2, 3

BACKGROUND Non-negative matrix factorization (NMF) is a technique widely used in various fields, including artificial intelligence (AI), signal processing and bioinformatics. However, existing algorithms and R packages cannot be applied to large matrices, because they converge slowly, or to matrices with missing entries. In addition, most NMF research focuses only on blind decompositions, that is, decompositions that do not use prior knowledge. Finally, the lack of a well-validated methodology for choosing the rank hyperparameter raises concerns about derived results.

RESULTS We apply the idea of sequential coordinate-wise descent to NMF to increase the convergence rate. We demonstrate that NMF can handle missing values naturally and that this property leads to a novel method for determining the rank hyperparameter. Further, we demonstrate some novel applications of NMF and show how to use masking to inject prior knowledge and desirable properties to achieve a more meaningful decomposition.

CONCLUSIONS We show through complexity analysis and experiments that our implementation converges faster than well-known methods. We also show that using NMF for tumour content deconvolution can achieve results similar to existing methods like ISOpure. Our proposed missing value imputation is more accurate than conventional methods like multiple imputation and comparable to missForest, while achieving significantly better computational efficiency. Finally, we argue that the suggested rank tuning method based on missing value imputation is theoretically superior to existing methods. All algorithms are implemented in the R package NNLM, which is freely available on CRAN and GitHub.
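
To make the missing-value idea concrete, the following is a minimal sketch in plain Python, not the NNLM implementation. It assumes a squared-error loss summed over observed entries only, marks missing entries with `None`, and updates one coefficient at a time with a non-negativity clamp, which is one simple form of coordinate-wise descent. The names `masked_nmf` and `observed_sse` are hypothetical, and the loops are written for readability rather than for the efficiency the abstract claims.

```python
import random


def observed_sse(A, W, H):
    """Sum of squared errors of W*H against A over observed (non-None) entries."""
    k = len(H)
    total = 0.0
    for i, row in enumerate(A):
        for j, a_ij in enumerate(row):
            if a_ij is not None:
                pred = sum(W[i][b] * H[b][j] for b in range(k))
                total += (a_ij - pred) ** 2
    return total


def masked_nmf(A, k, n_iter=200, seed=0):
    """Approximate A by W*H with W, H >= 0, ignoring entries of A that are None.

    Each coefficient is set to its exact one-dimensional least-squares
    minimizer over the observed entries, then clamped at zero.
    """
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]

    for _ in range(n_iter):
        # Update W one coefficient at a time, holding H fixed.
        for i in range(n):
            obs = [j for j in range(m) if A[i][j] is not None]
            for a in range(k):
                num = den = 0.0
                for j in obs:
                    # Prediction from all components except component a.
                    rest = sum(W[i][b] * H[b][j] for b in range(k) if b != a)
                    num += (A[i][j] - rest) * H[a][j]
                    den += H[a][j] ** 2
                W[i][a] = max(0.0, num / den) if den > 0 else 0.0

        # Update H one coefficient at a time, holding W fixed.
        for j in range(m):
            obs = [i for i in range(n) if A[i][j] is not None]
            for a in range(k):
                num = den = 0.0
                for i in obs:
                    rest = sum(W[i][b] * H[b][j] for b in range(k) if b != a)
                    num += (A[i][j] - rest) * W[i][a]
                    den += W[i][a] ** 2
                H[a][j] = max(0.0, num / den) if den > 0 else 0.0
    return W, H


# Toy example: a small non-negative matrix with three hidden entries.
A = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, None],
    [3.0, None, 9.0],
    [None, 8.0, 12.0],
]
W, H = masked_nmf(A, k=1)
# Imputed value for the hidden entry at row 1, column 2.
imputed = sum(W[1][b] * H[b][2] for b in range(len(H)))
```

Under the same assumptions, a rank could be chosen by hiding a random subset of known entries, fitting for several values of `k`, and comparing the prediction error on the hidden entries; this mirrors the imputation-based rank tuning described in the abstract, though the details of the published procedure are not given here.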

Updated: 2020-01-06