A Bayesian perspective of statistical machine learning for big data
Computational Statistics (IF 1.3), Pub Date: 2020-04-01, DOI: 10.1007/s00180-020-00970-8
Rajiv Sambasivan, Sourish Das, Sujit K. Sahu

Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers discover important features of input data sets, which are often very large. This task of discovering features from data is essentially what the keyword ‘learning’ means in SML. Theoretical justifications for the effectiveness of SML algorithms are underpinned by sound principles from different disciplines, such as Computer Science and Statistics. The theoretical underpinnings justified specifically by statistical inference methods are together termed statistical learning theory. This paper reviews SML from a Bayesian decision-theoretic point of view, in which we argue that many SML techniques are closely connected to making inference using the so-called Bayesian paradigm. We discuss many important SML techniques, such as supervised and unsupervised learning, deep learning, online learning and Gaussian processes, especially in the context of the very large data sets where they are often employed. We present a dictionary that maps the key concepts of SML from Computer Science and Statistics. We illustrate the SML techniques with three moderately large data sets, where we also discuss many practical implementation issues. The review is thus aimed especially at statisticians and computer scientists who aspire to understand and apply SML to moderately large to big data sets.

Updated: 2020-04-01