Deep Learning on Big, Sparse, Behavioral Data
Big Data (IF 4.6) Pub Date: 2019-12-01, DOI: 10.1089/big.2019.0095
Sofie De Cnudde, Yanou Ramon, David Martens, Foster Provost

The outstanding performance of deep learning (DL) for computer vision and natural language processing has fueled increased interest in applying these algorithms more broadly in both research and practice. This study investigates the application of DL techniques to classification of large, sparse behavioral data, which has become ubiquitous in the age of big data collection. We report on an extensive search through DL architecture variants and compare the predictive performance of DL with that of carefully regularized logistic regression (LR), which previously (and repeatedly) has been found to be the most accurate machine learning technique generally for sparse behavioral data. At a high level, we demonstrate that by following recommendations from the literature, researchers and practitioners who are not DL experts can achieve world-class performance using DL. More specifically, we report several findings. As a main result, applying DL to 39 big, sparse behavioral classification tasks demonstrates a significant performance improvement compared with LR. A follow-up result suggests that if one were to choose the best shallow technique (rather than just LR), there still would often be an improvement from using DL, but in that case the magnitude of the improvement might not justify the high cost. Investigating when DL performs better, we find that worse performance is obtained for data sets with low signal-from-noise separability, in line with prior results comparing linear and nonlinear classifiers. Exploring why the deep architectures work well, we show that using the first-layer features learned by DL yields better generalization performance for a linear model than do unsupervised feature-reduction methods (e.g., singular-value decomposition). However, to do well enough to beat well-regularized LR with the original sparse representation, more layers from the deep distributed architecture are needed.
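As a rough sketch of the kind of comparison the abstract describes, the snippet below pits a regularized LR against a small feedforward network on synthetic sparse binary data. This is only an illustration: the data is a stand-in for real behavioral data, and the paper's actual study covers 39 real tasks with an extensive DL architecture search and careful hyperparameter tuning, none of which is reproduced here.

```python
# Illustrative sketch (not the paper's setup): regularized logistic
# regression vs. a small feedforward net on synthetic sparse binary data.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# 5,000 instances x 2,000 binary features, ~1% active -- mimicking the
# sparsity of fine-grained behavioral data (pages visited, items bought).
X = sparse_random(5000, 2000, density=0.01, format="csr", random_state=0)
X.data[:] = 1.0
# Only a small fraction of features carry signal.
w = rng.normal(size=2000) * (rng.random(2000) < 0.05)
y = ((X @ w) + rng.normal(scale=0.5, size=5000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# "Carefully regularized" LR: in practice, tune C on held-out data.
lr = LogisticRegression(C=1.0, max_iter=1000).fit(X_tr, y_tr)
# A small deep model; the paper searches over many architecture variants.
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), alpha=1e-4,
                    max_iter=300, random_state=0).fit(X_tr, y_tr)

auc_lr = roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1])
auc_mlp = roc_auc_score(y_te, mlp.predict_proba(X_te)[:, 1])
print(f"LR AUC:  {auc_lr:.3f}")
print(f"MLP AUC: {auc_mlp:.3f}")
```

On real behavioral data the relative ordering of the two AUCs depends on the data set's signal-from-noise separability, which is exactly the moderating factor the study identifies.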
With respect to interpreting how deep models come to their decisions, we demonstrate how the neurons on the lowest layer of the deep architecture capture nuances from the raw fine-grained features and allow intuitive interpretation. Looking forward, we propose the use of instance-level counterfactual explanations to gain insight into why deep models classify individual data instances the way they do.
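The instance-level counterfactual explanations mentioned above can be sketched as follows, under simplifying assumptions: a greedy search that zeroes out ("removes") the active behaviors whose removal most moves the predicted score, until the predicted class flips. The removed set then explains the prediction ("had the person not done these things, the predicted class would differ"). This is a simplified evidence-counterfactual-style procedure on a toy linear model; the paper's exact explanation method for deep models may differ.

```python
# Hedged sketch: greedy counterfactual explanation for a classifier on
# sparse binary behavioral features. Toy data and model for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = (rng.random((500, 30)) < 0.2).astype(float)  # sparse binary behaviors
w_true = rng.normal(size=30)
y = (X @ w_true > 0).astype(int)
model = LogisticRegression(max_iter=1000).fit(X, y)

def counterfactual(model, x):
    """Greedily remove active features until the predicted class flips.

    Returns the list of removed feature indices, or None if no flip is
    achievable by removing features alone.
    """
    x = x.copy()
    original = model.predict([x])[0]
    removed = []
    while model.predict([x])[0] == original:
        active = np.flatnonzero(x)
        if active.size == 0:
            return None
        # Score each candidate removal by the resulting class-1 probability;
        # push the score away from the original class.
        scores = []
        for j in active:
            trial = x.copy()
            trial[j] = 0.0
            p = model.predict_proba([trial])[0, 1]
            scores.append(p if original == 1 else -p)
        j_best = int(active[int(np.argmin(scores))])
        x[j_best] = 0.0
        removed.append(j_best)
    return removed

expl = counterfactual(model, X[0])
print(f"class {model.predict([X[0]])[0]}, counterfactual removals: {expl}")
```

For linear models this greedy search is cheap; for deep models each candidate removal requires a forward pass, but the same remove-until-flip logic applies.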
