Predicting and explaining behavioral data with structured feature space decomposition
EPJ Data Science (IF 3.6), Pub Date: 2019-06-27, DOI: 10.1140/epjds/s13688-019-0201-0
Peter G. Fennell, Zhiya Zuo, Kristina Lerman

Modeling human behavioral data is challenging due to its scale, sparseness (few observations per individual), heterogeneity (differently behaving individuals), and class imbalance (few observations of the outcome of interest). An additional challenge is learning an interpretable model that not only accurately predicts outcomes, but also identifies important factors associated with a given behavior. To address these challenges, we describe a statistical approach to modeling behavioral data called the structured sum-of-squares decomposition (S3D). The algorithm, which is inspired by decision trees, selects important features that collectively explain the variation of the outcome, quantifies correlations between the features, and bins the subspace of important features into smaller, more homogeneous blocks that correspond to similarly-behaving subgroups within the population. This partitioned subspace allows us to predict and analyze the behavior of the outcome variable both statistically and visually, providing a means to examine the effects of individual features and to create explainable predictions. We apply S3D to learn models of online activity from large-scale data collected from diverse sites, such as Stack Exchange, Khan Academy, Twitter, Duolingo, and Digg. We show that S3D creates parsimonious models that predict outcomes in held-out data at levels comparable to state-of-the-art approaches, while also producing interpretable models that provide insights into behavior. This is important for informing strategies aimed at changing behavior and for designing social systems, and also for explaining predictions, a critical step towards minimizing algorithmic bias.
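The abstract describes S3D only at a high level. Below is a minimal Python sketch of the core idea as we read it: greedily add the feature whose bins most reduce the within-block sum of squares of the outcome (equivalently, raise R² = 1 − SS_within / SS_total of the partition), and stop once the gain falls below a threshold. Equal-frequency binning, global (rather than per-block) bin edges, and the names `greedy_select`, `n_bins` and `min_gain` are assumptions made for illustration; this is not the authors' implementation.

```python
from bisect import bisect_right
from statistics import fmean


def quantile_edges(values, n_bins):
    """Interior cut points splitting `values` into roughly equal-count bins (assumed scheme)."""
    s = sorted(values)
    n = len(s)
    return sorted({s[(k * n) // n_bins] for k in range(1, n_bins)})


def sum_of_squares(ys):
    """Total squared deviation of ys from their mean."""
    if not ys:
        return 0.0
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)


def r_squared(y, labels):
    """Fraction of the variation in y explained by grouping rows by their block label."""
    total = sum_of_squares(y)
    if total == 0:
        return 0.0
    groups = {}
    for yi, lab in zip(y, labels):
        groups.setdefault(lab, []).append(yi)
    within = sum(sum_of_squares(g) for g in groups.values())
    return 1.0 - within / total


def greedy_select(X, y, n_bins=4, min_gain=0.01):
    """Hypothetical greedy selection: X maps feature name -> list of values."""
    labels = [()] * len(y)  # every row starts in a single block
    selected, current = [], 0.0
    remaining = set(X)
    while remaining:
        best = None
        # Sorted iteration makes tie-breaking deterministic.
        for name in sorted(remaining):
            edges = quantile_edges(X[name], n_bins)
            bins = [bisect_right(edges, v) for v in X[name]]
            # Refine the current partition by crossing it with this feature's bins.
            trial = [lab + (b,) for lab, b in zip(labels, bins)]
            score = r_squared(y, trial)
            if best is None or score > best[0]:
                best = (score, name, trial)
        score, name, trial = best
        if score - current < min_gain:
            break  # the best remaining feature adds too little explained variance
        selected.append(name)
        labels, current = trial, score
        remaining.remove(name)
    return selected, current


if __name__ == "__main__":
    X = {"a": [1, 2, 3, 4, 5, 6, 7, 8], "noise": [5, 1, 4, 2, 8, 3, 7, 6]}
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(greedy_select(X, y))  # → (['a'], 1.0)
```

In this toy example the outcome depends only on `a`, so `a` alone yields perfectly homogeneous blocks and `noise` adds no further explained variance. The per-block rebinning and the feature-correlation analysis mentioned in the abstract are not modeled here.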

Updated: 2019-06-27