GraphBoot: Quantifying Uncertainty in Node Feature Learning on Large Networks, IEEE Transactions on Knowledge and Data Engineering



GraphBoot: Quantifying Uncertainty in Node Feature Learning on Large Networks
IEEE Transactions on Knowledge and Data Engineering (IF 8.9), Pub Date: 2021-01-01, DOI: 10.1109/tkde.2019.2925355
Cuneyt Akcora, Yulia R. Gel, Murat Kantarcioglu, Vyacheslav Lyubchich, Bhavani Thuraisingham

In recent years, as online social networks continue to grow in size, estimating node features, such as sociodemographics, preferences, and health status, in a scalable and reliable way has become a primary research direction in social network mining. Although many techniques have been developed for estimating various node features, quantifying uncertainty in such estimations has received little attention. Furthermore, most existing methods study networks parametrically, which limits insights about the necessary quantity of queried data, reliable feature estimation, and estimator uncertainty. Uncertainty quantification is critical for answering key questions such as: given limited availability of social network data, how much data should be queried from the network? Which node features can be learned reliably? More importantly, how can we evaluate the uncertainty of our estimators? Uncertainty quantification is not equivalent to network sampling but constitutes a key complementary concept to sampling and the associated reliability analysis. To our knowledge, this paper is the first work that sheds light on uncertainty quantification and uncertainty propagation in social network feature mining. We propose a novel non-parametric bootstrap method for uncertainty analysis of node features in social network mining, derive its asymptotic properties, and demonstrate its effectiveness with extensive experiments. Furthermore, we develop a new metric based on dispersion of estimations, enabling analysts to assess how much more information is needed for increasing prediction reliability based on the estimated uncertainty. We demonstrate the effectiveness of our new uncertainty quantification methodology with extensive experiments on real-life social networks, and a case study of mental health on Twitter.
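As a rough illustration of the general idea of bootstrap-based uncertainty quantification, the sketch below applies a plain percentile bootstrap to a node feature (here, degree) observed on a sample of nodes. It is not the GraphBoot algorithm, which the abstract does not specify; the function name `bootstrap_ci`, the sample values, and the use of interval width as a dispersion measure are all assumptions made for this example and do not reproduce the paper's metric.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of a sampled node feature.

    Hypothetical helper for illustration only; this is a generic
    non-parametric bootstrap, not the method proposed in the paper.
    """
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample the observed nodes with replacement, same size as the sample.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    # Interval width serves here as a simple stand-in for dispersion.
    return statistics.fmean(values), lower, upper, upper - lower


# Made-up degrees for 12 queried nodes (assumed data, not from the paper).
sampled_degrees = [3, 5, 2, 8, 4, 6, 1, 7, 3, 5, 9, 2]
estimate, lower, upper, width = bootstrap_ci(sampled_degrees)
print(f"mean degree ~ {estimate:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

A wide interval suggests that the feature is not yet learned reliably from the queried nodes and that querying more data may help, which is the kind of question the abstract raises.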


Updated: 2021-01-01
