Learning social networks from text data using covariate information
Statistical Methods & Applications (IF 1.1), Pub Date: 2021-09-18, DOI: 10.1007/s10260-021-00586-2
Xiaoyi Yang, Nynke M. D. Niezink, Rebecca Nugent

Accurately describing the lives of historical figures can be challenging, but unraveling their social structures is perhaps even more so. Historical social network analysis methods can help in this regard and may even illuminate individuals who have been overlooked by historians but turn out to be influential social connection points. Text data, such as biographies, are a useful source of information for learning historical social networks, but identifying links from text data can be challenging. The Local Poisson Graphical Lasso model represents social networks by conditional independence structures and leverages the number of name co-mentions in the text to infer relationships. However, this method does not take into account the abundance of covariate information that is often available in text data. Conditional independence models such as the Poisson Graphical Model, which use name mention counts in the text, can help avoid false positive links due to co-mentions. Yet given the historical prevalence of frequently used or common names, without additional distinguishing information we may still introduce incorrect connections. In this work, we therefore extend the Local Poisson Graphical Lasso model with a (multiple) penalty structure that incorporates covariates, opening up the opportunity for similar individuals to have a higher probability of being connected. We propose both greedy and Bayesian approaches to estimate the penalty parameters. We present results on data simulated with characteristics of historical networks and show that this type of penalty structure can improve network recovery as measured by precision and recall. We also illustrate the approach on biographical data of individuals who lived in early modern Britain between 1500 and 1575.
We show through simulations how these covariates affect the statistical model's performance, and discuss how they help to better identify links for people with common names and for those who are traditionally underrepresented in biographical text data.
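
To make the idea of a covariate-dependent penalty more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a neighborhood-selection reading of the Local Poisson Graphical Lasso: each person's mention counts are regressed on everyone else's counts with an L1-penalized Poisson regression, and an edge is kept when both directions yield a nonzero coefficient. The two-level penalty (`lam_same` for pairs sharing a covariate value, `lam_diff` otherwise), the proximal-gradient solver, and all function names are illustrative assumptions. The greedy and Bayesian penalty estimation described in the abstract is not shown.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of the (weighted) L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def poisson_nll(X, y, beta, b0):
    """Mean Poisson negative log-likelihood (up to a constant), log link."""
    eta = X @ beta + b0
    with np.errstate(over="ignore"):  # overflow -> inf, rejected by line search
        return np.mean(np.exp(eta) - y * eta)


def weighted_l1_poisson(X, y, penalties, n_iter=300):
    """Proximal gradient with backtracking for
    min_beta  nll(beta, b0) + sum_k penalties[k] * |beta_k|.
    The intercept b0 is not penalized."""
    n, p = X.shape
    beta = np.zeros(p)
    b0 = np.log(y.mean() + 1e-8)
    for _ in range(n_iter):
        resid = np.exp(X @ beta + b0) - y
        g_beta = X.T @ resid / n
        g_b0 = resid.mean()
        f_old = poisson_nll(X, y, beta, b0)
        t = 1.0
        while True:
            new_beta = soft_threshold(beta - t * g_beta, t * penalties)
            new_b0 = b0 - t * g_b0
            d_beta, d_b0 = new_beta - beta, new_b0 - b0
            # Quadratic upper bound used as the sufficient-decrease test.
            bound = (f_old + g_beta @ d_beta + g_b0 * d_b0
                     + (d_beta @ d_beta + d_b0 ** 2) / (2.0 * t))
            if poisson_nll(X, y, new_beta, new_b0) <= bound or t < 1e-12:
                break
            t *= 0.5
        beta, b0 = new_beta, new_b0
    return beta


def covariate_penalties(groups, lam_same, lam_diff):
    """Hypothetical penalty structure: pairs sharing a covariate value
    get lam_same, all other pairs get lam_diff."""
    groups = np.asarray(groups)
    same = groups[:, None] == groups[None, :]
    return np.where(same, lam_same, lam_diff)


def fit_network(counts, penalty_matrix, n_iter=300):
    """Node-wise regressions; keep an edge only if both directions are
    nonzero (the 'AND' rule, chosen here as an assumption)."""
    n, p = counts.shape
    coef = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        coef[j, others] = weighted_l1_poisson(
            counts[:, others], counts[:, j], penalty_matrix[j, others], n_iter
        )
    nonzero = np.abs(coef) > 1e-8
    return nonzero & nonzero.T
```

Here `counts` would be a documents-by-people matrix of name mention counts, and `groups` a single categorical covariate per person. Choosing `lam_same` smaller than `lam_diff` makes links between similar individuals easier to retain, which is the intuition the abstract describes.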




Updated: 2021-09-19