Characterizing the hypergraph-of-entity and the structural impact of its extensions
Applied Network Science (IF 1.3), Pub Date: 2020-10-27, DOI: 10.1007/s41109-020-00320-z
José Devezas, Sérgio Nunes

The hypergraph-of-entity is a joint representation model for terms, entities and their relations, used as an indexing approach in entity-oriented search. In this work, we characterize the structure of the hypergraph at both microscopic and macroscopic scales, as well as over time as the number of documents increases. We use a random-walk-based approach to estimate shortest distances and node sampling to estimate clustering coefficients. We also propose the calculation of a general mixed hypergraph density measure based on the corresponding bipartite mixed graph. We analyze these statistics for the hypergraph-of-entity, finding that hyperedge-based node degrees follow a power law, while node-based node degrees and hyperedge cardinalities are log-normally distributed. We also find that most statistics tend to converge after an initial period of accentuated growth in the number of documents. We then repeat the analysis over three extensions (materialized through synonym, context, and tf_bin hyperedges) in order to assess their structural impact on the hypergraph. Finally, we focus on the application-specific aspects of the hypergraph-of-entity, in the domain of information retrieval. We analyze the correlation between retrieval effectiveness and the structural features of the representation model, proposing ranking and anomaly indicators as useful guides for modifying or extending the hypergraph-of-entity.
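
As a rough illustration of two of the statistics named in the abstract, the sketch below computes a density from the node-hyperedge incidence (bipartite) graph and estimates a mean clustering coefficient from a node sample. It is a simplification under stated assumptions, not the authors' method: the hypergraph is treated as undirected (the paper's measure covers mixed hypergraphs), two nodes count as neighbors when they share a hyperedge, and the function names `bipartite_density` and `sampled_clustering` are hypothetical.

```python
import random
from itertools import combinations


def bipartite_density(hyperedges, num_nodes):
    """Density of the node-hyperedge incidence (bipartite) graph.

    Assumption: undirected hypergraph, each hyperedge is a set of node ids.
    Density = observed incidences / possible incidences.
    """
    if num_nodes == 0 or not hyperedges:
        return 0.0
    incidences = sum(len(edge) for edge in hyperedges)
    return incidences / (num_nodes * len(hyperedges))


def sampled_clustering(hyperedges, sample_size, seed=0):
    """Estimate the mean local clustering coefficient from a node sample.

    Assumption: two nodes are neighbors when they share a hyperedge.
    """
    neighbors = {}
    for edge in hyperedges:
        for u in edge:
            neighbors.setdefault(u, set()).update(v for v in edge if v != u)

    nodes = sorted(neighbors)
    if not nodes:
        return 0.0

    rng = random.Random(seed)
    sample = rng.sample(nodes, min(sample_size, len(nodes)))

    total = 0.0
    for u in sample:
        nbrs = neighbors[u]
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        # Count neighbor pairs that are themselves connected.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        total += links / (k * (k - 1) / 2)
    return total / len(sample)


# Toy hypergraph with 4 nodes and 2 hyperedges (illustrative data only).
toy = [{1, 2, 3}, {3, 4}]
density = bipartite_density(toy, num_nodes=4)
clustering = sampled_clustering(toy, sample_size=4)
```

Sampling keeps the cost bounded on large graphs, at the price of an estimate whose variance depends on the sample size; here the sample covers all four nodes, so the result equals the exact mean.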



Updated: 2020-10-30