Effect of class imbalance in heterogeneous network embedding: An empirical study
Journal of Informetrics (IF 3.4), Pub Date: 2020-02-07, DOI: 10.1016/j.joi.2020.101009
Akash Anil, Sanasam Ranbir Singh

Network science has been extensively applied to bibliometrics tasks such as co-authorship prediction, author classification, author clustering, author ranking, and paper ranking. While the majority of past studies exploit homogeneous bibliographic networks (consisting of a single type of node and a single type of edge), recent work increasingly models heterogeneous bibliographic entities and their inter-dependencies using heterogeneous information networks (HINs). Unlike a homogeneous bibliographic network, a bibliographic HIN consists of multi-typed nodes such as Author, Paper, and Venue, together with the corresponding relations. A bibliographic HIN is therefore more complex: it captures the rich semantics of the underlying bibliographic data, but it also poses more challenges. Since a real-world HIN may have a different number of instances for each node type, class imbalance is ubiquitous. Recent studies discuss class imbalance only briefly and exploit meta-path-based strategies to address it. However, no existing work quantitatively studies the effect of class imbalance on real-world bibliometrics tasks. This paper therefore first proposes a metric to estimate class imbalance in a HIN and then studies the effects of class imbalance on two bibliometrics tasks, namely (i) co-authorship prediction and (ii) classification of an author's research area, using node features generated by network embedding-based frameworks on the DBLP dataset. The experimental analyses show that class imbalance is an inherent characteristic of bibliographic HINs and that, for better performance on the above-mentioned tasks, a bibliographic HIN must include Author, Paper, and Venue as node types.
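
To make the notion of class imbalance across node types concrete, here is a minimal Python sketch. It does not reproduce the metric proposed in the paper, which the abstract does not specify. Instead it uses two generic stand-ins, a max/min count ratio and a normalized Shannon entropy over node-type counts. The function names and the toy network are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def node_type_counts(node_types):
    """Count how many nodes of each type the network contains.

    `node_types` maps a node id to its type label (e.g. "Author").
    """
    return Counter(node_types.values())


def imbalance_ratio(counts):
    """Ratio of the largest to the smallest node-type count (1.0 = balanced)."""
    return max(counts.values()) / min(counts.values())


def normalized_entropy(counts):
    """Shannon entropy of the type distribution, scaled to [0, 1].

    1.0 means all node types are equally frequent; values closer to 0
    mean a few types dominate. This is an illustrative stand-in, NOT
    the paper's metric.
    """
    total = sum(counts.values())
    k = len(counts)
    if k < 2:
        return 1.0  # a single type is trivially "balanced"
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(k)


# Hypothetical toy bibliographic HIN: 4 authors, 2 papers, 1 venue.
toy_node_types = {
    "a1": "Author", "a2": "Author", "a3": "Author", "a4": "Author",
    "p1": "Paper", "p2": "Paper",
    "v1": "Venue",
}

toy_counts = node_type_counts(toy_node_types)
toy_ratio = imbalance_ratio(toy_counts)
toy_entropy = normalized_entropy(toy_counts)
```

In this toy network the Author type has four times as many instances as the Venue type, so the ratio is well above 1 and the normalized entropy falls below 1. Both indicate an imbalanced type distribution of the kind the abstract describes.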

Updated: 2020-02-07