A survey on different dimensions for graphical keyword extraction techniques
Artificial Intelligence Review (IF 10.7), Pub Date: 2021-04-23, DOI: 10.1007/s10462-021-10010-6
Muskan Garg

The shift from offline to online activities, driven by the social disruption of the COVID-19 pandemic lockdown, has led to an increase in online economic and social activity. In this regard, Automatic Keyword Extraction (AKE) from textual data has become even more interesting because of its applications across different domains of Natural Language Processing (NLP). Graphical Keyword Extraction Techniques (GKET) in the literature use a Graph of Words (GoW) for analysis along different dimensions. This article studies these dimensions for GKET, namely the GoW representation, the statistical properties of GoW, the stability of the structure of GoW, the diversity of approaches over GoW for GKET, and the ranking of nodes in GoW. To elucidate these dimensions, a comprehensive survey of GKET is carried out across different domains to draw inferences from the existing literature. These inferences are used to lay down possible research directions for interdisciplinary studies of network science and NLP. In addition, the experimental results are analysed to compare and contrast the existing GKET over 21 different datasets, to analyse Word Co-occurrence Networks (WCN) for 15 different languages, and to study the structure of WCN for different genres. The article identifies some strong correspondences between approaches from different disciplines for particular dimensions, namely, GoW representation: 'Line Graphs' and 'Bigram Words Graphs'; feature extraction and selection using eigenvalues: 'Random Walk' and 'Spectral Clustering'. Observations on the need to integrate multiple dimensions have opened new research directions in the interdisciplinary field of network science and NLP, applicable to handling streaming data and language-independent NLP.
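To make the GoW and node-ranking dimensions concrete, the following is a minimal sketch of one common family of graph-based keyword extractors: a sliding-window word co-occurrence graph ranked with a PageRank-style random walk. It is an illustration only, not the method of any specific technique covered by the survey. The function names, the window size, the damping factor, and the iteration count are all assumptions chosen for the example.

```python
import re
from collections import defaultdict


def build_graph_of_words(text, window=2, stopwords=frozenset()):
    """Build an undirected, weighted word co-occurrence graph.

    Nodes are words; two words are linked when they fall inside the same
    sliding window of `window` consecutive tokens. The edge weight counts
    how often that happens. (Tokenisation here is a naive assumption.)
    """
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        # The next (window - 1) tokens share a window with `word`.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def rank_nodes(graph, damping=0.85, iterations=50):
    """Score nodes with a weighted PageRank-style random walk."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    score = {node: 1.0 / n for node in nodes}
    # Total edge weight per node; always > 0 because nodes are only
    # created together with an edge.
    strength = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            incoming = sum(
                score[nb] * weight / strength[nb]
                for nb, weight in graph[node].items()
            )
            updated[node] = (1.0 - damping) / n + damping * incoming
        score = updated
    return score


def top_keywords(text, k=5, window=2, stopwords=frozenset()):
    """Return the k highest-scoring words as keyword candidates."""
    scores = rank_nodes(build_graph_of_words(text, window, stopwords))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Calling `top_keywords(document_text, k=10)` on a document string returns its ten highest-ranked words. Changing `window`, or swapping the random walk for another centrality measure, corresponds to varying the representation and ranking dimensions discussed in the abstract.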




Updated: 2021-04-23