Text representation model of scientific papers based on fusing multi-viewpoint information and its quality assessment
Scientometrics (IF 3.9), Pub Date: 2021-06-23, DOI: 10.1007/s11192-021-04028-4
Yonghe Lu , Jiayi Luo , Ying Xiao , Hou Zhu

Text representation is a prerequisite for in-depth analysis and mining of the information in scientific papers, and it directly affects the performance of downstream tasks such as scientific paper classification, clustering, and similarity calculation. However, recent research has mainly considered citation networks and partial structural information, which is insufficient for representing scientific papers. To improve the performance of text representation, this paper therefore proposes MV-HATrans, a text representation model that fuses multi-viewpoint information, such as the semantic information of a knowledge graph and structural information. The model extracts word information from three aspects: contextual content, part of speech, and word meaning from WordNet. By combining a hierarchical attention mechanism with a transformer, the model produces a full-text representation of scientific papers. Finally, the paper applies the proposed text representation model to automatic quality assessment on AAPR, a binary dataset that labels whether each scientific paper was accepted. Results show that, in the quality classification of scientific papers, adding part-of-speech information and semantic information based on WordNet definitions achieves a prediction accuracy of 70.14%. Among all the structural modules, the authors and the abstract contribute the most to quality classification, with the authors accounting for 9.51%.
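The abstract describes words represented from three views (context, part of speech, WordNet meaning) and then aggregated hierarchically over a paper's structural modules. The sketch below is a hypothetical, minimal illustration of that idea, not the authors' MV-HATrans implementation. It assumes the three views are fused by plain concatenation, replaces the learned attention scoring (and the transformer encoder, which is omitted entirely) with a dot product against fixed query vectors, and uses made-up toy vectors. All names (`concat_views`, `attention_pool`, `encode_module`, `encode_paper`, `accept_probability`) are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def concat_views(context: Vector, pos: Vector, gloss: Vector) -> Vector:
    """Fuse three word views (context, POS, WordNet gloss) by concatenation."""
    # Assumption: simple concatenation; the paper's actual fusion may differ.
    return list(context) + list(pos) + list(gloss)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: Sequence[Vector], query: Vector) -> Vector:
    """Weighted sum of vectors, with weights = softmax(dot(vector, query))."""
    # Simplification: HAN-style attention normally scores a learned projection.
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def encode_module(sentences: Sequence[Sequence[Vector]],
                  word_query: Vector, sent_query: Vector) -> Vector:
    """Hierarchical attention: words -> sentence vectors -> one module vector."""
    sentence_vecs = [attention_pool(words, word_query) for words in sentences]
    return attention_pool(sentence_vecs, sent_query)


def encode_paper(modules: Dict[str, Sequence[Sequence[Vector]]],
                 word_query: Vector, sent_query: Vector,
                 module_query: Vector) -> Vector:
    """Attend over structural modules (e.g. title, authors, abstract)."""
    module_vecs = [encode_module(sents, word_query, sent_query)
                   for _, sents in sorted(modules.items())]
    return attention_pool(module_vecs, module_query)


def accept_probability(doc: Vector, weights: Vector, bias: float = 0.0) -> float:
    """Logistic score for the binary accepted / rejected label."""
    return 1.0 / (1.0 + math.exp(-(dot(doc, weights) + bias)))


# Toy usage with made-up 1-d views per word (3-d after fusion).
w1 = concat_views([1.0], [0.0], [0.5])
w2 = concat_views([0.0], [1.0], [0.5])
paper = {"abstract": [[w1, w2]], "authors": [[w1]]}
q = [1.0, 1.0, 1.0]
doc = encode_paper(paper, q, q, q)
print(doc)  # → [0.75, 0.25, 0.5]
print(accept_probability(doc, [0.0, 0.0, 0.0]))  # → 0.5
```

Keeping a separate vector per structural module is one convenient way to probe module contributions like those reported above: drop a key such as `"authors"` from `paper`, retrain, and compare accuracy. In a trained model, the query vectors and classifier weights would be learned parameters rather than the fixed toy values used here.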




Updated: 2021-07-19