Deep-profiling: a deep neural network model for scholarly Web user profiling
Cluster Computing (IF 4.4), Pub Date: 2021-06-09, DOI: 10.1007/s10586-021-03315-2
Weiwei Lin, Haojun Xu, Jianzhuo Li, Ziming Wu, Zhengyang Hu, Victor Chang, James Z. Wang

Scholarly big data refers to the rapidly growing body of scholarly information, including large numbers of authors and papers and massive-scale scholarly networks. Extracting profile attributes for Web users is an important step in Web user analysis. For scholarly Web users, profile attribute extraction should integrate multi-source, heterogeneous information resources. However, traditional extraction models have two main drawbacks: (1) they require manual feature selection based on domain-specific knowledge; (2) they cannot adapt to the diversity of scholarly Web pages and cannot discover relationships between target entities that are far apart in different domains. To address these issues, we propose a profile attribute extraction model, PAE-NN, based on a Bi-LSTM-CRF neural network. The model automatically extracts the features and contextual representations of each entity to be extracted through a recurrent neural network trained end to end. It exploits the long-term memory of the LSTM network to capture long-range dependencies among the entities to be extracted. Our experimental results on published datasets from the SMPCUP2017 Open Academic Competition and Aminer demonstrate that, with large-scale training data, the proposed PAE-NN model outperforms existing models in extraction precision, recall, and F1-score.
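
The abstract gives no implementation details for PAE-NN. As a rough illustration of what the CRF part of a generic Bi-LSTM-CRF tagger contributes, the sketch below shows standard Viterbi decoding over per-token emission scores (which a Bi-LSTM would normally produce) and tag-to-tag transition scores. The function name, the three-tag scheme (`O`, `B-NAME`, `I-NAME`), and all numeric scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# Hypothetical tag ids for illustration only (not from the paper).
O, B_NAME, I_NAME = 0, 1, 2


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]   : score of tag j at token t (assumed Bi-LSTM output).
    transitions[i][j] : score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # best[j] is the best score of any path that ends in tag j so far.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_tags):
            # Previous tag that maximizes the path score into tag j.
            prev = max(range(num_tags),
                       key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the backpointers.
    tag = max(range(num_tags), key=lambda j: best[j])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    path.reverse()
    return path


# Made-up transition scores: O -> I-NAME is heavily penalized, because an
# "inside" tag should not directly follow an "outside" tag.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, -1.0, 1.0],   # from B-NAME
    [0.0, 0.0, 0.5],    # from I-NAME
]

# Made-up emission scores for a three-token sequence.
tags = viterbi_decode(
    [[0.1, 2.0, 0.0], [0.5, 0.0, 1.0], [2.0, 0.0, 0.1]],
    TRANSITIONS,
)
```

Decoding the whole sequence jointly, rather than taking the best tag for each token independently, is what lets the transition scores rule out inconsistent label sequences.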




Updated: 2021-06-09