Neural ParsCit: a deep learning-based reference string parser

International Journal on Digital Libraries (IF 1.6), Pub Date: 2018-05-19, DOI: 10.1007/s00799-018-0242-1
Animesh Prasad, Manpreet Kaur, Min-Yen Kan

We present a deep learning approach for the core digital libraries task of parsing bibliographic reference strings. We deploy the state-of-the-art long short-term memory (LSTM) neural network architecture, a variant of recurrent neural networks, to capture long-range dependencies in reference strings. We explore word embeddings and character-based word embeddings as an alternative to handcrafted features. We incrementally experiment with features, architectural configurations, and the diversity of the dataset. Our final model is an LSTM-based architecture that layers a linear-chain conditional random field (CRF) over the LSTM output. In extensive experiments on English in-domain (computer science) and out-of-domain (humanities) test cases, as well as on multilingual data, our results show a significant gain ($p < 0.01$) over the previously reported state-of-the-art CRF-only parser.
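The final model places a linear-chain CRF over per-token LSTM outputs. At inference time a layer like this is usually decoded with the Viterbi algorithm, which selects the single highest-scoring label sequence rather than picking each token's label independently. The following is a minimal pure-Python sketch under stated assumptions: the emission scores stand in for LSTM outputs, and the function name `viterbi_decode`, the label set, the tokens, and every number are hypothetical illustrations, not values or code from the paper or from the Neural ParsCit implementation.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring label path of a linear-chain CRF and its score.

    emissions[t][j]   : score of label j at token t (assumed to come from an LSTM)
    transitions[i][j] : score of moving from label i to label j
    start[j]          : score of starting the sequence with label j
    """
    n_labels = len(start)
    # best[j] = best score of any partial path ending in label j at the current token
    best = [start[j] + emissions[0][j] for j in range(n_labels)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_labels):
            # Pick the predecessor label i that maximises the score of a path into j.
            i_best = max(range(n_labels), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Start from the best final label and follow the back-pointers to the first token.
    last = max(range(n_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical example: three field labels and a four-token reference fragment.
labels = ["author", "title", "date"]
tokens = ["Kan", "Neural", "ParsCit", "2018"]
emissions = [
    [2.0, 0.0, 0.0],  # Kan
    [0.0, 2.0, 0.0],  # Neural
    [1.2, 1.0, 0.0],  # ParsCit: emissions alone slightly prefer "author"
    [0.0, 0.0, 3.0],  # 2018
]
transitions = [
    [1.0, 0.5, -1.0],   # from author
    [-2.0, 1.0, 0.5],   # from title (title -> author is penalised)
    [-2.0, -2.0, 1.0],  # from date
]
start = [0.0, 0.0, 0.0]

path, score = viterbi_decode(emissions, transitions, start)
tagged = [labels[k] for k in path]
print(list(zip(tokens, tagged)), score)
# → [('Kan', 'author'), ('Neural', 'title'), ('ParsCit', 'title'), ('2018', 'date')] 10.0
```

Taken token by token, the emission scores would tag "ParsCit" as `author` (1.2 > 1.0), but the transition scores (a bonus for title → title and a penalty for title → author) lead the decoder to keep it inside the title span. Enforcing this kind of label-to-label consistency is the usual motivation for stacking a CRF over an LSTM.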

Updated: 2018-05-19