An approach for detecting the commonality and specialty between scientific publications and patents
Scientometrics (IF 3.9), Pub Date: 2021-07-05, DOI: 10.1007/s11192-021-04085-9
Shuo Xu¹, Ling Li¹, Liyuan Hao¹, Xin An², Guancan Yang³

Scientific publications and patents are usually viewed as proxies for scientific research and technical development, respectively. Considerable effort has been spent on establishing topic linkages between science and technology using lexical- or topic-based approaches. However, because scholarly articles and patents are heterogeneous in purpose, mode of expression, and quality, the performance of these approaches has not been satisfactory. To understand the difficulties of topic linkage and improve performance, a framework is proposed to detect the commonality and specialty between scientific publications and patents from two perspectives: linguistic characteristics and thematic structures. Extensive experiments on the DrugBank dataset reveal five commonalities and five significant differences in linguistic characteristics. For example, nouns are the most frequently used part of speech in both, and scientific publications contain more word tokens than patent documents, but patents usually have longer sentences and use more clauses. Meanwhile, common and distinctive thematic structures are also uncovered between scientific publications and patents. Themes of general description in the pharmaceutical field are shared by the two heterogeneous sources. Scientific publications tend to explain disease mechanisms and medication content, whereas patents are biased toward the preparation and practical application of drugs.
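
Two of the linguistic measures mentioned above, word tokens per document and average sentence length, can be illustrated with a small sketch. This is not the authors' pipeline: the regex tokenizer and sentence splitter are naive assumptions (a real study would use a proper tokenizer, sentence segmenter, POS tagger and parser, which clause counts would require), the function names `doc_stats` and `corpus_profile` are hypothetical, and the toy texts are invented, not drawn from DrugBank.

```python
import re
from statistics import mean

# Naive sentence splitter and tokenizer: assumptions for illustration only.
# Splitting happens after ., ! or ? followed by whitespace (abbreviations
# such as "e.g." will be split incorrectly).
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
TOKEN = re.compile(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*")


def doc_stats(text):
    """Return (total word tokens, mean tokens per sentence) for one document."""
    sentences = [s for s in SENTENCE_SPLIT.split(text.strip()) if s]
    tokens_per_sentence = [len(TOKEN.findall(s)) for s in sentences]
    total = sum(tokens_per_sentence)
    avg_len = total / len(sentences) if sentences else 0.0
    return total, avg_len


def corpus_profile(docs):
    """Average the per-document statistics over a non-empty corpus."""
    stats = [doc_stats(d) for d in docs]
    return {
        "mean_tokens_per_doc": mean(s[0] for s in stats),
        "mean_sentence_length": mean(s[1] for s in stats),
    }


# Hypothetical toy corpora, not real publications or patents.
papers = ["Drug X inhibits enzyme Y. The mechanism remains unclear. Further trials are needed."]
patents = ["A process comprising mixing compound X with water and heating the mixture."]

print("papers:", corpus_profile(papers))
print("patents:", corpus_profile(patents))
```

Comparing the two profiles side by side is the kind of contrast the abstract reports (more tokens per publication, longer sentences in patents); the toy numbers here only demonstrate the computation and carry no evidential weight.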

Updated: 2021-07-05