当前位置: X-MOL 学术BMC Med. Inform. Decis. Mak. › 论文详情
Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)
RCorp: a resource for chemical disease semantic extraction in Chinese.
BMC Medical Informatics and Decision Making ( IF 3.5 ) Pub Date : 2019-12-05 , DOI: 10.1186/s12911-019-0936-3
Yueping Sun 1 , Li Hou 1 , Lu Qin 1 , Yan Liu 1 , Jiao Li 1 , Qing Qian 1
Affiliation  

BACKGROUND To robustly identify synergistic combinations of drugs, high-throughput screenings are desirable. It will be of great help to automatically identify the relations in the published papers with machine learning based tools. To support the chemical disease semantic relation extraction especially for chronic diseases, a chronic disease specific corpus for combination therapy discovery in Chinese (RCorp) is manually annotated. METHODS In this study, we extracted abstracts from a Chinese medical literature server and followed the annotation framework of the BioCreative CDR corpus, with the guidelines modified to make the combination therapy related relations available. An annotation tool was incorporated to the standard annotation process. RESULTS The resulting RCorp consists of 339 Chinese biomedical articles with 2367 annotated chemicals, 2113 diseases, 237 symptoms, 164 chemical-induce-disease relations, 163 chemical-induce-symptom relations, and 805 chemical-treat-disease relations. Each annotation includes both the mention text spans and normalized concept identifiers. The corpus gets an inter-annotator agreement score of 0.883 for chemical entities, 0.791 for disease entities which are measured by F score. And the F score for chemical-treat-disease relations gets 0.788 after unifying the entity mentions. CONCLUSIONS We extracted and manually annotated a chronic disease specific corpus for combination therapy discovery in Chinese. The result analysis of the corpus proves its quality for the combination therapy related knowledge discovery task. Our annotated corpus would be a useful resource for the modelling of entity recognition and relation extraction tools. In the future, an evaluation based on the corpus will be held.
更新日期:2019-12-05
down
wechat
bug