Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner.
BMC Medical Informatics and Decision Making (IF 3.3), Pub Date: 2019-12-23, DOI: 10.1186/s12911-019-0979-5
Robinette Renner 1, Shengyu Li 2, Yulong Huang 3, Ada Chaeli van der Zijp-Tan 3, Shaobo Tan 2, Dongqi Li 2, Mohan Vamsi Kasukurthi 2, Ryan Benton 2, Glen M Borchert 4, Jingshan Huang 2, 5, Guoqian Jiang 6

BACKGROUND The medical community uses a variety of data standards for both clinical and research reporting needs. ISO 11179 Common Data Elements (CDEs) represent one such standard that provides robust data point definitions. Another standard is the Biomedical Research Integrated Domain Group (BRIDG) model, a domain analysis model that provides a contextual framework for biomedical and clinical research data. Mapping the CDEs to the BRIDG model is important; in particular, it can facilitate mapping the CDEs to other standards. Unfortunately, manual mapping, which is the current method for creating the CDE mappings, is error-prone and time-consuming; this creates a significant barrier for researchers who use CDEs.
METHODS In this work, we developed a semi-automated algorithm to map CDEs to likely BRIDG classes. First, we extended and improved our previously developed artificial neural network (ANN) alignment algorithm. We then used a collection of 1284 CDEs with robust mappings to BRIDG classes as the gold standard to train the network and obtain appropriate weights for six CDE attributes. Afterward, we calculated the similarity between a CDE and each BRIDG class. Finally, the algorithm produces a list of candidate BRIDG classes to which the CDE of interest may belong.
RESULTS For CDEs semantically similar to those used in training, a match rate of over 90% was achieved. For those partially similar, a match rate of 80% was obtained, and for those with drastically different semantics, a match rate of up to 70% was achieved.
DISCUSSION Our semi-automated mapping process reduces the burden on domain experts. The weights of all six attributes are significant. Experimental results indicate that the availability of training data is more important than the semantic similarity of the testing data to the training data. We addressed overfitting by selecting CDEs randomly and adjusting the ratio of training to verification samples.
CONCLUSIONS Experimental results on real-world use cases demonstrate the effectiveness and efficiency of our proposed methodology in mapping CDEs to BRIDG classes, both for CDEs seen before and for new, unseen CDEs. In addition, the approach reduces the mapping burden and improves mapping quality.
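
The scoring and ranking step described under METHODS can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions: the abstract does not name the six attributes, so the names in `ATTRIBUTES` are hypothetical; token-level Jaccard overlap stands in for whatever similarity measure the paper uses; and the weights are fixed placeholders, whereas the paper learns them with an ANN from the 1284 gold-standard mappings.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute names: the abstract says six CDE attributes are
# weighted but does not list them.
ATTRIBUTES = ("long_name", "definition", "object_class",
              "property", "value_domain", "concept")

Record = Dict[str, str]


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap; a stand-in for the paper's similarity measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def weighted_similarity(cde: Record, bridg_class: Record,
                        weights: Dict[str, float]) -> float:
    """Weighted average of per-attribute similarities (weights assumed non-negative)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = sum(w * jaccard(cde.get(attr, ""), bridg_class.get(attr, ""))
                for attr, w in weights.items())
    return score / total


def rank_candidates(cde: Record, classes: Dict[str, Record],
                    weights: Dict[str, float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k class names with their scores, best first."""
    scored = [(name, weighted_similarity(cde, record, weights))
              for name, record in classes.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Placeholder weights; in the paper these come from ANN training.
    weights = {attr: 1.0 for attr in ATTRIBUTES}
    # Made-up records for illustration only, not taken from the paper's data.
    cde = {"long_name": "Adverse Event Onset Date",
           "definition": "date the adverse event began"}
    classes = {
        "AdverseEvent": {"long_name": "adverse event",
                         "definition": "an unfavorable event experienced by a subject"},
        "StudySubject": {"long_name": "study subject",
                         "definition": "a person participating in a study"},
    }
    for name, score in rank_candidates(cde, classes, weights):
        print(f"{name}: {score:.3f}")
```

Returning a short ranked list instead of a single best match mirrors the semi-automated design: the algorithm proposes candidate classes and a domain expert makes the final choice.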

Updated: 2019-12-23