A novel framework for horizontal and vertical data integration in cancer studies with application to survival time prediction models.
Biology Direct (IF 5.5), Pub Date: 2019-11-21, DOI: 10.1186/s13062-019-0249-6
Iliyan Mihaylov 1 , Maciej Kańduła 2, 3 , Milko Krachunov 1 , Dimitar Vassilev 1

BACKGROUND High-throughput technologies have recently been used extensively alongside clinical tests to study various types of cancer. Data generated in such large-scale studies are heterogeneous, differing in type and format. Because effective integration strategies are lacking, novel models are needed for efficient and operative data integration, in which clinical and molecular information can be joined effectively for storage, access and ease of use. Such models, combined with machine learning methods for accurate prediction of survival time in cancer studies, can yield novel insights into disease development and lead to precise personalized therapies.

RESULTS We developed an approach for intelligent data integration of two cancer datasets (breast cancer and neuroblastoma) provided in the CAMDA 2018 'Cancer Data Integration Challenge', and compared models for prediction of survival time. We developed a novel semantic network-based data integration framework that uses NoSQL databases, in which we combined clinical and expression profile data using both raw data records and external knowledge sources. Using the integrated data, we introduced the Tumor Integrated Clinical Feature (TICF), a new feature for accurate prediction of patient survival time. Finally, we applied and validated several machine learning models for survival time prediction.

CONCLUSION We developed a framework for semantic integration of clinical and omics data that can borrow information across multiple cancer studies. By linking data with external domain knowledge sources, our approach enriches the studied data through the discovery of internal relations. The proposed and validated machine learning models for survival time prediction yielded accurate results.

REVIEWERS This article was reviewed by Eran Elhaik, Wenzhong Xiao and Carlos Loucera.

Updated: 2020-04-22