Out-of-distribution generalization from labelled and unlabelled gene expression data for drug response prediction
Nature Machine Intelligence (IF 18.8), Pub Date: 2021-11-11, DOI: 10.1038/s42256-021-00408-w
Hossein Sharifi-Noghabi 1, 2 , Parsa Alamzadeh Harjandi 1 , Martin Ester 1, 2 , Colin C. Collins 2, 3 , Olga Zolotareva 4, 5

Data discrepancy between preclinical and clinical datasets poses a major challenge for accurate drug response prediction based on gene expression data. Different transfer learning methods have been proposed to address this discrepancy in drug response prediction for different cancers. These methods generally use cell lines as source domains, and patients, patient-derived xenografts or other cell lines as target domains; however, they assume access to the target domain during training or fine-tuning, and they can only take labelled source domains as input. The former is a strong assumption that is not satisfied when these models are deployed in the clinic, whereas the latter means these methods rely on labelled source domains that are of limited size. To avoid these assumptions, we formulate drug response prediction in cancer as an out-of-distribution generalization problem, which does not assume that the target domain is accessible during training. Moreover, to exploit unlabelled source domain data, which tends to be much more plentiful than labelled data, we adopt a semi-supervised approach. We propose Velodrome, a semi-supervised method of out-of-distribution generalization that takes labelled and unlabelled data from different resources as input and makes generalizable predictions. Velodrome achieves this goal by introducing an objective function that combines a supervised loss for accurate prediction, an alignment loss for generalization and a consistency loss to incorporate unlabelled samples. Our experimental results demonstrate that Velodrome outperforms state-of-the-art pharmacogenomics and transfer learning baselines on cell lines, patient-derived xenografts and patients. Finally, we show that Velodrome models generalize to different tissue types that were well-represented, under-represented or completely absent in the training data. Overall, our results suggest that Velodrome may guide precision oncology more accurately.
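
The three-part objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the form of each term, so the choices below (mean squared error for the supervised term, mean matching between two source domains for the alignment term, disagreement between two predictors for the consistency term) and the weights `lambda_align` and `lambda_cons` are assumptions made purely for illustration.

```python
from statistics import fmean


def supervised_loss(predictions, targets):
    """Mean squared error between predicted and measured drug response."""
    return fmean((p - t) ** 2 for p, t in zip(predictions, targets))


def alignment_loss(features_a, features_b):
    """Squared distance between per-dimension feature means of two domains.

    Assumption: a simple mean-matching stand-in for the alignment term.
    """
    dims = len(features_a[0])
    mean_a = [fmean(row[d] for row in features_a) for d in range(dims)]
    mean_b = [fmean(row[d] for row in features_b) for d in range(dims)]
    return sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))


def consistency_loss(preds_1, preds_2):
    """Mean squared disagreement of two predictors on the same unlabelled samples.

    Assumption: one possible way to use unlabelled data; no labels are needed.
    """
    return fmean((p - q) ** 2 for p, q in zip(preds_1, preds_2))


def combined_objective(sup, align, cons, lambda_align=1.0, lambda_cons=1.0):
    """Weighted sum of the three terms; both weights are hypothetical."""
    return sup + lambda_align * align + lambda_cons * cons


# Toy values only; these are not data from the paper.
sup = supervised_loss([0.5, 1.0], [0.0, 1.0])
align = alignment_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 3.0], [1.0, 3.0]])
cons = consistency_loss([0.2, 0.4], [0.2, 0.8])
total = combined_objective(sup, align, cons, lambda_align=0.5, lambda_cons=0.25)
print(total)
```

In this sketch only the supervised term needs response labels, which is why the alignment and consistency terms can draw on the larger pool of unlabelled expression profiles.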




Updated: 2021-11-11