Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM. Scientific Reports

Scientific Reports (IF 4.6). Pub Date: 2020-07-03. DOI: 10.1038/s41598-020-67513-5
Marcus Alvarez (1), Elior Rahmani (2), Brandon Jew (3), Kristina M Garske (1), Zong Miao (1, 3), Jihane N Benhammou (1, 4), Chun Jimmie Ye (5), Joseph R Pisegna (1, 4), Kirsi H Pietiläinen (6, 7), Eran Halperin (1, 2, 8, 9, 10), Päivi Pajukanta (1, 3, 11)

Affiliations

1. Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, 90095, USA.
2. Department of Computer Science, School of Engineering, UCLA, Los Angeles, CA, 90095, USA.
3. Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA.
4. Vache and Tamar Manoukian Division of Digestive Diseases, UCLA, Los Angeles, CA, USA.
5. Department of Epidemiology and Biostatistics, Department of Bioengineering and Therapeutic Sciences, Institute for Human Genetics, UCSF, San Francisco, USA.
6. Obesity Research Unit, Research Programs Unit, Diabetes and Obesity, University of Helsinki, Biomedicum Helsinki, Helsinki, Finland.
7. Obesity Center, Endocrinology, Abdominal Center, Helsinki University Central Hospital and University of Helsinki, Helsinki, Finland.
8. Department of Anesthesiology, UCLA Health, Los Angeles, CA, 90095, USA.
9. Department of Computational Medicine, School of Medicine, UCLA, Los Angeles, CA, 90095, USA.
10. Institute for Precision Health, School of Medicine, UCLA, Los Angeles, CA, 90095, USA.
11. Department of Human Genetics, Institute for Precision Health, David Geffen School of Medicine at UCLA, Gonda Center, Room 6335B, 695 Charles E. Young Drive South, Los Angeles, CA, 90095-7088, USA.

Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. We observe that snRNA-seq is commonly subject to contamination by high amounts of ambient RNA, which, if overlooked, can bias downstream analyses, for example by identifying spurious cell types. We present a novel approach to quantify contamination and filter droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distributions of debris and cell types, which are estimated using expectation maximization (EM). We evaluated DIEM using three snRNA-seq data sets: (1) human differentiating preadipocytes in vitro, (2) fresh mouse brain tissue, and (3) human frozen adipose tissue (AT) from six individuals. All three data sets showed evidence of extranuclear RNA contamination, and we observed that existing methods failed to account for contaminated droplets and led to spurious cell types. When compared to filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher-quality clusters. Although DIEM was designed for snRNA-seq, our clustering strategy also successfully filtered single-cell RNA-seq data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data quickly and effectively, leading to cleaner downstream analysis. Our code is freely available at https://github.com/marcalva/diem.
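To make the idea of a likelihood-based, semi-supervised EM filter concrete, the following is a minimal sketch and not the DIEM package's implementation. It assumes a two-component multinomial mixture (debris versus nuclei), whereas the abstract describes modeling debris and multiple cell types. It also assumes that droplets flagged in `fixed_debris` (for instance, those below some total-count threshold chosen by the user) are held in the debris component throughout. The function name `em_debris_posterior` and all of its parameters are hypothetical.

```python
import numpy as np

def em_debris_posterior(counts, fixed_debris, n_iter=50, pseudo=0.1):
    """Return each droplet's posterior probability of being debris.

    counts:       (droplets x genes) array of non-negative counts.
    fixed_debris: boolean mask; True droplets are held in the debris
                  component (the semi-supervised labels, an assumption here).
    """
    counts = np.asarray(counts, dtype=float)
    fixed_debris = np.asarray(fixed_debris, dtype=bool)
    n = counts.shape[0]

    # Column 0 = debris, column 1 = nuclei.
    # Start with labeled droplets as debris and all others as nuclei.
    resp = np.zeros((n, 2))
    resp[fixed_debris, 0] = 1.0
    resp[~fixed_debris, 1] = 1.0

    for _ in range(n_iter):
        # M-step: responsibility-weighted gene profiles and mixing weights.
        weighted = resp.T @ counts + pseudo              # shape (2, genes)
        log_p = np.log(weighted / weighted.sum(axis=1, keepdims=True))
        log_pi = np.log((resp.sum(axis=0) + pseudo) / (n + 2 * pseudo))

        # E-step: multinomial log-likelihood per droplet and component
        # (the multinomial coefficient cancels across components).
        log_lik = counts @ log_p.T + log_pi              # shape (n, 2)
        log_lik -= log_lik.max(axis=1, keepdims=True)    # numerical stability
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)

        # Re-impose the fixed debris labels after every E-step.
        resp[fixed_debris] = (1.0, 0.0)

    return resp[:, 0]
```

Under these assumptions, droplets whose returned posterior exceeds a chosen cutoff (for example 0.5) would be filtered out before clustering; the cutoff is a user choice, not a value taken from the paper.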


Updated: 2020-07-03
