Multi-Omics and Single-Cell Omics: New Tools in Drug Target Discovery
Arteriosclerosis, Thrombosis, and Vascular Biology (IF 8.7). Pub Date: 2024-03-27. DOI: 10.1161/atvbaha.124.320686
Joseph Loscalzo 1

Biological systems are inherently noisy, with noise caused by the measurement process (ie, technical noise) and by inherent biological variability (ie, biological noise). Biological noise accounts for phenotypic differences within a population of individuals, as well as differences between two different cells of the same lineage within an organ of a single individual. Biomedical investigators typically strive to eliminate technical noise as much as possible to optimize the signal-to-noise ratio of a measurement and to mitigate the impact of biological noise by repeated measurements, thereby improving the statistical confidence in the mean signal.
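The benefit of repeated measurement can be made explicit with a standard result. This is a sketch under assumptions the article does not state: the n measurements x_i are independent with a common variance σ², and the technical and biological noise components are additive and independent of each other.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\operatorname{Var}(\bar{x}) = \frac{\sigma^{2}}{n}, \qquad
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}, \qquad
\sigma^{2} = \sigma_{\mathrm{tech}}^{2} + \sigma_{\mathrm{bio}}^{2}
```

Under these assumptions the uncertainty in the mean signal falls as 1/√n, so quadrupling the number of measurements halves the standard error. Averaging narrows the confidence interval around the mean but does not remove the underlying biological variance.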


With the advent of the genomic era, the biomedical research paradigm has moved from pure reductionism to a more holistic, integrated approach to system analysis. In this setting, biological noise plays a key role, accounting for heterogeneity in response to perturbations between cells and between individuals and offering a mechanistic explanation for incomplete genetic penetrance (divergent pathophenotypes), variable functional phenotypes in subjects with identical disease-causing genetic variants, as well as variability in drug response. Clearly, understanding the bases for this type of biological heterogeneity is essential for defining individual disease risk, prognosis, and precision therapeutics.


Since the beginning of the Human Genome Project, we have moved from DNA sequence determination to a detailed assessment of other omics components. Technical approaches to measuring each omics level have evolved rapidly. More efficient, rapid, lower cost DNA sequencing is now complemented by bulk transcriptomics, proteomics, epigenomics, and metabolomics, which, in turn, have been followed by single-cell transcriptomics and spatial transcriptomics, as well as single-cell proteomics, epigenomics, and metabolomics. This extraordinary growth in the detailed omics characterization of a biological system has led to analytical challenges that have yet to be resolved satisfactorily. These multi-omics systems are of high dimensionality and are overdetermined from a dynamical systems perspective, which leads to the following questions: How can one assess the interactions among multi-omics layers? How can one reduce the dimensionality of the system to make it optimally tractable and biomedically useful? How can one exploit the biological noise inherent in these multi-omics systems to define disease expression within an individual and consequent precision therapeutics tailored to that individual’s multiome?


We and others have approached this problem through the lens of molecular interaction networks (Figure 1). Molecular networks (eg, protein-protein interaction networks, metabolic networks, gene regulatory networks, and Bayesian coexpression networks) provide graphical depictions of relationships between the elements within a given network. For example, in the typical protein-protein interaction network, each node represents a protein, and its physical interaction with another protein in the proteome is depicted by a link or edge. We have shown that the comprehensive protein-protein interaction network contains subnetworks that are associated with specific diseases (disease modules).2,3 By reducing the overall dimensionality of the network through a disease-focused perspective, these constructs offer a more detailed, causal view of the pathways and proteins that govern disease pathobiology. Disease modules that overlap provide insights into pathways and proteins that are common to different diseases. In addition, disease modules offer a mechanistic path toward drug target identification, making the target discovery process more efficient and rational. One can also use this same approach to repurpose approved drugs, as we have done for coronary artery disease4 and SARS-CoV-2 infection5, by applying a variety of network-based statistical approaches including network-based artificial intelligence strategies.4–6
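The node-and-edge description above can be illustrated with a small sketch. It takes one common simplification, which is an assumption here and not necessarily the procedure used in the cited work: the disease module is approximated as the largest connected set of disease-associated proteins within the interaction network. The protein names, the edge list, and the function names are hypothetical placeholders.

```python
from collections import deque

# Toy protein-protein interaction network: each pair is an undirected edge.
# Protein names are placeholders, not real interactions.
EDGES = [
    ("A", "B"), ("B", "C"), ("C", "D"),
    ("D", "E"), ("E", "F"), ("G", "H"),
]

def build_adjacency(edges):
    """Return an adjacency map {node: set(neighbors)} for an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def largest_connected_component(adj, seeds):
    """Largest connected set of seed nodes in the subgraph induced by the seeds."""
    seeds = set(seeds) & set(adj)
    seen, best = set(), set()
    for start in sorted(seeds):
        if start in seen:
            continue
        # Breadth-first search that only walks through other seed proteins.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb in seeds and nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

adjacency = build_adjacency(EDGES)
disease_genes = {"A", "B", "C", "E", "G"}  # hypothetical disease-associated proteins
module = largest_connected_component(adjacency, disease_genes)
print(sorted(module))
```

In this toy network the proteins A, B, and C interact directly with one another and form the module, whereas E and G are disease-associated but isolated from the other seeds. The dimensionality reduction described in the text corresponds to working with this small subgraph instead of the full network.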


Figure 1. Network integration of multi-omics data from the patient to the single-cell level. Analysis of integrated multi-omics networks provides a basis for resolving phenotypic heterogeneity in human pathobiology and in defining precise drug therapies. Reproduced with permission from Wang et al.1


Expanding this analysis of the protein-protein interaction network to include the transcriptome yields additional useful information, including which pathways within a disease module are present in a given cell type or tissue.7 Only tissues that express key components of pathways governing disease phenotype manifest disease, regardless of whether or not genetic variants associated with the disease are themselves expressed in that tissue. Furthermore, using a transcriptome-based differential analysis of gene expression pairs (pairwise correlation analysis) in diseased versus normal tissue, one can generate patient-specific disease modules, or reticulotypes (after the Latin for network), that can provide unique information on patient-specific disease mechanisms and patient-specific drug targets, as we showed for hypertrophic cardiomyopathy.8
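The pairwise correlation analysis mentioned above can be sketched as follows. This is a group-level illustration under stated assumptions, not the patient-specific method of the cited study: it computes the Pearson correlation for every gene pair in normal and diseased samples and flags pairs whose correlation changes by more than an arbitrary threshold. The gene labels, expression values, threshold, and function names are all hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def differential_edges(normal, disease, threshold):
    """Gene pairs whose correlation differs between conditions by more than threshold."""
    edges = []
    for g1, g2 in combinations(sorted(normal), 2):
        delta = pearson(disease[g1], disease[g2]) - pearson(normal[g1], normal[g2])
        if abs(delta) > threshold:
            edges.append((g1, g2, round(delta, 3)))
    return edges

# Made-up expression values: one list per gene, one entry per sample.
normal = {
    "G1": [1, 2, 3, 4, 5],
    "G2": [2, 4, 6, 8, 10],
    "G3": [5, 3, 4, 1, 2],
}
disease = {
    "G1": [1, 2, 3, 4, 5],
    "G2": [10, 8, 6, 4, 2],  # relationship with G1 is inverted in disease
    "G3": [5, 3, 4, 1, 2],
}

rewired = differential_edges(normal, disease, threshold=1.0)
print(rewired)
```

Only the pairs involving G2 are flagged, because G2 is the gene whose coexpression pattern changes between conditions, while the G1 and G3 relationship is identical in both. The flagged pairs play the role of edges in a condition-specific module; deriving a module for a single patient requires additional steps that this sketch does not attempt.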


It is important to point out that biological variability of complex multi-omics systems can be a reflection of deterministic genetic variability (variants and mutants), as well as biochemical stochasticity (epigenetics, posttranslational modification of the proteome or the transcriptome). This latter class of causes of omics noise is more challenging to assess owing to its randomness and incompleteness. Theoretically, these stochastic modifications of the proteome can serve as the basis for a multitude of instantiations of the protein-protein interaction network with possibly differing effects on the phenotypes of interest. In the case of exposome-based epigenomic modification, it is not always clear as to what the time-activity relationship is to the pathophenotypes of interest. Furthermore, natural, time-dependent variation in the epigenome as a result of stage of development coupled with transgenerational effects on epigenetic marks makes this level of omics analysis even more complicated, with no widely accepted approach available as yet.


Are there evolutionary advantages to multi-omics networks that can inform our analysis of them? One answer to this question lies in an analysis of the dynamical behavior of such networks. Coupling among omics layers appears to increase the stable operating range of the system, mitigating the consequences of (adverse) perturbations, rendering the system more resilient than it would be were one omics layer to operate in isolation.9 It is likely that variance in network structure, protein expression, or protein function (heritable or stochastic) yields biological noise that influences this resilience. Variation in the concentration of metabolites in metabolic networks coupled with variation in the catalytic constants of their coupled reactions supports a stabler operating range than limited or no variation, providing the system with the ability to utilize, for example, energy sources effectively in conditions of dearth or abundance.10 Allosteric modulation of enzyme systems provides a similar functional advantage.11


In addition to molecular-level omics, cell-based immunophenotyping has been utilized in conjunction with single-cell RNA sequencing to characterize in more detail subsets of cells of ostensibly similar lineage. This approach, coupled with multiorgan and single-cell analyses of gene expression, has provided insight into patient-specific regulation of the immune response and patient-specific drug target identification in allergic12 and autoimmune disorders.13


Detailed immunophenotyping is one method for characterizing relevant phenotypic features that govern disease expression.14 Robust phenotyping in general, including orthogonal phenotyping (ie, characterizing phenotypic features that are not believed or known to play a role in the disease of interest), is essential for identifying disease mechanisms and potential drug targets. Coupled with multi-omics data, in-depth phenotyping can be utilized to ascertain key mechanisms of disease in a patient-specific manner, as well as potential therapeutic targets. Furthermore, similar approaches can be used to identify network-based, multi-omics–derived biomarkers that can be used to predict disease course and response to therapy.


To achieve these goals effectively requires a specific computational workflow linked to newer network-based analytical algorithms15 (Figure 2). Typically, the analytical exercise requires first a feature selection step (ie, which elements in the phenotypic and multi-omics data sets may be important for outcome prediction), followed by data integration and representation (ie, dimensionality reduction and newly transformed data rendering), and a data-clustering methodology. This approach is designed to yield information on outcome associations, biomarker generation, and risk or response prediction. There are numerous methods that have been developed for use in this workflow that include supervised versus unsupervised feature selection, linear versus nonlinear feature extraction and data representation, linear versus network-based multi-omics data integration, and partitioning versus hierarchical clustering algorithms. There are many examples of specific algorithms under each of these feature selection or model types, the choice of which depends upon the nature of the data set and the goal of the analysis.
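The three-step workflow described above can be sketched in a few lines. The choices below are illustrative assumptions, one option among the many the text lists: unsupervised feature selection by variance, z-scoring as a simple stand-in for the representation step (a real pipeline would typically apply a linear or nonlinear dimensionality reduction here), and k-means as the partitioning clustering method. The data matrix and function names are hypothetical.

```python
from statistics import mean, pstdev, pvariance

def select_features(rows, k):
    """Unsupervised selection: indices of the k highest-variance columns."""
    n_cols = len(rows[0])
    variances = [pvariance([r[j] for r in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])

def standardize(rows):
    """Z-score each column (stand-in for the representation step)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def kmeans(points, centers, n_iter=20):
    """Plain Lloyd iterations from the given starting centers; returns cluster labels."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels

# Made-up data: 6 patients (rows) by 4 omics features (columns).
patients = [
    [1.0, 5.0, 0.9, 2.0],
    [1.2, 5.1, 1.1, 2.1],
    [0.8, 4.9, 1.0, 2.0],
    [5.0, 5.0, 6.1, 2.1],
    [5.2, 5.1, 5.9, 2.0],
    [4.8, 4.9, 6.0, 2.1],
]

selected = select_features(patients, k=2)
reduced = [[row[j] for j in selected] for row in patients]
scaled = standardize(reduced)
labels = kmeans(scaled, centers=[scaled[0], scaled[3]])
print(selected, labels)
```

The variance filter keeps the two informative features and discards the two near-constant ones, and the clustering step then separates the patients into two subtypes. The starting centers are fixed to two of the patients so that the result is deterministic; in practice initialization, the number of clusters, and the choice of representation all need to be validated against the data set and the goal of the analysis.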


Figure 2. Computational strategy for disease subtyping. Top, Progression from one-size-fits-all medicine to individualized medicine. Bottom, Computational pipeline for analysis of multi-omics data sets. repr indicates representation. Reproduced with permission from Maiorino and Loscalzo.15


Recent advances in network-based machine learning and artificial intelligence utilize a combination of physicochemical features of small-molecule drugs, estimated binding constants to protein targets, and network features of disease modules and their multi-omics context to predict drug-target interactions.6,16 This type of analysis coupled with AI-based functional assessment of the disease pathway or module within which the drug target is localized can then be used to minimize the number of drug-target interactions that require experimental testing in vitro and in vivo, decreasing development time and offering the promise of accelerating implementation in human clinical trials.
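One network feature that can feed such prioritization is the distance between a drug's protein targets and a disease module. The sketch below is a deliberately simple proximity heuristic, not the machine learning approach of the cited studies: it scores each candidate drug by the mean hop distance from its targets to the nearest module protein and ranks drugs by that score. The network, module, drug names, and target sets are hypothetical, and the sketch assumes every target can reach the module.

```python
from collections import deque

def shortest_distances(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def closest_distance(adj, targets, module):
    """Mean, over drug targets, of the hop distance to the nearest module protein."""
    total = 0
    for t in targets:
        dist = shortest_distances(adj, t)
        total += min(dist[m] for m in module if m in dist)
    return total / len(targets)

# Toy interaction network (a simple chain) and a hypothetical disease module.
ADJ = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
    "D": {"C", "E"}, "E": {"D", "F"}, "F": {"E"},
}
MODULE = {"A", "B", "C"}

# Hypothetical drugs and their protein targets.
DRUG_TARGETS = {
    "drug_x": {"D"},
    "drug_y": {"F"},
    "drug_z": {"B"},
}

scores = {drug: closest_distance(ADJ, t, MODULE) for drug, t in DRUG_TARGETS.items()}
ranking = sorted(scores, key=scores.get)
print(ranking, scores)
```

Drugs whose targets lie inside or next to the module rank first and would be the candidates carried forward to experimental testing. A raw distance of this kind is usually compared against randomized target sets before it is interpreted, a step omitted here for brevity.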


Biomedical investigators have spent the past 200 years attempting to reduce the complexity of the biological systems they study. In the current era of large data sets of multi-omics systems, we can no longer afford to avoid their intrinsic complexity. In fact, that complexity and the natural biological variation (noise) that further complicates it should be viewed as a means to move the field of precision medicine forward. Recognizing that many, if not most, pathophenotypes are convergent, and that deeper phenotyping will unveil the subtle distinctions that discriminate between individuals, a robust analysis of the molecular complexity underlying these phenotypes will provide a deeper individualized understanding of disease mechanism and a path to more precise and effective therapies.


This work was supported, in part, by the National Institutes of Health grants HL155107, HL166137, and HG007691 and by American Heart Association grants 957729 and 24MERIT185447.


None.


Disclosures: None.


The American Heart Association celebrates its 100th anniversary in 2024. This article is part of a series across the entire AHA Journal portfolio written by international thought leaders on the past, present, and future of cardiovascular and cerebrovascular research and care. To explore the full Centennial Collection, visit https://www.ahajournals.org/centennial


For Sources of Funding and Disclosures, see page 762.




Updated: 2024-03-28