How to handle big data for disease stratification in respiratory medicine?
Thorax ( IF 9.0 ) Pub Date : 2023-07-01 , DOI: 10.1136/thorax-2023-220138
Krasimira Tsaneva-Atanasova 1 , Chris Scotton 2

Increasingly complex datasets of biomedical measurements offer an opportunity for discovering patient endotypes. These represent subtypes of a disease marked by distinct pathomechanisms, which can have enormous implications for prognosis and clinical management. Such datasets often include imaging, genomics and transcriptomics, proteomics, microbiotal composition, allergen/environmental exposures, and immunological data, as well as patient outcomes and routinely collected clinical parameters. Given the sheer volume of data, interpretation is extremely challenging.

Recently, topological data analysis (TDA) has been rapidly gaining in popularity for application to such datasets (see Skaf and Laubenbacher¹ for review). Respiratory medicine is no exception, as topology offers a suite of techniques and tools that could be applied to diverse data. This enables a holistic approach to robustly identify multidimensional properties and relationships within a given multimodal dataset, by using the full range of available clinical and pathobiological data simultaneously. Topological methods also naturally lend themselves to visualisation, rendering them useful for applications that require user interpretation and understanding. TDA offers a more unbiased and rigorous approach to analysing complex datasets, since it does not depend on prior hypotheses nor focus on pairwise relationships within the data. This contrasts with other established analytical methods, such as supervised clustering and classical association analyses.

The Mapper algorithm² is a popular technique in TDA that converts a complex dataset with many dimensions into a simpler network representation embedded in a lower number of dimensions. To achieve this, common techniques such as principal components analysis (PCA), t-distributed stochastic neighbour embedding (t-SNE) and uniform manifold approximation and projection (UMAP) could be employed to reduce the dimensionality of the data. The latter has certainly gained notoriety in light of the plethora of single cell RNAseq data currently in circulation. Specifically, the Mapper algorithm starts by applying a projection (eg, UMAP) to the dataset and using …
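
The excerpt breaks off before the remaining Mapper steps are described. As a minimal sketch of the algorithm as it is generally presented (project the data through a lens, cover the lens range with overlapping intervals, cluster the points falling in each interval, then link clusters that share points), the following pure-Python example may help. The function names, the toy one-coordinate lens (a stand-in for PCA or UMAP), the simple distance-threshold clustering, and the parameters `n_intervals`, `overlap` and `eps` are all illustrative assumptions and are not taken from the article.

```python
from itertools import combinations
from math import dist


def interval_cover(values, n_intervals, overlap):
    """Cover the range of lens values with equal-width, overlapping intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = width * overlap / 2  # each interval is widened by `pad` on both sides
    return [(lo + i * width - pad, lo + (i + 1) * width + pad)
            for i in range(n_intervals)]


def threshold_clusters(points, indices, eps):
    """Group indices so that points within `eps` of each other are chained together."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {j for j in unvisited
                    if dist(points[current], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, lens, n_intervals=3, overlap=0.5, eps=1.0):
    """Return (nodes, edges): nodes are sets of point indices, edges join
    nodes that have at least one point in common."""
    lens_values = [lens(p) for p in points]
    nodes = []
    for low, high in interval_cover(lens_values, n_intervals, overlap):
        members = [i for i, v in enumerate(lens_values) if low <= v <= high]
        nodes.extend(threshold_clusters(points, members, eps))
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges


# Toy data: ten points evenly spaced along a line (hypothetical, not patient data).
points = [(float(x), 0.0) for x in range(10)]
nodes, edges = mapper_graph(points, lens=lambda p: p[0])
print(len(nodes), edges)  # 3 [(0, 1), (1, 2)]
```

On this toy input the network is a three-node chain, reflecting the single connected line of points. In practice the lens would more plausibly be a PCA or UMAP projection and the per-interval clustering a standard method, with the cover resolution and overlap tuned to the dataset.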

Updated: 2023-06-19