Novel biomarker genes which distinguish between smokers and chronic obstructive pulmonary disease patients with machine learning approach.
BMC Pulmonary Medicine ( IF 2.6 ) Pub Date : 2020-02-03 , DOI: 10.1186/s12890-020-1062-9
Kazushi Matsumura 1 , Shigeaki Ito 1

BACKGROUND Chronic obstructive pulmonary disease (COPD) is a combination of progressive lung diseases. Diagnosis of COPD is generally based on pulmonary function testing; however, the complexity and heterogeneity of its pathogenesis make prognosis difficult for smokers and for patients with early-stage COPD. Computational analysis of omics data is expected to be one way to resolve such complexity.
METHODS To identify potential descriptive marker genes, we obtained transcriptomic data from in vitro tests in which human bronchial epithelial cells were exposed to inducers of early COPD events. Using the identified genes, we applied machine learning to publicly available transcriptome data from lung specimens of COPD and non-COPD patients to develop a model that reflects the risk continuum across smoking and COPD.
RESULTS The expression levels of 15 genes were commonly altered among in vitro tissues exposed to known inducers of early COPD events (cigarette smoke, DNA damage, oxidative stress, and inflammation), and 10 of these genes and their corresponding proteins have not previously been reported as COPD biomarkers. Although these genes predicted each group with 65% accuracy, the accuracy with which they discriminated COPD subjects from smokers was only 29%. Furthermore, logistic regression enabled the conversion of gene expression levels to a numerical index, which we named the "potential risk factor (PRF)" index. The highest index value, which was statistically significant, was recorded in COPD subjects (median 0.56), followed by smokers (0.30) and non-smokers (0.02). In vitro tissues exposed to cigarette smoke displayed dose-dependent increases in the PRF index, suggesting its utility for prospective risk estimation of tobacco products.
CONCLUSIONS Our experiment-based transcriptomic analysis identified novel genes associated with COPD, and the 15 genes could distinguish smokers and COPD subjects from non-smokers via machine-learning classification with remarkable accuracy. We also propose a PRF index that quantitatively reflects the risk continuum across smoking and COPD pathogenesis, and we believe it will provide an improved understanding of smoking effects and new insights into COPD.
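
The abstract states that logistic regression converts gene expression levels into a numerical index, but it does not give the training setup. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the model is fitted on non-smokers versus COPD subjects and then applied to all three groups, and it uses purely synthetic expression values; the function names, group sizes, mean shifts, and hyperparameters are all hypothetical.

```python
import math
import random
from statistics import median

N_GENES = 15  # the abstract reports a 15-gene panel; the values below are synthetic


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_group(rng, n_samples, shift):
    """Synthetic expression profiles: unit-variance noise around a group mean (assumption)."""
    return [[rng.gauss(shift, 1.0) for _ in range(N_GENES)] for _ in range(n_samples)]


def fit_logistic(X, y, lr=0.1, epochs=300, l2=0.01):
    """Fit logistic regression by batch gradient descent with a small L2 penalty."""
    w = [0.0] * N_GENES
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * N_GENES
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(N_GENES):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def risk_index(w, b, x):
    """Map one expression profile to a value between 0 and 1 (a PRF-like index)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


rng = random.Random(0)
groups = {
    "non-smokers": simulate_group(rng, 40, 0.0),
    "smokers": simulate_group(rng, 40, 0.5),  # assumed to lie between the two extremes
    "COPD": simulate_group(rng, 40, 1.0),
}

# Assumption: train on the two extremes only, then score every group.
X_train = groups["non-smokers"] + groups["COPD"]
y_train = [0] * 40 + [1] * 40
w, b = fit_logistic(X_train, y_train)

medians = {name: median([risk_index(w, b, x) for x in samples])
           for name, samples in groups.items()}
for name, value in medians.items():
    print(f"{name}: median index = {value:.2f}")
```

Because the index is a monotone function of a single linear score, groups whose expression lies between the two training classes receive intermediate values, which is one way a classifier output can be read as a continuum rather than a hard label.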

Updated: 2020-02-04