A new framework for exploratory network mediator analysis in omics data

  1. Tianwei Yu2
  1. 1Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia 30322, USA;
  2. 2Shenzhen Research Institute of Big Data, School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China;
  3. 3School of Medicine, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China;
  4. 4Department of Medicine, Emory University, Atlanta, Georgia 30322, USA;
  5. 5Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA
  1. 6 These authors contributed equally to this work.

  • Corresponding authors: jiankang{at}umich.edu, yutianwei{at}cuhk.edu.cn
  • Abstract

    Omics methods are widely used in basic biology and translational medicine research. More and more omics data are collected to explain the impact of certain risk factors on clinical outcomes. To explain the mechanism of the risk factors, a core question is how to find the genes/proteins/metabolites that mediate their effects on the clinical outcome. Mediation analysis is a modeling framework to study the relationship between risk factors and pathological outcomes, via mediator variables. However, high-dimensional omics data are far more challenging than traditional data: (1) From tens of thousands of genes, can we overcome the curse of dimensionality to reliably select a set of mediators? (2) How do we ensure that the selected mediators are functionally consistent? (3) Many biological mechanisms contain nonlinear effects. How do we include nonlinear effects in the high-dimensional mediation analysis? (4) How do we consider multiple risk factors at the same time? To meet these challenges, we propose a new exploratory mediation analysis framework, medNet, which focuses on finding mediators through predictive modeling. We propose new definitions for predictive exposure, predictive mediator, and predictive network mediator, using a statistical hypothesis testing framework to identify predictive exposures and mediators. Additionally, two heuristic search algorithms are proposed to identify network mediators, essentially subnetworks in the genome-scale biological network that mediate the effects of single or multiple exposures. We applied medNet on a breast cancer data set and a metabolomics data set combined with food intake questionnaire data. It identified functionally consistent network mediators for the exposures’ impact on the outcome, facilitating data interpretation.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.278684.123.

    • Freely available online through the Genome Research Open Access option.

    • Received November 2, 2023.
    • Accepted April 11, 2024.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    | Table of Contents
    OPEN ACCESS ARTICLE

    Preprint Server