Sequence-based correction of barcode bias in massively parallel reporter assays

  1. Aravinda Chakravarti1
  1. Center for Human Genetics and Genomics, New York University School of Medicine, New York, New York 10016, USA;
  2. Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, Texas 77030, USA;
  3. Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA;
  4. Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
  • 5 Present address: Division of Nephrology, Boston Children's Hospital, Boston, MA 02115, USA

  • Corresponding author: dongwon.lee{at}childrens.harvard.edu
    Abstract

    Massively parallel reporter assays (MPRAs) are a high-throughput method for evaluating in vitro activities of thousands of candidate cis-regulatory elements (CREs). In these assays, candidate sequences are cloned upstream or downstream from a reporter gene tagged by unique DNA sequences. However, tag sequences may themselves affect reporter gene expression and thereby introduce major biases into the measured cis-regulatory activity. Here, we present a sequence-based method for correcting tag-sequence-specific effects and show that our method can significantly reduce this source of variation and improve the identification of functional regulatory variants by MPRAs. We also show that our model captures sequence features associated with post-transcriptional regulation of mRNA. Thus, this new method helps not only to improve detection of regulatory signals in MPRA experiments but also to design better MPRA protocols.

    Footnotes

    • Received July 18, 2020.
    • Accepted July 7, 2021.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
