The motif composition of variable number tandem repeats impacts gene expression

  • Mark J.P. Chaisson (1,3)
  1. Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA;
  2. Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, California 90033, USA;
  3. The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, California 90033, USA
  • Corresponding author: mchaisso{at}usc.edu
  • Abstract

    Understanding the impact of DNA variation on human traits is a fundamental question in human genetics. Variable number tandem repeats (VNTRs) make up ∼3% of the human genome but are often excluded from association analysis owing to poor read mappability or divergent repeat content. Although methods exist to estimate VNTR length from short-read data, VNTRs are known to vary in both length and repeat (motif) composition. Here, we use a repeat-pangenome graph (RPGG) constructed on 35 haplotype-resolved assemblies to detect variation in both VNTR length and repeat composition. We align population-scale data from the Genotype-Tissue Expression (GTEx) Consortium to examine how variations in sequence composition may be linked to expression, including cases independent of overall VNTR length. We find that 9,422 out of 39,125 VNTRs are associated with nearby gene expression through motif variations, of which only 23.4% are accessible from length. Fine-mapping identifies 174 genes whose expression is likely driven by variation in certain VNTR motifs and not overall length. We highlight two genes, CACNA1C and RNF213, that have expression associated with motif variation, showing the utility of RPGG analysis as a new approach for trait association in multiallelic and highly variable loci.
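
    The counts quoted in the abstract can be put on a common scale with a little arithmetic. The sketch below is only an illustration: the variable and function names are hypothetical, and because 23.4% is itself a rounded figure, the derived counts are approximations rather than numbers reported by the authors.

```python
# Figures quoted in the abstract.
total_vntrs = 39_125           # VNTRs tested
motif_associated = 9_422       # VNTRs associated with expression via motif variation
length_accessible_pct = 23.4   # percent of those also accessible from length

def summarize(total, associated, pct_from_length):
    """Return the associated share and approximate length / motif-only counts."""
    share_associated = associated / total
    # Approximate: the percentage in the abstract is rounded to one decimal.
    n_from_length = round(associated * pct_from_length / 100)
    n_motif_only = associated - n_from_length
    return share_associated, n_from_length, n_motif_only

share, n_len, n_motif = summarize(total_vntrs, motif_associated, length_accessible_pct)
print(f"{share:.1%} of tested VNTRs are associated through motif variation")
print(f"about {n_len} are accessible from length; about {n_motif} are not")
```

    Under these assumptions, roughly a quarter of the tested VNTRs show a motif-level association, and about three quarters of those associations would not be found by a length-only analysis.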

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.276768.122.

    • Freely available online through the Genome Research Open Access option.

    • Received March 17, 2022.
    • Accepted March 29, 2023.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
