Elucidation of genomic organizations of transgenic soybean plants through de novo genome assembly with short paired-end reads

Molecular Breeding

Abstract

Elucidating the genomic organization of transgene insertion sites is essential for genetic studies of transgenic plants. Here, we establish an analysis pipeline that identifies transgene insertion sites, as well as the presence of vector backbone sequences, through de novo genome assembly of high-throughput sequencing data from two transgenic soybean lines, AtYUCCA6-#5 and 35S-UGT72E3/2-#7. Sequencing data of approximately 28× and 29× genome coverage for the two lines were assembled de novo. Databases built from the assembled sequences were then searched, using the transgenic vector sequences as queries, for contigs containing putative insertion sites of transgene fragments together with their flanking sequences (integration sites). The predicted integration-site sequences, which lie in three annotated genes that might regulate plant development or confer disease resistance, were confirmed by local alignment against the soybean reference genome and by PCR amplification. As a result, we revealed the precise transgene-flanking sequences and the sequence rearrangements at the insertion sites in both transgenic lines, as well as the aberrant insertion of a transgene fragment. Relative to experimental or enrichment-based technologies, our approach is straightforward and time-efficient, and it provides an alternative method for identifying insertion sites in transgenic plants.
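The contig-search step described above can be illustrated with a minimal sketch. It assumes that the vector-versus-contig search results are available in the standard 12-column tabular BLAST format and that contig lengths are known. The function names, the example hits, and the thresholds (`min_flank`, `min_identity`) are hypothetical choices made for illustration and are not taken from the pipeline reported here. A contig is flagged as a candidate junction contig when it carries a vector hit and also extends beyond that hit, that is, when it may contain host flanking sequence.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str      # vector sequence used as the query
    contig: str     # assembled contig (database subject)
    pident: float   # percent identity of the alignment
    s_start: int    # leftmost aligned position on the contig (1-based)
    s_end: int      # rightmost aligned position on the contig (1-based)


def parse_outfmt6(text):
    """Parse 12-column tabular BLAST output into Hit records."""
    hits = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        s1, s2 = int(f[8]), int(f[9])  # subject coordinates; reversed on minus strand
        hits.append(Hit(f[0], f[1], float(f[2]), min(s1, s2), max(s1, s2)))
    return hits


def junction_candidates(hits, contig_lengths, min_flank=50, min_identity=95.0):
    """Return contigs with a vector hit plus at least min_flank unaligned bases.

    The thresholds are illustrative assumptions, not published values.
    """
    candidates = {}
    for h in hits:
        if h.pident < min_identity:
            continue
        left_flank = h.s_start - 1
        right_flank = contig_lengths[h.contig] - h.s_end
        if left_flank >= min_flank or right_flank >= min_flank:
            candidates.setdefault(h.contig, []).append(
                (h.s_start, h.s_end, left_flank, right_flank)
            )
    return candidates


# Hypothetical example: contig_12 has 600 bases outside the vector hit,
# whereas contig_77 is covered end to end by vector sequence.
EXAMPLE = "\n".join([
    "\t".join(["vector", "contig_12", "99.5", "400", "2", "0",
               "1", "400", "601", "1000", "0.0", "720"]),
    "\t".join(["vector", "contig_77", "100.0", "300", "0", "0",
               "500", "799", "300", "1", "1e-150", "555"]),
])
CONTIG_LENGTHS = {"contig_12": 1000, "contig_77": 300}

candidates = junction_candidates(parse_outfmt6(EXAMPLE), CONTIG_LENGTHS)
print(candidates)
```

In this sketch, a contig covered entirely by vector sequence is not reported as a junction. Such contigs would need separate inspection, for example when checking for vector backbone, and any candidate would still require confirmation against the reference genome and by PCR, as described above.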

Funding

This work was supported by the Korea Research Institute of Bioscience and Biotechnology Research Initiative Program and partly by the National Research Foundation grant (NRF-2018R1A2A2A05021904) funded by the Korean government.

Author information

Contributions

S.C.J. conceived the presented idea and designed the project. M.S.K. performed the bioinformatics analysis. C.G.K., T.K., J.N., and Y.S.C. provided transgenic plant materials. H.J., J.H.K., D.N.B., and I.S.P. performed experiments. S.C.J. and H.J. wrote the manuscript with input from all other authors.

Corresponding author

Correspondence to Soon-Chun Jeong.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Code availability

Not applicable.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Kim, MS., Jo, H., Kim, J.H. et al. Elucidation of genomic organizations of transgenic soybean plants through de novo genome assembly with short paired-end reads. Mol Breeding 41, 1 (2021). https://doi.org/10.1007/s11032-020-01191-z

  • DOI: https://doi.org/10.1007/s11032-020-01191-z
