A novel high-accuracy genome assembly method utilizing a high-throughput workflow
bioRxiv - Genomics Pub Date : 2020-11-27 , DOI: 10.1101/2020.11.26.400507
Qingdong Zeng , Wenjin Cao , Liping Xing , Guowei Qin , Jianhui Wu , Michael F. Nagle , Qin Xiong , Jinhui Chen , Liming Yang , Prasad Bajaj , Annapurna Chitikineni , Yan Zhou , Yunxin Yu , Jiang Xu , Xiaojun Nie , Lin Huang , Shengjie Liu , Jan Šafář , Hana Šimková , Weining Song , Baozhu Guo , Shilin Chen , Jaroslav Doležel , Zhaodong Hao , Qiang Cheng , Jianguo Liang , Jiansong Tang , Aizhong Cao , Qiang Wang , Xiangqian Lu , Shouping Yang , Hongxiang Ma , Jiajie Liu , Xiaoting Wang , Hong Zhang , Zhonghua Wang , Wanquan Ji , Changfa Wang , Fengping Yuan , Jisen Shi , Rajeev K. Varshney , Zhensheng Kang , Dejun Han , Haibin Xu

Across domains of biological research that use genome sequence data, high-quality reference genome sequences are essential for characterizing genetic variation and understanding the genetic basis of phenotypes. However, the construction of genome assemblies for many species is often hampered by the complexity of genome organization, especially repetitive and complex sequences, which leads to mis-assemblies and missing regions. Here, we describe a high-throughput, gold-standard genome assembly workflow that combines a large-scale bacterial artificial chromosome (BAC) library with a refined two-step pooling strategy and the Lamp assembler algorithm. This strategy minimizes the laborious processes of physical map construction and clone-by-clone sequencing, enabling inexpensive sequencing of several thousand BAC clones. We applied this strategy to a minimum tiling path BAC clone library for the short arm of chromosome 2D (2DS) of bread wheat, a species with a highly complex and repetitive genome; 98% of BAC sequences, covering 92.7% of the 2DS chromosome, were assembled correctly. We also identified 48 large mis-assemblies in the reference wheat genome assembly (IWGSC RefSeq v1.0), corrected them, and filled 92.2% of the gaps in RefSeq v1.0. Our 2DS assembly represents a new benchmark for assembling complex genomes with both high accuracy and efficiency.

Updated: 2020-11-27