FSBC: fast string-based clustering for HT-SELEX data.
BMC Bioinformatics (IF 3), Pub Date: 2020-06-24, DOI: 10.1186/s12859-020-03607-1
Shintaro Kato 1, 2 , Takayoshi Ono 2 , Hirotaka Minagawa 1 , Katsunori Horii 1 , Ikuo Shiratori 1 , Iwao Waga 1 , Koichi Ito 2 , Takafumi Aoki 2

The combination of systematic evolution of ligands by exponential enrichment (SELEX) and deep sequencing is termed high-throughput (HT)-SELEX, which enables the search for aptamer candidates among a massive number of oligonucleotide sequences. Clustering is an important step for identifying sequence groups that include aptamer candidates for experimental evaluation. In general, an aptamer includes a specific target-binding region, which is necessary for binding to the target molecule. The length of the target-binding region varies depending on the target molecule and/or the binding mode. Currently available clustering methods for HT-SELEX estimate clusters based only on the similarity of full-length sequences or on motifs of limited length as target-binding regions. Hence, a clustering method that considers target-binding regions of different lengths is required. Moreover, to handle such large data and to save sequencing cost, a clustering method that computes quickly from a single round of HT-SELEX data, rather than from multiple rounds, is also preferred.

We developed fast string-based clustering (FSBC) for HT-SELEX data. FSBC was designed to estimate clusters by searching for over-represented strings of various lengths as target-binding regions. FSBC was also designed for fast calculation, using search-space reduction on a single round (typically the final round) of HT-SELEX data and taking into account the imbalanced nucleobase composition that results from the aptamer selection process. The calculation time and clustering accuracy of FSBC were compared with those of four conventional clustering methods, FASTAptamer, AptaCluster, APTANI, and AptaTRACE, using HT-SELEX data (>15 million oligonucleotide sequences). FSBC, AptaCluster, and AptaTRACE completed the clustering for all sequence data, and FSBC and AptaTRACE achieved higher clustering accuracy than the other methods. FSBC showed the highest clustering accuracy and the second-fastest calculation speed among all methods compared.

FSBC is applicable to large HT-SELEX datasets and can facilitate the accurate identification of sequence groups that include aptamer candidates. FSBC is available at http://www.aoki.ecei.tohoku.ac.jp/fsbc/.
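
The general idea of clustering by over-represented strings of several lengths can be illustrated with a minimal Python sketch. This is not the FSBC implementation: the abstract does not specify the scoring statistic or the search-space reduction, so the binomial Z-score, the function names (`base_frequencies`, `z_score`, `rank_strings`, `assign_clusters`), and the parameters (`k_min`, `k_max`, `top`) are all assumptions made for illustration. The only ideas taken from the abstract are scanning strings of various lengths in a single round of data and accounting for imbalanced nucleobase frequencies.

```python
from collections import Counter
from math import prod, sqrt


def base_frequencies(seqs):
    """Estimate nucleobase frequencies from the pool itself (they may be imbalanced)."""
    counts = Counter()
    for s in seqs:
        counts.update(s)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


def z_score(observed, n_windows, p):
    """Binomial Z-score of an observed count over n_windows trials with probability p.

    Assumption: this statistic is chosen for illustration only.
    """
    variance = n_windows * p * (1.0 - p)
    if variance <= 0.0:
        return 0.0
    return (observed - n_windows * p) / sqrt(variance)


def rank_strings(seqs, k_min=4, k_max=8, top=10):
    """Score every observed string of length k_min..k_max; return the top (score, string) pairs."""
    freqs = base_frequencies(seqs)
    scored = []
    for k in range(k_min, k_max + 1):
        # Count every length-k window across the whole pool.
        observed = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
        n_windows = sum(observed.values())
        for kmer, count in observed.items():
            # Expected probability under independent, possibly imbalanced, bases.
            p = prod(freqs[b] for b in kmer)
            scored.append((z_score(count, n_windows, p), kmer))
    # Highest score first; ties broken alphabetically so the order is deterministic.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:top]


def assign_clusters(seqs, ranked_strings):
    """Put each sequence into the cluster of the first ranked string it contains."""
    clusters = {s: [] for s in ranked_strings}
    unassigned = []
    for seq in seqs:
        for s in ranked_strings:
            if s in seq:
                clusters[s].append(seq)
                break
        else:
            unassigned.append(seq)
    return clusters, unassigned


if __name__ == "__main__":
    pool = ["ACGGATCCTA", "TTGGATCCAG", "CAGGATCCGT", "GTACGTTAGC"]
    ranked = rank_strings(pool, k_min=3, k_max=6, top=5)
    clusters, rest = assign_clusters(pool, [kmer for _, kmer in ranked])
    print(ranked)
    print(clusters, rest)
```

This sketch enumerates every window of every length, which would be slow on a pool of more than 15 million sequences; the search-space reduction that the abstract credits for FSBC's speed is not reproduced here.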

Updated: 2020-06-24