GrpClassifierEC: a novel classification approach based on the ensemble clustering space.
Algorithms for Molecular Biology (IF 1.5), Pub Date: 2020-02-13, DOI: 10.1186/s13015-020-0162-7
Loai Abdallah 1 , Malik Yousef 2

BACKGROUND Advances in molecular biology have produced large and complex data sets, so a clustering approach that can capture the actual structure and the hidden patterns of the data is required. Moreover, the geometric space may not reflect the actual similarity between the different objects. In this research we therefore use a clustering-based space that converts the geometric space of the molecular data into a categorical space based on clustering results. We then use this space to develop a new classification algorithm.

RESULTS In this study, we propose a new classification method named GrpClassifierEC that replaces the given data space with a categorical space based on ensemble clustering (EC). The EC space is defined by tracking the membership of the points over multiple runs of clustering algorithms. Different points that were included in the same clusters are represented as a single point, and our algorithm assigns all of these points to a single class. The similarity between two objects is measured by the number of times that they did not belong to the same cluster, so a lower count means the objects are more similar. To evaluate the suggested method, we compare its results to those of the k nearest neighbors, decision tree, and random forest classification algorithms on several benchmark datasets. The results confirm that the suggested new algorithm, GrpClassifierEC, outperforms the other algorithms.

CONCLUSIONS Our algorithm can be integrated with many other algorithms. In this research, we use only the k-means clustering algorithm with different k values. For future research, we propose several directions: (1) examining how the choice of clustering algorithm used to build the ensemble clustering space affects the results; (2) identifying poor clustering results based on the training data; and (3) reducing the volume of the data by combining similar points based on the EC.

AVAILABILITY AND IMPLEMENTATION The KNIME workflow implementing GrpClassifierEC is available at https://malikyousef.com.
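The EC space described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' KNIME workflow: the function names (`kmeans_labels`, `ec_space`, `ec_distance`, `group_identical`, `group_majority_label`), the tiny pure-Python k-means, and the majority-vote rule for labelling a group are all assumptions made for illustration. Only the overall idea comes from the abstract: run k-means with several k values, record each point's cluster membership per run, merge points with identical memberships, and count the runs in which two points fall in different clusters.

```python
from collections import Counter, defaultdict
import random


def kmeans_labels(points, k, seed=0, iters=20):
    """Tiny Lloyd's k-means (illustrative only); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def ec_space(points, ks, seed=0):
    """Map each point to its tuple of cluster labels, one entry per k-means run."""
    runs = [kmeans_labels(points, k, seed=seed + i) for i, k in enumerate(ks)]
    return [tuple(run[i] for run in runs) for i in range(len(points))]


def ec_distance(u, v):
    """Number of runs in which two points were NOT placed in the same cluster."""
    return sum(a != b for a, b in zip(u, v))


def group_identical(ec_vectors):
    """Collect indices of points that share the same membership in every run."""
    groups = defaultdict(list)
    for idx, vec in enumerate(ec_vectors):
        groups[vec].append(idx)
    return dict(groups)


def group_majority_label(groups, y):
    """Assumed rule: give each group the most common training label of its members."""
    return {vec: Counter(y[i] for i in idxs).most_common(1)[0][0]
            for vec, idxs in groups.items()}
```

A call such as `ec_space(points, ks=[2, 3, 4])` yields one categorical vector per point. Points with equal vectors have `ec_distance` 0 and end up in the same group, which the sketch then labels as a unit.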

Updated: 2020-02-13