OsamorSoft: clustering index for comparison and quality validation in high throughput dataset
Journal of Big Data ( IF 8.6 ) Pub Date : 2020-07-09 , DOI: 10.1186/s40537-020-00325-6
Ifeoma Patricia Osamor , Victor Chukwudi Osamor

Different k-means clustering algorithms can produce different results on the same data, which calls for a simplified approach to validating the quality of the clusters obtained. The differences arise partly because the algorithms select their first seeds or centroids in different ways (randomly, sequentially, or by some other principle), and this choice tends to influence the final result. Popular external models for cluster quality validation and comparison require the computation of clustering indexes such as Rand, Jaccard, Fowlkes and Mallows, the Morey and Agresti Adjusted Rand Index (ARI_MA), and the Hubert and Arabie Adjusted Rand Index (ARI_HA). In the literature, ARI_HA has been adjudged a good measure of cluster validity. Based on ARI_HA as a popular clustering quality index, we developed OsamorSoft, which comprises DNA_Omatrix and OsamorSpreadSheet, as a tool for cluster quality validation in high-throughput analysis. The proposed method will help to bridge the wide gap created by the small number of user-friendly tools available for externally evaluating the ever-increasing number of clustering algorithms. Our implementation was tested on clusters created with four k-means algorithms using malaria microarray data. Furthermore, our results yielded a compact four-stage set of OsamorSpreadSheet statistics that OsamorSoft, our easy-to-use Java GUI and spreadsheet-based tool, uses for cluster quality comparison. We recommend that a framework be developed to simplify the integration and automation of several other cluster validity indexes for comparative analysis of big data problems.
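
The external indexes named in the abstract can all be derived from pair counts between two partitions of the same items. The following is a minimal sketch of the standard textbook definitions, not the OsamorSoft implementation; the function names (`pair_counts`, `rand_index`, `jaccard_index`, `fowlkes_mallows_index`, `ari_ha`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(labels_a, labels_b):
    """Count item pairs that fall in the same cluster under each partition."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    # Contingency table: how many items share cluster i in A and cluster j in B.
    contingency = Counter(zip(labels_a, labels_b))
    same_both = sum(comb(c, 2) for c in contingency.values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return same_both, same_a, same_b, comb(n, 2)


def rand_index(labels_a, labels_b):
    same_both, same_a, same_b, total = pair_counts(labels_a, labels_b)
    # Pairs separated in both partitions also count as agreements.
    apart_both = total - same_a - same_b + same_both
    return (same_both + apart_both) / total


def jaccard_index(labels_a, labels_b):
    same_both, same_a, same_b, _ = pair_counts(labels_a, labels_b)
    return same_both / (same_a + same_b - same_both)


def fowlkes_mallows_index(labels_a, labels_b):
    same_both, same_a, same_b, _ = pair_counts(labels_a, labels_b)
    return same_both / sqrt(same_a * same_b)


def ari_ha(labels_a, labels_b):
    """Hubert and Arabie Adjusted Rand Index (chance-corrected)."""
    same_both, same_a, same_b, total = pair_counts(labels_a, labels_b)
    expected = same_a * same_b / total   # agreement expected by chance
    maximum = (same_a + same_b) / 2      # largest attainable agreement
    if maximum == expected:
        # Degenerate case (e.g. one cluster in both partitions); treated
        # here as perfect agreement, which is an assumption of this sketch.
        return 1.0
    return (same_both - expected) / (maximum - expected)


if __name__ == "__main__":
    run_1 = [0, 0, 1, 1]
    run_2 = [1, 1, 0, 0]   # same grouping, different cluster names
    run_3 = [0, 1, 0, 1]   # every pair grouped differently
    print(ari_ha(run_1, run_2))  # 1.0
    print(ari_ha(run_1, run_3))  # -0.5
```

Because the indexes depend only on which items are grouped together, relabeling the clusters of one run (as in `run_2`) does not change the score, which is what makes them usable for comparing the output of different k-means variants.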

Updated: 2020-07-09