ScalaParBiBit: scaling the binary biclustering in distributed-memory systems


Cluster Computing (IF 3.6), Pub Date: 2021-03-19, DOI: 10.1007/s10586-021-03261-z
Basilio B. Fraguela, Diego Andrade, Jorge González-Domínguez

Biclustering is a data mining technique that finds groups of rows and columns that are highly correlated in a 2D dataset. Although several software applications perform biclustering, most of them suffer from a high computational complexity that prevents their use on large datasets. In this work we present ScalaParBiBit, a parallel tool to find biclusters in binary data, a kind of data quite common in many research fields such as text mining, marketing or bioinformatics. ScalaParBiBit takes advantage of the special characteristics of these binary datasets, as well as of an efficient parallel algorithm and implementation, to accelerate the biclustering procedure in distributed-memory systems. The experimental evaluation shows that our tool is significantly faster and more scalable than the state-of-the-art tool ParBiBit on a cluster with 32 nodes and 768 cores. Our tool, together with its reference manual, is freely available at https://github.com/fraguela/ScalaParBiBit.
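
The abstract does not describe the algorithm itself, so the following is only a minimal sequential sketch of the general idea behind bit-pattern biclustering on binary data, in the spirit of BiBit-style methods. It is not the ScalaParBiBit implementation and does not show any distributed-memory parallelism. The function names (`encode_rows`, `find_biclusters`) and the parameters `min_rows` and `min_cols` are hypothetical, chosen for illustration. The sketch assumes each row is packed into an integer, each pair of rows seeds a column pattern through a bitwise AND, and every row that contains the pattern joins the bicluster.

```python
from itertools import combinations


def encode_rows(matrix):
    """Pack each binary row into one int (bit j is set when column j is 1)."""
    return [sum(1 << j for j, v in enumerate(row) if v) for row in matrix]


def find_biclusters(matrix, min_rows=2, min_cols=2):
    """Return (rows, cols) pairs describing all-ones submatrices.

    Hypothetical helper: each pair of rows seeds a candidate column
    pattern (their bitwise AND); all rows containing it are collected.
    """
    encoded = encode_rows(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    seen = set()      # patterns already processed, to avoid duplicates
    result = []
    for i, j in combinations(range(len(encoded)), 2):
        pattern = encoded[i] & encoded[j]
        # Skip repeated patterns and patterns with too few columns.
        if pattern in seen or bin(pattern).count("1") < min_cols:
            continue
        seen.add(pattern)
        # A row belongs to the bicluster if it has a 1 in every pattern column.
        rows = [r for r, bits in enumerate(encoded) if (bits & pattern) == pattern]
        if len(rows) >= min_rows:
            cols = [c for c in range(n_cols) if (pattern >> c) & 1]
            result.append((rows, cols))
    return result


if __name__ == "__main__":
    data = [
        [1, 1, 0, 1],
        [1, 1, 0, 0],
        [0, 1, 1, 1],
        [1, 1, 0, 1],
    ]
    for rows, cols in find_biclusters(data):
        print("rows", rows, "cols", cols)
```

Packing rows into integers lets a single AND compare many columns at once, which is one plausible reading of how a tool can exploit "the special characteristics of these binary datasets". The pairwise loop is quadratic in the number of rows, which illustrates why parallel and distributed implementations are attractive for large inputs.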


Updated: 2021-03-19
