Parallel and distributed association rule mining in life science: A novel parallel algorithm to mine genomics data
Information Sciences (IF 8.1), Pub Date: 2018-07-26, DOI: 10.1016/j.ins.2018.07.055
Giuseppe Agapito, Pietro Hiram Guzzi, Mario Cannataro

Association rule mining (ARM) is widely employed in several scientific areas and application domains, and many different algorithms for learning association rules from databases have been introduced. Despite the many existing algorithms, there is still room for novel approaches tailored to novel kinds of datasets, because the efficiency of such algorithms often depends on the type of dataset analyzed. For instance, classical ARM algorithms present some drawbacks on biological datasets produced by microarray technologies, in particular those containing Single Nucleotide Polymorphisms (SNPs). In particular, classical algorithms require long execution times even with small datasets. Therefore, improving the performance of such algorithms by leveraging parallel computing is a growing research area. The main contributions of this paper are a comparison among different sequential, parallel, and distributed ARM techniques, and the presentation of a novel ARM algorithm, named Balanced Parallel Association Rule Extractor from SNPs (BPARES), that employs parallel computing and a novel balancing strategy to improve response time. BPARES improves performance without losing accuracy; it also uses the available computational power more efficiently and reduces memory consumption.




Updated: 2020-04-21