Charismatic Document Clustering Through Novel K-Means Non-negative Matrix Factorization (KNMF) Algorithm Using Key Phrase Extraction
International Journal of Parallel Programming (IF 0.9), Pub Date: 2018-08-07, DOI: 10.1007/s10766-018-0591-9
E. Laxmi Lydia, P. Krishna Kumar, K. Shankar, S. K. Lakshmanaprabu, R. M. Vidhyavathi, Andino Maseleno

A persistent challenge of Big Data is storing the required data and retrieving it through search engines.

Problem Defined: Many organizations need to retrieve useful information quickly and efficiently. The basic idea is to arrange an organization's files into individual folders within a folder hierarchy. Ordering files into folders manually requires knowing each file's contents and name, so that a related set of files can be grouped together.

Problem Statement: Manual grouping of files has its own complications, for example when the files are numerous and their contents cannot be identified from their labels. Machine-based document clustering is therefore strongly needed.

Existing System: A number of researchers have proposed dynamic algorithms and comprehensive comparisons of existing algorithms, but these have so far been restricted to organizations and colleges. Recently updated NMF rules have raised interest in document clustering. These rules gave confidence in its performance, with better results than Latent Semantic Indexing with Singular Value Decomposition.

Proposed System: A new working model called Novel K-means Non-negative Matrix Factorization (KNMF) is implemented using the updated NMF rules and is then assessed for document clustering. A data set called Newsgroup20 is used for exploratory purposes. In the preprocessing step that supports KNMF, common clutter and stop words are removed using keywords from the Key Phrase Extraction Algorithm, and a newly proposed Iterated Lovins stemmer is applied. Compared to the Porter and Lovins stemmers, the Iterated Lovins stemmer provides 5% more reduction. 60% of the document terms are reduced to their roots, the remaining terms already being root words. Finally, this process, named "Progressive Text mining radical", is developed alongside an application of the K-Means algorithm from the Apache Mahout project, which is used to analyze the performance of the MapReduce framework in Hadoop.
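The pipeline described in the abstract (stop-word removal, iterated stemming, NMF, then K-means) can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not taken from the paper: all function names are hypothetical, the four-rule suffix stripper is only a toy stand-in for the Lovins suffix table, the stop-word list is illustrative, and running K-means on the normalized columns of the NMF factor H is one plausible reading of "KNMF" rather than the authors' confirmed method. It uses the standard Lee-Seung multiplicative updates and runs on a single machine, not on Mahout or Hadoop.

```python
import random

STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}  # illustrative list only
SUFFIXES = ("ing", "ed", "es", "s")  # toy rules, NOT the real Lovins suffix table

def strip_once(word):
    """Remove one suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def iterated_stem(word):
    """Re-apply the stemmer until the word stops changing."""
    stemmed = strip_once(word)
    while stemmed != word:
        word, stemmed = stemmed, strip_once(stemmed)
    return word

def tokenize(text):
    """Lowercase, drop stop words, stem what is left."""
    return [iterated_stem(w) for w in text.lower().split() if w not in STOP_WORDS]

def term_document_matrix(docs):
    """Return (vocab, V) with V[i][j] = count of term i in document j."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    row = {w: i for i, w in enumerate(vocab)}
    V = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(tokens):
        for w in doc:
            V[row[w]][j] += 1.0
    return vocab, V

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(x * y for x, y in zip(r, c)) for c in Bt] for r in A]

def nmf(V, k, iters=300, seed=0, eps=1e-9):
    """Factor V ~ W H with Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in V]
    H = [[rng.random() + 0.1 for _ in V[0]] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(*rows)] for rows in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(*rows)] for rows in zip(W, num, den)]
    return W, H

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels

def cluster_documents(docs, k):
    """NMF first, then K-means on each document's normalized topic weights."""
    _, V = term_document_matrix(docs)
    _, H = nmf(V, k)
    columns = transpose(H)  # one k-dimensional vector per document
    points = [[x / (sum(col) + 1e-12) for x in col] for col in columns]
    return kmeans(points, k)

docs = [
    "clustering the documents",
    "document clusters and clustered documents",
    "stock markets and banks",
    "the bank traded stocks in markets",
]
labels = cluster_documents(docs, 2)
print(labels)
```

The toy corpus has two disjoint vocabularies, so the intent is for the first two documents to share one label and the last two to share the other. Normalizing each column of H before K-means is a design choice made here so that document length does not dominate the distance; the paper's abstract does not say whether the authors do the same.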

Updated: 2018-08-07