Charismatic Document Clustering Through Novel K-Means Non-negative Matrix Factorization (KNMF) Algorithm Using Key Phrase Extraction

Laxmi Lydia, E.; Krishna Kumar, P.; Shankar, K.; Lakshmanaprabu, S. K.; Vidhyavathi, R. M.; Maseleno, Andino

doi:10.1007/s10766-018-0591-9

Published: 07 August 2018

Volume 48, pages 496–514 (2020)

International Journal of Parallel Programming

E. Laxmi Lydia (ORCID: orcid.org/0000-0002-6788-7051)¹, P. Krishna Kumar², K. Shankar³, S. K. Lakshmanaprabu⁴, R. M. Vidhyavathi⁵ & Andino Maseleno⁶

913 Accesses
13 Citations

Abstract

A demanding challenge of Big Data is storing data and retrieving the required data from search engines.

Problem Defined: Many organizations need quick and efficient retrieval of useful information. The basic idea is to arrange an organization's computer files into a hierarchy of folders. Sorting these files into folders manually requires knowing each file's contents and name, so that related files can be grouped together.

Problem Statement: Manual grouping of files has its own complications, for example when the files are numerous and their contents cannot be inferred from their names. Machine-based document clustering is therefore strongly needed.

Existing System: Several researchers have proposed dynamic algorithms and comprehensive comparisons of existing algorithms, but these have so far been restricted to organizations and colleges. Recently updated rules for Non-negative Matrix Factorization (NMF) have raised interest in document clustering, since they have given better results than Latent Semantic Indexing (LSI) with Singular Value Decomposition (SVD).

Proposed System: A new model, Novel K-means Non-negative Matrix Factorization (KNMF), is implemented using the updated NMF rules and applied to document clustering. The 20 Newsgroups (Newsgroup20) data set is used for the experiments. In the preprocessing step that supports KNMF, common clutter and stop words are removed using keywords from a Key Phrase Extraction Algorithm, and terms are stemmed with a newly proposed Iterative Lovins stemmer. The Iterative Lovins algorithm gives 5% more reduction than the Porter and Lovins stemmers. 60% of the document terms are reduced to their roots; the remaining terms are already root words. Finally, an application of these processes, named “Progressive Text mining radical”, is developed for parallel execution of the K-Means algorithm from the Apache Mahout project and is used to analyze the performance of the MapReduce framework in Hadoop.
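
The abstract does not spell out how the Iterative Lovins stemmer works. One plausible reading, assumed here and not confirmed by the source, is to re-apply a longest-match suffix stripper until a word stops changing. In this sketch `STOP_WORDS` and `SUFFIXES` are small hypothetical stand-ins: the paper takes its stop words from a Key Phrase Extraction step, and the real Lovins (1968) stemmer uses a much larger ending table with context conditions and recoding rules, none of which are reproduced. `reduction_ratio` (also hypothetical) shows one way to count how many terms a stemmer shortens, the kind of measure behind the 5% and 60% figures.

```python
import re

# Hypothetical stop-word list; the paper instead derives stop words with a
# Key Phrase Extraction step that is not reproduced here.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}

# Hypothetical toy suffix list, NOT the Lovins ending table. Sorted longest
# first so that the longest matching suffix wins.
SUFFIXES = sorted(
    ["ational", "ization", "ations", "ation", "ings", "ness",
     "ing", "ers", "er", "ed", "es", "ly", "al", "s"],
    key=len, reverse=True)

MIN_STEM = 2  # never leave a stem shorter than two letters


def strip_once(word):
    """One pass: remove the longest matching suffix if the stem stays long enough."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[:-len(suffix)]
    return word


def strip_iteratively(word):
    """Re-apply strip_once until the word stops changing (a fixed point)."""
    while True:
        stemmed = strip_once(word)
        if stemmed == word:
            return word
        word = stemmed


def preprocess(text):
    """Lower-case, split into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def reduction_ratio(tokens, stemmer):
    """Fraction of tokens that the stemmer actually shortened."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if stemmer(t) != t) / len(tokens)
```

With this toy list, a single pass turns `nationally` into `national`, while the iterated version continues to `nation`. Iterating buys extra reduction at the cost of a higher risk of over-stemming unrelated words to the same root.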

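The abstract says KNMF is built on updated NMF rules but does not state them. As a reference point only, the sketch below uses the standard Lee and Seung multiplicative updates for the squared Euclidean error, factoring a non-negative term-by-document matrix V (terms × documents) as V ≈ WH with H ← H ∘ (WᵀV) ⊘ (WᵀWH) and W ← W ∘ (VHᵀ) ⊘ (WHHᵀ). Each document is then assigned to the component with the largest weight in its column of H, a common NMF clustering rule. This is plain NMF, not the paper's KNMF; the function names are hypothetical, and plain Python lists are used so the example runs without NumPy.

```python
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative m x n matrix V as W (m x k) times H (k x n)
    using Lee-Seung multiplicative updates for the squared Euclidean error."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Strictly positive random start; multiplicative updates keep entries >= 0.
    w = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        wt = transpose(w)
        num = matmul(wt, v)                # W^T V
        den = matmul(matmul(wt, w), h)     # W^T W H
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        ht = transpose(h)
        num = matmul(v, ht)                # V H^T
        den = matmul(w, matmul(h, ht))     # W H H^T
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h


def assign_clusters(h):
    """Assign each document (a column of H) to the component with the largest weight."""
    k, n = len(h), len(h[0])
    return [max(range(k), key=lambda c: h[c][j]) for j in range(n)]
```

In practice the columns of V are often tf-idf weighted and normalized before factoring, and k is set to the number of target clusters (for example 20 for the 20 Newsgroups categories).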

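The abstract evaluates K-Means from Apache Mahout on Hadoop MapReduce. The sketch below does not use Mahout's API; it simulates, in plain Python, one common way to express a K-Means iteration as a MapReduce job. The map phase assigns each vector to its nearest centroid and emits `(cluster_id, (vector, 1))`, the reduce phase sums vectors per cluster and divides by the count, and a driver loop reruns the job until the centroids stop moving. All function names are hypothetical.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))


def map_phase(points, centroids):
    """Mapper: emit (cluster_id, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs):
    """Reducer: add up (partial) vector sums and counts per cluster, then average."""
    sums, counts = {}, defaultdict(int)
    for cid, (vec, n) in pairs:
        if cid not in sums:
            sums[cid] = [0.0] * len(vec)
        sums[cid] = [s + x for s, x in zip(sums[cid], vec)]
        counts[cid] += n
    return {cid: [s / counts[cid] for s in sums[cid]] for cid in sums}


def kmeans_mapreduce(points, centroids, max_iter=20, tol=1e-6):
    """Driver: rerun the map and reduce phases until centroids stop moving."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids))
        # Keep the old centroid if a cluster received no points this round.
        new = [updated.get(c, centroids[c]) for c in range(len(centroids))]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids
```

Emitting a `(sum, count)` pair rather than bare vectors is what lets a combiner pre-aggregate partial sums on each mapper before the shuffle; `reduce_phase` already accepts such partial sums because it adds counts instead of assuming one point per record.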

Acknowledgement

This work was financially supported by the Department of Science and Technology (DST), Science and Engineering Research Board (SERB), under the ECR scheme. We thank DST-SERB for the financial support to carry out this research.

Author information

Authors and Affiliations

Department of Computer Science Engineering, Vignan’s Institute of Information Technology, Duvvada, Andhra Pradesh, India
E. Laxmi Lydia
Department of Computer Science and Engineering, V V College of Engineering, Tuticorin District, Tamil Nadu, India
P. Krishna Kumar
School of Computing, Kalasalingam Academy of Research and Education, Krishnankoil, India
K. Shankar
Department of Electronics and Instrumentation Engineering, BS Abdur Rahman Crescent Institute of Science and Technology, Chennai, India
S. K. Lakshmanaprabu
Department of Bioinformatics, Alagappa University, Karaikudi, India
R. M. Vidhyavathi
Department of Informatics Management, STMIK Pringsewu, Pringsewu, Lampung, Indonesia
Andino Maseleno

Corresponding author

Correspondence to E. Laxmi Lydia.

About this article

Cite this article

Laxmi Lydia, E., Krishna Kumar, P., Shankar, K. et al. Charismatic Document Clustering Through Novel K-Means Non-negative Matrix Factorization (KNMF) Algorithm Using Key Phrase Extraction. Int J Parallel Prog 48, 496–514 (2020). https://doi.org/10.1007/s10766-018-0591-9

Received: 13 May 2018
Accepted: 30 July 2018
Published: 07 August 2018
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10766-018-0591-9
