This paper is published in Volume-4, Issue-4, 2018
Area
Computer Science And Engineering
Author
Utkarsh Jaiswal, Kunal Gupta
Org/Univ
Amity University, Noida, Uttar Pradesh, India
Pub. Date
29 August, 2018
Paper ID
V4I4-1526
Keywords
Clustering, K-means, Documents, Hadoop, YARN, Filtering

Citations

IEEE
Utkarsh Jaiswal, Kunal Gupta. Document clustering using K-Means clustering in Hadoop using Map Reduce, International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARIIT.com.

APA
Utkarsh Jaiswal, Kunal Gupta (2018). Document clustering using K-Means clustering in Hadoop using Map Reduce. International Journal of Advance Research, Ideas and Innovations in Technology, 4(4) www.IJARIIT.com.

MLA
Utkarsh Jaiswal, Kunal Gupta. "Document clustering using K-Means clustering in Hadoop using Map Reduce." International Journal of Advance Research, Ideas and Innovations in Technology 4.4 (2018). www.IJARIIT.com.

Abstract

High-dimensional data involves large-volume, complex, growing data sets with multiple, autonomous sources. As data grows drastically every day, managing and organizing it efficiently is a major concern. This has created the need for machine learning techniques. With the fast development of networking, data storage, and data collection capacity, machine learning clustering algorithms are now rapidly expanding in all science and engineering domains, such as pattern recognition, data mining, bioinformatics, and recommendation systems. To support a scalable machine learning framework with MapReduce and Hadoop, we use YARN (Yet Another Resource Negotiator) to handle the high-volume data. Various clustering problems, such as cluster tendency, partitioning, cluster validity, and cluster performance, can be easily overcome by YARN clustering algorithms. Mahout manages data in four stages, i.e., fetching data, text mining, clustering, classification, and collaborative filtering. In the proposed approach, various data types are considered, for example, numbers, raw data, and 3D images; however, datasets are classified into a few categories, i.e., Collaborative Filtering, Clustering, Classification, or Frequent Itemset Mining.
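
As a rough illustration of how K-means document clustering can be split into a map step and a reduce step, the sketch below runs both steps in a single Python process. It is not the authors' implementation and does not use Hadoop, YARN, or Mahout: the function names (`vectorize`, `map_assign`, `reduce_mean`, `kmeans`), the plain term-count vectors, and the caller-supplied starting centroids are all assumptions made for this example. On a cluster, the map step would presumably run in parallel over document splits and the reduce step once per centroid.

```python
from collections import defaultdict


def tokenize(doc):
    """Lowercase a document and split it on whitespace."""
    return doc.lower().split()


def vectorize(docs):
    """Turn documents into term-count vectors over a shared, sorted vocabulary."""
    vocab = sorted({term for doc in docs for term in tokenize(doc)})
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for term in tokenize(doc):
            vec[index[term]] += 1.0
        vectors.append(vec)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_assign(vector, centroids):
    """Map step: emit (id of the nearest centroid, vector)."""
    nearest = min(range(len(centroids)),
                  key=lambda k: sq_dist(vector, centroids[k]))
    return nearest, vector


def reduce_mean(vectors):
    """Reduce step: average all vectors that were assigned to one centroid."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, centroids, max_iter=20):
    """Repeat map and reduce until the centroids stop changing."""
    for _ in range(max_iter):
        groups = defaultdict(list)  # stands in for the shuffle phase
        for vec in vectors:
            key, value = map_assign(vec, centroids)
            groups[key].append(value)
        # A centroid that received no documents keeps its old position.
        updated = [reduce_mean(groups[k]) if k in groups else centroids[k]
                   for k in range(len(centroids))]
        if updated == centroids:
            break
        centroids = updated
    labels = [map_assign(vec, centroids)[0] for vec in vectors]
    return centroids, labels


docs = [
    "hadoop yarn cluster",
    "hadoop yarn scheduler",
    "image pixel color",
    "image pixel filter",
]
vectors = vectorize(docs)
# Seed with the first and third documents (an arbitrary choice for the demo).
centroids, labels = kmeans(vectors, [vectors[0], vectors[2]])
print(labels)  # [0, 0, 1, 1]
```

With these four toy documents, the two Hadoop-related documents end up in one cluster and the two image-related documents in the other. Real document collections would usually need better term weighting and a more careful choice of starting centroids than this sketch uses.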