This paper is rejected in Volume-5, Issue-3, 2019
Area
Machine Learning
Author
Aamir Ahmad Khandy, Dr. Rohit Miri
Org/Univ
Dr. C.V. Raman University, Kargi Road Kota, Bilaspur, Chhattisgarh, India
Sub. Date
12 May, 2019
Paper ID
V5I3-1378
Publisher
Keywords
Big data, Unstructured data, Clustering algorithms, MongoDB

Abstract

Structured data is data that has been arranged into an organized, formatted repository, usually a database, so that its elements and essential features can be made directly accessible for more powerful and adequate processing and analysis. Unstructured data is data that does not fit neatly into a traditional database and has no identifiable internal structure or predefined data model. We cannot perform operations such as update, insert and delete on unstructured data. Clustering is an unsupervised learning process and a common technique for statistical, mathematical and demographic data analysis. It is a main task of preliminary data mining and is used in many fields, including machine learning (ML), pattern recognition, image analysis, information retrieval, bioinformatics, data compression and computer graphics. Available clustering algorithms have difficulty determining the number of clusters in a dataset and have difficulty clustering outliers, even those that share common groups. A final related drawback arises from the shape of the data clusters: non-spherical and overlapping datasets are difficult and complex to cluster. In this work, we designed an algorithm called uDCLUST (Un-structured Data Clustering), which identifies an appropriate number of clusters in unstructured data and also clusters outliers easily in non-spherical and overlapping datasets.
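The difficulties named in the abstract (an unknown number of clusters, outliers, and non-spherical shapes) can be illustrated with a small sketch. The code below is not uDCLUST, whose details are not given here; it is a minimal version of the well-known density-based DBSCAN algorithm, shown only as an example of a method that does not need the cluster count in advance. The toy dataset (two concentric rings plus one far-away point) and the parameter values `eps` and `min_pts` are assumptions chosen for illustration.

```python
import math

def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: one label per point, -1 marks noise (outliers)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels

def ring(radius, n):
    """n points evenly spaced on a circle centred at the origin."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

# Hypothetical toy data: two concentric (non-spherical) rings and one outlier.
points = ring(1.0, 40) + ring(4.0, 80) + [(10.0, 10.0)]
labels = dbscan(points, eps=0.5, min_pts=3)

n_clusters = len(set(labels) - {-1})
n_outliers = labels.count(-1)
print("clusters found:", n_clusters)
print("outliers found:", n_outliers)
```

On this toy input the sketch separates the two rings into two clusters and marks the isolated point as noise, without being told how many clusters to expect. A centroid-based method such as k-means would need the cluster count up front and would tend to split concentric rings by distance to centroids rather than by shape.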