Abstract:
Knowledge discovery from data using a clustering algorithm includes the stages of data preprocessing, clustering the preprocessed dataset, and evaluating patterns to obtain knowledge. With the growing popularity of Hadoop, the k-Means algorithm has been adapted to MapReduce for clustering big datasets. We enhance this existing algorithm so that it also performs data preprocessing and generates patterns and measures that can be used to evaluate the quality of the clusters. Our preliminary experimental results on a small Hadoop cluster indicate that the proposed technique performs well for clustering a big dataset used as a case study.
Description:
Paper presented at the 2016 IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA 2016). IEEE, Sichuan Association for Science and Technology, Sichuan Institute of Electronics, Southwest Jiaotong University and Xihua University, China. Chengdu, China, 5-7 July 2016.