Big Data Reduction Technique using Parallel Hierarchical Agglomerative Clustering

dc.contributor.author Moertini, Veronica Sri
dc.contributor.author Suarjana, Gde W.
dc.contributor.author Venica, Liptia
dc.contributor.author Karya, Gede
dc.date.accessioned 2018-05-08T08:37:15Z
dc.date.available 2018-05-08T08:37:15Z
dc.date.issued 2018
dc.identifier.issn 1819-9224 (online version)
dc.identifier.other artsc285
dc.identifier.uri http://hdl.handle.net/123456789/5917
dc.description IAENG INTERNATIONAL JOURNAL OF COMPUTER SCIENCE; Vol.45 No.1, 2018 en_US
dc.description.abstract Volume and velocity are two characteristics of big data. Big data “comes in” at such high velocity that its volume increases quickly. Efforts are needed to resolve these issues. This paper presents a big data reduction technique that can be used to reduce incoming big data periodically. The results, patterns that represent the original data at a smaller size, can be kept for further analysis, while the voluminous big data can be discarded. Clustering is a technique that can be used for reducing data. Based on our study, we find that agglomerative clustering is suitable for reducing big data with a low to medium number of attributes. Our proposed technique is based on Hadoop MapReduce, a computing framework for distributed systems, where Map and Reduce functions run in parallel on machine nodes. In brief: Map preprocesses and randomly divides the big data into disjoint partitions; Reduce constructs cluster trees (dendrograms) from the partitions and computes patterns from the clusters formed from the trees. The output is a collection of patterns with a much smaller number of objects and attributes. To provide flexibility, we design a few user-set input parameters, whose effects are shown by our experimental results. From experiments with big data on a Hadoop cluster of up to 15 commodity computers, we conclude that the Hadoop file system block size and the number of nodes affect the execution time and the size of incoming big data that can be processed. en_US
dc.description.uri http://www.iaeng.org/IJCS/index.html
dc.language.iso en en_US
dc.publisher International Association of Engineers - Hong Kong en_US
dc.relation.ispartofseries IAENG INTERNATIONAL JOURNAL OF COMPUTER SCIENCE;Vol.45 No.1, 2018
dc.subject CLUSTER PATTERN en_US
dc.subject PARALLEL CLUSTERING en_US
dc.subject BIG DATA REDUCTION en_US
dc.subject MAPREDUCE en_US
dc.title Big Data Reduction Technique using Parallel Hierarchical Agglomerative Clustering en_US
dc.type Journal Articles en_US
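
The Map and Reduce steps summarized in the abstract can be illustrated with a minimal single-machine sketch in plain Python. It is not the authors' Hadoop implementation. Every name in it (`map_phase`, `reduce_phase`, `reduce_big_data`, `num_partitions`, `max_clusters`) is hypothetical, the linkage is a naive centroid linkage chosen for brevity, and a "pattern" is assumed to be just a cluster centroid plus an object count. The abstract does not specify the linkage, the pattern format, or the user parameters.

```python
import random
from collections import defaultdict
from statistics import mean


def map_phase(records, num_partitions, seed=0):
    """Map step (sketch): assign each record to one random partition.

    Each record receives exactly one key, so the partitions are disjoint.
    """
    rng = random.Random(seed)
    partitions = defaultdict(list)
    for rec in records:
        partitions[rng.randrange(num_partitions)].append(tuple(rec))
    return partitions


def centroid(points):
    """Attribute-wise mean of a non-empty list of points."""
    return tuple(mean(col) for col in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(points, max_clusters):
    """Naive agglomerative clustering with centroid linkage (assumed).

    Starts with one cluster per point and repeatedly merges the two
    closest clusters until at most max_clusters remain, which plays the
    role of cutting the dendrogram at a fixed number of clusters.
    """
    clusters = [[p] for p in points]
    while len(clusters) > max_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def reduce_phase(partition_points, max_clusters):
    """Reduce step (sketch): cluster one partition and emit its patterns."""
    return [
        {"centroid": centroid(c), "count": len(c)}
        for c in agglomerate(partition_points, max_clusters)
    ]


def reduce_big_data(records, num_partitions=2, max_clusters=2, seed=0):
    """Run the Map step, then the Reduce step on every partition in turn."""
    patterns = []
    for _, pts in sorted(map_phase(records, num_partitions, seed).items()):
        patterns.extend(reduce_phase(pts, max_clusters))
    return patterns
```

Calling `reduce_big_data(records, num_partitions=3, max_clusters=4)` returns at most 12 patterns regardless of how many records go in, which is the sense in which the data are reduced. In a real Hadoop job the partitions would be processed by Reduce tasks in parallel rather than in a loop.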

