Basic knowledge construction technique to reduce the volume of low-dimensional big data

Karya, Gede; Sitohang, Benhard; Akbar, Saiful; Moertini, Veronica S

dc.contributor.author	Karya, Gede
dc.contributor.author	Sitohang, Benhard
dc.contributor.author	Akbar, Saiful
dc.contributor.author	Moertini, Veronica S
dc.date.accessioned	2023-05-09T08:35:44Z
dc.date.available	2023-05-09T08:35:44Z
dc.date.issued	2020
dc.identifier.issn	1935-5688
dc.identifier.other	maklhsc777
dc.identifier.uri	http://hdl.handle.net/123456789/15039
dc.description	Paper presented at the 2020 Fifth International Conference on Informatics and Computing (ICIC), November 2020. p. 1-8.	en_US
dc.description.abstract	Big data has the characteristics of high volume, velocity, and variety (3V) and continues to grow exponentially with the worldwide use of information and communication technology. The main problem in the use of big data is the data deluge. The need for big-data storage and processing technology and methods to keep pace with the exponential data growth rate is potentially unlimited, so technology requirements grow exponentially as well. In this paper, we propose a new approach to big-data analysis: separating out a basic-knowledge construction process that turns the original data into knowledge with much smaller velocity and volume. There are three problems to be solved: formulating basic-knowledge, developing a method for constructing basic-knowledge from initial data, and developing a technique for analyzing basic-knowledge into final-knowledge. In this study, the technique used to build basic-knowledge is clustering-based. Analysis of basic-knowledge into final-knowledge is limited to the clustering-based analysis process. The main contributions of this paper are a basic-knowledge formulation, a new big-data analytic architecture, basic-knowledge construction algorithms (DSC4BKC), and algorithms for analysis from basic-knowledge (BDAfBK) to final-knowledge. To test our proposed method, we use the BIRCH clustering algorithm with O(n) complexity as the baseline. We also use artificial test data generated with WEKA, and the IRIS4D and Diabetes data from the UCI Machine Learning Repository for validation. Our tests show that the proposed method is much more efficient in its use of data storage (84.69% up to 99.80%), is faster in processing (20.84% up to 86.91%), and produces final-knowledge that is similar to the baseline.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.subject	LOW-DIMENSIONAL BIG DATA	en_US
dc.subject	REDUCTION	en_US
dc.subject	DATA STREAM CLUSTERING	en_US
dc.subject	BIG DATA ANALYSIS	en_US
dc.subject	BASIC KNOWLEDGE CONSTRUCTION	en_US
dc.title	Basic knowledge construction technique to reduce the volume of low-dimensional big data	en_US
dc.type	Conference Papers	en_US
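
The abstract describes reducing data volume by building clustering-based basic-knowledge from the original data. The sketch below is only a rough illustration of that general idea and is not the paper's DSC4BKC or BDAfBK algorithm: it makes one pass over the points and keeps a small per-cluster summary (count, linear sum, sum of squares, in the spirit of a BIRCH clustering feature) instead of the raw points. The names Summary, summarize, and threshold are hypothetical and chosen for this example only.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Summary:
    """Hypothetical compact summary of a group of points."""
    n: int       # number of points absorbed
    ls: list     # per-dimension linear sum of the points
    ss: float    # sum of squared norms of the points

    def centroid(self):
        return [s / self.n for s in self.ls]

    def add(self, p):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, p)]
        self.ss += sum(x * x for x in p)


def summarize(points, threshold):
    """Single pass: absorb each point into the nearest summary whose
    centroid lies within `threshold`, otherwise open a new summary."""
    summaries = []
    for p in points:
        best = min(summaries, key=lambda s: dist(s.centroid(), p), default=None)
        if best is not None and dist(best.centroid(), p) <= threshold:
            best.add(p)
        else:
            summaries.append(Summary(1, list(p), sum(x * x for x in p)))
    return summaries


# Toy data (made up for illustration): two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
basic = summarize(points, threshold=1.0)
for s in basic:
    print(s.n, s.centroid())
```

Only the summaries need to be stored for later clustering-based analysis, which is where a reduction in storage would come from under these assumptions. The threshold value and the nearest-centroid rule are illustrative choices, not parameters taken from the paper.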