Abstract:
Big data is characterized by high volume, velocity, and variety (the 3Vs) and continues to grow exponentially with the worldwide adoption of information and communication technology. The main problem in using big data is the data deluge: the storage and processing capacity needed to keep pace with the exponential data growth rate is potentially unlimited, so technology requirements also grow exponentially. In this paper, we propose a new approach to big-data analysis that separates the construction of basic-knowledge from the original data, yielding knowledge with much smaller velocity and volume. Three problems must be solved: formulating basic-knowledge, developing a method for constructing basic-knowledge from the initial data, and developing a technique for analyzing basic-knowledge into final-knowledge. In this study, the technique used to build basic-knowledge is clustering-based, and the analysis of basic-knowledge into final-knowledge is limited to clustering-based analysis. The main contributions of this paper are the basic-knowledge formulation, a new big-data analytic architecture, a basic-knowledge construction algorithm (DSC4BKC), and an algorithm for analyzing basic-knowledge into final-knowledge (BDAfBK). To test the proposed method, we use the BIRCH clustering algorithm, with O(n) complexity, as the baseline. We also use artificial test data generated with WEKA, and the IRIS4D and Diabetes data sets from the UCI Machine Learning Repository, for validation. Our tests show that the proposed method is much more efficient in data storage (84.69% up to 99.80% savings), faster in processing (20.84% up to 86.91%), and produces final-knowledge similar to the baseline.
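As an illustrative sketch only (not the paper's code): the BIRCH baseline named above can be run with scikit-learn's `Birch` implementation; the synthetic data and all parameter values here are hypothetical stand-ins for the WEKA-generated test data.

```python
# Hypothetical sketch of the BIRCH baseline: scikit-learn's Birch,
# run on synthetic data standing in for the WEKA-generated test data.
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Artificial test data: 1000 points drawn from 3 Gaussian blobs
# (assumed parameters, for illustration only).
X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)

# BIRCH builds a CF-tree in a single pass over the data, which is
# the O(n) behavior that motivates its use as the baseline.
model = Birch(threshold=0.5, n_clusters=3)
labels = model.fit_predict(X)

print(len(set(labels)))  # number of clusters in the final assignment
```

The CF-tree summarizes the stream into compact subclusters before the final global clustering step, which is conceptually close to the abstract's idea of condensing raw data into a much smaller intermediate representation.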