Pivot Based Data Preprocessing System for Data Optimization

Authors

  • Ravichandra H
  • Amutha S

Keywords

Big data, Hadoop, MapReduce, preprocessing

Abstract

Due to the rapid growth in the volume of data from various Internet sources, the extraction, transformation, and processing of big data have become very important in academia as well as industry. Big data processing platforms such as Hadoop adopt the MapReduce programming model to process application data. Hadoop first collects data and stores it across different data nodes according to a replication policy; the stored data are later processed using MapReduce operations. Collecting and analyzing the data in this way requires a large number of I/O operations, so computing resources are not used efficiently. This paper proposes a preprocessing system on the Hadoop platform that reduces I/O operations on the redundant data collected and stored in the data nodes. Removing this redundancy reduces both the data size on the data nodes and the number of I/O operations, so processing the data takes less time. Experiments are conducted on a generated dataset containing redundant integer data.
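The abstract does not describe the pivot-based method itself, so the following Python sketch only illustrates the general idea of removing redundancy before processing, under stated assumptions: "redundant" is taken to mean repeated integer values, and preprocessing collapses them into (value, count) pairs with a map, shuffle, and reduce pass simulated in plain Python rather than run on Hadoop. The function names (`map_phase`, `shuffle_and_reduce`, `preprocess`, `total_from_compressed`) and the generated dataset are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import groupby
import random


def map_phase(records):
    """Map step: emit a (value, 1) pair for every raw integer record."""
    for value in records:
        yield value, 1


def shuffle_and_reduce(pairs):
    """Shuffle (sort by key), then reduce: sum the counts for each distinct value."""
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return [
        (key, sum(count for _, count in group))
        for key, group in groupby(ordered, key=lambda kv: kv[0])
    ]


def preprocess(records):
    """Hypothetical preprocessing: collapse repeated integers into (value, count) pairs."""
    return shuffle_and_reduce(map_phase(records))


def total_from_compressed(compressed):
    """A downstream job that reads only the compressed form: sum of all original records."""
    return sum(value * count for value, count in compressed)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical generated dataset: 10,000 integers drawn from only 50 distinct values.
    raw = [random.randint(0, 49) for _ in range(10_000)]
    compressed = preprocess(raw)
    print(f"records before preprocessing: {len(raw)}, after: {len(compressed)}")
    # The compressed form still yields the same aggregate and the same frequencies.
    assert total_from_compressed(compressed) == sum(raw)
    assert dict(compressed) == Counter(raw)
```

Because each pair records how often a value occurred, an aggregate such as the sum can still be computed exactly from the much smaller representation, which is what saves I/O in this sketch; whether the paper's system preserves counts in this way is not stated in the abstract.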

Published

2019-11-15

Section

Articles