The Need of a Unified File Format in Big Data Analysis

Srihari Desai, Tushar Kumar Chopra, Yashas M.B, Suneetha K.R.

Abstract

Big data refers to large, complex data sets drawn from many different sources. Such enormous volumes of data can be used to address problems in industries like business, health, and technology that were not previously solvable. Utilizing such a large and varied amount of data requires an effective management system, which consists of extracting the data, processing it to meet requirements, and providing the required storage. Big data preprocessing is a challenging task, because existing approaches cannot be applied directly: the size of the data sets or data streams makes them infeasible. As a result, data preparation has become more popular in cloud computing, and its contributions to the big data framework have been extended to include techniques such as feature selection, handling of imperfect data, imbalanced learning, and instance reduction. The rise of technologies like machine learning, data analytics, and artificial intelligence is altering the big data technology landscape. Using these technologies in conjunction with big data allows businesses to improve their visualization capabilities and to make complex data more usable and more accessible through visual representation. A big data framework is used to work with real-time data. Maintaining proper data file formats is crucial to enabling effective big data storage and exploitation, and file formats are equally crucial for sharing data between different settings. Information received from different sources uses different file formats. Hence, this paper proposes an idea for a single platform that can process a variety of data.
