Big Data is a collection of data that is huge in volume (often measured in petabytes) and still growing exponentially over time. Its size and complexity are so great that traditional data management tools cannot store or process it efficiently. In short, big data is still data, only at a very large scale.
Some of the sources of big data are:
- Social Media : Sites such as Twitter, Facebook, LinkedIn and Google generate huge amounts of data every day. One commonly cited statistic is that 500+ TB of new data is ingested into Facebook's databases each day, mainly in the form of images, audio, video, messages, etc.
- Stock Exchange : For example, the New York Stock Exchange generates about 1 TB of new data per day.
- Jet Engine : A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With many thousands of flights per day, the data generated reaches many petabytes.
- E-commerce Site : Sites like Amazon, Flipkart and Alibaba generate huge volumes of logs from which users' buying trends can be traced.
- Weather Station : Weather stations and satellites produce very large volumes of data, which are stored and processed to forecast the weather.
- Telecom Company : Telecom companies study user trends and design their plans accordingly. To do this, they store the data of their millions of users.
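To see how the jet engine figure in the list above reaches petabyte scale, the arithmetic can be sketched in Python. Only the rate of 10 TB per 30 minutes comes from the text. The number of flights, the average flight duration and the single engine per flight are assumed values chosen purely for illustration.

```python
# Rough estimate of the data generated by jet engines in one day.
TB_PER_HALF_HOUR = 10      # from the text: 10+ TB per 30 minutes of flight
FLIGHTS_PER_DAY = 5_000    # assumption: stands in for "many thousands" of flights
AVG_FLIGHT_HOURS = 2       # assumption: average flight duration
ENGINES_PER_FLIGHT = 1     # assumption: count only one engine per flight


def daily_volume_tb(flights, hours, engines=ENGINES_PER_FLIGHT, rate=TB_PER_HALF_HOUR):
    """Return the estimated number of terabytes generated per day."""
    half_hours = hours * 2
    return flights * engines * half_hours * rate


total_tb = daily_volume_tb(FLIGHTS_PER_DAY, AVG_FLIGHT_HOURS)
total_pb = total_tb / 1000  # 1 PB = 1000 TB in decimal units
print(f"{total_tb:,} TB per day = {total_pb:,.0f} PB per day")
```

With these assumed inputs the estimate is 200,000 TB, or 200 PB, per day. Different assumptions change the number but not the order of magnitude.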
Types of Big Data
There are three types of Big Data:
1. Structured : Any data that can be stored, accessed and processed in a fixed format is termed 'structured' data. Data stored in a relational database management system (RDBMS) is one example of structured data. It is easy to process because it has a fixed schema. Structured Query Language (SQL) is often used to manage this kind of data.
2. Unstructured : Any data whose form or structure is unknown is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges when it is processed to derive value from it. A typical example is a heterogeneous data source containing a combination of simple text files, images, videos, etc. The results returned by a Google search are another example.
3. Semi-structured : Semi-structured data can contain both forms of data. It appears structured, but its structure is not actually defined by, for example, a table definition in a relational DBMS. An example of semi-structured data is data represented in an XML file.
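The difference between the three types is easier to see side by side. The following is a minimal Python sketch that uses only the standard library. The table name, field names and sample values are all hypothetical and exist only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: a fixed schema in a relational table, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Asha", 50000), (2, "Ravi", 65000)],
)
structured_names = [
    row[0] for row in conn.execute("SELECT name FROM employee WHERE salary > 60000")
]
conn.close()

# Semi-structured: tags label each field, but no table definition enforces
# which fields a record must have (the two records below differ).
xml_doc = (
    "<employees>"
    "<rec><name>Asha</name><age>35</age></rec>"
    "<rec><name>Ravi</name><city>Pune</city></rec>"
    "</employees>"
)
root = ET.fromstring(xml_doc)
semi_structured = [{child.tag: child.text for child in rec} for rec in root.findall("rec")]

# Unstructured: free text with no predefined fields; any structure must be inferred.
review = "Delivery was late, but the phone itself is excellent."
word_count = len(review.split())

print(structured_names)   # names selected by the SQL query
print(semi_structured)    # one dict per record, with differing keys
print(word_count)         # a crude measure derived from raw text
```

The structured query relies on the schema, the XML records carry their own field labels, and the free-text review offers nothing to query until some processing is applied to it.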
Characteristics of Big Data
Big data can be described by the following characteristics:
- Volume
- Variety
- Velocity
- Variability
1. Volume : The name Big Data itself refers to enormous size. Whether a particular data set can be considered Big Data depends on its volume. Hence, volume is one characteristic that must be considered when dealing with Big Data.
2. Variety : Variety refers to the heterogeneous sources and nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only sources of data considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This variety of unstructured data poses certain issues for storing, mining and analyzing data.
3. Velocity : The term 'velocity' refers to the speed at which data is generated. How fast data is generated and processed to meet demand determines the real potential of the data. It is estimated that the volume of data doubles every two years.
4. Variability : This refers to the inconsistency that data can show at times, which hampers the ability to handle and manage it effectively.
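The doubling estimate mentioned under Velocity implies exponential growth. As a small sketch, assuming the two-year doubling period holds steadily, the volume after a given number of years is the initial volume multiplied by 2 raised to (years / 2). The function name and the 1 PB starting point are hypothetical.

```python
def projected_volume(initial_volume, years, doubling_period_years=2.0):
    """Return the volume after `years`, assuming it doubles every `doubling_period_years`."""
    return initial_volume * 2 ** (years / doubling_period_years)


# Starting from a hypothetical 1 PB today:
for years in (2, 4, 10):
    print(years, "years ->", projected_volume(1, years), "PB")
```

Under this assumption, 1 PB grows to 2 PB after two years, 4 PB after four years and 32 PB after ten years.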
Benefits of Big Data Processing
- Businesses can utilize outside intelligence when making decisions : Access to social data from search engines and sites like Facebook and Twitter enables organizations to fine-tune their business strategies.
- Improved customer service : Traditional customer feedback systems are being replaced by new systems designed with Big Data technologies. In these new systems, Big Data and natural language processing technologies are used to read and evaluate consumer responses.
Applications of Big Data
Some of the applications of Big Data are:
- Smarter Healthcare : By making use of petabytes of patient data, an organization can extract meaningful information and build applications that predict a patient's deteriorating condition in advance.
- Telecom : The telecom sector collects information, analyzes it and provides solutions to different problems. By using Big Data applications, telecom companies have been able to significantly reduce data packet loss, which occurs when networks are overloaded, and thus provide a seamless connection to their customers.
- Retail : Retail has some of the tightest margins and is one of the greatest beneficiaries of big data. The main value of big data in retail lies in understanding consumer behavior. For example, Amazon's recommendation engine provides suggestions based on the consumer's browsing history.
- Traffic Control : Traffic congestion is a major challenge for many cities globally. Effective use of data and sensors will be key to managing traffic better as cities become increasingly densely populated.
- Manufacturing : Analyzing big data in the manufacturing industry can reduce component defects, improve product quality, increase efficiency, and save time and money.
- Search Quality : Every time we extract information from Google, we simultaneously generate data for it. Google stores this data and uses it to improve its search quality.
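The idea of suggesting products from browsing history, mentioned under Retail, can be illustrated with a very simple co-occurrence count. This is only a toy sketch and not how Amazon's engine actually works. The user names, product names and the `recommend` function are all hypothetical.

```python
from collections import Counter

# Hypothetical browsing histories: one list of viewed products per user.
histories = {
    "user_a": ["phone", "phone case", "charger"],
    "user_b": ["phone", "charger", "headphones"],
    "user_c": ["laptop", "mouse"],
    "user_d": ["phone", "phone case"],
}


def recommend(viewed, histories, top_n=2):
    """Suggest items that other users viewed alongside the items in `viewed`."""
    viewed = set(viewed)
    scores = Counter()
    for items in histories.values():
        items = set(items)
        if items & viewed:                 # this user shares at least one item
            scores.update(items - viewed)  # count the items not yet viewed
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


print(recommend(["phone"], histories))
```

For a consumer who has viewed only "phone", the sketch suggests "charger" and "phone case", since each appears in two of the histories that also contain "phone". A production system would work on far larger logs and use more robust models.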