Big Data Analytics – Introduction to Big Data

In this tutorial we discuss the Introduction to Big Data module from the subject Big Data Analytics. We concentrate on the characteristics of Big Data and on its types: structured, unstructured and semi-structured data. So let us start.

Big data is a broad, rapidly evolving topic. As the name implies, Big Data is still data, but of a huge size. The term describes a collection of data that is huge in volume and yet grows exponentially with time. In short, such data is so large and complex that traditional data management tools cannot store or process it efficiently. It is also a blanket term for the non-traditional strategies and technologies needed to gather, organize, process, and draw insights from large datasets.

In this module, we discuss big data at a fundamental level and define common concepts you might come across while researching the subject. We also take a high-level look at some of the processes and technologies currently used in this area.

Characteristics of Big Data

Big Data is commonly characterized by four Vs:

  • Volume: the amount of data that businesses can collect is enormous, so the volume of the data becomes a critical factor in Big Data analytics.
  • Velocity: new data is generated at a very high rate, thanks to our dependence on the internet, sensors and machine-to-machine communication, so Big Data must also be parsed in a timely manner.
  • Variety: the data that is generated is heterogeneous, in the sense that it can come in various formats such as video, text, database records, numeric data, sensor data and so on.
  • Veracity: knowing whether the available data comes from a credible source is of utmost importance before interpreting it and using Big Data for business needs.

Types of Big Data

Big Data can be found in three forms:

  1. Structured
  2. Unstructured
  3. Semi-structured

Structured

Any data that can be stored, accessed and processed in a fixed format is termed ‘structured’ data. Over time, computer science has become very successful at developing techniques for working with this kind of data and deriving value from it. However, issues now arise when the size of such data grows to a huge extent, with typical sizes in the range of multiple zettabytes.

Examples Of Structured Data

An ‘Employee’ table in a database is an example of structured data:

Employee_ID  Employee_Name    Gender  Department  Salary_In_lacs
2365         Rajesh Kulkarni  Male    Finance     650000
3398         Pratibha Joshi   Female  Admin       650000
7465         Shushil Roy      Male    Admin       500000
7500         Shubhojit Das    Male    Finance     500000
7699         Priya Sane       Female  Finance     550000
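
As a minimal sketch of what "fixed format" means in practice, the Employee table above can be loaded into an in-memory SQLite database using Python's standard `sqlite3` module. The column types, the `NOT NULL` constraints and the extra three-value row used to show a rejected insert are assumptions made for illustration; they are not part of the original table.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The fixed format: every row must supply these five columns.
# Column types and constraints are assumptions for this sketch.
conn.execute(
    """
    CREATE TABLE Employee (
        Employee_ID    INTEGER PRIMARY KEY,
        Employee_Name  TEXT    NOT NULL,
        Gender         TEXT    NOT NULL,
        Department     TEXT    NOT NULL,
        Salary_In_lacs INTEGER NOT NULL
    )
    """
)

rows = [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
    (7465, "Shushil Roy", "Male", "Admin", 500000),
    (7500, "Shubhojit Das", "Male", "Finance", 500000),
    (7699, "Priya Sane", "Female", "Finance", 550000),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", rows)

# Because the structure is known in advance, a question is just a query.
avg_by_dept = dict(
    conn.execute(
        "SELECT Department, AVG(Salary_In_lacs) FROM Employee GROUP BY Department"
    ).fetchall()
)
print(avg_by_dept)

# A hypothetical row that does not match the fixed format is refused.
try:
    conn.execute(
        "INSERT INTO Employee VALUES (?, ?, ?)", (9001, "No Department", "Male")
    )
    rejected = False
except sqlite3.Error:
    rejected = True
print("malformed row rejected:", rejected)
```

The point of the sketch is the schema: the database knows the shape of every row before any data arrives, which is what makes querying and validation straightforward.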

Unstructured

Any data whose form or structure is unknown is classified as unstructured data. In addition to being huge in size, unstructured data poses multiple challenges when it is processed to derive value from it. A typical example is a heterogeneous data source containing a combination of simple text files, images, videos, etc.

Examples Of Unstructured Data

The output returned by ‘Google Search’
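
To illustrate why processing unstructured data is harder, here is a small sketch that tries to pull a salary figure out of free text with Python's `re` module. Both notes and the pattern are hypothetical examples made up for this illustration; the names are borrowed from the Employee table only for continuity.

```python
import re

# Hypothetical free-text note: nothing declares where the salary lives.
note = "Rajesh Kulkarni from Finance mentioned his salary is 650000 after the review."

# A hand-written pattern has to guess at the wording of the text.
pattern = r"salary is (\d+)"
match = re.search(pattern, note)
salary = int(match.group(1)) if match else None
print(salary)

# The same fact phrased differently defeats the pattern entirely.
other_note = "After the review, Rajesh now earns 650000."
other_match = re.search(pattern, other_note)
print(other_match)
```

With structured data the salary is simply a column; here each new phrasing needs new extraction logic, which is one reason unstructured data is costly to process at scale.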

Semi-structured

Semi-structured data can contain both forms of data. It looks structured, but it is not actually defined by, for example, a table definition in a relational DBMS. An example of semi-structured data is data represented in an XML file.
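
A short sketch of the XML case, using Python's standard `xml.etree.ElementTree` module. The element names and the `email` field are hypothetical; they are chosen only to show that records in the same file can carry different fields without any table definition.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: tags give each value a label, but no schema
# forces every record to have the same fields.
xml_text = """
<employees>
  <employee id="2365">
    <name>Rajesh Kulkarni</name>
    <department>Finance</department>
  </employee>
  <employee id="3398">
    <name>Pratibha Joshi</name>
    <department>Admin</department>
    <email>pratibha@example.com</email>
  </employee>
</employees>
"""

root = ET.fromstring(xml_text.strip())

# Turn each <employee> element into a dictionary of whatever fields it has.
records = []
for emp in root.findall("employee"):
    record = {"id": emp.get("id")}
    for child in emp:
        record[child.tag] = child.text
    records.append(record)

for record in records:
    print(record)
```

The second record has an `email` key that the first lacks, yet both parse without complaint: the structure travels with the data instead of being fixed in advance.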

Three Challenges That Big Data Faces

  • Data or Volume
  • Processing
  • Management

Data or Volume

  • The volume of data, especially machine-generated data, is exploding.
  • That data grows faster every year as new sources of data emerge. For example, in the year 2000, 800,000 petabytes (PB) of data were stored in the world, and this was expected to reach 35 zettabytes (ZB) by 2020 (according to IBM).
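
To put the IBM figures in perspective, the following sketch converts them to a common unit and works out the implied growth. It assumes decimal SI units (1 ZB = 1,000,000 PB) and steady compound growth over the 20 years, neither of which is stated in the source figures.

```python
# Assumption: decimal SI units, so 1 ZB = 10**21 bytes and 1 PB = 10**15 bytes.
PB_PER_ZB = 1_000_000

stored_2000_pb = 800_000   # figure quoted for the year 2000
projected_2020_zb = 35     # figure projected for 2020
years = 2020 - 2000

# Express both figures in zettabytes.
stored_2000_zb = stored_2000_pb / PB_PER_ZB

# Overall growth, and the constant yearly rate that would produce it.
growth_factor = projected_2020_zb / stored_2000_zb
annual_rate = growth_factor ** (1 / years) - 1

print(f"2000: {stored_2000_zb} ZB, 2020: {projected_2020_zb} ZB")
print(f"growth factor: {growth_factor:.2f}x")
print(f"implied compound annual growth: {annual_rate:.1%}")
```

Under these assumptions the quoted figures correspond to a growth of more than 40 times, or roughly 21% more data every year for two decades.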

Processing

  • More than 80% of today’s information is unstructured, and it is typically too big to manage effectively.
  • Today, companies are looking to leverage a lot more data from a wider variety of sources, both inside and outside the organization.
  • These sources include documents, contracts, machine data, sensor data, social media, health records, emails and so on. The list is practically endless.

Management

  • A lot of this data is unstructured, or has a complex structure that’s hard to represent in rows and columns.

In the next tutorial we will discuss Intelligent Data Analysis.
