In this tutorial, we are going to discuss the Introduction to Big Data module from the subject Big Data Analytics. We will begin with the characteristics of Big Data, the types of Big Data, and structured and unstructured data. So let us start.
Big data is a broad, rapidly evolving topic. As the name implies, Big Data is also data, but of a huge size. The term describes a collection of data that is huge in volume and still growing exponentially with time. In short, such data is so large and complex that traditional data management tools cannot store or process it efficiently. It is a blanket term for the non-traditional strategies and technologies needed to gather, organize, and process large datasets and to derive insights from them.
In this module, we will discuss big data at a fundamental level and define common concepts you might come across while researching the subject. We will also take a high-level look at some of the processes and technologies currently used in this field.
Characteristics of Big Data
Big Data has certain characteristics, commonly summarized as the 4Vs:
- Volume: the amount of data that businesses can collect is enormous, so the volume of data becomes a critical factor in Big Data analytics.
- Velocity: new data is generated at a very high rate, thanks to our dependence on the internet, sensors, and machine-to-machine communication, so it is important to process Big Data in a timely manner.
- Variety: the data that is generated is highly heterogeneous, in the sense that it can come in various formats such as video, text, database records, numeric values, sensor data, and so on.
- Veracity: knowing whether the available data comes from a credible source is of utmost importance before interpreting it and using it for business needs.
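The 4Vs are qualitative, but a small sketch can make them more concrete. The Python below profiles a batch of records using deliberately simple proxies: total bytes for Volume, records per second for Velocity, the number of distinct formats for Variety, and the share of records from trusted sources for Veracity. The `Record` class, its fields, and the `trusted_source` flag are hypothetical; real Big Data systems assess these properties in far more nuanced ways.

```python
from dataclasses import dataclass


@dataclass
class Record:
    fmt: str              # hypothetical format label, e.g. "text", "video", "sensor"
    size_bytes: int       # size of the record in bytes
    trusted_source: bool  # hypothetical flag: did the record come from a vetted source?


def profile(records: list[Record], window_seconds: float) -> dict:
    """Summarize a batch of records along the 4Vs using simple, illustrative proxies."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    n = len(records)
    return {
        # Volume: total bytes collected during the window
        "volume_bytes": sum(r.size_bytes for r in records),
        # Velocity: records arriving per second over the window
        "velocity_per_s": n / window_seconds,
        # Variety: number of distinct formats in the batch
        "variety": len({r.fmt for r in records}),
        # Veracity: share of records from trusted sources (0.0 for an empty batch)
        "veracity": sum(r.trusted_source for r in records) / n if n else 0.0,
    }


# Made-up batch collected over a hypothetical 2-second window.
batch = [
    Record("text", 2_000, True),
    Record("video", 50_000_000, True),
    Record("sensor", 64, False),
    Record("text", 1_500, True),
]
print(profile(batch, window_seconds=2.0))
```

Even in this toy batch, a single video dominates the byte count while the sensor reading is the only untrusted record, which hints at why the four Vs are tracked separately rather than folded into one number.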
Types of Big Data
‘Big Data’ can be found in three forms: structured, unstructured, and semi-structured.
Any data that can be stored, accessed, and processed in a fixed format is termed ‘structured’ data. Over time, computer scientists have achieved great success in developing techniques for working with such data and deriving value from it. Nowadays, however, we foresee issues when the size of such data grows to a huge extent, with typical sizes in the range of multiple zettabytes.
Examples Of Structured Data
An ‘Employee’ table in a database is an example of structured data.
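As a minimal sketch of such an ‘Employee’ table, the Python below uses the built-in `sqlite3` module with an in-memory database. The column names and sample rows are hypothetical; the point is that the schema is declared before any data arrives, so queries and aggregates can rely on every row having the same columns.

```python
import sqlite3

# In-memory database; the hypothetical Employee schema fixes the column names up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Employee (
           emp_id     INTEGER PRIMARY KEY,
           name       TEXT NOT NULL,
           department TEXT NOT NULL,
           salary     REAL
       )"""
)

# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO Employee (emp_id, name, department, salary) VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "Finance", 65000.0),
        (2, "Ravi", "Admin", 48000.0),
        (3, "Meera", "Finance", 72000.0),
    ],
)


def employees_in(department: str) -> list:
    """Return employee names in one department, ordered by emp_id."""
    cur = conn.execute(
        "SELECT name FROM Employee WHERE department = ? ORDER BY emp_id",
        (department,),
    )
    return [name for (name,) in cur.fetchall()]


def average_salary_by_department() -> dict:
    """Aggregate over a known column; easy because every row has the same columns."""
    cur = conn.execute("SELECT department, AVG(salary) FROM Employee GROUP BY department")
    return dict(cur.fetchall())


print(employees_in("Finance"))  # ['Asha', 'Meera']
print(average_salary_by_department())
```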
Any data with an unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges when it comes to processing it to derive value. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc.
Examples Of Unstructured Data
The output returned by ‘Google Search’
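Unstructured data has no schema to query, so deriving value from it usually starts by imposing some structure on it. As a minimal sketch (the sample text and the `top_terms` helper are hypothetical), the Python below turns a piece of free text into word counts using only the standard library; real pipelines rely on much richer text-processing techniques.

```python
import re
from collections import Counter

# Hypothetical free text: no columns, no schema, just words.
document = """Big data is huge in volume. Big data grows fast,
and big data comes in many formats."""


def top_terms(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Impose a minimal structure (word -> count) on unstructured text."""
    # Lowercase and keep only alphabetic runs, discarding punctuation.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(n)


print(top_terms(document))
```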
Semi-structured data contains elements of both forms. Semi-structured data appears structured in form, but it is not actually defined by, for example, a table definition in a relational DBMS. An example of semi-structured data is data represented in an XML file.
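To see why XML counts as semi-structured, the sketch below parses a small, made-up XML document with Python's standard `xml.etree.ElementTree` module. The tags label every value, yet the two records carry different fields, something a single fixed relational table definition would not express without extra columns or NULLs. The `to_records` helper and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: tags describe each value, but records need not share the same fields.
XML_TEXT = """
<employees>
  <employee id="1"><name>Asha</name><department>Finance</department></employee>
  <employee id="2"><name>Ravi</name><location>Chennai</location></employee>
</employees>
"""


def to_records(xml_text: str) -> list:
    """Turn each <employee> element into a dict mapping tag -> text."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for emp in root.findall("employee"):
        record = {"id": emp.get("id")}
        # Keys come from the tags themselves, so each record can have a different set.
        record.update({child.tag: child.text for child in emp})
        records.append(record)
    return records


for rec in to_records(XML_TEXT):
    print(rec)
```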
Three Challenges That Big Data Faces
Data or Volume
- The volume of data, especially machine-generated data, is exploding.
- Data keeps growing rapidly every year as new sources of data emerge. For example, in the year 2000, 800,000 petabytes (PB) of data were stored in the world, and this is expected to reach 35 zettabytes (ZB) by 2020 (according to IBM).
- More than 80% of today’s information is unstructured, and it is typically too big to manage effectively.
Variety
- Today, companies are looking to leverage a lot more data from a wider variety of sources, both inside and outside the organization.
- These sources include documents, contracts, machine data, sensor data, social media, health records, emails, and more; the list is practically endless.
- A lot of this data is unstructured, or has a complex structure that’s hard to represent in rows and columns.
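To put the IBM figures quoted above in perspective, the arithmetic below converts both to the same unit and derives the growth rate they imply. It assumes decimal SI units (1 ZB = 1,000,000 PB) and smooth compound growth between 2000 and 2020, which is a simplification; the 2020 figure is a projection, not a measurement.

```python
import math

PB_PER_ZB = 1_000_000  # assumption: decimal SI units, 1 ZB = 1,000 EB = 1,000,000 PB

stored_2000_pb = 800_000  # figure quoted in the text for 2000
projected_2020_zb = 35    # projection quoted in the text for 2020
years = 2020 - 2000

stored_2000_zb = stored_2000_pb / PB_PER_ZB  # 0.8 ZB
growth_factor = projected_2020_zb / stored_2000_zb
# Compound annual growth rate implied by the two endpoints (assumes smooth growth).
cagr = growth_factor ** (1 / years) - 1
# Years needed for the stored volume to double at that rate.
doubling_years = math.log(2) / math.log(1 + cagr)

print(f"growth factor: {growth_factor:.2f}x")       # about 43.75x
print(f"implied annual growth: {cagr:.1%}")          # about 20.8%
print(f"doubling time: {doubling_years:.1f} years")  # about 3.7 years
```

In other words, the quoted projection amounts to the world's stored data growing by roughly a fifth every year and doubling about every three to four years.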
In the next tutorial, we will discuss Intelligent Data Analysis.
Geethu is a lecturer by profession and a blogger by passion. She loves writing, and on ClassRounder she shares the tutorials and notes she has prepared for her students. If you have any questions about the topic, you can contact her by email.