Over the last decade or so, the size of machine-readable data sets has increased dramatically, and the problem of "data explosion" has become apparent. At the same time, recent developments in computing have provided the basic infrastructure for fast access to vast amounts of online data, and many advanced computational methods for extracting information from large quantities of data are beginning to mature. These developments have created a new range of problems and challenges for analysts, as well as new opportunities for intelligent systems in data analysis, and have led to the emergence of the field of Intelligent Data Analysis (IDA), which combines diverse disciplines including Artificial Intelligence and Statistics. IDA is one of the most important approaches in the field of data mining and attracts great interest from researchers.
Intelligent data analysis reveals implicit, previously unknown, and potentially valuable information or knowledge from large amounts of data; it can also be viewed as a kind of decision support process. Drawing mainly on artificial intelligence, machine learning, pattern recognition, statistics, databases, and visualization technology, IDA automatically extracts useful information, necessary knowledge, and interesting models from large volumes of online data in order to help decision makers make the right choices.
The process of IDA generally consists of three stages: (1) data preparation; (2) rule finding or data mining; (3) result validation and explanation. Data preparation involves selecting the required data from the relevant data sources and integrating it into a data set to be used for data mining. Rule finding means working out the rules contained in the data set by means of certain methods or algorithms. Result validation requires examining these rules, and result explanation means giving intuitive, reasonable, and understandable descriptions of them using logical reasoning.
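The three stages above can be sketched in code. The following is a minimal, illustrative example only: the toy transaction records and the frequency threshold are assumptions, not from the article, and the "rule finding" step is deliberately simplified to a frequency count.

```python
# Sketch of the three IDA stages on a toy data set.
# All record contents and thresholds here are illustrative assumptions.

from collections import Counter

# Stage 1: data preparation -- select relevant records and integrate
# them into a clean data set (here: drop incomplete records).
raw_records = [
    {"customer": "A", "item": "bread"},
    {"customer": "A", "item": "milk"},
    {"customer": "B", "item": "bread"},
    {"customer": "B", "item": None},   # incomplete record, filtered out
    {"customer": "C", "item": "bread"},
]
data_set = [r for r in raw_records if r["item"] is not None]

# Stage 2: rule finding -- apply a simple method to the data set;
# here we flag items that occur at least twice.
counts = Counter(r["item"] for r in data_set)
frequent_items = {item for item, n in counts.items() if n >= 2}

# Stage 3: result validation and explanation -- check each rule and
# give an understandable description of it.
for item in sorted(frequent_items):
    share = counts[item] / len(data_set)
    print(f"'{item}' appears in {share:.0%} of the valid records")
```

A real IDA system would replace Stage 2 with a genuine mining algorithm (association rules, clustering, classification), but the pipeline shape stays the same.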
Nature of Data
As we all know, computer systems work with different types of digital data. In the early days of computing, data consisted primarily of text and numbers, but modern computing involves many different multimedia data types, such as audio, images, graphics, and video.
In big data, there are two broad types of variables you'll find in your data: numerical and categorical. Numerical data can be divided into continuous and discrete values, and categorical data can be broken down into nominal and ordinal values.
Numerical data is information that is measurable; it is, of course, data represented as numbers rather than words or text.
Continuous numbers can take any value within a range, with no logical end to them. Examples include variables that represent money or height.
Discrete numbers are the opposite: they are countable and have a logical end. An example is a variable for the day of the month, which can only be a whole number from 1 to 31.
Categorical data is any data that isn't a number; it can be a string of text or a date. These variables can be broken down into nominal and ordinal values.
Ordinal values are values that have a set order to them. Examples include a bug priority such as "Critical" or "Low", or a race ranking such as "First" or "Third". Nominal values are the opposite of ordinal values: they have no set order. Examples include variables such as "Country" or "Marital Status".
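The four variable kinds can be made concrete with a short sketch. The specific values and the priority scale below are assumed examples for illustration only:

```python
# Illustrative examples of the four variable kinds (values are assumptions).

# Numerical, continuous: can take any value in a range (e.g. height).
height_cm = 172.4

# Numerical, discrete: countable, with a logical end (day of the month).
day_of_month = 17          # always a whole number from 1 to 31

# Categorical, nominal: labels with no inherent order.
country = "Canada"         # "Canada" is not "greater than" "Brazil"

# Categorical, ordinal: labels with a set order, encoded explicitly.
priority_scale = ["Low", "Medium", "High", "Critical"]
bug_priority = "Critical"

# The set order lets us compare ordinal values -- a comparison that
# would be meaningless for nominal values like country names.
is_urgent = priority_scale.index(bug_priority) >= priority_scale.index("High")
print(is_urgent)  # True
```

Note that the distinction matters in practice: ordinal values support ranking and comparison, while nominal values only support equality checks.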
Next, we will look into Analysis vs Reporting in Big Data.
Geethu is a lecturer by profession and a blogger by passion. She loves writing, and on ClassRounder she shares the tutorials and notes she has prepared for her students. If you have any doubts about the topic, you can contact her at [email protected]