# Exploratory Data Analysis

Machine learning is an application of AI (Artificial Intelligence) that enables computers to learn from data without being explicitly programmed. Nowadays computers are powerful enough to be trained on large amounts of data in very little time. A data scientist must also know how the data varies, how it is categorized, and how it is distributed. With Exploratory Data Analysis (EDA) we draw conclusions about the data that a human can observe through graphs, charts, and summary values.

#### Definition

Exploratory Data Analysis refers to the critical process of performing initial investigations on data to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.

#### Explanation of EDA with the sample iris dataset

I take the iris dataset as a sample and perform EDA on it. The iris dataset contains four features:

1. sepal_length
2. sepal_width
3. petal_length
4. petal_width

and three classes:

1. setosa
2. virginica
3. versicolor

The first step is to import the required libraries and then read the data file.
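
As a minimal sketch of this step, the block below imports pandas, reads the data, and runs the basic inspection calls that the following paragraph walks through. It assumes pandas is the dataframe library. A few real iris rows are embedded inline in place of the actual data file so that the example runs on its own; the variable names and the `iris.csv` file name mentioned in the comment are illustrative assumptions, not taken from the original post.

```python
import io

import pandas as pd

# Assumption: a small inline excerpt of the iris data stands in for the real
# file. With the full dataset you would call pd.read_csv("iris.csv") instead
# (hypothetical file name).
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

iris = pd.read_csv(io.StringIO(SAMPLE_CSV))

n_rows, n_cols = iris.shape          # number of rows and columns
column_names = list(iris.columns)    # column labels
first_rows = iris.head()             # first five rows by default
null_counts = iris.isnull().sum()    # number of nulls in each column
species = iris["species"].unique()   # distinct values of the target column

print(n_rows, n_cols)
print(column_names)
print(first_rows)
print(null_counts)
print(species)
```

If `null_counts` showed missing values, `iris.fillna(...)` would be one way to fill them; which fill value is appropriate depends on the column.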

Next we show the number of rows and columns in the dataframe; the shape attribute provides this. After that we find out which columns the dataframe contains: dataframe.columns returns the column labels. To see what the data looks like, we display the first few rows with the head() function.

The data may contain null values, and any null cells have to be filled with some value, so we first count how many nulls each column contains.

Here species is the target column, meaning the data has to be classified by species, so we check which unique species exist in the dataframe.

#### 2D Scatter plot

Two-dimensional scatterplots visualize the relationship (correlation) between two variables X and Y. Individual data points are represented in two-dimensional space, where the axes represent the variables. The two coordinates that determine the location of each point correspond to its values on the two variables.
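
The idea can be sketched with matplotlib, which is an assumption here since the post does not name a plotting library. The example plots sepal_length against sepal_width for a handful of real iris rows typed inline, with one colour per species; the colour mapping and the output file name are illustrative choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Assumption: a few iris rows typed inline; in practice these columns would
# come from the full dataframe.
sepal_length = [5.1, 4.9, 7.0, 6.4, 6.3, 5.8]
sepal_width = [3.5, 3.0, 3.2, 3.2, 3.3, 2.7]
species = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

# Illustrative colour per class so the groups can be told apart.
colors = {"setosa": "tab:blue", "versicolor": "tab:orange", "virginica": "tab:green"}

fig, ax = plt.subplots()
for name, color in colors.items():
    # Select the points that belong to the current species.
    xs = [x for x, s in zip(sepal_length, species) if s == name]
    ys = [y for y, s in zip(sepal_width, species) if s == name]
    ax.scatter(xs, ys, color=color, label=name)

ax.set_xlabel("sepal_length")  # X variable
ax.set_ylabel("sepal_width")   # Y variable
ax.legend()
fig.savefig("iris_scatter.png")  # hypothetical output file name
```

Each point's horizontal position is its sepal_length and its vertical position is its sepal_width, so clusters of one colour indicate that a species occupies its own region of the plane.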

#### Histogram

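
A histogram groups the values of one numeric feature into bins and counts how many values fall into each bin. The following is only a minimal sketch, assuming matplotlib as the plotting library and using the petal_length values of a few real iris rows typed inline; the bin count and output file name are illustrative choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Assumption: petal_length values of a few iris rows typed inline; in practice
# this would be the full petal_length column of the dataframe.
petal_length = [1.4, 1.4, 4.7, 4.5, 6.0, 5.1]

fig, ax = plt.subplots()
# hist returns the count per bin, the bin edges, and the drawn bars.
counts, bin_edges, _ = ax.hist(petal_length, bins=3)

ax.set_xlabel("petal_length")
ax.set_ylabel("count")
fig.savefig("iris_petal_length_hist.png")  # hypothetical output file name
```
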
Hey there, my name is Parth Shah. I am from Modasa (Gujarat).
I currently work at Tata Consultancy Services, Gandhinagar, as an Assistant System Engineer Trainee.