We now have a basic idea of the data. Next, we extend that understanding with some visualizations.
We are going to look at two types of plots:
Univariate plots to better understand each attribute.
Multivariate plots to better understand the relationships between attributes.
We start with some univariate plots, that is, plots of every individual variable.
Given that the input variables are numeric, we can make box and whisker plots of each.
# box and whisker plots: one subplot per column in a 2x2 grid, each with its own axes
dataset.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
pyplot.show()
A box plot or boxplot is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the term box and whisker plot. This gives us a much clearer idea of the distribution of the input attributes.
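To make the quartiles and whiskers concrete, here is a minimal sketch that computes the numbers a box plot draws. It uses only the standard library so the arithmetic stays visible. The function name box_summary and the sample values are hypothetical, and the whisker rule (reaching to the most extreme points within 1.5 times the interquartile range) is a common convention assumed here, not something stated in this tutorial.

```python
import statistics

def box_summary(values):
    """Return the numbers a box plot draws, plus any outliers (hypothetical helper)."""
    data = sorted(values)
    # Quartiles, using linear interpolation between neighbouring data points
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: whiskers stop at the most extreme points
    # that lie within 1.5 * IQR of the box
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Made-up sample with one unusually large value
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = box_summary(sample)
print(summary)
```

In this sample the value 30 lies beyond the upper fence, so it would be drawn as a separate point rather than as part of the whisker.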
We can also create a histogram of each input variable to get an idea of the distribution.
# histograms
dataset.hist()
pyplot.show()
A histogram groups the values of a variable into bins (ranges of values) and displays the number of observations in each bin as a column. In this exercise, each histogram shows how often each range of measurements occurs for one of the flower parts, that is, the distribution of a single attribute rather than a relationship between attributes.
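The binning step behind a histogram can be sketched in a few lines. This is an illustration under assumptions, not what dataset.hist() runs internally: the function histogram_counts and the measurements (given in millimetres) are hypothetical, and the bins are equal-width ranges between the smallest and largest value.

```python
def histogram_counts(values, bins=10):
    """Count how many values fall into each of `bins` equal-width ranges.

    Hypothetical helper; assumes the values are not all identical.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin instead of opening a new one
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges

# Made-up petal lengths in millimetres
lengths_mm = [10, 13, 14, 15, 17, 40, 45, 47, 51, 60]
counts, edges = histogram_counts(lengths_mm, bins=5)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {'#' * count}")
```

The printed columns are a text version of the chart: the gap in the middle bins is the kind of shape a histogram makes visible at a glance.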
It looks like two of the input variables may have a Gaussian distribution. (The Gaussian, or normal, distribution is very common in statistics; it appears in a histogram as a symmetric, bell-shaped curve centered on the mean.) This is useful to note because some algorithms assume normally distributed inputs and can exploit that property.
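A rough numeric companion to judging the shape by eye is the skewness of a variable, which is zero for a perfectly symmetric distribution. The sketch below is a hand-written population skewness on made-up data; the name sample_skewness is hypothetical. A value near zero is consistent with a Gaussian shape but does not prove it.

```python
import statistics

def sample_skewness(values):
    """Population skewness: the mean cubed z-score (hypothetical helper)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Made-up data: one symmetric sample, one with a long right tail
symmetric = [2, 3, 3, 4, 4, 4, 5, 5, 6]
right_skewed = [1, 1, 1, 2, 2, 3, 5, 9, 14]

skew_symmetric = sample_skewness(symmetric)
skew_right = sample_skewness(right_skewed)
print(f"symmetric: {skew_symmetric:.2f}, right-skewed: {skew_right:.2f}")
```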
Now we can look at the interactions between the variables.
First, let’s look at scatterplots of all pairs of attributes. This will help us to spot structured relationships between input variables.
# scatter plot matrix
scatter_matrix(dataset)
pyplot.show()
Note the diagonal grouping of points for some pairs of attributes. This suggests a high correlation and a predictable relationship between those attributes.
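The visual impression of a diagonal grouping can be backed up with a number. The sketch below computes the Pearson correlation coefficient by hand on made-up measurements; the function pearson and all values are hypothetical illustrations, not figures from the dataset. Coefficients close to 1 or -1 correspond to a tight diagonal band, while values near 0 correspond to a shapeless cloud.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up measurements for illustration only
petal_length = [1.4, 1.5, 4.5, 4.7, 5.9, 6.0]
petal_width = [0.2, 0.3, 1.5, 1.4, 2.1, 2.5]
sepal_width = [3.5, 3.0, 3.2, 2.8, 3.0, 3.3]

r_strong = pearson(petal_length, petal_width)
r_weak = pearson(petal_length, sepal_width)
print(f"petal length vs petal width: {r_strong:.2f}")
print(f"petal length vs sepal width: {r_weak:.2f}")
```

With a pandas DataFrame, dataset.corr() should give the same kind of coefficient for every pair of numeric columns at once.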