Data Visualization with Python

Data visualization is part of exploratory data analysis: charts and graphs can reveal patterns that a plain table or a list of numbers does not.

Libraries: Matplotlib and Seaborn

Types of graphs/charts

## LM plot
## Scatterplot between two numeric variables
## Correlation plot between all numeric variables
## Histogram of one variable with density plot
## Histogram of all variables
## Frequency chart of classes in a categorical variable
## Factor plots - separate plots by classes of a categorical variable
## Violin plot
## Violin plot + factor
## Pairplot of all variables
## Two plots in a single plot - FacetGrids are a separate study
## Boxplot - one variable
## Boxplot - multiple categorical (facets)
## One bar and one line chart - secondary axis
## Time series chart of two series in a single chart - wherever applicable
## QQ plot for normality
## Missing value chart

Other Charts
## Density plot - overlay with scatter plot
## Joint plot
## Set color palette
## Customization - row & col labels, xticks & yticks, grid, background, colors, title, legend position and size, marker type and size, font size, inserting a text, inserting a line

Seaborn : import seaborn as sns
Matplotlib : import matplotlib.pyplot as plt
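
The snippets in this post use a DataFrame called data with iris-style columns (sepal_length, sepal_width, Spec) that is never constructed in the text. A minimal sketch of such a frame follows; the values are synthetic stand-ins and the exact column set is an assumption based on the names used in the snippets.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the iris data (assumption: column names follow the
# snippets in this post). With network access, the real dataset is available
# via sns.load_dataset('iris').rename(columns={'species': 'Spec'}).
rng = np.random.default_rng(0)
n = 50  # rows per species
data = pd.DataFrame({
    'sepal_length': rng.normal(5.8, 0.8, 3 * n).round(1),
    'sepal_width': rng.normal(3.0, 0.4, 3 * n).round(1),
    'petal_length': rng.normal(3.8, 1.7, 3 * n).round(1),
    'petal_width': rng.normal(1.2, 0.7, 3 * n).round(1),
    'Spec': np.repeat(['setosa', 'versicolor', 'virginica'], n),
})
print(data.head())
```

Because the numbers are random, the resulting charts will not match the real iris plots; the frame only makes the calls runnable.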

LM Plot

sns.lmplot(x='sepal_length',y='sepal_width', data=data)


Barplot of two variables stored in separate vectors (zipped to place the labels), with chart customization

cols = train.columns
uniques = [len(train[col].unique()) for col in cols]

sns.set(font_scale=1.2)
ax = sns.barplot(x=list(cols), y=uniques, log=True)  # x/y as keywords; log=True is passed on to matplotlib's bar
ax.set(xlabel='Feature', ylabel='log(unique count)', title='Number of unique values per feature')
for p, uniq in zip(ax.patches, uniques):
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 10,
            uniq,
            ha="center") 


Scatterplot - older Seaborn versions have no dedicated scatter plot function (sns.scatterplot arrived in 0.9), so use lmplot without the regression fit

sns.lmplot(x='sepal_length',y='sepal_width', data=data, fit_reg=False, hue = 'Spec')
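
On Seaborn 0.9 or later, the same chart can be drawn with sns.scatterplot. The sketch below is one way to do it, not the post's original approach; the DataFrame is a synthetic stand-in that reuses the post's column names.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the post's `data` (assumption)
rng = np.random.default_rng(1)
data = pd.DataFrame({
    'sepal_length': rng.normal(5.8, 0.8, 90),
    'sepal_width': rng.normal(3.0, 0.4, 90),
    'Spec': np.repeat(['setosa', 'versicolor', 'virginica'], 30),
})

# scatterplot draws on a single Axes and returns it
ax = sns.scatterplot(x='sepal_length', y='sepal_width', hue='Spec', data=data)
ax.set_title('Sepal width vs. sepal length')
```

Unlike lmplot, which builds its own figure-level grid, scatterplot returns a plain matplotlib Axes, so it can be combined with other plots on the same axes.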


Correlation Plot

## Correlation plot
corr = data.select_dtypes('number').corr()  # pairwise correlations of the numeric columns
plt.figure(figsize=(10,10))
sns.heatmap(corr)
plt.title('Correlation between different features')


Histogram of one variable with distplot

sns.distplot(data.sepal_length)


sns.distplot(data.sepal_length, kde=False, bins=20, color='red')


Frequency chart of classes in categorical variable

sns.countplot(x='Spec', data=data)
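
A related count-based chart from the list at the top is the missing value chart, which has no snippet in this post. One simple way to sketch it is a bar chart of missing counts per column; the DataFrame df and its columns a, b, c are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical frame with some values knocked out
rng = np.random.default_rng(4)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=['a', 'b', 'c'])
df.loc[:9, 'a'] = np.nan     # rows 0..9  -> 10 missing values in 'a'
df.loc[:24, 'c'] = np.nan    # rows 0..24 -> 25 missing values in 'c'

missing = df.isnull().sum()  # missing count per column

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(missing.index, missing.values)
ax.set(xlabel='Column', ylabel='Missing values', title='Missing values per column')
```

sns.heatmap(df.isnull(), cbar=False) is another common option when the position of the gaps matters, not just the totals.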


Factor Plot (catplot in current Seaborn) - Plot by a class


# factorplot was renamed to catplot in Seaborn 0.9
g = sns.catplot(x='sepal_length', 
                y='sepal_width', 
                data=data, 
                hue='Spec',  # Color by species
                col='Spec',  # One panel per species
                kind='swarm') # Swarmplot
 
# Rotate x-axis labels
g.set_xticklabels(rotation=-45)


Set Color Palette

mycolors = ['#78C850',  # Grass
            '#F08030',  # Fire
            '#6890F0',  # Water
            '#A8B820',  # Bug
            '#A8A878',  # Normal
            '#A040A0',  # Poison
            '#F8D030',  # Electric
            '#E0C068',  # Ground
            '#EE99AC',  # Fairy
            '#C03028',  # Fighting
            '#F85888',  # Psychic
            '#B8A038',  # Rock
            '#705898',  # Ghost
            '#98D8D8',  # Ice
            '#7038F8',  # Dragon
           ]

Violin plot with self color palette

sns.violinplot(y='sepal_length', x='Spec', data=data, 
               palette=mycolors) # Set color palette


Violin Plot + Factor Plot (catplot in current Seaborn)


# factorplot was renamed to catplot in Seaborn 0.9
sns.catplot(x='sepal_length', 
            data=data, 
            hue='Spec',  # Color by species
            col='Spec',  # One panel per species
            kind='violin') 


Simple PairPlot

sns.pairplot(data) 


Pairplot with customization

sns.pairplot(data, hue = 'Spec', diag_kind="hist")


Pair Plot - Complex

g = sns.PairGrid(data, hue="Spec") 
g.map_upper(sns.regplot) 
g.map_lower(sns.residplot) 
g.map_diag(plt.hist) 
for ax in g.axes.flat: 
    plt.setp(ax.get_xticklabels(), rotation=45) 
g.add_legend() 
g.set(alpha=0.5)


BoxPlot - one and multiple variables

sns.boxplot(x='sepal_length', data=data)


Boxplot - multiple variables


sns.set_style('whitegrid')
sns.boxplot(data=data[['sepal_length', 'sepal_width']])  # wide-form input: one box per column


Boxplot with factorplot (catplot in current Seaborn)

# factorplot was renamed to catplot in Seaborn 0.9
sns.catplot(x='sepal_length', 
            data=data, 
            hue='Spec',  # Color by species
            col='Spec',  # One panel per species
            kind='box') # Boxplot


Overlaying two charts

# Set figure size with matplotlib
plt.figure(figsize=(10,6))
 
# Create plot
sns.violinplot(x='sepal_length',
               y='sepal_width', 
               data=data, 
               inner=None, # Remove the bars inside the violins
               palette=mycolors)
 
sns.swarmplot(x='sepal_length', 
              y='sepal_width', 
              data=data, 
              color='k', # Make points black
              alpha=0.7) # and slightly transparent
 
# Set title with matplotlib
plt.title('Sepal width by sepal length')
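
The list at the top also mentions one bar and one line chart on a secondary axis, which is another way of overlaying two charts. A minimal matplotlib sketch with twinx follows; the data is a synthetic stand-in, and plotting row counts against mean sepal length is an illustrative choice, not something taken from the post.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the post's `data` (assumption)
rng = np.random.default_rng(2)
data = pd.DataFrame({
    'sepal_length': rng.normal(5.8, 0.8, 90),
    'Spec': np.repeat(['setosa', 'versicolor', 'virginica'], [20, 30, 40]),
})
summary = data.groupby('Spec')['sepal_length'].agg(['count', 'mean'])

fig, ax1 = plt.subplots(figsize=(8, 5))
ax1.bar(summary.index, summary['count'], color='#6890F0')  # bars on the left axis
ax1.set_ylabel('Number of rows')

ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax2.plot(summary.index, summary['mean'], color='k', marker='o')  # line on the right axis
ax2.set_ylabel('Mean sepal_length')
ax1.set_title('Row count (bars) and mean sepal length (line) by species')
```

Each axis keeps its own scale, which is the point of the secondary axis: the counts and the means would otherwise be squeezed onto one range.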


Overlaying two charts - compact form without a title

plt.figure(figsize=(10,6))
 
# Create plot
sns.violinplot(x='sepal_length', y='sepal_width', data=data, inner=None,palette=mycolors) 
sns.swarmplot(x='sepal_length', 
              y='sepal_width', 
              data=data, 
              color='k', # Make points black
              alpha=0.7) # and slightly transparent
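
The list at the top includes a QQ plot for normality, which has no snippet in this post. One common way to draw it is scipy.stats.probplot; the sketch below assumes SciPy is installed and uses random normal values in place of data.sepal_length.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(3)
values = rng.normal(5.8, 0.8, 150)  # stand-in for data.sepal_length

fig, ax = plt.subplots(figsize=(6, 6))
# probplot plots ordered sample values against theoretical normal quantiles
# and adds a least-squares reference line
(osm, osr), (slope, intercept, r) = stats.probplot(values, dist='norm', plot=ax)
ax.set_title('Normal Q-Q plot')
print(f'correlation with the fitted line: {r:.3f}')
```

Points that stay close to the line suggest the sample is roughly normal; systematic curvature at the ends points to heavy or light tails.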


Written on January 4, 2018