Seaborn: statistical data visualization
Seaborn is a Python library for making attractive and informative statistical graphics. It is built on top of matplotlib, provides a high-level drawing interface, and is tightly integrated with the PyData stack, including support for numpy and pandas data structures and statistical routines from scipy and statsmodels.
Some of the features that seaborn offers are:
- Several built-in themes that improve on the default matplotlib aesthetics
- Tools for choosing color palettes to make beautiful plots that reveal patterns in your data
- Functions for visualizing univariate and bivariate distributions or for comparing them between subsets of data
- Tools that fit and visualize linear regression models for different kinds of independent and dependent variables
- Functions that visualize matrices of data and use clustering algorithms to discover structure in those matrices
- A function to plot statistical timeseries data with flexible estimation and representation of uncertainty around the estimate
- High-level abstractions for structuring grids of plots that let you easily build complex visualizations
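As a small illustration of the first two items in the list, the sketch below inspects a built-in theme and a named palette. The variable names are illustrative only, and the exact colors returned may differ between seaborn versions.

```python
import seaborn as sns

# Look up the style parameters behind the "whitegrid" theme
whitegrid_params = sns.axes_style("whitegrid")
print(whitegrid_params["axes.grid"])  # whether this theme draws an axes grid

# Ask for three colors from the "muted" palette
muted = sns.color_palette("muted", n_colors=3)
print(len(muted))      # number of colors in the palette
print(muted.as_hex())  # the same colors as hex strings
```

Passing a palette name such as "muted" to a plotting function's palette argument applies these colors without building the list by hand.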
Seaborn aims to make visualization a central part of exploring and understanding data. The plotting functions operate on dataframes and arrays containing a whole dataset and internally perform the necessary aggregation and statistical model-fitting to produce informative plots. If matplotlib “tries to make easy things easy and hard things possible”, seaborn tries to make a well-defined set of hard things easy.
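To make the idea of internal aggregation concrete, here is a minimal sketch that compares a hand-computed group mean with the bar heights seaborn draws from the raw rows. The toy dataframe and its column names are made up for illustration, and the sketch assumes the default barplot estimator (the mean).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for a real dataset
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
})

# The aggregation done by hand with pandas
manual_means = toy.groupby("group")["value"].mean()
print(manual_means)

# seaborn receives the raw rows and aggregates them itself
fig, ax = plt.subplots()
sns.barplot(x="group", y="value", data=toy, ax=ax)

# Read the bar heights back from the matplotlib axes for comparison
bar_heights = [patch.get_height() for patch in ax.patches]
print(bar_heights)
```

The point of the comparison is that no groupby step is needed before plotting: the function is handed the whole dataset and produces the summary on its own.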
Grouped barplots
import seaborn as sns
sns.set(style="whitegrid")
# Load the example Titanic dataset
titanic = sns.load_dataset("titanic")
# Draw a nested barplot to show survival for class and sex
# (seaborn 0.9+ renamed factorplot to catplot and its size argument to height)
g = sns.catplot(x="class", y="survived", hue="sex", data=titanic,
                height=6, kind="bar", palette="muted")
g.despine(left=True)
g.set_ylabels("survival probability")
Code for a violinplot
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="whitegrid")
# Load the example dataset of brain network correlations
df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0)
# Pull out a specific subset of networks
used_networks = [1, 3, 4, 5, 6, 7, 8, 11, 12, 13, 16, 17]
used_columns = (df.columns.get_level_values("network")
                          .astype(int)
                          .isin(used_networks))
df = df.loc[:, used_columns]
# Compute the correlation matrix and average over networks
corr_df = df.corr().groupby(level="network").mean()
corr_df.index = corr_df.index.astype(int)
corr_df = corr_df.sort_index().T
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 6))
# Draw a violinplot with a narrower bandwidth than the default
sns.violinplot(data=corr_df, palette="Set3", bw=.2, cut=1, linewidth=1)
# Finalize the figure
ax.set(ylim=(-.7, 1.05))
sns.despine(left=True, bottom=True)