# Week 5 Assignment - Plotting

This assignment is exploratory. You still have to submit it, but you do not need to follow the specific instructions provided.

The basic idea is to play with some of the plotting functions. Plotting tends to be situation- and data-specific. If you already have data, we encourage you to use it in this assignment and experiment with some plots. We can provide feedback through OKPy on any explicit questions you have or on anything we notice in your code.

If you do not have data, you can obtain data from any of the following sources:
* https://catalog.data.gov/dataset
* http://mlr.cs.umass.edu/ml/datasets.html
* https://www.kaggle.com/datasets
* https://opendata.socrata.com



Using either your own data or data from one of the sources above, create plots that reveal something interesting. Plot features against each other and color by some factor, overlay histograms of different features, create boxplots, etc. Feel free to look at some of the plot templates available on the matplotlib / bokeh / seaborn websites, and see if you can recreate something they have done.

For more specificity, you can try to work through the following tasks:
1. With matplotlib: Plot a notched boxplot of a given feature.
2. With matplotlib: Plot an overlaid histogram of two features, each feature in a different color.
3. With Pandas: Plot a histogram of a specific column.
4. With Seaborn: Plot a histogram with a density line.
5. With Seaborn: Plot a scatterplot with a trend line.
6. With matplotlib: Plot three scatterplots on the same plot, with each having a different color and shape.

Do not feel bound by these tasks. You should know how to complete them, but it is better to look at your dataset, think of plots that would be informative, and try to implement those.

If you don't have a dataset that catches your interest, you can use sklearn's built-in wine dataset. In the cell below we've provided the code to load it and format it for easier use. We've also provided the solution to the first task using the wine dataset.

In [None]:
"""Read in and format the wine dataset"""
# Imports ---------------------------------------------------
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
import numpy as np
from sklearn import datasets

# Load data set  --------------------------------------------
wine = datasets.load_wine()

# let's make it a named dataframe; this way it is more comfortable to play with
# note: the target holds the class label (0, 1, 2), stored here as "quality"
wine = pd.DataFrame(
    data=np.c_[wine["data"], wine["target"]],
    columns=wine.feature_names + ["quality"],
)

# let's take a look at it
print(wine.head())
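
Once the dataframe is loaded, a quick histogram of a single column is a reasonable first look at the data (task 3). The following is a minimal sketch, not an official solution: it reloads the dataset so it can run on its own, and the variable name `raw`, the choice of the `alcohol` column, and `bins=20` are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets

# Rebuild the dataframe the same way as the cell above ("raw" is an illustrative name)
raw = datasets.load_wine()
wine = pd.DataFrame(
    data=np.c_[raw["data"], raw["target"]],
    columns=raw.feature_names + ["quality"],
)

# pandas wraps matplotlib: Series.plot.hist returns the Axes it drew on,
# so labels and titles can be set with the usual matplotlib methods
ax = wine["alcohol"].plot.hist(bins=20, edgecolor="black")
ax.set_xlabel("alcohol")
ax.set_ylabel("count")
ax.set_title("Distribution of alcohol content")
plt.show()
```

Because the call returns a regular matplotlib `Axes`, anything learned for matplotlib (axis labels, limits, legends) carries over to plots made through pandas.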

In [None]:
"""Explore wine alcohol content by quality"""
# first, create a list to provide to the boxplot function
# list will have three components, each component denotes a different
# quality level (0, 1, 2). Each component will contain a pandas Series
# of alcohol levels belonging to wine of the corresponding quality.
data_to_plot = [wine.loc[wine["quality"] == x, "alcohol"] for x in [0., 1., 2.]]

# let's plot it! and make it a notched boxplot
# we also set patch_artist=True so the boxes are drawn as filled patches
plt.boxplot(data_to_plot, notch=True, patch_artist=True)
plt.show()
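
In the same spirit, tasks 2 and 6 could be approached along the following lines. This is a sketch under stated assumptions rather than a reference solution: the feature pairs (`total_phenols` with `flavanoids`, `alcohol` with `color_intensity`), the colors, the markers, and the names `raw`, `features` and `styles` are all illustrative choices, and the block reloads the dataset so that it runs on its own.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets

# Rebuild the dataframe as in the loading cell ("raw" is an illustrative name)
raw = datasets.load_wine()
wine = pd.DataFrame(
    data=np.c_[raw["data"], raw["target"]],
    columns=raw.feature_names + ["quality"],
)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(12, 4))

# Task 2: overlaid histograms of two features, each in its own color.
# Both features use the same bin edges so the bars are directly comparable.
features = ["total_phenols", "flavanoids"]
lo = wine[features].min().min()
hi = wine[features].max().max()
bins = np.linspace(lo, hi, 26)  # 26 edges give 25 bins
for name, color in zip(features, ["tab:blue", "tab:orange"]):
    # alpha < 1 keeps the histogram underneath visible
    ax_hist.hist(wine[name], bins=bins, alpha=0.5, color=color, label=name)
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")
ax_hist.legend()

# Task 6: three scatterplots on one Axes, one per "quality" level,
# each with a different color and marker shape.
styles = {0.0: ("tab:red", "o"), 1.0: ("tab:green", "s"), 2.0: ("tab:purple", "^")}
for level, (color, marker) in styles.items():
    subset = wine[wine["quality"] == level]
    ax_scatter.scatter(
        subset["alcohol"],
        subset["color_intensity"],
        color=color,
        marker=marker,
        label=f"quality {int(level)}",
    )
ax_scatter.set_xlabel("alcohol")
ax_scatter.set_ylabel("color_intensity")
ax_scatter.legend()

plt.tight_layout()
plt.show()
```

Calling `scatter` once per group, rather than once with a color array, makes it straightforward to give each group its own marker and legend entry.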