Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE".

# Week 2 Assignments


This assignment has three components. Each of the first two components receives one point if all the tests pass.
The last component, on Pandas, may require more effort: it consists of three subtasks, and each subtask receives 1 point.

All the exercises are designed so that the solutions will need only one or a few lines of code.

Some concepts may be new to you and may require digging into the Python, NumPy or Pandas documentation; the links are provided.

Do not hesitate to contact the instructors and TAs via the #week2 channel on Slack if you get stuck. Join the channel first by clicking on Channels.

## Part A. Create a missing function (1 point)

In this exercise you need to create a function __missing_link(x)__ that is passed to another function as an argument in order to perform a calculation.

We know the final result (see the assert statement), but we do not know the intermediate calculation leading to that result.
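One way to recover the intermediate step is to undo the known part of the calculation. Assuming `calculate()` doubles whatever `func` returns (as its body `2 * func(a)` shows), halving the expected results gives the values `func` must produce for the inputs 0, 1, 2, and so on. The candidate rule tested at the end is just one guess to compare against.

```python
# Expected output of calculate(), copied from the assert in this part
expected = [0, 2, 8, 18, 32, 50, 72]

# calculate() doubles each value returned by func, so halving the expected
# results recovers what func must return for inputs 0, 1, 2, ...
intermediate = [v // 2 for v in expected]
print(intermediate)  # [0, 1, 4, 9, 16, 25, 36]

# Compare against a candidate rule, here squaring the input
matches = intermediate == [k**2 for k in range(7)]
print(matches)  # True
```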

Read about Python built-in functions __all()__ and __zip()__
https://docs.python.org/3.3/library/functions.html

and about the iterators and generators here:
https://docs.python.org/3.3/library/stdtypes.html#typeiter
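A small sketch of how `all()` and `zip()` behave, using made-up lists. The last case is worth knowing for this part: `zip()` stops at the shortest input and `all()` of an empty iterable is `True`, so a comparison written this way cannot detect missing items.

```python
# zip() pairs items from several iterables and stops at the shortest one
pairs = list(zip([1, 2, 3], ["a", "b"]))
print(pairs)  # [(1, 'a'), (2, 'b')]

# all() returns True only if every item is truthy
same = all(a == b for a, b in zip([1, 2, 3], [1, 2, 3]))
different = all(a == b for a, b in zip([1, 2, 3], [1, 5, 3]))
print(same, different)  # True False

# Caveat: an empty iterable makes all() return True (nothing failed the test)
vacuous = all(a == b for a, b in zip([], [0, 2, 8]))
print(vacuous)  # True
```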

In [2]:
def calculate(func, it):
    """
    Applies *func* to each item of the iterable *it* and doubles the result.
    Returns a generator that yields the results lazily.
    """
    return (2 * func(a) for a in it)
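The generator returned by `calculate()` is lazy and can be consumed only once. The sketch below uses a stand-alone generator expression of the same shape (squaring is just an example choice of function) to show this; it is also why `_observed_results` in the test must be recreated, or wrapped in `list()`, if you want to inspect it after the assert has run.

```python
# A generator expression like the one returned by calculate() is lazy:
# values are produced one at a time, only when requested
gen = (2 * a**2 for a in range(4))

first = next(gen)   # 0, computed on demand
rest = list(gen)    # [2, 8, 18], the remaining values
again = list(gen)   # [], the generator is now exhausted
print(first, rest, again)
```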


In [7]:
list(calculate(lambda x: x**2, range(7)))

[0, 2, 8, 18, 32, 50, 72]

In [17]:
def missing_link(x):
    """Define a function that will be passed to calculate() as an argument"""
    return x**2
## You can check the result of the missing_link() function and of calculate() if you wish:
# print(list(map(missing_link, range(5))))
# print(list(calculate(missing_link, range(5))))

In [18]:
missing_link(2.5)

6.25

In [16]:
_observed_results = calculate(missing_link, range(7))
_expected_results = [0, 2, 8, 18, 32, 50, 72]

assert all(a == b for a, b in zip(_observed_results, _expected_results))

## Part B. NumPy (1 point)

Define __x__ as a subdivision of the interval from -4 PI to 4 PI into 32 equal parts, i.e. with a step of PI/4.

Including both endpoints, this gives 33 points.
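The arithmetic can be checked directly: the interval has length 8 PI, and 8 PI / 32 = PI/4. The sketch below relies on the `retstep=True` parameter of `np.linspace`, which makes it return the spacing along with the samples. Note that 33 points means 32 intervals; passing `num=32` instead would give a step of 8 PI / 31.

```python
import numpy as np

# retstep=True makes np.linspace also return the spacing between samples
x, step = np.linspace(-4 * np.pi, 4 * np.pi, num=33, retstep=True)

print(len(x))                       # 33
print(np.isclose(step, np.pi / 4))  # True
```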

Using NumPy, calculate __cos()__ and __sin()__ and find the values of __x__ where __cos(x)__ is equal to __sin(x)__. Store these values in the variable __y__.
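The expected answer can be worked out by hand as a sanity check. Dividing by cos x is safe here, because wherever cos x = 0 we have sin x = ±1, so those points are never solutions:

$$
\cos x = \sin x \iff \tan x = 1 \iff x = \frac{\pi}{4} + k\pi, \quad k \in \mathbb{Z}
$$

$$
-4\pi \le \frac{\pi}{4} + k\pi \le 4\pi \iff -4.25 \le k \le 3.75 \iff k \in \{-4, -3, \dots, 3\}
$$

That gives 8 solutions, x/PI = -3.75, -2.75, ..., 3.25, and each of them lies on the PI/4 grid because PI/4 + k PI = (4k + 1) PI/4.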

Use NumPy vector operations.

Use __np.pi__ constant and __np.linspace()__ function: 
https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html

Note that, because of the way floating-point numbers are stored in memory, exact equality comparisons between computed values are unreliable: two results that are mathematically equal often differ by a tiny rounding error. Use __np.isclose()__ instead, which allows some room for floating-point errors.
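A minimal illustration of why `==` fails and `np.isclose()` works. By default, `np.isclose(a, b)` checks `|a - b| <= atol + rtol * |b|` with `rtol=1e-05` and `atol=1e-08`; the `np.cos(np.pi / 2)` case mirrors the tiny non-zero values (around 1e-16) visible in the `np.cos(x)` output further down.

```python
import numpy as np

# Exact equality fails for values that differ only by rounding error
exact = 0.1 + 0.2 == 0.3
close = np.isclose(0.1 + 0.2, 0.3)
print(exact, close)  # False True

# cos(pi/2) is mathematically 0, but the floating-point result is tiny, not zero
cos_exact = np.cos(np.pi / 2) == 0.0
cos_close = np.isclose(np.cos(np.pi / 2), 0.0)
print(cos_exact, cos_close)  # False True
```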
https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.isclose.html

This plot may be helpful:
http://www.wolframalpha.com/input/?i=plot+sinx+and+cosx+from+-4pi+to+4pi

In [None]:
!conda install numpy

In [20]:
!conda list | grep numpy

numpy                     1.16.4           py37h926163e_0  
numpy-base                1.16.4           py37ha711998_0  


In [19]:
import numpy as np

In [30]:
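# YOUR CODE HERE
# 33 evenly spaced points from -4*pi to 4*pi inclusive, i.e. a step of pi/4
x = np.linspace(-4*np.pi, 4*np.pi, num=33)
# Keep only the points where cos(x) and sin(x) agree within floating-point tolerance
y = x[np.isclose(np.cos(x), np.sin(x))]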

In [39]:
len(x)

33

In [37]:
np.cos(x)

array([ 1.00000000e+00,  7.07106781e-01, -4.28626380e-16, -7.07106781e-01,
       -1.00000000e+00, -7.07106781e-01,  3.06161700e-16,  7.07106781e-01,
        1.00000000e+00,  7.07106781e-01, -1.83697020e-16, -7.07106781e-01,
       -1.00000000e+00, -7.07106781e-01,  6.12323400e-17,  7.07106781e-01,
        1.00000000e+00,  7.07106781e-01,  6.12323400e-17, -7.07106781e-01,
       -1.00000000e+00, -7.07106781e-01, -1.96005386e-15,  7.07106781e-01,
        1.00000000e+00,  7.07106781e-01, -1.47019514e-15, -7.07106781e-01,
       -1.00000000e+00, -7.07106781e-01, -2.20498322e-15,  7.07106781e-01,
        1.00000000e+00])

In [36]:
x.shape

(33,)

In [28]:
assert x.shape[0] == 33
assert -4*np.pi in x
assert 0.0 in x
assert 4*np.pi in x

assert y.shape[0] == 8
assert np.all(np.isclose(y/np.pi, np.array([-3.75, -2.75, -1.75, -0.75,  0.25,  1.25,  2.25,  3.25])))


## Part C. Working with Pandas dataframes (3 points)

We will explore FBI reports on firearm background checks provided by the National Instant Criminal Background Check System (NICS):
https://www.fbi.gov/services/cjis/nics

Before ringing up the sale, cashiers call in a check to the FBI or to other designated agencies to ensure that each customer does not have a criminal record or isn’t otherwise ineligible to make a purchase. More than 230 million such checks have been made, leading to more than 1.3 million denials.

NICS and background checks are a hot topic, and it is important to be able to do some basic fact-checking using the available data. https://www.washingtonpost.com/news/fact-checker/wp/2020/02/23/fact-checking-trump-nra-claims-on-gun-background-checks/?utm_term=.3e0284ad3774

The FBI NICS publishes the data as PDF reports, which is a poor format for distributing data.
A community-developed parser extracted the data from the PDF files; the parsed dataset we will be using is available here:
https://github.com/BuzzFeedNews/nics-firearm-background-checks/blob/master/README.md

Note that the number of background checks cannot be directly interpreted as the number of guns sold, because the actual sale protocols vary from state to state.

For reproducibility we will be using a local copy of the file.

In [40]:
import pandas as pd

# NICS parsed dataset url (local copy)
url = "https://github.com/biof509/spring2019/blob/master/week2/nics-firearm-background-checks.csv?raw=true"
guns = pd.read_csv(url)

In [44]:
len(guns.columns.to_list())

27

In [45]:
guns.shape

(13090, 27)

In [47]:
guns.describe()

| | permit | permit_recheck | handgun | long_gun | other | multiple | admin | prepawn_handgun | prepawn_long_gun | prepawn_other | ... | returned_other | rentals_handgun | rentals_long_gun | private_sale_handgun | private_sale_long_gun | private_sale_other | return_to_seller_handgun | return_to_seller_long_gun | return_to_seller_other | totals |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 13066.0 | 1705.0 | 13070.0 | 13071.0 | 6105.0 | 13090.0 | 13067.0 | 11147.0 | 11145.0 | 5720.0 | ... | 2420.0 | 1595.0 | 1430.0 | 3355.0 | 3355.0 | 3355.0 | 3080.0 | 3355.0 | 2860.0 | 13090.0 |
| mean | 6852.824966 | 1756.825806 | 6153.306809 | 7811.399434 | 399.23751 | 273.582735 | 57.232724 | 4.91325 | 7.696725 | 0.217308 | ... | 1.160744 | 0.131034 | 0.126573 | 19.492101 | 15.724292 | 1.549031 | 0.515909 | 0.563338 | 0.11014 | 22425.314133 |
| std | 25796.442361 | 14385.751256 | 8900.603197 | 9250.726906 | 1325.676832 | 775.608156 | 592.143131 | 11.002679 | 16.140515 | 1.113343 | ... | 4.483513 | 0.954292 | 0.846551 | 84.593459 | 65.473715 | 5.821297 | 2.133664 | 2.089335 | 0.419157 | 34832.760536 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 0.0 | 925.0 | 2102.5 | 20.0 | 15.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4727.25 |
| 50% | 602.0 | 0.0 | 3174.5 | 5130.0 | 137.0 | 128.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 12641.5 |
| 75% | 4633.5 | 2.0 | 7543.75 | 10400.5 | 394.0 | 306.0 | 0.0 | 5.0 | 8.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 7.0 | 9.0 | 0.0 | 0.0 | 0.0 | 0.0 | 26186.25 |
| max | 522188.0 | 199766.0 | 107224.0 | 108058.0 | 77929.0 | 38907.0 | 28083.0 | 164.0 | 269.0 | 49.0 | ... | 64.0 | 13.0 | 12.0 | 1017.0 | 913.0 | 91.0 | 62.0 | 56.0 | 4.0 | 541978.0 |


In [48]:
guns.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13090 entries, 0 to 13089
Data columns (total 27 columns):
month                        13090 non-null object
state                        13090 non-null object
permit                       13066 non-null float64
permit_recheck               1705 non-null float64
handgun                      13070 non-null float64
long_gun                     13071 non-null float64
other                        6105 non-null float64
multiple                     13090 non-null int64
admin                        13067 non-null float64
prepawn_handgun              11147 non-null float64
prepawn_long_gun             11145 non-null float64
prepawn_other                5720 non-null float64
redemption_handgun           11150 non-null float64
redemption_long_gun          11149 non-null float64
redemption_other             5720 non-null float64
returned_handgun             2805 non-null float64
returned_long_gun            2750 non-null float64
returned_other   

In [69]:
guns.loc[0:4]

| | month | state | permit | permit_recheck | handgun | long_gun | other | multiple | admin | prepawn_handgun | ... | returned_other | rentals_handgun | rentals_long_gun | private_sale_handgun | private_sale_long_gun | private_sale_other | return_to_seller_handgun | return_to_seller_long_gun | return_to_seller_other | totals |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-08 | Alabama | 22465.0 | 0.0 | 5991.0 | 5741.0 | 218.0 | 274 | 0.0 | 12.0 | ... | 0.0 | 0.0 | 0.0 | 39.0 | 25.0 | 1.0 | 0.0 | 0.0 | 0.0 | 37227 |
| 1 | 2020-08 | Alaska | 218.0 | 0.0 | 2593.0 | 3057.0 | 220.0 | 181 | 0.0 | 5.0 | ... | 0.0 | 0.0 | 0.0 | 17.0 | 31.0 | 1.0 | 0.0 | 1.0 | 0.0 | 6818 |
| 2 | 2020-08 | Arizona | 12014.0 | 455.0 | 10833.0 | 6987.0 | 887.0 | 693 | 0.0 | 18.0 | ... | 1.0 | 0.0 | 0.0 | 30.0 | 17.0 | 4.0 | 1.0 | 1.0 | 0.0 | 34149 |
| 3 | 2020-08 | Arkansas | 5684.0 | 690.0 | 4809.0 | 5590.0 | 226.0 | 342 | 95.0 | 14.0 | ... | 0.0 | 0.0 | 0.0 | 20.0 | 13.0 | 6.0 | 0.0 | 0.0 | 0.0 | 20502 |
| 4 | 2020-08 | California | 38687.0 | 0.0 | 34617.0 | 25666.0 | 3117.0 | 0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 103019 |


In [41]:
guns.head()

| | month | state | permit | permit_recheck | handgun | long_gun | other | multiple | admin | prepawn_handgun | ... | returned_other | rentals_handgun | rentals_long_gun | private_sale_handgun | private_sale_long_gun | private_sale_other | return_to_seller_handgun | return_to_seller_long_gun | return_to_seller_other | totals |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-08 | Alabama | 22465.0 | 0.0 | 5991.0 | 5741.0 | 218.0 | 274 | 0.0 | 12.0 | ... | 0.0 | 0.0 | 0.0 | 39.0 | 25.0 | 1.0 | 0.0 | 0.0 | 0.0 | 37227 |
| 1 | 2020-08 | Alaska | 218.0 | 0.0 | 2593.0 | 3057.0 | 220.0 | 181 | 0.0 | 5.0 | ... | 0.0 | 0.0 | 0.0 | 17.0 | 31.0 | 1.0 | 0.0 | 1.0 | 0.0 | 6818 |
| 2 | 2020-08 | Arizona | 12014.0 | 455.0 | 10833.0 | 6987.0 | 887.0 | 693 | 0.0 | 18.0 | ... | 1.0 | 0.0 | 0.0 | 30.0 | 17.0 | 4.0 | 1.0 | 1.0 | 0.0 | 34149 |
| 3 | 2020-08 | Arkansas | 5684.0 | 690.0 | 4809.0 | 5590.0 | 226.0 | 342 | 95.0 | 14.0 | ... | 0.0 | 0.0 | 0.0 | 20.0 | 13.0 | 6.0 | 0.0 | 0.0 | 0.0 | 20502 |
| 4 | 2020-08 | California | 38687.0 | 0.0 | 34617.0 | 25666.0 | 3117.0 | 0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 103019 |


In [None]:
# Use .head(), .info() and .describe() to explore the dataset

### Part C. Subtask 1 (1 point)

First, use __pd.to_datetime()__ with the argument __yearfirst=True__ to convert the column __"month"__ to a Pandas Series of datetime objects. Then add a new column __"year"__ to the __guns__ dataframe that holds the year of each converted date.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html

You can access Python __datetime.date__ objects via the __.dt__ accessor of a Pandas Series (__.dt.date__):
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.date.html

Look up the attributes of the __datetime.date__ class; we will need the attribute __.year__.
https://docs.python.org/3/library/datetime.html
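A sketch of the conversion on a tiny made-up frame (the `toy` name and its values are hypothetical, chosen only to match the "YYYY-MM" format of the __"month"__ column). It shows both routes described above: going through __datetime.date__ objects and reading __.year__, and the shorter `.dt.year` accessor, which gives the same numbers.

```python
import pandas as pd

# Toy data in the same "YYYY-MM" format as the "month" column (values are made up)
toy = pd.DataFrame({"month": ["2020-08", "2017-01", "1998-11"]})

dates = pd.to_datetime(toy["month"], yearfirst=True)

# Route 1: Python datetime.date objects, then their .year attribute
years_via_date = dates.dt.date.apply(lambda d: d.year)

# Route 2: the .dt accessor also exposes the year directly
toy["year"] = dates.dt.year

print(toy["year"].tolist())  # [2020, 2017, 1998]
```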



In [79]:
guns['year'] = pd.to_datetime(guns['month'], yearfirst=True).dt.year

In [80]:
# YOUR CODE HERE
pass


In [81]:
assert (guns['year'].min(), guns['year'].max()) == (1998, 2020)

### Part C. Subtask 2 (1 point)

Group the __guns__ dataframe by year and sum up the __totals__ column (across all states together). Store the totals for 2000 and 2017 in the variables
__totals_2000__ and __totals_2017__, respectively.

You will need https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html
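The group-and-sum pattern on a hypothetical miniature of the data (states "A" and "B" and all numbers are invented). `groupby("year")["totals"].sum()` returns a Series indexed by year, so a single year's total can be read with `.loc`.

```python
import pandas as pd

# Made-up rows: two states reporting in two years
toy = pd.DataFrame({
    "year":   [2000, 2000, 2017, 2017],
    "state":  ["A", "B", "A", "B"],
    "totals": [10, 5, 7, 3],
})

# Sum "totals" over all states within each year; result is indexed by year
totals_by_year = toy.groupby("year")["totals"].sum()

print(totals_by_year.loc[2000])  # 15
print(totals_by_year.loc[2017])  # 10
```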

In [None]:
# YOUR CODE HERE
pass

In [None]:
assert totals_2000 == 8427096
assert totals_2017 == 24955919

### Part C. Subtask 3 (1 point)

Group data by state (regardless of year) and calculate the mean value of __long_gun__ and __handgun__ checks separately for each state. Calculate the number of states that had more long gun background checks on average over the years than handgun checks. Calculate the number of states with more handgun checks. Store these results in __states_with_more_long_guns__ and __states_with_more_handguns__ variables, respectively.

Hint: Use vector operations; no for loops are needed. Comparing two vectors produces a vector of booleans, and summing a vector of booleans counts its True values.
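The hint in action on invented data (states "X", "Y", "Z" and their counts are hypothetical): compute per-state means of two columns, compare them element-wise, and sum the resulting booleans.

```python
import pandas as pd

# Made-up monthly counts for three hypothetical states
toy = pd.DataFrame({
    "state":    ["X", "X", "Y", "Y", "Z", "Z"],
    "long_gun": [10, 20, 5, 5, 8, 12],
    "handgun":  [5, 5, 10, 20, 9, 9],
})

# Mean of each column per state (one row per state)
means = toy.groupby("state")[["long_gun", "handgun"]].mean()

# Element-wise comparison gives a boolean Series; sum() counts the True values
more_long = (means["long_gun"] > means["handgun"]).sum()
more_hand = (means["handgun"] > means["long_gun"]).sum()

print(more_long, more_hand)  # 2 1
```

With strict `>` comparisons, a state whose two means are exactly equal is counted in neither group, and the group means skip missing (NaN) values by default.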

In [None]:
# YOUR CODE HERE
pass

In [None]:
assert states_with_more_long_guns == 35
assert states_with_more_handguns == 20