Week 1

Week 1 will be a general introduction to the course and an introduction to machine learning topics.

Preparation

I encourage everyone to bring a laptop to class to follow along. If you do not already have Python installed, the simplest way to set up your computer is probably the Anaconda installer. Use the Python 3.6 installer appropriate for your operating system.

We will be using Python 3.6 throughout the course (the recently released version 3.7 can also be used). If you already have a version of Python 3 installed, all the examples we go through will probably work without errors. If you have only Python 2 installed, we highly recommend upgrading your Python installation.

If you have previously used the Anaconda installer, a new environment can be created with:

conda create -n py36 python=3.6 anaconda

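The command to activate this new environment depends on your operating system and conda version. Recent conda releases use conda activate py36 on all platforms, while older releases use source activate py36 on Linux and macOS or activate py36 on Windows.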
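
Once the environment is active, a quick check from within Python can confirm which interpreter is actually running. The following is only a minimal sketch: the 3.6 threshold mirrors the version recommended above, and the wording of the messages is illustrative.

```python
import sys

# Major and minor version of the interpreter running this code.
major, minor = sys.version_info[:2]
print("Running Python {}.{}".format(major, minor))

# The 3.6 threshold is taken from the course recommendation above.
if (major, minor) < (3, 6):
    print("Consider upgrading: the course examples assume Python 3.6 or newer.")
```

If the reported version is not the one you expected, the new environment is probably not the active one.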

Useful Packages

Depending on your background, the following packages may be new to you. We will make extensive use of each of them, so developing some familiarity with them will be useful.

Jupyter

Jupyter Notebook is a web application that enables sharing of live code together with its output. The majority of the course material is in the form of Jupyter notebooks.

The resources below are useful if Jupyter is new to you.

NumPy

NumPy is a foundational package for data analysis in Python. The key component it provides is a multi-dimensional array object. The vast majority of packages for scientific tasks in Python expect NumPy arrays. NumPy also provides many useful basic functions for operating on numerical data.

The resources below will help you develop familiarity with NumPy array operations.

https://www.datacamp.com/community/tutorials/python-numpy-tutorial
https://cs231n.github.io/python-numpy-tutorial/ (more machine learning focused)
https://docs.scipy.org/doc/numpy-dev/user/quickstart.html (official documentation)
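
To give a first impression of the multi-dimensional array object described above, here is a minimal sketch of a few common operations. The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

# A 2-D array (2 rows, 3 columns) built from nested lists.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.shape)        # (2, 3)
print(a * 2)          # arithmetic is applied element by element
print(a.sum(axis=0))  # column sums: [5 7 9]
print(a[:, 1])        # second column: [2 5]
print(a.mean())       # mean of all elements: 3.5
```

Operations such as a * 2 act on every element at once, which is why explicit loops are rarely needed when working with arrays.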

Matplotlib

Matplotlib is a plotting package. We will use it extensively for visualizing datasets and results.

Most of our use of Matplotlib is basic, but having made a plot or two beforehand will be useful:

http://matplotlib.org/users/pyplot_tutorial.html
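
As a first plot to try, the sketch below draws two curves on one set of axes. It uses Matplotlib's object-oriented interface (a figure and an axes object) rather than the pyplot-style calls in the tutorial; both approaches work. The data and the output file name first_plot.png are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample 100 evenly spaced points on [0, 2*pi].
x = np.linspace(0, 2 * np.pi, 100)

# One figure containing a single set of axes.
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

# Write the figure to disk; the file name is an arbitrary choice.
fig.savefig("first_plot.png")
```

In a Jupyter notebook the figure is normally also displayed directly below the cell.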

Pandas

Pandas provides data structures and data analysis tools that are intended to be easy to use. Pandas excels with tabular data.

We will use pandas mainly for loading datasets and preprocessing.
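
The kind of loading and preprocessing meant here can be sketched as follows. The dataset is hypothetical and held in a string so the example is self-contained; in practice the same read_csv call would usually be given a file path. Filling a missing value with the column mean is just one possible preprocessing step, chosen for illustration.

```python
import io
import pandas as pd

# Hypothetical data standing in for a real dataset file.
csv_text = """name,height_cm,weight_kg
Ann,170,65
Bob,182,
Cal,165,58
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Simple preprocessing: fill the missing weight with the column mean.
df["weight_kg"] = df["weight_kg"].fillna(df["weight_kg"].mean())

print(df)
print(df.describe())  # summary statistics for the numeric columns
```

The resulting DataFrame has one row per record and one named column per variable, which is the tabular form most of the course datasets take.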

Files

Week 1 Assignment