Syllabus

BIOF509 - Machine Learning and Object-Oriented Programming with Python

Spring 2018

Instructors:

Teaching assistant:

Important links:

First class: 1st February 2018 at 5pm in building 10, room B1C205

Final class: 10th May 2018

This document is subject to revision. Last revised 1st February 2018.

Course Description

Learning Objectives

By the end of this course you should be able to:

  1. Create working Python programs using the basic features of the Python language together with NumPy, pandas, and Biopython (a brief refresher)
  2. Demonstrate the use of tools commonly employed in professional settings to aid development
  3. Describe the common types of machine learning tasks
  4. Implement a simple linear regression model using NumPy
  5. List the advantages and disadvantages of different machine learning algorithms
  6. Apply machine learning algorithms for both regression and classification
  7. Convert a data set into a form suitable for use by machine learning algorithms
  8. Apply dimensionality reduction to a data set for visualization and further processing
  9. Identify subpopulations using clustering algorithms
  10. Choose appropriate model parameters
  11. Evaluate the results of a machine learning model
  12. Integrate a machine learning model in a workflow
  13. Compare different programming paradigms, including procedural, functional and object-oriented
  14. Define what an object is in the context of programming
  15. Identify the features of an object definition
  16. Contrast attributes, properties and methods
  17. Review special methods
  18. Design a public interface for a class
  19. Utilize inheritance and abstraction
  20. Choose when and how to raise and handle exceptions appropriately

Logistics

This is a 15-week course, starting on 1st February 2018 and finishing on 10th May 2018. Classes will take place between 5:00pm and 7:00pm each Thursday in building 10, room B1C205 within the FAES Academic Center.

Attendance in class is strongly recommended; however, we realize other commitments will occasionally prevent attendance. Class materials will generally be distributed over the course website.

Most classes will have hands-on tutorials and assignments. Both practice and graded assignments will generally be provided. Graded assignments should be submitted before the following class. Bringing a laptop to each class is strongly encouraged so that you can follow along.

Important dates:

  • 23 February 2018 - Last day to drop/withdraw
  • 30 March 2018 - Last day to change status (credit or audit)

Required Materials

Each student is encouraged to bring their own laptop to each class. For the course, we will use Python 3. Any Python installation should work, but you must be able to install packages. If you do not already have Python installed, the Anaconda Scientific Python Distribution from Continuum Analytics will likely be the easiest way to set it up. The Anaconda installer will automatically install many of the packages we will use during the course.
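Before the first class, you may want to confirm that your installation can find the packages named in this syllabus. The short Python 3 script below is one way to do that. It is a sketch, not course-provided code: `check_packages` is a hypothetical helper, and the package list is our reading of the libraries mentioned elsewhere in this syllabus.

```python
import importlib.util
import sys

# Packages mentioned in this syllabus, mapped from install name to import name.
# Some differ: biopython is imported as Bio, scikit-learn as sklearn.
COURSE_PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "biopython": "Bio",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def check_packages(packages=COURSE_PACKAGES):
    """Return {install_name: bool} indicating whether each package is importable.

    For top-level module names, find_spec only locates the module on the
    search path; it does not actually import it.
    """
    return {
        install_name: importlib.util.find_spec(import_name) is not None
        for install_name, import_name in packages.items()
    }


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_packages().items():
        print("{:<14} {}".format(name, "OK" if found else "missing"))
```

Any package reported as missing can usually be installed with `conda install <package>` in an Anaconda setup, or with `pip install <package>` otherwise.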

Assignments and Grading

The emphasis of the course is on learning and mastering the skills covered. It is our hope that everyone will be able to complete the assignments and project. If any of the material is unclear, please ask for clarification.

The final project is worth 50% of the course grade, with the weekly assignments making up the remaining 50%.

Weekly Assignments

Weekly assignments will generally consist of multiple components. Unless otherwise specified, each component will be graded pass / fail. A component will be graded as “pass” if it runs and produces the expected results. The grade for each assignment will be the percentage of its components graded as “pass”.

Final Project

The final project will consist of the following components:

1) Project documentation. Each project should have documentation clarifying its goal and functionality. The code itself should be well documented, with comments throughout to aid understanding. Functions and classes should have docstrings describing their functionality, inputs and outputs.

2) Project code. The code should be well organized and easy to read. It should also be modular, so that each part of the code is reusable. The code should run and produce the correct output under different conditions. It should also include robust error checking.

3) Project presentation. Each student will present their project at the end of the semester. The presentation should cover the project’s goals, inputs, and outputs, preferably while showing snippets of code.
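To illustrate the documentation and error-checking expectations above, here is a minimal sketch of a small, reusable function. The function `min_max_scale` is a hypothetical example, not part of any course material; its docstring follows the NumPy docstring convention, which is one common choice among several.

```python
import math


def min_max_scale(values):
    """Rescale a sequence of numbers linearly onto the range [0, 1].

    Parameters
    ----------
    values : sequence of int or float
        The numbers to rescale. Must contain at least two distinct values.

    Returns
    -------
    list of float
        The rescaled values, in the same order as the input.

    Raises
    ------
    TypeError
        If any element is not a real number.
    ValueError
        If the input is empty, contains a non-finite value, or all values are equal.
    """
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    for v in values:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError("expected numbers, got {!r}".format(v))
        if not math.isfinite(v):
            raise ValueError("values must be finite, got {!r}".format(v))
    low, high = min(values), max(values)
    if low == high:
        # Avoid dividing by zero when there is no spread to rescale.
        raise ValueError("cannot rescale: all values are equal")
    span = high - low
    return [(v - low) / span for v in values]
```

For example, `min_max_scale([2, 4, 6])` returns `[0.0, 0.5, 1.0]`, while `min_max_scale([3, 3])` raises a `ValueError` with an explanatory message instead of failing with a division by zero.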

Project grades will be determined based on the components outlined above, with each component representing one third (about 33%) of the project grade.
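The grading arithmetic described in this section can be sketched as follows. This is an illustration under stated assumptions, not the official grading procedure: the function names are hypothetical, weekly assignments are assumed to carry equal weight (the syllabus does not say how they are combined), and each project component is treated as exactly one third of the project grade.

```python
def assignment_grade(component_passed):
    """Percentage (0-100) of an assignment's components graded "pass".

    component_passed: sequence of booleans, True for "pass", False for "fail".
    """
    if not component_passed:
        raise ValueError("an assignment needs at least one component")
    return 100.0 * sum(bool(c) for c in component_passed) / len(component_passed)


def project_grade(documentation, code, presentation):
    """Equal-weight average of the three project components (each 0-100)."""
    scores = (documentation, code, presentation)
    if any(not 0 <= s <= 100 for s in scores):
        raise ValueError("component scores must be between 0 and 100")
    return sum(scores) / 3


def course_grade(assignments, project):
    """Combine grades: 50% weekly assignments, 50% final project.

    assignments: list of per-assignment percentages. They are averaged with
    equal weight here, which is an ASSUMPTION not stated in the syllabus.
    project: the project grade as a percentage.
    """
    if not assignments:
        raise ValueError("need at least one assignment grade")
    weekly_average = sum(assignments) / len(assignments)
    return 0.5 * weekly_average + 0.5 * project
```

For example, an assignment with three of four components passing scores 75.0. Averaging that with a second assignment at 100.0, and combining the result with a project scored 90, 80 and 70 (a project grade of 80.0), gives a course grade of 83.75.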

Course Materials

Course materials will be distributed on this website in the corresponding weekly sections.

Schedule

Week 1 (01 February 2018): Course overview and a Python refresher.

Week 2 (08 February 2018): Different programming paradigms. The main object-oriented programming (OOP) concepts.

Week 3 (15 February 2018): Developing applications with OOP.

Week 4 (22 February 2018): Introduction to NumPy, Pandas and Scikit-Learn.

Week 5 (01 March 2018): Plotting in Python: Matplotlib, Pandas, Seaborn.

Week 6 (08 March 2018): Data retrieval and dataset preprocessing in Scikit-Learn.

Week 7 (15 March 2018): Regression with NumPy and Scikit-Learn.

Week 8 (22 March 2018): Classification with Scikit-Learn.

Week 9 (29 March 2018): Unsupervised learning and clustering with Scikit-Learn.

Week 10 (05 April 2018): Dimensionality reduction and feature selection with Scikit-Learn.

Week 11 (12 April 2018): Deep learning and other advanced ML tasks.

Week 12 (19 April 2018): The machine learning workflow with Scikit-Learn.

Week 13 (26 April 2018): Turning machine learning projects into software. Question-and-answer session.

Week 14 (03 May 2018): Project presentations and feedback. Part I.

Week 15 (10 May 2018): Project presentations and feedback. Part II.