DataSci W207:
Applied Machine Learning

Session 6: Th, 6:30-8:00pm PT

Office hours: Wed, 6:30-7:30pm PT

Description

The goal of this course is to provide a broad introduction to the key ideas in machine learning. The emphasis will be on intuition and practical examples rather than theoretical results. Through a variety of lecture examples and programming projects, you will learn how to apply powerful machine-learning techniques to new problems, how to run evaluations and interpret results, and how to think about scaling up from thousands of data points to billions.

This class meets for one 90-minute session each week. It includes three guided programming projects and one more open-ended final project.

All materials in this course are posted on GitHub in the form of Jupyter notebooks.

Announcements
  • Final project feedback forms: self-evaluation, team-evaluation.
  • To make the best use of our time: if you plan to join office hours please sign up here.
  • Please fill out this PRE-COURSE survey so I can get to know a bit more about you and your programming background.
  • We WILL NOT be using the iSchool Virtual Campus for communication. We will be using it only for assignment submissions.

Class Logistics

Course Prerequisites

  • Core data science courses: research design, storing and retrieving data, exploring and analyzing data.

  • Undergraduate-level probability and statistics. Linear algebra is recommended.

Programming Prerequisites

  • Python (v3). We will primarily be using numpy and scikit-learn.

  • Jupyter and JupyterLab notebooks. You can install them on your computer using pip or Anaconda. More information here.

  • Git(Hub), including clone/commit/push from the command line. You can sign up for an account here.
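
One way to check that the Python prerequisites above are installed is a short script like the following. It is a minimal sketch, not an official course tool: the `REQUIRED` list and the `missing_packages` helper are hypothetical names chosen for illustration, and the list assumes the usual import names (`sklearn` for scikit-learn, `jupyterlab` for JupyterLab).

```python
from importlib.util import find_spec

# Import names, not pip names: scikit-learn is imported as "sklearn".
# This list is an assumption based on the prerequisites above.
REQUIRED = ["numpy", "sklearn", "jupyterlab"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

If anything is reported as missing, installing it with pip or Anaconda, as noted above, should resolve it.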

OS

  • Mac/Windows/Linux are all acceptable to use.

Textbook

  • Check readings posted on the iSchool Virtual Platform.

Assignments

  • The three guided programming projects are due before class starts: Project 1 in weeks 3 and 5 (May 18, June 1), Project 2 in week 9 (June 29), and Project 3 in week 13 (July 20).
  • Code submitted via GitHub (see notes below).

Final Project

  • You are allowed to work in teams. You will present your final project in class during the final session (Aug 3). The presentation should not exceed 15-20 min.
  • Code submitted via GitHub (see notes below).

Live Session Plan


Week | Lecture | Lecture Materials | Deadlines (6:30 pm PT)
Supervised Learning
05/04 - 05/10 | Introduction | Week 1 |
05/11 - 05/17 | Nearest neighbors | Week 2 |
05/18 - 05/24 | Naive Bayes | Week 3 | Project 1 (Part 1-5)
05/25 - 05/31 | Decision trees | Week 4 |
06/01 - 06/07 | Cross-validation and Ensemble learning | Week 5 | Project 1 (Part 6-11)
06/08 - 06/14 | Regression analysis | Week 6 | Final project: group and dataset
06/15 - 06/21 | Neural networks | Week 7 |
06/22 - 06/28 | Support vector machines | Week 8 |
Unsupervised Learning
06/29 - 07/05 | Cluster analysis | Week 9 | Project 2
07/06 - 07/12 | Gaussian mixture models | Week 10 | Final project: baseline presentation
07/13 - 07/19 | Dimensionality reduction | Week 11 |
Other Topics
07/20 - 07/26 | Network analysis | Week 12 |
07/27 - 08/02 | Recommender systems | Week 13 | Project 3
08/03 - 08/09 | Wrap-up | Week 14 | Final project: code and presentation

Communication channel

We will use Slack to communicate throughout the semester. Questions/comments related to your projects (NO CODE) are strongly encouraged.


Section | Slack channel
6 | #w207-6

Final Project

For the final project you will form a group: 3-4 people is ideal, 2-5 people are allowed, and one-person groups are not allowed (DON'T ASK).

Your group can only include members from the section in which you are enrolled.

You will pick your own dataset, but I will also provide some suggestions.

Deadlines to remember:

  • week 06/08 - 06/14: inform me about your group and the dataset you plan to use.
  • week 07/06 - 07/12: prepare a baseline presentation of your project. You will present in class (no more than 10 min).
  • week 08/03 - 08/09: code submission and final presentation in class (no more than 15-20 min).

A few project ideas:

PowerPoint slide ideas (feel free to use Jupyter Notebook Slides or other presentation tools):

Project Submission Guidelines

A GitHub Classroom link will be provided for each project. When you click on the link, you will create a private repo (I already have admin rights).

Once you are ready to submit, commit and push changes to your private repo.

In the iSchool Virtual Campus (ISVC), submit only the link to your private repo (DO NOT upload the Jupyter Notebook).

Links:

Grading

Final grades will be determined by computing the weighted average of programming projects, final group project, and participation.

The baseline grading range for this course is: A for 93 or above, A- for 90 or above, B+ for 87 or above, B for 83 or above, B- for 80 or above, C+ for 77 or above, C for 73 or above, C- for 70 or above, D+ for 67 or above, D for 63 or above, D- for 60 or above, and F for 59 or below.

Participation: 5%
Programming projects: 20% (x3)
Final project: 35%

Late Policy
Late submissions will be accepted up to one week past the deadline with a 10% penalty, but you need to let me know if you will be submitting late.
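
To make the arithmetic in the Grading and Late Policy sections concrete, here is a rough sketch of how a final score and letter grade could be computed from the stated weights and cutoffs. It is illustrative only, not the official grading code. All function and variable names are hypothetical. Two points are assumptions the syllabus does not spell out: that the 10% late penalty scales a project score by 0.9, and that any score below 60 is an F.

```python
# Weights from the Grading section; each of the three projects counts 20%.
WEIGHTS = {"participation": 0.05, "project": 0.20, "final_project": 0.35}

# Baseline cutoffs from the Grading section, highest first.
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def apply_late_penalty(score, late=False):
    """Assumption: the 10% late penalty scales the project score by 0.9."""
    return score * 0.9 if late else score


def final_score(participation, projects, final_project):
    """Weighted average of scores, each on a 0-100 scale."""
    if len(projects) != 3:
        raise ValueError("expected exactly three project scores")
    return (WEIGHTS["participation"] * participation
            + WEIGHTS["project"] * sum(projects)
            + WEIGHTS["final_project"] * final_project)


def letter_grade(score):
    """Map a 0-100 score to a letter using the baseline cutoffs."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"  # assumption: anything below 60 is an F


if __name__ == "__main__":
    # Hypothetical scores; the first project was submitted late.
    projects = [apply_late_penalty(90, late=True), 85, 95]
    total = final_score(100, projects, 88)
    print(f"{total:.1f} -> {letter_grade(total)}")
```

Because the cutoffs are described as a baseline, actual letter grades may differ from what this sketch produces.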