Sunday, October 4, 2015

Coursera Machine Learning by Stanford

Just finished up my first full-blown course from Coursera, a course from Stanford University on Machine Learning. I'd watched through the lecture series for the Stanford Natural Language Processing class, but I didn't do the programming exercises (yet...), so I don't really count that one. Finished up way ahead of schedule, though to be fair, they set a pretty leisurely pace.

Overall, I thought it was an excellent class, and a great introduction to machine learning concepts. Most of it I'd seen before in my Predictive Analytics coursework with Auburn (linear and logistic regression in SPSS, cluster analysis and PCA with SAS Enterprise Miner), but it was great to get under the hood a little bit and get a deeper understanding of how these algorithms work. Now, while they say you don't need a background in linear algebra, the vector and matrix concepts required by the class were pretty intimidating at times, and it wasn't always really clear how to translate the mathematical notation from the lectures into Octave code. Particularly in the early weeks, I frequently had to look up completed code on GitHub to see what the heck was going on. Once I got the hang of it, though, I was pretty consistently able to do the homework assignments "unassisted". Khan Academy has a series on linear algebra that I found helpful.
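
As a concrete example of that notation-to-code translation: the batch gradient descent update from the lectures, theta := theta - (alpha/m) * X' * (X*theta - y), collapses to a single vectorized line once you stop thinking in for-loops. Here's a minimal sketch in Python/NumPy rather than the course's Octave (the toy data and function name are mine, not from the course):

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent for linear regression.

    X: (m, n) design matrix (first column all ones for the intercept)
    y: (m,) target vector
    theta: (n,) initial parameters
    alpha: learning rate
    """
    m = len(y)
    for _ in range(num_iters):
        # Vectorized update: theta := theta - (alpha/m) * X' * (X*theta - y)
        theta = theta - (alpha / m) * X.T @ (X @ theta - y)
    return theta

# Toy data generated from y = 1 + 2x; gradient descent should recover [1, 2]
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = 1.0 + 2.0 * X[:, 1]
theta = gradient_descent(X, y, np.zeros(2), alpha=0.1, num_iters=5000)
```

The Octave version is nearly identical: `theta = theta - (alpha/m) * X' * (X*theta - y);` — which is exactly the kind of line I had to squint at on GitHub before the notation clicked.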

I posted my work on the exercises to GitHub: Machine Learning exercises on GitHub. My solutions would probably be a poor first choice for reference anyway, and there really is nothing like the feeling of accomplishment when you get the solution right on your own. But spinning your wheels is super frustrating too, so only look at them if you really are getting nowhere.

So here is the basic course outline (by week):

  1. Introduction
    • Setting up the environment (Octave or MATLAB)
    • Linear regression
    • Cost function and gradient descent
  2. More regression
    • Multivariate linear regression
    • Octave tutorial
    • Assignment 1: compute cost function and gradient descent
  3. Logistic regression
    • Logistic regression model, cost function
    • Regularization
    • Assignment 2: compute sigmoid function, cost, gradient, and regularization for logistic regression
  4. Neural networks
    • Model and intuitions
    • Assignment 3: mostly more logistic regression, plus some neural network
  5. More neural networks
    • Training: feedforward and back propagation
    • Assignment 4:  implement back propagation
  6. Evaluating learning system
    • Bias vs variance
    • Skewed data
    • Error analysis
    • Assignment 5: regularized linear regression, plotting learning curve
  7. Support Vector Machines (SVM)
    • Gaussian, linear kernels
    • Assignment 6: build part of a spam filter with a Gaussian kernel
  8. Unsupervised Learning
    • Clustering with K-Means
    • Dimensionality reduction with PCA
    • Assignment 7:  dealing with centroids for clustering, compressing and recovering image data with PCA
  9. Anomaly Detection and Recommender Systems
    • Anomaly detection with Gaussian density (normal distribution)
    • Multivariate distributions
    • Compare to supervised learning
    • Predicting movie ratings (recommender system)
    • Assignment 8: implement anomaly detection and recommender system components
  10. Large Scale Machine Learning
    • Stochastic and Mini-batch gradient descent
    • Map-reduce
    • Online learning (streaming dataset)
  11. Application: Photo OCR
    • Machine learning pipelines
    • Ceiling analysis (what part of system to work on)
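
To give a flavor of what the assignments involve, here's a sketch of the two core pieces from Assignment 2 (the sigmoid function and the regularized logistic regression cost and gradient), again in Python/NumPy instead of the course's Octave. The function names and conventions are my own approximation of the assignment, not the official starter code:

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def cost_function_reg(theta, X, y, lam):
    """Regularized logistic regression cost and gradient.

    X: (m, n) design matrix (first column all ones)
    y: (m,) labels in {0, 1}
    lam: regularization strength; theta[0] (the intercept) is not regularized
    """
    m = len(y)
    h = sigmoid(X @ theta)
    # Cross-entropy cost plus an L2 penalty on the non-intercept weights
    J = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m \
        + (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    grad = X.T @ (h - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return J, grad
```

In the actual assignment you hand functions like these to an off-the-shelf optimizer (Octave's `fminunc`) rather than writing the minimization loop yourself, which is part of what makes the vectorized cost/gradient formulation so useful.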


