6.86x Machine Learning with Python - From Linear Models to Deep Learning: Checklist

Unit 2: Nonlinear Classification, Linear Regression, Collaborative Filtering

Project 2: Digit recognition (Part 1)

Project overview

  • digit recognition problem using the MNIST (Modified National Institute of Standards and Technology) database
  • classifying images of handwritten digits (0-9)
  • try several methods and compare their results

About the MNIST data

  • binary images of handwritten digits
  • the digits were collected from among US Census Bureau employees and high school students
  • training: 60,000
  • testing: 10,000
  • 28 * 28 pixels (784-dim)

Problems

1. Introduction
2. Linear Regression with Closed Form Solution
3. Support Vector Machine
  • One vs. Rest SVM
  • Binary classification error
  • Implement C-SVM
    • the C parameter in Python's sklearn library is the regularization parameter; regularization strength is inversely proportional to C (see the sklearn sketch after this list)
  • Multiclass SVM
4. Multinomial (Softmax) Regression and Gradient Descent
5. Temperature (a softmax temperature sketch follows this list)
6. Changing Labels
7. Classification Using Manually Crafted Features
8. Dimensionality Reduction Using PCA
9. Cubic Features
10. Kernel Methods
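
For the C-SVM bullets in problem 3, here is a minimal sklearn sketch of a one-vs-rest multiclass SVM and its classification error. The toy data and variable names are my own placeholders, not the project's code:

import numpy as np
from sklearn.svm import LinearSVC

# Toy stand-ins for the MNIST features/labels (hypothetical data)
X_train = np.random.rand(100, 784)
y_train = np.random.randint(0, 10, size=100)

# C is the regularization parameter; the regularization strength is
# inversely proportional to C (smaller C = stronger regularization)
clf = LinearSVC(C=0.1, random_state=0)
clf.fit(X_train, y_train)                      # one-vs-rest multiclass by default
predictions = clf.predict(X_train)
train_error = np.mean(predictions != y_train)  # classification error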
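
And for the softmax regression and temperature problems (4-5), a minimal sketch of a temperature-scaled softmax, assuming the usual parameterization where the logits are divided by a temperature before normalizing (my illustration, not the course's starter code):

import numpy as np

def softmax(z, temperature=1.0):
    # Divide the logits by the temperature before normalizing:
    # higher temperature flattens the distribution, lower sharpens it
    z = z / temperature
    z = z - np.max(z)          # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits, temperature=1.0))   # peaked
print(softmax(logits, temperature=10.0))  # flatter, closer to uniform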

Unit 3: Neural Networks

Homework 4

1. Neural Networks

  • Feed Forward Step

  • Decision Boundaries

  • Inverse Temperature

2. LSTM

  • LSTM states

  • LSTM states 2

  • LSTM info

3. Backpropagation

  • Computing the Error

  • Parameter Derivatives

  • Activation Functions: Sigmoid

  • Simple Network

    • made a careless mistake in the routine work of applying the chain rule
  • SGD
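
Since applying the chain rule by hand is exactly where careless mistakes creep in, a numerical gradient check is a cheap safeguard. Here is a minimal sketch with a single sigmoid unit and a squared-error loss of my own choosing, not the homework's exact network:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid unit with squared-error loss: L = 0.5 * (sigmoid(w * x) - t)**2
x, t, w = 1.5, 1.0, 0.3

# Analytic gradient from the chain rule:
# dL/dw = (a - t) * a * (1 - a) * x, where a = sigmoid(w * x)
a = sigmoid(w * x)
grad_analytic = (a - t) * a * (1.0 - a) * x

# Numerical gradient by central differences; the two should agree closely
eps = 1e-6
loss = lambda w_: 0.5 * (sigmoid(w_ * x) - t) ** 2
grad_numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print(grad_analytic, grad_numeric)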

References

www.andreykurenkov.com

Backpropagation is one of those ideas like Bayes Rule that's "whoa, why didn't I think of that"...after you see it. It's just the plain old ordinary chain rule we learned early in calculus. Unfortunately, it's not as obvious or trivial as Bayes Rule (a one-line "proof" from the definition of conditional probability), and that Caused Problems...

See, waaay back in 1969, when the idea of neural networks was just starting, some people pointed out that unless you have multiple layers, you can't solve problems that, well, aren't linearly separable. But with multiple layers, how can you ever figure out what the weights should be, hmm? HMMM? Pfft! Not possible, some people said. Ok, "some people" are Marvin Minsky and Seymour Papert. In 1969, they wrote a book called Perceptrons. I was in high school at the time, and had a habit of prowling the science section of our local library for new books, so I snagged Perceptrons when it first arrived. By about chapter 3, I was thinking..."They don't like this." I closed the book, and took it back.

Apparently Minsky also went 'round telling his highly-placed contacts that neural networks were a dead end, and neural network research shouldn't be funded. (Yes yes, they were probably reacting to the AI hype and over-promising that led to the AI Winter...but they failed to recognize the difference between the intractable search-based methods of traditional AI...and the not-yet-named statistics-based methods of machine learning, of which neural networks were an early example.) And that killed the field for almost 20 years.

But...even before that, assorted folks were busy solving the problem. Folks were working on pieces of it in the early 1960s. Paul Werbos's dissertation detailing and analyzing backpropagation was published in 1974. These folks were available for Minsky and Papert to talk to... It wasn't until the re-re-discovery of backpropagation by Rumelhart, Hinton, Williams, and McClelland in 1986 that neural networks began the journey to acceptance all over again.

The moral of the story is, if you have a Big Name, and lots of Clout, the more cautious you need to be about pooh-pooh-ing something. (I only found out recently about Minsky telling folks not to fund NN work, so I'm upset with them all over again.)

4. Word Embeddings

  • Word Embeddings
    • got it right, but only by guessing; needs review

Project 3: Digit recognition (Part 2)

Project overview

  • digit recognition problem using the MNIST (Modified National Institute of Standards and Technology) database
  • classifying images of handwritten digits (0-9)
  • in Part 2, we classify the MNIST data with neural networks
  • problems 2-6: implement simple neural nets from scratch
  • problems 7-10: redo the from-scratch MNIST classification of the first half, this time with a deep neural network model built in PyTorch

    Using a framework like PyTorch means you don't have to implement all of the details (like in the earlier problem) and can spend more time thinking through your high level architecture.

Problems

1. Introduction
2. Neural Network Basics
  • through problem 6, you implement a simple neural net from scratch

    You will implement the net from scratch (you will probably never do this again, don't worry).

  • the model for this project (a minimal forward-pass sketch follows this list)
    • three layers
      1. Input layer
        • 2 neurons
      2. Hidden layer
        • 3 neurons
      3. Output layer
        • 1 neuron
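
A minimal numpy sketch of a forward pass through that 2-3-1 shape, as promised above. The ReLU hidden activation and the random weights are my own illustration; the project's starter code defines its own parameters and activations:

import numpy as np

rng = np.random.default_rng(0)

# Weights and biases for a 2-3-1 network (randomly initialized here)
W1 = rng.standard_normal((3, 2))   # input (2) -> hidden (3)
b1 = np.zeros((3, 1))
W2 = rng.standard_normal((1, 3))   # hidden (3) -> output (1)
b2 = np.zeros((1, 1))

def forward(x):
    # x is a (2, 1) column vector
    hidden = np.maximum(0, W1 @ x + b1)   # ReLU hidden activations
    return W2 @ hidden + b2               # linear output layer

print(forward(np.array([[1.0], [2.0]])))
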
3. Activation Functions
4. Training the Network
  • implement the method that trains the NN on the data
  • implement forward propagation first; this part is easy
  • then work out backpropagation by hand with the chain rule and implement it; this was the hardest part
  • in backpropagation, optimize each parameter (weight) and bias with SGD
  • vectorize functions with numpy.vectorize() (see the sketch after this list)
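
A minimal sketch of the numpy.vectorize() idea plus a single SGD update, continuing the toy shapes above; the learning rate and the fake gradient are my assumptions, not the project's values:

import numpy as np

# np.vectorize turns a scalar function into one that maps elementwise
relu = np.vectorize(lambda z: max(z, 0.0))
relu_derivative = np.vectorize(lambda z: 1.0 if z > 0 else 0.0)

print(relu(np.array([-1.0, 0.5, 2.0])))   # [0.  0.5 2. ]

# Generic SGD update for one weight matrix W with gradient dW
learning_rate = 0.001
W = np.ones((3, 2))
dW = np.full((3, 2), 0.5)      # pretend this came out of backpropagation
W = W - learning_rate * dW     # one gradient descent step
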
5. Predicting the Test Data
  • train with the method implemented in problem 4, then compute outputs using the optimized parameters
    • the trained parameters are stored as attributes of the NeuralNetwork class, so you just use them to compute the output; that output is the prediction
6. Conceptual Questions
7. Classification for MNIST using deep neural networks
  • redo what problems 2-6 did with a simple from-scratch NN, this time as a deep neural network model in PyTorch
8. Fully-Connected Neural Networks
  • first, run the provided code
  • then tune hyperparameters with a mini grid search and compare accuracies (a sketch follows this list):
    • batch size
    • learning rate
    • momentum
    • activation function: ReLU -> LeakyReLU
    • hidden layer size (number of neurons)
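
A minimal sketch of what such a mini grid search can look like in PyTorch. The model layout, the grid values, and the train_and_score helper are hypothetical placeholders, not the project's code:

import itertools
import torch
import torch.nn as nn

def make_model(hidden_size):
    # Fully connected net for flattened 28x28 MNIST images, with the
    # LeakyReLU variant from the tuning list above
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(784, hidden_size),
        nn.LeakyReLU(),
        nn.Linear(hidden_size, 10),
    )

# train_and_score(model, optimizer, batch_size) is assumed to train for a
# few epochs and return validation accuracy (placeholder, not defined here)
best = None
for batch_size, lr, momentum, hidden in itertools.product(
        [32, 64], [0.1, 0.01], [0.0, 0.9], [10, 128]):
    model = make_model(hidden)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    accuracy = train_and_score(model, optimizer, batch_size)
    if best is None or accuracy > best[0]:
        best = (accuracy, batch_size, lr, momentum, hidden)
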
9. Convolutional Neural Networks
  • implement a CNN in PyTorch (a sketch of this stack appears after this problem's notes)
  • architecture
    • A convolutional layer with 32 filters of size 3×3
    • A ReLU nonlinearity
    • A max pooling layer with size 2×2
    • A convolutional layer with 64 filters of size 3×3
    • A ReLU nonlinearity
    • A max pooling layer with size 2×2
    • A flatten layer
    • A fully connected layer with 128 neurons
    • A dropout layer with drop probability 0.5
    • A fully-connected layer with 10 neurons
  • a dropout layer appears here: during training it randomly zeroes each activation with probability p, so with p=0.5 roughly half of the units are dropped on each pass

  • Without GPU acceleration, you will likely find that this network takes quite a long time to train. For that reason, we don't expect you to actually train this network until convergence. Implementing the layers and verifying that you get approximately 93% training accuracy and 98% validation accuracy after one training epoch (this should take less than 10 minutes) is enough for this project. If you are curious, you can let the model train longer; if implemented correctly, your model should achieve >99% test accuracy after 10 epochs of training. If you have access to a CUDA compatible GPU, you could even try configuring PyTorch to use your GPU.
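
The listed stack translates almost one-to-one into PyTorch layers. A minimal sketch: the 64 * 5 * 5 input size of the first fully connected layer assumes 28x28 single-channel inputs and unpadded convolutions, which is my reading of the spec rather than the project's starter code:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3),    # 28x28 -> 26x26, 32 filters of size 3x3
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),        # 26x26 -> 13x13
    nn.Conv2d(32, 64, kernel_size=3),   # 13x13 -> 11x11, 64 filters of size 3x3
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),        # 11x11 -> 5x5
    nn.Flatten(),                       # 64 * 5 * 5 = 1600 features
    nn.Linear(64 * 5 * 5, 128),         # fully connected, 128 neurons
    nn.Dropout(0.5),                    # drop probability 0.5
    nn.Linear(128, 10),                 # fully connected, 10 neurons
)

# Sanity check with one fake 28x28 grayscale image
print(model(torch.zeros(1, 1, 28, 28)).shape)   # torch.Size([1, 10])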

10. Overlapping, multi-digit MNIST
  • ran out of time and could not finish this one; will come back to it later

References

If you're planning on trying to avoid use of np.vectorize (or the PyTorch equivalent), you may need to have different code in your ReLU functions for ndarray (or tensor, or list). Here's how to test what data type your function got passed: isinstance(x, t), where x is the input and t is a type specifier. isinstance() returns True or False, so it can be used in an if statement. Here are the various type specifiers:

import numpy as np

def relu(x):
    # NumPy ndarray
    if isinstance(x, np.ndarray):
        # Nice vectorized code for NumPy arrays here.
        return np.maximum(x, 0)
    # PyTorch tensor. We may not have PyTorch imported, so try to import it here.
    try:
        import torch
        if isinstance(x, torch.Tensor):
            # Appropriate code for PyTorch tensors here.
            return torch.clamp(x, min=0)
    except ImportError:
        # No PyTorch available, so the input can't have been a PyTorch tensor.
        pass
    # Python list (unlikely we'll get this, but still...)
    if isinstance(x, list):
        return [max(v, 0) for v in x]
    # Finish up with code for ordinary scalars.
    return max(x, 0)

As is pointed out in another thread, the grader for rectified_linear_unit_derivative is requiring that we return ints, even though it is passing us floats. If you find you want to distinguish between scalar int and float types, you can use "int" and "float" as type specifiers. (Python 3 doesn't have a distinction between "int" and "long", so you don't have to test for "long".) But since the grader wants only ints just now, this is moot.

def rectified_linear_unit_derivative(x):
    # Python float
    if isinstance(x, float):
        # Return floats if given floats...or not, because the grader wants ints.
        return 1 if x > 0 else 0
    # Python int
    if isinstance(x, int):
        # Return ints if given ints...
        return 1 if x > 0 else 0