6.86x Machine Learning with Python - From Linear Models to Deep Learning: Checklist

Unit 2: Nonlinear Classification, Linear Regression, Collaborative Filtering

Project 2: Digit recognition (Part 1)

Project overview

  • digit recognition problem using the MNIST (Modified National Institute of Standards and Technology) database
  • classifying images of handwritten digits (0-9)
  • try several methods and compare their results

About the MNIST data

  • binary images of handwritten digits
  • the digits were collected from among US Census Bureau employees and high school students
  • training: 60,000
  • testing: 10,000
  • 28 * 28 pixels (784-dim)

Problems

1. Introduction
2. Linear Regression with Closed Form Solution
3. Support Vector Machine
  • One vs. Rest SVM
  • Binary classification error
  • Implement C-SVM
    • the C parameter in Python's sklearn library is the regularization parameter; regularization strength is inversely proportional to C (see the sklearn sketch after this list)
  • Multiclass SVM
4. Multinomial (Softmax) Regression and Gradient Descent
5. Temperature (a softmax temperature sketch follows this list)
6. Changing Labels
7. Classification Using Manually Crafted Features
8. Dimensionality Reduction Using PCA
9. Cubic Features
10. Kernel Methods
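
For the C-SVM bullets in problem 3, here is a minimal sklearn sketch of a one-vs-rest multiclass SVM and its classification error. The toy data and variable names are my own placeholders, not the project's code:

import numpy as np
from sklearn.svm import LinearSVC

# Toy stand-ins for the MNIST features/labels (hypothetical data)
X_train = np.random.rand(100, 784)
y_train = np.random.randint(0, 10, size=100)

# C is the regularization parameter; the regularization strength is
# inversely proportional to C (smaller C = stronger regularization)
clf = LinearSVC(C=0.1, random_state=0)
clf.fit(X_train, y_train)                      # one-vs-rest multiclass by default
predictions = clf.predict(X_train)
train_error = np.mean(predictions != y_train)  # classification error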
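
And for the softmax regression and temperature problems (4-5), a minimal sketch of a temperature-scaled softmax, assuming the usual parameterization where the logits are divided by a temperature before normalizing (my illustration, not the course's starter code):

import numpy as np

def softmax(z, temperature=1.0):
    # Divide the logits by the temperature before normalizing:
    # higher temperature flattens the distribution, lower sharpens it
    z = z / temperature
    z = z - np.max(z)          # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits, temperature=1.0))   # peaked
print(softmax(logits, temperature=10.0))  # flatter, closer to uniform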

Unit 3: Neural Networks

Homework 4

1. Neural Networks

  • Feed Forward Step

  • Decision Boundaries

  • Inverse Temperature

2. LSTM

  • LSTM states

  • LSTM states 2

  • LSTM info

3. Backpropagation

  • Computing the Error

  • Parameter Derivatives

  • Activation Functions: Sigmoid

  • Simple Network

    • made a careless mistake in the routine work of applying the chain rule
  • SGD
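
Since applying the chain rule by hand is exactly where careless mistakes creep in, a numerical gradient check is a cheap safeguard. Here is a minimal sketch with a single sigmoid unit and a squared-error loss of my own choosing, not the homework's exact network:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid unit with squared-error loss: L = 0.5 * (sigmoid(w * x) - t)**2
x, t, w = 1.5, 1.0, 0.3

# Analytic gradient from the chain rule:
# dL/dw = (a - t) * a * (1 - a) * x, where a = sigmoid(w * x)
a = sigmoid(w * x)
grad_analytic = (a - t) * a * (1.0 - a) * x

# Numerical gradient by central differences; the two should agree closely
eps = 1e-6
loss = lambda w_: 0.5 * (sigmoid(w_ * x) - t) ** 2
grad_numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print(grad_analytic, grad_numeric)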

References

www.andreykurenkov.com

Backpropagation is one of those ideas like Bayes Rule that's "whoa, why didn't I think of that"...after you see it. It's just the plain old ordinary chain rule we learned early in calculus. Unfortunately, it's not as obvious or trivial as Bayes Rule (a one-line "proof" from the definition of conditional probability), and that Caused Problems...

See, waaay back in 1969, when the idea of neural networks was just starting, some people pointed out that unless you have multiple layers, you can't solve problems that, well, aren't linearly separable. But with multiple layers, how can you ever figure out what the weights should be, hmm? HMMM? Pfft! Not possible, some people said. Ok, "some people" are Marvin Minsky and Seymour Papert. In 1969, they wrote a book called Perceptrons. I was in high school at the time, and had a habit of prowling the science section of our local library for new books, so I snagged Perceptrons when it first arrived. By about chapter 3, I was thinking..."They don't like this." I closed the book, and took it back.

Apparently Minsky also went 'round telling his highly-placed contacts that neural networks were a dead end, and neural network research shouldn't be funded. (Yes yes, they were probably reacting to the AI hype and over-promising that led to the AI Winter...but they failed to recognize the difference between the intractable search-based methods of traditional AI...and the not-yet-named statistics-based methods of machine learning, of which neural networks were an early example.) And that killed the field for almost 20 years.

But...even before that, assorted folks were busy solving the problem. Folks were working on pieces of it in the early 1960s. Paul Werbos's dissertation detailing and analyzing backpropagation was published in 1974. These folks were available for Minsky and Papert to talk to... It wasn't until the re-re-discovery of backpropagation by Rumelhart, Hinton, Williams, and McClelland in 1986 that neural networks began the journey to acceptance all over again.

The moral of the story is, if you have a Big Name, and lots of Clout, the more cautious you need to be about pooh-pooh-ing something. (I only found out recently about Minsky telling folks not to fund NN work, so I'm upset with them all over again.)

4. Word Embeddings

  • Word Embeddings
    • got it right, but only by guessing; needs review

Project 3: Digit recognition (Part 2)

Project overview

  • digit recognition problem using the MNIST (Modified National Institute of Standards and Technology) database
  • classifying images of handwritten digits (0-9)
  • in Part 2, we classify the MNIST data with neural networks
  • problems 2-6: implement simple neural nets from scratch
  • problems 7-10: redo the from-scratch MNIST classification of the first half, this time with a deep neural network model built in PyTorch

    Using a framework like PyTorch means you don't have to implement all of the details (like in the earlier problem) and can spend more time thinking through your high level architecture.

Problems

1. Introduction
2. Neural Network Basics
  • through problem 6, you implement a simple neural net from scratch

    You will implement the net from scratch (you will probably never do this again, don't worry).

  • the model for this project (a minimal forward-pass sketch follows this list)
    • three layers
      1. Input layer
        • 2 neurons
      2. Hidden layer
        • 3 neurons
      3. Output layer
        • 1 neuron
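
A minimal numpy sketch of a forward pass through that 2-3-1 shape, as promised above. The ReLU hidden activation and the random weights are my own illustration; the project's starter code defines its own parameters and activations:

import numpy as np

rng = np.random.default_rng(0)

# Weights and biases for a 2-3-1 network (randomly initialized here)
W1 = rng.standard_normal((3, 2))   # input (2) -> hidden (3)
b1 = np.zeros((3, 1))
W2 = rng.standard_normal((1, 3))   # hidden (3) -> output (1)
b2 = np.zeros((1, 1))

def forward(x):
    # x is a (2, 1) column vector
    hidden = np.maximum(0, W1 @ x + b1)   # ReLU hidden activations
    return W2 @ hidden + b2               # linear output layer

print(forward(np.array([[1.0], [2.0]])))
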
3. Activation Functions
4. Training the Network
  • implement the method that trains the NN on the data
  • implement forward propagation first; this part is easy
  • then work out backpropagation by hand with the chain rule and implement it; this was the hardest part
  • in backpropagation, optimize each parameter (weight) and bias with SGD
  • vectorize functions with numpy.vectorize() (see the sketch after this list)
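
A minimal sketch of the numpy.vectorize() idea plus a single SGD update, continuing the toy shapes above; the learning rate and the fake gradient are my assumptions, not the project's values:

import numpy as np

# np.vectorize turns a scalar function into one that maps elementwise
relu = np.vectorize(lambda z: max(z, 0.0))
relu_derivative = np.vectorize(lambda z: 1.0 if z > 0 else 0.0)

print(relu(np.array([-1.0, 0.5, 2.0])))   # [0.  0.5 2. ]

# Generic SGD update for one weight matrix W with gradient dW
learning_rate = 0.001
W = np.ones((3, 2))
dW = np.full((3, 2), 0.5)      # pretend this came out of backpropagation
W = W - learning_rate * dW     # one gradient descent step
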
5. Predicting the Test Data
  • train with the method implemented in problem 4, then compute outputs using the optimized parameters
    • the trained parameters are stored as attributes of the NeuralNetwork class, so you just use them to compute the output; that output is the prediction
6. Conceptual Questions
7. Classification for MNIST using deep neural networks
  • redo what problems 2-6 did with a simple from-scratch NN, this time as a deep neural network model in PyTorch
8. Fully-Connected Neural Networks
  • first, run the provided code
  • then tune hyperparameters with a mini grid search and compare accuracies (a sketch follows this list):
    • batch size
    • learning rate
    • momentum
    • activation function: ReLU -> LeakyReLU
    • hidden layer size (number of neurons)
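
A minimal sketch of what such a mini grid search can look like in PyTorch. The model layout, the grid values, and the train_and_score helper are hypothetical placeholders, not the project's code:

import itertools
import torch
import torch.nn as nn

def make_model(hidden_size):
    # Fully connected net for flattened 28x28 MNIST images, with the
    # LeakyReLU variant from the tuning list above
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(784, hidden_size),
        nn.LeakyReLU(),
        nn.Linear(hidden_size, 10),
    )

# train_and_score(model, optimizer, batch_size) is assumed to train for a
# few epochs and return validation accuracy (placeholder, not defined here)
best = None
for batch_size, lr, momentum, hidden in itertools.product(
        [32, 64], [0.1, 0.01], [0.0, 0.9], [10, 128]):
    model = make_model(hidden)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    accuracy = train_and_score(model, optimizer, batch_size)
    if best is None or accuracy > best[0]:
        best = (accuracy, batch_size, lr, momentum, hidden)
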
9. Convolutional Neural Networks
  • implement a CNN in PyTorch (a sketch of this stack appears after this problem's notes)
  • architecture
    • A convolutional layer with 32 filters of size 3×3
    • A ReLU nonlinearity
    • A max pooling layer with size 2×2
    • A convolutional layer with 64 filters of size 3×3
    • A ReLU nonlinearity
    • A max pooling layer with size 2×2
    • A flatten layer
    • A fully connected layer with 128 neurons
    • A dropout layer with drop probability 0.5
    • A fully-connected layer with 10 neurons
  • a dropout layer appears here: during training it randomly zeroes each activation with probability p, so with p=0.5 roughly half of the units are dropped on each pass

  • Without GPU acceleration, you will likely find that this network takes quite a long time to train. For that reason, we don't expect you to actually train this network until convergence. Implementing the layers and verifying that you get approximately 93% training accuracy and 98% validation accuracy after one training epoch (this should take less than 10 minutes) is enough for this project. If you are curious, you can let the model train longer; if implemented correctly, your model should achieve >99% test accuracy after 10 epochs of training. If you have access to a CUDA compatible GPU, you could even try configuring PyTorch to use your GPU.
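
The listed stack translates almost one-to-one into PyTorch layers. A minimal sketch: the 64 * 5 * 5 input size of the first fully connected layer assumes 28x28 single-channel inputs and unpadded convolutions, which is my reading of the spec rather than the project's starter code:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3),    # 28x28 -> 26x26, 32 filters of size 3x3
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),        # 26x26 -> 13x13
    nn.Conv2d(32, 64, kernel_size=3),   # 13x13 -> 11x11, 64 filters of size 3x3
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),        # 11x11 -> 5x5
    nn.Flatten(),                       # 64 * 5 * 5 = 1600 features
    nn.Linear(64 * 5 * 5, 128),         # fully connected, 128 neurons
    nn.Dropout(0.5),                    # drop probability 0.5
    nn.Linear(128, 10),                 # fully connected, 10 neurons
)

# Sanity check with one fake 28x28 grayscale image
print(model(torch.zeros(1, 1, 28, 28)).shape)   # torch.Size([1, 10])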

10. Overlapping, multi-digit MNIST
  • ran out of time and could not finish this one; will come back to it later

References

If you're planning on trying to avoid use of np.vectorize (or the PyTorch equivalent), you may need to have different code in your ReLU functions for ndarray (or tensor, or list). Here's how to test what data type your function got passed: isinstance(x, t), where x is the input and t is a type specifier. isinstance() returns True or False, so it can be used in an if statement. Here are the various type specifiers:

import numpy as np

def relu(x):
    # NumPy ndarray
    if isinstance(x, np.ndarray):
        # Nice vectorized code for NumPy arrays here.
        return np.maximum(x, 0)
    # PyTorch tensor. We may not have PyTorch imported, so try to import it here.
    try:
        import torch
        if isinstance(x, torch.Tensor):
            # Appropriate code for PyTorch tensors here.
            return torch.clamp(x, min=0)
    except ImportError:
        # No PyTorch available, so the input can't have been a PyTorch tensor.
        pass
    # Python list (unlikely we'll get this, but still...)
    if isinstance(x, list):
        return [max(v, 0) for v in x]
    # Finish up with code for ordinary scalars.
    return max(x, 0)

As is pointed out in another thread, the grader for rectified_linear_unit_derivative is requiring that we return ints, even though it is passing us floats. If you find you want to distinguish between scalar int and float types, you can use "int" and "float" as type specifiers. (Python 3 doesn't have a distinction between "int" and "long", so you don't have to test for "long".) But since the grader wants only ints just now, this is moot.

def rectified_linear_unit_derivative(x):
    # Python float
    if isinstance(x, float):
        # Return floats if given floats...or not, because the grader wants ints.
        return 1 if x > 0 else 0
    # Python int
    if isinstance(x, int):
        # Return ints if given ints...
        return 1 if x > 0 else 0