Unsupervised Learning

Week 1 - Clustering

What is clustering?

There is no target label (y); instead, we use clustering algorithms to find interesting structure in the data.

Applications include grouping similar news articles, market segmentation, and analyzing DNA or astronomical data.

K-Means Clustering

Steps:

  1. Randomly initialize K cluster centroids $\mu_1, \mu_2, \ldots, \mu_K$
    • Randomly pick K training examples and use them as the initial centroids.
    • Run the algorithm multiple times to find the best local optimum:
      • Repeat: random initialization -> Step 2 -> compute the cost function (distortion).
      • Pick the set of clusters that gave the lowest cost.
  2. Repeat{
         # Assign points to cluster centroids:
             for i = 1 to m:
                 c^(i) := index (from 1 to K) of cluster centroid closest to x^(i)

         # Move cluster centroids
             for k = 1 to K:
                 mu_k := average (mean) of points assigned to cluster k
     }

     If no points are assigned to a cluster, eliminate it (or re-initialize its centroid). A minimal code sketch of these steps follows below.
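A minimal NumPy sketch of these steps, including the random-restart loop; the function names, iteration counts, and restart count are my own illustration, not the course's code:

```python
import numpy as np

def run_kmeans(X, K, n_iters=100):
    """One run of K-means on data X of shape (m, n). Returns (centroids, assignments)."""
    m = X.shape[0]
    # Step 1: randomly pick K training examples as the initial centroids
    centroids = X[np.random.choice(m, K, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assign each point to the index of its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it
        for k in range(K):
            if np.any(c == k):
                centroids[k] = X[c == k].mean(axis=0)
            # else: the cluster is empty and could be eliminated or re-initialized
    return centroids, c

def best_of_n_inits(X, K, n_init=50):
    """Repeat random initialization and keep the clustering with the lowest distortion."""
    best_cost, best = np.inf, None
    for _ in range(n_init):
        centroids, c = run_kmeans(X, K)
        cost = np.mean(np.linalg.norm(X - centroids[c], axis=1) ** 2)  # distortion J
        if cost < best_cost:
            best_cost, best = cost, (centroids, c)
    return best
```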

Optimization objective

$J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2$, also called the distortion function, where $\mu_{c^{(i)}}$ is the centroid of the cluster that example $x^{(i)}$ is assigned to.
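In NumPy terms (using the same illustrative variable names as the sketch above, where `c` maps each example to its assigned centroid), the distortion can be computed directly:

```python
import numpy as np

def distortion(X, centroids, c):
    """J: mean squared distance between each example and the centroid it is assigned to."""
    return np.mean(np.linalg.norm(X - centroids[c], axis=1) ** 2)
```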

Choosing the number of clusters

Week 1 - Anomaly detection

Evaluate

  1. Fit model $p(x)$ on the training set $x^{(1)}, \ldots, x^{(m)}$

  2. On a cross-validation/test example $x$, predict

     y = 1 if p(x) < ε  (anomaly)
     y = 0 if p(x) >= ε  (normal)
    

    Possible evaluation metrics:

    • True positive, false positive, false negative, true negative
    • Precision/Recall
    • F1-score
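A minimal sketch of this evaluation flow, assuming $p(x)$ is modeled as a product of independent per-feature Gaussians (the course's density-estimation approach); the function and variable names here are my own:

```python
import numpy as np

def fit_gaussian(X_train):
    """Fit p(x) on the (unlabeled) training set: per-feature mean and variance."""
    return X_train.mean(axis=0), X_train.var(axis=0)

def p(X, mu, var):
    """Density of each row of X under a product of independent univariate Gaussians."""
    densities = np.exp(-((X - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    return np.prod(densities, axis=1)

def predict(X, mu, var, epsilon):
    """y = 1 (anomaly) if p(x) < epsilon, y = 0 (normal) otherwise."""
    return (p(X, mu, var) < epsilon).astype(int)

def f1_score(y_true, y_pred):
    """F1 from true/false positives and negatives, useful for picking epsilon."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

One way to choose ε is to evaluate `f1_score` on the labeled cross-validation set over a range of candidate ε values and keep the best one.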

Anomaly detection vs Supervised Learning

| Anomaly detection | Supervised Learning |
| --- | --- |
| Very small number of positive examples (y = 1); 0-20 is common. Large number of negative (y = 0) examples. | Large number of both positive and negative examples. |
| Many different "types" of anomalies; it is hard for any algorithm to learn from the positive examples what the anomalies look like, and future anomalies may look nothing like any of the anomalous examples seen so far. | Enough positive examples for the algorithm to get a sense of what positive examples are like; future positive examples are likely to be similar to the ones in the training set. |
| Examples: fraud detection; manufacturing (finding new, previously unseen defects); monitoring machines in a data center | Examples: email spam classification; manufacturing (finding known, previously seen defects like scratches or bending); weather prediction (sunny/rainy/etc.); disease classification |

When building an anomaly detection system, the choice of features is very important; carefully selecting and tuning features matters even more here than in supervised learning.

Choosing features to use

Week 2 - Collaborative Filtering

Week 3 - Reinforcement Learning

Reinforcement learning is based on defining a reward function and letting the algorithm automatically figure out how to choose good actions.

Return in RL

The return is the sum of rewards the system receives, weighted by the discount factor $\gamma$: Return $= R_1 + \gamma R_2 + \gamma^2 R_3 + \ldots$

Making Decisions

State action value function (Q-function or $Q^*$)

$Q(s, a)$ = the return if you start in state $s$, take action $a$ (once), and then behave optimally after that.

Stochastic Approach

Expected Return = Average($R_1 + \gamma R_2 + \gamma^2 R_3 + \ldots$) $= E[R_1 + \gamma R_2 + \gamma^2 R_3 + \ldots]$

Bellman equation (stochastic environment): $Q(s,a) = R(s) + \gamma\, E[\max\limits_{a'} Q(s',a')]$
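As a concrete check of the return formula, a tiny sketch; the reward sequence and $\gamma$ below are just an example:

```python
def discounted_return(rewards, gamma):
    """Return = R_1 + gamma*R_2 + gamma^2*R_3 + ..."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Rewards 0, 0, 0, 100 with gamma = 0.5 give a return of 0.5**3 * 100 = 12.5
print(discounted_return([0, 0, 0, 100], 0.5))  # 12.5
```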

Continuous state spaces

In practice, the state is often continuous rather than discrete: instead of a small set of discrete states, the state is a vector of continuous values (for example position, velocity, and orientation).

Deep Reinforcement Learning

Use the Bellman equation to create $x \rightarrow y$ training examples for supervised learning:

$x = (s, a)$

$y = R(s) + \gamma\, E[\max\limits_{a'} Q(s',a')]$

Learning Algorithm (Deep Q Network)

Initialize the neural network randomly as a guess of Q(s,a).

Repeat{

    Take actions in the lunar lander. Get (s, a, R(s), s').
    Store the 10,000 most recent (s, a, R(s), s') tuples. (Replay buffer)

    Train the neural network:
        Create training examples using x = (s, a) and y = R(s) + γ·max_a' Q(s', a').
        Train Q_new such that Q_new(s, a) ≈ y.

    Set Q := Q_new.
}
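A hedged sketch of the inner training step, using a simple linear function approximator as a stand-in for the neural network; the environment loop that fills the replay buffer, the learning rate, and γ = 0.995 are illustrative assumptions, not the course's actual code:

```python
import random
from collections import deque
import numpy as np

gamma = 0.995                   # discount factor (assumed value)
buffer = deque(maxlen=10_000)   # replay buffer: the 10,000 most recent (s, a, R(s), s') tuples

def one_hot(a, num_actions):
    v = np.zeros(num_actions)
    v[a] = 1.0
    return v

def q_value(w, s, a, num_actions):
    """Stand-in for Q(s, a): a linear model on the concatenated (s, a) input x."""
    return w @ np.concatenate([s, one_hot(a, num_actions)])

def train_step(w, num_actions, batch_size=1_000, lr=1e-3):
    """One 'train neural network' step: build (x, y) pairs from the buffer and fit Q_new."""
    samples = random.sample(buffer, min(batch_size, len(buffer)))
    w_new = w.copy()
    for s, a, r, s_next in samples:
        x = np.concatenate([s, one_hot(a, num_actions)])   # x = (s, a)
        # y = R(s) + gamma * max_a' Q(s', a'); the expectation is approximated
        # by the single observed next state s'
        y = r + gamma * max(q_value(w, s_next, a2, num_actions) for a2 in range(num_actions))
        w_new += lr * (y - w_new @ x) * x                   # gradient step toward the target
    return w_new                                            # the caller then sets Q := Q_new
```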

Choosing actions while still learning

In some state s, options for choosing an action:

  1. Always pick the action a that maximizes Q(s,a) (purely greedy).
  2. $\epsilon$-greedy policy (here $\epsilon = 0.05$):
    • With probability 0.95, pick the action a that maximizes Q(s,a) (GREEDY).
    • With probability 0.05, pick an action a randomly (EXPLORATION).
    • Usually start $\epsilon$ high and gradually decrease it (e.g., 1 -> 0.01).
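A small sketch of ε-greedy action selection; `q_values_for_s` is assumed to be an array of Q(s, a) for every action in the current state, and the decay schedule is illustrative:

```python
import numpy as np

def epsilon_greedy_action(q_values_for_s, epsilon=0.05):
    """With probability epsilon explore randomly, otherwise pick the greedy action."""
    if np.random.rand() < epsilon:
        return int(np.random.randint(len(q_values_for_s)))   # exploration
    return int(np.argmax(q_values_for_s))                    # greedy / exploitation

# Start epsilon high and decay it toward a small value over training
epsilon = 1.0
for step in range(10_000):
    epsilon = max(0.01, epsilon * 0.9995)
```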

Tuning RL (Mini-batch and soft updates)

Compared to supervised learning, RL can be finicky to tune. Two refinements make training more stable: mini-batch updates (train on a subset of the stored examples at each step rather than the full replay buffer) and soft updates (blend the new parameters gradually into the current ones instead of replacing Q with Q_new outright), as sketched below.
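A sketch of the soft update, with the network parameters represented as a list of NumPy arrays (an assumption for illustration); a small τ such as 0.01 keeps each update gradual:

```python
def soft_update(params, params_new, tau=0.01):
    """Instead of Q := Q_new, blend: W := tau * W_new + (1 - tau) * W for each parameter."""
    return [tau * w_new + (1 - tau) * w for w, w_new in zip(params, params_new)]
```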

Limitation of RL