Advanced Learning Algorithms
01 August 2022 -
2 mins read time
Tags:
Learning
Machine Learning
Artificial Intelligence
Week 3 - Advice on applying ML
Error metrics for skewed datasets
- Generate confusion matrix and look at precision and recall
- Precision = True Positives / Total Predicted Positives
- Recall = True Positives / Total Actual Positives
- Plot precision/recall (or the ROC curve and its AUC) across thresholds to visualize the effect of adjusting the decision threshold
- F1 score (harmonic mean of P and R, which emphasizes the smaller value) measures the trade-off between P and R in one score (see the sketch below the formula):
\(F1 = {1 \over {1 \over 2}({1\over P} + {1\over R})} = 2 {PR \over P + R}\)
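As a quick sanity check, here is a minimal Python sketch computing these metrics from hypothetical confusion-matrix counts (the numbers are made up):

```python
# Minimal sketch: precision, recall and F1 from confusion-matrix counts.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical counts for the rare (positive) class
print(precision_recall_f1(tp=15, fp=5, fn=10))  # (0.75, 0.6, ~0.667)
```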
Week 4 - Decision Tree / Tree Ensembles
Decision tree
Steps:
- Start with all examples at the root node
- Calculate information gain for all possible features, and pick the one with the highest information gain (see the sketch after this list)
- Split the dataset according to the selected feature, and create left and right branches of the tree
- Keep repeating the splitting process until a stopping criterion is met:
  - When a node is 100% one class
  - When splitting a node would result in the tree exceeding a maximum depth
  - When the information gain from additional splits is less than a threshold
  - When the number of examples in a node is below a threshold
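A minimal sketch of the split-selection step, assuming binary (0/1) labels and a single binary feature; the arrays below are made up for illustration:

```python
import numpy as np

def entropy(y):
    """Entropy of a binary-labelled node."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)                      # fraction of positive examples
    if p == 0.0 or p == 1.0:
        return 0.0                      # a pure node has zero entropy
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(y, left_mask):
    """Root entropy minus the weighted-average entropy of the two branches."""
    y_left, y_right = y[left_mask], y[~left_mask]
    w_left = len(y_left) / len(y)
    w_right = 1.0 - w_left
    return entropy(y) - (w_left * entropy(y_left) + w_right * entropy(y_right))

# Hypothetical node: 10 examples, split on a binary feature
y = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 1])
feature = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
print(information_gain(y, feature == 1))  # pick the feature with the highest gain
```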
Decisions
- How to choose what feature to split on at each node?
  - Maximize purity (or minimize impurity): pick the split whose branches are as close to a single class as possible
- When do you stop splitting?
  - When a node is 100% one class
  - When splitting a node would result in the tree exceeding a maximum depth (a defined parameter)
  - When improvements in the purity score are below a threshold
  - When the number of examples in a node is below a threshold
Regression Decision Tree
- Choosing a split is based on the weighted average variance of the targets in each branch. It plays a role very similar to the weighted average entropy used when choosing splits for a classification problem: the result shows how much variance the split reduces (see the sketch below).
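The same idea as a short sketch for regression, assuming continuous targets and a binary split feature (the numbers are made up):

```python
import numpy as np

def variance_reduction(y, left_mask):
    """Variance at the node minus the weighted-average variance of the two branches."""
    y_left, y_right = y[left_mask], y[~left_mask]
    w_left = len(y_left) / len(y)
    w_right = 1.0 - w_left
    weighted_var = w_left * np.var(y_left) + w_right * np.var(y_right)
    return np.var(y) - weighted_var     # how much variance the split removes

# Hypothetical targets (e.g. weights) split on a binary feature
y = np.array([7.2, 8.4, 7.6, 10.2, 9.2, 8.8, 15.0, 18.0, 20.0, 17.0])
feature = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
print(variance_reduction(y, feature == 1))  # pick the split with the largest reduction
```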
Tree Ensemble (Multiple Decision Trees)
- Bagging (Parallel)
  - Create training sets by sampling with replacement (bootstrap samples) from the full training set
  - Train a weak learner on each sample and aggregate their predictions, e.g. by voting or averaging
  - Aims to decrease variance, not bias
  - E.g. Random Forest
- Boosting (Sequential)
  - Focus more on the subset of examples the ensemble is not yet doing well on (increase their probability of being selected)
  - Built-in regularization to prevent overfitting
  - Aims to decrease bias, not variance
  - E.g. XGBoost (see the sketch after this list)
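A short sketch contrasting the two approaches, assuming scikit-learn and the xgboost package are installed; the dataset is synthetic and the hyperparameters are illustrative only:

```python
# Bagging vs. boosting on synthetic tabular data (illustrative hyperparameters).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier   # bagging-style ensemble
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier                     # boosting-style ensemble

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: many trees trained independently on bootstrap samples, predictions aggregated
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Boosting: trees trained sequentially, each focusing on the previous trees' mistakes
xgb = XGBClassifier(n_estimators=200, learning_rate=0.1, random_state=0).fit(X_train, y_train)

print("random forest:", rf.score(X_test, y_test))
print("xgboost:      ", xgb.score(X_test, y_test))
```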
When to use Decision Trees and Tree Ensembles
- Work well on tabular (structured) data
- Not recommended for unstructured data (images, audio, text)
- Fast to train
- Small decision trees may be human-interpretable
Neural Networks
- Work well on all types of data, including unstructured data
- May be slower to train than a decision tree
- Work with transfer learning
- Easier to string together multiple neural networks