02:00: lecture theme: how to build a decision tree
05:10: gap (skip)
05:40: decision tree example
08:09: tabular view of decision tree - use NumPy 2D array
09:00: if a factor value represents a leaf node, the associated SplitVal represents a y_pred value and not a split value
10:30: student question about binary tree efficiency
13:00: algorithm for building decision tree
15:00: decision tree algorithm (JR Quinlan)
16:30: IF base cases, RETURN row of data
17:00: description of recursive algorithm
22:40: student questions
24:50: how to determine the "best" feature
29:10: student questions
• Check which axis your tree rows are being appended along
30:45: ndarray representation of decision tree
33:30: Use highest absolute value of correlation to select optimal 'splitting' factor
• Numpy has a built-in correlation method
• Split value should be the median value
35:30: gap (skip)
36:20: ndarray representation of decision tree (cont.)
39:00: when multiple factors have the same correlation with y, use a deterministic tie-break to select the "best split factor" (e.g. the lowest-indexed factor) rather than a random one -- this makes results more reproducible, whichever split criterion you use (correlation, entropy, Gini impurity, etc.)
43:30: which steps in JR Quinlan decision tree algorithm are the most computationally expensive?
44:30: A: determining the best feature to split on
45:30: Most time-intensive parts of the JR Quinlan Decision Tree algorithm are:
1. Determining the best feature to split on
2. Calculating the median for SplitVal
• Random trees can help with this
46:30: Random Tree Algorithm (A Cutler)
50:30: You can create a random tree by randomly selecting the split feature, OR by building a tree on randomly selected subsets of the training data
52:30: Strengths & weaknesses of Decision Tree learners
• Cost of learning: decision trees are more expensive than parametric models or KNN
• Cost of querying: linear regression is fastest, but decision trees are faster than KNN
• Decision Trees: No need to normalize data
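The outline above (pick the factor with the highest |correlation| with y, split at the median, store the tree as rows of an ndarray) can be sketched roughly like this. The [Factor, SplitVal, LeftOffset, RightOffset] row layout follows the lecture's table, but the function name, leaf marker, and exact base cases are my assumptions, not the lecture's code:

```python
import numpy as np

LEAF = -1  # marker in the Factor column for leaf rows (assumption)

def build_tree(data):
    """Recursively build a decision tree as a 2D ndarray.

    Each row is [Factor, SplitVal, LeftOffset, RightOffset]; for a leaf,
    Factor is LEAF and SplitVal holds the y prediction. `data` has the
    features in all columns except the last, which is y.
    """
    x, y = data[:, :-1], data[:, -1]
    # Base cases: one sample left, or all y values identical -> leaf
    if data.shape[0] == 1 or np.all(y == y[0]):
        return np.array([[LEAF, y.mean(), np.nan, np.nan]])
    # Pick the feature with the highest |correlation| with y;
    # np.argmax breaks ties deterministically (lowest index wins)
    corrs = np.array([abs(np.corrcoef(x[:, i], y)[0, 1])
                      for i in range(x.shape[1])])
    corrs = np.nan_to_num(corrs)  # a constant column yields nan correlation
    best = int(np.argmax(corrs))
    split_val = np.median(x[:, best])
    left_mask = x[:, best] <= split_val
    # If the median fails to separate the data, stop with a leaf
    if left_mask.all() or not left_mask.any():
        return np.array([[LEAF, y.mean(), np.nan, np.nan]])
    left = build_tree(data[left_mask])
    right = build_tree(data[~left_mask])
    # Offsets are relative row indices; append rows along axis 0
    root = np.array([[best, split_val, 1, left.shape[0] + 1]])
    return np.vstack((root, left, right))
```

Note the `np.vstack` at the end: this is the "check which axis your rows are appended along" point -- subtree tables must be stacked as rows (axis 0), not columns.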
DT Learner Algorithm Explanation starts at 14:30
RT Learner Algorithm Explanation starts at 47:00
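The RT learner's speedup comes from replacing the two expensive steps (correlation scan and median) with cheap random choices. A sketch, assuming the common variant where the split value is the mean of the chosen feature at two randomly picked rows -- not necessarily exactly what the lecture codes:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for repeatability

def build_random_tree(data, leaf_size=1):
    """Random-tree sketch: random split feature, split value taken as the
    mean of that feature at two randomly chosen rows. Same ndarray row
    layout as the deterministic tree: [Factor, SplitVal, Left, Right]."""
    x, y = data[:, :-1], data[:, -1]
    if data.shape[0] <= leaf_size or np.all(y == y[0]):
        return np.array([[-1, y.mean(), np.nan, np.nan]])
    for _ in range(10):  # retry if a degenerate (all-one-side) split is drawn
        feat = int(rng.integers(x.shape[1]))
        i, j = rng.integers(data.shape[0], size=2)
        split_val = (x[i, feat] + x[j, feat]) / 2.0
        left_mask = x[:, feat] <= split_val
        if left_mask.any() and not left_mask.all():
            break
    else:
        return np.array([[-1, y.mean(), np.nan, np.nan]])
    left = build_random_tree(data[left_mask], leaf_size)
    right = build_random_tree(data[~left_mask], leaf_size)
    root = np.array([[feat, split_val, 1, left.shape[0] + 1]])
    return np.vstack((root, left, right))
```

No correlations, no medians -- each split is O(n) instead of O(n·features), which is the speedup mentioned at 45:30.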
Algorithm explained starting at 14:27
My man!
At around 11:50, there was a question on why binary trees are more efficient and why not use a higher branching factor.
In fact, higher-branching trees, like 2-3 trees (which red-black trees represent in binary form) and even B-trees, are built from the same basic BST structure. For example, to answer 'yes, no, or maybe', the left branch could just be 'yes' and the right branch could delegate the answer to a sub-tree (which could itself be a leaf node).
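To make the 'yes/no/maybe' idea concrete, here's a toy sketch (thresholds entirely made up) of a 3-way answer built from nested binary decisions:

```python
def answer(x):
    # 3-way outcome from binary decisions: the left branch answers
    # 'yes' directly; the right branch delegates to a sub-tree that
    # distinguishes 'maybe' from 'no'.
    if x < 10:       # root binary split
        return "yes"
    if x < 20:       # right sub-tree's own binary split
        return "maybe"
    return "no"
```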
These requirements require a law degree to understand...
🤣
FYI, Factor 2 was "volatile acidity", negatively correlated with rating at around -0.391 (almost as strongly correlated as alcohol content). Volatile acidity is basically the presence of acidic aromas in wine, and can smell like vinegar or even nail polish remover -- which is why the correlation is negative: more is worse. Factor 10 was the amount of sulphates in the wine, which had a positive correlation with score of around 0.251. (I actually think they meant "sulfites", or possibly it's just a translation issue.) Sulfites occur in wine both naturally and via additives; their purpose is to prevent spoilage and oxidation during fermentation, which would otherwise weaken (or outright ruin) the wine.
You're probably going to use random values when using Cutler's model.
When we split based on the median (half the data is greater than median and the other half is less), do we assume that the column is always sorted?
The answer is yes, the column is sorted. I should have continued watching.
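Worth noting: if you compute the split with NumPy, you don't have to pre-sort the column yourself -- np.median handles unsorted input internally. A quick sketch:

```python
import numpy as np

col = np.array([11.8, 10.0, 10.9, 10.5, 10.7])  # deliberately unsorted
split_val = np.median(col)       # middle value of the sorted column: 10.7
left = col[col <= split_val]     # rows routed to the left subtree
right = col[col > split_val]     # rows routed to the right subtree
```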
42:44 The answer is WRONG!
The value 10.9 should be on the right-right subtree.
@@zhuzhuqing4862 Whaddya mean? 10.9 is the median of 10.0 and 11.8 on the right-left subtree. 10.7 is the median of 10.5 and 10.9 on the right-right subtree. Looks fine to me.
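For anyone following along, the two disputed split values (taken from the reply above) are quick to verify with NumPy:

```python
import numpy as np

# np.median of two values is their midpoint
right_left = np.median([10.0, 11.8])   # ~10.9, the right-left split value
right_right = np.median([10.5, 10.9])  # ~10.7, the right-right split value
```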
Shouldn't the top right value be 8 instead of 7?
@@michaelgentry7534 yes.