Great video, explained in a very intuitive way!
Glad you enjoyed the video!
How are you getting the numbers -0.05, 0.10, and so on?
Thanks for your question! Here's what Jericho had to say about how he got those numbers:
Isolation forests make a large number of randomized attempts to separate the data and count how many cuts each attempt needs to isolate each data point. The scores are then calculated from that collection of counts for each record: points that take fewer cuts on average to isolate are scored as more anomalous.
Since this is not straightforward to show by hand, I used the scikit-learn Python package and the wine dataset to calculate the scores, limiting the wine dataset to flavonoids and malic acid features. Then I took some example points from the outer edges and one from the middle of the real results and illustrated them as closely as possible in the whiteboard example.
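For anyone who wants to try this, here is a minimal sketch of that kind of calculation. It is not Jericho's exact code: the use of `decision_function` for the scores and the `n_estimators` and `random_state` values are assumptions made for illustration, so the numbers will not necessarily match the whiteboard. Note that scikit-learn's built-in copy of the wine dataset spells the feature names `flavanoids` and `malic_acid`.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import IsolationForest

# Load the wine data and keep only the two features mentioned above.
wine = load_wine()
names = list(wine.feature_names)
cols = [names.index("flavanoids"), names.index("malic_acid")]
X = wine.data[:, cols]

# Assumed settings for illustration; the settings used for the video are not stated.
forest = IsolationForest(n_estimators=100, random_state=0)
forest.fit(X)

# One score per wine sample. With the default contamination="auto",
# scores below 0 are treated as outliers and scores above 0 as inliers.
scores = forest.decision_function(X)
labels = forest.predict(X)  # -1 for outliers, 1 for inliers

# Show the five most anomalous samples (lowest scores first).
for i in scores.argsort()[:5]:
    print(f"flavanoids={X[i, 0]:.2f}  malic_acid={X[i, 1]:.2f}  score={scores[i]:.3f}")
```

Scores from `decision_function` are small values centered near zero, which would be consistent with numbers like -0.05 and 0.10: points on the outer edges of the data tend to score lower, and points in the middle tend to score higher.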
---
Here are the links to the scikit-learn documentation and the wine dataset:
scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html
archive.ics.uci.edu/dataset/109/wine
Thank you so much, keep going!