Isolation Forests: Identify Outliers in Data

  • Published on Mar 5, 2023
  • In this video, senior data scientist Jericho McLeod walks us through an anomaly detection method called Isolation Forests. He demonstrates how to use the technique to quickly and accurately identify outliers by isolating data points. This method has many advantages, including its speed, ability to generalize, and low memory usage.
    Transcript: www.elderresearch.com/resourc...
  • Science & Technology

Comments • 5

  • @elhairachmohamedlimam9640 1 year ago

    Thank you a lot go ahead!

  • @Bentley642 3 months ago

    Great video, explained in a very intuitive way!

    • @elderresearch 3 months ago

      Glad you enjoyed the video!

  • @muslimahmukbang417 3 months ago

    How are you getting the numbers -0.05, 0.10 and so on?

    • @elderresearch 3 months ago

      Thanks for your question! Here's what Jericho had to say about how he got those numbers:
      Isolation forests use a large number of randomized attempts to separate the data and count how many cuts it takes in each attempt to separate each datapoint. From that collection of counts for each record, scores are calculated.
      Since this is not straightforward to show by hand, I used the scikit-learn Python package and the wine dataset to calculate the scores, limiting the wine dataset to flavonoids and malic acid features. Then I took some example points from the outer edges and one from the middle of the real results and illustrated them as closely as possible in the whiteboard example.
      ---
      Here are the links to the scikit-learn and Python resources:
      scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html
      archive.ics.uci.edu/dataset/109/wine
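
For readers who want to produce scores like the ones discussed in the reply above, here is a minimal sketch of the described setup. It assumes scikit-learn's bundled copy of the wine dataset (where the column is spelled `flavanoids`), default `IsolationForest` settings, and an arbitrary `random_state`; the exact parameters used for the video are not stated, so the numbers will not necessarily match the whiteboard values.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import IsolationForest

# Load the wine dataset as a DataFrame and keep only the two features
# mentioned in the reply. scikit-learn spells the column "flavanoids".
wine = load_wine(as_frame=True)
X = wine.data[["flavanoids", "malic_acid"]]

# Assumption: default settings. random_state is fixed only so that
# reruns give the same scores; it is not taken from the video.
forest = IsolationForest(n_estimators=100, random_state=0)
forest.fit(X)

# decision_function gives one score per row: negative values are
# treated as outliers, non-negative values as inliers.
scores = forest.decision_function(X)
labels = forest.predict(X)  # -1 for outliers, 1 for inliers

# Show the most anomalous and the most typical points.
ranked = X.assign(score=scores, label=labels).sort_values("score")
print(ranked.head(3))
print(ranked.tail(3))
```

Points at the outer edges of the scatter plot should end up near the top of `ranked` with low or negative scores, while points in the dense middle should end up near the bottom with positive scores.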