Let’s Write a Decision Tree Classifier from Scratch - Machine Learning Recipes #8

  • Published on 22 Jan 2025

Comments • 233

  • @nbamj88
    @nbamj88 7 years ago +285

    In nearly 10 min, he explained the topic extremely well
    Amazing job.

  • @FacadeMan
    @FacadeMan 7 years ago +91

    Thanks a lot, Josh. To a very basic beginner, every sentence you say is a gem. It took me half an hour to get the full meaning of the first 4 mins of the video, as I was taking notes and repeating it to myself to grasp everything that was being said.
    The reason I wanted to showcase my slow pace is to say how important and understandable I found every sentence.
    And, it wasn't boring at all.
    Great job, and please, keep em coming.

  • @dabmab2624
    @dabmab2624 4 years ago +224

    Why can't all professors explain things like this? My professor: "Here is the idea for decision tree, now code it"

    • @exoticme4760
      @exoticme4760 4 years ago +2

      agreed!

    • @pauls60r
      @pauls60r 4 years ago +12

      I realized years after graduation that many professors either have received no training in teaching or have little interest in teaching, undergrads in particular. I can't say I've learned more on YouTube than I did in college, but I have a whole lot of "OOOOOH, that's what my professor was talking about!" moments when watching videos like this. This stuff would've altered my life 20 years ago.

    • @carol8099
      @carol8099 4 years ago

      Same! I really wish they could dig more into the coding part, but they either don't cover it or don't teach coding well.

    • @avijitmandal9124
      @avijitmandal9124 4 years ago

      hey can someone give the link for doing pruning

    • @Skyfox94
      @Skyfox94 4 years ago +2

      Whilst I definitely agree, I have to say that, in order to understand algorithms like this one, you'll have to just work through them. No matter how many interesting and well thought out videos you watch, it'll always be most effective if you afterwards try and build it yourself. The fact that you're watching this in your free time shows that you are interested in the topic. That's also worth a lot. Sometimes you'll only be able to appreciate what professors taught you, after you get out of college/uni and realize how useful it would have been.

  • @donking6996
    @donking6996 4 years ago +5

    I am crying tears of joy! How can you articulate such complex topics so clearly!

  • @cbrtdgh4210
    @cbrtdgh4210 6 years ago +5

    This is the best single resource on decision trees that I've found, and it's a topic that isn't covered enough considering that random forests are a very powerful and easy tool to implement. If only they released more tutorials!

  • @georgevjose
    @georgevjose 7 years ago +55

    Finally after a year. Pls continue this course.

  • @riadhsaid3548
    @riadhsaid3548 6 years ago +14

    Even though it took me more than 30 minutes to complete and understand the video, I cannot tell you how amazing this explanation is!
    This is how we calculate the impurity!
    PS: G(k) = Σ P(i) * (1 - P(i))
    i = (Apple, Grape,Lemon)
    2/5 * (1- 2/5) + 2/5 * (1- 2/5) + 1/5 *(1-1/5)=
    0.4 * (0.6) + 0.4 * (0.6) + 0.2 * (0.8)=
    0.24 + 0.24 + 0.16 = 0.64

    • @ksenyaisavnina
      @ksenyaisavnina 4 years ago

      or 1 - (2/5)^2 - (2/5)^2 - (1/5)^2

    • @vardhanshah8843
      @vardhanshah8843 4 years ago

      Thank you very much for this explanation. I came to the comment section to ask this question, but you answered it very nicely.
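
The arithmetic in this thread can be checked with a few lines of Python. This is a minimal sketch, not the code from the video: the function name `gini` and the label list are assumptions chosen to match the 2/5, 2/5, 1/5 proportions quoted above.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared label probabilities."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical labels matching the proportions in the comment above.
labels = ["Apple", "Apple", "Grape", "Grape", "Lemon"]
print(round(gini(labels), 2))  # 0.64
```

A list with a single label gives an impurity of 0, which is why a pure node needs no further questions.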

  • @hbunyamin
    @hbunyamin 5 years ago +13

    I already knew the concept; however, when I have to translate the concept into code ... I find it quite difficult, and this video explains that smoothly.
    Thank you so much for the explanation!

  • @sundayagu5755
    @sundayagu5755 4 years ago +1

    As a beginner, this work has given me hope to pursue a career in ML. I have read and understood the concepts of Decision Tree. But the code was a mountain, which has now been levelled. Josh, thank you my brother and may God continue to increase you 🙏.

  • @shreyanshvalentino
    @shreyanshvalentino 7 years ago +73

    a year later, finally!

  • @WilloftheWinds
    @WilloftheWinds 7 years ago +8

    Welcome back Josh, thought we would never get another awesome tutorial, thanks for your good work.

  • @TomHarrisonJr
    @TomHarrisonJr 5 years ago +2

    One of the clearest and most accessible presentations I have seen. Well done! (and thanks!)

  • @AyushGupta-kp9xf
    @AyushGupta-kp9xf 4 years ago

    So much value in just 10 mins, this is Gold

  • @JulitaOtusek
    @JulitaOtusek 6 years ago +32

    I think you might be confusing Information Gain and Gini Index. Information gain is the reduction of entropy, not the reduction of Gini impurity. I almost made a mistake in my engineering paper because of this video, but luckily I noticed a different definition of information gain in a different source. Maybe it's just a matter of naming, but it can mislead people who are new to this subject :/

    • @liuqinzhe508
      @liuqinzhe508 2 years ago +3

      Yes. Information gain and Gini index are not really related to each other when we generate a decision tree. They are two different approaches. But overall still a wonderful video.

    • @leonelp9593
      @leonelp9593 2 years ago

      thanks for clarifying this!

  • @mindset873
    @mindset873 4 years ago

    I've never seen any other channels like this. So deep and perfect.

  • @BestPromptHub
    @BestPromptHub 7 years ago

    You have no idea how your videos helped me out on my journey on Machine Learning. thanks a lot Josh you are awesome.

  • @gautambakliwal826
    @gautambakliwal826 7 years ago +1

    You have saved me weeks of work. So short yet so deep. Guys, first try to understand the code, then watch the video.

  • @anupam1
    @anupam1 7 years ago +3

    Thanks, was really looking for this series...nice to see you back

  • @BlueyMcPhluey
    @BlueyMcPhluey 7 years ago +1

    loving this series, glad it's back

  • @falmanna
    @falmanna 7 years ago

    Please keep this series going.
    It's awesome!

  • @teosurch
    @teosurch 2 months ago

    Incredibly clear explanation. Thank you!

  • @hyperealisticglass
    @hyperealisticglass 6 years ago +7

    This single 9-minute video does a way better job than what my ML teacher did for 3 hours.

    • @marklybeer9038
      @marklybeer9038 3 years ago

      I know, right? I had the same experience with an instructor. . . it was a horrible memory. Thanks for the video!

  • @BreakPhreak
    @BreakPhreak 7 years ago

    Started to watch the series 2 days ago, you are explaining SO well. Many thanks!
    More videos on additional types of problems we can solve with Machine Learning would be very helpful. Few ideas: traveling salesman problem, generating photos while emulating analog artefacts or simple ranking of new dishes I would like to try based on my restaurants' order history. Even answering with the relevant links/terminology would be fantastic.
    Also, would be great to know what problems are still hard to solve or should not be solved via Machine Learning :)

  • @leoyuanluo
    @leoyuanluo 4 years ago

    best video about decision tree thus far

  • @tymothylim6550
    @tymothylim6550 3 years ago

    Thank you very much for this video! I learnt a lot about how to understand Gini impurity and how it is used to pick the best questions to split the data!

  • @ryanp9441
    @ryanp9441 2 years ago

    so INSTRUCTIVE. thank you so much for your clear & precise explanation

  • @gautamgadipudi8213
    @gautamgadipudi8213 4 years ago

    Thank you Josh! This is my first encounter with machine learning and you made it very interesting.

  • @andrewbeatty5912
    @andrewbeatty5912 7 years ago +25

    Brilliant explanation !

  • @aryamanful
    @aryamanful 6 years ago

    I don't generally comment on videos but this video has so much clarity something had to be said

  • @Sanchellios
    @Sanchellios 7 years ago

    OH MYYYYYYYYY!!!! You're back! I'm SOSOOOOOOSOSOSOSOSOSOOO happy!

  • @msctube45
    @msctube45 4 years ago

    Thank you Josh for preparing and explaining this presentation as well as the software to help the understanding of the topics. Great job!

  • @Dedsman
    @Dedsman 7 years ago +3

    Why is impurity calculated one way at 5:33 and differently in the code? (1-(times the # of possible labels) vs 1-(# of possible labels)**2)?

    • @yizhang8106
      @yizhang8106 7 years ago

      same question..

    • @ThePujjwal
      @ThePujjwal 7 years ago

      The wiki explains this one line derivation
      en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity
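
The two ways of writing the impurity describe the same quantity. A short derivation, writing $p_i$ for the fraction of rows carrying label $i$ (a symbol introduced here for illustration, not taken from the video):

```latex
\begin{aligned}
G &= \sum_i p_i \,(1 - p_i) \\
  &= \sum_i p_i \;-\; \sum_i p_i^2 \\
  &= 1 - \sum_i p_i^2 \qquad \text{since } \sum_i p_i = 1 .
\end{aligned}
```

With $p = (2/5,\ 2/5,\ 1/5)$, both the first and the last line evaluate to $0.64$.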

  • @Conk-bepis
    @Conk-bepis 5 years ago +2

    Please cover the ID3 algorithm, explanation for CART was great!

  • @alehandr0s
    @alehandr0s 5 years ago

    In the most simple and comprehensive way. Great job!

  • @sajidbinmahamud2414
    @sajidbinmahamud2414 7 years ago +12

    Long time!
    i've been waiting for so long

  • @huuhieupham9059
    @huuhieupham9059 6 years ago +1

    Thanks for your sharing. You made it easy to understand for everybody

  • @AbdulRahman-jl2hv
    @AbdulRahman-jl2hv 5 years ago

    thank you for such a simple yet comprehensive explanation.

  • @browneealex288
    @browneealex288 4 years ago

    At 8:41 he says, "Now the previous call returns and this node becomes a decision node." What does that mean? How is it possible to return to the root node (false branch, upper line) after executing the final return of the function? Please share your thoughts; it will help me a lot.
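
One way to see what "the previous call returns" means is to trace a recursive builder. The sketch below is a simplified stand-in, not the video's code: this `build_tree` just halves a list of labels until each part is pure, and the `trace` list is a hypothetical helper that records when each call finishes.

```python
def build_tree(labels, trace, name="root"):
    """Recursively split `labels` in half until every part holds one label."""
    if len(set(labels)) <= 1:
        trace.append(f"{name}: leaf")
        return {"leaf": labels}
    mid = len(labels) // 2
    # Each recursive call runs to completion, then control comes back here.
    true_branch = build_tree(labels[:mid], trace, name + ".true")
    false_branch = build_tree(labels[mid:], trace, name + ".false")
    # Only now, with both subtrees built, does this call return its own node.
    trace.append(f"{name}: decision node")
    return {"true": true_branch, "false": false_branch}


trace = []
tree = build_tree(["Apple", "Apple", "Grape", "Lemon"], trace)
print("\n".join(trace))
```

The root appears last in the trace because its call stays paused on the call stack while its children are built. A `return` inside a leaf call ends only that call and resumes the parent; it does not end the whole program.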

  • @fathimadji8570
    @fathimadji8570 4 years ago +2

    Excuse me, I am still not clear about how the value of 0.64 comes out, can you explain a little more?

  • @rohitgavirni3400
    @rohitgavirni3400 4 years ago

    The script is tightly edited. Much appreciated.

  • @Po-YuJuan-g9k
    @Po-YuJuan-g9k 2 years ago +1

    Sooo dooope !!!!
    Helpful 🔥🔥🔥

  • @lenaara4569
    @lenaara4569 7 years ago

    You explained it so well. I have been struggling to get it for 2 days. Great job!!

  • @johnstephen399
    @johnstephen399 7 years ago

    This was awesome. Please continue this series.

  • @dinasamir2778
    @dinasamir2778 4 years ago

    It is a great course. I hope you continue and make videos for all machine learning algorithms. Thanks a lot.

  • @rodrik1
    @rodrik1 6 years ago

    best video on decision trees! super clear explanation

  • @mingzhu8093
    @mingzhu8093 6 years ago +1

    Question about calculating impurity. If we do probability, we first draw data which give us probability of 0.2 then we draw label which give us another 0.2. Shouldn't the impurity be 1 - 0.2*0.2=0.96?

  • @j0kersama669
    @j0kersama669 4 years ago +1

    6:22 Impurity = 0.62? How? What is the formula?
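
The 0.62 can be reproduced by hand under one assumption: that the false branch of "Is the color green?" holds four rows labelled one Apple, two Grapes and one Lemon. That split is inferred from the 2/5, 2/5, 1/5 label proportions and the 4/5 weight quoted elsewhere in these comments; it is not confirmed against the video's code.

```latex
G_{\text{false}}
  = 1 - \left(\tfrac{1}{4}\right)^{2} - \left(\tfrac{2}{4}\right)^{2} - \left(\tfrac{1}{4}\right)^{2}
  = 1 - 0.0625 - 0.25 - 0.0625
  = 0.625
```

Under that assumption the exact value is 0.625, which would appear as 0.62 when shown to two decimals without rounding up.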

  • @gorudonu
    @gorudonu 7 years ago +2

    Was waiting for the next episode! Thank you!

  • @stefanop.6097
    @stefanop.6097 7 years ago +1

    Please continue your good work! We love you!

  • @aseperate
    @aseperate 1 year ago

    The Gini impurity function in the code does not output the same responses listed in the video. It’s quite confusing.

  • @jakobmethfessel6226
    @jakobmethfessel6226 5 years ago +1

    I thought CART determined splits solely on gini index and that ID3 uses the average impurity to produce information gain.

  • @saimmehmood6936
    @saimmehmood6936 7 years ago +1

    Would be glad to see English subtitles added to this episode as well.

    • @hamza-325
      @hamza-325 7 years ago

      His english is very clear for me

  • @csorex2376
    @csorex2376 5 years ago +2

    Can you cover Random Forest and SVM too

  • @debanjandhar6395
    @debanjandhar6395 6 years ago

    Awesome video, helped me a lot.... Was struggling to understand this exact stuff.....Looking forward to the continuing courses.

  • @njagimwaniki4321
    @njagimwaniki4321 6 years ago

    How come at 6:20 he calls it average but doesn't divide it by 2? Also the same thing in a stack overflow question it seems to be called entropy after. Is this correct?
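
The "average" in question is a weighted average: each branch's impurity is weighted by the share of rows it receives, so nothing is divided by 2. The sketch below uses Gini impurity rather than entropy. The names `gini` and `info_gain` and the label lists are assumptions for illustration, with the split chosen to match the 1/5 and 4/5 weights mentioned in these comments.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def info_gain(true_labels, false_labels, parent_impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(true_labels) + len(false_labels)
    p = len(true_labels) / n
    weighted = p * gini(true_labels) + (1 - p) * gini(false_labels)
    return parent_impurity - weighted


# Assumed split for "Is the color green?": one row true, four rows false.
true_labels = ["Apple"]
false_labels = ["Apple", "Grape", "Grape", "Lemon"]
parent = gini(true_labels + false_labels)
print(round(info_gain(true_labels, false_labels, parent), 2))  # 0.14
```

Here the weighted child impurity is 1/5 * 0 + 4/5 * 0.625 = 0.5, and 0.64 - 0.5 gives the 0.14 gain.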

  • @dunstantough5134
    @dunstantough5134 2 years ago

    This video has saved my life 😆

  • @jaydevparmar9876
    @jaydevparmar9876 7 years ago

    great to see you back

  • @guitarheroprince123
    @guitarheroprince123 7 years ago

    Gosh, I remember when this series first started. I knew nothing about AI or machine learning and now I'm like full on neural nets and TensorFlow. Gotta admit, since I don't have formal education in ML, I don't understand classical models as much as I understand neural nets.

  • @slr3123
    @slr3123 2 years ago

    I understood it as "when Gini Impurity of parent node is zero, Information Gain with child nodes is also zero. So we don't have to ask more question to classify." Is it right?

  • @aryamanful
    @aryamanful 6 years ago

    I have a follow up question. How did we come up with the questions. As in..how did we know we would like to ask if the diameter is > 3, why not ask if diameter > 2?

  • @alirezagh1456
    @alirezagh1456 9 months ago

    One of the best courses I have ever seen

  • @mrvzhao
    @mrvzhao 7 years ago +3

    At first glance this almost looks like Huffman coding. Thanks for the great vid BTW!

  • @congliulyc
    @congliulyc 6 years ago

    best and most helpful tutorial ever seen! Thanks!

  • @IvanSedov-i7f
    @IvanSedov-i7f 4 years ago

    I like your video, man. Its real simple and cool.

  • @senyotsedze3388
    @senyotsedze3388 1 year ago

    you are awesome, man! but why is it that, the second question on if the color is yellow? you separated only apple when two grapes are red. Or is it because they are already taken care of at the first false split of the node?

  • @andreachristelle5359
    @andreachristelle5359 5 years ago

    Clear with good English and Python explanations. So nice to find both together! Thank you!

  • @adarshs1636
    @adarshs1636 7 years ago

    I have a doubt, At 6:20, how the impurity becomes 0.64 ? also the impurity for false condition giving 0.62. Please help me.

    • @ThePujjwal
      @ThePujjwal 7 years ago

      HI Adarsh, here is the definition of impurity as per wikipedia en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity
      "Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset."
      The last part from above statement "labeled according to the distribution of labels in the subset." is important part of definition.
      So the explanation for 0.64 is as follows :
      => (2/5) * (1 - 2/5) + (2/5) * (1 - 2/5) + (1/5) * (1 - 1/5)
      => 16/25 = 0.64
      The below formula can also be applied (for proof refer the above wiki link)
      => 1 - ( (2/5)^2 + (2/5)^2 + (1/5)^2 ) = 0.64
      Similar calculation can be done for 0.62. Do let me know if you need any other help on these calculations or how this formula came, i think in video, gini impurity is not explained too well (that's what i felt)

  • @Xiaoniana
    @Xiaoniana 5 years ago

    Thanks, it was very informative. It took me hours to understand what is meant. Keep going

  • @allthingsmmm
    @allthingsmmm 5 years ago

    Could you do an example in which the output triggers a method that changes itself based on success or failure? An easier example: iterations increase or decrease based on probability, or left, right, up, down memorizing a maze pattern?

  • @moeinhasani8718
    @moeinhasani8718 6 years ago

    Very useful. This is the best tutorial out on the web

  • @adampaxton5214
    @adampaxton5214 3 years ago

    Great video and such clear code to accompany it! I learned a lot :)

  • @guccilover2009
    @guccilover2009 6 years ago

    amazing video!!! Thank you so much for the great lecture and showing the python code to make us understand the algorithm better!

  • @mohammadbayat1635
    @mohammadbayat1635 1 year ago

    Why Impurity is 0.62 after partitioning on "Is color green" on the left subtree?

  • @erikslatterv
    @erikslatterv 7 years ago +1

    You’re back!!!

  • @panlis6243
    @panlis6243 6 years ago

    I don't get one thing here. How do we determine the number for the question. Like I understand that we try out different features to see which gives us the most info but how do we choose the number and condition for it?
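
A common answer, sketched here as an assumption rather than as the video's exact code: the candidate questions are generated from the training data itself. Every distinct value seen in every feature column becomes one question, each is scored by information gain, and the best one is kept. A threshold such as "diameter >= 2" is never tried when 2 does not occur in the data, because it would partition the training rows exactly like a neighbouring observed value. The row layout `[color, diameter, label]` and all function names below are hypothetical.

```python
from collections import Counter


def gini(rows):
    total = len(rows)
    counts = Counter(row[-1] for row in rows)  # label is the last column
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def matches(value, threshold):
    """Numeric features are tested with >=, everything else with ==."""
    if isinstance(threshold, (int, float)):
        return value >= threshold
    return value == threshold


def find_best_split(rows):
    """Try every (column, value) pair seen in the data; keep the best gain."""
    best_gain, best_question = 0.0, None
    parent = gini(rows)
    for col in range(len(rows[0]) - 1):
        for value in {row[col] for row in rows}:
            true_rows = [r for r in rows if matches(r[col], value)]
            false_rows = [r for r in rows if not matches(r[col], value)]
            if not true_rows or not false_rows:
                continue  # this question does not divide the data
            p = len(true_rows) / len(rows)
            gain = parent - p * gini(true_rows) - (1 - p) * gini(false_rows)
            if gain > best_gain:
                best_gain, best_question = gain, (col, value)
    return best_gain, best_question


# Assumed toy rows: [color, diameter, label].
rows = [
    ["Green", 3, "Apple"],
    ["Yellow", 3, "Apple"],
    ["Red", 1, "Grape"],
    ["Red", 1, "Grape"],
    ["Yellow", 3, "Lemon"],
]
gain, question = find_best_split(rows)
print(round(gain, 2), question)
```

On these rows, "color == Red" and "diameter >= 3" produce the same partition and therefore the same gain (about 0.37), so which of the two is reported depends on iteration order.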

  • @omarsherif88
    @omarsherif88 2 years ago

    Awesome tutorial, many thanks!

  • @tooniatoonia2830
    @tooniatoonia2830 3 years ago

    I built a tree from scratch but I am stuck making a useful plot like is obtainable in sklearn. Any help?

  • @Julia-zi9cl
    @Julia-zi9cl 6 years ago

    Does anyone know how they created those flowcharts with tables at 1:24~ 1:27?

  • @bhuvanagrawal1323
    @bhuvanagrawal1323 5 years ago

    Could you make a similar video on fuzzy decision tree classifiers or share a good source for studying and implementing them?

  • @uditarpit
    @uditarpit 5 years ago

    It is easy to find the best split if the data is categorical. How does the split happen in a time-optimized way if the variable is continuous, unlike color or just 2 values of diameter? Should I just run through the min to max values? Can the median be used here? Please suggest!!
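
For a continuous feature, a widely used approach (stated here as general practice, not as something shown in the video) is to sort the distinct observed values and test only the midpoints between neighbours. That gives at most n - 1 candidate thresholds for n distinct values, so there is no need to sweep every number from min to max. Using only the median would test a single threshold and could miss a better one. The helper name below is hypothetical.

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct sorted values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


print(candidate_thresholds([3.0, 1.0, 3.0, 2.5, 1.0]))  # [1.75, 2.75]
```

Each returned threshold would then be scored by information gain in the same way as a categorical question.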

  • @HarpreetKaur-qq8rx
    @HarpreetKaur-qq8rx 4 years ago

    Why is the impurity at the decision node "color=green" equal to 0.62

  • @muslimbekabduganiev7483
    @muslimbekabduganiev7483 4 years ago +1

    You are creating a question with only one value, what if I want to have a question like "Is it GREEN OR YELLOW?". So, basically, I will have to test all combinations of values of size 2 to find the best info_gain for a particular attribute. Furthermore, we could test all possible sizes of a question. Would that give a better result or is it better to use only one value of the attribute to build the question?

    • @muslimbekabduganiev7483
      @muslimbekabduganiev7483 4 years ago

      On top of that, why do we use binary partitioning? Can't we use the same attribute to ask a new question on the false rows, but excluding attribute values used in the true rows?

  • @ritikvimal4915
    @ritikvimal4915 4 years ago

    well explained in such a short time

  • @adamtalent3559
    @adamtalent3559 5 years ago

    Thanks for your lovely lecture. How do I categorize more than 2 prediction classes at the same time?

  • @KamEt-69
    @KamEt-69 4 years ago

    How come, in the calculation of the Gini impurity, we subtract the square of the probability of each label from the impurity?

  • @mcab2222
    @mcab2222 6 years ago

    perfect video on the implementation and the topic

  • @supriyakarmakar1111
    @supriyakarmakar1111 6 years ago

    I got lots of ideas, thanks sir. But my question to you is: if the data set is too large, what should I do?

  • @sarrakharbach
    @sarrakharbach 6 years ago

    That was suuuuper amazing!! Thanks for the video!

  • @RajChauhan-hd9hu
    @RajChauhan-hd9hu 6 years ago

    If the training_data in the code you showed is very large then how to make necessary changes to get the same output?

  • @سميرشيخ-ب1س
    @سميرشيخ-ب1س 7 years ago +1

    After such a long time!

  • @बिहारीभायजी
    @बिहारीभायजी 2 years ago

    Amazing tutorial but confused too..
    6:22 Here it is not clear , information gain for what?
    7:45 Here we are finding IG corresponding to each question at each node

  • @MW2ONLINEGAMER100
    @MW2ONLINEGAMER100 6 years ago

    Thank you so much, beautifully written code too.

  • @kaziranga_national_park
    @kaziranga_national_park 1 year ago

    Hello sir, it is possible to classify animal-trapped camera images and segregate them into folders using an automatic process. This can be done using machine learning and computer vision techniques. Please make a video. I work in the forest department. Many Photographs capture a maximum of 18 lakhs. One by one segregation of having a problem. Please help us

  • @doy2001
    @doy2001 6 years ago

    Impeccable explanation!

  • @venkateshkoka8508
    @venkateshkoka8508 7 years ago

    do you have the code/video for changing the decisiontreeclassifier to decisionTreeRegressor??

  • @sampadathorat5885
    @sampadathorat5885 4 years ago

    I m using ua algo....but getting queries...how do I connect u??

  • @sergior.m.5694
    @sergior.m.5694 6 years ago

    Best explanation ever, thank you sir

  • @houjunliu5978
    @houjunliu5978 7 years ago

    Yaaaay! You're back!

  • @ericcartman106
    @ericcartman106 2 years ago

    Care to explain where does 4/5 or 1/5 come from? There are 11 training examples, and the left only has 8, the number does not add up.

    • @necraz__6446
      @necraz__6446 1 year ago

      Diameter is a parameter in the decision tree, not a count of apples or grapes. Each row is a unique apple, grape, or lemon (that's why they are in a column called Label; this is how the tree learns based on the previous two parameters, color and diameter). The average impurity is calculated from the count of rows in the true and false branches.

  •  4 years ago

    What is the syntax to run this on a CSV file?
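
One possible approach, assuming the tutorial's code expects `training_data` as a list of rows with the label in the last column: read the file with Python's standard `csv` module and convert numeric-looking cells to numbers. The function name `load_rows` and the file name `fruit.csv` are hypothetical; the sketch reads from an in-memory string so that it runs without a file on disk.

```python
import csv
import io


def load_rows(handle):
    """Read CSV rows into lists, converting numeric-looking cells to float."""
    reader = csv.reader(handle)
    header = next(reader)  # first line holds the column names
    rows = []
    for record in reader:
        row = []
        for cell in record:
            try:
                row.append(float(cell))
            except ValueError:
                row.append(cell)  # keep non-numeric cells as strings
        rows.append(row)
    return header, rows


# Stand-in for: open("fruit.csv", newline="")  (hypothetical file name)
sample = io.StringIO("color,diameter,label\nGreen,3,Apple\nRed,1,Grape\n")
header, training_data = load_rows(sample)
print(header)
print(training_data)
```

The resulting `training_data` list could then be passed to whatever tree-building function the tutorial code defines.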