Use OrdinalEncoder instead of OneHotEncoder with tree-based models

  • Published on 18 Nov 2024

Comments • 14

  • @dataschool (3 years ago)

    Have you tried OrdinalEncoder with your tree-based model? Let me know how it compares to OneHotEncoder!
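A minimal sketch of that comparison, assuming scikit-learn and a made-up synthetic dataset. The `color` and `city` columns, the target rule, and the helper `mean_cv_accuracy` are hypothetical illustrations, not the dataset or code from the video. Both encoders go through the same pipeline so that only the encoding differs.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical synthetic data: two unordered categorical features.
rng = np.random.default_rng(0)
n_rows = 500
X = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue", "yellow"], size=n_rows),
    "city": rng.choice([f"city_{i}" for i in range(20)], size=n_rows),
})
# Hypothetical target: depends on category membership, not on any ordering.
y = (
    X["color"].isin(["green", "yellow"])
    ^ X["city"].isin(["city_3", "city_7", "city_11"])
).astype(int)


def mean_cv_accuracy(encoder):
    """Encode both columns with `encoder`, then cross-validate a random forest."""
    preprocess = make_column_transformer((encoder, ["color", "city"]))
    pipe = make_pipeline(preprocess, RandomForestClassifier(random_state=0))
    return cross_val_score(pipe, X, y, cv=5).mean()


# The handle_unknown settings guard against a category missing from a training fold.
ohe_acc = mean_cv_accuracy(OneHotEncoder(handle_unknown="ignore"))
ord_acc = mean_cv_accuracy(
    OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
)
print(f"OneHotEncoder:  {ohe_acc:.3f}")
print(f"OrdinalEncoder: {ord_acc:.3f}")
```

Which encoder scores higher depends on the data. The practical advantage usually cited for OrdinalEncoder with trees is that it produces one column per feature instead of one per category, which keeps the feature matrix small.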

    • @sophiazhou9119 (2 years ago)

      I tried it with a random forest and a decision tree classifier, but the problem with OrdinalEncoder is that the tree might treat the encoded category as a real number and split it at a decimal value when splitting. How do you deal with that?
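A minimal sketch of what such a decimal split means, assuming scikit-learn and hypothetical toy data (four colors, with `green` and `yellow` labeled as class 1). OrdinalEncoder produces integer codes, and a tree places each threshold midway between two adjacent codes, so a split like `color <= 1.5` simply sends one group of categories left and the rest right.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: one unordered categorical feature, 100 rows.
colors = np.array([["blue"], ["green"], ["red"], ["yellow"]] * 25)
y = np.array([0, 1, 0, 1] * 25)  # green and yellow are class 1

encoder = OrdinalEncoder()
X = encoder.fit_transform(colors)  # categories sorted: blue=0, green=1, red=2, yellow=3
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=["color"]))

# Internal (split) nodes have feature >= 0; leaves are marked with a negative value.
is_split = tree.tree_.feature >= 0
thresholds = tree.tree_.threshold[is_split]
codes = np.arange(len(encoder.categories_[0]))
for t in sorted(thresholds):
    # Translate each fractional threshold back into the categories it separates.
    left = encoder.categories_[0][codes <= t]
    print(f"color <= {t}: {list(left)} go left, the rest go right")
```

The fractional threshold itself is harmless, because no row ever has a code between two integers. The real limitation is that each split can only cut the codes into contiguous ranges: here `green` (1) and `yellow` (3) are not adjacent, so the tree needs more than one split to group them.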

  • @dhirajkumarsahu999 (3 years ago)

    Yes, this makes sense to me. Models like linear regression assign importance to features through their weights, so one-hot encoding unordered categories is important for linear regression. Please correct me if I am wrong.
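A minimal sketch of why this matters for linear models, assuming scikit-learn and made-up per-category target values (`blue`=5, `green`=1, `red`=3). With OrdinalEncoder the model gets a single weight for the whole feature, so it must fit a straight line through arbitrary codes; with OneHotEncoder each category gets its own weight.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical data: each (unordered) category has its own target value.
cats = np.array([["blue"], ["green"], ["red"]] * 10)
effect = {"blue": 5.0, "green": 1.0, "red": 3.0}
y = np.array([effect[c] for c in cats.ravel()])

X_ord = OrdinalEncoder().fit_transform(cats)          # one column: blue=0, green=1, red=2
X_ohe = OneHotEncoder().fit_transform(cats).toarray()  # three 0/1 columns, one per category

lin_ord = LinearRegression().fit(X_ord, y)
lin_ohe = LinearRegression().fit(X_ohe, y)
print("ordinal R^2:", lin_ord.score(X_ord, y), "coef:", lin_ord.coef_)
print("one-hot R^2:", lin_ohe.score(X_ohe, y), "coef:", lin_ohe.coef_)
```

With these made-up values, the ordinal model's best straight line through codes 0, 1, 2 gives an R² of 0.25, while the one-hot model reproduces every category's value exactly. A tree does not have this problem, because it splits on codes rather than multiplying them by a weight.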

  • @elmoreglidingclub3030 (2 years ago)

    Very interesting. I’d like to work with this a bit; what is the data set you used?
    I have an interesting data set (~2,300 rows, 13 features) that can give some bizarre accuracy results using a single classification tree but performs much, much better with a random forest. I’ll try ordinal encoding on it and let you know how it performs. Good stuff! Again, please, what is this data set?

    • @dataschool (1 year ago)

      See here: nbviewer.org/github/justmarkham/scikit-learn-tips/blob/master/notebooks/43_ordinal_encoding_for_trees.ipynb

  • @grzegorzzawadzki8718 (3 years ago)

    Thanks! That was very helpful.

  • @Dara-lj8rk (3 years ago)

    Good one, thanks!

    • @dataschool (3 years ago)

      You're very welcome!

  • @alfathterry7215 (3 years ago)

    Interesting...

  • @anandvyavahare2031 (3 years ago)

    Who on earth even tried it to find it? 😂😂