Dealing with Imbalanced Datasets in ML Classification Problems | DataHour by Damini Dasgupta

  • Published 21 Aug 2024

Comments • 13

  • @chukwumanwakpa3330 · 1 year ago +1

    Thank you so much, Analytics Vidhya. I must commend you all for my improvement in deploying ML algorithms to solve problems. Just one observation, please: I think it would be better if we could get the slides and datasets used in all lectures. Thank you. Much love from Nigeria.

    • @Analyticsvidhya · 1 year ago

      Dear Learner, Refer to our DataHack Platform for DataHour Material and Speaker Coordinates: datahack.analyticsvidhya.com/contest/all/

  • @shreyashimukhopadhyay6354 · 1 year ago +1

    Great tutorial on imbalanced data 👍 Analytics Vidhya, can you please share the notebook?

    • @Analyticsvidhya · 1 year ago +1

      Dear Shreyashi, here's the download link: drive.google.com/drive/folders/1KV-CNZmuy8sqYDc1Jri-hCvYCeHSbYxs?usp=sharing

    • @shreyashimukhopadhyay6354 · 1 year ago

      @Analyticsvidhya Thank you so much! 👍

  • @younesgasmi8518 · 7 months ago

    Thank you so much, miss. My question: can I use the undersampling technique before splitting the dataset into training and testing sets, since there would not be any data leakage with this method?

    • @Analyticsvidhya · 7 months ago +1

      Actually, the safer practice is the opposite: split the dataset first, then undersample only the training set.
      Here's why:
      ➡️ Data leakage: resampling decisions should be made without ever seeing the test data. If you resample the full dataset before splitting, the split is made on data the resampling step has already "looked at", so your evaluation no longer measures performance on truly unseen data.
      ➡️ Preserving the real-world distribution: the test set should keep the original class imbalance so your metrics reflect what the model will face in production. Undersampling before the split changes that distribution and can make results look better than they really are.
      However, keep in mind that undersampling also has drawbacks, such as discarding potentially valuable majority-class data. It's always a good idea to compare it against techniques like oversampling or SMOTE (again, applied to the training set only) before making a final decision.
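
A minimal sketch of that split-first workflow, assuming scikit-learn and the imbalanced-learn package (the synthetic dataset here is illustrative only, not from the talk):

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from imblearn.under_sampling import RandomUnderSampler

# Synthetic 90/10 imbalanced dataset (stand-in for your own X, y).
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=42)

# 1) Split first, stratifying so both sets keep the original imbalance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 2) Undersample the training set only; the test set stays untouched.
rus = RandomUnderSampler(random_state=42)
X_train_res, y_train_res = rus.fit_resample(X_train, y_train)
print("train before:", Counter(y_train), "after:", Counter(y_train_res))

# 3) Fit on the resampled training data; evaluate on the original test set,
#    whose class distribution still matches the real world.
clf = LogisticRegression(max_iter=1000).fit(X_train_res, y_train_res)
print("test accuracy:", clf.score(X_test, y_test))
```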

  • @gecarter53 · 1 year ago

    Great seminar. Is the code publicly available?

    • @Analyticsvidhya · 1 year ago

      Dear Learner, Refer to our DataHack Platform for DataHour Material and Speaker Coordinates: datahack.analyticsvidhya.com/contest/all/

  • @alexilaiho6441 · 2 months ago

    The talk fails to cover how to deal with imbalanced datasets using SMOTE, and also using focal loss for neural nets.

    • @Analyticsvidhya · 2 months ago

      DataHour Resources 🔗 bit.ly/3xUaue4
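
For the SMOTE technique mentioned above, a minimal sketch assuming the imbalanced-learn package, again resampling the training split only:

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Illustrative 90/10 imbalanced dataset.
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# SMOTE synthesizes new minority-class points by interpolating between a
# minority sample and its k nearest minority-class neighbours.
smote = SMOTE(k_neighbors=5, random_state=0)
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
print("before:", Counter(y_train), "after:", Counter(y_train_res))
```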

  • @therevolution8611 · 1 year ago

    Can I use oversampling if I have multi-label text for classification purposes?

    • @Analyticsvidhya · 1 year ago

      Dear Learner, Refer to our DataHack Platform for DataHour Material and Speaker Coordinates: datahack.analyticsvidhya.com/contest/all/