How to deal with an unbalanced dataset using custom samplers in PyTorch?

  • Published on 21 Aug 2024

Comments • 10

  • @adrielcabral6634
    1 year ago

    Hi! I have a question. With this method, does every batch have an equal ratio of positives to negatives? For example, will a batch of size 32 always contain 16 positive and 16 negative samples (using malignant_pct = 0.5)? If not, how can I do this? Great video, thanks!

  • @matejchumlen4785
    3 years ago +1

    Hi, thanks for the video. Do you have a public repo with the notebook, or is it part of a private course?

  • @taki_fouhal
    4 years ago +1

    Thank you for the awesome explanation. I am wondering whether this custom sampler covers every example in the dataset over the course of training. If we randomly shuffle the dataset every time we construct a batch, we might miss information from samples that never get a chance to be picked.

    • @JarvislabsAI
      4 years ago

      The minority cases are mostly covered. For the majority class, you can set replacement to False and increase the number of elements the iterator returns, so that it covers most of the majority samples. Not showing every majority-class sample is not a problem, though, since we train the model for multiple epochs. Over time the model gets a chance to look at every example.
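The reply above can be sketched roughly as follows. This is not the sampler from the video, only a minimal illustration under assumptions: the class name `BalancedSampler`, the `labels` list of 0/1 targets, and the `n` and `seed` parameters are hypothetical, and `malignant_pct` is borrowed from the first comment. It uses only the standard library; PyTorch's `DataLoader` accepts any iterable of indices with `__len__` as its `sampler` argument.

```python
import random


class BalancedSampler:
    """Hypothetical sketch: yields n dataset indices per epoch,
    with a fixed fraction (malignant_pct) taken from the positive class."""

    def __init__(self, labels, n, malignant_pct=0.5, seed=None):
        # labels: one 0/1 target per dataset item (an assumption of this sketch).
        self.pos = [i for i, y in enumerate(labels) if y == 1]
        self.neg = [i for i, y in enumerate(labels) if y == 0]
        self.n = n
        self.n_pos = round(n * malignant_pct)
        self.n_neg = n - self.n_pos
        if self.n_neg > len(self.neg):
            raise ValueError("not enough negatives to sample without replacement")
        self.rng = random.Random(seed)

    def __iter__(self):
        # Minority class: drawn with replacement, since it is usually
        # smaller than the number of positives requested.
        pos = self.rng.choices(self.pos, k=self.n_pos)
        # Majority class: drawn without replacement, so no negative
        # repeats within an epoch; a larger n covers more of them.
        neg = self.rng.sample(self.neg, k=self.n_neg)
        idxs = pos + neg
        self.rng.shuffle(idxs)
        return iter(idxs)

    def __len__(self):
        return self.n


labels = [1] * 5 + [0] * 95
sampler = BalancedSampler(labels, n=40, malignant_pct=0.25, seed=0)
idxs = list(sampler)
```

Each call to `iter(sampler)` draws a fresh set of negatives, so different epochs see different majority samples. Note that the ratio holds per epoch, not per batch; the shuffle spreads positives across batches only on average.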

  • @canernm
    3 years ago

    Hi, thanks for the video! Quick question: when creating a custom sampler, is the __iter__ method always necessary?
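On the question above: as far as the PyTorch documentation describes it, `__iter__` is the one method a sampler must provide, because the `DataLoader` iterates over the sampler to obtain indices, while `__len__` is optional but needed whenever something calls `len()` on the loader. A minimal sketch, with a hypothetical class name and no dependency on PyTorch itself:

```python
class EveryOtherSampler:
    """Hypothetical sampler that visits every second index of a dataset."""

    def __init__(self, data_len):
        self.data_len = data_len

    def __iter__(self):
        # Required: this is what produces the indices for each epoch.
        return iter(range(0, self.data_len, 2))

    def __len__(self):
        # Optional for iteration, but lets len(loader) work.
        return (self.data_len + 1) // 2


sampler = EveryOtherSampler(7)
indices = list(sampler)

# Assumed usage with PyTorch (dataset is a placeholder for your own Dataset):
# loader = torch.utils.data.DataLoader(dataset, sampler=EveryOtherSampler(len(dataset)))
```

Subclassing `torch.utils.data.Sampler` is the conventional choice in PyTorch code; the plain class here just keeps the example self-contained.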

  • @aniruddh1507
    3 years ago

    Great video! I have a question, though. I used the diagnosis column in the train data as my target, so I get around 9 target values. The labels and their value counts are as follows:
    unknown 27126
    nevus 5193
    melanoma 584
    BKL 223
    In this scenario I am not focusing on the softmax probability, which is the competition metric. I want to directly predict whether a test image is nevus, melanoma, etc. I am using ResNet-18 as my model with 10-fold cross-validation. Now the question is: given the data above, should I use weighted sampling, or is the imbalance not that bad?
    Also, if I need to use weighted sampling, should I follow the methodology you presented in the video? I am confused.
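For the four counts listed above, the most frequent class is roughly 120 times larger than the rarest (27126 vs. 223). One common option for a multi-class target is inverse-frequency weighting, sketched below with the standard library only. This is an illustration under assumptions, not the method from the video: it uses only the four classes quoted in the comment, and the variable names are hypothetical.

```python
import random
from collections import Counter

# Class counts quoted in the comment (only the four classes that were listed).
counts = {"unknown": 27126, "nevus": 5193, "melanoma": 584, "BKL": 223}

# Inverse-frequency weight per class: rarer classes get larger weights.
class_weight = {name: 1.0 / c for name, c in counts.items()}

# Stand-in label list with the same class sizes, and one weight per sample.
labels = [name for name, c in counts.items() for _ in range(c)]
sample_weights = [class_weight[y] for y in labels]

# Draw indices with replacement in proportion to the per-sample weights.
rng = random.Random(0)
drawn = rng.choices(range(len(labels)), weights=sample_weights, k=4000)
drawn_counts = Counter(labels[i] for i in drawn)

# With PyTorch, the same per-sample weights could be passed to the built-in sampler:
# sampler = torch.utils.data.WeightedRandomSampler(
#     weights=sample_weights, num_samples=len(sample_weights), replacement=True)
```

Because each class's weights sum to 1, every class is drawn about equally often, which means the rare classes are repeated many times per epoch. Whether that helps more than it overfits is something to check on the validation folds.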

  • @FeddFGC
    3 years ago

    How come when I write your exact custom sampler, it doesn't run on my end? It says Subset objects don't have an attribute "df". The same happens if I use my whole dataset instead of a subset.

  • @mrigankanath7337
    2 years ago

    At the end we have written idxs = idxs[:n]. What does this mean? Is it necessary?
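In Python, `idxs[:n]` keeps only the first `n` elements of a sequence. The sampler code from the video is not reproduced on this page, so the following is only a guess at why such a line might be there: if the positive and negative counts are each rounded up, their sum can exceed `n`, and the slice trims the list back so the sampler returns exactly `n` indices. The function name and arguments are hypothetical.

```python
import math


def build_indices(pos, neg, n, pct):
    """Hypothetical helper: combine positive and negative indices,
    then trim to exactly n items."""
    n_pos = math.ceil(n * pct)        # rounded up
    n_neg = math.ceil(n * (1 - pct))  # also rounded up
    idxs = pos[:n_pos] + neg[:n_neg]  # may hold more than n items
    return idxs[:n]                   # keep only the first n


pos = list(range(100))
neg = list(range(100, 200))
idxs = build_indices(pos, neg, n=33, pct=0.5)  # 17 + 17 = 34 indices, trimmed to 33
```

If the counts already add up to `n`, the slice changes nothing, so it is harmless either way.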

  • @mikhaeldito
    2 years ago

    How can I use this for FastAI?

    • @JarvislabsAI
      2 years ago

      You may find this useful: jarvislabs.ai/blogs/vin-metric-learning/#implementing-a-custom-sampler