Tutorial 44-Balanced vs Imbalanced Dataset and how to handle Imbalanced Dataset

  • Published on 10 Jan 2019
  • Here is a detailed explanation about the balanced vs imbalanced dataset and how to handle the imbalanced dataset.
    #balancedvsimbalanceddataset
    You can buy my book, where I provide a detailed explanation of how to use Machine Learning and Deep Learning in Finance using Python
    Packt url : prod.packtpub.com/in/big-data...
    Amazon url: www.amazon.com/Hands-Python-F...

Comments • 61

  • @sushantapanda4589
    @sushantapanda4589 4 years ago +1

    You are a great tutor. I like the way you explain, and it is great to see your hold on the subject. Awesome.

  • @venkataraomannem6585
    @venkataraomannem6585 5 years ago +1

    Well done, sir. Thanks for sharing; this is very easy for everyone to understand.

  • @azmathalisyed9114
    @azmathalisyed9114 5 years ago +1

    Great information, good explanation. 👌👌

  • @MsRAJDIP
    @MsRAJDIP 5 years ago

    Can you show techniques for handling missing data other than the mean, median, and mode techniques? I read that you can use regression or classification to predict missing values, but I have never seen it implemented.

  • @mayurkhandeshe4813
    @mayurkhandeshe4813 4 years ago

    Your teaching is very effective, sir. Very easy to understand.

  • @umang8895
    @umang8895 4 years ago

    great video, easy to understand.

  • @praveensingh1234
    @praveensingh1234 4 years ago

    Very nice explanation, thanks a lot.

  • @shahnawazkhan1636
    @shahnawazkhan1636 3 years ago

    Great, sir. There is no need to join any institute to learn Data Science; just follow Krish Naik sir's playlist.

  • @neelpatel3844
    @neelpatel3844 3 years ago

    Very informative, thank you.

  • @louerleseigneur4532
    @louerleseigneur4532 2 years ago

    Thanks Krish

  • @vineetsansi
    @vineetsansi 4 years ago

    XGBoost will take care of the weights by itself, and we don't need to do any weight adjustment manually. Is that right?
    Great videos, thanks for sharing them. I am sure you will get a big number of followers very soon!
    I am also applying XGBoost to the data science YouTube channels that I follow, and your channel seems to be getting heavier and heavier weights ;)

  • @prasanthkumar7328
    @prasanthkumar7328 5 years ago

    While doing downsampling as mentioned, we will be reducing the points to 100. Which points should be removed? Simply picking at random is also not good practice, so how should we select those 100 points?
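
For reference, uniform random selection is the usual baseline for downsampling. The sketch below is one minimal way to do it in plain Python; the 900/100 class counts and the helper name `random_undersample` are illustrative assumptions, not taken from the video.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class ends up as small as the rarest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is cut down to the size of the smallest class.
    target = min(len(group) for group in by_class.values())

    sampled = []
    for label, group in by_class.items():
        for row in rng.sample(group, target):  # sampling without replacement
            sampled.append((row, label))

    rng.shuffle(sampled)  # avoid leaving the classes in contiguous blocks
    new_rows = [row for row, _ in sampled]
    new_labels = [label for _, label in sampled]
    return new_rows, new_labels


# Hypothetical toy data: 900 majority rows (label 0) and 100 minority rows (label 1).
rows = list(range(1000))
labels = [0] * 900 + [1] * 100
balanced_rows, balanced_labels = random_undersample(rows, labels)
```

Fixing the seed keeps the selection reproducible. The cost is that 800 majority rows are thrown away, so this only makes sense when the remaining data is still enough to train on.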

  • @SandeepSingh-tf7ni
    @SandeepSingh-tf7ni 5 years ago

    A simple approach for beginners. I would really appreciate it if you could do a demo with a dataset (1000 rows) of 4-5 features, and also please explain XGBoost. Thanks in advance. Looking forward to your response.

  • @abhijitsarkar5946
    @abhijitsarkar5946 5 years ago

    Nice series. Get going. The numbers should be 630, 270 and the accuracy exactly 90%. This is the same as your original imbalance.
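
The arithmetic behind this point can be checked with a few lines of Python. This is a sketch under assumed counts: a test split of 270 majority rows and 30 minority rows (the 30 is inferred from the 90% figure, not stated in the comment) and a "model" that always predicts the majority class.

```python
# Assumed test split (hypothetical counts): 270 majority rows (label 1), 30 minority rows (label 0).
y_true = [1] * 270 + [0] * 30

# A trivial "model" that ignores its input and always predicts the majority class.
y_pred = [1] * len(y_true)

# Accuracy: fraction of rows where the prediction matches the true label.
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

# Recall on the minority class: how many of the 30 minority rows were found.
minority_hits = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
minority_recall = minority_hits / y_true.count(0)

print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
```

The accuracy equals the majority share of the data, even though the model never identifies a single minority row. That is why accuracy alone is misleading on an imbalanced dataset.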

  • @tusharbhatnagar3146
    @tusharbhatnagar3146 4 years ago +1

    Can you make a video or tutorial on hyperparameter tuning in classification algorithms? It has been coming up in many interviews as well.

  • @satyaranjanbehera5492
    @satyaranjanbehera5492 5 years ago

    Good explanation. Thanks.

  • @harshays2873
    @harshays2873 4 years ago

    Sir, suppose I have too little data to train my model. What should I do in that case?

  • @manishshukla125
    @manishshukla125 4 years ago

    Thanks, sir. Please make a video on overfitting and underfitting.

  • @niketanjha
    @niketanjha 5 years ago

    Really helpful 🙏

  • @HarpreetKaur-mn4we
    @HarpreetKaur-mn4we 5 years ago

    Very helpful video

  • @Shylajakarthick
    @Shylajakarthick 5 years ago

    Thank you so much

  • @rahulmahajan6391
    @rahulmahajan6391 4 years ago

    Can we do downsampling on a credit card fraud detection dataset?

  • @shashankvashishtha9149
    @shashankvashishtha9149 3 years ago

    Can you please explain those two algorithms, XGBoost and AdaBoost?

  • @biswanandanpattanayak1938
    @biswanandanpattanayak1938 4 years ago

    How do we handle missing data if the data is 1 TB or more? Please explain.

  • @arjyabasu1311
    @arjyabasu1311 4 years ago +5

    Up to what ratio should we consider a dataset balanced?

    • @arjyabasu1311
      @arjyabasu1311 4 years ago

      @Kushal Hu what ratio is that?

  • @dr.bheemsainik4316
    @dr.bheemsainik4316 2 years ago

    Sir, I have data with a binary classification output variable. The ratio of the classes is 7.5:2.5. Is this balanced or imbalanced data?

  • @surendranathify82
    @surendranathify82 5 years ago

    Very useful, thanks. Could you please post videos on PCA, LDA, and regularization as well? Thanks.

    • @krishnaik06
      @krishnaik06  5 years ago

      Thanks, please check my playlist; a video on PCA is already there.

  • @moulidinavahi1498
    @moulidinavahi1498 4 years ago

    How can we downsample data points?

  • @venkataraomannem6585
    @venkataraomannem6585 5 years ago

    Sir, can you please show the same thing with a practical demonstration? Thank you, sir.

  • @karndeepsingh
    @karndeepsingh 4 years ago

    How do we deal with an imbalanced dataset when the target variable has multiple classes?

    • @MasterofPlay7
      @MasterofPlay7 4 years ago

      Use other metrics such as the F1 score instead of accuracy.
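
To make the suggestion concrete, here is a small pure-Python sketch of a macro-averaged F1 score for a multiclass target. The three-class labels and the helper names `per_class_f1` and `macro_f1` are hypothetical; in practice a library metric would normally be used instead.

```python
def per_class_f1(y_true, y_pred):
    """Return {class: F1} using F1 = 2*TP / (2*TP + FP + FN) for each class."""
    scores = {}
    for cls in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never hit gets F1 = 0.
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores, so rare classes count equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical 3-class example: class "a" dominates and the model predicts "a" for everything.
y_true = ["a"] * 8 + ["b"] + ["c"]
y_pred = ["a"] * 10

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, macro F1={score:.2f}")
```

Accuracy looks respectable here because the dominant class is always right, while the macro F1 is pulled down by the two classes the model never predicts.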

  • @mandarpawar27
    @mandarpawar27 4 years ago

    Hi Krish,
    Please upload videos on interview questions.

  • @udanial
    @udanial 2 years ago

    Which playlist is this video from?

  • @DatAcuity
    @DatAcuity 3 years ago

    I am just asking:
    if we use the XGBoost algorithm for a classification problem, we need not bother about class imbalance.
    Am I right, sir?
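
One relevant detail: XGBoost exposes a `scale_pos_weight` parameter for binary problems, and its documentation suggests the ratio of negative to positive examples as a typical starting value. The sketch below only computes that ratio; the 900/100 label counts and the helper name `suggested_scale_pos_weight` are illustrative assumptions.

```python
def suggested_scale_pos_weight(labels):
    """Return negatives / positives for binary labels where 1 marks the rare positive class."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive examples in labels")
    return negatives / positives


# Hypothetical labels: 900 negatives and 100 positives.
labels = [0] * 900 + [1] * 100
weight = suggested_scale_pos_weight(labels)

# With the xgboost package installed, this value would be passed as
# xgboost.XGBClassifier(scale_pos_weight=weight).
print(weight)
```

So boosting can cope with imbalance reasonably well, but the weighting is still something the user sets rather than something that happens automatically, and it is worth validating with a metric other than accuracy.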

  • @prithviraj25
    @prithviraj25 4 years ago

    Thank you, sir.

  • @victorxu9634
    @victorxu9634 5 years ago

    Great content. It would be nice if it went deeper.

  • @aayushijain2160
    @aayushijain2160 4 years ago +1

    Sir, I have a doubt about this question: should an imbalanced dataset be handled by using the right evaluation metrics or by these sampling techniques? Please let me know, I'm very confused.

    • @cinemascope8847
      @cinemascope8847 4 years ago

      aayushi jain, SMOTE can be used when we want to increase the minority-class data. It is the safest technique.
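
As a rough illustration of the idea behind SMOTE, the sketch below creates synthetic minority points by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified stand-in, not the reference algorithm: the function name `smote_like_oversample`, the toy points, and `k=3` are assumptions, and the `SMOTE` class in the imbalanced-learn package is the implementation normally used.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points on segments between minority points and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority point as the base.
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]

        # Rank the other minority points by squared Euclidean distance to the base.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))

        # Choose one of the k nearest neighbours and a random position between the two points.
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class points with two numeric features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
```

Because the new points lie between existing minority points, they add variety without duplicating rows exactly. Oversampling should be applied to the training split only, so that synthetic points never leak into the evaluation data.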

  • @sashpatra88
    @sashpatra88 4 years ago

    Krish: Can you put this in the MACHINE LEARNING playlist, if I am not missing anything?

  • @gopalakrishna9510
    @gopalakrishna9510 4 years ago

    Can you explain with Python code?

  • @kakarlanagajyothi4089
    @kakarlanagajyothi4089 4 years ago

    Are there any videos continuing this?

  • @joyeetamallik5063
    @joyeetamallik5063 4 years ago

    Can you share Python code to implement these upsampling techniques? Is this concept also applicable to NLP datasets?

    • @PhilippHusiA
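      @PhilippHusiA 4 years ago

      If working with tf.keras, add the following code when training the model (assuming y_train holds integer labels 0..n-1 and model is your compiled tf.keras model):
      1) import numpy as np; from sklearn.utils import class_weight
      2) weights = class_weight.compute_class_weight(class_weight='balanced', classes=np.unique(y_train), y=y_train)
      3) class_weights = dict(enumerate(weights))  # Keras expects a dict mapping class index to weight
      4) history = model.fit(x_train, y_train, batch_size=x, class_weight=class_weights)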
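
For readers wondering what the `'balanced'` option computes, scikit-learn documents it as `n_samples / (n_classes * count_of_class)`. The sketch below reproduces that formula in plain Python; the helper name `balanced_class_weights` and the 900/100 label counts are illustrative assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), as 'balanced' does."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# Hypothetical training labels: 900 examples of class 0 and 100 of class 1.
y_train = [0] * 900 + [1] * 100
class_weights = balanced_class_weights(y_train)
print(class_weights)
```

The rare class receives a weight nine times larger than the common one, so each minority example contributes proportionally more to the loss during training.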

  • @pruthvigirijala8146
    @pruthvigirijala8146 4 years ago

    Discuss everything in upcoming video..? :p

  • @sushedbubai
    @sushedbubai 5 years ago +1

    Waiting for more interview questions

    • @krishnaik06
      @krishnaik06  5 years ago +1

      You can find the complete playlist at the YouTube URL below
      th-cam.com/play/PLZoTAELRMXVPkl7oRvzyNnyj1HS4wt2K-.html
      I will be updating this with all the questions

  • @gopalakrishna9510
    @gopalakrishna9510 4 years ago

    I am really happy with the explanation of imbalanced and balanced datasets.

  • @Beyond90Days
    @Beyond90Days 4 years ago

    How is the accuracy 350/30?

  • @kakarlanagajyothi4089
    @kakarlanagajyothi4089 4 years ago

    Small doubt: are misclassification and imbalance the same or different?

  • @azad8upt
    @azad8upt 4 years ago +1

    It should be 250+ in test not 350+

  • @davinderc
    @davinderc 5 years ago +2

    Consider using better whiteboard software. Your written words and numbers are nearly impossible to read in Paint.

    • @krishnaik06
      @krishnaik06  5 years ago +1

      Hi Davinder, feedback taken

  • @NinjaAnkit
    @NinjaAnkit 4 years ago

    What I feel when I watch your video explanations is that, even though you are communicating in English, it feels like you are communicating in a regional language. That's why I understand more. You explain most of the difficult terms in a simple way. I love your explanations.

    • @NinjaAnkit
      @NinjaAnkit 4 years ago

      And I also remember most of the concepts for a long time.

    • @NinjaAnkit
      @NinjaAnkit 4 years ago

      And I also like the videos of yours that I have watched.

  • @pruthvigirijala8146
    @pruthvigirijala8146 4 years ago

    You know..? :p