Outlier detection and removal using IQR | Feature engineering tutorial python # 4

  • Published on 5 Nov 2024

Comments • 135

  • @codebasics
    @codebasics  2 years ago

    Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced

    • @ashaikh147
      @ashaikh147 10 months ago

      I am interested in the data science course

  • @h44r96
    @h44r96 1 year ago +1

    Came back to this video after a long time. Just noticed that the video description itself explains everything straight to the point Sir. Great job

  • @shreyasb.s3819
    @shreyasb.s3819 4 years ago +4

    Your simple way of explaining is very good. Thanks a lot.

  • @salvinjohn6977
    @salvinjohn6977 4 years ago

    Most underrated YouTube channel... even though he teaches much better and is easier to understand than so many YouTube channels that confuse you with a lot of stuff.

    • @deepeshmhatre4291
      @deepeshmhatre4291 3 years ago

      underrated ? you gotta be a fool then, he has more than 280K subs.

    • @salvinjohn6977
      @salvinjohn6977 3 years ago

      @deepeshmhatre4291 dude, come back after a year and reply. I was talking about the number of views he was getting, and yeah, a lot can change in 4 months, fool.

  • @maheshpeddykudi5009
    @maheshpeddykudi5009 4 years ago

    Interested VS committed.... awesome.... one have to be committed..... it's very clear now...thanks for the resources bro...

  • @maseedilyas203
    @maseedilyas203 4 years ago +3

    Thanks man very useful you are the best . I've watched your entire machine learning tutorial . I learned many things from this.

  • @DirrrtyD91
    @DirrrtyD91 1 year ago

    Thank you! I needed to address outliers in a data set that I was performing an ANOVA test on, and this helped a lot.

  • @humayunnasir6261
    @humayunnasir6261 4 years ago

    Amazing Video Series. You are the best teacher on internet. Huge Respect from Pakistan.

  • @mouaztabboush5571
    @mouaztabboush5571 3 years ago +1

    All I can think about is aamir skipping the stairs and entering through the windows xD
    btw. great tutorial!!

  • @anushkajain9529
    @anushkajain9529 4 years ago

    Kudos for sharing your knowledge in such a simplified manner. The exercises at the end give a huge confidence boost. Thanks !! :)

  • @sumitrawat4103
    @sumitrawat4103 3 years ago +1

    You always teach complex topics with so much ease, great lecture... sir, please add more tutorials to the list

  • @iaconst4.0
    @iaconst4.0 3 months ago

    You are an excellent teacher; I understand the explanation in English perfectly. From Peru, thanks for sharing your knowledge!!

  • @dilnawazahmed949
    @dilnawazahmed949 1 year ago

    Very very good simple and beautiful explanation
    Sir you are awesome 👍

  • @nasreenbanu2245
    @nasreenbanu2245 2 years ago

    Sir, please upload a video on reinforcement learning. Your teaching is great. May God bless you and your family.

  • @e-normous
    @e-normous 1 year ago

    Thank you so much for this very helpful video. You managed to explain the needed concept straight to the point. Awesome!

  • @sujitha3335
    @sujitha3335 3 years ago

    Just a clear cut explanation

  • @georgepistikoudis
    @georgepistikoudis 1 year ago

    Great work. I started studying statistics with your videos, and I find them amazing. I just wanted to ask you to check whether the value of Q3 in your example is wrong. According to my calculations Q3 should be 6.15 and not 6.27. I calculated this using the Excel formula percentile.inc(array;0,75). I understand this is not so important, because the value of your videos is in understanding how to use these methods to do better data analysis. Thank you.
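
Quartile values can differ between tools simply because they interpolate between data points in different ways. The sketch below is only an illustration: it uses Python's standard `statistics` module and the heights pasted by a commenter elsewhere in this thread (which may not match the video's CSV). It does not reproduce 6.15, and it makes no claim about which convention any particular spreadsheet formula applied.

```python
import statistics

# Heights as pasted in a comment on this video (may differ from the video's CSV).
heights = [1.2, 2.3, 4.9, 5.1, 5.2, 5.2, 5.5, 5.5, 5.6, 5.6,
           5.8, 5.9, 6.0, 6.1, 6.2, 6.5, 7.1, 14.5, 23.2, 40.2]

# "inclusive": linear interpolation over positions 0..n-1
# (the same rule pandas applies by default in quantile()).
q1_inc, _, q3_inc = statistics.quantiles(heights, n=4, method="inclusive")

# "exclusive": interpolation that assumes the sample may not contain
# the most extreme values of the population.
q1_exc, _, q3_exc = statistics.quantiles(heights, n=4, method="exclusive")

print(f"inclusive: Q1={q1_inc:.3f}, Q3={q3_inc:.3f}")
print(f"exclusive: Q1={q1_exc:.3f}, Q3={q3_exc:.3f}")
```

The two conventions agree on Q1 here but give different Q3 values, so it is worth checking which rule each tool uses, and whether the input data is identical, before concluding that one result is wrong.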

  • @spacearc2353
    @spacearc2353 1 year ago

    Happy teacher's day Sir 🎉

  • @hardikatri7803
    @hardikatri7803 4 years ago

    Just Awesome. Really loved the teaching style. Waiting for more tutorials..

    • @codebasics
      @codebasics  4 years ago

      I am glad it was helpful

  • @manikantareddyarikatla9138
    @manikantareddyarikatla9138 1 year ago

    I have followed your python and machine learning playlists and learnt many new things. They are quite amazing and helped me to solve many machine learning problems.
    Sir would you please suggest a playlist to learn and code in JAVA at Intermediate level as I have to attend campus placement within 3 months.

  • @kirandeepmarala5541
    @kirandeepmarala5541 4 years ago +1

    such an amazing series..Thank you very Much Sir..I am Learning A lot daily from You Sir..

    • @codebasics
      @codebasics  4 years ago

      Glad to know kirandeep

  • @jiyabyju565
    @jiyabyju565 2 years ago

    thanks for all your efforts..may God bless you Sir.

  • @kashifiqbal5151
    @kashifiqbal5151 4 years ago

    Such an amazing video series, Thank you very much sir, keep continue.

  • @leamon9024
    @leamon9024 4 years ago

    Thanks a lot. Awesome tutorials. Looking forward to more amazing contents in Feature engineering series.

  • @rajivmehta4535
    @rajivmehta4535 3 years ago

    Thank you, such a simple explanation about the IQR and Python Code too!

  • @gurkanyesilyurt4461
    @gurkanyesilyurt4461 1 year ago

    Thanks for this very useful video!

  • @anoopbhagat13
    @anoopbhagat13 4 years ago

    Thank you sir, for explaining the concept so lucidly.

  • @vidhyasagar7978
    @vidhyasagar7978 4 years ago +1

    Thanks for another amazing video. Could you also please make a video explaining, for different scenarios, how to decide which outlier technique to use?

  • @shikharsaxena9989
    @shikharsaxena9989 4 years ago +1

    Thanks sir. I used to memorize this code, but now I can write it myself

  • @jonathancampos1023
    @jonathancampos1023 4 years ago

    I am pretty much thankful for this video sir, I improved my understanding!

  • @abdultaufiq2237
    @abdultaufiq2237 4 years ago

    such a great explanation, you are teaching really well.
    I have watched your entire ML videos.
    Thanks for providing us such a good explained ML videos
    I just wanna request you to please add one more Algorithm to your ML video series of XGboost , how to install it and all the necessary stuffs
    thank you so much

    • @codebasics
      @codebasics  4 years ago

      Sure. Point noted

  • @geekyprogrammer4831
    @geekyprogrammer4831 3 years ago

    Sir please continue this series

  • @manasaraju8552
    @manasaraju8552 1 year ago

    Thanks for sharing this content sir

  • @aradhyadev459
    @aradhyadev459 4 months ago +1

    sir, how to remove outliers from multiple columns? do we have to always do it 1 by 1, or is there any trick?

  • @manideep1882
    @manideep1882 3 years ago +1

    I came here after seeing friends spending 12-13lakh rupees on "university" courses which don't realize value and later repent. Thank you for this content brother. May you achieve many more milestones in life !

  • @mohammadpatel2569
    @mohammadpatel2569 4 years ago

    Clear explanation..love it

  • @ashwinsg2037
    @ashwinsg2037 4 years ago +1

    Thank you so much Sir for this amazing techniques. Though one request, could you please make some tutorials on how to select best features from the dataset i.e. FEATURE SELECTION

  • @shaikansarbasha4169
    @shaikansarbasha4169 4 years ago

    Excellent sir

  • @yogeshbharadwaj6200
    @yogeshbharadwaj6200 4 years ago

    very simple video sir...tks. a lot..

  • @mohitchatterjee5517
    @mohitchatterjee5517 3 years ago

    Sir... this playlist is really awesome... I just want to know whether this is all there is on Feature Engineering, or whether you will upload more videos soon... Thank you... We would all be glad if you replied to us.

    • @codebasics
      @codebasics  3 years ago +1

      Mohit the playlist is not complete. I will upload more videos in future

    • @mohitchatterjee5517
      @mohitchatterjee5517 3 years ago

      @codebasics thank you so much, sir

  • @tuongnguyen9391
    @tuongnguyen9391 4 years ago

    Your teaching skill is excellent! A lot of subtle points are addressed directly to the viewer in a neat and professional manner.
    What is the average time you spend on making the script and video?

    • @codebasics
      @codebasics  4 years ago +2

      Thanks Dear, be real, produce genuine content and you will be successful. Average time for making the script and video depends on the content. Let’s say for making video on project might take weeks while video on tips takes some days only.

  • @tugrulpinar16
    @tugrulpinar16 3 years ago

    Thank you for these amazing videos.

  • @vinyasshreedhar9833
    @vinyasshreedhar9833 3 years ago

    You have shown us an example of only 1 column 'height'. Thanks for the concept. What if we have a larger dataset of multiple columns for example 15 columns out of which there are 12 numerical columns. Do we need to plot Box Plots for each column to check the outliers ? Or going by IQR method, do we need to write the above IQR codes for each and every variable ? Is there any shorter method ? Please advise.

    • @stickmanjournal
      @stickmanjournal 2 years ago

      To detect the outliers we can simply use a box plot or histogram, which gives a clear picture of the data. I usually use the IQR to find the specific indices of the outliers, which can then be treated.
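
The multi-column question comes up several times in this thread. One possible approach, sketched here with made-up data and hypothetical column names, relies on the fact that `DataFrame.quantile` returns one value per column, so the fences for every numeric column can be computed in a single step. This is an illustration under those assumptions, not the method shown in the video.

```python
import pandas as pd

# Hypothetical data: two numeric columns plus a non-numeric one.
df = pd.DataFrame({
    "name":   ["a", "b", "c", "d", "e", "f", "g", "h"],
    "height": [5.1, 5.4, 5.5, 5.6, 5.8, 5.9, 6.0, 14.5],
    "weight": [150, 155, 160, 162, 165, 170, 20, 168],
})

# Work only on the numeric columns.
num = df.select_dtypes(include="number")

# quantile() on a DataFrame returns a Series with one entry per column.
q1 = num.quantile(0.25)
q3 = num.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Boolean frame: True where a cell lies outside its own column's fences.
is_outlier = (num < lower) | (num > upper)

# One possible policy: drop a row if ANY numeric column is flagged.
df_clean = df[~is_outlier.any(axis=1)]

print(is_outlier.sum())  # number of flagged cells per column
print(df_clean)
```

Dropping a row when any column is flagged can discard a lot of data when there are many columns, so counting the flagged cells per column first is a cheap sanity check.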

  • @ridwanulhoquejr2080
    @ridwanulhoquejr2080 4 years ago

    waiting for next episode 🤞

  • @ramandeepbains862
    @ramandeepbains862 2 years ago

    I think the 1.2 and 2.3 height ranges are possible because we don't have age data. the lower limit is not clear .. plz explain ...

  • @GauravSharma-td2fd
    @GauravSharma-td2fd 1 year ago

    Good video!!

  • @flamboyantperson5936
    @flamboyantperson5936 4 years ago

    Great work Dhaval. My name is there in the excel :)

    • @codebasics
      @codebasics  4 years ago +1

      Yes 😊👍 hope you are doing well aamir.

    • @flamboyantperson5936
      @flamboyantperson5936 4 years ago

      @codebasics I am doing well. I have to watch all your recent videos on deep learning; I saw them in my notifications and they are all awesome. I will watch, like and comment.

  • @arunkoparde3035
    @arunkoparde3035 4 years ago

    Hello sir, thanks for sharing your knowledge. I removed outliers using the z-score, but when I check the filtered data with a box plot it still shows outliers. Please tell me whether I should continue with the next step or remove outliers one more time.

  • @georgepistikoudis
    @georgepistikoudis 1 year ago

    Hi, thank you for this great video. I tried to code it exactly but the display of the dataset is different, separating the columns with a semicolon. Any idea why?
    Name;Height
    0 Mohan;1.2
    1 Maria;2.3
    2 Sahib;4.9
    3 Tao;5.1
    4 Virat;5.2
    5 Khusbu;5.2
    6 Dmitry;5.5
    7 Selena;5.5
    8 John;5.6
    9 Imran;5.6
    10 Jose;5.8
    11 Deepika;5.9
    12 Joseph;6
    13 Binod;6.1
    14 Gulshan;6.2
    15 Johnson;6.5
    16 Donald;7.1
    17 Aamir;14.5
    18 Ken;23.2
    19 Liu;40.2
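
A display like this usually means the file's fields are separated by semicolons rather than commas, so pandas reads each line as one column. That is only a guess about this particular file; semicolon-delimited CSVs are common when a spreadsheet saves under a locale that uses the comma as the decimal mark. A minimal sketch, using an in-memory excerpt instead of the real file:

```python
import io
import pandas as pd

# Stand-in for a semicolon-delimited file (a short hypothetical excerpt;
# with a real file you would pass its path instead of StringIO).
raw = "Name;Height\nMohan;1.2\nMaria;2.3\nSahib;4.9\n"

# With the default sep=",", every line ends up in a single column.
wrong = pd.read_csv(io.StringIO(raw))
print(wrong.columns.tolist())

# Telling pandas the delimiter splits the fields properly.
df = pd.read_csv(io.StringIO(raw), sep=";")
print(df.columns.tolist())
print(df.dtypes)
```

Once the delimiter is specified, `Height` is parsed as a numeric column and calls such as `df.Height.quantile(0.25)` work as expected.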

  • @sahanjayawarna4894
    @sahanjayawarna4894 4 years ago

    Hi Sir, should outliers be removed only from the x variables, or from the y variable as well? My understanding is that we should do it only for the x variables. I would appreciate your advice, sir

    • @codebasics
      @codebasics  4 years ago +1

      When you remove outlier you are removing entire row which includes x and y both

    • @sahanjayawarna4894
      @sahanjayawarna4894 4 years ago

      @codebasics Sure sir, what I meant was that the outlier removal exercise should be performed only on the x (independent) variables, not on the y (dependent) variable. Yes, when we remove x, the entire row will be deleted including y, but outlier removal is focused only on the x variables. I hope this statement is correct.

  • @SomeoneSpecial447
    @SomeoneSpecial447 7 months ago

    What if the dataset has multiple feature columns instead of just one as you have shown here? How would the code change? Please respond.

  • @saurabhbarasiya4721
    @saurabhbarasiya4721 4 years ago

    Great sir

  • @bladeofimmortal7224
    @bladeofimmortal7224 2 years ago

    Sir is this technique applicable to any dataset?

  • @aliyanrazzaq7521
    @aliyanrazzaq7521 3 years ago +1

    How did you find that "1.5" for upper and lower limit?

  • @letitiaab
    @letitiaab 7 months ago

    Thank you so much

  • @sathwikrc
    @sathwikrc 3 years ago +1

    I have a doubt here 3:10, to find lower/upper limit, 1.5 can be used for all type of data or only to few data types in particular

  • @sa89879
    @sa89879 4 years ago

    thanks for sharing

  • @aldot1532
    @aldot1532 3 years ago

    What if one of our observations is exactly equal to the upper or lower limit? Should we use <= and >= instead of < and >?
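
The choice of comparison operator only matters for values that sit exactly on a limit. Under the usual box-plot convention, only points strictly beyond the fences are flagged, but that is a convention rather than a law. The sketch below uses hypothetical fences, chosen so that two values land exactly on them, to show the difference:

```python
import pandas as pd

heights = pd.Series([3.5, 4.9, 5.5, 6.1, 7.5, 9.0])

# Hypothetical fences, chosen so two values sit exactly on them.
lower, upper = 3.5, 7.5

# Strict comparisons: a value equal to a fence is NOT flagged.
flag_strict = (heights < lower) | (heights > upper)

# Non-strict comparisons: a value equal to a fence IS flagged.
flag_loose = (heights <= lower) | (heights >= upper)

# between() includes both endpoints by default,
# so its negation matches the strict rule.
flag_between = ~heights.between(lower, upper)

print(heights[flag_strict].tolist())
print(heights[flag_loose].tolist())
```

With continuous measurements an exact tie with a computed fence is rare, so in practice the two rules seldom disagree; what matters is applying one of them consistently.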

  • @Mars7822
    @Mars7822 2 years ago

    brilliant

  • @adwaitdwivedi9581
    @adwaitdwivedi9581 2 years ago

    I don't want to remove the outliers; I want to cap them at the IQR limits. What should I do, sir? I am stuck on this
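
Capping values at the fences instead of dropping rows (sometimes called winsorizing) can be sketched with `Series.clip`. The heights below are the ones pasted by a commenter in this thread and may not match the video's CSV exactly:

```python
import pandas as pd

# Heights as pasted in a comment on this video (may differ from the video's CSV).
heights = pd.Series([1.2, 2.3, 4.9, 5.1, 5.2, 5.2, 5.5, 5.5, 5.6, 5.6,
                     5.8, 5.9, 6.0, 6.1, 6.2, 6.5, 7.1, 14.5, 23.2, 40.2])

q1, q3 = heights.quantile(0.25), heights.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# clip() replaces anything below `lower` with `lower`
# and anything above `upper` with `upper`; no rows are dropped.
capped = heights.clip(lower=lower, upper=upper)

print(lower, upper)
print(capped.tolist())
```

All rows are kept, which helps when the other columns of an outlier row are still useful, but the capped values are no longer real measurements, so it is worth noting that they were altered.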

  • @satyavardhan8204
    @satyavardhan8204 4 years ago +1

    Make a video series on seaborn please

    • @codebasics
      @codebasics  4 years ago +1

      Added that in my Todo list

  • @haintuvn
    @haintuvn 4 years ago

    Must we remove "outlier" from a data set? Thank you Teacher!

    • @codebasics
      @codebasics  4 years ago

      Not always necessary... it depends. But the majority of the time, yes, we end up *treating* them, which does not necessarily mean removing them but changing the value to, say, the median.

  • @jaganinfo
    @jaganinfo 4 years ago

    now i'm able to identify outliers easily by watching this video. *Why we are multiplying IQR with 1.5 why not 2.5?*

    • @codebasics
      @codebasics  4 years ago +1

      That's just the statistical rule of thumb. It is like asking why do we use 3 STD deviation or more for removing outliers..😊👍

    • @jaganinfo
      @jaganinfo 4 years ago

      @codebasics The *threshold* can be set as a specified number of standard deviations. The *default value is 3*. Outliers increase the standard deviation, so it may fail to detect outliers: the more extreme the outlier, the more the SD is affected. So, 3 is the default.
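
One common way to motivate the 1.5 multiplier: for normally distributed data, fences at 1.5 × IQR fall roughly 2.7 standard deviations from the mean, close to the familiar 3-sigma cutoff. The sketch below checks this with the standard library's `NormalDist`. It assumes a normal distribution, which real data may not follow, and the 2.5 in the loop is just the alternative asked about above.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

q3 = z.inv_cdf(0.75)   # about 0.674 standard deviations above the mean
q1 = -q3               # the distribution is symmetric
iqr = q3 - q1

for k in (1.5, 2.5):
    fence = q3 + k * iqr              # upper fence, in standard deviations
    outside = 2 * (1 - z.cdf(fence))  # share of normal data beyond both fences
    print(f"k={k}: fence at {fence:.2f} sigma, {outside:.4%} of data flagged")
```

A larger multiplier pushes the fences further out and flags far fewer points, so the choice is a trade-off between sensitivity and false alarms, not a fixed rule.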

  • @analyticalguru1441
    @analyticalguru1441 4 years ago

    Sir I want to be a data scientist...should I learn ml 1st and then start data science?..plz do reply...

    • @codebasics
      @codebasics  4 years ago

      ML is a way to do data science. They are not separate

  • @ayeshagondekar2075
    @ayeshagondekar2075 3 years ago

    Why do we have to multiply IQR by 1.5 only? (while calculating lower and upper limit)

  • @snehasneha9290
    @snehasneha9290 4 years ago

    Sir plz suggest me best youtube channel for learning tableau

  • @AlonAvramson
    @AlonAvramson 3 years ago

    Thank you!

  • @bhavindedhia9968
    @bhavindedhia9968 4 years ago

    Sir Please make video on real-world data EDA

  • @Hermoine8
    @Hermoine8 4 years ago

    How to calculate the percentile is confusing me... using the formula percentile=100*(i-0.5)/100, we get different percentile values... please clarify this basic point for me

  • @yathinprakashkethepalli5034
    @yathinprakashkethepalli5034 3 years ago

    There were many other videos in this playlist, where are they now?

    • @codebasics
      @codebasics  3 years ago +1

      I separated deep learning videos into a separate playlist because it is not appropriate to have deep learning videos in a playlist "machine learning for beginners" To find other videos in youtube search "codebasics deep learning"

  • @azad_agi
    @azad_agi 1 year ago

    Thank you

  • @manish17788
    @manish17788 2 years ago

    What if the data has no outliers? In that case, will we lose a little data? How do we know whether outlier removal is needed at all in a big dataset?

  • @arjukundu1835
    @arjukundu1835 1 year ago

    sir from where you take 1.5 plz tell nah

  • @muayas9602
    @muayas9602 2 years ago

    Is there any difference between quantile & percentile?
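
In everyday use the two terms describe the same cut points on different scales: a quantile is given as a fraction between 0 and 1, a percentile as a percentage between 0 and 100. A small check with NumPy on made-up heights:

```python
import numpy as np

# Made-up sample, for illustration only.
heights = np.array([4.9, 5.1, 5.2, 5.5, 5.6, 5.8, 6.0, 6.2])

# Same cut point, two scales: 0.25 as a fraction, 25 as a percentage.
q = np.quantile(heights, 0.25)
p = np.percentile(heights, 25)

print(q, p)
```

So `df.height.quantile(0.25)` in pandas is the 25th percentile, and quartiles are simply the quantiles at 0.25, 0.5 and 0.75.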

  • @sunnyarora4916
    @sunnyarora4916 3 years ago

    What if the dataset has many attributes like height here? shall we check individually for all?

  • @OmidAtaollahi
    @OmidAtaollahi 4 years ago

    perfect... perfect... thank you..

  • @ahmadladkani3340
    @ahmadladkani3340 3 years ago

    I have a question hopefully a quick reply I do understand the concept but I need to see how to remove outliers for multiple columns

    • @aradhyadev459
      @aradhyadev459 4 months ago

      same, did u identify?

  • @cookiepedia9316
    @cookiepedia9316 3 years ago

    Hi Sir, while doing the assignment I am getting an error saying "Can only compare identically-labeled Series objects". Please explain why I am getting this.
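
This error appears when two Series with different index labels are compared. One plausible cause in this exercise, and it is only a guess about the commenter's code, is calling `quantile` on the whole DataFrame: the limits then come back as a Series indexed by column name, not as a single number. A sketch with made-up data:

```python
import pandas as pd

# Made-up data with two numeric columns.
df = pd.DataFrame({
    "height": [5.1, 5.5, 6.0, 14.5],
    "weight": [150, 160, 170, 165],
})

# quantile() on the DataFrame returns a Series indexed by column name.
q1_all, q3_all = df.quantile(0.25), df.quantile(0.75)
upper_all = q3_all + 1.5 * (q3_all - q1_all)

# Comparing a column (indexed 0..3) with that Series (indexed by
# "height"/"weight") fails because the labels do not match.
try:
    df["height"] > upper_all
except ValueError as err:
    message = str(err)
    print(message)

# Fix: pick the scalar limit for the column being filtered.
upper = upper_all["height"]
mask = df["height"] > upper
print(df[mask])
```

Calling `df["height"].quantile(0.75)` or `df.height.quantile(0.75)` on the single column avoids the problem from the start, because that returns a plain number.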

  • @mylife7810
    @mylife7810 2 years ago

    Is it complete tutorial for ml????

  • @rohanpandey7356
    @rohanpandey7356 2 years ago

    Sir, can you tell me how to use the IQR if I have more columns?

  • @cahyoardhi
    @cahyoardhi 4 years ago

    amazing, thank you!

  • @amc8437
    @amc8437 3 years ago

    # If there is more than one variable, is this the code?
    Q1 = df[["height", "weight", "age"]].quantile(0.25)
    Q3 = df[["height", "weight", "age"]].quantile(0.75)

  • @mithilanavishka4531
    @mithilanavishka4531 2 years ago

    Sir, I have looked at all 3 of your tutorials on removing outliers. Now I have a dataset that I need to remove outliers from, but I have no idea which method to use. What things do I need to consider to find the best outlier removal method for my dataset?

  • @Balubindass
    @Balubindass 4 years ago

    Sir, I have a question: if I have 20+ columns in the dataset, should I select each column, find the outliers, and delete them? If so, what if columns 3 and 4 have outliers but the values in all the other columns are fine? Do we then delete those rows from the whole dataset?

    • @codebasics
      @codebasics  4 years ago

      You don't necessarily need to delete them but rather *treat* them. That means if an outlier seems to be due to a data collection error, one approach is to replace it with the median, the mean, or any other value that is appropriate for the situation.
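
The reply above suggests replacing outliers instead of deleting rows. One way that could look in pandas, sketched with made-up heights, is to flag values by the IQR rule and then swap them for the median using `Series.mask`. The column name `height_treated` is hypothetical.

```python
import pandas as pd

# Made-up data with one obviously bad reading.
df = pd.DataFrame({"height": [5.1, 5.4, 5.5, 5.6, 5.8, 5.9, 6.0, 14.5]})

col = df["height"]
q1, q3 = col.quantile(0.25), col.quantile(0.75)
iqr = q3 - q1
is_outlier = (col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)

# Median of the non-flagged values, so the outliers do not influence it.
median = col[~is_outlier].median()

# mask() swaps in `median` wherever the condition is True; all rows are kept.
df["height_treated"] = col.mask(is_outlier, median)

print(df)
```

Writing the result to a new column keeps the original values available, which makes it easy to review what was changed before committing to the treatment.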

  • @tharunnl7810
    @tharunnl7810 8 months ago

    how to perform this when the dataset has around 100 independent variables?

    • @aradhyadev459
      @aradhyadev459 4 months ago

      same, did u identify?

  • @alexalbon8274
    @alexalbon8274 4 years ago

    love u sirrrrrrr

  • @MuthuserpiSasikumar
    @MuthuserpiSasikumar 7 months ago

    Hi, can you please give me a reference book for these videos?

  • @ukquaratine1019
    @ukquaratine1019 3 years ago

    From the given dataset I am getting lower_bound as a negative value. Is that okay? I read a blog which mentioned that an IQR cannot be negative. Any help would be really appreciated

    • @dilipreddy1535
      @dilipreddy1535 3 years ago

      Hey, yes, I got the same negative value. Please can anyone reply with an answer?
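
The IQR and the lower limit are two different quantities. The IQR is Q3 minus Q1 and cannot be negative, but the lower limit is Q1 minus 1.5 × IQR and can easily fall below zero when the data is skewed. A sketch with made-up, right-skewed values (not the exercise dataset):

```python
import pandas as pd

# Hypothetical right-skewed, non-negative data (e.g. prices).
values = pd.Series([1, 2, 2, 3, 4, 5, 8, 12, 20, 95])

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1            # Q3 >= Q1, so the IQR itself is never negative
lower = q1 - 1.5 * iqr   # the lower limit, however, can fall below zero
upper = q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]

print(iqr, lower, upper)
print(outliers.tolist())
```

A negative lower limit on data that cannot be negative simply means no value will be flagged as a low outlier; the upper limit still does its job.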

  • @barkhapaswan5807
    @barkhapaswan5807 3 years ago

    Take a bow🙏

    • @codebasics
      @codebasics  3 years ago

      Glad it was helpful!

  • @harleyquinn5245
    @harleyquinn5245 4 years ago

    Sir, can I become a data analyst after 12th?

    • @codebasics
      @codebasics  4 years ago +1

      I would suggest doing a degree in data analysis if your country has colleges offering a specialization in data analysis. Otherwise go for either computer science or a major in mathematics, statistics, etc. Once you have studied for a few years you will have a good grasp of many different concepts, and then you can start applying for data analyst jobs. You can learn everything on your own too, but that requires a lot of self-discipline and hard work. But yes, you can certainly do it on your own without going to college

    • @harleyquinn5245
      @harleyquinn5245 4 years ago

      @codebasics
      Sir, if I do a data analyst course directly after 12th, can I succeed on this path? And thanks for replying to me, Sir.

  • @AB-vw8wz
    @AB-vw8wz 3 years ago

    Please do not remove outlier, you should either keep it or preprocess the values never remove

  • @Krishna_Sharanam_Paramita
    @Krishna_Sharanam_Paramita 4 years ago

    What is data frame here?

    • @codebasics
      @codebasics  4 years ago

      Plz watch my pandas tutorials playlist. First one, you will get an idea