Python Pandas Tutorial (Part 9): Cleaning Data - Casting Datatypes and Handling Missing Values

  • Published on Dec 1, 2024

Comments • 181

  • @coreyms
    @coreyms  4 years ago +86

    Hey everyone. Hope you all had a great weekend! I will be traveling to Vancouver this week to visit a Quantum Computing company and learn more about the work they're doing, so I'm not sure when the next Pandas video will be ready for release. I will be working on it while I'm there, but I likely won't have it recorded and released until midway through next week. Let me know if anyone has any questions they would like me to ask them about Quantum Computing!

    • @harshvardhan1156
      @harshvardhan1156 4 years ago +7

      Hey, Corey. Thank you for everything. I am not from a Computer Science background. Out of curiosity I started learning to code, and here I am now, having done more than 20 data science projects. Your videos are literally the best. I have taken some high-priced courses, and I can undoubtedly say that your way of teaching is far more interactive, complete, and easy to grasp.
      I just want to know how you plan a course. For example, in the 1st or 2nd video you said that you would cover a topic in later videos. So do you prepare the whole content, practice it, deep dive into it, work out your own order, and then start teaching?
      It would be very helpful for me if you shared how you prepare for a topic.
      Thank you very much
      Love from INDIA

    • @harjotsinghbaidwan2204
      @harjotsinghbaidwan2204 4 years ago +4

      I have often seen, while using a DataFrame, that the column names are not at the same level, and this creates an issue when extracting values.
      Do you have any idea about it?

    • @JiminPark-ld2xx
      @JiminPark-ld2xx 2 years ago

      How do I download the dataset after cleaning my data using Jupyter Notebook online? Please answer.

  • @malikdiallo9976
    @malikdiallo9976 4 years ago +65

    I like this series on pandas. Thank you so much, Corey.

  • @ahammadshawki8
    @ahammadshawki8 4 years ago +140

    Please make a playlist on numpy after pandas.

    • @ChetanAnnam
      @ChetanAnnam 3 years ago +6

      Yeah please do that

  • @gauravmarwaha8466
    @gauravmarwaha8466 4 years ago +5

    This series on pandas is the most complete and informative series I've found to date!

  • @corben3348
    @corben3348 4 years ago +37

    Good teaching is an art... This playlist is so helpful ! Thank you for your work !

  • @saravanannatarajan6515
    @saravanannatarajan6515 4 years ago +64

    Corey, your teaching is awesome!!! Much appreciated!!!
    Expecting a series on Machine Learning/Deep Learning in the near future...

  • @ishanpand3y
    @ishanpand3y 4 years ago +12

    This is the most amazing series on Pandas ever. I just finished watching part 9. Sir, thank you so much for providing such great content. 🧡🤍💚

  • @ashishdeora8522
    @ashishdeora8522 4 years ago +13

    Thank you, Corey, for this. My parents urged me to join your community. They say you are doing a wonderful job. Thank you, Corey, for enabling us.

  • @adamgdev
    @adamgdev 4 years ago +2

    You never disappoint!! And I never have to speed you up because you keep a great pace with no BS! Thank you!!

  • @sayantanchakraborty75
    @sayantanchakraborty75 4 years ago +13

    Best series on Python Pandas . Thank you so much Mate. Love from India

  • @benhancock1541
    @benhancock1541 4 years ago +7

    Thanks for this Corey - your tutorials are always great! I've been using pandas for almost 2 years and still learned stuff 👍

  • @srivathsgondi191
    @srivathsgondi191 11 months ago

    Now that's a lovely explanation. I like how you showed that the function can be used in different scenarios!

  • @YeekyYeeky
    @YeekyYeeky 3 years ago

    can't wait for your numpy series , this channel is gold , Thank you Corey

  • @stanislawjarzynski6133
    @stanislawjarzynski6133 3 years ago +1

    You're a great teacher, Corey!

  • @gagansoni9665
    @gagansoni9665 4 years ago +3

    I understand your pandas tutorials very clearly. This is helping me a lot. Thank you so much, Corey. I wish to see your tutorials on machine learning using Python.

  • @zzzorgjanbatist564
    @zzzorgjanbatist564 4 years ago +2

    As usual Corey best of the best!!!

  • @mapa5000
    @mapa5000 1 year ago

    You really care about making a video addressing many scenarios and possible issues … that’s phenomenal !! … I really appreciate it … thank you so much!!

  • @rockeyvalley
    @rockeyvalley 4 years ago +1

    Great stuff Corey!!! Keep up the good work!

  • @kuls43
    @kuls43 4 years ago +6

    11:36 we can use df.replace(['NA', 'Missing'], np.nan, inplace=True) instead

    • @AtlasIndustries101
      @AtlasIndustries101 4 years ago +2

      He could have used it in the other df.replace(...) line too. But I think he is trying to keep it simple so that we can understand it easily.
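A minimal sketch of the suggestion in this thread, assuming a small made-up DataFrame (the column names and values here are illustrative, not the data from the video). Passing a list to replace converts several placeholder strings to NaN in one call. The result is assigned back rather than relying on inplace=True, which is a stylistic choice in this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'NA' and 'Missing' stand in for custom
# missing-value markers that pandas does not recognise on its own here.
df = pd.DataFrame({
    "first": ["Corey", "NA", "Jane"],
    "email": ["corey@example.com", "Missing", "NA"],
})

# One call converts both placeholder strings to real NaN values.
df = df.replace(["NA", "Missing"], np.nan)

# Count the missing values per column after the conversion.
print(df.isna().sum())
```

Once the placeholders are real NaN values, methods such as dropna, fillna, and isna treat them as missing.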

  • @codegeek8256
    @codegeek8256 4 years ago +3

    Hi @ Corey Schafer
    I am very happy with your teachings. These are great building blocks towards data science. I hope one day we arrive there.

  • @minxxdia1132
    @minxxdia1132 4 years ago +2

    Wow, this is the best playlist for Python pandas. Thank you so much!

  • @analyticswithothello8213
    @analyticswithothello8213 2 years ago

    Corey, you are teaching the best!

  • @darkmaraux
    @darkmaraux 4 years ago +1

    This video was so smooth! Right in the point! Thanks!!!

  • @ABDULKARIMHOMAIDI
    @ABDULKARIMHOMAIDI 1 month ago

    Thanks, man, for such a valuable series of videos. Please add more videos on new features in pandas!!!

  • @andreykaok9497
    @andreykaok9497 4 years ago +2

    Brilliant tutorials on Pandas!
    Very much looking forward to the time series lessons.

  • @zixinlee2165
    @zixinlee2165 4 years ago +2

    Thank you so much for creating these videos!! They're really valuable for self-learners like me.

  • @finncollins5696
    @finncollins5696 1 year ago +1

    Learnt a lot so far. Thanks so much, Corey.

  • @rauberhozenplotz7009
    @rauberhozenplotz7009 4 years ago +1

    Great content - great style of speaking and explaining - thank you!

  • @danielflorea3001
    @danielflorea3001 3 years ago

    Simple and clear explanations. Great job.

  • @002_priyanshugoswami5
    @002_priyanshugoswami5 4 years ago +3

    love you coreyyyyy best channel

  • @Ian-bb7vv
    @Ian-bb7vv 3 years ago

    I had to say, thank you!! I think you guys are really helping to close the gap in educational resources between the rich and the poor. Great job, and I hope you know that what you are doing is really meaningful.

  • @gayatriwaghmare6293
    @gayatriwaghmare6293 4 years ago +1

    The series is very helpful to me. Thank you sir.

  • @VikasGuptacherie
    @VikasGuptacherie 4 years ago +1

    Very helpful series with nice explanations !!!

  • @davebeckham5429
    @davebeckham5429 4 years ago +1

    Many thanks for sharing excellent tutorials Corey.

  • @TopicalAuthority
    @TopicalAuthority 4 years ago +1

    Great lesson!

  • @anubhavrauniyar3192
    @anubhavrauniyar3192 2 years ago

    We love you Corey Schafer!!!! Lots of love from India🥰

  • @kameshinipillay4587
    @kameshinipillay4587 2 years ago +1

    Thank you, learning so much :)

  • @saraghafelehbashi5808
    @saraghafelehbashi5808 2 years ago

    Much appreciated! Could you please make more videos like this, cleaning data and showing the different errors that come with it?
    It would be really helpful for juniors.

  • @lucasbartomioli7861
    @lucasbartomioli7861 7 months ago

    Man, i love you! Thanks a lot from Argentina!

  • @alexthewebdesigner1856
    @alexthewebdesigner1856 2 years ago

    @Corey Schafer
    Something told me that I'd better watch this video. Just when I thought that I'd sanitized a large data set, I realize now that there could potentially be some data (or missing data) that could crash my application. Great video. Thank you, Sir!

  • @Al-Ahdal
    @Al-Ahdal 4 years ago +2

    Boss, I request that you kindly make a comprehensive data analysis video series, covering all aspects in detail and all possible areas of data analysis. Your channel and videos are awesome. Great work indeed... 👍

  • @ahmedhosny3855
    @ahmedhosny3855 1 year ago

    Such great work done by you. I wish you all the best, man.

  • @njgaming4422
    @njgaming4422 10 months ago +1

    Instead of replacing each value separately, you can just pass a list of the strings that you want to replace.
    E.g.: df['YearsCode'].replace(['Less than 1 year','More than 50 years'],[0,51],inplace=True)
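A small sketch of the list-to-list form suggested above, assuming a made-up YearsCode column. Two details differ from the comment and are choices made for this sketch: the result is assigned back to the column instead of using inplace=True, and the replacements are the strings "0" and "51" so the column stays uniformly string-typed until the final cast to float.

```python
import pandas as pd

# Hypothetical sample of the survey column: numbers stored as strings,
# plus the two text answers mentioned in the comment.
df = pd.DataFrame(
    {"YearsCode": ["5", "Less than 1 year", "More than 50 years", "12"]}
)

# The two lists are matched by position: the first string becomes "0",
# the second becomes "51".
df["YearsCode"] = df["YearsCode"].replace(
    ["Less than 1 year", "More than 50 years"], ["0", "51"]
)

# Every value now parses as a number, so the cast succeeds.
df["YearsCode"] = df["YearsCode"].astype(float)

print(df["YearsCode"].mean())  # 17.0
```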

  • @Shkkmj6868
    @Shkkmj6868 4 years ago

    It's very useful .You are great at articulating . Thank you so much .

  • @samratsengupta8881
    @samratsengupta8881 4 years ago +2

    Thanks Corey, I have no words. As an aspiring data scientist, I found your pandas videos really cool.
    I don't know if you will ever read this, but this has helped and has put a smile on my 'confused about pandas' face.
    I have subscribed and will watch your videos on my way to becoming a self-taught data scientist.
    God Bless You

  • @kirannagar8295
    @kirannagar8295 4 years ago +1

    Hey , truly glad for your all series . If possible , please do make a course video on Pyspark .

  • @stressfreetrading1341
    @stressfreetrading1341 4 years ago +1

    Love the way u teach. thanks a lot... Love from India

  • @ajinzrathod
    @ajinzrathod 3 years ago

    Corey you are great.❤️
    Love from India ❤️

  • @juancarcelen3437
    @juancarcelen3437 4 years ago +2

    Hi Corey thank you so much for posting these videos. Your tutorials have helped me transition the concepts I know into actual useful code. I would like to test my progress and would really appreciate if you can put out a link with some data analysis projects (i.e. a database to download, questions to answer using data analysis, and the code that was written to answer those questions).
    Thank you so much and keep the videos coming you're an amazing teacher!!

  • @NikitaSharma-bs4gg
    @NikitaSharma-bs4gg 3 years ago

    That was such a good video- thank you for sharing

  • @kingjoshuamanatad2140
    @kingjoshuamanatad2140 4 years ago

    At 27:28 in the video, a one-line version: df['YearsCode'].replace(['Less than 1 year','More than 50 years'],[0,51], inplace=True). Correct me if I'm wrong, I'm new to Python. But great video again, Corey! Hats off!

    • @vladimirwimmer11
      @vladimirwimmer11 6 months ago

      This does not work anymore, as Corey was mentioning. Use this instead: df.replace({'YearsCode': {'Less than 1 year': 0, 'More than 50 years': 51}}, inplace=True)
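A hedged sketch of the nested-dict form from the reply above, on a made-up column. The likely reason the one-liner stopped working is that, in recent pandas versions, calling replace(..., inplace=True) on a column selected with df['YearsCode'] may act on a temporary object and leave df unchanged. Calling replace on the DataFrame itself avoids that. The explicit object dtype and the assignment back to df are assumptions made for this sketch.

```python
import pandas as pd

# Hypothetical sample column; dtype=object keeps mixed str/int values valid.
df = pd.DataFrame(
    {"YearsCode": ["5", "Less than 1 year", "More than 50 years", "12"]},
    dtype=object,
)

# Outer key: column name. Inner dict: old value -> new value.
df = df.replace(
    {"YearsCode": {"Less than 1 year": 0, "More than 50 years": 51}}
)

# Cast the cleaned column to float so numeric methods such as mean() work.
df["YearsCode"] = df["YearsCode"].astype(float)

print(df["YearsCode"].tolist())
```

The nested dict limits the replacement to the named column, so the same strings in other columns would be left alone.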

  • @dadoll1660
    @dadoll1660 4 years ago +4

    This is gold.

  • @robertmnganya7533
    @robertmnganya7533 3 years ago

    Excellent teaching. Thank you.

  • @manishgpt25
    @manishgpt25 3 years ago

    thanks a ton for this series..helped a lot in clearing concepts!!

  • @FakeAccount
    @FakeAccount 4 years ago +2

    You're a legend, my guy.

  • @haiderali2050
    @haiderali2050 4 years ago

    Thank you so much. I have learnt a lot and am able to automate my daily Excel routine work.

  • @aegystierone8505
    @aegystierone8505 4 years ago +1

    Please do a video about your visit to the Quantum Computing trip in Vancouver!

  • @codewithluq
    @codewithluq 4 years ago +2

    Thank you Corey again. My resume is getting more interesting everyday. Viva

  • @teetanrobotics5363
    @teetanrobotics5363 4 years ago +3

    I love your tutorials. Could you also make tutorials for scipy and scikit learn?

  • @ironpolux
    @ironpolux 3 years ago

    Great vid, pls do one on multiple indexes!

  • @JoKaR80-d5r
    @JoKaR80-d5r 3 years ago

    These are awesome! Thanks a million!

  • @mohamedikbalguetout32
    @mohamedikbalguetout32 3 years ago

    Hey bro, I always find the solutions in your videos. Thanks, man.

  • @stephanie_ong
    @stephanie_ong 4 years ago

    Thanks again for such a helpful video.

  • @mikkybricks
    @mikkybricks 4 years ago +2

    Thanks Corey

  • @interestingstudies4422
    @interestingstudies4422 3 years ago

    Amazing video...solved my problems ☺️☺️🙏🏻

  • @athas12
    @athas12 1 year ago

    for the last part of the video, you can actually create two lists and use these lists in replace method to change all values at once. It is slightly easier especially if the df has multiple values to replace

  • @arkahm
    @arkahm 4 years ago +1

    Great video! How about a video on splitting data and passing the split into a function? That would be great!

  • @andr101
    @andr101 4 years ago +1

    great series, thanks!

  • @mohammedkaifmirza7585
    @mohammedkaifmirza7585 2 years ago

    Amazing tutorial 😍👌

  • @varunkrishnaKyathanpally
    @varunkrishnaKyathanpally 4 years ago

    Thank you , excellent tutorial as always :)

  • @muntadher8087
    @muntadher8087 2 years ago

    Thank you so much!! You are the best

  • @gauravmarwaha8466
    @gauravmarwaha8466 4 years ago +1

    good video again..!! thanks a lot

  • @litan5006
    @litan5006 2 years ago

    Good pandas video. Thank you

  • @muntadher8087
    @muntadher8087 2 years ago

    Using df.fillna("Unfilled", inplace=True) to replace the missing values is good practice, I believe. For me it's easier than replace and more dynamic.
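A minimal sketch of the fillna approach from the comment above, on made-up data. It assumes the missing values are already real NaN values; fillna does not touch placeholder strings such as 'Missing'. The result is stored in a new variable here rather than using inplace=True.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with real missing values.
df = pd.DataFrame({
    "first": ["Corey", np.nan],
    "email": [np.nan, "jane@example.com"],
})

# Every NaN in every column is replaced by the same marker string.
filled = df.fillna("Unfilled")

print(filled)
```

Note that a string fill value also lands in numeric columns, which changes them to a non-numeric dtype, so this fits text columns best.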

  • @bharaths1396
    @bharaths1396 4 years ago +1

    Your content is awesome!
    How do I replace NaN values with other values only in a particular column?
    Please help.
    Thank you
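One way to approach the question above, sketched on made-up data: fillna accepts a dict that maps a column name to its fill value, so only the named column is changed. The column names and the fill value 0 are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with NaN in two different columns.
df = pd.DataFrame({
    "age": [33.0, np.nan, 63.0],
    "last": ["Schafer", np.nan, "Doe"],
})

# Only 'age' is listed, so the NaN in 'last' is left untouched.
df = df.fillna({"age": 0})

print(df)
```

An equivalent form works on the single column directly: df["age"] = df["age"].fillna(0).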

  • @shivavijaya1537
    @shivavijaya1537 4 years ago +1

    Hi Corey, please post a video on python sys module

  • @KimJennie-fl3sg
    @KimJennie-fl3sg 4 years ago

    This also works if we want to drop a column when the row at index 0 or 1 has NaN:
    df.dropna(axis='columns', how='any', subset=[0, 1])
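A short sketch of the call from the comment above, using a made-up three-column DataFrame. With axis='columns', the subset argument names row labels, so only rows 0 and 1 are inspected when deciding which columns to drop.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each column has exactly one NaN, in a different row.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan],   # NaN only in row 2 (outside the subset)
    "b": [np.nan, 2.0, 3.0],   # NaN in row 0
    "c": [1.0, np.nan, 3.0],   # NaN in row 1
})

# Drop a column if row 0 or row 1 holds a NaN; row 2 is ignored.
result = df.dropna(axis="columns", how="any", subset=[0, 1])

print(list(result.columns))  # ['a']
```

Changing how='any' to how='all' would drop a column only if both row 0 and row 1 were NaN.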

  • @quoit99training83
    @quoit99training83 4 years ago

    Amazing series. Hi Corey, how many parts do you think will end up in this playlist? Thank you for helping the community :)

  • @shanghainewbison7687
    @shanghainewbison7687 1 year ago

    Another way to change the dtype of df["YearsCode"]:
    def change_to_float(string):
        try:
            # Values that parse as numbers are converted
            return float(string)
        except (ValueError, TypeError):
            # Anything that cannot be parsed becomes NaN
            return np.nan
    df['YearsCode'] = df['YearsCode'].apply(change_to_float)
    df['YearsCode'].mean()

    • @vipulsuthar3796
      @vipulsuthar3796 11 months ago

      Thanks man! Was stuck on this... really appreciated!!
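As a possible alternative to the hand-written converter in this thread, pandas ships pd.to_numeric, whose errors="coerce" option turns unparseable values into NaN. The sketch below uses made-up data. Note that, like the function above, it discards the text answers instead of mapping them to 0 and 51, so the mean is computed over the numeric answers only.

```python
import pandas as pd

# Hypothetical sample column: numeric strings, a text answer, and a gap.
df = pd.DataFrame({"YearsCode": ["5", "Less than 1 year", "12", None]})

# Strings that cannot be parsed as numbers become NaN instead of raising.
df["YearsCode"] = pd.to_numeric(df["YearsCode"], errors="coerce")

# mean() skips NaN by default, so only 5 and 12 are averaged.
print(df["YearsCode"].mean())  # 8.5
```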

  • @saqibhussain1354
    @saqibhussain1354 4 years ago +1

    Great video - I wonder if you can do a few on the business side like freelancing and how to get clients as python developers?

  • @noureddineettayyeby5210
    @noureddineettayyeby5210 4 years ago +2

    Thank you

  • @stayinawesum
    @stayinawesum 4 years ago +1

    can you make video explaining:
    primitive data types vs data types vs adt vs data structure

  • @dhanraj112
    @dhanraj112 4 years ago +3

    Does Brilliant give certificates after completion of a course?

  • @harishrudroju1379
    @harishrudroju1379 4 years ago +1

    Hi Corey, can you please make a video on how to bypass a CAPTCHA while scraping a website?

  • @bobsalita3417
    @bobsalita3417 4 years ago +1

    Can you use join() or merge() to do multiple replacements?

  • @nikhilb3880
    @nikhilb3880 4 years ago

    I love this series man, more than you could expect.
    If I may ask, what state and country are you from? Because I saw snow on your 2nd channel and now I'm confused about whether you live in the USA or in a European country.
    Thanks again for this series

    • @coreyms
      @coreyms  4 years ago +1

      Hey there. I currently live in Greenville SC in the United States. The snow videos were likely from Boulder Colorado where I lived for several years.

  • @prasad1686
    @prasad1686 4 years ago +2

    Hi Corey, your videos are great. I am a beginner. Can you please tell me how to get a cheat sheet or the .py scripts for your video playlist "Python Tutorials" (1 to 136), to speed up learning, as I am slow at typing? Thank you.

  • @nayeemuddinmoinuddin2186
    @nayeemuddinmoinuddin2186 4 years ago

    @Corey Schafer - Please do a video series on PySpark.

  • @adarshtiwari7395
    @adarshtiwari7395 3 years ago

    That is BRILLIANT

  • @KevinTempelx
    @KevinTempelx 4 years ago

    Thank you!

  • @ADNANAHMED-eo5xx
    @ADNANAHMED-eo5xx 4 years ago

    Please continue the series sir

  • @kinjalvora256
    @kinjalvora256 4 years ago +1

    Hi Corey,
    Thanks for the awesome series. While I have not yet finished the series, I would like to know, how we can deal with duplicates.
    If you have a column let's say with duplicate apps and the apps have reviews, size, installations and you want to let's say get a mean for the reviews, take the first size and sum of the installations and merge the rest of the columns for those apps as they were, like Ratings. How would one do that?
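One possible approach to the question above, sketched with groupby and named aggregation on made-up data. The column names (App, Reviews, Size, Installs, Rating) and values are assumptions based on the wording of the question. Each duplicated app collapses to one row, with a different aggregation per column.

```python
import pandas as pd

# Hypothetical data: app "A" appears twice, app "B" once.
df = pd.DataFrame({
    "App": ["A", "A", "B"],
    "Reviews": [10, 20, 5],
    "Size": ["19M", "21M", "8M"],
    "Installs": [100, 300, 50],
    "Rating": [4.1, 4.1, 3.9],
})

# One output row per app: mean of reviews, first size, sum of installs,
# and the first rating carried over as-is.
summary = df.groupby("App", as_index=False).agg(
    Reviews=("Reviews", "mean"),
    Size=("Size", "first"),
    Installs=("Installs", "sum"),
    Rating=("Rating", "first"),
)

print(summary)
```

If the duplicate rows are exact copies, df.drop_duplicates() is the simpler tool. The groupby form is for duplicates whose other columns differ and need combining.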

  • @anshuldwivedi7210
    @anshuldwivedi7210 3 years ago +1

    Great series, but I must say I find R much easier to understand since there are not as many exceptions as there are in python. Everything is a function and there is no concept of method (at least that I'm aware of)

  • @ADNANAHMED-eo5xx
    @ADNANAHMED-eo5xx 4 years ago +1

    Thanx a lot

  • @leoeghosa6977
    @leoeghosa6977 1 year ago

    Hello Corey,
    In your video, the first
    df.dropna() drops rows 4 and 5, according to your explanation because the how argument is set to 'any'. But row 6 also contains the string 'Missing', and 'Anonymous' is also a string. Why doesn't row 6 get dropped? What differentiates the two strings?
    Thank you, awesome lessons
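A sketch that may help with the question above: dropna only removes rows holding real missing values (NaN or None), and a plain string such as 'Missing' counts as ordinary data. One way to have such strings treated as missing is the na_values argument of read_csv. The CSV text below is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV: the second row uses the string 'Missing' as a marker.
csv_text = "first,email\nCorey,corey@example.com\nMissing,Anonymous\n"

# Read as-is: 'Missing' is just a string, so dropna keeps both rows.
df = pd.read_csv(io.StringIO(csv_text))
kept_before = len(df.dropna())

# Tell the parser that 'Missing' means NaN: now dropna removes that row.
cleaned = pd.read_csv(io.StringIO(csv_text), na_values=["Missing"])
kept_after = len(cleaned.dropna())

print(kept_before, kept_after)  # 2 1
```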

  • @d_omar1468
    @d_omar1468 4 years ago

    great job brow

  • @shadmanmartinpiyal4057
    @shadmanmartinpiyal4057 6 months ago

    excellent!

  • @md-ayaz
    @md-ayaz 4 years ago

    @Corey Schafer Can you make video on getting started on Open Source Contribution?

  • @SusanAmberBruce
    @SusanAmberBruce 4 years ago +2

    Corey, do you happen to know which Linux distros currently ship with Python 3?

  • @md.abdullahalmasum4942
    @md.abdullahalmasum4942 3 years ago

    thank you sir .