Exploratory Data Analysis in Pandas | Python Pandas Tutorials

  • Published 8 Feb 2025
  • Take my Full Python Course Here: www.analystbui...
    In this series we will be walking through everything you need to know to get started in Pandas! In this video, we learn about Exploratory Data Analysis in Pandas.
    Dataset in GitHub:
    github.com/Ale...
    Code in GitHub: github.com/Ale...
    Favorite Pandas Course:
    Data Analysis with Pandas and Python - bit.ly/3KHMLlu
    ____________________________________________
    SUBSCRIBE!
    Do you want to become a Data Analyst? That's what this channel is all about! My goal is to help you learn everything you need in order to start your career or even switch your career into Data Analytics. Be sure to subscribe to not miss out on any content!
    ____________________________________________
    RESOURCES:
    Coursera Courses:
    📖Google Data Analyst Certification: coursera.pxf.i...
    📖Data Analysis with Python - coursera.pxf.i...
    📖IBM Data Analysis Specialization - coursera.pxf.i...
    📖Tableau Data Visualization - coursera.pxf.i...
    Udemy Courses:
    📖Python for Data Analysis and Visualization - bit.ly/3hhX4LX
    📖Statistics for Data Science - bit.ly/37jqDbq
    📖SQL for Data Analysts (SSMS) - bit.ly/3fkqEij
    📖Tableau A-Z - bit.ly/385lYvN
    Please note I may earn a small commission for any purchase through these links - Thanks for supporting the channel!
    ____________________________________________
    BECOME A MEMBER -
    Want to support the channel? Consider becoming a member! I do monthly livestreams, and you get some awesome emojis to use in chat and comments!
    / @alextheanalyst
    ____________________________________________
    Websites:
    💻Website: AlexTheAnalyst.com
    💾GitHub: github.com/Ale...
    📱Instagram: @Alex_The_Analyst
    ____________________________________________
    0:00 Intro
    1:51 First Look at Data
    3:45 Info()
    4:40 Describe()
    5:47 Counting all Null Values
    7:09 Count of Unique Values
    8:15 Sorting on Values
    10:40 Correlation between Columns
    11:53 Heatmap using Seaborn
    14:43 Grouping Data
    25:02 Visualizing Grouped Data
    26:17 Boxplots for Outliers
    29:07 Data Types of Columns
    30:41 Outro
    All opinions or statements in this video are my own and do not reflect the opinion of the company I work for or have ever worked for
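The chapter list above follows a common pandas EDA pass. Here is a minimal sketch of those steps on a small invented table. The column names only imitate the video's world-population dataset; none of the values come from it. The sketch assumes pandas 2.x, which is why `numeric_only=True` appears, and it uses matplotlib for the boxplot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Invented data; column names imitate the video's dataset.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Continent": ["Asia", "Asia", "Europe", "Europe", "Oceania"],
    "2022 Population": [1000, 2500, 800, None, 300],
    "1970 Population": [400, 1200, 700, 650, 120],
    "Growth Rate": [1.02, 1.01, 0.99, 1.00, 1.03],
})

df.info()                                  # dtypes and non-null counts per column
summary = df.describe()                    # count, mean, std, min, quartiles, max of numeric columns
null_counts = df.isnull().sum()            # missing values per column
unique_counts = df.nunique()               # distinct values per column
top3 = df.sort_values(by="2022 Population", ascending=False).head(3)  # NaN rows sort last
corr = df.corr(numeric_only=True)          # correlations between numeric columns only
by_continent = df.groupby("Continent").mean(numeric_only=True)
ax = df.boxplot(column="Growth Rate")      # boxplot to eyeball outliers
dtypes = df.dtypes                         # "2022 Population" is float64 because of the missing value
plt.close("all")
```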

Comments • 221

  • @santiagofajardo4949
    @santiagofajardo4949 1 year ago +130

    Hello,
    At 24:24, I managed to reverse the range of column names using [5:13][::-1]. The slice [::-1] reverses a sequence, and it is very useful:
    df2 = df.groupby('Continent')[df.columns[5:13][::-1]].mean(numeric_only=True).sort_values(by='2022 Population', ascending=False)
    df2
    Thank you very much, Mr. Alex, for these tutorials.

    • @WorkJob-g3o
      @WorkJob-g3o 1 year ago +1

      Thank You!

    • @renanz21
      @renanz21 1 year ago +5

      Alternatively, count the columns backwards:
      df2 = df.groupby("Continent")[df.columns[-5:-13:-1]].mean().sort_values(by='2022 Population', ascending=False)
      df2

    • @AlexisTeseyra-mj4ft
      @AlexisTeseyra-mj4ft 4 months ago +2

      or df3.plot().invert_xaxis()

    • @kunalbolar2488
      @kunalbolar2488 2 months ago

      thx
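The reversal tricks in this thread all select the same eight population columns, just in opposite order. Here is a minimal check on an invented one-row frame. It assumes the video's 17-column layout: five leading columns, eight population columns from 2022 down to 1970, and four trailing columns. The trailing column names are assumptions.

```python
import pandas as pd

years = [2022, 2020, 2015, 2010, 2000, 1990, 1980, 1970]
cols = (["Rank", "CCA3", "Country", "Capital", "Continent"]
        + [f"{y} Population" for y in years]
        + ["Area (km²)", "Density (per km²)", "Growth Rate", "World Population Percentage"])
df = pd.DataFrame([[1, "XXX", "x", "x", "Asia"] + list(range(8)) + [10, 1.0, 1.0, 0.1]],
                  columns=cols)

a = list(df.columns[5:13][::-1])                      # slice the Index, then reverse it
b = list(df.columns[-5:-13:-1])                       # count backwards from the end
c = list(df.columns[12:4:-1])                         # positive bounds with step -1
d = list(reversed(df.columns[5:13]))                  # built-in reversed()
e = list(df[df.columns[5:13]].iloc[:, ::-1].columns)  # reverse after selecting
```

Any of these can stand in for the hand-typed list inside `df.groupby('Continent')[...]`. The negative-index form only works if the number of trailing columns stays the same.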

  • @pbp7
    @pbp7 1 year ago +57

    Man, “Oceania” was so funny 😂, tks for the class!

  • @OkallTheAnalyst
    @OkallTheAnalyst 11 months ago +51

    In case you run into an error at 11:12, add numeric_only=True to the corr() call, i.e. df.corr(numeric_only=True).

    • @mananagrawal4114
      @mananagrawal4114 10 months ago

      thanks man !

    • @Hamzahnahmad
      @Hamzahnahmad 6 months ago

      thank you. really helpful!

    • @usmanhammed7158
      @usmanhammed7158 5 months ago

      Thank you

    • @jaymanhire
      @jaymanhire 2 months ago

      Thanks! Never seen that before!

    • @ruthbeaubrun6954
      @ruthbeaubrun6954 2 months ago

      thank you!!
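Here is why `numeric_only=True` is needed. pandas 2.0 changed the default of `numeric_only` to False for methods such as `DataFrame.corr()`. As a result, string columns like CCA3 are no longer silently dropped, and the conversion to float fails. A small sketch on invented data:

```python
import pandas as pd

df = pd.DataFrame({
    "CCA3": ["AFG", "ALB", "DZA"],
    "2022 Population": [41, 3, 45],
    "1970 Population": [10, 2, 13],
})

error_message = None
try:
    df.corr()  # pandas >= 2.0 also tries "CCA3" and cannot convert "AFG" to float
except ValueError as err:
    error_message = str(err)

corr = df.corr(numeric_only=True)  # uses only the two population columns
```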

  • @JW-pu1uk
    @JW-pu1uk 1 year ago +37

    This is absolutely top tier content. I can't stress this enough to people new, or going into the DA/DS field: you WILL be exploring and cleaning data sets much more than you will be visualizing and building models.
    Thanks for this, Alex!

  • @AlastorGarcia
    @AlastorGarcia 1 year ago +15

    Thanks Alex! Right now I'm applying for my first DA job, and you have no idea how useful your videos have been for me!!

    • @ermano5586
      @ermano5586 1 year ago +2

      Hey, how is it going? Did you succeed in applying for the job you wanted?

  • @lajota-7
    @lajota-7 1 year ago +17

    Oceania is one of the 7 continents (North America, South America, Europe, Asia, Africa, Oceania, Antarctica). It's basically Australia and the countries (islands) around it.
    Hope that helps!

  • @kartikgupta370
    @kartikgupta370 1 year ago +12

    We can also write this to avoid typing all the column names in the list:
    df2 = df.groupby('Continent')[df.columns[12:4:-1]].mean(numeric_only=True).sort_values(by='2022 Population', ascending=False)

  • @sj1795
    @sj1795 1 year ago +3

    EXCELLENT SUPERB video!! I can't believe it--I'm 6/7 videos away from the end of your FANTASTIC bootcamp series! Wahoo! I learned a lot in this video. :) As for "ending on a low note", hardly Alex lol All your content is uplifting and rewarding! As always, THANK YOU!

  • @satrapech6107
    @satrapech6107 1 year ago +45

    The fix for df.corr() is:
    import numpy as np
    numeric_columns = df.select_dtypes(include=[np.number])
    correlation_matrix = numeric_columns.corr()
    correlation_matrix

    • @pradiptanugraha6841
      @pradiptanugraha6841 1 year ago +1

      Thanks, it works. Why isn't df.corr() working for me?

    • @rajkumarjadi7061
      @rajkumarjadi7061 1 year ago

      thanks man.

    • @ciareghi
      @ciareghi 1 year ago +58

      df.corr(numeric_only = True)
      worked for me

    • @arrofifahmi7708
      @arrofifahmi7708 1 year ago +2

      @@ciareghi me too mate! Thanks a lot!

    • @SDMNKhan
      @SDMNKhan 1 year ago

      name 'np' not defined?
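The `select_dtypes` route needs NumPy imported for `np.number`, which explains the "name 'np' is not defined" error. The string alias `"number"` avoids the import altogether. Here is a sketch on invented data showing that both selections match and give the same matrix as `corr(numeric_only=True)`:

```python
import numpy as np  # required for np.number
import pandas as pd

df = pd.DataFrame({
    "CCA3": ["AFG", "ALB", "DZA", "AND"],
    "2022 Population": [41.0, 2.8, 44.9, 0.08],
    "Growth Rate": [1.03, 0.99, 1.02, 1.01],
})

numeric_np = df.select_dtypes(include=[np.number])  # NumPy type object
numeric_str = df.select_dtypes(include="number")    # string alias, no NumPy import needed
corr_a = numeric_np.corr()                          # note the call: corr(), not corr
corr_b = df.corr(numeric_only=True)
```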

  • @pradiptisimkhada292
    @pradiptisimkhada292 1 year ago +5

    I just finished all the videos in your bootcamp playlist a few hours ago, and I'm excited to do this again.

  • @adekeyedamola320
    @adekeyedamola320 months ago +4

    If you are getting an error at 16:59, that's because you are taking the mean of data types other than numbers. To resolve this, include numeric_only=True in your mean call, as in "df.groupby('Continent').mean(numeric_only=True)"

    • @jh1896
      @jh1896 months ago

      you're the GOAT
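The same pandas 2.0 default change affects `groupby().mean()`. If a string column such as Country is still in the frame, it raises a TypeError. Here is a sketch on invented data showing two workarounds: pass `numeric_only=True`, or pick the numeric columns before aggregating.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Continent": ["Asia", "Asia", "Europe", "Europe"],
    "2022 Population": [100, 300, 50, 70],
})

# A plain .mean() would also try to average the string column "Country".
means = df.groupby("Continent").mean(numeric_only=True)

# Alternative: choose the columns to average before aggregating.
means_explicit = df.groupby("Continent")[["2022 Population"]].mean()
```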

  • @abhishekchaudhary7913
    @abhishekchaudhary7913 1 year ago +4

    At 26:11, where Alex sorts the years manually, you can sort them directly with this command:
    df4 = df3.sort_index(ascending=True)
    df4

  • @MaximKazartsev
    @MaximKazartsev 1 year ago +5

    Alex, thank you for this great video and everything you do!
    To avoid ordering the population years manually, you can combine df.columns with the built-in reversed(). The whole construction looks like this:
    df2 = df.groupby('Continent')[list(reversed(df.columns[5:13]))].mean().sort_values(by='2022 Population', ascending=False)
    And it works )

  • @DEDE-ix9lg
    @DEDE-ix9lg 1 year ago +1

    I always enjoy Alex's videos. He makes some of the best videos, while some other channels can be a real headache.

  • @shankarmidatala2049
    @shankarmidatala2049 6 months ago +1

    Namaste! I found your tutorials "Simple, Easy to follow, and To the point". Thanks.

  • @DuckingDuck-th2lt
    @DuckingDuck-th2lt 1 year ago +12

    Hello, Alex!
    Once again, thanks a lot for all your hard work!
    At 13:10 I got the error "ValueError: 'box_aspect' and 'fig_aspect' must be positive".
    I solved it by putting plt.rcParams BEFORE sns.heatmap.
    The other problem was that some functions didn't work until I added the parameter numeric_only=True, e.g., df.corr(numeric_only=True) or .mean(numeric_only=True).
    Hope it can help someone!

    • @yanpaucon1043
      @yanpaucon1043 9 months ago

      Thank you, You are the Best!

    • @alexishuynh
      @alexishuynh 4 months ago

      It certainly helped. Thank you, DuckingDuck.
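The ordering matters because `plt.rcParams["figure.figsize"]` only applies to figures created after it is set. Here is a minimal sketch with invented data, assuming seaborn and matplotlib are installed. Calling `plt.figure(figsize=(20, 7))` just before the heatmap has the same effect for a single figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; in Jupyter the figure is shown inline instead
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "CCA3": ["AFG", "ALB", "DZA", "AND"],
    "2022 Population": [41.0, 2.8, 44.9, 0.08],
    "1970 Population": [10.8, 2.3, 13.8, 0.02],
    "Growth Rate": [1.03, 0.99, 1.02, 1.01],
})

plt.rcParams["figure.figsize"] = (20, 7)  # set the default size BEFORE any figure exists
ax = sns.heatmap(df.corr(numeric_only=True), annot=True)
fig = ax.get_figure()
plt.show()
```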

  • @frenamakenson9844
    @frenamakenson9844 11 months ago +28

    Hello,
    100000000 thanks for sharing
    For the correlation part at around 11 min:
    df.corr(numeric_only=True)  # pass numeric_only to avoid the error

  • @kogureyoeh
    @kogureyoeh 1 year ago +7

    At 24:00
    you can simply add .sort_index() to df3 = df2.transpose(), so that we don't have to rearrange the columns manually.
    df3 = df2.transpose().sort_index() worked on my end; hopefully it works on yours too.
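`sort_index()` works here because every label starts with a four-digit year, so plain string ordering is also chronological. Labels in a different format (say "Pop 1970") would not sort by year this way. Here is a small sketch on invented numbers:

```python
import pandas as pd

# Invented values; rows are continents, columns are population years (newest first).
df2 = pd.DataFrame(
    {"2022 Population": [4.7, 1.4], "2000 Population": [3.7, 0.8], "1970 Population": [2.1, 0.4]},
    index=pd.Index(["Asia", "Africa"], name="Continent"),
)

df3 = df2.transpose().sort_index()  # years become the index and sort as strings
```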

  • @toygar8699
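    @toygar8699 1 year ago +29

    For those who get an error in the heatmap:
    import matplotlib.pyplot as plt
    import seaborn as sns
    numeric_columns = df.select_dtypes(include='number')  # ints and floats, so the population columns are kept
    plt.rcParams['figure.figsize'] = (20, 7)  # set before plotting so it applies
    sns.heatmap(numeric_columns.corr(), annot=True)
    plt.show()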

    • @asmitaupadhyay4656
      @asmitaupadhyay4656 10 months ago

      thank you

    • @nointernetnarwhal7615
      @nointernetnarwhal7615 10 months ago

      THANK YOU!!!!!! I almost quit for good.

    • @nassrmohamed278
      @nassrmohamed278 9 months ago

      I had this error in corr: "could not convert string to float: 'AFG'"
      Do you know how to solve this?

    • @kaleabgirma-x2b
      @kaleabgirma-x2b 9 months ago

      thanks a lot toygar

    • @yanpaucon1043
      @yanpaucon1043 9 months ago

      @@nassrmohamed278 df.corr(numeric_only=True)

  • @Zenitsu-mq7fq
    @Zenitsu-mq7fq 10 months ago +3

    24:50
    df2 = df.groupby('Continent').mean(numeric_only=True).iloc[:, -5:-13:-1].sort_values(by = '1970 Population', ascending = False)
    df2 = df2.transpose()
    df2.plot()
    This way we avoid copy-pasting and reordering the columns; we just use reversed indexes :)

  • @ngwamalfred8151
    @ngwamalfred8151 1 year ago +1

    Where would I have been without this video?

  • @quotesdiary310
    @quotesdiary310 1 year ago +2

    Hi Alex
    Thank you so much for your support for freshers in the field of data analytics.

  • @Inc0gnit030
    @Inc0gnit030 1 year ago +1

    I really enjoyed this introduction to Pandas! Keep up the good work!

  • @tranguyen4462
    @tranguyen4462 10 months ago

    omg I laughed out loud at the "Oceania" part ;)))) Alex is so funny and brutally honest about things he didn't know ;)))

  • @АлександрПокладов-х8т
    @АлександрПокладов-х8т 1 year ago +1

    Hey, just a quick note: when we plot the populations, each line is read against a scale set by the largest populations, so growth in the smaller continents looks flat. In fact, Oceania's population, for example, increased around 2.5 times.
    Anyway, thanks for the content, it's amazing
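One way to act on this note is to divide every year by a baseline year. Each continent is then read as growth relative to itself rather than against the largest populations. Here is a sketch with invented numbers (not the dataset's). It assumes the years are the rows and the continents the columns, as in df3 from the video.

```python
import pandas as pd

# Invented populations: rows are years, columns are continents.
df3 = pd.DataFrame(
    {"Asia": [2000, 3000, 4000], "Oceania": [20, 35, 45]},
    index=[1970, 2000, 2022],
)

growth = df3 / df3.loc[1970]  # divide each row by the 1970 row, column by column
```

Calling `growth.plot()` then puts every continent on the same relative scale, starting at 1.0.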

  • @keluargaindo-timordiuk
    @keluargaindo-timordiuk 1 year ago +7

    For grouping the data, I do:
    df2 = df.drop(columns=['CCA3','Country','Capital'])
    df3=df2.groupby('Continent').mean(numeric_only=True).sort_values(by="2022 Population",ascending=False)
    df3
    to get to the same output as seen in the video

    • @danielmariobuchberger
      @danielmariobuchberger 1 year ago

      Me too. This should be explained, because you can't easily take the mean of strings... that is mostly the problem!

    • @bolajiawofuwa8116
      @bolajiawofuwa8116 1 year ago

      THANK YOU!!!!!!

  • @claudiotomasvaldespinuer4588
    @claudiotomasvaldespinuer4588 11 days ago

    Hey! One of the best videos I've ever seen. Regards from Chile

  • @nadarioferguson6276
    @nadarioferguson6276 9 months ago

    Thank you so much for this. I really enjoyed it and learned a lot of what I had forgotten a few years ago.

  • @abdulsami6117
    @abdulsami6117 1 year ago

    Love from Pakistan Alex, Really Helpful and Enjoyable.
    I also like the OOPS sound you make 😂😂

  • @mirzashahbaz5336
    @mirzashahbaz5336 months ago

    25:13: you can use negative slicing to reverse the plot.

  • @neildelacruz6059
    @neildelacruz6059 1 year ago +1

    Thank you Alex this is very helpful.

  • @Charlay_Charlay
    @Charlay_Charlay 1 year ago

    Thank you for the Pandas class!

  • @moniquebrasilbaptista1989
    @moniquebrasilbaptista1989 1 year ago

    I am sure I am going to use some of these tips. Thank you!😍❤

  • @adityavamsi12
    @adityavamsi12 3 months ago

    Love from India❤❤

  • @aayushitrivedi3481
    @aayushitrivedi3481 1 year ago +2

    love your videos alexx ;)

  • @СергейСтуднев
    @СергейСтуднев 1 year ago +1

    Thank you for the useful information!

  • @aishwaryapattnaik3082
    @aishwaryapattnaik3082 1 year ago +2

    Thanks a lot for this clear cut explanation. Can you make something similar for NLP projects end to end ?

  • @sarayusemesta6132
    @sarayusemesta6132 8 months ago

    26:00
    You can just add this to invert the columns:
    df2 = df.groupby('Continent')[df.columns[5:13]].mean(numeric_only=True).sort_values('2022 Population', ascending=False)
    df2_inverted = df2.iloc[:, ::-1]
    df2_inverted

  • @staquatica1607
    @staquatica1607 1 year ago +49

    I got some errors (using PyCharm) that I solved by using numeric_only=True. For instance: df.corr(numeric_only=True) and df.groupby("Continent").mean(numeric_only=True)

    • @mohammedshadaabkhan3228
      @mohammedshadaabkhan3228 1 year ago +6

      Hey use this code instead
      numeric_df = df.select_dtypes(include='number') # Select only numeric columns
      plt.figure(figsize=(20, 7)) # Set the figure size
      sns.heatmap(numeric_df.corr(), annot=True) # Create the heatmap with annotations
      plt.show()

    • @DevanshAsawa
      @DevanshAsawa 1 year ago +1

      helped a ton thanks

    • @haley2486
      @haley2486 1 year ago +1

      Thanks for posting! I had to do SHIFT+TAB on the corr() function to find out how to get only numeric values.

    • @nassrmohamed278
      @nassrmohamed278 9 months ago +1

      thaaaaaaaaaaaaaaank youuuuuuuuuuuuuuuuu

  • @vitorribeirosa
    @vitorribeirosa 1 year ago +1

    Neat...
    Thanks for sharing this content.
    Cheers

  • @SoggyBagelz
    @SoggyBagelz 1 year ago +3

    Lets goo!

  • @LaMeeLifestyle
    @LaMeeLifestyle 1 year ago +4

    Thanks for all you do. I'm loving the bootcamp. I just finished the Excel project. Could you please make a video on storytelling?

  • @kevindeschepper8140
    @kevindeschepper8140 7 months ago

    To exclude Rank from the numeric data: columns_to_include = df.select_dtypes(include=['number']).columns.difference(['Rank'])

  • @jeffrey6124
    @jeffrey6124 6 months ago +1

    Hope you also make a Pyspark series 🤓

  • @elfridhasman4181
    @elfridhasman4181 1 year ago +1

    Thank you Alex💯🔥

  • @TheRobinCreations
    @TheRobinCreations 1 year ago

    Thank you so much it was very informative.

  • @anuarroho2561
    @anuarroho2561 6 months ago +4

    mean(numeric_only=True)

  • @jjsan1
    @jjsan1 9 months ago

    This is great! Thank you!

  • @kevindeschepper8140
    @kevindeschepper8140 7 months ago

    Another way to select the columns (think of big data sets, where indexing by number would be challenging):
    columns_to_include_2 = df.select_dtypes(include=['number']).filter(like='population').columns

    • @kevindeschepper8140
      @kevindeschepper8140 7 months ago

      columns_to_include_2 = df.select_dtypes(include=['number']).filter(like='Population').columns.difference(["World Population Percentage"]) :P
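This thread turns on two details. First, `DataFrame.filter(like=...)` is a case-sensitive substring match, which is why 'population' finds nothing while 'Population' works. Second, `Index.difference()` returns its result sorted. Here is a sketch on an invented one-row frame:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, "A", 100, 40, 1.01, 0.5]],
    columns=["Rank", "Country", "2022 Population", "1970 Population",
             "Growth Rate", "World Population Percentage"],
)

numeric_no_rank = df.select_dtypes(include="number").columns.difference(["Rank"])
lowercase_hits = df.filter(like="population").columns  # case-sensitive: no matches here
pop_cols = (df.select_dtypes(include="number")
              .filter(like="Population")
              .columns.difference(["World Population Percentage"]))  # sorted result
```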

  • @thepasstimevideos7195
    @thepasstimevideos7195 5 months ago

    superb video sir..

  • @quotesdiary310
    @quotesdiary310 1 year ago +1

    Thank you so much alex

  • @БулатШарафутдинов-р6д
    @БулатШарафутдинов-р6д 1 year ago

    Again, thank you very much!

  • @innocentnduaguba
    @innocentnduaguba 1 year ago +2

    Thank you so much Alex, truly great content you put out there. I have a question, please: when I run df.groupby('Continent').mean() and df.corr(), I get errors. What could be the cause, and what can I do to fix it?

    • @sabithsaqlain1367
      @sabithsaqlain1367 1 year ago +1

      use df.corr(numeric_only = True)

    • @sj1795
      @sj1795 1 year ago +1

      @@sabithsaqlain1367 THANK YOU for this!! This was driving me a little nutty. Really appreciate you sharing this. :)

    • @SDMNKhan
      @SDMNKhan 1 year ago

      I could not fix the mean() issue.

    • @chriscurtis95
      @chriscurtis95 9 months ago +1

      df.groupby('Continent').mean(numeric_only=True)

    • @Gratitude-x3g
      @Gratitude-x3g 7 months ago

      @@chriscurtis95 🙏 Thank You!

  • @enix492
    @enix492 1 year ago +2

    Hello Alex. I read a few reviews of your recommended course on Udemy. People are saying that it is a bit outdated, especially the last section. Do you think I should still go for it, or does the outdated part not matter? Love your content, and thanks for everything you do here.

    • @AlexTheAnalyst
      @AlexTheAnalyst 1 year ago +3

      I haven't taken it in a while - worth listening to more recent comments. Could be outdated?

  • @Chathur732
    @Chathur732 5 months ago +1

    At 11:12, df.corr() no longer works as shown. Instead use:
    df_numeric = df.select_dtypes(include=[float, int])
    correlation_matrix = df_numeric.corr()
    correlation_matrix

    • @abhi8243
      @abhi8243 5 months ago

      Thank u

    • @onosemuodeikuesiri7620
      @onosemuodeikuesiri7620 3 months ago

      This is simpler and more straightforward:
      df.corr(numeric_only = True )

  • @harisahmed7833
    @harisahmed7833 2 months ago +1

    I'm getting this error on df.corr(): "could not convert string to float: 'AFG'". Please help.

  • @minasghazaryan9344
    @minasghazaryan9344 1 year ago +6

    Hi, Alex. First of all thanks for a great video and explanations in it.
    If you could help out with the issue I get running your exact code I would be more than grateful.
    Running the df.corr() line gives me the following error: ValueError: could not convert string to float: 'AFG' .
    The same happens with the heatmap, etc. What could it be?
    Thanks a lot in advance.

    • @ReneePieschke
      @ReneePieschke 1 year ago

      Getting the same errors.

    • @11zaad
      @11zaad 1 year ago +2

      Try this ==> df.corr(numeric_only=True)

    • @dustin3320
      @dustin3320 1 year ago +13

      Best to use df.corr(numeric_only=True) to get around this

    • @Batira583
      @Batira583 1 year ago

      you saved my life thanks so much @@dustin3320

    • @fede77
      @fede77 1 year ago +2

      df.corr(numeric_only = True)

  • @bleuthner
    @bleuthner 26 days ago

    Hi, I'm struggling again at cell [27] (~16:44). I finally found that this is a pandas version issue: e.g. the video uses df.groupby('Continent').mean(), which now leads to an error:
    TypeError: agg function failed [how->mean,dtype->object]
    Current pandas needs: df.groupby('Continent').mean(numeric_only=True) !!
    I hope this helps.

  • @youssefbekk4453
    @youssefbekk4453 1 year ago

    high level , thanks

  • @HarshKumar-ws3wv
    @HarshKumar-ws3wv 11 months ago

    Sir, in your opinion: Jupyter vs PyCharm? Which is better for exploratory data analysis?

  • @haithammontaser7769
    @haithammontaser7769 1 year ago

    Hello Alex. Thanks for the video and content. Is there any video on data pre-processing?

  • @octaverius
    @octaverius 1 year ago +3

    Alex which continent do you think Australia is in 😮

    • @AlexTheAnalyst
      @AlexTheAnalyst 1 year ago

      :D

    • @chefernandez563
      @chefernandez563 1 year ago

      Australia is also a continent tho 😂 Sometimes people will also refer to NZ and Aus as the "Australias", but Oceania includes the other surrounding islands.

    • @octaverius
      @octaverius 1 year ago +1

      @@chefernandez563 Oceania is a continent, Australia is a country. How people often speak is not relevant

    • @dragoneer121
      @dragoneer121 1 year ago

      @@octaverius Actually, it is relevant, though different countries do have different models and it's entirely up to convention. Australia the continent is usually considered to be the 3 islands of mainland Australia, Tasmania and Papua New Guinea.

  • @zachary626
    @zachary626 2 months ago

    Since this was posted, numeric_only defaults to False, so if you're using a newer version of pandas, here is the correction:
    df.corr() ❌
    df.corr(numeric_only=True) ✅

  • @axan6000
    @axan6000 2 months ago

    If you want to research a trend, you need equal intervals on the x axis, e.g. every 10 years. In your case there are 10-year gaps, then 5-year and 2-year gaps. In my opinion this is the wrong approach. Am I right?
    Btw, nice content :)
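To address the uneven-spacing concern, one option is to convert the year labels to integers. With a numeric index, pandas places each point at its actual year, so the 10-, 5- and 2-year gaps show up as different distances on the x axis instead of equal steps. Here is a sketch with invented values, assuming the years are the rows as in df3 from the video.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented values; the index labels imitate df3 from the video.
df3 = pd.DataFrame(
    {"Asia": [2.1, 3.7, 4.3, 4.7]},
    index=["1970 Population", "2000 Population", "2015 Population", "2022 Population"],
)

# "1970 Population" -> 1970, so the x axis is numeric rather than categorical.
df3.index = df3.index.str.split().str[0].astype(int)
ax = df3.plot(marker="o")
plt.close("all")
```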

  • @diegomartins7214
    @diegomartins7214 1 year ago

    Thank you!

  • @sivasagarchakkarai1687
    @sivasagarchakkarai1687 7 months ago +1

    If df.corr() doesn't work on the data set we're using in this video and throws the error "could not convert string to float: 'AFG'", try: df.corr(numeric_only=True)

    • @nitinrawat-g6t
      @nitinrawat-g6t 7 months ago

      same

    • @nitinrawat-g6t
      @nitinrawat-g6t 7 months ago

      import numpy as np
      numeric_columns = df.select_dtypes(include=[np.number])
      correlation_matrix = numeric_columns.corr()
      correlation_matrix

  • @iqraasif3783
    @iqraasif3783 1 year ago +1

    Hi, can someone help? When I plot grouped data, it doesn't show the figure; it just says .

    • @JayDenton-n1n
      @JayDenton-n1n 1 year ago

      21:09 I just figured it out. Simply add another line after the plot, like:
      df2.plot()
      plt.show()

  • @rjk537
    @rjk537 1 year ago +1

    I'm a law graduate without any experience or qualifications in data analysis whatsoever, but I want to get into data analysis. Will I be able to get a job in this field? If yes, what skills and certifications will help me achieve that? Please give me some tips and insights; it would be really helpful!

    • @ermano5586
      @ermano5586 1 year ago

      Yes, you can. As for skills, I'd focus mostly on analytical thinking, probability and statistics, and other higher math.
      As for certifications, Mr. Alex said that Amazon and Tableau certifications, among others, will help; in any case, if it's a long-term learning certificate, I think it is fine to have it on your CV. But what really makes you stand out is the projects you have done, mostly for your job, and I mean not only portfolio projects but other ones that show your uniqueness.

  • @adminravi
    @adminravi 1 year ago +1

    Is it ok if I use:
    pd.set_option('display.float_format', '{:.2f}'.format) instead of
    pd.set_option('display.float_format', lambda x: '%.2f' % x)

    • @rohallav
      @rohallav 1 year ago

      or even better you can do lambda x: f"{x:.2f}"
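All three variants give pandas a callable that turns a float into a string, so any of them works for `display.float_format`. The option changes only how floats are printed, not the stored values. A quick check, with arbitrary example numbers:

```python
import pandas as pd

fmt_printf = lambda x: "%.2f" % x    # printf-style, as used in the video
fmt_format = "{:.2f}".format         # bound str.format method
fmt_fstring = lambda x: f"{x:.2f}"   # f-string

pd.set_option("display.float_format", fmt_format)  # affects display only, not the data
s = pd.Series([1 / 3, 1234.5678])
shown = repr(s)                                     # printed with two decimals
pd.reset_option("display.float_format")             # back to the default display
```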

  • @ajeyarajupadhyaya8287
    @ajeyarajupadhyaya8287 4 months ago

    Hey, please tell me how to get a discount on the Python with Pandas course. It is too expensive in Indian currency.

  • @truthgaming2296
    @truthgaming2296 1 year ago

    It's pronounced 'O-Ce-A-Nia', btw.
    Thanks for this guidance, Sir Alex :)

  • @gauravpunera3256
    @gauravpunera3256 1 year ago +1

    Alex, please make a video on how to get an international remote data analyst job.

  • @OazadOMER
    @OazadOMER 1 year ago +1

    Thank you very much, Alex. I'm shifting from Ph to data analyst with your bootcamp. I had an issue with plt.show(): "AttributeError: module 'matplotlib' has no attribute 'show'". It's deprecated, and I couldn't find anything similar. Also, my chart is not showing the numbers at 14:10.
    Best regards
    Best regards

    • @dishanbhandari
      @dishanbhandari 9 months ago +1

      Hi there, did u find the solution to your problem of not showing numbers? I ran into the same problem too.

    • @olaleyeboluwatife949
      @olaleyeboluwatife949 7 months ago

      @@dishanbhandari hey mate, you found the solution?
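About the AttributeError above: that message typically means the top-level package was imported as plt (`import matplotlib as plt`). `show()` lives in `matplotlib.pyplot`. Here is a minimal sketch with invented data. It also includes the explicit `plt.show()` suggested at 21:09, which helps in environments that don't display figures automatically.

```python
import matplotlib
matplotlib.use("Agg")  # headless here; drop this line in a notebook or desktop session
import matplotlib.pyplot as plt  # show(), figure() and rcParams are available here
import pandas as pd

# "import matplotlib as plt" would bind the top-level package instead,
# which has no show(): hence "module 'matplotlib' has no attribute 'show'".
df2 = pd.DataFrame({"Asia": [2.1, 3.7, 4.7]}, index=[1970, 2000, 2022])
ax = df2.plot()
plt.show()  # with the Agg backend nothing is displayed; elsewhere it opens the figure
```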

  • @ayoubchouket
    @ayoubchouket 11 months ago

    thank you

  • @r10053506
    @r10053506 8 months ago

    Why isn't my program automatically detecting the numeric columns when running corr()? It runs into an error.

  • @nishanths3176
    @nishanths3176 5 months ago

    Can I get the dataset for this?

  • @dishanbhandari
    @dishanbhandari 9 months ago

    My heatmap doesn't show the data values inside the cells as at 14:18; instead, only the topmost row shows values. I have written the code just as shown above, with df.corr(numeric_only=True) as well as annot=True, but there are still no data values. Can anyone please help?

    • @NyeinHtutSwe
      @NyeinHtutSwe 9 months ago

      I also ran into the same problem :). I still can't find the solution.

    • @jDub997D
      @jDub997D 8 months ago +1

      Upgrade your seaborn package:
      pip install seaborn --upgrade
      Then restart your kernel and rerun all the cells.

    • @olaleyeboluwatife949
      @olaleyeboluwatife949 7 months ago

      @@jDub997D 1000 thanks bruv... bless you

    • @dcj247
      @dcj247 months ago

      Try the following if you're still lost on this part.
      import matplotlib.pyplot as plt
      import seaborn as sns
      sns.heatmap(df.corr(numeric_only=True), annot=True)
      plt.show()

  • @donvious
    @donvious 10 months ago

    Hi, where is the link to the CSV file?

  • @oluwanifemishittu9586
    @oluwanifemishittu9586 3 months ago

    Where do I get the CSV file from?

  • @karanvaghela4668
    @karanvaghela4668 1 year ago

    Hey Alex, why should we use Python instead of SQL? SQL is easy.

  • @Marcusram
    @Marcusram 1 year ago

    we can do df3=df3.iloc[::-1] to solve the problem with the date order

  • @l7932
    @l7932 8 months ago

    thanks sir

  • @arpitmaheshwari122
    @arpitmaheshwari122 1 year ago +1

    Hey, can anyone tell me if the correlation command works in VS Code?
    I'm getting a value error in this part.
    please share the solution if you have one
    thanks :)

    • @Shashankkundena
      @Shashankkundena 11 months ago

      Hey, just use numeric_only = True

  • @akademy_performance_digital
    @akademy_performance_digital 1 year ago

    great

  • @philiprhome3824
    @philiprhome3824 1 year ago +1

    As an R user, I find pandas syntax just weird compared to the tidyverse (dplyr and tidyr).

  • @SieanElpidama
    @SieanElpidama 8 months ago

    My heatmap is broken: it's not showing all the values even though I wrote annot=True. Does anyone have a fix? I tried almost everything that came up when I hit Shift+Tab.

  • @chgfxghjjkllll
    @chgfxghjjkllll 5 months ago

    oh-shee-ana ! you killed me ...

  • @meredithleonor5035
    @meredithleonor5035 1 year ago

    Why use Anaconda instead of Google Colab? Just curious. Looking forward to a visual tutorial on Python and statistics; I really need this type of tutorial, as I am studying cohort analysis and RFM analysis. Thanks!

    • @peaceandlove8862
      @peaceandlove8862 1 year ago

      Oceania is the continent that includes Australia and New Zealand.

  • @osiomogieasekome8799
    @osiomogieasekome8799 1 year ago

    I couldn't get seaborn to import... I tried online solutions about installation but it didn't work

  • @alikoohi8265
    @alikoohi8265 1 year ago

    Informative video, thanks. I just found an easier way to reverse the order of the rows:
    df3 = df2.transpose().loc[::-1] 😉

  • @sandipthepro
    @sandipthepro 4 months ago

    I'm unable to use groupby() on 'Continents'; it shows an error: agg function failed [how->mean,dtype->object]
    Can anyone please help me with a solution?

  • @juancruzmarques2106
    @juancruzmarques2106 2 months ago

    Oceania ¿!?? ... Tell me you're american without telling me you're american...
    I'm messing around don't take it seriously,
    Great video!

  • @ermano5586
    @ermano5586 1 year ago

    I have one problem: the table does not display the columns from "area (km^2)" onward when we call df to view it. I mean, there is no horizontal scrollbar. Can anyone help with this, please?

    • @ruchirmittal9207
      @ruchirmittal9207 1 year ago +1

      Try another browser. Some browsers don't support that feature.
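The missing columns may come from pandas truncating wide frames rather than from the browser. pandas shows "..." in place of the middle columns, and the `display.max_columns` option controls that limit. Here is a minimal sketch with an invented 30-column frame:

```python
import pandas as pd

df = pd.DataFrame([list(range(30))], columns=[f"c{i}" for i in range(30)])

# With the default limit, the middle columns may be replaced by "...".
pd.set_option("display.max_columns", None)  # None = no limit on displayed columns
full = repr(df)                             # every column now appears (possibly wrapped)
```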

  • @rnjesus9950
    @rnjesus9950 1 year ago

    This worked for me where df.corr() did not:
    # Select numeric columns (excluding any non-numeric columns)
    numeric_columns = df.select_dtypes(include=['float64', 'int64'])
    # Calculate the correlation matrix
    correlation_matrix = numeric_columns.corr()
    correlation_matrix

  • @BhaskarDial
    @BhaskarDial 8 months ago

    corr_matrix = df.select_dtypes(include='number').corr()
    # Then proceed with creating the heatmap
    sns.heatmap(corr_matrix, annot=True)
    plt.rcParams['figure.figsize'] = (20, 7)
    plt.show()
    I have used this code for the heatmap, but the notebook doesn't fill the heatmap with the individual correlation values, only colored tiles. Can anyone please help?

    • @pixelsNpositivity
      @pixelsNpositivity 7 months ago

      pip install --upgrade seaborn matplotlib
      Update seaborn and matplotlib. It worked for me

  • @shafiq_ramli
    @shafiq_ramli 2 months ago

    So there's another EDA acronym in tech other than Event Driven Architecture.

  • @srijanrawat4014
    @srijanrawat4014 1 year ago

    I am having a problem downloading the file. Can anyone help me out?

  • @orlumbuseuw5646
    @orlumbuseuw5646 1 year ago +19

    Is there really an adult here who doesn't know what Oceania is, or is this some inside joke on the channel?

    • @octaverius
      @octaverius 1 year ago +2

      I can't believe this

    • @litoavila.
      @litoavila. 1 year ago +1

      Also FYI America is just one continent, in case you doubt it

    • @MatthewBreithaupt
      @MatthewBreithaupt 1 year ago +2

      OceanEeeA

    • @MatthewBreithaupt
      @MatthewBreithaupt 1 year ago +1

      FYI Australia is not a *small* island. Oceania doesn't "mean" anything, it's the name of a continent containing the countries listed right in front of you since you already filtered the data 😂😂

  • @taroge5464
    @taroge5464 1 year ago +1

    no explanation.................pd.set_option('display.float_format',lambda x : '%.2f' % x)

  • @dragoneer121
    @dragoneer121 1 year ago +1

    Continents are mostly a social convention. English-speaking countries tend to use 7, while Spanish-speaking countries have a 6-continent model that uses Oceania and combines North and South America.
    Australia is the continent, but Oceania is a geopolitical convenience. If it were not included, most of the Pacific island countries would not be associated with a continent. North and South America are another convenience, and Central America is only a region by American standards.
    As an example of how ridiculous it is as a continent, Hawaii would be included if it were independent.