When should I use a "groupby" in pandas?

  • Published on Oct 15, 2024
  • The pandas "groupby" method allows you to split a DataFrame into groups, apply a function to each group independently, and then combine the results back together. This is called the "split-apply-combine" pattern, and is a powerful tool for analyzing data across different categories. In this video, I'll explain when you should use a groupby and then demonstrate its flexibility using four different examples.
    SUBSCRIBE to learn data science with Python:
    www.youtube.co...
    JOIN the "Data School Insiders" community and receive exclusive rewards:
    / dataschool
    == RESOURCES ==
    GitHub repository for the series: github.com/jus...
    "groupby" documentation: pandas.pydata.o...
    "agg" documentation: pandas.pydata.o...
    "plot" documentation: pandas.pydata.o...
    == LET'S CONNECT! ==
    Newsletter: www.dataschool...
    Twitter: / justmarkham
    Facebook: / datascienceschool
    LinkedIn: / justmarkham
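A minimal sketch of the split-apply-combine pattern described above. The small DataFrame is a made-up stand-in for the video's drinks dataset, so the values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the "drinks" dataset used in the video.
drinks = pd.DataFrame({
    "country": ["France", "Germany", "Japan", "China", "Kenya"],
    "continent": ["Europe", "Europe", "Asia", "Asia", "Africa"],
    "beer_servings": [127, 346, 77, 79, 58],
})

# Split by continent, apply mean() to each group, combine into one Series.
beer_by_continent = drinks.groupby("continent")["beer_servings"].mean()
print(beer_by_continent)

# The same numbers computed by hand, one filtered group at a time.
manual = {
    continent: drinks.loc[drinks["continent"] == continent, "beer_servings"].mean()
    for continent in drinks["continent"].unique()
}
print(manual)
```

The groupby version replaces the loop over filtered subsets with a single expression.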

Comments • 637

  • @soumenhalder4831
    @soumenhalder4831 4 years ago +4

    Me a Ph.D. student in experimental high energy physics getting spirit from all these tricks, and avoiding some boring tasks. Thank you very much, sir.

    • @dataschool
      @dataschool 4 years ago

      You're very welcome!

  • @coolfooly
    @coolfooly 6 years ago +109

    Seriously this was an amazing explanation. I can't like it enough.

  • @krishnendusaha5940
    @krishnendusaha5940 6 years ago +47

    Great video. It's better than many of the so-called great courses on the internet! Cheers!

    • @dataschool
      @dataschool 6 years ago

      Awesome, thank you! :)

  • @mkaberli
    @mkaberli 5 years ago +4

    You’re my go-to person for informative videos on pandas, NumPy, and Matplotlib.

  • @mitchellyula4447
    @mitchellyula4447 3 months ago +1

    Thank you so much! I never used the groupby function until now and this helped me complete my pandas assignment! very clear explanation and your videos have helped me a ton!

    • @dataschool
      @dataschool 2 months ago

      Glad it helped! 🙌

  • @shahzan525
    @shahzan525 4 years ago +3

    Your content is amazing. I visited some other channels, and they don't teach like you do.
    I saw this video in my suggestions and learned exactly how and where to use groupby.
    Thanks, man!

  • @diegodesouza302
    @diegodesouza302 4 years ago

    OK, I just left my paid edX course to watch only Data School videos, with no regrets! The way this guy explains should be a model for all teachers in the world! Love from Brazil!

    • @dataschool
      @dataschool 3 years ago +1

      Thank you so much!

  • @uguree
    @uguree 3 years ago +1

    I will start your scikit-learn series too, as I really enjoyed watching this pandas series; it is very clear.

  • @marcinzaremba1811
    @marcinzaremba1811 3 years ago +2

    That is exactly what I was looking for and couldn't find at the same time. Thank you!!!

    • @dataschool
      @dataschool 3 years ago

      You are welcome!

  • @arashchitgar7445
    @arashchitgar7445 5 years ago +3

    WOW... I wish all the tutorials were that clear. Thanks man!

    • @dataschool
      @dataschool 5 years ago +1

      You are too kind! 😄

    • @uguree
      @uguree 3 years ago

      Corey and Kevin, you are brilliant. Thanks to Corey for advertising your channel; I'm so pleased. This is a model for how to make YouTube content that teaches.

  • @robertobarriosduran2622
    @robertobarriosduran2622 4 years ago +2

    Even with my limited English, this explanation was so clear. Big thanks!

  • @BigAC7
    @BigAC7 3 years ago +1

    This is great! I'm taking an absolutely hateful pandas course that's basically a needle-in-a-haystack scavenger hunt. Faced with a .groupby() problem where the course materials in no way logically lead to the answer, I wound up here. The problem doesn't precisely conform to your tutorial, but just three of your lines were enough to convey the basic mechanism of action that led to an answer. Thanks!

    • @dataschool
      @dataschool 3 years ago

      Awesome to hear!

  • @robhubert8350
    @robhubert8350 6 years ago +1

    Nice job with the explanation. I combed through dozens of videos and articles online specific to leveraging groupby, and yours is by far the best.

    • @dataschool
      @dataschool 6 years ago

      Thanks very much for your kind words!

  • @muyideenjimoh6672
    @muyideenjimoh6672 4 years ago +1

    First video I clicked on and it answered all my confusion. Many thanks.

  • @ruslanzakharov4218
    @ruslanzakharov4218 5 years ago +1

    It was really useful for me. I don't know English well, but I understood almost every word in this video.

    • @dataschool
      @dataschool 5 years ago

      That's awesome to hear! 🙌

  • @dipakshah1008
    @dipakshah1008 3 years ago +1

    You explain pandas very well! Pls keep posting ...would love to see something on how to apply custom functions to tables (avoiding iteration use)

    • @ChartExplorers
      @ChartExplorers 3 years ago

      Hi, it sounds like what you are looking for is vectorization. I would recommend reading up on NumPy's where() and select(). I looked through Kevin's (Data School's) videos and didn't see anything on this topic. He is amazing at explaining things. In the meantime, you may be interested in this video, where I talk a little about vectorizing at the end:
      th-cam.com/video/CG3EV7UBELA/w-d-xo.html
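A short sketch of the two NumPy functions mentioned in this reply, as one way to apply custom logic without iterating over rows. The data, thresholds, and new column names are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the thresholds and new column names are illustrative only.
df = pd.DataFrame({"beer_servings": [0, 45, 120, 300]})

# np.where: one condition, two possible outcomes.
df["drinks_beer"] = np.where(df["beer_servings"] > 0, "yes", "no")

# np.select: several conditions checked in order, with a default for the rest.
conditions = [df["beer_servings"] >= 200, df["beer_servings"] >= 50]
choices = ["high", "medium"]
df["beer_level"] = np.select(conditions, choices, default="low")

print(df)
```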

  • @fezulhasan3647
    @fezulhasan3647 2 years ago

    Man you are amazing there is no one who can explain better than you. Love from India buddy.

    • @dataschool
      @dataschool 2 years ago

      Thank you so much!

  • @sulaimankhan8033
    @sulaimankhan8033 4 years ago +1

    Your rate of speech is in line with your coding. I appreciate it, thanks.

  • @nitishkumar-bk8kd
    @nitishkumar-bk8kd 4 years ago

    The best way of explaining pandas. Big fan of your teaching.

  • @asadtanvir4065
    @asadtanvir4065 6 years ago +2

    Thank you Man! You are doing a great job here by showing Pandas tutorials in such easy way. Pray that you can carry out this benevolent work more

    • @dataschool
      @dataschool 6 years ago

      Thanks very much for your kind words!

  • @TexaSol2020
    @TexaSol2020 3 years ago

    excellent description. Much better than many on LinkedIn.

  • @janaal-sokhon2462
    @janaal-sokhon2462 4 years ago

    Thank you so much for all of your videos that explain pandas. You seriously explain better than my teachers. THANK YOU

  • @weskitlasten417
    @weskitlasten417 6 years ago +2

    Thanks! I realize this is years old, but worth a shot...
    One thing I would love to see clearly explained is how to change values in one column based on a condition on values in another column. For example, if drinks.spirits > 100 and drinks.total_liters_of_pure_alcohol > 10, change drinks.continent (or other column) to "lush". As an addition, if values in one column (say drinks.name) contain the word "dirty", change drinks.briny to True. Etc.

    • @dataschool
      @dataschool 6 years ago +5

      For your first example, the code would look something like this:
      drinks.loc[(drinks.spirits > 100) & (drinks.total_liters_of_pure_alcohol > 10), 'continent'] = 'lush'
      loc is super useful! Here's more information on loc: th-cam.com/video/xvpNA7bC8cs/w-d-xo.html
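A runnable sketch of both cases from the question, using loc as the reply suggests. The DataFrame is made up, and its column names (spirits, name, briny and so on) are taken from the question rather than from the real drinks dataset; the str.contains step for the second case is an addition that the reply does not cover.

```python
import pandas as pd

# Hypothetical data using the column names from the question above.
drinks = pd.DataFrame({
    "name": ["dirty martini", "lager", "dirty chai"],
    "spirits": [150, 20, 120],
    "total_liters_of_pure_alcohol": [12.0, 3.0, 8.0],
    "continent": ["Europe", "Asia", "Africa"],
    "briny": [False, False, False],
})

# Case 1: two conditions combined with &, each wrapped in parentheses.
both = (drinks["spirits"] > 100) & (drinks["total_liters_of_pure_alcohol"] > 10)
drinks.loc[both, "continent"] = "lush"

# Case 2: a text condition built with the .str accessor.
has_dirty = drinks["name"].str.contains("dirty")
drinks.loc[has_dirty, "briny"] = True

print(drinks)
```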

  • @julieye2260
    @julieye2260 4 years ago

    Thank you so much! This video is very helpful and great quality! I appreciate your speaking speed; it is definitely friendly for foreigners to follow.

  • @daudujonnie
    @daudujonnie 1 year ago

    You are simply fabulous. I think you deserve a grand master award.

  • @cyruswpl
    @cyruswpl 8 years ago +2

    This is really helpful and easy to understand! Looking forward to your future great works, keep it up.

  • @nelsonkayode106
    @nelsonkayode106 1 year ago

    This is the best groupby video I've seen. Thank you for your commitment. You just gained a subscriber.

    • @dataschool
      @dataschool 1 year ago

      Thank you so much!

  • @fadilyassin4597
    @fadilyassin4597 1 year ago

    the way you teach is unique and good

  • @ramakanthrayanchi8888
    @ramakanthrayanchi8888 8 years ago

    Awesome. The tip at last would be very much helpful if we want to apply a function on all the numeric columns at once for each group independently.

  • @Jettiesburg
    @Jettiesburg 7 years ago +5

    Thanks, top video!
    Only suggestion would be to expand on this with some strategies for creating a new DF with the grouped/agg values.
    Thanks again

    • @dataschool
      @dataschool 7 years ago

      Thanks for your suggestion!
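For the suggestion above, a hedged sketch of two common ways to turn grouped results into an ordinary DataFrame: reset_index() after aggregating, or as_index=False when grouping. The data is a made-up stand-in for the drinks dataset.

```python
import pandas as pd

# Hypothetical stand-in for the "drinks" dataset.
drinks = pd.DataFrame({
    "continent": ["Europe", "Europe", "Asia", "Asia", "Africa"],
    "beer_servings": [127, 346, 77, 79, 58],
})

# Option 1: aggregate, then move the group labels from the index into a column.
summary = drinks.groupby("continent")["beer_servings"].agg(["mean", "max"]).reset_index()

# Option 2: ask groupby not to use the group labels as the index at all.
flat = drinks.groupby("continent", as_index=False)["beer_servings"].mean()

print(summary)
print(flat)
```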

  • @rafaelzanfolin7745
    @rafaelzanfolin7745 3 years ago +1

    Thank you sir! You really saved my life, hugs from Brazil!!

  • @karmakast806
    @karmakast806 5 years ago +1

    oh my go.. such clarity in explanation.

  • @josephvargas6756
    @josephvargas6756 6 years ago

    You sir, are a Gentleman and a Scholar. Thanks for your videos.

  • @sarandam5125
    @sarandam5125 5 years ago +2

    Thank you very much for an amazing explanation. I doubted .groupby method for a long time and finally found this video, This video totally clear my doubt out :D

    • @dataschool
      @dataschool 5 years ago

      You are very welcome! So glad it was helpful to you :)
      You might be interested in my latest video, which has some usage examples of groupby: th-cam.com/video/dPwLlJkSHLo/w-d-xo.html

  • @fawadkhan8905
    @fawadkhan8905 5 years ago

    How nice of you to come up with such great videos. Thanks

    • @dataschool
      @dataschool 5 years ago

      Thanks for your kind words!

  • @redcat7467
    @redcat7467 4 years ago +1

    Man, you got one hell of a channel. Thanks!

    • @dataschool
      @dataschool 3 years ago +1

      I appreciate that!

  • @panchopaulo111
    @panchopaulo111 4 years ago +1

    This video was just amazing!! You just got a new subscriber.

  • @jakeb2553
    @jakeb2553 4 years ago +2

    What do you recommend if I want to group by conditionals? Let's say I have a column "Blood Pressure" and I want to group by values 120.
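One possible approach, sketched under assumptions: the question appears to have lost its comparison operators, so this assumes the goal is to compare rows above 120 with rows at or below 120. groupby accepts a Series of labels as the key, so a condition can be turned into labels and grouped on directly. The patient data and the age column are invented.

```python
import pandas as pd

# Hypothetical data; only the "Blood Pressure" column name comes from the question.
patients = pd.DataFrame({
    "Blood Pressure": [110, 135, 120, 150],
    "age": [30, 60, 45, 70],
})

# Turn the condition into readable labels, then group by those labels.
bp_group = (patients["Blood Pressure"] > 120).map(
    {True: "above 120", False: "120 or below"}
)
mean_age = patients.groupby(bp_group)["age"].mean()

print(mean_age)
```

For more than two ranges, pd.cut can build the labels from a list of bin edges in the same way.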

  • @terrancedejesus625
    @terrancedejesus625 8 years ago

    Fantastic video. Using query() is another great way to filter through data and add multiple filters. I have often found it to be less typing and a lot simpler. Example: drinks.query("continent == 'Europe'").beer_servings.mean(). Thanks for the video!!

    • @dataschool
      @dataschool 8 years ago

      Thanks for your kind words! I personally don't use "query" because the pandas documentation lists it as "experimental", which tells me that they might change the API or eventually remove it: pandas.pydata.org/pandas-docs/stable/indexing.html#the-query-method-experimental
      However, functions can always change regardless of whether they are listed as "experimental", so I guess there's no harm in using it! :)
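For comparison, a small sketch showing that the query() call from the comment and an equivalent boolean filter with loc give the same number. The data is a made-up stand-in for the drinks dataset.

```python
import pandas as pd

# Hypothetical stand-in for the "drinks" dataset.
drinks = pd.DataFrame({
    "continent": ["Europe", "Europe", "Asia", "Asia", "Africa"],
    "beer_servings": [127, 346, 77, 79, 58],
})

# Filtering with query(), as in the comment above.
europe_query = drinks.query("continent == 'Europe'")["beer_servings"].mean()

# The same filter written as a boolean mask with loc.
europe_mask = drinks.loc[drinks["continent"] == "Europe", "beer_servings"].mean()

print(europe_query, europe_mask)
```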

  • @astronauta8132
    @astronauta8132 6 years ago +1

    man, your explanations are so good. keep it straight, keep it simple, keep it upPPP!!!!

    • @dataschool
      @dataschool 6 years ago

      Thanks very much for your kind words! :)

  • @davebeckham5429
    @davebeckham5429 4 years ago +1

    Excellent tutorials as always Kevin. Thanks for sharing.

  • @scottlucas3710
    @scottlucas3710 7 years ago

    Excellent job of explaining pretty complicated subject matter !

    • @dataschool
      @dataschool 7 years ago

      Thanks so much!

    • @Mark-sf4kh
      @Mark-sf4kh 6 years ago

      Data School you guys have a slack channel or blog where one can post questions?

  • @Kristina_Tsoy
    @Kristina_Tsoy 2 years ago +1

    Amazing video as always, Kevin! Thank you for great tip at 03:48

  • @praveenudayakumar5128
    @praveenudayakumar5128 7 years ago

    Good explanation, audio clarity and good English too thank u bro.

  • @ranger52289
    @ranger52289 1 year ago

    Great explanation! How could you show the information for the country in each continent with, say, the highest beer servings?
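One way this could be done, as a sketch: idxmax() on the grouped column returns the row label of the maximum within each group, and loc then pulls those full rows. The data is a made-up stand-in for the drinks dataset.

```python
import pandas as pd

# Hypothetical stand-in for the "drinks" dataset.
drinks = pd.DataFrame({
    "country": ["France", "Germany", "Japan", "China", "Kenya"],
    "continent": ["Europe", "Europe", "Asia", "Asia", "Africa"],
    "beer_servings": [127, 346, 77, 79, 58],
})

# Row label of the highest beer_servings value within each continent.
top_labels = drinks.groupby("continent")["beer_servings"].idxmax()

# Use those labels to select the complete rows.
top_rows = drinks.loc[top_labels]

print(top_rows)
```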

  • @brandonwarfield5611
    @brandonwarfield5611 11 months ago

    Wow! You explained that so well that it sticks. Thank you!!

    • @dataschool
      @dataschool 9 months ago

      You're very welcome!

  • @Tony-rx2zj
    @Tony-rx2zj 4 years ago

    I recently noticed someone use this approach, drinks.groupby("continent").agg({"beer_servings":"mean"}), instead of drinks.groupby("continent").beer_servings.agg("mean"). Is the first solution preferable, since it returns the "beer_servings" label on the column?
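A sketch of what the two forms in this question return: the dictionary form gives a DataFrame with a labelled column, while the column form gives a Series holding the same values. The third variant, named aggregation, is an addition not mentioned in the comment; it lets the output column be renamed. The data is made up.

```python
import pandas as pd

# Hypothetical stand-in for the "drinks" dataset.
drinks = pd.DataFrame({
    "country": ["France", "Germany", "Japan", "China", "Kenya"],
    "continent": ["Europe", "Europe", "Asia", "Asia", "Africa"],
    "beer_servings": [127, 346, 77, 79, 58],
})

# Dictionary form: returns a DataFrame with a "beer_servings" column.
as_frame = drinks.groupby("continent").agg({"beer_servings": "mean"})

# Column form: returns a Series named "beer_servings".
as_series = drinks.groupby("continent").beer_servings.agg("mean")

# Named aggregation: returns a DataFrame with a column name of your choice.
named = drinks.groupby("continent").agg(avg_beer=("beer_servings", "mean"))

print(type(as_frame).__name__, type(as_series).__name__, list(named.columns))
```

Which one is preferable depends on whether the next step expects a DataFrame or a Series; the numbers are identical.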

  • @scorpsallday33
    @scorpsallday33 7 years ago

    Are you fucking kidding me.... you sir are the man. These tutorials have been blessing me. Where have you been all my life 😂

    • @dataschool
      @dataschool 7 years ago

      Awesome! Thanks so much for your very kind comments :)

  • @MrCraptakular
    @MrCraptakular 7 years ago +1

    Hello. Firstly, thank you for this magnificent video series. I really mean that! I have a question, continuing with your beer example. If you didn't have a continent column in the source data how would you accomplish the same output?

    • @dataschool
      @dataschool 7 years ago

      Glad you like the series! Regarding your question, the continent column has to be part of the DataFrame, otherwise you can't group by it. Hope that answers your question!

  • @aditimohapatra312
    @aditimohapatra312 4 months ago

    Sir, why is it that in the last two cases, where we didn't specify a column, mean is not executed, but min, max, and count run without any error? The same happens for the plot. Please help.

  • @RichardGreco
    @RichardGreco 4 years ago

    Excellent tutorial! If possible maybe another video to refresh and/or to address Pandas 1.0.x?

    • @dataschool
      @dataschool 4 years ago

      Thanks for your suggestion!

  • @kleczekr
    @kleczekr 8 years ago

    You can also try to run the code:
    import seaborn as sb
    before running the code plotting the graph. The difference is really amazing (and you can see the length of bars for Europe and North America, which in the standard graph are behind the legend).

    • @dataschool
      @dataschool 8 years ago +1

      Great tip! Yes, the default plot aesthetics will change simply by importing Seaborn.

  • @raveeshmalhotra7347
    @raveeshmalhotra7347 4 years ago

    greetings BROTHER,
    at 1:47 you have written command drinks.groupby('continent').beer_servings.mean() , is there another way of writing which includes df['beer_servings'] instead of writing in attribute format (.beer_servings)??
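Yes, bracket notation works on a groupby object just as it does on a DataFrame. A minimal sketch with made-up data showing that both spellings give the same result:

```python
import pandas as pd

# Hypothetical stand-in for the "drinks" dataset.
drinks = pd.DataFrame({
    "continent": ["Europe", "Europe", "Asia", "Asia", "Africa"],
    "beer_servings": [127, 346, 77, 79, 58],
})

# Attribute (dot) notation, as written in the video.
dot = drinks.groupby("continent").beer_servings.mean()

# Bracket notation, which also works when a column name contains spaces.
bracket = drinks.groupby("continent")["beer_servings"].mean()

print(dot.equals(bracket))
```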

  • @brianbii1104
    @brianbii1104 3 years ago

    I really enjoy ur video. i was wondering if you had any videos on how to do a report and if not then can u point to some resources that may have a way to explain it to me.

  • @bobvance9519
    @bobvance9519 1 year ago +1

    Could you use the "for each" formulation when you're grouping by multiple categories?

    • @dataschool
      @dataschool 1 year ago

      Great question! Yes, you would say "for each combination of these categories".

  • @All_Things_Trucking
    @All_Things_Trucking 1 year ago

    Hey, thanks for the video on GroupBy. I am curious how you can use GroupBy in the context of quicker data processing of slicing and dicing. Rather than use a full dataset, use first groupby function to condense dataset and then work with that condensed dataset. Can you think about that scenario and elaborate how to do it in the most efficient way?

    • @dataschool
      @dataschool 1 year ago

      Sure, you can save the groupby object and then use that for future calculations!
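A sketch of saving and reusing a groupby object, with made-up data. Note that the saved object does not hold a condensed dataset: it only records how to split the rows, and the work happens when an aggregation is called on it. To keep a smaller table around, save the aggregated result instead.

```python
import pandas as pd

# Hypothetical stand-in for the "drinks" dataset.
drinks = pd.DataFrame({
    "continent": ["Europe", "Europe", "Asia", "Asia", "Africa"],
    "beer_servings": [127, 346, 77, 79, 58],
    "wine_servings": [370, 175, 16, 8, 2],
})

# Save the groupby object once...
by_continent = drinks.groupby("continent")

# ...then reuse it for several calculations.
beer_mean = by_continent["beer_servings"].mean()
wine_max = by_continent["wine_servings"].max()
group_sizes = by_continent.size()

# A condensed table that can be stored and worked with on its own.
condensed = by_continent[["beer_servings", "wine_servings"]].sum()
print(condensed)
```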

  • @RBambangWidiatmoko
    @RBambangWidiatmoko 3 years ago

    Thank you for your awesome explanation.
    One question, if
    - drinks.groupby('continent').beer_servings.mean() is for one column and
    - drinks.groupby('continent').mean() is for all columns (except 'continent')
    How about some columns, eg. beer_servings & wine_servings ?

    • @ChartExplorers
      @ChartExplorers 3 years ago +1

      Not sure which version you are going for...
      # Groupby two columns (numeric_only=True skips text columns, as newer pandas versions require)
      drinks.groupby(['beer_servings','wine_servings']).mean(numeric_only=True)
      # Groupby one column and get mean on multiple columns
      drinks.groupby(['continent'])[['beer_servings','wine_servings']].mean()
      Here is a video that goes over these methods in more detail th-cam.com/video/ipoSjrN0oh0/w-d-xo.html

    • @RBambangWidiatmoko
      @RBambangWidiatmoko 3 years ago

      @@ChartExplorers thank you 👍

  • @СтепанЦыбин-ю9д
    @СтепанЦыбин-ю9д 5 years ago

    I have already written before and want to say it again. Great lessons, thanks teacher!

    • @dataschool
      @dataschool 5 years ago +1

      You are so very welcome - thanks for your kind comment! 🙌

  • @zohaibhasan5061
    @zohaibhasan5061 9 months ago +2

    I am a beginner to pandas. Amazing videos that were suggested by a friend. I have a doubt where I am stuck. For me in Python 3 drinks.groupby('continent').mean() does not work. min, max and count works but this does not (not without specifying 1 or more column names)

    • @dataschool
      @dataschool 9 months ago +2

      Excellent question! In the current version of pandas, if a DataFrame contains non-numeric data and you want to calculate the mean of all numeric columns after a groupby, you have to include the argument numeric_only=True. Thus: drinks.groupby('continent').mean(numeric_only=True). Hope that helps!
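A runnable sketch of the fix from this reply, with made-up data that includes a text column. min, max and count are defined for text, which is why they run without the extra argument, while a mean of strings is not.

```python
import pandas as pd

# Hypothetical stand-in for the "drinks" dataset, including a text column.
drinks = pd.DataFrame({
    "country": ["France", "Germany", "Japan", "China", "Kenya"],
    "continent": ["Europe", "Europe", "Asia", "Asia", "Africa"],
    "beer_servings": [127, 346, 77, 79, 58],
    "wine_servings": [370, 175, 16, 8, 2],
})

# Mean of every numeric column per continent; "country" is skipped.
means = drinks.groupby("continent").mean(numeric_only=True)

# Alternative: select the numeric columns explicitly before aggregating.
selected = drinks.groupby("continent")[["beer_servings", "wine_servings"]].mean()

print(means)
```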

    • @zohaibhasan5061
      @zohaibhasan5061 9 months ago

      @@dataschool Yes it sure helps!!! Thanks for pointing out this argument

  • @ckt991
    @ckt991 2 years ago

    I really like your videos, thanks. What if you wanted to check for a condition before doing any math? Like, if the min > 0, then do some math operation?

  • @ahowl7mx
    @ahowl7mx 3 years ago

    I have a groupby question that I can't find on the internet. What if my dataframe columns are dates: 1/1/2021, 1/15/2021, 2/1/2021...5/15. Can I groupby by month? The dates run along the x axis as columns and not the Y axis as rows.

    • @dataschool
      @dataschool 3 years ago

      I think you will need to convert to datetime format and then use resample. Hope that helps!
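A rough sketch of that suggestion for dates stored as column headers. It assumes the headers are month/day/year strings and that a monthly sum is wanted; the transpose step is an addition, used because resample works along the row index. The data is invented.

```python
import pandas as pd

# Hypothetical wide table: one column per date, one row per item.
wide = pd.DataFrame(
    {"1/1/2021": [1, 10], "1/15/2021": [2, 20], "2/1/2021": [3, 30]},
    index=["a", "b"],
)

# Convert the column headers to datetime format.
wide.columns = pd.to_datetime(wide.columns, format="%m/%d/%Y")

# Transpose so dates become the index, resample by month, then transpose back.
monthly = wide.T.resample("MS").sum().T

print(monthly)
```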

  • @tomrhee1
    @tomrhee1 6 months ago

    Unfortunately, some of the values in the drinks data frame turned out to be NaN. I wonder if you could show us how to handle missing values to this. Thanks.

  • @shiyunjiang8085
    @shiyunjiang8085 5 years ago +2

    Can I ask how to calculate the group by aggregation and only count unique number?Thanks ( BTW YOU ARE AMAZING)

    • @dataschool
      @dataschool 5 years ago

      Thanks for your kind words! As for your question, could you clarify? I don't quite understand. Thanks!
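If the question is about counting distinct values within each group (an assumption, since the question was never clarified), nunique() is one option. A sketch with invented data:

```python
import pandas as pd

# Hypothetical data: several orders per customer, some products repeated.
orders = pd.DataFrame({
    "customer": ["a", "a", "a", "b", "b"],
    "product": ["x", "x", "y", "z", "z"],
})

# Number of distinct products per customer.
unique_products = orders.groupby("customer")["product"].nunique()

# For comparison, count() counts every non-missing row, repeats included.
all_products = orders.groupby("customer")["product"].count()

print(unique_products)
```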

  • @abhishek-hb1vg
    @abhishek-hb1vg 5 years ago +1

    wooooooowowowowowoow this is what i felt when u used the matplotlib function. U r amazing.

  • @johnnyrotten43
    @johnnyrotten43 5 years ago +1

    You made that so easy. Thank you.

    • @dataschool
      @dataschool 4 years ago

      You're very welcome!

  • @RayedWahed
    @RayedWahed 8 years ago

    Great Series! Can you please upload a couple of detailed videos on plotting graphs and graphically visualizing data to spot noise and to determine which features may be less important? Thank you

    • @dataschool
      @dataschool 8 years ago +1

      Thanks for the suggestion! I will eventually cover visualization in this series. Regarding feature selection, I'm unlikely to cover it in this video series, though I will cover it in an upcoming course. Make sure you are subscribed to my newsletter to hear about upcoming courses: www.dataschool.io/subscribe/

  • @saifeddinenasralli5656
    @saifeddinenasralli5656 4 years ago

    Hello,
    What if I want to group by and aggregate a string column?
    I mean, I could use groupby('first_column').second_column.sum(), but this gives me the strings joined together without a space between them :/
    So how can I add the space?
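One way to get a separator, sketched with the column names from the question and invented values: pass a join function to agg() instead of using sum().

```python
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "first_column": ["a", "a", "b"],
    "second_column": ["red", "blue", "green"],
})

# sum() concatenates the strings with nothing in between.
no_space = df.groupby("first_column")["second_column"].sum()

# Joining with " " puts a space between the strings of each group.
joined = df.groupby("first_column")["second_column"].agg(" ".join)

print(joined)
```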

  • @rohitjoshi1777
    @rohitjoshi1777 4 years ago

    Clear explanation please make video on enumerate function in pandas

    • @dataschool
      @dataschool 3 years ago

      Thanks for your suggestion!

  • @iampujan
    @iampujan 6 years ago +1

    Easily understandable. Thanks for the video and keep making such videos.

    • @dataschool
      @dataschool 6 years ago

      You're very welcome!

  • @JunaidInHenan
    @JunaidInHenan 4 years ago

    Hi man, thanks for wonderful explanation, I wanna extract mean value of different columns which is just you shown, but my question is can i extract mean of text columns by transforming or something etc? I have data set which has these columns (Tweet, verified, Retweet, followers following and label(category)) Now i want to calculate mean of all the columns based on label column which is actually category column and 7 categories. e.g: i want to display in tweet column that : what is average no of tweets in category 1, category 2, category 3 and so on. same as for other columns, but problem is tweet column is text column. so is there any way to transform, and calculate the average etc? Thanking in anticipation.

  • @creekielappy
    @creekielappy 8 years ago

    Well, hopefully you could help me out. I'm currently working with a dataset that has a lot of redundant description values in a certain column. Can a groupby be used to find the maximum number value for each of those redundant values? Say I have 4 rows that have the ACC value, and those rows have the number values 1, 4, 6, 3. Can a groupby be used to find the one row that has the max number value?

    • @dataschool
      @dataschool 8 years ago

      I'm sorry, I don't quite understand. Could you provide a short example to explain what you are trying to do? This link might help you to create a reproducible example: stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples

  • @wadabulldog
    @wadabulldog 7 years ago

    Hi, can i possibly use groupby to assign items in a dataframe as keys in a dictionary, and then reference other items as their values? For instance, in a two column DF and you have name,age,sex,country in one column and the corresponding answers in the next column, can you use groupby to make a nested dictionary that looks like this
    biodata={name:{age:20,sex:male,country:latvia}}
    If it isn't possible, kindly advise on which pandas function is more appropriate for this.
    Thanks

    • @dataschool
      @dataschool 7 years ago

      I'm not sure how to do what you are suggesting. Sorry!

  • @adityaghosh8601
    @adityaghosh8601 5 years ago

    i am using imdb data set
    Index(['Plot', 'Title', 'imdbVotes', 'Poster', 'imdbRating', 'Genre', 'imdbID',
    'Year', 'Language'],
    dtype='object')
    here is the question :Extract the unique genres and its count and store in data frame with index key.
    imdb.groupby('Genre' ,as_index='Genre').Genre.agg(['count'])
    1. Is this the right query to extract the unique genres and their counts?
    2. Does groupby group by unique items?

    • @dataschool
      @dataschool 5 years ago

      1. I think that query accomplishes your goal, though I don't think you need the as_index parameter. You could also use: pd.DataFrame(movies.genre.value_counts().sort_index())
      2. If I understand your question, the answer is yes.
      Hope that helps!
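A sketch comparing the two approaches from this exchange on invented data: a groupby count and value_counts() followed by sort_index() give the same count per genre. The lowercase movies and genre names follow the reply, not the commenter's actual dataset.

```python
import pandas as pd

# Hypothetical data; names follow the reply above.
movies = pd.DataFrame({"genre": ["Drama", "Comedy", "Drama", "Action", "Drama"]})

# One group per unique genre, counting the rows in each.
via_groupby = movies.groupby("genre")["genre"].count()

# The value_counts approach from the reply, sorted by genre name.
via_value_counts = movies["genre"].value_counts().sort_index()

print(pd.DataFrame(via_value_counts))
```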

    • @adityaghosh8601
      @adityaghosh8601 5 years ago

      @@dataschool Thank you so much sir.never expected this quick response. God bless you sir

    • @dataschool
      @dataschool 5 years ago

      You're very welcome!

  • @raghavendra4955
    @raghavendra4955 4 years ago +1

    Y there are less views🙃🙃.. Its an amazing explanation..

  • @hakunamatata-qu7ft
    @hakunamatata-qu7ft 4 years ago

    please do a video on all possible string operations possible in data preprocessing

    • @dataschool
      @dataschool 4 years ago

      Does this help? th-cam.com/video/bofaC0IckHo/w-d-xo.html

  • @robindong3802
    @robindong3802 6 years ago

    Very clear and simple. great demo.

  • @AhmedThahir2002
    @AhmedThahir2002 2 years ago

    For the multiple aggregation, is there a way to round the result to 2 decimal places
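One option, as a sketch: the result of a multiple aggregation is an ordinary DataFrame, so round(2) can be chained onto it. The data is a made-up stand-in for the drinks dataset.

```python
import pandas as pd

# Hypothetical stand-in for the "drinks" dataset.
drinks = pd.DataFrame({
    "continent": ["Europe", "Europe", "Europe", "Asia", "Asia"],
    "beer_servings": [127, 346, 194, 77, 79],
})

# Multiple aggregations at once, rounded to 2 decimal places.
stats = (
    drinks.groupby("continent")["beer_servings"]
    .agg(["count", "min", "max", "mean"])
    .round(2)
)

print(stats)
```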

  • @Chandrikareddyy
    @Chandrikareddyy 7 years ago

    can I perform standard deviation in the same way?

    • @dataschool
      @dataschool 7 years ago

      Sure, there is a std() method.

  • @zulqurnainahmad1130
    @zulqurnainahmad1130 3 years ago

    One quick question: do I need to verify that the country field type is a string, or does groupby work with objects? I have a lot of fields in my dataset which are objects and I can't use the groupby. If that is the case, then how do I convert an object to a string?

    • @ChartExplorers
      @ChartExplorers 3 years ago

      Strings are objects in Pandas (so you don't need to convert objects to strings). Groupby will work with object/string columns.

  • @rohan2441139
    @rohan2441139 7 years ago

    Hey! I was trying to run this snippet from a Notebook
    mean_tissue = {chunk.Taxon[0]:chunk.Tissue.mean() for chunk in data_chunks} # This part is returning a Key Error
    Can we do this using the groupby method or any fix to get this running?

    • @dataschool
      @dataschool 7 years ago

      I'm sorry, it's hard for me to say without knowing a lot more about the data and data types. Good luck!

  • @mshparber
    @mshparber 4 years ago +1

    Great! What about grouping by several columns?

    • @dataschool
      @dataschool 4 years ago

      df.groupby(['col1', 'col2'])...
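A filled-in sketch of that pattern, with invented column names and data. Grouping by a list of columns produces one group per combination of values, and the result is indexed by both columns.

```python
import pandas as pd

# Hypothetical data; "region", "year" and "amount" are illustrative names.
sales = pd.DataFrame({
    "region": ["East", "East", "East", "West"],
    "year": [2020, 2020, 2021, 2020],
    "amount": [10, 5, 7, 3],
})

# One group for each (region, year) combination that occurs in the data.
totals = sales.groupby(["region", "year"])["amount"].sum()

# Look up a single combination with a tuple.
east_2020 = totals.loc[("East", 2020)]

print(totals)
```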

  • @RandomStuff-dw2bj
    @RandomStuff-dw2bj 3 years ago

    how can we do different functions for multiple columns - mean for 2 columns and sum for other three columns? 2 question: how can we do linear regression for each group after group by ?
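A hedged sketch for both parts of this question, using invented data. For different functions per column, agg() accepts a dictionary mapping column names to functions. For a regression per group, one simple option (an assumption, not something shown in the video) is to loop over the groups and fit a straight line to each with numpy.polyfit.

```python
import numpy as np
import pandas as pd

# Part 1: a different aggregation for each column.
drinks = pd.DataFrame({
    "continent": ["Europe", "Europe", "Asia", "Asia"],
    "beer_servings": [127, 346, 77, 79],
    "wine_servings": [370, 175, 16, 8],
})
summary = drinks.groupby("continent").agg(
    {"beer_servings": "mean", "wine_servings": "sum"}
)

# Part 2: fit y = slope * x + intercept separately within each group.
points = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "x": [0, 1, 2, 0, 1, 2],
    "y": [1, 3, 5, 0, -1, -2],
})
slopes = {
    name: np.polyfit(rows["x"], rows["y"], 1)[0]
    for name, rows in points.groupby("group")
}

print(summary)
print(slopes)
```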

  • @majidm4215
    @majidm4215 4 years ago

    you are awesome, I have never watched that many videos in one day Thank you so much

  • @rajpaul1501
    @rajpaul1501 5 years ago

    You are simply awesome. Need more and more of your tutorials.

  • @hermanheunis9354
    @hermanheunis9354 4 years ago

    Great video! Brevity is the soul of wit. How would you call columns in your dataframe that have spaces and no underscores? Would it be a list value, i.e. ['beer servings']? And would the list value syntax still allow you to call methods on the column, i.e. drinks.groupby('continents')['beer servings'].mean()? Just asking because in the workplace I've found people do not add underscores or camel case to their headings.

    • @dataschool
      @dataschool 3 years ago

      Thanks! This article might be helpful to you: www.dataschool.io/pandas-dot-notation-vs-brackets/

    • @hermanheunis9354
      @hermanheunis9354 3 years ago

      @@dataschool Thank you!

  • @magelauditore333
    @magelauditore333 4 years ago

    U r awesome man. Pls Pls make numpy series for data science, stats

  • @kwphysics
    @kwphysics 2 years ago

    This was a useful video and I appreciate your explanation!

    • @dataschool
      @dataschool 2 years ago +1

      Glad it was helpful!

  • @rin_645
    @rin_645 4 years ago +2

    4:49 and dot aggggggggg~ hahaha 😂
    But seriously tho, what an absurdly good explanation of groupby function. I wish you could be a professor in my university🤪

    • @dataschool
      @dataschool 4 years ago +1

      Thank you! 😄

  • @harijayaram
    @harijayaram 7 years ago

    very clear example of groupby ...thanks

    • @dataschool
      @dataschool 7 years ago

      You're very welcome!

  • @robertobaldo3128
    @robertobaldo3128 6 years ago

    Thank you very much for your explanation!
    I am using the groupby function to sum another variable of the dataset daily. Something like:
    DaySize=IT.groupby('Date')['Size'].sum()
    I correctly am given back the sum of the "Size" column but only for the days where the sum is not zero. Is there a way to have back also the "zero size days" in order to manage them better later?
    Thanks in advance

    • @dataschool
      @dataschool 6 years ago

      I'm not sure off-hand, sorry!
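One possible approach, sketched under the assumption that the "zero size days" are dates with no rows at all in the data: build the full range of days and reindex the grouped result onto it, filling the gaps with 0. The IT, Date and Size names come from the question; the values are invented.

```python
import pandas as pd

# Hypothetical data with no rows for 2021-01-02.
IT = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-03"]),
    "Size": [5, 7, 2],
})

# Daily totals; days without any rows are simply absent from the result.
day_size = IT.groupby("Date")["Size"].sum()

# Reindex onto every calendar day in the range, using 0 for the missing days.
all_days = pd.date_range(day_size.index.min(), day_size.index.max(), freq="D")
day_size_full = day_size.reindex(all_days, fill_value=0)

print(day_size_full)
```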

  • @mohammadabusitta9880
    @mohammadabusitta9880 6 years ago

    great video
    I have a question >> how to specify two columns ?? beer_servings and wine_servings for example
    thanx in advance

    • @dataschool
      @dataschool 6 years ago +1

      drinks[['beer_servings', 'wine_servings']]

  • @BillyNawa
    @BillyNawa 6 years ago

    Hiya,
    The following code:
    topcommenters = pd.DataFrame(dfcom.groupby("author")["score"].sum().sort_values(ascending = False))
    topcommenters.head(5)
    Gives me a long error saying "TypeError: '

    • @dataschool
      @dataschool 6 years ago

      It's hard to say, without more details. Good luck!

  • @andreazecchi812
    @andreazecchi812 3 years ago +1

    That's a fantastic explanation! Thanks so much!

    • @dataschool
      @dataschool 3 years ago +1

      You're very welcome!

  • @tarekkhalefa1836
    @tarekkhalefa1836 4 years ago

    thanks for your simple explanation

  • @theodoretourneux5662
    @theodoretourneux5662 2 years ago

    fantastic! exactly what I needed clearly explained and organized. thank you!

  • @hariaiyar1119
    @hariaiyar1119 4 years ago +1

    Amazing explanation bro.

  • @kleczekr
    @kleczekr 8 years ago

    Hi Kevin!
    Thank you again for the wonderful course -- all of your tutorials are really interesting.
    May I ask where the data about alcohol usage around the world is from?
    Take care!
    Rafal

    • @dataschool
      @dataschool 8 years ago +1

      You're very welcome!
      All of the information about the datasets is here: github.com/justmarkham/pandas-videos#datasets

    • @kleczekr
      @kleczekr 8 years ago

      Thank you!

  • @artistz1831
    @artistz1831 6 years ago

    Thanks Kevin for this helpful video. One question here: how to export the groupby table and the plot from python?

    • @dataschool
      @dataschool 6 years ago

      For the groupby, you could save it as a DataFrame and then use the to_csv method. For the plot, there is a savefig method. Hope that helps!
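A sketch of the table half of that answer, with made-up data. It writes the CSV text to an in-memory buffer so the example leaves no files behind; passing a file path to to_csv instead would write a file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the "drinks" dataset.
drinks = pd.DataFrame({
    "continent": ["Europe", "Europe", "Asia", "Asia", "Africa"],
    "beer_servings": [127, 346, 77, 79, 58],
})

# Save the grouped result as a DataFrame.
result = drinks.groupby("continent")["beer_servings"].mean().reset_index()

# Export it as CSV text (use a path such as "result.csv" to write a file).
buffer = io.StringIO()
result.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```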

  • @elilavi7514
    @elilavi7514 8 years ago

    Thanks a lot for the video . I like you series on pandas , every lesson I learn new things ! Sometimes with a large dataset , the groupby( ) could be a little bit heavy for certain server and slow . Does the groupby have any alternatives for a big datasets ?

    • @dataschool
      @dataschool 8 years ago

      +Eli Lavi Thanks for all the great questions... keep them coming! :)
      I'm not aware of any alternatives, but I will keep your question in mind in case I come across a solution!

  • @adrianadominguezp.1553
    @adrianadominguezp.1553 6 years ago

    Does this mean that these are equivalent? Assuming dplyr in R:
    ## Python
    drinks.groupby('continent').beer_servings.agg(['count', 'min', 'max', 'mean'])
    ## R
    drinks %>%
    group_by(continent) %>%
    summarise(count = n(), min = min(beer_servings), max = max(beer_servings), mean = mean(beer_servings))

    • @dataschool
      @dataschool 6 years ago

      I don't keep up with dplyr any more, I'm sorry! Perhaps someone else can answer.
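On the pandas side of this comparison, one detail is worth checking when matching it against n(): pandas "count" counts only non-missing values, while "size" counts every row in the group, which is what n() does. A sketch with invented data that includes a missing value, using named aggregation to mirror the summarise() layout:

```python
import pandas as pd

# Hypothetical data with one missing beer_servings value.
drinks = pd.DataFrame({
    "continent": ["Europe", "Europe", "Asia", "Asia"],
    "beer_servings": [127.0, None, 77.0, 79.0],
})

# The call from the comment: "count" ignores the missing value.
summary = drinks.groupby("continent").beer_servings.agg(["count", "min", "max", "mean"])

# Named aggregation with "size", which counts all rows in each group.
named = drinks.groupby("continent").agg(
    n_rows=("beer_servings", "size"),
    mean=("beer_servings", "mean"),
)

print(summary)
print(named)
```

When the column has no missing values, "count" and "size" agree.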