Most people would have simply demonstrated the short way, but by taking the time to elucidate what's happening behind the scenes, you're performing a far greater service. Great job.
Thanks so much for your thoughtful comment! I really appreciate your support.
Him: So here's how you filter out by certain values.
Me: *Copies code* Gotcha, thanks.
Him: But that's the long way and you'd never actually do that
Me: Oh haha of course. *deletes code*
Your voice is so soothing and makes the understanding of concepts much easier and enjoyable. Thank You
Thank you!
I have learned much from your channel. You're a natural for teaching.
Wow, thank you so much! I really appreciate your kind words and your contribution!! 🙏
You say just the right words to get the concept and everything behind with lucidity. Thank you. i am a beginner but do not feel like so. :)
Wow, what a nice thing to say! :)
I watch you at 1.5x :-D but like he said even a year later. Nice way to teach. Thank you.
you deserve my tuition money for that
th-cam.com/video/vlf4Pn9nMug/w-d-xo.html
Achin Gupta:
my words. This guy is awesome
You are the only man I can understand on TH-cam about Data Science.
Ha! Thank you :)
Everything you're doing here for us, it's gonna go back to you in a certain way in the future! Thank You a lot for this master piece!!
You are too kind, thank you! 🙏
3 years later and this dude is still a legend and still saving my ass on basic stuff for work!
Ha! That's great to hear 😎
Despite the fact that English is not my native language, these are the best lessons about the Pandas that I saw! Thank you!
It's great to hear that my videos have been helpful to you! :)
When great teaching meets simplicity. Thanks so much!
Thank you so much! 🙏
your way of teaching is awesome!!!!! Thanks bro
Thank you so much! 🙌
It's imperative that you know the logic behind these codes. Thank you Mark for this amazing playlist.
My pleasure!
Thank you so much. I'm new to pandas and this was very helpful. I couldn't understand why it wasn't obvious how to filter rows.
You explained it so clearly.
Awesome explanation! I have referred to some resources on pandas earlier and felt like I could easily skip your content to get through faster and grab only the unknown piece of code in the process. But the way you explain things shows your command of pandas, and I am highly motivated to go through the full video. Cheers, great job.
There is something charming in how you say "Okay" when you finish a thought :) The video is very helpful. Thanks for putting this up here.
Ha! Glad my verbal tics are charming :)
@dataschool
Kevin.verbal_function.finish_thought
is this right?
The explanations are really well conceived and you speak so clearly that even the youtube captioner can capture it, subscribed!
Awesome! Thanks for subscribing, and thanks for your kind comment!
It was THAT simple, yeah? I wasted two days trying to figure it out myself. Life saver you are. Thanks for the wonderful video.
You're welcome!
Thank you. I believe by showing us the theory behind it, you're helping us remember the method for a far longer period
You’re very welcome!
you are a great tutor, taking the long road and then giving the short answer really teaches the concept very well. Thank you sooo much
You're very welcome!
I like your in-depth explanation of _how_ it works, rather than a "recipe" of "just do it this way" without explaining _why_ it works. Now I'm off to see what other videos on pandas you have!
Excellent! I'm glad the depth of the explanation was helpful to you!
My complete pandas playlist is here: th-cam.com/play/PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y.html
You are the best Pandas teacher on TH-cam. I saw your previous videos as well but this is by far the best. You made the whole concept so simple. Thank you very much
Wow! Thank you so much for your kind words! :)
Love these, helped with something I was stuck on for 2 days. I'm a beginner, loads of love
Great to hear!
I loved how you showcased every possible way, from the heavy technique of using a for loop and then right to using operators. It made things absolutely clear. Thanks!!
You're very welcome!
Very clear explanation, and it made me understand how the logic behind the filter works. Many thanks.
You're very welcome!
Worth watching the whole video but if you are just looking for the shortcut: 9:21
+Gian Carlo Martinelli Great idea to link directly to it... thanks!
Thanks. I was hoping he would first go straight to the point and then explain afterwards.
th-cam.com/video/vlf4Pn9nMug/w-d-xo.html
Best way to explain, simple and accurate. Thanks
Thank you!
You are an amazing presenter. Knowledgeable, fluent, crystal clear. Much respect. Let me know how i can give something back to you.
Thank you so much! If you want to support me, I'd love for you to join my community of "Data School Insiders" on Patreon: www.patreon.com/dataschool
Can't believe you do the entire video with so few edits. Amazing!
It requires a lot of planning... plus it helps that I love teaching this subject and I know it well!
Such a pretty explanation. Now I discovered the 'Why' i use the bracket. Thanks!
You're welcome!
Your videos really are terrifically helpful, explaining things thoroughly from base concepts upwards as you do is just brilliant and your explanations are always extremely clear and precise. Thank you so much for all the time you spend on this - I'm tremendously grateful as I'm sure are many others.
Wow, thank you so much for your incredibly kind words! I'm very glad to hear that you have gotten a lot out of the series.
Man, you helped a million times more than the pandas documentation.
Great to hear! :)
Thank you. Great content is still great, even 5 years later!
Thanks very much for your kind words!
a big thanks for putting an end to my last 3 hours' confused state.
Great to hear!! You are very welcome!
You're an EXCELLENT TEACHER. Thanks a heap!
Thank you! 😃
Thanks, I learnt more from your videos than from all the other online pandas resources combined.
You're very welcome!
All the concepts very well explained. Your videos on pandas tutorial are just great !!
Thanks! :)
he put step by step all the stones in my brain and set the wall! :D
Simply great !
I love the metaphor... thanks for your kind words! :)
;)
Bro, you are just a great teacher! Please keep doing what you are already doing. I'll buy whatever you produce!
God bless you! :)
Thanks so much, I appreciate it! :)
Right now, I only sell one course, called Machine Learning with Text in Python: www.dataschool.io/learn/
But if you want to hear about new courses that I release, just subscribe to my newsletter: www.dataschool.io/subscribe/
You are a very good teacher! Congrats!
Thanks!
Bro... Thanks for your great tutorials. I am beginner. From your videos i could understand the correct concept. Thank you very much.
Great to hear!
amazing explanation. thanks for your time and patience to educate all.
Glad it was helpful to you!
Very nice tutorials. Easy to understand and to the point. Please prepare more for Data analysis.
Thanks for your kind words, and for the suggestion!
Sir,your content is the best for pandas. Thanks a Lot
Expertly explained. Well done
Thank you!
Very nice, thorough explanation! 🙏 Thanks
Thank you!
you are really a great teacher. It was so easy for me to understand pandas. Thank you so very much.
You're very welcome!
pandasDoc < dataSchool = True
Ha! Thanks :)
Explained with brilliance! Congrats.
Thank you!
Thanks, great step by step learning resource!
I already knew the quick answer, but not the reason why it was working :-) Very informative!
+Alessandro Sarretta You're welcome! I think you will find that understanding how it works in this case will help you to better understand lots of other pandas functionality!
The best ever Python site! Many thanks for time and effort !!
You're very welcome!
At 5:30, what's the need for converting the booleans list to pandas series? Because we can get the results even without doing that by using this command > movies[booleans] . It gives the same result.
I can't remember... maybe when I recorded the video, the filtering required a boolean Series rather than a list of booleans.
movies[movies.duration>=200]
Thanks Molly!
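For readers following this thread, here is a small sketch comparing the three forms discussed above. The `movies` DataFrame is a toy stand-in with made-up values, not the dataset from the video, and the behavior shown reflects recent pandas versions, where a plain list of booleans is accepted as a row filter.

```python
import pandas as pd

# Toy stand-in for the video's movies DataFrame (values are made up).
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta"],
    "duration": [142, 229, 96, 202],
})

# The long way: build a plain Python list of booleans with a loop.
booleans = []
for length in movies.duration:
    booleans.append(length >= 200)

from_list = movies[booleans]                # filter with the plain list
from_series = movies[pd.Series(booleans)]   # filter with the list converted to a Series
short_way = movies[movies.duration >= 200]  # the comparison builds the Series directly

print(short_way)
```

All three selections contain the same rows in this sketch; the short form simply skips the intermediate list.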
how can i do this by removing NaN
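One possible reading of this question is "how do I filter out rows whose value is missing?". Under that assumption, a minimal sketch with a toy DataFrame (made-up values) could look like this:

```python
import numpy as np
import pandas as pd

# Toy data with one missing duration (values are made up).
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta"],
    "duration": [142, np.nan, 96, 202],
})

# notna() returns a boolean Series, so it works as a row filter like any other condition.
has_duration = movies[movies.duration.notna()]

# dropna() with subset= expresses the same idea as a method call.
dropped = movies.dropna(subset=["duration"])

# A comparison against NaN evaluates to False, so NaN rows never pass this filter.
long_movies = movies[movies.duration >= 200]

print(has_duration)
```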
Love this video !! Exactly the one I was looking for. THANK YOU !... well explained..
You're very welcome! :)
Saved me so much time. Thanks for this !
Great to hear!
Another fantastic video. I'm so used to the 8-9 min videos now, 13+ mins was a bit intense :)
Thank you so much!
This is very helpful! Your accent is so clear and it's very easy to understand you. Thanks!
Thanks!
Great videos!
I have been struggling with the ideas and concepts behind decision trees and ensembles. I hope these are topics that you will cover in the future and if not, I would really appreciate any resources to gain a deeper understanding of this topic. Thank you!
Thanks! Regarding decision trees and ensembles, I highly recommend chapter 8 of this book for a conceptual understanding: www-bcf.usc.edu/~gareth/ISL/
Here are videos related to that book: www.dataschool.io/15-hours-of-expert-machine-learning-videos/
For Python code and more resources, see classes 17 and 18 of my data science course: github.com/justmarkham/DAT8
Hope that helps!
Your explanations are so clear. Thanks
Many thanks for your tutorials. They are powerful. You make complex things easier. Thank you for your work.
Thanks so much for your kind comment! I'm glad they are helpful to you!
I am in heaven right now. "You get to understand what things are when you know what it is they do." I really love the fact that you provide the "nuts and bolts" it helps me better understand the "how and the why."
It's really nice to hear that I've helped you gain some insight on this!
Awesome !! Your explanation is very clear and easy to follow. Keep up the good work !
Thanks for your kind words!
Great teacher, practically my mentor haha
Ha! :)
I've learnt so much through watching this one video. Thank you. Now, I may need to rewrite some of my existing code to make it more efficient and simple to comprehend. :-)
Great to hear!
Good afternoon. "Long time listener, first time caller." I really like your videos and your teaching style. As others have noted, you explain things clearly and in digestible and understandable chunks, which I appreciate. I have a question and/or a request for a Q&A video. I am comparing dataframes. I am using a merge statement and a "full outer join". This works well for identifying records from both dataframes that do not match each other. Going a step further, I'd like to identify the individual attributes (i.e., columns) that do not match. My use case is that I often compare extremely wide datasets (200+ columns) and it is sometimes difficult to find the "offending/differing" column(s). I have researched at various places online, and have yet to find a solution that truly fits my needs.
Thanks for your kind words! That's an interesting question, I'm not sure if I 100% understand. It would be super helpful if you could code up a simple example (just a few columns) of what you are currently doing and explain exactly what your goal is. Thanks!
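Since no example was posted, the following is only a sketch of one way the column-level comparison might be approached. It assumes both DataFrames share the same key (here a hypothetical `id` column used as the index) and the same column names; `left` and `right` are made-up data. Note that with `ne`, a missing value compared with another missing value counts as a difference.

```python
import pandas as pd

# Two made-up frames that should describe the same records.
left = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [10.0, 20.0, 30.0],
    "city": ["NY", "LA", "SF"],
}).set_index("id")

right = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [10.0, 25.0, 30.0],
    "city": ["NY", "LA", "SF"],
}).set_index("id")

# Element-wise "not equal": True wherever a cell differs between the frames.
differs = left.ne(right)

# Columns with at least one differing cell.
column_flags = differs.any()
differing_columns = column_flags[column_flags].index.tolist()

# Rows (by id) with at least one differing cell.
differing_ids = differs.index[differs.any(axis=1)].tolist()

print(differing_columns, differing_ids)
```

For frames with identical row and column labels, pandas 1.1 and later also provide `DataFrame.compare`, which reports the differing cells side by side.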
You're an expert!!! Thank you so much!
Thank you!
Thanks !! I was looking for something very specific and 2 of your videos was just right on point ... just subscribed !!
Awesome! Thanks for subscribing :)
Lots of love from India 🙏🙏
you have done a great job sir
thank you.
You're welcome!
Lifesaver, i can't thank you enough!!!
You're very welcome!
I felt like my dog could understand this. Thank you
Beautiful teaching technique. You purposefully did it the long way first. Wow!. So the condition that I always used in [ ] in pandas is actually just a Boolean Series whose length matches the length of the dataframe.
Exactly! :)
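To make that observation concrete, the condition can be stored in a variable and inspected before it is used. This is a sketch with a toy DataFrame (made-up values); `is_long` is just an illustrative name.

```python
import pandas as pd

# Toy stand-in for the movies DataFrame (values are made up).
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta"],
    "duration": [142, 229, 96, 202],
})

# The expression inside the brackets is an ordinary object that can be named.
is_long = movies.duration >= 200

print(type(is_long))                 # a pandas Series
print(is_long.dtype)                 # bool
print(len(is_long) == len(movies))   # one boolean per row

# True counts as 1, so sum() gives the number of rows that will be kept.
n_long = int(is_long.sum())

filtered = movies[is_long]
```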
These videos are pure gold!
I have a question though: The final point at 11:55 is that column selection after filtering is not the best thing and it may cause strange behavior in some cases. I was wondering what is that strange behavior and what are those cases? Thanks!
Glad the videos are helpful to you!
Regarding your question, I demonstrate the problem (and the solution) in this video: th-cam.com/video/4R4WsDJ-KVc/w-d-xo.html
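The linked video is not reproduced here, so the following is only a guess at the kind of issue meant. The pandas documentation warns that filtering first and then selecting a column in a second step (chained indexing) may operate on a temporary copy, which matters mainly when assigning values. A single `.loc` call avoids the ambiguity. The data below is made up.

```python
import pandas as pd

# Toy stand-in for the movies DataFrame (values are made up).
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta"],
    "genre": ["Crime", "Drama", "Comedy", "Drama"],
    "duration": [142, 229, 96, 202],
})

# Rows and column are chosen in one .loc call, so the assignment
# is applied to movies itself rather than to an intermediate object.
movies.loc[movies.duration >= 200, "genre"] = "Long"

print(movies)
```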
You have outstanding skills to teach. Thank you.
You're welcome!
you can use a list comprehension instead of the traditional for loop at 4:00
booleans = [length>199 for length in movies.duration]
Yes, that's another option.
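A quick sketch of that suggestion next to the vectorized comparison, using a toy DataFrame with made-up integer durations (for integers, `> 199` and `>= 200` select the same rows):

```python
import pandas as pd

# Toy stand-in for the movies DataFrame (values are made up).
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta"],
    "duration": [142, 229, 96, 202],
})

# List comprehension: one boolean per row, built in plain Python.
booleans = [length > 199 for length in movies.duration]

# Vectorized comparison: pandas builds the boolean Series in one step.
vectorized = movies.duration >= 200

print(booleans)
```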
At 12:36, you can also do movies.loc[movies.duration >= 200]['genre'] because loc seems to allow you to select rows only without specifying columns and then you can select a column separately at the end. However, I will do movies.loc[movies.duration >= 200, 'genre'] because I think it's what @DataSchool showed. What about when you were getting all columns : Do we still prefer to use loc movies.loc[movies.duration >= 200] ? or is movies[movies.duration >= 200] just as good with no issues?
Either option is fine!
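For selection (as opposed to assignment), the variants mentioned in this exchange can be compared directly. This is a sketch on a toy DataFrame with made-up values:

```python
import pandas as pd

# Toy stand-in for the movies DataFrame (values are made up).
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta"],
    "genre": ["Crime", "Drama", "Comedy", "Drama"],
    "duration": [142, 229, 96, 202],
})

condition = movies.duration >= 200

# Two steps: filter the rows, then pick the column from the result.
two_step = movies.loc[condition]["genre"]

# One step: rows and column in the same .loc call.
one_step = movies.loc[condition, "genre"]

# All columns: plain brackets and .loc select the same rows.
with_brackets = movies[condition]
with_loc = movies.loc[condition]

print(one_step)
```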
Thanks so much! Excellent step by step explanation of the concept and method! Your clear explanation shows how well you have those concepts and logic embedded in your brain 🙂 Must subscribe!
Thank you so much, Murad! I really appreciate your kind words, and also for joining as a channel member! 🙏
Your series is terrific, only one issue, as has been mentioned before in these comments:
"boolean" has 3 syllables.
Glad you like the videos!
Thank you very much sir for all your videos. its really a great experience and help to learn many things.
Thanks!
Like your video. Very detailed and clear step-by-step explanation!
Thanks!
Boolean is three syllables. Thanks for your videos!
You are very welcome!
Danke!
Thank you so much, I truly appreciate it! 🙏
Great job! I like the sensation of feeling smarter because you understand something clearly ;-) #FromFrance
Ha! Great to hear :)
thanks. was great help
1 point if you can tell:
how to use multiple conditions on the data frame and select multiple columns of the same data frame
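One way this could be done is with a single `.loc` call: the combined condition goes in the row position and a list of column names in the column position. The DataFrame below is a toy example with made-up values.

```python
import pandas as pd

# Toy stand-in for the movies DataFrame (values are made up).
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta"],
    "genre": ["Crime", "Drama", "Comedy", "Adventure"],
    "duration": [142, 229, 96, 202],
})

# Each condition is wrapped in parentheses and joined with & (element-wise "and").
selected = movies.loc[
    (movies.duration >= 200) & (movies.genre == "Drama"),
    ["title", "duration"],
]

print(selected)
```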
BOO-lee-an
These are great though.
Glad you like the series!
This was driving me a bit nuts too, but you're not quite right either. Named after George Boole, it's pronounced "Boole - ee - an"
Maybe I am insane because the mispronunciation of Boolean (which should be "BOOL-EE-UN") made this video very hard to watch. :( Information is great. Too bad I'm such a freak. lol
Hey man, you are doing a great work!
I have the following question, though:
What if I want to filter the data frame by a column that contains a list (actors_list) but only if an exact element (string) is present in that list?
Let's say "I want every movie (data frame row) in which Al Pacino plays" ?
Thank you in advance and keep up the good work!
Thanks! I think this will help you: th-cam.com/video/bofaC0IckHo/w-d-xo.html
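The linked video is not reproduced here. As a separate, minimal sketch, and assuming each cell of `actors_list` holds a real Python list of names, an exact-element test can be written with `apply`. The rows below are made up. If the column was read from a CSV file, the cells are more likely strings, in which case `in` would perform a substring test instead.

```python
import pandas as pd

# Made-up rows; each actors_list cell is a Python list of strings.
movies = pd.DataFrame({
    "title": ["Film One", "Film Two", "Film Three"],
    "actors_list": [
        ["Al Pacino", "Actor B"],
        ["Actor C", "Actor D"],
        ["Actor E", "Al Pacino"],
    ],
})

# Build a boolean Series: True where the exact name is an element of the list.
has_pacino = movies.actors_list.apply(lambda actors: "Al Pacino" in actors)

pacino_movies = movies[has_pacino]
print(pacino_movies.title.tolist())
```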
You are an amazing teacher.... So clear and so succinct
Thank you!
You just saved me a few hours reading a boring(and expensive) book on Python
Ha! That's great to hear!
Best explanation I found so far. Thank you very much!
But there is a tiny but very important bit missing: How can I filter by MORE than one criterion?????????
have found it:
www.dataschool.io/python-pandas-tips-and-tricks/#filteringrowsbycondition
Thanks :-)
See this video: th-cam.com/video/YPItfQ87qjM/w-d-xo.html
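For readers who do not follow the links, here is a rough sketch of combining criteria, using a toy DataFrame with made-up values. The parentheses around each condition are needed because `&` and `|` bind more tightly than comparison operators in Python.

```python
import pandas as pd

# Toy stand-in for the movies DataFrame (values are made up).
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta"],
    "genre": ["Crime", "Drama", "Comedy", "Drama"],
    "duration": [142, 229, 96, 202],
})

# | means "or": keep rows that satisfy at least one condition.
either = movies[(movies.duration >= 200) | (movies.genre == "Crime")]

# isin() is a compact way to "or" several equality tests on one column.
by_genre = movies[movies.genre.isin(["Crime", "Comedy"])]

print(either.title.tolist(), by_genre.title.tolist())
```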
9:25, you also can just directly go to movies[booleans] because it turns out you can filter with a list of booleans. It gives the same answer. Any reason you would recommend against it? Ultimately, I would use movies[movies.duration >= 200 ] anyway so it does not matter but just saying.
I created the booleans object for educational value only, so no, I would never recommend actually using it.
Never knew loc is a powerful filter tool!
A really understandable and good explanation!
Glad it was helpful!
Great videos and very clear explanations, thank you!
Thanks!
@AchinGupta I completely agree. You break down the methods so they are very easy to understand. This is definitely helped by the fact that you have a solid grasp of pandas! Thanks again
Thanks for your kind comment! :)
This video was great! Thanks. I have a dataframe in which I need to pair 2 rows of data, each with their own timestamp, drop the second timestamp, and then filter the data with these pairs. I have a lot of data to sort through, so any help would be great.
Really great walkthrough, thank you!
My pleasure!
what a teacher, thanks.
Thanks!
This was such a good explanation
Thank you!
Awesome video, loved the intuitive explanation!
Great to hear!
The best of the best
Thanks!
loc helped a lot.
Very well explained! I was able to filter some data from a huge CSV file with your help! My first python program! =)
import pandas as pd
df = pd.read_csv('huge.csv', delimiter=";")
chosen_ncm = 27129000
booleans = []
for ncm in df.CO_NCM:
    if ncm == chosen_ncm:
        booleans.append(True)
    else:
        booleans.append(False)
filtered_ncm = pd.Series(booleans)
out_csv = 'filter_result_' + str(chosen_ncm) + '.csv'
df[filtered_ncm].to_csv(out_csv)
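As the video points out, the loop can be replaced by a single comparison. A sketch of the shorter form follows. It uses a small in-memory DataFrame instead of `huge.csv` so it runs on its own; the `VALUE` column and its numbers are invented for illustration, and the CSV is produced as text rather than written to disk.

```python
import pandas as pd

# Small in-memory stand-in for huge.csv (the VALUE column is invented).
df = pd.DataFrame({
    "CO_NCM": [27129000, 10011100, 27129000],
    "VALUE": [5, 7, 9],
})

chosen_ncm = 27129000

# The comparison builds the boolean Series directly; no loop is needed.
filtered = df[df.CO_NCM == chosen_ncm]

# With no path given, to_csv returns the CSV as a string.
# Passing a file name such as out_csv would write it to disk instead.
out_csv = "filter_result_" + str(chosen_ncm) + ".csv"
csv_text = filtered.to_csv(sep=";", index=False)

print(csv_text)
```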
Great explanation as always
Thanks!