You are really good at explaining things. One of the better teachers on YouTube. Thanks a ton for this video, and I hope there's more coming.
Thank you!
I'd like to thank the author. You really do a great job. Everything is structured, decomposed and coherent. Some guys just jump into complex coding without really explaining what's going on.
I just wanted to thank you for such a great explanation of joins. No one had explained them to me, and I struggled for the longest time to understand them. It takes a good teacher, someone who understands it simply, for others to understand it. Seriously, you are amazing!!
Thank you so much! 🙏
This is one of the best videos on pandas functions that I have ever watched. Well done, Data School. I look forward to more such videos.
Thank you so much! 🙏
This, by far, is the best explanation of these concepts. Thanks for sharing.
Wow, thank you so much for your kind words! 🙏
Thanks, Kevin. This is the clearest explanation of merge I have seen.
Thank you so much!
Your videos are fantastic. I really appreciate the simple "common sense" approach to the teaching. It is quite easy for instructors to dive right into Python lingo.
Thank you so much for your kind words!
The best new feature with merge is the validate option to make sure your join is 1:1, 1:M, etc. This is very useful for machine learning projects or end user reports that rely on upstream data that is updated regularly. It's saved me headaches a few times.
The "validate" option is great, I agree! I also like "indicator", which I explained here: twitter.com/justmarkham/status/1153653794829418496
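For anyone who hasn't used these options, here is a minimal sketch of `validate` and `indicator` together. The movies/ratings frames are small made-up examples, not the MovieLens files from the video.

```python
import pandas as pd

# Made-up data: movie_id should be unique in movies (the "one" side)
# and may repeat in ratings (the "many" side).
movies = pd.DataFrame({'movie_id': [1, 2, 3],
                       'title': ['Toy Story', 'GoldenEye', 'Four Rooms']})
ratings = pd.DataFrame({'movie_id': [1, 1, 2, 4],
                        'rating': [5, 3, 4, 2]})

# indicator=True adds a '_merge' column saying whether each row came from
# 'left_only', 'right_only', or 'both'.
merged = pd.merge(movies, ratings, on='movie_id', how='outer',
                  validate='one_to_many', indicator=True)
print(merged)

# validate='one_to_many' raises MergeError if the left keys are not unique,
# e.g. when an upstream refresh accidentally duplicates a movie.
dup_movies = pd.concat([movies, movies.iloc[[0]]])
validation_failed = False
try:
    pd.merge(dup_movies, ratings, on='movie_id', validate='one_to_many')
except pd.errors.MergeError as err:
    validation_failed = True
    print('Validation failed:', err)
```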
Thanks for the videos, Kevin! I love your teaching style and how you make each concept so crystal clear. Please keep making these videos! I just signed up to become a patron of yours and am taking your course on DataCamp (I wish you taught more courses there!). Once I master pandas, I will try out your machine learning course too :) P.S. Your son is so adorable
You are too kind, Summer! Thank you SO much for your kind words AND for becoming a patron! 🙌
This is the first time I came across your video, and I am very impressed by your explanation; your English speaking pace is perfect. Loved your content. Thanks a lot. :)
Thanks so much for your kind words!
Your videos are always amazing. You are a national treasure in my book. Don't change a thing, but for viewers: 1.75x is the speed to watch these at.
Thank you! 🙌
Supercool. Very impressive how you manage to explain the pretty complicated functionality of merge. Thanks.
Glad it was helpful!
I'm not able to read the file "u.item". I copied the same code from GitHub, but pandas wasn't able to read it; it showed me a Unicode error. How do I solve that issue?
Pass encoding='latin-1' when reading the file and you will be fine.
Great as a teacher: calm, taking your time to clearly explain the fundamentals!
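A small sketch of that fix, using an in-memory stand-in for u.item (the byte string below is made up; with the real file you would pass its path or URL instead):

```python
import io
import pandas as pd

# Stand-in for u.item: pipe-separated, no header, and Latin-1 encoded
# (b'\xe9' is 'é' in Latin-1 but is not valid UTF-8 here).
raw_bytes = b'1|Toy Story (1995)\n2|Caf\xe9 au Lait (1994)\n'

# Reading this with the default UTF-8 encoding raises UnicodeDecodeError;
# encoding='latin-1' can decode every byte.
movies = pd.read_csv(io.BytesIO(raw_bytes), sep='|', header=None,
                     names=['movie_id', 'title'], encoding='latin-1')
print(movies)
```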
Thanks so much for your kind words, I truly appreciate it!
That was honestly really good! Thank you so much for your work
Excellent way of teaching. Thanks Kevin
Glad it was helpful! 🙌
This was a well-paced, clear and complete explanation of the topic, thank you very much! It helped me a lot
That's awesome to hear!
Thanks, Kevin! I have been looking for this for a long time!
Awesome! I'm so glad to hear this is the video you needed! 🙌
Nice teaching method. Precision over pace.
Glad it was helpful!
Dude! Let me tell you, you saved me a lot of time and work! Thank you so much!
Great to hear!
Kevin, you are a superhero of data science, best videos on YouTube...
Thank you!
Thanks! very nicely explained. Now, I can perform joins using Pandas, quite effortlessly.
Glad it helped!
@dataschool Yeah! Besides books, I follow you, especially for pandas. Great help. Thanks...
You're welcome!
Thank you so much Kevin, your neat explanation along with the file you share makes it so clear, was really needing it!
Please keep making these videos! You are awesome!
Thank you!
Brilliant stuff! All the videos are awesome and clearly explain the fundamentals. Thanks for making this stuff easy.
On a different note, you remind me of "Sheldon" from the TV series The Big Bang Theory, and this is a compliment. :)
Ha! So many people have said that 😄
Hi, when I used pd.concat([df1, df2]), I got a tuple object instead of a DataFrame object. I am using a Python 3.9 environment. What should I do to get a DataFrame object rather than a tuple?
Finally, a clear explanation of the merge function!! Thanks, subscribed
Excellently explained as always. Keep up the great work!!
Thank you!
I really enjoy your tutorials, thanks so much! I have 5 CSV files that come out daily, each containing a date column. I want to merge them all using the date as the merge field. I tried a basic merge with 2 of the CSV files, and date was used as the merge-on field by default, so it worked. Ultimately I just need one date column in my master file with all the other column data merged. Should I continue to do this, or is it better to set the date column as the index, or something else?
Wonderful, loved your slow-paced English; that helped me a lot
Glad it helped!
This video is pure GOLD, absolutely wonderful, loved the clear explanations , thank you...
So glad to hear it was helpful to you! 🙌
Where is the link to the dataset used in this video?
I want to practice this with your dataset; can you please send me the link?
You're an amazing teacher. Thanks a lot for these.
Thank you! 😃
Great to see you again as well as your high-quality content in your video
Thanks so much for your kind words! 😄
You are an excellent teacher!!! I'm a fan. TY.
Hello Data School, I need to convert the column below to the datetime dtype:
period
0 28.02.2020 10:32:17:640
1 28.02.2020 10:32:18:656
2 28.02.2020 10:32:19:656
3 28.02.2020 10:32:20:671
4 28.02.2020 10:32:21:687
5 28.02.2020 10:32:22:687
6 28.02.2020 10:32:23:703
df['period'] = pd.to_datetime(df['period'])
I used the above code, but it throws this error: ValueError: ('Unknown string format:', '28.02.2020 10:32:17:640')
How do I proceed?
Not sure, sorry!
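For anyone landing here with the same error: pandas can't infer this day.month.year layout with a colon before the milliseconds, but you can spell the layout out with the `format` parameter of `pd.to_datetime`. A minimal sketch using the first three values from the question:

```python
import pandas as pd

df = pd.DataFrame({'period': ['28.02.2020 10:32:17:640',
                              '28.02.2020 10:32:18:656',
                              '28.02.2020 10:32:19:656']})

# %d.%m.%Y matches 28.02.2020; the last ':%f' matches the milliseconds.
# %f accepts 1 to 6 digits, so '640' is read as .640 seconds.
df['period'] = pd.to_datetime(df['period'], format='%d.%m.%Y %H:%M:%S:%f')
print(df['period'])
```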
Very clear and informative. Thank you very much.
You're very welcome!
Great video. However, I have a little issue. I have 3 DataFrames that I am trying to merge together. The first is a pretty long database with columns (cust_id, gained_on, gained_from_supplier, lost_to_supplier, sales_channel_id); the second is the supplier DataFrame (supplier_name, supplier_id). I am trying to merge the supplier ID and name from the second DataFrame into the database DataFrame, which has the ID, matching the supplier ID to the number using left_on/right_on, but instead it returns both columns: the supplier ID and name from both DataFrames. Then I want to do the same with the channel DataFrame (sales_channel_name, sales_channel_id): merge it on sales_channel_id in the database DataFrame and show the name instead. Any help would be appreciated, thank you!
Very easy to follow, and thanks for making a very useful video!
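One way to get just the names, sketched with made-up rows that use the column names from the question (assuming gained_from_supplier holds supplier IDs):

```python
import pandas as pd

# Hypothetical data using the column names from the question.
db = pd.DataFrame({'cust_id': [101, 102, 103],
                   'gained_from_supplier': [1, 2, 1],
                   'sales_channel_id': [10, 20, 10]})
suppliers = pd.DataFrame({'supplier_id': [1, 2],
                          'supplier_name': ['Acme', 'Globex']})
channels = pd.DataFrame({'sales_channel_id': [10, 20],
                         'sales_channel_name': ['Online', 'Retail']})

# The key columns have different names, so merge keeps BOTH of them;
# drop the redundant right-hand key afterwards.
result = (db.merge(suppliers, how='left',
                   left_on='gained_from_supplier', right_on='supplier_id')
            .drop(columns='supplier_id'))

# Here the key has the same name in both frames, so on= keeps a single
# sales_channel_id column.
result = result.merge(channels, how='left', on='sales_channel_id')
print(result)
```

The same pattern would apply to lost_to_supplier; since supplier_name would then appear twice, you'd pass `suffixes=` or rename the column after the first merge.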
Thank you!
Hi Kevin, first of all, thanks for the wonderful lecture. I am facing a problem merging the two DataFrames shown below:
Data frame 1:
BackupServer BackupDay StartDate ClientName BackupStatus Backup re-run(Y/N) Incident Reason for the Backup Failures Backup Final Outcome
RGSIBAK004 01-05-2020 2020-04-30 06:40:29 RGBPLNM110 Completed NaN NaN NaN NaN
RGSIBAK004 01-05-2020 2020-04-30 06:53:07 RGPIAPP037 Completed NaN NaN NaN NaN
RGSIBAK004 01-05-2020 2020-04-30 15:32:38 RGPIISD001 Failed Yes IN893523 VM disconnected Failed
RGSIBAK004 01-05-2020 2020-04-30 18:00:08 RGPPFTP005 Completed NaN NaN NaN NaN
RGSIBAK004 01-05-2020 2020-04-30 18:00:02 RGPQWEB069 Completed NaN NaN NaN NaN
Data Frame 2 :
BackupServer BackupDay StartDate Client Name Backup Status Backup Rerun (Y/N) Incident Failures Backup Final Result
RGPIAUN003.FDNET.COM 2020-05-01 Thu Apr 30 21:00:03 EDT 2020 rgpqbda112.fdnet.com Activity completed successfully. NaN NaN NaN NaN
RGPIAUN003.FDNET.COM 2020-05-01 Thu Apr 30 21:00:03 EDT 2020 rgpppcc051.fdnet.com Activity completed successfully. NaN NaN NaN NaN
RGPIAUN003.FDNET.COM 2020-05-01 Thu Apr 30 21:00:03 EDT 2020 rgpppcc050.fdnet.com Activity completed successfully. NaN NaN NaN NaN
RGPIAUN003.FDNET.COM 2020-05-01 Thu Apr 30 21:00:03 EDT 2020 rgpppcc011.fdnet.com Activity completed successfully. NaN NaN NaN NaN
RGPIAUN003.FDNET.COM 2020-05-01 Thu Apr 30 21:00:03 EDT 2020 rgpdbda105.fdnet.com Activity completed successfully. NaN NaN NaN NaN
Although the two DataFrames share three column names ("BackupServer", "BackupDay" and "StartDate"), the content in those columns is different, and I am not able to merge the two DataFrames into one. Can you help me with this?
Hi Kevin, how are you doing? Is there any way, using pandas or another library, to do conditional merging if I want to choose from two data? Thank you very much
Could you describe in more detail what you mean by "conditional merging"? Thanks!
I mean: if we have two different tables with the same number of columns and we want to merge them, but not all the data, only the rows we want, using conditional formulas.
You should perform the operation in two steps: first do the filter, and then do the merge.
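A minimal sketch of that two-step approach, with two hypothetical tables and an assumed condition (amount > 100):

```python
import pandas as pd

# Hypothetical tables sharing an id column.
orders_a = pd.DataFrame({'id': [1, 2, 3], 'amount': [50, 200, 120]})
orders_b = pd.DataFrame({'id': [1, 2, 3], 'region': ['N', 'S', 'E']})

# Step 1: filter down to the rows you want.
big_orders = orders_a[orders_a['amount'] > 100]

# Step 2: merge only the filtered rows.
result = pd.merge(big_orders, orders_b, on='id')
print(result)
```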
This video was very helpful and clear. Thank you for this content.
You're welcome!
You are doing a really great job with this. Thank you so much! :)
Thanks!
You are a great instructor. I have learned a lot from you regarding pandas.
The video with title "How do I merge DataFrames in pandas?" has left some queries in my mind. I would be thankful to you if you clear those too.
What type of join is used here: movie_ratings = pd.merge(movies, ratings)?
If it is an inner join, it should result in 1682 rows in total in the movie_ratings DataFrame, since the movies DataFrame has 1682 rows. But in the video I observed that movie_ratings has 100,000 rows of data.
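The row count comes from the one-to-many relationship: each movie has many ratings, and an inner join returns one row per matching pair of rows, so the result has one row per rating whose movie exists. A tiny made-up example shows the same effect:

```python
import pandas as pd

movies = pd.DataFrame({'movie_id': [1, 2], 'title': ['Toy Story', 'GoldenEye']})
ratings = pd.DataFrame({'user_id': [10, 11, 12],
                        'movie_id': [1, 1, 2],
                        'rating': [4, 5, 3]})

# pd.merge defaults to how='inner' and joins on all shared column names
# (here just movie_id). Movie 1 has two ratings, so it appears twice.
movie_ratings = pd.merge(movies, ratings)
print(movie_ratings.shape)  # (3, 4): one row per rating, not per movie
```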
What would we use to show ONLY the values that do not match? i.e., anything other than an inner join
You're a great teacher! I see that despite having a large 100K-row file, the number of rows does not expand after the merge. It beautifully stays the same and just adds the movie titles to the reviews. Can you comment on why this is not always the case? I have tried it, and my output file expands by a few rows (17 out of 1000), and I have not been able to figure out why. I have checked multiple videos, and some offer absurd, impractical solutions (like making the files the same size) or arbitrarily eliminate any dups (even though some may be valid rows), but none explain the reason or how to identify the rows that could be dups. Your comments are appreciated.
How do I merge two DataFrames based on 4 common columns with repetitive elements?
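One common approach, sketched with hypothetical frames: do an outer join with `indicator=True`, then keep the rows that did not land in 'both'.

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})

# An outer join keeps every row from both sides; indicator=True labels each
# row 'left_only', 'right_only', or 'both'. Dropping 'both' leaves only the
# rows that did NOT match.
outer = pd.merge(left, right, on='key', how='outer', indicator=True)
unmatched = outer[outer['_merge'] != 'both']
print(unmatched)
```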
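The usual cause is duplicate keys in the lookup table: if a movie_id appears twice there, every matching review row is paired with both copies. A sketch (made-up data) that reproduces the expansion and then lists the offending keys:

```python
import pandas as pd

reviews = pd.DataFrame({'movie_id': [1, 2, 3], 'review': ['good', 'ok', 'bad']})
# The lookup table accidentally lists movie_id 2 twice.
titles = pd.DataFrame({'movie_id': [1, 2, 2, 3],
                       'title': ['Toy Story', 'GoldenEye', 'Goldeneye', 'Four Rooms']})

# Each review row is paired with EVERY matching title row, so the
# duplicated key adds an extra row.
merged = pd.merge(reviews, titles, on='movie_id', how='left')
print(len(reviews), '->', len(merged))  # 3 -> 4

# keep=False flags every copy of a duplicated key, so you can inspect
# them before deciding which (if any) to drop.
dups = titles[titles.duplicated(subset='movie_id', keep=False)]
print(dups)
```

Passing `validate='many_to_one'` to the merge would raise an error instead of silently adding rows.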
Good to see you. I love the logic you teach.
Thank you! Glad my videos are helpful to you 👍
Please help: I am trying to merge two datasets as you explained, but it gives an error saying I should check for duplicates
Excellent video! Keep sharing content like this. Greetings from Argentina
Thanks!
Wow, I've seen some of your videos and I just can say THANK YOU. It's so easy to understand you :3
Thanks for your kind words! Glad you like my videos!
I have a DataFrame, a list and a tuple, and I want to merge all three together. I am aware merge can only do two tables at a time, but do you have any helpful hints on how to go about merging the table, list and df? I want the result to be a new DataFrame.
I have a source file and a target file, and I have to compare 140 columns and show whether they match or not. For example, there is a column Country1 in the source and Country2 in the target. To compare them I would use if(source['country1']==target['country2']) return True else return False. Comparing 140+ columns one at a time will take a long time, and in both files the columns are not in the same order. How can I solve this?
Your slow speaking pace is very helpful to me. I have 2 questions. The first: what if I want to merge only one column (rating) from the ratings df into the movies df? The second: what if I want to sum the ratings for each Movie_Id? Thank you so much, looking forward to your answer.
Can it happen, when merging two DataFrames, that only the headers get merged and no data ends up in the new DataFrame?
How do I get the common mobile numbers from two different CSV files that have different column names?
Great video. My question is: when I'm working on a project, when exactly do I have to combine DataFrames?
How do I merge multiple large DataFrames quickly? I joined them with the usual merge(), but it seems too slow. I found a hint about using pandas.Index() with the merge method, but I don't know how to use it.
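A sketch of a vectorized alternative, under the assumption that both files have the same rows in the same order and that you can write down a mapping from target column names to source names (the mapping and data here are hypothetical):

```python
import pandas as pd

source = pd.DataFrame({'country1': ['US', 'FR'], 'city1': ['NYC', 'Paris']})
# Same kind of data, but different column names AND a different order.
target = pd.DataFrame({'city2': ['NYC', 'Lyon'], 'country2': ['US', 'FR']})

# Hypothetical mapping from target names to source names; with 140 columns
# you would build this dict once (or derive it from a naming rule).
column_map = {'country2': 'country1', 'city2': 'city1'}

# Rename, then reorder target's columns to match source's order.
aligned = target.rename(columns=column_map)[source.columns]

# Element-wise comparison of all columns at once (no per-column if/else).
matches = source.eq(aligned)
print(matches)
print(matches.all())  # True/False per column
```

Note that NaN never equals NaN in this comparison, so values missing in both files count as mismatches.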
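Both can be sketched with small made-up frames; movie_id here stands in for the Movie_Id column mentioned in the question:

```python
import pandas as pd

movies = pd.DataFrame({'movie_id': [1, 2], 'title': ['Toy Story', 'GoldenEye']})
ratings = pd.DataFrame({'user_id': [10, 11, 12],
                        'movie_id': [1, 1, 2],
                        'rating': [4, 5, 3],
                        'timestamp': [881250949, 891717742, 878887116]})

# 1) Bring over only the rating column: select it (plus the join key)
#    before merging, so the other ratings columns are left behind.
movie_ratings = pd.merge(movies, ratings[['movie_id', 'rating']], on='movie_id')

# 2) Sum the ratings for each movie_id.
rating_sums = ratings.groupby('movie_id')['rating'].sum()
print(rating_sums)
```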
Thank you so much for the concise clear explanation. Much appreciated.
What about not a specific file, for example all .csv or all .tsv files? How do I concatenate a header to those files? Thanks
Great video, informative and clear. Thanks
You're welcome!
Thanks for the video, that's awesome, particularly the parts explaining joins. Clear and concise.
Great to hear!
Hi bro, I am currently working on a project. The mentors say to use foreign keys and primary keys in pandas and create tables with those keys. So my question is: is it possible to use foreign and primary keys in pandas? If not, what should I do to merge two tables that contain the same column, the way we do in MySQL? Thank you.
Thank you very much for the precise explanation, just what I needed to know!
You're very welcome! 🙏
Thanks for the video. I was able to successfully merge and find some errors in IDs that I did not find using VBA VLOOKUPs.
I was curious: is there a way to highlight differences between columns in this merged database?
example:
Number of Vehicles_SS: 7 vs Number of Vehicles_SA: 2
and it would highlight the row, or even just those values, based on the ID it was merged on?
I am having a hard time finding this. I'm trying to get rid of VBA, which I have been doing this with, but it is SUPER slow with the data I have to process.
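A minimal sketch, assuming the merge used `suffixes=('_SS', '_SA')` so the two versions sit side by side (the data is made up):

```python
import pandas as pd

# Hypothetical merged result with the suffixes from the question.
merged = pd.DataFrame({'ID': [1, 2, 3],
                       'Number of Vehicles_SS': [7, 4, 5],
                       'Number of Vehicles_SA': [2, 4, 6]})

# Boolean mask: True where the two sources disagree (vectorized, no loop).
mismatch = merged['Number of Vehicles_SS'].ne(merged['Number of Vehicles_SA'])
merged['vehicles_differ'] = mismatch
print(merged[mismatch])
```

To actually color the rows in a notebook, `DataFrame.style` can do that, though it needs the optional jinja2 package installed.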
Thank you so much for the clear and concise explanation
You're welcome!
Love the way you explain it, thanks for your vids. Keep it up (thumbs)
Thanks, Kevin, but where is the concat video?
How do I verify, using a simple comparison operator in Python, that all the columns are included in the merged DataFrame after merging two DataFrames?
Dear Kevin, I have some difficulties fine-tuning PLSRegression (sklearn.cross_decomposition.PLSRegression). Can you please cover this topic someday?
Thanks for your suggestion!
Thanks a lot Kevin
We have missed you.
Thank you! 😊
Thank you so much for explaining it clearly. Now I understand merging DataFrames better. TQVM
You're welcome!
Can pandas execute a query by reading the SQL from a file, such as filename.sql?
Hello. Great vid. But how do I follow along? Other videos had the bitly link. I can’t find the dataset for this exercise.
Datasets are here: github.com/justmarkham/pandas-videos/tree/master/data
Missing your pandas tutorials. Thanks
It's nice to be missed! You can find all of my pandas tutorials here: th-cam.com/play/PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y.html
Sir, if we have hundreds of columns without names, how can we name them using pandas and a for loop or lambda function? Typing them all out with names=[] would be a very time-consuming process. The column names can be col1, col2, col3, etc.
How do I concatenate multiple rows into a single row, separated by commas?
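A sketch of two ways to generate the names instead of typing them, using a small in-memory stand-in for the file:

```python
import io
import pandas as pd

# Stand-in for a headerless file with many columns (here just 4).
csv_text = io.StringIO('1,2,3,4\n5,6,7,8\n')

# Option 1: generate the names while reading. header=None tells pandas the
# first line is data; a list comprehension builds col1..colN.
n_cols = 4
df = pd.read_csv(csv_text, header=None,
                 names=[f'col{i}' for i in range(1, n_cols + 1)])

# Option 2: rename after reading, using the actual column count.
df2 = pd.DataFrame([[1, 2, 3, 4]])
df2.columns = [f'col{i}' for i in range(1, df2.shape[1] + 1)]
print(list(df.columns), list(df2.columns))
```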
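One way to do it, assuming the rows share a grouping column (order_id in this made-up example): group, then join each group's strings.

```python
import pandas as pd

orders = pd.DataFrame({'order_id': [1, 1, 2, 2, 2],
                       'product': ['apple', 'pear', 'milk', 'eggs', 'bread']})

# Group the rows, then join each group's strings with ', ' so every
# order_id collapses to a single row.
combined = (orders.groupby('order_id')['product']
                  .agg(', '.join)
                  .reset_index())
print(combined)
```

If the column isn't text, convert it with `astype(str)` before joining.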
Awesome tutorial, Thank you very much man!
You're welcome!
The resulting dataset I got has null values. What do I do?
Thanks for the video,
I have a query sir,
Let's consider a table 1 with features (order Id) and (product Id) and a table 2 with features (order Id) and (product Id). How do I fetch the observations that are present in table 1 but not in table 2?
Great question! See trick 16 in this video: th-cam.com/video/tWFQqaRtSQA/w-d-xo.html
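The linked trick isn't reproduced here; as one alternative sketch, you can compare the (order Id, product Id) pairs directly with a MultiIndex. Column names and data are assumptions:

```python
import pandas as pd

table1 = pd.DataFrame({'order_id': [1, 1, 2, 3], 'product_id': [10, 11, 10, 12]})
table2 = pd.DataFrame({'order_id': [1, 3], 'product_id': [10, 12]})

keys = ['order_id', 'product_id']
# Build a MultiIndex of (order_id, product_id) pairs for each table and
# keep the table1 rows whose pair does not occur in table2.
in_table2 = table1.set_index(keys).index.isin(table2.set_index(keys).index)
only_in_table1 = table1[~in_table2]
print(only_in_table1)
```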
Hello, I can't retrieve the merged df in another cell. How can I fix that?
Thank you for this video.
I have been struggling with merge and concat today :)
You're very welcome! Glad it's helpful to you!
Not wait too much to watch this.
I hope the video is helpful to you!
I wish we had 3x on YouTube, great video!
What's the concat video? You say there is one, but I can't find it with search.
It's at the end of this video: th-cam.com/video/15q-is8P_H4/w-d-xo.html
Hope that helps!
Thanks man! You saved my weekend :*
Glad I could help!
Can we merge more than two DataFrames using pandas?
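Yes, by chaining merges. A minimal sketch that folds a list of frames with `functools.reduce`, assuming they all share a 'date' column (the data is made up):

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'date': ['2021-01-01', '2021-01-02'], 'sales': [10, 20]})
df2 = pd.DataFrame({'date': ['2021-01-01', '2021-01-02'], 'visits': [100, 150]})
df3 = pd.DataFrame({'date': ['2021-01-01', '2021-01-02'], 'returns': [1, 0]})

# merge combines two frames at a time, so fold the list pairwise:
# (df1 merge df2) merge df3
merged = reduce(lambda left, right: pd.merge(left, right, on='date', how='outer'),
                [df1, df2, df3])
print(merged)
```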
This is very helpful. Thank you so much.
You're very welcome!
The above logic is beautifully explained. Hi Kevin, I have a question if you could please reply:
I have three CSV files, csv1 (20000 rows), csv2 (20000 rows) and csv3 (20000 rows), and I want to merge them into a single DataFrame without losing a single record. I want to read these files into one DataFrame that should ideally have 60000 rows.
P.S. All the files have the same columns (PostID, time, tweetURL, Content, RetweetNum, LikeNum, CommentsNum, Verified, Following, Follower), and in the resulting DataFrame I want all these columns as headings and all 60000 rows. Is it possible? Kevin, I will wait for your reply, man. I know this post is old, but maybe you'll read my question. THANK YOU
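This is what `pd.concat` is for: stacking frames with the same columns vertically. A sketch using in-memory stand-ins for the three files and a subset of the column names from the question:

```python
import io
import pandas as pd

# Stand-ins for csv1, csv2 and csv3 (replace with the real file paths).
header = 'PostID,time,LikeNum\n'
files = [io.StringIO(header + '1,09:00,5\n2,09:05,7\n'),
         io.StringIO(header + '3,10:00,1\n'),
         io.StringIO(header + '4,11:00,0\n5,11:30,9\n')]

# concat stacks the DataFrames vertically; because the column names match,
# every row is kept and the columns line up. ignore_index=True renumbers
# the rows 0..N-1 instead of repeating each file's own index.
all_posts = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
print(all_posts.shape)  # (5, 3)
```

With the real files, pass the paths to `pd.read_csv`; three 20,000-row files would give 60,000 rows.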
Superb!!!
I got every explanation, thanks
You're welcome!
Hello, many thanks for your tutorial. It's great!!! But I'm stuck: is there any technique to join two DataFrames if one of them is stacked and the other is not?
What does the support column in sklearn's classification_report represent, and what parameters can adjust it? I'm struggling with a highly imbalanced dataset; I applied SMOTE, but the 'support' metric still shows it as highly imbalanced!
Hi, I wanted to ask how you check for data consistency in columns, like checking for integers in a string column or finding values like 2A in a column of double-letter values, e.g. AA, BB, etc.
Great question, though there's no "one way" to catch all of these issues! Here are some tricks that might be helpful, though: th-cam.com/video/RlIiVeig3hc/w-d-xo.html
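As one example of such a check, a sketch using regular expressions on a made-up Series: flag values that aren't two identical capital letters, and values that are all digits.

```python
import pandas as pd

codes = pd.Series(['AA', 'BB', '2A', 'CC', '42', 'DD'])

# Valid values are exactly two identical capital letters: ([A-Z]) captures
# one letter and \1 requires the same letter again.
is_valid = codes.str.fullmatch(r'([A-Z])\1')
invalid = codes[~is_valid]
print(invalid)

# Values made only of digits, e.g. integers hiding in a string column.
digits_only = codes[codes.str.fullmatch(r'\d+')]
print(digits_only)
```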
Let's say a pandas df and a MySQL table have columns A, B, C with the same schema, and column A in SQL is the primary key.
Now, how do I upsert a pandas df into the MySQL table?
When the primary key conflicts, update the remaining columns; when it doesn't conflict/exist, do an INSERT INTO.
What's the most efficient way to do this?
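pandas itself has no upsert, so this usually goes through SQL. In MySQL the statement is INSERT ... ON DUPLICATE KEY UPDATE. Since a MySQL server can't be assumed here, the sketch below swaps in SQLite's equivalent, ON CONFLICT ... DO UPDATE (SQLite 3.24+), so it runs self-contained; the table and data are made up.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (A INTEGER PRIMARY KEY, B TEXT, C REAL)')
conn.execute("INSERT INTO t VALUES (1, 'old', 1.0)")

# Rows to upsert: A=1 already exists (update), A=2 is new (insert).
df = pd.DataFrame({'A': [1, 2], 'B': ['new', 'fresh'], 'C': [9.5, 2.0]})

# SQLite upsert syntax; 'excluded' refers to the incoming row.
sql = '''INSERT INTO t (A, B, C) VALUES (?, ?, ?)
         ON CONFLICT(A) DO UPDATE SET B = excluded.B, C = excluded.C'''
# executemany sends all rows in one call instead of a Python loop of executes.
conn.executemany(sql, df.itertuples(index=False, name=None))
conn.commit()

result = pd.read_sql('SELECT * FROM t ORDER BY A', conn)
print(result)
```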
Hi Kevin,
I have a troublesome question here.
I am analyzing a dataset which is totally textual. I want to assign a grading to certain text in each column by appending a new Grading column for each existing column.
I have achieved it using a for loop, but I can't save the DataFrame created, because the for loop overwrites it. I need help.
Code of the for loop:
for (ColumnName, ColumnData) in b_questions.iteritems():
    b_questions['Grading'] = b_questions[ColumnName].map({'Consistently Good':4,'Outstanding':5,'Satisfactory':3})
    data = b_questions.loc[:, [ColumnName, 'Grading']]
    print(data)
If I'm understanding your question, I think you just need to run this one line of code:
b_questions['Grading'] = b_questions['Insert column name here'].map({'Consistently Good':4,'Outstanding':5,'Satisfactory':3})
Hope that helps!
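If the goal is a separate grading column for every text column, here's a sketch with made-up data. The loop writes each result to its own column instead of reusing 'Grading', and it iterates over the column names directly; note that `DataFrame.iteritems()` was removed in pandas 2.0 (`DataFrame.items()` is the replacement).

```python
import pandas as pd

b_questions = pd.DataFrame({'Q1': ['Outstanding', 'Satisfactory'],
                            'Q2': ['Consistently Good', 'Outstanding']})
grade_map = {'Consistently Good': 4, 'Outstanding': 5, 'Satisfactory': 3}

# Writing every result to the same 'Grading' column overwrites it on each
# pass; give each source column its own output column instead.
for column_name in list(b_questions.columns):
    b_questions[f'{column_name}_Grading'] = b_questions[column_name].map(grade_map)
print(b_questions)
```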
Great work as always.
Very useful.
Thanks for sharing it!
By the way, any chance you could make a video about PySpark? It would be very useful to cover it from the beginning, with examples based on a local connection (one computer) first and then a couple of examples emulating a cluster connection.
Thanks for your kind words as always, Hector! Sorry, I don't have any videos about PySpark, but I appreciate the suggestion! 👍
@dataschool I would love for you to do that. I am positive that you will get a lot of interested guys, me among them, of course.
My best regards!
How do I merge df1 and df2 on two columns (fields) in the on clause? For example: dfUltStatus = pd.merge(dfUltStatus, dfDescStatus, on=['CODIGO_STATUS','SUB_CODIGO_STATUS'], how='left')
The goal is to merge the two DataFrames on these two fields to bring in the description field.
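The code in the question should already do this. A small sketch with made-up rows (the DESCRICAO and pedido columns are hypothetical) shows what the two-key left merge produces:

```python
import pandas as pd

dfUltStatus = pd.DataFrame({'CODIGO_STATUS': [1, 1, 2],
                            'SUB_CODIGO_STATUS': [10, 20, 10],
                            'pedido': ['A', 'B', 'C']})
dfDescStatus = pd.DataFrame({'CODIGO_STATUS': [1, 2],
                             'SUB_CODIGO_STATUS': [10, 10],
                             'DESCRICAO': ['Open', 'Closed']})

# A list passed to on= matches rows only when BOTH key columns agree;
# how='left' keeps every row of dfUltStatus, with NaN where no description exists.
dfUltStatus = pd.merge(dfUltStatus, dfDescStatus,
                       on=['CODIGO_STATUS', 'SUB_CODIGO_STATUS'], how='left')
print(dfUltStatus)
```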
Thank you!! I finally got the dataframe I wanted!
Hi, can you please post a video on handling large, real-time CSV files with millions of rows using chunks or Modin, and on how to merge those chunks after importing them into pandas?
Thanks for your suggestion! FYI, if your computer does not have enough RAM to load a large DataFrame into memory, reading the DataFrame in chunks will not solve that problem. It will be just as large once you merge the chunks back together (which you can do using the "concat" function.)
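A sketch of the chunked pattern that does help with memory: shrink each chunk (filter or aggregate) before concatenating. The in-memory CSV and the filter condition are made up:

```python
import io
import pandas as pd

# Stand-in for a large CSV file.
big_csv = io.StringIO('id,value\n' + ''.join(f'{i},{i * 2}\n' for i in range(10)))

# chunksize makes read_csv return an iterator of DataFrames (4 rows each here).
chunks = []
for chunk in pd.read_csv(big_csv, chunksize=4):
    # Filter each chunk so only the rows you need are kept in memory.
    chunks.append(chunk[chunk['value'] % 4 == 0])

# concat glues the filtered chunks back into one DataFrame.
result = pd.concat(chunks, ignore_index=True)
print(result)
```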
Hi Kevin, thanks to your tutorials I've learned a lot from your channel; it's amazing! Since I just started learning pandas, I'm a little bit confused about concat(), melt(), merge(), pivot(), stack()... They're really annoying to me >< I really hope there's an all-in-one explanation of how to use these functions XD Thank you!
I agree, it's tricky to separate out when you should use each one of those!