Hi Adam, the video content is very helpful and I enjoyed learning it. I work in a job where these things run in the back end (data arrives in different file formats from different sources, and is then parsed and matched), and I am very curious to learn it and get hands-on with these things. Your video gave me the spark to start. A BIG THANKS for delivering this content.
Glad it was helpful! New content coming out every week now :)
Keep up the great work! I think you should start a series where you go on Upwork (or similar freelance websites), take on jobs and show your work throughout! It gives real encouragement to developers who want to build up some work experience. Also, you're an excellent teacher! I think this would be great for your channel.
Thank you! I like your thinking, an upwork/freelance series is a smart idea. I'll get to work planning out the series.
As an alternative to adding prefixes, the merge function has a "suffixes" parameter. It identifies overlapping column names on the left and right sides and lets you define the suffix to add to each, e.g. pd.merge(df1, df2, suffixes=("_df1", "_df2")).
A time saver I find myself using often.
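Here is a small sketch of the `suffixes` tip, using made-up columns. One caveat: if `on` is left out, `merge` joins on every shared column, so there is nothing left to suffix. Naming the key with `on=` is what makes the suffixes appear.

```python
import pandas as pd

# Two made-up statements sharing a key column and an overlapping "balance" column
df1 = pd.DataFrame({"account_num": [1, 2], "balance": [10, 20]})
df2 = pd.DataFrame({"account_num": [1, 2], "balance": [10, 25]})

# suffixes is applied to overlapping columns that are not join keys,
# so the key is named explicitly with on=
merged = pd.merge(df1, df2, on="account_num", suffixes=("_df1", "_df2"))
print(merged.columns.tolist())  # ['account_num', 'balance_df1', 'balance_df2']
```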
I love that! Thank you for the tip, very helpful :)
This is just awesome, man. Big ups.
Thanks for the Awesome video 🥰🥰🥰
Very useful 👌👌👌
Same problem as above, with a different number of rows: e.g. one file has 70 rows and the other has 120, with the same number of columns in both files.
Is there any solution to the problem?
Sir, where can I find this code? Have you shared a separate link where the code is uploaded?
How do I use this approach to standardize the different ways of writing country names (United States vs United States of America)? I only want one version of each country name, so basically I want to get rid of the synonyms.
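One way to collapse synonyms is a lookup dictionary applied after lower-casing and trimming each value. This sketch assumes you can list the variants yourself. The `COUNTRY_SYNONYMS` mapping and the `standardise_country` helper are hypothetical names. Names that are not in the dictionary are left unchanged.

```python
import pandas as pd

# Hypothetical mapping: every known variant (lower-case) -> the one canonical name to keep
COUNTRY_SYNONYMS = {
    "united states of america": "United States",
    "united states": "United States",
    "usa": "United States",
    "us": "United States",
}

def standardise_country(names: pd.Series) -> pd.Series:
    """Map known spellings to one canonical name; leave unknown names unchanged."""
    # Normalise case and surrounding whitespace so the lookup is case-insensitive
    keys = names.str.strip().str.lower()
    # map() gives NaN for names not in the dictionary; fall back to the original value
    return keys.map(COUNTRY_SYNONYMS).fillna(names)

df = pd.DataFrame({"country": ["United States of America", "USA", " united states", "Canada"]})
df["country"] = standardise_country(df["country"])
print(df["country"].tolist())
# ['United States', 'United States', 'United States', 'Canada']
```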
Thanks for sharing these tips. Is it possible to compare all columns and display only the data that has mismatches, without hard-coding column names in the compare function? Thanks a lot.
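If both files have the same rows and columns, pandas' `DataFrame.compare()` (available since pandas 1.1) checks every column without naming any of them. It keeps only the rows and columns that differ, and cells that are equal show up as NaN. It requires both frames to be identically labelled, so the key is moved into the index first. The `id` key and the data below are made up.

```python
import pandas as pd

# Made-up data: same ids and columns in both frames, a few values changed
old = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Raj", "Li"], "city": ["Pune", "Oslo", "Rome"]})
new = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Raj", "Lee"], "city": ["Pune", "Bergen", "Rome"]})

# Use the key as the index so rows are matched by id rather than by position;
# the result has a (column, "self"/"other") pair for every column that differs
diff = old.set_index("id").compare(new.set_index("id"))
print(diff)
```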
Hi @Make Data Useful
I have an issue comparing two CSV files. Here is the case:
1. Both CSVs have the same structure.
2. They have different numbers of records (e.g. CSV1: 50 records, CSV2: 40 records).
3. There is no single key column (it's a composite key).
Please suggest a solution; I'm new to Python.
I tried the approach below, but it only works when both CSVs have the same number of rows, not when the row counts differ.
I'm storing the data from CSV1 and CSV2 in dataframes df1 and df2 respectively, and then converting them to strings so that they can be compared easily.
import numpy as np
import pandas as pd

source_file = 'a.csv'
target_file = 'b.csv'

# Read each CSV and cast every value to a string so the columns compare cleanly
df1 = pd.read_csv(source_file).astype(str)
print(df1)
df2 = pd.read_csv(target_file).astype(str)
print(df2)

# Get the list of headers in df1
header_list = df1.columns.tolist()

# Compare the files cell by cell, row by row. This only works when both files have
# the same number of rows; otherwise pandas raises
# "ValueError: Can only compare identically-labeled Series objects".
df3 = pd.DataFrame()
for col in header_list:
    df3[col] = np.where(df1[col] == df2[col], 'True', 'False')
print(df3)
df3.to_csv('Output', index=False)
Please help out in this scenario.
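One way to handle both the composite key and the differing row counts is an outer merge on all the key columns with `indicator=True`. Rows that exist in only one file are then labelled instead of causing an error. This is a minimal sketch that assumes both files share column names. The key columns `region` and `customer_id`, the helper name and the data are hypothetical.

```python
import pandas as pd

def compare_on_composite_key(src: pd.DataFrame, tgt: pd.DataFrame, key_cols: list) -> pd.DataFrame:
    """Outer-join on a composite key and label every row with a status."""
    merged = src.merge(tgt, on=key_cols, how="outer",
                       suffixes=("_src", "_tgt"), indicator=True)
    value_cols = [c for c in src.columns if c not in key_cols]

    # A row matches only if every non-key column agrees between the two files
    same = pd.Series(True, index=merged.index)
    for col in value_cols:
        same &= merged[f"{col}_src"] == merged[f"{col}_tgt"]

    merged["status"] = "mismatch"
    merged.loc[(merged["_merge"] == "both") & same, "status"] = "match"
    merged.loc[merged["_merge"] == "left_only", "status"] = "only in source"
    merged.loc[merged["_merge"] == "right_only", "status"] = "only in target"
    return merged.drop(columns="_merge")

# Hypothetical composite key: region + customer_id together identify a row
source = pd.DataFrame({"region": ["N", "N", "S"], "customer_id": [1, 2, 1], "amount": [10, 20, 30]})
target = pd.DataFrame({"region": ["N", "S"], "customer_id": [1, 1], "amount": [10, 35]})

result = compare_on_composite_key(source, target, ["region", "customer_id"])
print(result)
```

Because rows are matched on the key rather than by position, the two files can have 50 and 40 rows (or 70 and 120) without any change to the code.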
Very nice video :) I am struggling with something right now. I have two input files with different data, indexes, column names, etc. I extracted them from original tables (they were split and partially uppercase), keeping each value's original position in the column (its index). I have to check whether the names are the same, and if they are, save each row's original index against the matching row in the other file, e.g. csv1: 0 Bob Marley, [2,5,7]; csv2: 0 Elon Musk, [4,11]. Please tell me if this is difficult and worth showing, or maybe I'm just inexperienced... Thank you.
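Without sample files this is a guess at the requirement, but one reading is to match names case-insensitively and, for each row in the first file, collect the row indices of the same name in the second file. This sketch works under that assumption. `matching_rows` and the sample names are hypothetical.

```python
import pandas as pd

def matching_rows(names1: pd.Series, names2: pd.Series) -> pd.Series:
    """For each row of names1, list the row indices in names2 holding the same name (case-insensitive)."""
    left = pd.DataFrame({"row_in_file1": names1.index,
                         "key": names1.str.strip().str.lower().to_numpy()})
    right = pd.DataFrame({"row_in_file2": names2.index,
                          "key": names2.str.strip().str.lower().to_numpy()})
    # Inner merge keeps only names found in both files
    pairs = left.merge(right, on="key")
    return pairs.groupby("row_in_file1")["row_in_file2"].apply(list)

file1_names = pd.Series(["Bob Marley", "ELON MUSK", "Ada Lovelace"])
file2_names = pd.Series(["elon musk", "Bob Marley", "BOB MARLEY"])
print(matching_rows(file1_names, file2_names))
```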
Hey Eryk, do you want to drop me a sample of the two files with an expected output?
Hello, I have two datasets, each with different column names. How can I merge them?
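One common approach is to rename the key column on one side so both frames use the same name, then merge on it. The sketch below uses made-up `orders` and `customers` frames.

```python
import pandas as pd

# Made-up frames whose key column is named differently in each
orders = pd.DataFrame({"cust_id": [1, 2, 2], "total": [50, 75, 20]})
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ann", "Raj"]})

# Rename the key so both sides use the same name, then merge on it
merged = orders.rename(columns={"cust_id": "customer_id"}).merge(customers, on="customer_id")
print(merged)
```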
Hi, I have a question about comparing one column with another when the column names differ, e.g. the first column is named "Column1.subLot" and the other "subLot". The two columns are in different dataframes: one dataset is in an Excel file and the other in a JSON file.
Can you please make a video answering my question?
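Comparing "Column1.subLot" with "subLot" doesn't require renaming either column, because `merge` accepts `left_on` and `right_on` for keys with different names. This sketch uses inline stand-ins for the Excel and JSON data. The real files would be loaded with `pd.read_excel` and `pd.read_json`, and their names are not given here. As an aside, a dotted name like "Column1.subLot" is typical of nested JSON flattened with `pd.json_normalize`, which joins nested keys with "." by default.

```python
import pandas as pd

# Stand-ins for the two sources (in practice: pd.read_excel(...) and pd.read_json(...))
excel_df = pd.DataFrame({"Column1.subLot": ["A1", "A2", "A3"]})
json_df = pd.DataFrame({"subLot": ["A2", "A3", "A4"]})

# left_on / right_on let the compared columns keep their different names;
# indicator=True adds a _merge column saying where each value was found
result = excel_df.merge(json_df, left_on="Column1.subLot", right_on="subLot",
                        how="outer", indicator=True)
print(result)
```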
Thanks, really helpful.
Thank you, Adam. I am learning Python and this is really helpful. In your compare function you pass in account_type as the column to be compared. How can we modify this to compare all the columns in the file? In other words, I need to compare and list the differences for each column.
Example: compare column Account_type
Expected output:
Account_num Statement_1_account_type Statement_2_account_type
1 Checking Savings
Compare column Statement_balance
Expected output:
Account_num Statement_1_balance Statement_2_balance
3 10 15
And so on, until we have listed the differences in every column?
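This sketch produces the per-column output described above without naming the columns. It merges the two statements on `Account_num`, then loops over every other column and keeps the rows where the two sides differ. It assumes both files have the same columns. Rows that exist in only one file are left out by the inner merge. The data below is made up to match the example, and `differences_per_column` is a hypothetical helper.

```python
import pandas as pd

def differences_per_column(df1: pd.DataFrame, df2: pd.DataFrame, key: str) -> dict:
    """Return {column: rows where that column differs}, without hard-coding column names."""
    merged = df1.merge(df2, on=key, suffixes=("_1", "_2"))  # rows present in both files
    reports = {}
    for col in df1.columns.drop(key):
        left, right = f"{col}_1", f"{col}_2"
        mismatched = merged.loc[merged[left] != merged[right], [key, left, right]]
        if not mismatched.empty:
            reports[col] = mismatched
    return reports

statement_1 = pd.DataFrame({"Account_num": [1, 2, 3],
                            "Account_type": ["Checking", "Savings", "Checking"],
                            "Statement_balance": [20, 5, 10]})
statement_2 = pd.DataFrame({"Account_num": [1, 2, 3],
                            "Account_type": ["Savings", "Savings", "Checking"],
                            "Statement_balance": [20, 5, 15]})

for col, table in differences_per_column(statement_1, statement_2, "Account_num").items():
    print(f"Compare column {col}")
    print(table.to_string(index=False))
```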
Hi Sriharsha,
There are a couple of different ways to go about it. One approach would be to:
create a new column on each of your input dataframes, made up of the three columns you want to compare, then compare using these two new columns.
@MakeDataUseful It might work for a few columns, but what about a dataframe with 55 columns? How would you compare them?
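Adam's combined-column suggestion can be sketched as follows. The helper name `add_combined_key` and the sample data are hypothetical. Because the list of columns is just a parameter, the same code covers 55 columns if you pass `list(df.columns)` instead of typing them out.

```python
import pandas as pd

def add_combined_key(df: pd.DataFrame, cols: list, sep: str = "|") -> pd.DataFrame:
    """Return a copy of df with one string column built from the chosen columns."""
    out = df.copy()
    # The separator keeps e.g. ("1", "23") and ("12", "3") from producing the same key
    out["combined_key"] = df[cols].astype(str).apply(sep.join, axis=1)
    return out

df1 = pd.DataFrame({"a": [1, 2], "b": ["x", "y"], "c": [True, False]})
df2 = pd.DataFrame({"a": [2, 1], "b": ["y", "z"], "c": [False, True]})

cols = ["a", "b", "c"]  # or list(df1.columns) to use every column, e.g. all 55
k1 = add_combined_key(df1, cols)
k2 = add_combined_key(df2, cols)

# Row order and row count no longer matter: test membership of each key
k1["in_df2"] = k1["combined_key"].isin(k2["combined_key"])
print(k1)
```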
I have a source file and a target file, and I have to compare 140 columns and show whether each one matches. For example, there is a column Country1 in the source and Country2 in the target. To compare them I would use: if source['Country1'] == target['Country2'], return True, else return False. Comparing 140+ columns this way, one at a time, takes a long time, and the columns are not in the same order in the two files. How can I solve this?
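With 140 columns, the usual trick is to stop comparing pairs one by one. Instead, write down the name mapping once, rename the target columns to the source names, reorder them to match, and let pandas compare every cell in one call. This sketch assumes both files have the same rows in the same order. `COLUMN_MAP` and the sample data are hypothetical. In practice the mapping would probably be loaded from a two-column file rather than typed out.

```python
import pandas as pd

# Hypothetical mapping from target column names to source column names
COLUMN_MAP = {"Country2": "Country1", "City2": "City1"}

source = pd.DataFrame({"Country1": ["IN", "US"], "City1": ["Pune", "Austin"]})
target = pd.DataFrame({"City2": ["Pune", "Dallas"], "Country2": ["IN", "US"]})

# Rename target columns to the source names, then put them in the source's order
aligned = target.rename(columns=COLUMN_MAP)[source.columns]

# One vectorised comparison covers every column at once
matches = source.eq(aligned)   # True/False per cell
print(matches)
print(matches.all())           # per column: does every row match?
```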
How can I combine two CSV files with different columns, bro?
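This sketch uses made-up CSV contents. `pd.concat` stacks the rows and takes the union of the columns, filling gaps with NaN. With `join="inner"`, it keeps only the columns both files share.

```python
from io import StringIO

import pandas as pd

# Made-up CSV contents; pass real file paths to read_csv in practice
df_a = pd.read_csv(StringIO("id,name\n1,Ann\n2,Raj\n"))
df_b = pd.read_csv(StringIO("id,email\n3,li@example.com\n"))

# Stack the rows: columns found in only one file are filled with NaN there
combined = pd.concat([df_a, df_b], ignore_index=True)

# join="inner" keeps only the columns both files share
shared_only = pd.concat([df_a, df_b], join="inner", ignore_index=True)
print(combined)
print(shared_only)
```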
What would be different if the number of columns did not match ?
Hi, I want to scrape the top MBA colleges from the Shiksha website, but the problem is that the page changes automatically (e.g. from page 0 to page 1) as I scroll down.
Can you help me scrape all the pages? There are around 100 colleges, but I can only scrape 40 at a time.
Thanks for the video, it helped me check 18,000 lines of data. Is there a way to check multiple rows at the same time?
There are a couple of parallel processing approaches out there; check out Dask as one example.
Is it possible to make a video comparing two CSVs, with a config file to exclude certain columns? And possibly an option to ignore headers.
Of course, have you got a couple of sample files you can link me to? I will make a Q&A video
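Here is one possible shape for this, assuming a JSON config with two hypothetical keys, `exclude_columns` and `ignore_headers`. Excluded columns are dropped. "Ignore headers" is interpreted here as comparing by column position instead of by name. The config format and the `load_for_comparison` helper are illustrative, not an existing tool.

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical config format; it would normally live in its own .json file
CONFIG = json.loads('{"exclude_columns": ["updated_at"], "ignore_headers": true}')

def load_for_comparison(source, config: dict) -> pd.DataFrame:
    """Read a CSV (path or buffer) and apply the config's exclusions."""
    df = pd.read_csv(source)
    # Drop the configured columns; names missing from this file are skipped
    df = df.drop(columns=config.get("exclude_columns", []), errors="ignore")
    if config.get("ignore_headers", False):
        # Swap header names for positions so only the values get compared
        df.columns = range(df.shape[1])
    return df

file_a = StringIO("id,amount,updated_at\n1,10,2024-01-01\n2,20,2024-01-02\n")
file_b = StringIO("ID,Amount,updated_at\n1,10,2024-02-01\n2,20,2024-02-02\n")

df_a = load_for_comparison(file_a, CONFIG)
df_b = load_for_comparison(file_b, CONFIG)
print(df_a.equals(df_b))  # True: only the excluded column and the header names differ
```

If the files have no header row at all, `pd.read_csv(source, header=None)` is the option to use instead.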
How can we combine or join two datasets without any common fields?
For example, I have one dataset, bank.csv, and another, salary.csv.
The two datasets have no columns in common. So how can we join them in Python?
How would you look to join them without Python? Once we know that we can look at how to do that using Python.
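For reference, the two usual answers to this question look like this in pandas. Which one is right depends on how a row in bank.csv relates to a row in salary.csv, and the comment doesn't say. The data is made up, and `how="cross"` needs pandas 1.2 or later.

```python
import pandas as pd

# Made-up frames with no column in common
bank = pd.DataFrame({"account": ["A-1", "A-2"]})
salary = pd.DataFrame({"salary": [50000, 62000]})

# Option 1: pair rows by position (row 0 with row 0, and so on)
side_by_side = pd.concat([bank, salary], axis=1)

# Option 2: cartesian product, every bank row with every salary row
every_pair = bank.merge(salary, how="cross")
print(side_by_side)
print(every_pair)
```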
First I need to create an Excel sheet with three columns, 'file1', 'file2' and 'match/mismatch', like this.
Then, in Python, I need to take two CSVs as input and compare them. If the data in both CSVs match, the result has to be written to the Excel file created earlier, like:
file1 file2 match/mismatch
sample1.csv sample2.csv match
Thank you.
That sounds completely doable, if you have some sample data drop me a link and I'll put together a video.
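This sketch implements the workflow described in the question, assuming "match" means both CSVs load to identical DataFrames. `compare_csvs`, `build_summary` and the output name results.xlsx are hypothetical. The demo writes two sample files to a temporary folder so it runs on its own.

```python
import os
import tempfile

import pandas as pd

def compare_csvs(path1: str, path2: str) -> str:
    """Return 'match' if both CSVs load to identical DataFrames, else 'mismatch'."""
    same = pd.read_csv(path1).equals(pd.read_csv(path2))
    return "match" if same else "mismatch"

def build_summary(pairs: list) -> pd.DataFrame:
    """One row per (file1, file2) pair, in the three-column layout from the question."""
    rows = [
        {"file1": os.path.basename(p1), "file2": os.path.basename(p2),
         "match/mismatch": compare_csvs(p1, p2)}
        for p1, p2 in pairs
    ]
    return pd.DataFrame(rows, columns=["file1", "file2", "match/mismatch"])

# Demo: write two small sample files into a temporary folder
folder = tempfile.mkdtemp()
paths = {}
for name, text in [("sample1.csv", "id,value\n1,10\n"), ("sample2.csv", "id,value\n1,10\n")]:
    paths[name] = os.path.join(folder, name)
    with open(paths[name], "w") as fh:
        fh.write(text)

summary = build_summary([(paths["sample1.csv"], paths["sample2.csv"])])
print(summary)
# summary.to_excel("results.xlsx", index=False)  # needs an Excel engine such as openpyxl
```

Writing to .xlsx with `to_excel` requires an Excel engine such as openpyxl to be installed, which is why that line is commented out.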
Hey, can you help me fetch multiple data through remote?
Sure can! Drop me the details 👍
@MakeDataUseful Give me a time; when can I connect?
@MakeDataUseful Can we connect now?
@swagatikarout8227 Just drop a comment below, I'll make a video about it 🤙🤙