Learn Python - Pandas merge two CSV files - Questions from the comments episode 2

แชร์
ฝัง
  • เผยแพร่เมื่อ 3 ธ.ค. 2024

ความคิดเห็น • 52

  • @Gurpr33t.S1ngh
    @Gurpr33t.S1ngh 4 ปีที่แล้ว +1

    Hi adam, the video content is very helpful and i enjoyed learning it. I am into a job where these things run at back-end (matching of data coming in different files format from different sources and then parsing and matching it) and i have a very big curiosity to learn it and hands-on these things. Your video give me a spark(fire) to start it. A BIG THANKS for delivering this content.

    • @MakeDataUseful
      @MakeDataUseful  4 ปีที่แล้ว

      Glad it was helpful! New content coming out every week now :)

  • @mohammedafsanuddin6067
    @mohammedafsanuddin6067 4 ปีที่แล้ว +5

    keep up the great work! i think you should keep the series where you go on upwork(or similar freelance websites) and do jobs and show your work throughout! It provides real encourage for developers who want to create some work experience. Also your an excellent teacher ! I think this will be great for your channel

    • @MakeDataUseful
      @MakeDataUseful  4 ปีที่แล้ว

      Thank you! I like your thinking, an upwork/freelance series is a smart idea. I'll get to work planning out the series.

  • @Jayu345
    @Jayu345 4 ปีที่แล้ว +1

    An alternative to adding prefixes, the merge function has the parameter "suffixes". This will identify overlapping column names in the left and right side and allow you to define the suffix you wish to include. e.g. pd.merge(df1, df2, suffixes=("_df1", "_df2")).
    A time saver I find myself using often

    • @MakeDataUseful
      @MakeDataUseful  4 ปีที่แล้ว

      I love that! Thank you for the tip, very helpful :)

  • @joshuaabok3329
    @joshuaabok3329 4 ปีที่แล้ว +1

    This is just awesome, man. Big ups.

  • @srijithscientia8614
    @srijithscientia8614 2 ปีที่แล้ว

    Thanks for the Awesome video 🥰🥰🥰

  • @jagdishjoshi6052
    @jagdishjoshi6052 4 ปีที่แล้ว

    Very useful 👌👌👌

  • @maxquin2137
    @maxquin2137 4 ปีที่แล้ว +4

    Same problem as above with different number of rows. e. g one file has 70 rows with the other file has 120 rows same number of columns in both files.

  • @harishkonakandla
    @harishkonakandla 3 ปีที่แล้ว +1

    Sir where can I find this code? Have you created any separate link for uploading code?

  • @ReadyF0RHeady
    @ReadyF0RHeady 13 ชั่วโมงที่ผ่านมา

    how do i use this approach for standardizing different ways of writing country names (United States vs United States of America) and i only want one type of country name (so basically getting rid of synonyms)

  • @vijaybabaria3253
    @vijaybabaria3253 ปีที่แล้ว

    Thanks for sharing these tips, is it possible to compare all columns to display only data that has mismatches without hard coding column names in the compare function? thanks a lot

  • @pramodhsuryajr2467
    @pramodhsuryajr2467 4 ปีที่แล้ว

    Hi @Make Data Useful
    I have an Issue while comparing two CSV Files below is the case.
    1. The Structure are same for the both CSV
    2. Different number of Records in both CSV ( ex CSV1- 50 Records , CSV2- 40 Recods)
    3. I dont have a single Key column (Its a Composite key )
    Please provide me a solution I'm new to Python.
    I Tried the below way but its working if the CSV has same number of rows but its not working if it has different number of rows:
    I'm storing the data is CSV1 and CSV2 in a data frame df1 and df2 correspondingly andthe converting then to objects so that it can be easily compared.
    Source_file = str('a.csv')
    Target_file = str('b.csv')
    Source = pd.read_csv(Source_file)
    df1 = pd.DataFrame(Source)
    df1 = df1.astype(str)
    print(df1)
    #df1.dtypes
    Target = pd.read_csv(Target_file)
    df2 = pd.DataFrame(Target)
    df2 = df2.astype(str)
    print(df2)
    Getting the list of headers in df1
    header_list = df1.columns.tolist()
    for x in range(len(header_list)) :
    df3[header_list[x]] = np.where(df1[header_list[x]] == df2[header_list[x]], 'True', 'False')
    print (df3)
    df3.to_csv('Output', index=False)
    please helpout in this scenario

  • @erykbazela7281
    @erykbazela7281 4 ปีที่แล้ว +2

    Very nice video :) I am struggling with something rn. I have two input files with different data, indexes, column names etc. I extracted them from original tables(they were split, partially uppercase) with original position in column(indices). I have to check if the names are the same. If they are I have to save their index in original file row to the opposite so eg. csv1: 0 Bob Marley, [2,5,7] ; csv2: 0 Elon Musk[4,11]. Pls tell me if it's something difficult and worth showing or maybe I'm just inexperienced... Thank you.

    • @MakeDataUseful
      @MakeDataUseful  4 ปีที่แล้ว +1

      Hey Eryk, do you want to drop me a sample of the two files with an expected output?

  • @vinothkumar9796
    @vinothkumar9796 3 ปีที่แล้ว

    Hello, I have a two datasets with each dataset has a different column names. How can I merge those?

  • @reinekeerthi9448
    @reinekeerthi9448 2 ปีที่แล้ว

    hii, i have a doubt while comparing one column with other column in the data frame, but the column name is different, for eg: 1st column name is "Column1.subLot", the other column name is "subLot". both the columns are in different dataframe. the dataset one in excel other in json file
    can you please a video for my question please.

  • @7Rodp
    @7Rodp 4 ปีที่แล้ว

    thanks really helpful.

  • @sriharshadevraj968
    @sriharshadevraj968 4 ปีที่แล้ว +1

    Thank you Adam. I am learning python and this is really helpful. In your compare function you pass in the account_type as the column to be compared, how can we modify this to compare all the columns in the file? In other words, I need to compare and list the differences for each column.
    Example:Compare Column Account_type
    Expected Output:
    Account_num Statement_1_account_type Statement_2_account_type
    1 Checking Savings
    Compare Column Statement_balance
    Expected Output
    Account_num Statement_1_Balance Statement_2_balance
    3 10 15
    and so on till we find differences in each column?

    • @MakeDataUseful
      @MakeDataUseful  4 ปีที่แล้ว

      Hi Sriharsha,
      There is a couple of different ways to go about it. One approach would be to:
      Create a new column on each of your input dataframes made up of the three columns you want to compare then use these two new columns to compare.

    • @KhushbooSingh-it6pi
      @KhushbooSingh-it6pi 3 ปีที่แล้ว

      @@MakeDataUseful It might work for few columns, what about a dataframe with 55 columns, how would you compare?

  • @vinayakchikkorde8151
    @vinayakchikkorde8151 3 ปีที่แล้ว

    I have the source file and target file. so in that, I have to compare 140 columns and show the result if it matches or not. for example, there is a column as Country1 in source and in target as Country2. to compare that i will use if(source['country1]==target['country2])return True else return false. to compare 140+ columns it will take time to compare 140 columns. and in both of the file columns are not in ordered. so how can I solve this?

  • @videofyph
    @videofyph 4 ปีที่แล้ว

    How can i combine 2 csv files with different columns bro?

  • @kingofthesummer5180
    @kingofthesummer5180 2 ปีที่แล้ว

    What would be different if the number of columns did not match ?

  • @ankurmalik7388
    @ankurmalik7388 3 ปีที่แล้ว

    hi I want to scrap top mba colleges from (Shiksha website) but the problem is that its page changes automatically like from page 0 - 1 by itself by scrolling down
    can you help me out how to scrap all pages, it has around 100 colleges but I can scrap only 40 colleges at a time

  • @filo013
    @filo013 3 ปีที่แล้ว

    Thanks for the video it help checked 18000 lines of data. Is there a way to check multiple rows at the same time.

    • @MakeDataUseful
      @MakeDataUseful  3 ปีที่แล้ว

      There is a couple of parallel processing approaches out there, checkout dask as one example.

  • @MrTonie469
    @MrTonie469 4 ปีที่แล้ว

    Is it possible to make a video to compare two CVS and throw in a config file to exclude certain columns? And possibly option to ignore headers.

    • @MakeDataUseful
      @MakeDataUseful  4 ปีที่แล้ว

      Of course, have you got a couple of sample files you can link me to? I will make a Q&A video

  • @anjanabhaskar417
    @anjanabhaskar417 3 ปีที่แล้ว

    How can we combine or join two datasets without any common fields?
    For eg: I have one dataset bank.csv and another dataset salary.csv.
    For both datasets there are no common columns. So how can we join them in Python?

    • @MakeDataUseful
      @MakeDataUseful  3 ปีที่แล้ว

      How would you look to join them without Python? Once we know that we can look at how to do that using Python.

  • @kishorekumar2743
    @kishorekumar2743 4 ปีที่แล้ว

    First i need to create a excel sheet with three columns ' file1', 'file2', 'match/mismatch' like this
    and in python i need to get two csv's as input then compare those csv's , if the data's in both the csv's match the result has to be redirected to the excel file created earlier like,
    file1 file2 match/mismatch
    sample1.csv sample2.csv match
    Thank you.

    • @MakeDataUseful
      @MakeDataUseful  4 ปีที่แล้ว

      That sounds completely doable, if you have some sample data drop me a link and I'll put together a video.

  • @swagatikarout8227
    @swagatikarout8227 3 ปีที่แล้ว +1

    Hay can you help me to fetch multiple data through remote.

    • @MakeDataUseful
      @MakeDataUseful  3 ปีที่แล้ว +1

      Sure can! Drop me the details 👍

    • @swagatikarout8227
      @swagatikarout8227 3 ปีที่แล้ว

      @@MakeDataUseful give me time , when can I connect ?

    • @swagatikarout8227
      @swagatikarout8227 3 ปีที่แล้ว

      @@MakeDataUseful can we connect now ?

    • @MakeDataUseful
      @MakeDataUseful  3 ปีที่แล้ว

      @@swagatikarout8227 just drop a comment below, I'll make a video about it 🤙🤙