Real and true; looking forward to seeing more videos.
Last approach was incredible. Did not know it was possible to subtract the columns to get the delta!!
Being a newbie to Spark, I find it very helpful, boss. Keep it up, brother. Looking forward to seeing more such videos from you.
Now you can use the unionByName() function as well.
df3 = df1.unionByName(df2, allowMissingColumns=True)
df3.show()
Excellent video Azarudeen, you helped me a lot! Thanks!
You are doing a great job and it's helping beginners a lot. Thanks.
Very clear and useful. Thank you very much
Thanks Azar for making such a nice scenario based question series with demo.
Thank you so much for the videos. They definitely increased my hope towards practical learning!!!
Thanks for your support 🙂
The tutorial is very lucid and clear
very nice approach and clear explanation! Thank you very much.
Awesome work man. Appreciated
Nice video.. informative.. ❤❤
Thanks for all your support
Thank you so much for these real time scenario videos brother
Eagerly waiting for more such videos.
All the best
Thanks for your support, please share with your friends as well :)
Very good explanation of each scenario .... Thanks a lot @Azarudeen Shahul... Keep it up
Thanks for your support.. 😊
Really, it's a nice help, friend.
Great work Azar. I used the automated technique for a data warehousing project.
Thanks for your support, share with your big data friends.
Hi Shahul,
Superb content. I have never seen such a clear coverage of all possible approaches on YouTube. Thanks a lot. Your videos are so helpful, not only for interviews but also for getting daily jobs done.
Superb bro 👌 👏
Great example and nice explanation.
Thanks for your support :-)
Good videos. Thank you.
One small note: in the "Automated Approach", if more than one column differs between the two DataFrames and the missing columns are not in alphabetical order, it won't work.
We need to sort the columns while performing the union operation, like below.
df_final = df_file1.select(sorted(df_file1.columns)).union(df_file2.select(sorted(df_file2.columns)))
Good video.. please keep us posted on new scenario-based questions.
Sure, more videos to come.
Great pyspark tutorial thanks
Boss, you are a beauty!!
Excellent. Thanks for sharing.
Can you make a video on reading data from multiple Parquet files with different schemas, using schema evolution?
Sure, you can expect the same soon 👍
Bro, please help me install Spark. Please share a doc of the steps; I have Windows 10.
Awesome Azarudeen, your videos are very helpful... Do you offer any online coaching?
How did the outer join work? We have the same columns in both DataFrames; which columns will it take?
I'm trying string (JSON-style) -> Parquet for merging DataFrames with different columns.
Thanks a lot bro!
Thanks for all your support 😊
@Azarudeen Shah - In the example the missing column is at the last for one of the dataframe. So with_column automatically adds at the end. What if the column is missing in middle of the table structure ? Thank you!!
Thanks for the question
Before merging, we can select the columns in the same order as the other DataFrame, like
df1.select(df2.columns)
Hope this helps you :)
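A minimal sketch of that reordering (assuming df1 and df2 end up with the same set of columns, just in a different order):

# Reorder df2's columns to match df1 before the positional union
df2_aligned = df2.select(df1.columns)
df_merged = df1.union(df2_aligned)
df_merged.show()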
@@AzarudeenShahul wow.. cool thanks Azar..
So can you help me fix it?
Can you check? I am ready to share my screen.
Dear, please help. I have learnt the theory part of Hadoop and Spark, but I'm not feeling confident because I have no good hands-on practice, since I have no environment.
Please mail me a screenshot of the error message and the steps you followed.. if needed we can check over screen sharing.
Hi Azarudeen. Thank you so much for this video. I have implemented the same question in spark scala but I am facing problem in implementing the automated approach in spark scala. Could you please help me on this and provide me solution for the same.
How can we get the code for all the scenarios in this playlist?
We have a GitHub link in the description of all recent videos. You can find notebooks for some of the scenario-based questions there.
For the same scenario, I added a monotonically_increasing_id column to both DataFrames and then did a left join.
Is that approach correct?
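For reference, a sketch of that approach with hypothetical df1/df2 names. One caveat worth knowing: monotonically_increasing_id values depend on partitioning, so the ids generated for two separate DataFrames are not guaranteed to line up row by row.

from pyspark.sql import functions as F

# Tag each DataFrame with a generated id, then join on it
df1_id = df1.withColumn("row_id", F.monotonically_increasing_id())
df2_id = df2.withColumn("row_id", F.monotonically_increasing_id())
joined = df1_id.join(df2_id, on="row_id", how="left")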
Can you help with merging two DataFrames with a date column and a bigint column? I am getting an error like "failed to merge".
When merging 5 files of different data formats, how will it work? Your answer will be helpful.
Can you also make some videos on Spark using Scala? All your videos are brilliant.
Hi.. your videos are really helpful... Could you please post a video on Spark incremental data load and merging that data with SCD Type 2 (using Scala)?
Nice!
Awesome, bro! If you can, please do a video on the same scenario using Scala.
Sure 👍
Hi Sir,
in the for loop we see df2 = df2.withColumn(i, lit("null"));
here we are able to update the DataFrame, but how is that possible if DataFrames are immutable?
DataFrames are immutable; that is exactly why we assign the result back to the variable. withColumn returns a new DataFrame rather than modifying the old one.
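A minimal sketch of that pattern, assuming diff_cols holds the columns missing from df2. Each withColumn call builds a brand-new DataFrame; reassigning to df2 only rebinds the variable name. Note that lit(None) produces a real NULL, whereas lit("null") as quoted above produces the literal string "null".

from pyspark.sql.functions import lit

for i in diff_cols:
    # withColumn returns a new DataFrame; df2 is rebound to it each iteration
    df2 = df2.withColumn(i, lit(None))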
Hi Azarudeen, it's Awanish. Your videos are really helpful.
Actually, I have installed Spark, but when I check on the command prompt by entering pyspark, it says the path is not specified,
even though I have made many corrections and checked the environment variables many times as well.
Hi bro, how do we achieve the same using Scala?
Can we do this using unionByName?
We can use unionByName in Scala.
How to compare two DataFrames, with matched records and unmatched record values?
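One possible sketch, reusing the subtract idea from the video (assuming df1 and df2 share an identical schema):

matched = df1.intersect(df2)        # distinct rows present in both DataFrames
only_in_df1 = df1.subtract(df2)     # rows in df1 but not in df2
only_in_df2 = df2.subtract(df1)     # rows in df2 but not in df1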
Can you please share the Scala code for the automated approach?
Thanks Azar for making real-time scenario-based videos.. How does the automated process work when both DataFrames have different column names?
Thanks for your support. Are you referring to the same data with different column names? If so, then the automated approach does not suit; try the schema method...
@@AzarudeenShahul Just that if the order of columns is not the same between the 2 DFs, then this will fail. In that case, we can use unionByName, or do df2 = df2.select(df1.columns) first and then apply union.
@@himanshujain2047 There is also an allowMissingColumns param in unionByName that does the same as this video.
Thank you, but in the automated approach, updating df2 in the for loop won't work in Java.
Whatever is changed inside the loop is not accessible outside of it... can you help me with how to handle it?
How can I achieve the same in Scala? I tried the following code but it's not working. Consider a and b as two DataFrames:
val diffCol = a.columns.diff(b.columns)
for (i <- diffCol) b = b.withColumn(i, lit(null))   // reassigning b fails if b is a val; use a var or foldLeft
Very nice explanation of the concepts. How can we achieve this in Scala? It would also be great if you explained some scenarios using Scala. Thank you.
How do I get your mail id?
Just to add a scenario: what if the columns are not in the same order in both DataFrames after the loop?
New columns arrive or some columns may disappear over time, but the merge/union should keep happening daily.
- We need to select the columns in the right order before doing the union (see the sketch below).
- We can use foldLeft instead of a loop (a more functional-programming way).
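A rough PySpark sketch of that daily merge, using functools.reduce as the Python analogue of Scala's foldLeft (df_history and df_today are hypothetical names):

from functools import reduce
from pyspark.sql.functions import lit

def align_and_union(left, right):
    # Fill in whichever columns are missing on either side
    for c in set(right.columns) - set(left.columns):
        left = left.withColumn(c, lit(None))
    for c in set(left.columns) - set(right.columns):
        right = right.withColumn(c, lit(None))
    # Union with a fixed column order so positions always line up
    cols = sorted(left.columns)
    return left.select(cols).union(right.select(cols))

df_merged = reduce(align_and_union, [df_history, df_today])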
From where is input1.csv fetched? Have you uploaded a CSV file there?
Yes Parnay, I created and uploaded the CSV file in my Databricks account.
Your methods will not work if both tables each have one extra column. For example:
TableA: name, age, salary
TableB: name,age,gender
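That symmetric case can still be handled. On Spark 3.1+, a one-line sketch (dfA/dfB as hypothetical DataFrames for TableA/TableB):

# Missing columns on either side (gender in TableA, salary in TableB) are filled with nulls
merged = dfA.unionByName(dfB, allowMissingColumns=True)

On older versions, the missing column has to be added to each side with lit(None) before the union, as in the foldLeft-style sketch above.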
Awesome content!! Please help me: I saved the output of df1.union(df2).show() to a new DataFrame as df, and then applying df.show() didn't work. Why?
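A likely cause, assuming the code was df = df1.union(df2).show(): show() only prints the rows and returns None, so df ends up holding None instead of a DataFrame. A sketch of the fix:

df = df1.union(df2)   # union returns the new DataFrame; keep this reference
df.show()             # show() prints and returns None, so never assign its result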
We can achieve this by using unionByName:
union_df = df1.unionByName(df2, allowMissingColumns=True)
Here we are discussing Spark below 3.1.
unionByName works when both DataFrames have the same columns, but in a different order. An optional parameter was also added in Spark 3.1 to allow unioning slightly different schemas.