One of the best explanations on YouTube so far. I wish I could afford your course :(
One of the best interview series. Thank you, Sumit sir.
Glad to know that you liked it.
00:03 Recently asked PySpark coding questions
02:37 Writing and executing PySpark pseudo code
05:21 Creating a Spark DataFrame from input and performing a group-by aggregation
08:04 Using aggregation functions and collect_list in PySpark
11:15 Spark SQL solution for creating a DataFrame and running queries
14:18 Understanding the DataFrame reader API for reading JSON and the usage of the explode function (a sketch follows this list)
17:11 Creating a Spark DataFrame and performing operations on it
19:44 Converting a string to a date and performing a group by in a PySpark DataFrame
22:32 Finding the average stock value using PySpark
25:38 Practice more on DataFrames for interviews
28:15 Practice more to gain confidence in writing correct PySpark syntax
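For the 14:18 chapter, a minimal sketch of the DataFrame reader API plus explode; the file name and JSON layout here are hypothetical, just for illustration (assumes a live spark session):
from pyspark.sql.functions import col, explode
# Hypothetical input: each JSON record has a name and an array of skills,
# e.g. {"name": "Amit", "skills": ["spark", "sql"]}
df = spark.read.json("employees.json")
# explode() produces one output row per element of the skills array
flat_df = df.select(col("name"), explode(col("skills")).alias("skill"))
flat_df.show()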
Need more PySpark interview solutions like this 😊
Thanks a lot, Sumit! I am a senior data engineer with 5 years of experience, but since we mostly don't work with DataFrames or PySpark, I am not able to do these simple things.
Best selection of questions and very good explanation.
You are doing a great job posting these❤
Very useful, informative video which gives more confidence to big data aspirants. Thanks, Sumit.
Much needed sir.....!!!
Sujoy, I am sure you will enjoy watching this.
It would be great if you put the questions in a comment. Others could then try without looking at the solution first.
Best explanation sir, thanks.
I am happy to hear this
Sir... we need more. Please continue this playlist.
We can apply distinct() too, I guess, to avoid duplicate values in the df.
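Agreed — a quick sketch of that idea on toy data (assumes a live spark session):
# Toy DataFrame with one fully duplicated row
df = spark.createDataFrame([('a', 1), ('a', 1), ('b', 2)], "col1 string, col3 int")
# distinct() drops exact duplicate rows before any aggregation
df.distinct().show()
# dropDuplicates() can instead dedupe on a subset of columns
df.dropDuplicates(["col1"]).show()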
Thank you Sir, greatly explained. It would be good if you could also post the data/schemas in the description box for us to query and practice hands-on. Thanks! :)
This is great!
Thank you, Umesh.
Nice explanation sir, kindly post scenario-based questions.
Yes, for sure.
Thank you sir😄
Hi Sumit,
Could you please create a video explaining end-to-end pipelines on AWS Databricks, along with their orchestration?
Thanks Sumit, please make more videos like this.
Definitely.
Superb
Amazing sir
Nikhil, I am sure you will find it useful.
What about the remaining 10 questions on PySpark? You said they would be covered in the next video, but it still has not been uploaded on YouTube. When will you upload it? We are waiting for the remaining 10 PySpark questions.
Thank you ❤
Hi Sir, can we not write in Spark SQL in an interview, since there is no difference in performance?
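Spark SQL and the DataFrame API compile to the same query plans, so performance is generally the same; whether it is accepted depends on the interviewer. As a sketch, the same group-by with collect_set written in Spark SQL (sample data assumed):
data = [('a', 'aa', 1), ('a', 'aa', 2), ('b', 'bb', 5), ('b', 'bb', 3), ('b', 'bb', 4)]
df = spark.createDataFrame(data, "col1 string, col2 string, col3 int")
# Register a temporary view so the DataFrame can be queried with SQL
df.createOrReplaceTempView("t")
# Group-by with collect_set expressed in Spark SQL
spark.sql("""
    SELECT col1, col2, collect_set(col3) AS col3_values
    FROM t
    GROUP BY col1, col2
""").show()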
Nice video
Thank you.
In question number 2, do we not need to remove duplicates at the end? Can you please clarify this for me?
Hello sir, how can I run PySpark code online? Are you using an online utility to run the PySpark code shown in this video? Could you please share the source? It would be very helpful.
Sir, please create a coding interview playlist.
Q2.
from pyspark.sql.functions import col, collect_set

# Sample input data and schema
data = [('a', 'aa', 1),
        ('a', 'aa', 2),
        ('b', 'bb', 5),
        ('b', 'bb', 3),
        ('b', 'bb', 4)]
data_schema = "col1 string, col2 string, col3 int"

df_data = spark.createDataFrame(data=data, schema=data_schema)
df_data.display()

# Group by col1/col2 and collect the distinct col3 values per group;
# collect_set also removes any duplicates within a group
result = df_data.groupBy(col('col1'), col('col2')) \
                .agg(collect_set(col('col3')).alias('col3_values'))

result.display()
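On the duplicates question above: collect_set already returns only the distinct values per group, while collect_list keeps repeats. A quick comparison on the same df_data (in this sample no col3 value repeats within a group, so both agree here; the difference shows only when a value repeats):
from pyspark.sql.functions import collect_list, collect_set
# collect_list keeps every value per group; collect_set keeps only distinct ones
comparison = df_data.groupBy('col1', 'col2') \
                    .agg(collect_list('col3').alias('all_values'),
                         collect_set('col3').alias('distinct_values'))
comparison.display()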