Nice explanation.. and good questions too
Can you please attach data set and solution set ..so that we can practice...thanks for all the excellent videos
Great explanation.. Keep it up 👆
Awesome bro
Great brother...God bless u
Good work bro
This was thoroughly explained, nice scenario.
Could you please do some videos on weekly cohort analysis using window functions in spark?
Sure, we will plan.. thanks for your support
@@AzarudeenShahul Can you explain the expressions used in regexp_replace? I didn't understand what $0 is.
superb explanation, do more videos bro........
Thanks for this. I was asked this during a Mindtree interview.
Cool.. hope you were able to answer the question and crack the interview...
Very helpful.. thank you!
Can you explain the regex in detail and how you got the expression?
good video to improve our logic.
Sir, can you explain the regex expression more clearly or provide any youtube link where regex is explained nicely ? Thanks. Your videos are very helpful.
Thank you very much for your videos
Thanks for your support :)
@@AzarudeenShahul Please share the txt file and code snippets
Thanks for the nice video
Could you explain the same with Kafka message streaming?
Sure
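For anyone who wants to try it in the meantime, a minimal structured-streaming sketch of the same transformation reading from Kafka instead of a file (the broker address, topic name, and pattern are assumptions, not from the video):

from pyspark.sql import functions as F

# read the pipe-delimited records from a Kafka topic instead of a file
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
          .option("subscribe", "records")                       # assumed topic
          .load())

# Kafka values arrive as binary, so cast to string before applying the regex
out = stream.select(
    F.regexp_replace(F.col("value").cast("string"),
                     r"^(.*?\|){5}", "$0-").alias("value")
)

query = out.writeStream.format("console").start()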
Thanks for this information. Can you please help me with this:
Does this approach work on large data as well?
Thanks in advance !!
Yes, this approach can be scaled out to large datasets. Let me know if you face any problem.
Hi, do you teach a Spark course?
Thank you so much bro
In this, the last record in my test data had just four columns, hence I got a schema error. Is there a way to handle this by ignoring the malformed data?
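One way to handle that when reading with an explicit schema is Spark's read mode option; a minimal sketch, assuming a CSV source and a five-column schema (the file name and column names here are made up):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("drop_malformed").getOrCreate()

# five string columns; adjust to your real schema
schema = StructType([StructField(f"col{i}", StringType()) for i in range(1, 6)])

# DROPMALFORMED silently skips rows that don't match the schema
df = (spark.read
      .option("mode", "DROPMALFORMED")
      .schema(schema)
      .csv("test_data.csv"))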
Hey Azharuddin, superb 👍🏻. Can you please provide the dataset?
Hello Azar! Amazing video. Is there a way we could replace the 5th pipe occurrence rather than adding "-"? I want to replace the pipe itself with "-".
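Not the author, but one way: capture everything before the 5th pipe, match the 5th pipe itself, and put only the capture back. A sketch, assuming the column is called value:

from pyspark.sql import functions as F

# $1 is the captured text before the 5th pipe; the trailing \| in the
# pattern is the 5th pipe itself, so "$1-" swaps that pipe for "-"
df = df.withColumn(
    "value",
    F.regexp_replace("value", r"^((?:[^|]*\|){4}[^|]*)\|", "$1-")
)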
Hi bro.... I have seen this 5th-occurrence change in Scala, but the code is too difficult compared with PySpark... please share an easier version of the Scala code, bro
Can you share the post where you provided the answer for this?
Can you please share how to deploy a PySpark job in a production environment? Your videos are very helpful.
Also, please upload one on finding the occurrences of a string in a word.
Please upload a video on ingesting data from an SAP server. This is very important, as we need to ingest data from different sources via PySpark.
I didn't understand: at some places you convert RDD to DF and then DF to RDD... why is that?
Some transformations are not allowed on a DataFrame but are available on an RDD, so to perform those operations it was converted to an RDD and then back to a DF.
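For anyone wondering what that round trip looks like in code, a minimal sketch using zipWithIndex, which exists only on RDDs (the column names are assumptions):

# zipWithIndex is an RDD-only transformation, so convert, apply, convert back
indexed = df.rdd.zipWithIndex().map(lambda pair: pair[0] + (pair[1],))
df2 = indexed.toDF(df.columns + ["row_index"])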
Can someone please explain the regexp pattern (.*?\\){5} and why $0 is used in "$0-"?
Why did you keep $0 before the delimiter -?
The $0 works like $0 in awk (the whole record): in the regex replacement it refers to the entire matched text (capture group 0), so when Azar uses "$0-" in the function, it preserves whatever the regex matched and appends "-" to it.
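A minimal sketch of how the video's expression fits together (the exact pattern and column name are from memory, so treat them as assumptions):

from pyspark.sql import functions as F

# ^(.*?\|){5} lazily matches everything up to and including the 5th pipe;
# $0 is the whole match, so "$0-" keeps it and appends "-" right after it
df = df.withColumn("value", F.regexp_replace("value", r"^(.*?\|){5}", "$0-"))

So on a|b|c|d|e|f this yields a|b|c|d|e|-f.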
Mohammed,Azar,BE-4year
Prakesh,Kummar,Btech-3year
Ram,Kumar,Mtech,3year
jhon,smith,BE,2year # can anyone share the PySpark code to split on the "-"?
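Not the author, but one approach is to normalise the "-" into the same delimiter first and then split; a sketch, assuming the lines sit in a plain text file and these column names (both assumptions):

from pyspark.sql import functions as F

raw = spark.read.text("students.txt")  # assumed file name

# turn the stray "-" into "," so every row splits into four fields
parts = F.split(F.regexp_replace("value", "-", ","), ",")
df = raw.select(
    parts.getItem(0).alias("first_name"),
    parts.getItem(1).alias("last_name"),
    parts.getItem(2).alias("degree"),
    parts.getItem(3).alias("year"),
)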
123#Australia,india,Pakistan
456#England,France
789#canada,USA
Expected output:
123#Australia
789#canada
456#England
456#France
123#india
123#Pakistan
How to solve this using PySpark or Scala?
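Not the author, but one way in PySpark: split on "#", explode the comma-separated values, and stitch the key back on. A sketch, assuming the records are in a plain text file:

from pyspark.sql import functions as F

raw = spark.read.text("countries.txt")  # assumed file name

df = (raw
      .withColumn("key", F.split("value", "#").getItem(0))
      .withColumn("country",
                  F.explode(F.split(F.split("value", "#").getItem(1), ",")))
      .select(F.concat_ws("#", "key", "country").alias("result")))

df.show(truncate=False)  # 123#Australia, 123#india, ... one row each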