Implementing Pyspark Real Time Application || End-to-End Project || Part-1
- Published Sep 5, 2024
- In this video we discuss implementing a PySpark application in PyCharm and reading files dynamically from their respective folders.
Prerequisites:
Spark and Hadoop installed, Python, PyCharm
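The "reading files dynamically" part of the video boils down to listing whatever lands in a staging folder and picking the right Spark reader per file. A minimal sketch of that idea (the folder layout and helper names here are hypothetical, not taken from the video):

```python
import glob
import os

def list_data_files(staging_dir):
    """Return all CSV/parquet files found in the staging directory."""
    exts = (".csv", ".parquet")
    return sorted(p for p in glob.glob(os.path.join(staging_dir, "*"))
                  if os.path.splitext(p)[1].lower() in exts)

def reader_format(path):
    """Map a file extension to the matching Spark reader format name."""
    return os.path.splitext(path)[1].lstrip(".").lower()

# With a SparkSession in hand, each discovered file can then be loaded
# generically, e.g.:
#   df = spark.read.format(reader_format(path)).load(path)
```

This keeps the ingestion code indifferent to how many files arrive in the folder on a given run.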
Links to datasets:
Download the City Dimension file at the link below:
prescpipeline1...
Download the Prescriber Fact file at the link below:
prescpipeline1...
#azuredatabricks
#dataengineering
#dataanalysis
#pyspark
#pythonprogramming
#python
#sql
Great explanation, very clear. This video was very helpful for me.
Glad to hear that!
Good explanation 😊, now I am confident about how the folder structure in PySpark projects works
Thanks
You are ahead of everyone in explanation.
This video helped me a lot. I hope we can expect more real-time scenarios like this.
good content
good explanation
👍👍
Hey, great explanation. Could you please reshare the CSV file that is used? I'm not able to extract the file mentioned in your description.
drive.google.com/drive/folders/1XMthOh9IVAScA8Lk-wfbBnKCEtmZ6UKF?usp=sharing
I think the code looks too verbose and needs some refactoring to simplify things. Overall, good content.
@dataspark Could you please provide those data links again? The links have expired.
Instead of get_env_variables.py, couldn't we use a .env file?
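For anyone curious about the .env suggestion above: the python-dotenv package handles this (via its load_dotenv() function), but the simple case can also be covered with a few lines of stdlib code. A minimal sketch, assuming a plain KEY=VALUE file:

```python
import os

def load_env_file(path=".env"):
    """Minimal .env loader: KEY=VALUE lines; blanks and # comments skipped."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault so real environment variables keep precedence
            os.environ.setdefault(key.strip(), value.strip())
```

The upside over a get_env_variables.py module is that the .env file stays out of version control while the loading code is still trivial.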
Can you please give me the GitHub source code for practice?
Hello. Does anyone know Hindi and can explain this project to me entirely in Hindi (not in much detail, just briefly) in 30 minutes or so? I'm a fresher and all of this is going over my head, please help out 😢
how can i find this code? is there any repo where you have uploaded it.?
Sorry to say this, bro; unfortunately we lost those files.
I am not able to download the fact file; I am getting an error when extracting it.
drive.google.com/drive/folders/1XMthOh9IVAScA8Lk-wfbBnKCEtmZ6UKF?usp=sharing
Sir, can we use Scala in the IntelliJ IDE for this project?
Yes, you can, brother.
Can I use Databricks Community Edition?
Hi, you can use Databricks, but then you have to work with the dbutils.fs methods to get the list of file paths, as we did in the get_env.py file. Thank you.
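To expand on the dbutils.fs point: on Databricks, dbutils.fs.ls(path) returns FileInfo entries with path and name fields, so getting the relevant file list is a small filter. A sketch (the mount path is hypothetical, and the namedtuple below only stands in for FileInfo so the helper can be tried outside Databricks):

```python
from collections import namedtuple

# Stand-in for the FileInfo objects dbutils.fs.ls returns (subset of fields).
FileInfo = namedtuple("FileInfo", ["path", "name"])

def paths_with_suffix(entries, suffix):
    """Filter dbutils.fs.ls-style entries down to files ending with suffix."""
    return [e.path for e in entries if e.name.endswith(suffix)]

# On an actual Databricks cluster this would be called as, e.g.:
#   csv_paths = paths_with_suffix(dbutils.fs.ls("dbfs:/mnt/presc/staging"), ".csv")
```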
I can't download the dataset 😭.
Take a look at this :
drive.google.com/drive/folders/1XMthOh9IVAScA8Lk-wfbBnKCEtmZ6UKF?usp=sharing
Sir, why have you not used Databricks for the transformations?
Hi, generally all application development is done in an IDE, and it's also easier to maintain this kind of folder structure there. You can develop in Databricks, but it's mainly for the analysis part.
@@DataSpark45 but DataBricks internally using spark and even its used in DEV,QA and PROD also? Current trend is also DataBricks right? Please correct me if my understanding is wrong!
@@nandesh783 Any answers ?
where's your parquet file located?
Hi, are you talking about the source parquet file? It's under the source folder.
I am getting an error with logging:
Python\Python39\lib\configparser.py", line 1254, in __getitem__
    raise KeyError(key)
KeyError: 'keys'
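That KeyError: 'keys' usually means logging.config.fileConfig() found a config file missing a required section: the file must contain [loggers], [handlers], and [formatters] sections, each with a keys= entry, and fileConfig raises exactly this error from configparser when one is absent. A minimal working config, loaded from a temp file for illustration:

```python
import logging
import logging.config
import tempfile

# fileConfig() requires [loggers], [handlers] and [formatters] sections,
# each with a `keys` entry; omitting one raises KeyError: 'keys'.
CONFIG = """\
[loggers]
keys=root

[handlers]
keys=consoleHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=INFO
handlers=consoleHandler

[handler_consoleHandler]
class=StreamHandler
level=INFO
formatter=simpleFormatter
args=(sys.stdout,)

[formatter_simpleFormatter]
format=%(asctime)s - %(levelname)s - %(message)s
"""

with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as f:
    f.write(CONFIG)
    conf_path = f.name

logging.config.fileConfig(conf_path, disable_existing_loggers=False)
logging.getLogger(__name__).info("logging configured")
```

Comparing a failing config against this skeleton (especially the three keys= lines) is usually enough to find the missing section.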
can you share the code written in the video?
Sure, here is the link: drive.google.com/drive/folders/1QD8635pBSzDtxI-ykTx8yquop2i4Xghn?usp=sharing
Thanks@@DataSpark45
Did your problem get resolved?
@@vishavsi
Can you provide me the source data file?
Hi in the description i provided the link bro
AuthenticationFailed
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:ea8e17b4-701e-004d-1db1-573f6a000000 Time:2024-02-04T21:31:20.0816196Z
Signature not valid in the specified time frame: Start [Tue, 22 Nov 2022 07:36:34 GMT] - Expiry [Wed, 22 Nov 2023 15:36:34 GMT] - Current [Sun, 04 Feb 2024 21:31:20 GMT]
Where did you get this error, bro?
@@DataSpark45 While downloading the data. But I got the data from part 2.