Absolutely blown away by this YouTube video! In just one word: phenomenal. It's like diving into an encyclopedia dedicated to CI/CD pipelines. My quest for a basic explanation led me to countless sources, but this video turned out to be an absolute goldmine.
He is a perfect trainer, so he never says things like "Likewise, do the rest yourself" and moves forward by skipping some stuff. His commitment to teaching from scratch to advanced without skipping anything is why he's always great.
Thanks
Great, nice sales KPI use case. Simple explanation, very intuitive. Thanks for your content contribution.
Well spoken, well prepared, nicely presented. Thank you for helping others. One suggestion (IMHO): I would reduce the last 10 minutes to 2 to 3 minutes. For example: in the dashboard, instead of showing the removal of each and every dataframe, I would just show the removal of one and tell the audience "Likewise, you can remove all the other dataframes". Same thing for adding a title (header) to each visualization and arranging visualizations: I would just do it for one and tell the audience "Likewise, you can add titles to all the other visualizations and arrange them per your requirements". Then I would just fast forward (skip) to show the final view of the dashboard with a few seconds of comments.
Absolutely terrific content, done with my first PySpark project. Subscribed too, keep the project videos coming ✨
Well explained and detailed ! Superb Content. Appreciated your time & efforts !
You're simply incredible! Thanks for uploading this for us :) God bless you!
God bless you indeed for doing a phenomenal job (and helping others like me)
Thanks a lot for creating such a great video by doing and explaining. Great job, awesome, keep it up!
Hats off to your effort, man! Keep rocking with awesome content.
Nice explanation. Please do more pyspark projects
You have good content. You should upload more end-to-end projects. It will definitely give your channel the credit it deserves.
ML projects
Nice explanation. Very easy to understand. Thank you very much.
Thanks.
Would love to see more projects like these in the future.
Awesome work
Amazing content, thanks a ton!
Nice video pls do more projects on pyspark
good examples. easy to understand.
Very nice Video !!!! Great job !!!!
Thanks a lot for the valuable information.
great explanation
Nice stuff. Thanks.
This was very useful for me!
thanks for this project really helpful
Instead of joining both DataFrames for each KPI, we can join them once and cache the result, which will improve performance.
Thanks brother..great content
Is this project data engineering or data analytics? Please reply.
So is Spark used only for aggregating and viewing data like this? Is it just for data analysts, then? Could you show a real example with data coming from a source (for example an API) and writing production code to submit a Spark job on batch data?
thanks man. good stuff
It's really awesome... I'm shocked by your teaching skills, and by how much you can simplify even complex real-time projects. But as a beginner, my doubt is: with Databricks and PySpark we did almost all of this, so what's the use of Apache Beam, Airflow, AWS Glue, Azure ADF, GCP Dataflow, Dataproc, and so many other services, when you get the same results with one service?
Will make a video for that.
best content,thank you
Super tutor🔥🙏
Price is in string format, so how did you get the aggregate sum?
very nice
very helpful video 🙏
Price is string type, so how can it do the math? I didn't get it, because I'm using Spark SQL for the KPIs.
great work
Well explained. Two suggestions: you should increase the zoom, and while deriving columns we can derive all of them (year, month, quarter) in one go.
thank's bro
Bro, why is price showing null when we define it as IntegerType, but showing the numbers when it is StringType?
Hi Sir, one question on the query "frequency of customers who visited the restaurant". In the Sales.csv file there are 27 records with restaurant entries, but your output gives 21 records. In your video you did .agg(countDistinct("order_date")); I changed that to .agg(count("customer_id")) and I got 27 records, matching the input file. Request you to look into it and suggest if there's any misunderstanding on my end.
Actually, I created the data with many duplicate records, so that may be the cause. It's good that you're debugging; that's exactly what's expected.
Can you show an example where pandas failed due to memory but PySpark was able to overcome the problem?
Please upload more pyspark projects
Thanks for this
Thank you
Sir, how can I get the system date and calculate the current month?
current_date()
Hi do you also support people in their data engineering jobs?
please make video on how to perform unit testing in spark
Hi bro, the content is very nice. Please make an end-to-end data engineering project using AWS.
How can we store this dashboard as a PDF, or how can we share it with others? And can you please share the PPT that you presented in the video?
Please give the link and access to the dashboard.
Everything is very good... just try not to say "OK" so much.
Hey, was there a need to use the inferSchema option when you are manually defining the schema? Can you please reply?
Also, from where we can download the data set for practice?
If it's not in the description you can get it on Telegram, and if the schema is defined there's no need for inferSchema.
Thanks for the informative session. Can you please let me know if we can import all the functions together instead of importing them one by one ( eg: from pyspark.sql.functions import month,year,quarter ) like we import libraries pandas,matplotlib, etc in Python?
Yes, we can import all the functions at one time.
Excellent content. Thank you
My pleasure!
All your videos are commendable. Could you please create a video on scheduling the execution of a Databricks notebook using Azure Data Factory (ADF) pipeline?
Ok
thanks
Thanks
Welcome
Hi, I'm working on the pay-as-you-go edition of Databricks. When I upload the file it doesn't give me the path on my computer where the file is stored. It gets stored in Databricks' Hive metastore as a table, and sales.csv gets changed to Delta format. Can you tell me how to upload a CSV file and work on it? Thank you.
Are you able to upload to the Databricks metastore or not?
@@learnbydoingit I was able to resolve it by going into the settings -> Advanced -> Enabling DBFS File Browser.
Hi, that's a good explanation, I liked it. But my advice is: please don't say "OK" all the time, and don't go so fast. If you can improve these two things, you can become a good tutor.
Yes, working on it. Thanks for your feedback.
@@learnbydoingit Honestly, I don't think this is important. Krish Naik does this too, but his channel is very popular. You don't have to change.
this is complete end to end project
If the CSV file is in blob storage, then how does it work?
We mount the storage and then follow the same process.
Didn't get you, could you please elaborate?
Can I download the dashboard? If so, please tell me how.
Can we present this project as real-time experience for a data engineer with 2 years of experience?
Please reply.
How can we get the dataset?
Telegram link mentioned in the description
Where can I execute my PySpark code? Is Databricks free, or do I have to pay to use it?
Databricks Community Edition, and it's free.
Thank you
I am preparing for interviews, and watching and practicing your real-time PySpark projects has been very helpful for me.
If possible, can you make a video on how to explain a real-time project in interviews, and what types of questions I could expect about real-time projects?
Can you help with my project please, on a part-time basis, for money?
Please join Telegram; we can discuss there.
Telegram channel name??
@@learnbydoingit Please contact me.
proper column name
Can you please provide the code of this video?
I would suggest coding along with the video, and if there's an issue you can connect with me.
You could explain it in Telugu, bro.
Bro, please stop saying "OK". It's so frustrating.
😃 sure
Earlier it was running, but now for this command:
sales_df = sales_df.withColumn("order_month", month(sales_df.order_date))
sales_df = sales_df.withColumn("order_quarter", quarter(sales_df.order_date))
display(sales_df)
this is the error I'm getting:
AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "month(order_date)" due to data type mismatch: parameter 1 requires "DATE" type, however, "order_date" is of "INT" type.;
Please convert it to the proper date format first.
How can I connect on WhatsApp or Telegram?
Link in the description
great explanation