Please don't forget to LIKE and SUBSCRIBE! 🥺
Do you have a link to the project too? Is it on GitHub or somewhere you've committed it?
Is there a GitHub link?
Do you mind if I ask a question? I stumbled onto these project tutorials of yours and they are an absolute gem for learners and students, so thank you for that.
But do I need to have an AWS subscription in order to do these projects or not? Because I don't have money to buy one.
Thanks in advance.
May I ask what the offerings of your membership tiers are, like what perks are available for the Recruit, Corporal, or Sergeant tiers?
You know you are such a gem: amazing paid-quality work for free. I will buy you a coffee once I get a job, brother. Keep this work up!
Thanks for the kind words! Looking forward to the coffee! 😉
@@CodeWithYu sure will have one
Don't wait for a job. It will come for sure. Let's buy a good coffee now! It is only 5 bucks for such great content! I haven't found this quality of content, clarity of explanation, and consistency of perfectly reproducible code anywhere else.
Thanks Yusuf!
This is the best data engineering content I have seen on YouTube so far. Thanks for this.
Thank you! Don't forget to spread the word! ❤️
@@CodeWithYu Please can you help with a roadmap for data engineering? I'm a BI analyst wanting to transition.
Now beginning my Data Engineering journey, and this tutorial is an absolute gem! I was able to reproduce everything from A-Z and get it all running! The only glitch is that the Broker service, for some unknown reason, always exits at some point, so the vehicle never gets to the destination 😅. However, I do still get the data in S3. Thanks again for this! Hope I can add this project to my portfolio. Looking forward to the visualisation part!
Getting a Task not Serializable error while streaming data into S3. Checkpoints and data folders are being created in S3, but the data from Kafka is not getting pushed. Any idea why?
"Only glitch is the Broker service for some unknown reason always exits at some point"
Hey if it helps, removing the KAFKA_METRIC_REPORTERS and all the CONFLUENT variables helped me not letting Kafka exits : )
Thank you so much Yusuf! After some challenges here and there, I've been able to complete the project. As a newbie in data engineering, I've learned so much in this exercise and gained more confidence. Onto the next one, which is Spark unstructured streaming.
Fantastic!
Thank you so much!! You are a good teacher.
You are welcome!
Thank you very much, have a nice day.
Another amazing pick
Great job Yu. Thanks for helping humanity :)
My pleasure!
Subbed! Thanks a lot for your kindness to share this amazing wisdom and knowledge!
You’re welcome!
Your tutorials are just amazing. They make all of this stuff make sense. I would love to see one of those projects where you also use infrastructure as code, with Terraform for example. I know that's more on the DevOps side, but I had to do that at my first job as well as data engineering, and I was kinda lost for a while.
Such a great project for free, hats off to you man 🥰
You're amazing!! Keep going!!
Always inspiring with helpful content, keep up the good work.
I can make basic changes, like using AWS EMR in place of AWS Glue, and put this project on my resume and LinkedIn.
That’s another interesting angle to it! 🔥
Nice work... waiting for dbt and Snowflake 🎉🎉😊
Incoming… watch out! 😀
Thank you very much for this video, I learnt a lot from it.
Thank you for watching and learning from the video, it means a lot!
Wow...! Such great content!
Thank you very much for all your projects!
Could you please make an end-to-end project with Delta Live Tables in Databricks?
Sure thing! Don't forget to suggest this in the community section!
excellent video!
Glad you liked it!
I love your content Yusuf, but when you're doing your project videos, try not to jump around so much; there are a lot of grey areas where the code isn't explained, or the video doesn't line up with the code you posted. Please consider that; I want your channel to be one of the best. PS: I work at an edtech company, and I like to send my students to your channel.
Great job! 👏👏 Inspired!!!
I’m glad the project got you inspired!
This was an amazing tutorial! You are a badass! These end-to-end projects are very much needed! And yes, can you do how to connect to Power BI please? Thanks!
We'll see if there are more requests
If I'm not wrong, Power BI has a connector available for Redshift
Great Yusuf! Thanks a lot for another terrific contribution!
This is very helpful for me, as I want to implement a similar architecture for a project for driving schools here in Málaga.
Just wondering, how could we simulate a non-straight route between 2 points? Maybe I could get a route record (lat/long) and pass it to Kafka one record per timestamp?
I will replace the emergency topic with "pain points" where students usually fail...
That could work… another option would be an algorithm that simulates curves and bends every now and then; you could get before and after values in that case.
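For anyone who wants to try that idea, here's a rough sketch in Python. The Málaga-ish coordinates, step count, interval, and jitter size are all made-up illustrative values, not anything from the video:

import random
from datetime import datetime, timedelta

def simulate_route(start, end, steps=20, max_jitter=0.0005):
    # Walk from start to end in `steps` increments, adding a small random
    # offset to each point so the path bends instead of being a straight line.
    (lat1, lon1), (lat2, lon2) = start, end
    ts = datetime.utcnow()
    for i in range(steps + 1):
        frac = i / steps
        lat = lat1 + (lat2 - lat1) * frac + random.uniform(-max_jitter, max_jitter)
        lon = lon1 + (lon2 - lon1) * frac + random.uniform(-max_jitter, max_jitter)
        yield ts + timedelta(seconds=i * 30), lat, lon

for ts, lat, lon in simulate_route((36.7213, -4.4214), (36.7194, -4.4300)):
    print(ts.isoformat(), lat, lon)  # or produce each record to Kafka here

And if you have a real recorded route, you could skip the interpolation entirely and just replay the recorded (lat, long) points one per timestamp, exactly as suggested above.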
Great content. Thank you!!!
Hopefully I can complete this project. Plus, I'm trying to develop it using PDM and nix-shell :)
I like your T-Shirt Yusuf 😀😀😀
Haha 😀 thanks Morshed!
For setting up Spark with Docker, can you use the env variables SPARK_MODE=master / SPARK_MODE=worker instead of the command line to create the master and worker containers?
I suppose that could work, as long as the containers are not using the same key-value pairs in their env.
Thank you.
My pleasure
Hi, do I need to know every single tool to start this project? I am currently learning the tools as part of my course, but I would really like to get a project done and came across this.
You don't necessarily have to know it all; that's the whole point! You need the exposure, and to know how to pick things up from there going forward. So keep learning!
It would be really nice if you could share a link to the Kafka configuration docs so we can refer to it for an explanation of the configuration.
Apache Kafka official website is in the description
Thank you for the great video. Can somebody help me find where to copy the Docker env variables shown at 13:50?
Thank you so much
You're most welcome
Hello, I'm glad to follow your content from Latin America. My question is: what route do you recommend I study to become a software architect oriented to smart cities or physical and digital integration systems? Greetings from Venezuela.
Hi, you should choose the route that aligns with your passion
@@CodeWithYu Thank you very much for the answer, I appreciate it very much. But please, I would like a more technical answer, since my passion is architecture and programming.
I am getting an error at about 1:50:00 in the video:
ImportError: Pandas >= 1.0.5 must be installed; however, it was not found.
It turns out my spark-master container is missing packages, including pandas and pyarrow. I tried pip installing all of them, and then the error changed to something else that doesn't make sense.
Can anyone help point out what may have gone wrong?
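If anyone else hits this, one way to narrow it down is to check which Python interpreter and pandas version the executors actually see; a common cause is pip-installing into a different environment (or only on the master) than the one the workers run. A small diagnostic sketch, not from the video:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("env-check").getOrCreate()

def probe(partition):
    # Runs on the executors, so it reports their environment, not the driver's.
    import sys
    try:
        import pandas
        version = pandas.__version__
    except ImportError:
        version = "missing"
    yield (sys.executable, version)

print(spark.sparkContext.parallelize(range(2), 2).mapPartitions(probe).collect())

If the executors report "missing" or an old version, the packages need to go into the workers' Python environment (e.g., baked into the worker image), not just the master's.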
Sir, I have a doubt: can't we directly push our transformed data into the warehouse, without passing through the AWS Glue architecture?
Please tell me which platform this diagram was made with.
Hi Yu, thanks for creating this amazing content. Some questions on the Redshift part: is the data physically loaded into Redshift, or is it actually stored in Glue or the S3 bucket? Are we just using Redshift to read the data, and maybe creating another semantic layer on top of that in the next phase?
Amazing project! Could you do one for Azure also? That would be awesome, thanks a lot :)
Yes, soon!
@@CodeWithYu one for GCP too...😅
Can you share some good resources for learning PySpark? Or where you learned it?
Thanks for the content.
Is there any free alternative to DBeaver?
You can try SQL Workbench
Where did you get the data for this project?
I have a question: can I include this project in my resume, and can it help? I want to move into the data engineering domain from QA (my current role)…
Hi, can you do a video on how crypto exchanges show real-time data?
Hello Yusuf, I am not able to connect my Redshift cluster with DBeaver. Could you please tell me what the issue might be?
Most likely your VPC/firewall permissions.
I was running the project again and I can see the data in the S3 bucket. But I am not able to crawl the data; I am getting an error like account **** denied access.
@@OnkarPatole-eo5fx Create an IAM role for your Glue crawler. The IAM role should grant Glue access to S3 (S3 Full Access).
That would give the Glue crawler access to S3.
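If you'd rather set that up in code than in the console, a boto3 sketch along these lines should work. The role, crawler, database, and bucket names here are made up, and S3 Full Access is broader than you'd want in production:

import json
import boto3

iam = boto3.client("iam")
glue = boto3.client("glue")

# Trust policy letting the Glue service assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(RoleName="GlueCrawlerRole",
                AssumeRolePolicyDocument=json.dumps(trust_policy))
iam.attach_role_policy(RoleName="GlueCrawlerRole",
                       PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess")
iam.attach_role_policy(RoleName="GlueCrawlerRole",
                       PolicyArn="arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole")

glue.create_crawler(Name="smartcity-crawler",
                    Role="GlueCrawlerRole",
                    DatabaseName="smartcity_db",
                    Targets={"S3Targets": [{"Path": "s3://your-bucket/data/"}]})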
Hey Yu!! I am a college student and I am interested in the data engineering field. As a fresher, how much knowledge is enough? Like which tools, and to what level?
For the transformation, did you use PySpark?
Domain name pls
Yes, PySpark was used
How can we change this to Tableau visualizations?
You'll need to find a way to connect Tableau to the workload. However, I'm not sure Tableau and Power BI are designed for real-time streaming, so you might want to consider tools better suited for real-time visualization.
Hello Mr Yu,
I'm following your tutorial, but I'm running into issues around 1:12:23, when I try to run the whole code.
There's no way to share my code on here, but I followed you completely, so I don't know if the error is from my system. The error is that I can't seem to access Kafka; I'm always getting an error.
Hi, if you have completed around 1 hour of the video, can you please connect with me? I am running into issues at the very beginning (docker-compose.yaml). I'm a beginner and would love some assistance from a senior.
There are no entry-level fresher jobs for DE; should a fresher target data analyst roles instead?
Do you work with something like this in your current data engineering position?
Producing IoT data to Kafka is where I got stuck. It kept telling me: failed to resolve broker:29092: No such host is known. I made sure the hostname is configured correctly for the broker, but the issue is still not resolved.
Please, I need help. Thank you.
You need to run it in Docker for it to recognize your broker. Otherwise you'll need to change broker to localhost.
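In other words, which bootstrap address works depends on where the producer runs. Something like this, assuming the usual two-listener Confluent compose setup where 9092 is the port published to the host:

# Inside the Docker network (e.g., a container in the same compose file),
# the broker is reachable by its service name:
producer_config_in_docker = {"bootstrap.servers": "broker:29092"}

# From the host machine, "broker" doesn't resolve; use the listener published
# to localhost instead (9092 is an assumption based on the usual compose setup):
producer_config_on_host = {"bootstrap.servers": "localhost:9092"}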
I had a question: which arch are you using on your Mac? AMD or ARM?
aarch64 or arm64
Can you please share documentation for this project so that we can put it in our resumes?
You can get that in the source code. The details are in the video description.
Hi Yusuf, I am a beginner. Can I work with the Community version of PyCharm for these projects?
Yes, but this is still a little high-level for an absolute beginner. You may want to check out a more suitable version of this on datamasterylab.com
@@CodeWithYu Does this describe the same project in detail, in a way a beginner can understand?
Thank you, Yu, for this amazing content. I am facing an issue while submitting a Spark job and getting this error. Any help would be appreciated.
" 0 artifacts copied, 12 already retrieved (0kB/16ms)
24/04/23 22:47:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "String.lastIndexOf(String)" because "path" is null"
Hi Yusuf, Could you share the Architecture Diagram as well?
Hi, can this course be taken by someone who is a complete beginner in data engineering?
Yes, but this is still a little high-level for an absolute beginner. You may want to check out a more suitable version of this on datamasterylab.com
Hi everyone!
I'm stuck right at the end... timing out trying to connect to Redshift.
Has anyone set up the VPC, security groups and permissions properly?
I'm getting a timeout in DBeaver.
My inbound rules are set to custom protocol TCP, port 5439, any IPv4.
I set publicly accessible to enabled.
What am I missing?
Please help!
I have created the cluster again.
Before:
- VPC with 1 availability zone
- a security group for this VPC with an inbound rule for port 5439 from my IP and an all-traffic rule,
plus an outbound all-traffic rule
- cluster subnet group
- Redshift cluster with public accessibility...
and I get a timeout from DBeaver... :(
PowerBI as well...
Any suggestions please?
What this usually means is that your cluster is still not accepting connections. Have you tried testing with the default configuration? Most times, it's your configuration that's faulty.
Also, open your inbound and outbound ports, and associate the right VPC with your cluster.
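A quick way to test connectivity outside DBeaver is a few lines of Python; all the connection values below are placeholders. If this times out too, it's almost always the security group or VPC, not the credentials:

import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect(
    host="your-cluster.abc123xyz.eu-west-1.redshift.amazonaws.com",  # placeholder
    port=5439,
    dbname="dev",
    user="awsuser",
    password="your-password",
    connect_timeout=10,  # fail fast instead of hanging
)
with conn.cursor() as cur:
    cur.execute("SELECT 1;")
    print(cur.fetchone())  # (1,) means networking and auth are both fine
conn.close()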
@@CodeWithYu oh GOD!! More than 4 hours of back and forth!
I just deleted all the VPCs in my account, created a new VPC with 2 availability zones, a default security group, and a cluster subnet group, clicking next-next xD.
Finally created my 3rd cluster and assigned the VPC. Didn't work...
But! I went to the security group again and added a new inbound rule:
custom - port 5439 - My IP... and.... TA DA!!! DBeaver successfully connected!
Thanks Yusuf!
You're welcome!
Thanks, Yu. I watched the full video and liked it. Can you make a video on an ETL pipeline using an open-source modern data stack involving DuckDB, Polars, etc.?
Yeah sure… thanks for the suggestions
Need visualization pls 😊
Hahaha… I guess we'll see if more people request it
Hi, is the full code from the video available?
In the description
Could I do this on a free tier account?
Yes.
I paid €0.40 because I made this project with some modifications and more data volume. Nearly all of it was free thanks to the AWS free tier. Cool, isn't it?
Hi, has anybody else had the "Shutdown hook called" error??? :(
Thank you for all the videos. Please can you share the source code with us?
Link to the source code is in the description
Bro is charging 5 dollars for the source code :3
You're doing a great job of providing free tutorials, and charging 5 euros for the source code would not be ideal. Think from a long-term perspective: freeCodeCamp earns a lot with only YouTube pay, so if you allow everyone to support you in this initial phase, you never know what you could earn in the long term. Restricting access by not providing the source code would definitely affect your channel's growth. This is my perspective; it's up to you. And thanks for the videos @@CodeWithYu
Can you post the dataset here please?
The good thing is you don't need a dataset; just run the code and the data gets created automatically.
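For context, the producer script synthesizes the vehicle records on the fly, roughly along these lines (a simplified sketch; the field names are illustrative rather than the exact schema from the video):

import random
import uuid
from datetime import datetime

def generate_vehicle_record(device_id):
    # Each call fabricates one telemetry reading around a base location.
    return {
        "id": str(uuid.uuid4()),
        "device_id": device_id,
        "timestamp": datetime.utcnow().isoformat(),
        "latitude": 51.5074 + random.uniform(-0.01, 0.01),
        "longitude": -0.1278 + random.uniform(-0.01, 0.01),
        "speed_kmh": random.uniform(10, 60),
    }

print(generate_vehicle_record("Vehicle-1"))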
@@CodeWithYu Btw, I don't get the Confluent part: do we have to pay a lot of money to use confluent-kafka, sir? Or did you just use the 1-month trial?
@@viethoangnguyen1264 The Docker image is free, same as confluent-kafka. But if you use Confluent Cloud, you get free credits for about a month; then you start paying if you want to continue using their services.
@@CodeWithYu But I read somewhere that the Python confluent-kafka client is not as well supported as the Java one. Is that right? And one more question: confluent-kafka is just a library like numpy and pandas, right?
Anybody else run into an issue when mounting volumes in Docker?
🤗
🥰
This is really in-depth content, but if I may offer a piece of advice: no one who already knows data engineering well is watching these videos; you have to tailor them to people who don't know much. So skipping the explanation of why you're writing certain code is not good.
Just to let you know: the first time I ran "python jobs/main.py", it immediately returned the following error: "Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 31ms in state APIVERSION_QUERY)".
I solved it by setting the security protocol to "PLAINTEXT" in the producer_config dict:
producer_config = {
"bootstrap.servers": KAFKA_BOOTSTRAP_SERVERS,
"error_cb": lambda err: print(f"Kafka error: {err}"),
"security.protocol": "PLAINTEXT",
}
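For anyone following along, that dict can then be passed straight to confluent-kafka's Producer; the topic name and payload here are illustrative:

from confluent_kafka import Producer

producer = Producer(producer_config)
producer.produce(
    "vehicle_data",  # illustrative topic name
    key="vehicle-1",
    value='{"speed": 42}',
    on_delivery=lambda err, msg: print(err or f"delivered to {msg.topic()}"),
)
producer.flush()  # block until the delivery callback has fired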
Please, how do I fix this error without having to switch to the confluent-kafka package?
Broken DAG: [/opt/airflow/dags/kafka-stream.py]
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.12/site-packages/kafka/record/legacy_records.py", line 50, in
from kafka.codec import (
File "/home/airflow/.local/lib/python3.12/site-packages/kafka/codec.py", line 9, in
from kafka.vendor.six.moves import range
ModuleNotFoundError: No module named 'kafka.vendor.six.moves'
Python 3.12 is a little problematic at this time. Can you try using Python 3.9 or 3.10? It should fix your errors
@@CodeWithYu I was initially running Python 3.9; I switched to 3.12 because of the error.
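For anyone who has to stay on Python 3.12: kafka-python's vendored six module is what breaks there. Two workarounds get shared around, neither from this video, so verify before relying on them. One is switching to the kafka-python-ng fork, which reportedly supports 3.12. The other is shimming the vendored module before kafka is imported (requires pip install six):

import sys
import six  # pip install six

# Register the real six.moves under the name kafka-python expects,
# *before* anything imports kafka.
sys.modules["kafka.vendor.six.moves"] = six.moves

import kafka  # now imports without ModuleNotFoundError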