We need a 60-hour course for data engineering.
U make one
Check out the data engineering zoomcamp.
😂@@dijik123
@@StarLord-571 It's not about the quantity but the quality.
We need a full data engineering course: Python + SQL + big data (Hadoop) + Apache Spark + Apache Airflow + Apache Kafka + AWS + a project.
If you're running into an error with "exit code 1" @1:28:55, you need to update the compose file shown @1:08:08. Go to docker-compose.yaml and set "image: postgres:9.2" for both "source_postgres" and "destination_postgres".
I think this was a good idea, but many details were off. I wish he'd be more specific about the versions of the software he's using next time.
Thanks, I wasted many hours with that error. I would appreciate it if you could expand on the reasons why this error happens. Again, thanks for your contribution.
God bless you mate !
Thanks a lot for the point. I spent a bit of time trying to solve the issue.
bro, the execution of my code is still running, (after I edited the part of the code you mentioned), but I believe in you! Thank you so much wallah you are a savior! 🔥🔥
thanks mate
Please bring a bigger course that covers all aspects from the basics, like from scratch: MySQL, Python/Java/Scala, Hadoop, Spark, PySpark — something that covers data engineering with one cloud.
yes!!
Yes, please. This would be so helpful but thanks for this resource, a great start.
YES Please
Yes
Yes please
Amazing!
Looking forward to it.
Also, it would be perfect to have a more comprehensive version of this course as well. Covering all batch and stream processing tools and methods.
The best DE course I've ever seen. Most courses only stick to theory and never show the practical part.
At 1:53:56 you may face an error because the dbt service is launched before the elt_script service completes.
To solve the issue, add condition: service_completed_successfully under the depends_on clause of the dbt service, so that it is always launched only after elt_script completes.
Thank you so much.
This did not work for me, and I can't progress any further. Frustrating.
Thanks man, I was struggling with this for a while
depends_on:
  elt_script:
    condition: service_completed_successfully
@@wusswuzz5818 did you end up figuring out the errors? dbt creates the views for the films, film_actors, and actors tables for me, but not for the film_ratings table, because the relation "public.films" does not exist.
in the docker-compose.yaml file, for anyone wondering (like me 1 min ago)
Good job, you are creating visibility for airbyte in a great way, by providing an evolutionary view of the stack that gets one to eventually need it. Hope they continue to support you making content using this approach.
This course is not for beginners; he is moving forward like hell
without explaining how to install anything. He moves fast-forward, yet the video title says it's for beginners.
Hey, currently there is no set path to becoming a data engineer. So please create a proper certification with a clear roadmap of the foundations and the most-used cloud tech in data engineering, so that those of us interested in this career can get some structure going.
Please add more data engineering courses like this. I really love it.
YES THANK YOU SO MUCH keep expanding this!!!!
Thanks for the course. To all those using VMs: make sure the files are located on the VM. I tried running the Docker exercise with Docker on the VM and the files on the host (Windows) and wasted a lot of time resolving errors. I finally moved all the files to the Ubuntu VM, where things ran smoothly.
Yes! I’ve been waiting for this.
1:59:58
The reason he encountered the error is that he wrote {% generate_ratings() %}.
To avoid the error, you should write {% macro generate_ratings() %}
Long waiting for this course
More bi and DE courses like this plz
I just started and love the style. You teach fluently and focus on the important takeaways. I've seen so much filler that I really expected I'd need to watch you install Docker for 10 minutes, and I'd already started skipping, but you didn't show that part, which is nice. Makes perfect sense. Someone who can't RTFM and install Docker on their own shouldn't focus on DE at this point anyway, imho.
Finally a data engineering course!
If you get the "pg_dump: error: aborting because of server version mismatch" error:
I found that running "apt-get update && apt-get install -y postgresql-client" actually installs version 15, while "postgres:latest" pulls version 16.
To fix this, specify your image as "postgres:15" for both source_postgres and destination_postgres in docker-compose.yaml.
I also changed my RUN command in the Dockerfile to "apt-get update && apt-get install -y postgresql-client-15" to be explicit.
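The pinning described above can be sketched as follows (file and service names follow the video; the rest of each service definition is elided, so treat this as a fragment, not the full compose file):

```yaml
# docker-compose.yaml (sketch) -- pin both databases to the same major
# version as the client tools installed in the elt image (15 here)
services:
  source_postgres:
    image: postgres:15
  destination_postgres:
    image: postgres:15
```

The key point is that pg_dump's major version must be >= the server's major version, so pinning both sides to 15 (or both to 16) avoids the mismatch.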
Thank you. I spent 30 minutes on this error.
love you mate.
Since when Justin is a data engineer? Well I guess the constant learning is real.
I’ve been following him since I started coding(almost 4 years now) he’s always learning and growing.
@@briabytes which channel ?
@@briabytes Please share his YT channel.
@@JREQuickPods he codes and games on his Twitch channel, and he has a YouTube channel (you can search his name) where he shares his experiences: twitch.tv/justinbchau. I watched him make some of this course a few months back on Twitch.
his twitch is in the other comment
Here I am thinking about data engineering, then boom! YouTube shows me a data engineering course 😅
You probably googled
Thank you for this! Learned a lot today. 👍🏾
We would like to see more content for Data Engineering, possibly a full course.
Thank you for this. I like the way you teach it step-by-step and I always got lost with JOINS. Haha anyways thanks for this! Keep it up
My man...Justin!
You're a top crossfitter and I miss working out under your guidance
Please make this a series!
(Edit)
Suggestion 1: include an overview before each section of 1) the overarching pipeline and 2) where in the pipeline we are for the given section. In other words, a map or diagram of what is happening would help with conceptual understanding.
Suggestion 2: explain the code line by line conceptually. Time spent writing the code on screen could be cut and replaced by explanation alone. This would save time.
A piece of advice: it doesn't make sense to read and rewrite the code from your top monitor. Save your and the viewers' time and just copy-paste what's there and explain it line by line.
Another one: I had issues with host.docker.internal, hence added
extra_hosts:
  - "host.docker.internal:host-gateway"
to the docker compose entry for dbt.
Thank you! I ran into all the missing configuration/typos Justin had, but for this one his GitHub YAML file didn't have the line, so it was difficult to find the root cause. Thanks for sharing.
@@paolaprieto8111 heya could you share the exact code you wrote, I can't seem to get it working
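For reference, a sketch of where that line goes in docker-compose.yaml (the rest of the dbt service definition is an assumption and will differ in your file):

```yaml
services:
  dbt:
    # lets the container resolve host.docker.internal, which otherwise
    # only exists out of the box on Docker Desktop (macOS/Windows)
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

This is mainly needed on Linux hosts, where host.docker.internal is not defined unless you map it to the host gateway like this.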
I like to see more of your content, good explanation skills
I really need a longer video.
Thanks!
Can you pls do a complete end to end video on AWS or GCP data engineering data ingestion, etl, analytics using pyspark
Yes! Thanks!
Right off the bat, I'd just like to ask: please nix the background music or make it even quieter. Distracting.
Thank you for the amazing video. Could you please make a second video covering data engineering projects?
@freeCodeCamp please put more content on data engineering ;)
Please make a detailed tutorial on data engineer, about 20-30 hours full end to end course please it's a request 🙌
Thank you
ok, this is super helpful.
Finalllyyyy a data project
Nice one
best video on the internet
Keep up the good work!
thank you very much
Error:
Database Error in model actors (models/example/actors.sql)
  relation "public.actors" does not exist
===> Solution:
In docker-compose.yaml, add the code below under the dbt service:
depends_on:
  elt_script:
    condition: service_completed_successfully
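In context, the dbt service entry might look like this (other keys under dbt are elided; service names follow the video):

```yaml
services:
  dbt:
    depends_on:
      elt_script:
        # start dbt only after elt_script has run to completion
        # and exited with status 0
        condition: service_completed_successfully
```

Note that service_completed_successfully requires the long-form depends_on syntax shown here; the short list form (depends_on: [elt_script]) only waits for the container to start, not to finish.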
A hero, but why does it work for most people out of the box?
I did this, and dbt runs for the films, film_actors, and actors tables, but when trying to create the view for the film_ratings table it still says that the relation "public.films" does not exist. I have been frustratingly stuck on this for days; I even tried rebuilding the project. Not sure what else to do. Any thoughts on what I can do next?
The setup for these services seems like A LOT of work, and because of that things can go wrong in production, not to mention handover for something like this. What alternatives are there for this type of ELT pipeline?
Thank's for the video
The Airbyte section is complicated and badly explained.
It looks to me like some part is missing from the recording, e.g. how was Airbyte started?
After this video I don't see a significant advantage to using Airbyte over the elt_script from the earlier example.
This is excellent content
Awesome!
Please, we need courses about big data, Hadoop, and Spark with practical projects.
Please make a video on performance testing using JMeter.
Quality content as always
Finalllyyyy a data project. What are the learning prerequisites for this course?
Love it!
please come up with big data course
This channel is gold 🪙❤
Came across another issue while running the code for the data pipeline: it gave me a version error between pg_dump and PostgreSQL. I changed the version from latest to 15.5 to match the dump version. Hope this helps in case anyone faces issues in this module.
Hero
Please make more video about Data Engineering
Excellent .
I am stuck installing Docker: "can't run on my PC", Windows 10.
My boy, Chau!
The dbt section might have been easier to explain if dbt were installed in the base image; all of the config could have been completed in the Dockerfile?
As you are using VS Code, it might be a good shout to use the remote container extensions to interact with the running containers, too.
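A minimal sketch of what baking dbt into an image could look like. The base image, adapter package, and paths here are assumptions for illustration, not the course's actual setup:

```dockerfile
# hypothetical Dockerfile for a self-contained dbt image
FROM python:3.11-slim
# dbt-postgres pulls in dbt-core plus the Postgres adapter
RUN pip install --no-cache-dir dbt-postgres
# copy the dbt project into the image (path is an assumption)
COPY custom_postgres/ /dbt
WORKDIR /dbt
ENTRYPOINT ["dbt", "run"]
```

With this approach the profiles.yml would still need to be mounted or copied in, but all tool installation lives in one place.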
Finally!
Real GEM
In the "Building a Data Pipeline from Scratch" section, when I run the containers with docker compose up, I get the following error:
elt-elt_script-1 | pg_dump: error: aborting because of server version mismatch
elt-elt_script-1 | pg_dump: detail: server version: 16.1 (Debian 16.1-1.pgdg120+1); pg_dump version: 15.5 (Debian 15.5-0+deb12u1)
I have installed the latest version of Postgres on my machine, i.e. 16.1. I have also removed the images and volumes and rerun the docker compose up command, but I still get the above error. Can someone please help? Thanks!
I ran into this same problem. I fixed it by changing docker-compose.yaml: instead of image: postgres:latest under source_postgres and destination_postgres, I wrote image: postgres:15 in both places. This way, when the code is run, it installs the same version of PostgreSQL in both the source and the destination.
@@oddlang687 It works, thank you very much. I took two days trying to figure out that problem.
Thanks man!
@@oddlang687 Appreciate this comment, fixed for me. Thank you!
Do a full MySQL course
Thanks for creating this course. I like the live debugging. Gives mere mortals like me hope knowing that I am not the only one that codes like this ```check = Ture ```
I love it. But let's say I need to build an email classification system that distributes emails after their classification and sends them to their classified department's server.
Would you use a task queue like Celery or a message broker like Kafka or RabbitMQ?
I need a comprehensive video, please! I'm looking for exactly that.
Do you need to know maths for data engineering? If so, what types?
Love your tutorials. But can you make a C++ SFML tutorial with Visual Studio? That would help me a lot. Thanks.
it was alright, I think adding a bit more discipline into the course would be an amazing upgrade
How do I learn all of these? What course should I take? Please help.
Can you make a data engineering boot camp?
You didn't solve the column level macro bug @ 2:00:03. That's not the attitude mate.
It was actually that he forgot the macro keyword when defining the macro in the ratings_macro.sql file:
{% macro generate_ratings() %}
CASE
WHEN user_rating >= 4.5 THEN 'Excellent'
WHEN user_rating >= 4.0 THEN 'Good'
WHEN user_rating >= 3.0 THEN 'Average'
ELSE 'Poor'
END as rating_category
{% endmacro %}
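For completeness, this is roughly how the corrected macro is then invoked from a model. The model, column, and ref names follow the film-ratings example from the video; treat them as assumptions:

```sql
-- models/example/film_ratings.sql (sketch)
SELECT
    film_id,
    user_rating,
    {{ generate_ratings() }}  -- expands to CASE ... END AS rating_category
FROM {{ ref('films') }}
```

Without the macro keyword in the definition, dbt's Jinja renderer has no generate_ratings to expand, which is why the model compilation fails.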
The intro and description don't match the video - there is no Spark or Kafka anywhere in the video.
Might be great, but source code is not working at all (starts with main branch, and keeps going).
Okay, major question. As you are adding in new technologies like Airflow in the docker file, where is your console log or terminal to let you know if there’s any syntax errors etc.? For example, with React, Django, Flutter, I always have the app running on local host and ALWAYS have that window open to see if there are any errors in the error log as I am updating files.
How do you do that with this workflow to make sure you’re not making mistakes while you’re writing code?
At 2:52:34, how was Airbyte started?
try looking into course resources > airbyte
What are the learning pre-requisites for this course?
brain
This guy could barely code in JavaScript like a year ago, so not too much, I guess.
@@AndrewHuange thanx for answering my query👍
I had an issue with the postgres version mismatch. I changed the version of the postgres server to match:
services:
  source_postgres:
    image: postgres:15.5
Same for destination_postgres. Hope this helps.
thank you man! Great help
Here's your data model. People/Places/Pennies/Product involved in a Process.
I ran into the below error trying to run my container; can you please advise me on how to resolve it? All my files are in one directory.
Node.js v18.20.2
node:internal/modules/cjs/loader:1143
  throw err;
  ^
Error: Cannot find module '/app/src/index.js'
    at Module._resolveFilename (node:internal/modules/cjs/loader:1140:15)
    at Module._load (node:internal/modules/cjs/loader:981:27)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:128:12)
    at node:internal/main/run_main_module:28:49 {
  code: 'MODULE_NOT_FOUND',
  requireStack: []
}
Node.js v18.20.2
Why is dbt using the port 5434 to communicate with the destination database? Both containers are running in the same docker network so why do we need to use the exposed port number?
It does not need that port. That port is exposed so that if you, the user, want to take a peek at the database, you can run a DB manager against it from the host; otherwise, to peek at the database you would have to exec -it into the DB container and look at the tables there.
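In other words, the compose port mapping only matters from the host; inside the compose network, containers reach each other by service name on the internal port. A sketch, using the port number from this thread (the service name is assumed from the video):

```yaml
services:
  destination_postgres:
    ports:
      - "5434:5432"   # host:container -- 5434 is only for tools running on the host

# inside the compose network, another service (e.g. dbt) would connect with:
#   host=destination_postgres port=5432
```

So a connection string pointing at port 5434 from another container only works if it goes through the host (e.g. via host.docker.internal), which is likely why that setup appears in the video.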
2:40:23 That was literally it 😂 but thanks for the tutorial ;)
Sir please make the audio track available
❤️❤️❤️ the cat stole the show
Definitely not for beginners, but good.
Can anyone suggest a YouTube channel that covers all the topics: SQL, Python, Spark, PySpark, Azure?
The semi-transparent terminal is a very bad idea for a video. It unnecessarily makes it harder to see the text.
Is spark and kafka actually covered here as mentioned in the description?
nope
I want to be Data Engineering Assistant 😀
I have the same folder structure, but when I execute the docker compose command I receive an error: fatal: Invalid --project-dir flag. Not a dbt project. Missing dbt_project.yml file.
Hello everyone! I have this error: "elt_script-1 | Error connecting to PostgresSQL: Command '['pg_isready', '-h', 'source_progres']' returned non-zero exit status 2.
elt_script-1 | Retrying in 5 seconds... (Attempt 1/5)
elt_script-1 | Error connecting to PostgresSQL: Command '['pg_isready', '-h', 'source_progres']' returned non-zero exit status 2.
elt_script-1 | Retrying in 5 seconds... (Attempt 2/5)
elt_script-1 | Error connecting to PostgresSQL: Command '['pg_isready', '-h', 'source_progres']' returned non-zero exit status 2.
elt_script-1 | Retrying in 5 seconds... (Attempt 3/5)
elt_script-1 | Error connecting to PostgresSQL: Command '['pg_isready', '-h', 'source_progres']' returned non-zero exit status 2.
elt_script-1 | Retrying in 5 seconds... (Attempt 4/5)
elt_script-1 | Error connecting to PostgresSQL: Command '['pg_isready', '-h', 'source_progres']' returned non-zero exit status 2.
elt_script-1 | Retrying in 5 seconds... (Attempt 5/5)
elt_script-1 | Max retries reached. Exiting
elt_script-1 exited with code 1". I already changed the images in my docker-compose.yaml to image: postgres:9.2 and then to image: postgres:15 for both source_postgres and destination_postgres, but no success... Does anyone have an idea how I can fix this? Thanks a lot :)
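One thing worth double-checking in that log: pg_isready is being pointed at 'source_progres', while the compose service in this thread is named 'source_postgres', so the hostname in the script may simply be misspelled. Below is a minimal, hypothetical sketch of the retry loop the script appears to implement; the function name, retry count, and delay are assumptions read off the log, not the course's actual code:

```python
import subprocess
import time

def wait_for_postgres(host, max_retries=5, delay_seconds=5, check=None):
    """Retry pg_isready until the server accepts connections.

    `check` is injectable for testing; by default it shells out to pg_isready.
    Returns True once the server is ready, False after exhausting retries.
    """
    if check is None:
        def check(h):
            # pg_isready exits 0 once the server is accepting connections
            return subprocess.run(
                ["pg_isready", "-h", h], capture_output=True
            ).returncode == 0

    for attempt in range(1, max_retries + 1):
        if check(host):
            return True
        print(f"Retrying in {delay_seconds} seconds... (Attempt {attempt}/{max_retries})")
        time.sleep(delay_seconds)
    print("Max retries reached. Exiting")
    return False

# The host must match the compose service name exactly, e.g.
#   wait_for_postgres("source_postgres")   # not "source_progres"
```

If the hostname in the real script matches the service name and the error persists, the next thing to check is whether the database container is healthy at all (docker compose logs source_postgres).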
I'm having the same issue
ChauCodes !
Cool
1:40:58 — can't find the schema.yml file in the course GitHub. Had to copy everything from the video :/
Ran into the same issue; it's in a different branch.
1:14:49