I am a data engineer, and this is the best video I’ve ever watched that describes exactly what I do on a daily basis. Thank you for sharing.
Glad you enjoyed it!
Hi, what about math, bro? What should a DE know?
This is a great video, and perfectly timed for me. I know SQL and Power BI, and had been trying to figure out the next skill to learn: Python (I already learnt a little 3 years ago), ETL/ELT, Data Factory, Data Warehouse design, etc. Your video answered my question: brush up on Python. Thanks!
Yeah, Python would be a great place to start. Generally most interviews will focus on 2-3 key areas from a tech perspective.
1. SQL
2. Coding
3. ETL and Data warehouse design.
So it would be a good idea to make sure your coding is on point. They may ask some high-level questions about more specialized topics like streaming and distributed computing, but these can usually be answered with the kind of explanation you would find in a blog post. For example: what are the pros and cons of using streaming or real-time data pipelines?
Man, your channel is exactly what I was looking for. I'm Mexican and there isn't content about this topic, only data science and that stuff, and I want information about this topic specifically. So many thanks.
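To make the streaming vs. batch question concrete, here is a minimal sketch in plain Python. The event data and function names are hypothetical, and a real streaming pipeline would read from a message queue rather than a list. Batch waits for the full dataset and computes once; streaming updates a running result as each event arrives.

```python
from __future__ import annotations

from typing import Iterable, Iterator

# Hypothetical events: (user_id, amount). Illustrative data only.
EVENTS = [("a", 10), ("b", 5), ("a", 7), ("b", 1)]


def batch_totals(events: list[tuple[str, int]]) -> dict[str, int]:
    """Batch: wait for the full dataset, then compute totals once."""
    totals: dict[str, int] = {}
    for user, amount in events:
        totals[user] = totals.get(user, 0) + amount
    return totals


def streaming_totals(
    events: Iterable[tuple[str, int]],
) -> Iterator[dict[str, int]]:
    """Streaming: update running totals and emit a snapshot per event."""
    totals: dict[str, int] = {}
    for user, amount in events:
        totals[user] = totals.get(user, 0) + amount
        yield dict(totals)  # copy so earlier snapshots are not mutated


if __name__ == "__main__":
    print("batch:", batch_totals(EVENTS))
    for snapshot in streaming_totals(EVENTS):
        print("stream:", snapshot)
```

The trade-off shows up even at this scale: the batch version is simpler and runs once, while the streaming version gives fresher answers at the cost of keeping state between events.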
Glad you enjoyed this data engineering video! Yeah I see lots of data science, which is why I started putting out more content.
'Seattle Data Guy' is going in my resume as a tech stack.
Oh, do you have all these skills?
Super helpful! Thanks Seattle Data Guy!
Thanks for the comment! I am glad it was helpful
that was a great call to action at the end, " the question I have for you is why haven't you hit that like button" solid ending
Finally some content for us Data Engineers
Move over data scientists and software engineers! Move over techlead!
Man, thank you so much. I'm trying to be a data engineer, and I watched so many videos that really disappointed me. But your video ignited my mind one more time.
Thanks so much, you're amazing.
Glad to hear it! What other questions do you have for data engineering?
Any progress so far?
I like how you're ever so subtly promoting AWS databases with that box in the background lol. I see you lol
It was more me trying to play around with the background. Amazon sent me a bunch of swag; it’s also where all the T-shirts come from.
A strong man shares what he got, you are a strong man with a kind ❤️
I love sharing! Thanks for the comment
Thanks! would be really nice to have a short video explaining data warehouses/data lakehouses/cloud and where to look for best practices etc. I find it very hard to find information on these topics!
Yeah, the hard thing is these are always changing and are so different per company...but let me see if I can eventually put this together.
@@SeattleDataGuy thank you!
I'm a data engineer recruiter in the Netherlands and I declare this as awesome content.
Thank you!
Do you have any openings for juniors/trainees/interns? I'm somewhere between 3 and 5 in the pyramid with basic knowledge of GCP and AWS.
Thanks for this. I was trying to get into DE after a career gap. I didn't know where to start, so I learned SQL basics and am doing some exercises on them. Now next in line looks like data warehousing and ETL.
What do companies expect from a person trying to become a DE? Do they look for all these topics, or are they going to stick with SQL, DW, and ETL?
How deep should my knowledge be when applying for interviews?
Thanks again for this video. Looking forward to more of your videos.
First, thanks for the comment! As far as what you should know: as a baseline, you should know ETLs, DW, SQL, and coding. Most interviews will ask you some combination of these topics. From there, you may want to look into streaming vs. batch ETLs. Sometimes interviewers will ask for high-level discussions of these topics, or perhaps the pros and cons of distributed computing. Some of the more unique questions I have gotten were "What is the difference between an integration test and a unit test?" and "Explain the different styles of compression." Does that help?
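For anyone wondering about the unit vs. integration test question, here is a minimal Python sketch. The function names (`clean_name`, `load_names`) and the `users` table are made up for illustration, and an in-memory SQLite database stands in for whatever database a real pipeline would use. The unit test checks one function in isolation; the integration test checks that the transform and load steps work together against a real database.

```python
from __future__ import annotations

import sqlite3


def clean_name(raw: str) -> str:
    """Transform step: trim whitespace and title-case a name."""
    return raw.strip().title()


def load_names(conn: sqlite3.Connection, names: list[str]) -> None:
    """Load step: clean each name and write it to the users table."""
    conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)",
        [(clean_name(n),) for n in names],
    )
    conn.commit()


def test_clean_name_unit() -> None:
    # Unit test: a single function, no database or other dependencies.
    assert clean_name("  ada lovelace ") == "Ada Lovelace"


def test_load_names_integration() -> None:
    # Integration test: transform and load run together against a real
    # (in-memory) SQLite database.
    conn = sqlite3.connect(":memory:")
    load_names(conn, [" grace hopper", "alan turing "])
    rows = conn.execute("SELECT name FROM users ORDER BY name").fetchall()
    conn.close()
    assert rows == [("Alan Turing",), ("Grace Hopper",)]


if __name__ == "__main__":
    test_clean_name_unit()
    test_load_names_integration()
    print("all tests passed")
```

The functions are named `test_*` so a runner such as pytest would pick them up, but the script also runs on its own.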
@@SeattleDataGuy Absolutely, it really did help. Now I have a roadmap of what to study and what to expect.
Could you tell me where to study data warehousing and ETL tools if I don't want to join courses? I've looked at lots of videos, but none seem to help.
Thanks again
@@amalnisham hey mate what are you doing right now
What references would you recommend for DW/ETL/ELT? Currently I'm going through 'Star Schema: The Complete Reference' and DataCamp's Data Engineering with Python track. What do you think?
This is an excellent video, thank you so much.
Thank you so much for the comment! Let me know if you have any other videos you would like to see!
thanks! helped me a lot. greetings from 🇧🇷
Glad it helped! Hello from the states!
Great video... thanks for explaining it nicely-it helps
Glad it was helpful! Thanks for the comment!
Thanks so much. This was really informative. Please keep making more videos.
I am glad you enjoyed these videos! I'll keep making them. Have you seen any of my new videos?
Amazing! This video was really helpful.
Glad it could help! What are you hoping to see next?
Great info in here Ben, many thanks. You mentioned an article about data visualisation best-practices. I don't think it was posted in the details section. Can you share a link to it?
uxplanet.org/10-rules-for-better-dashboard-design-ef68189d734c
Good call, I will add it in the description
Great explanation. I've just landed my first proper data job. I'm definitely out of my depth, but in a good way. This is super helpful, sir, thank you.
Useful...thanks for the info...
I am glad you thought so! Thank you for your time and comment
Will be really helpful to have a session on - comparison and suggestions on best data engineer paths .. AWS vs GCP vs Azure !!
That might be a fun video to make!
@@SeattleDataGuy Yes, I agree with the question. And from your experience as a data engineer, which of the 3 public clouds did you have the best experience working with? Or is the experience the same on any of them, just with different service names?
The issue with most Data Engineering postings is they list dozens and dozens of requirements that no one person needs to have to do a DE job. I have seen some DE jobs ask for PhDs and literally 24 software packages. Oh yes, they also only want to pay the bare minimum.
I have seen the occasional job description for DEs requiring a master's, but never a PhD! That's wild. I can see that for, like, a research or pure data scientist role. But yes, the struggle for most technical positions is always description bloat.
Perfect roadmap, sir, thanks a lot
You're welcome!
Git is a basic programming skill, so it's really unfortunate to put DevOps several layers higher there (or to imply that it's hierarchical in importance). Note that DevOps has 'dev' in it.
Thanks for that rich content that you keep giving us 🙌🏾🙌🏾🙌🏾. You didn’t talk about the DataCamp Data Engineering Certificate. Is it worth it?
Thanks for the comment. It's a great way to get up to speed quickly, but you likely won't get a job because of it.
I’m almost done with the Data Engineering career track on DataCamp but can’t solve the bash (shell) course. Could you help me, or do you maybe know where to find the answers? :)
I probably won't be able to help but let me know if you find a good blog or article to help!
@@SeattleDataGuy I did it! thank you 🙏
Thank you for making these for us. :)
Glad you like them!
Very well explained. Thanks!
Thank you!
Nice explanation!
Thanks I am glad you enjoyed the video!
Thanks for the great video, do you know a good source or website where one can practice application of SQL and Python for data engineering? I have the grasp of Python and SQL, but I would like to use application, any help would be appreciated.
Try looking into a demo of Flask. It's a great way to actually apply Python th-cam.com/video/Z1RJmh_OqeA/w-d-xo.html
Awesome content...is Perl used at all in data engineering? Especially considering the regex capabilities?
It could be used, but I would say that leads to a lot of custom coded solutions which will be very expensive to maintain. Using other pre-created libraries like Airflow or Dagster would be better.
@@SeattleDataGuy how often do you see it being used? In real data engineering work today?
@@priteshugrankar6815 If you're referring to Perl, never. Airflow and Dagster, plenty.
@@SeattleDataGuy Thank you 🙏🏻 yes I was referring to Perl.
@@priteshugrankar6815 Yeah I haven't seen much Perl. Sorry!
What books do you recommend for getting into the weeds with the second layer of the pyramid?
I need to make that video! But I always say start with Kimball's book on data warehousing. It's classic.
@@SeattleDataGuy thank you! I'll look that up
I am a React dev with 2 years of experience. I am learning Python and SQL in order to switch to data engineering. Should I learn ETL next after Python and SQL? Thanks
Thank you for this video and all your other videos. They've been really helpful. I am a software engineer with a pretty solid foundation in programming (Java) and SQL. Level 2 of your pyramid is where I'd like to get more experience, and luckily for me I have a real project at my company that would be perfect. What we have is a Postgres database with some pretty large OLTP tables of around 50M - 90M rows. The goal is to build some data viz dashboards using this data, but other than that it's a totally blank canvas. Are there any specific tools you'd recommend looking into for this project? Thanks
Have you done any data warehouse development for this project yet?
Really good video! 👏👏👏
Thank you! High praise coming from another tech youtuber!!! If you're in these comments, then perhaps check out Dev John's channel.
Thank you.
Thank you!!
Hey, great information. Quick question. What’s a good starting point for a data engineer out of school with a stats and analytics degree. By the time I graduate I will have experience with Python, Java, R, and SQL
My guess is it would be great to start learning some data warehousing and data pipeline concepts. This is usually where the gap is in terms of knowledge. This is because these are applied skills on top of your coding and stat skills.
@@SeattleDataGuy I’m sorry I wasn’t clear in exactly what I was asking. I was asking what jobs would you apply for to get the relevant experience. Thanks again!
@@liamhoward2208 I see! Well I wouldn't limit yourself. If you can find jr. data engineer, or data engineering internships, then I would try that.
Or if you can find BI jobs those are also great.
Finally, look for analyst jobs that have ETL work.
At the end of the day it's about doing the right work. So ETLs, data warehousing, data viz, data pipelines and so on.
Also, the more you can be on an engineering team where you build best practices, the better.
Is that helpful?
@@SeattleDataGuy Yes it is. Thank you. I just want to land a job that gives me good SQL and Python experience. In general, from what I understand, those are the two skills that are in most demand right now.
@@liamhoward2208 Yeah, there are a lot of places you can pick up those skills: analyst, BI developer, data engineer, etc. All of these will have different mixes of SQL and Python. So I would look for jobs by skills instead of by title.
Thanks
you're welcome!
Very informative and understandable for a new 🐝
Glad you found it informative. Good luck on your DE journey.
I'm a junior frontend developer. Should I try becoming a data analyst first before data engineering?
Hmm, do you have MySQL or other traditional DB experience?
Very informative video. Are the job responsibilities of data engineers similar to those of a business intelligence analyst?
Glad you found it helpful. It's generally a little more focused on building data sets and less on BI
thank you
you're welcome!
Well done sir!
Thank you! Good luck with your data engineering journey.
What do you think of the Udacity data engineering Nano degree? I’m about to take the course before doing an AWS or Azure Data engineering certification. Is this pathway comprehensive enough to get an entry level role in engineering?
I haven't taken their Nanodegree yet. I usually point towards the GCP data engineering course since it's kind of free. I've been meaning to review the certificates at some point.
@@SeattleDataGuy There's a free DE course from GCP? Could you please provide me the link to the "GCP data engineering free course"?
Nice!
Thank you!
Ty bro
Glad you liked the video!
How much time does it take to get an entry-level job in data engineering?
Please reply, I need help. Just suggest a good platform or any websites to learn the necessary skills.
Hey! Thanks for the comment. I do have a video where I discuss both free and paid courses on these skills. Let me know if this video helps! Happy to answer more questions th-cam.com/video/lVj0RlSxTXk/w-d-xo.html
@@SeattleDataGuy That really helped me a lot. Just one more question: as there are many libraries in Python, can you suggest some important ones for this?
@@shrutimishra9809 You can't go wrong using Airflow for data pipelines, pandas for analytics, and Flask for building APIs.
@@SeattleDataGuy So what do I need? Sorry for asking too many questions, I'm learning all on my own. I
@@shrutimishra9809 Library-wise, Airflow, Flask, and pandas are a great place to start. From there I would say try to build some form of project, like a basic scraper that uses Airflow to pull data and then Flask to put together some form of website.
Here are some articles on the topics. Does this help? Or would you like actual courses?
Flask
betterprogramming.pub/building-your-first-website-with-flask-part-1-903a8b44e806?sk=8636b4aa9dafd81464d0bf55cb2b7863
Airflow
towardsdatascience.com/getting-started-with-apache-airflow-df1aa77d7b1b
It's weird how Americans say "school" or "college" when they mean "university".
Here in the UK, school is for kids.
College is optional for people in their late teens.
University is for degrees and research.
Language is weird that way. It does always sound proper for someone to say university. But also, being that tomorrow is July 4th...🎆 'MERICA
Do I need to be good at math?
Or is arithmetic just fine?
Do I need to learn Scala too?
You can! But it depends on the company.
How the faq did you hide the publication date of this video?
Yeah, I don't think I did? I see the date. It's April 4th, 2021.
@@SeattleDataGuy Crazy, because for me it shows nothing but blank space to the right of the views, which I have never seen before.
Do engineers work? according to 95% on YT of "Day in a life of a Software, Data, ML engineer" you guys spend the day eating and playing Ping Pong. Hahaha - Thank you for the videos, Subscribed!
Cute haircut, man. Looks like straight from the 70's.
So at my previous job we used SQL Server, and when I’d open up SSMS on that left side panel I’d see Multiple Databases; I’d only query from one of them for my work. So is each individual database a “Data-warehouse”? Or are all of the databases considered a “Data-Warehouse”?
The reason I ask is because at my new job we query the “Data-Warehouse” but it’s just one database with hundreds of tables.
For those of us who wanna get more foundational knowledge on this stuff, do you know of any comprehensive book(s) that cover data warehousing, data lakes, ETL, etc.? Or websites?
Ideally, a data warehouse should exist on a single database. It makes it easier to conform dimensions across disparate facts. That said, the real answer is, "it depends".
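To illustrate what "conforming dimensions across disparate facts" can look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`dim_customer`, `fact_sales`, `fact_tickets`) and the data are hypothetical. Because both fact tables live in the same database and share one customer dimension, sales and support tickets can be rolled up by the same `region` attribute in a single query.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory warehouse with one shared dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One conformed dimension...
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
        -- ...referenced by two otherwise unrelated fact tables.
        CREATE TABLE fact_sales   (customer_id INTEGER, amount REAL);
        CREATE TABLE fact_tickets (customer_id INTEGER, ticket_count INTEGER);

        INSERT INTO dim_customer VALUES (1, 'West'), (2, 'East');
        INSERT INTO fact_sales   VALUES (1, 100.0), (1, 50.0), (2, 30.0);
        INSERT INTO fact_tickets VALUES (1, 2), (2, 5), (2, 1);
        """
    )
    return conn


def totals_by_region(conn: sqlite3.Connection):
    """Aggregate each fact separately, then join both to the dimension."""
    query = """
        WITH s AS (
            SELECT customer_id, SUM(amount) AS sales
            FROM fact_sales GROUP BY customer_id
        ),
        t AS (
            SELECT customer_id, SUM(ticket_count) AS tickets
            FROM fact_tickets GROUP BY customer_id
        )
        SELECT d.region, SUM(s.sales), SUM(t.tickets)
        FROM dim_customer AS d
        LEFT JOIN s ON s.customer_id = d.customer_id
        LEFT JOIN t ON t.customer_id = d.customer_id
        GROUP BY d.region
        ORDER BY d.region
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in totals_by_region(build_demo_warehouse()):
        print(row)
```

Each fact is aggregated before the join so that rows from one fact table do not multiply rows from the other. If the facts lived in separate databases with their own customer tables, the same report would first need those customer definitions reconciled.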
Parking here since I'm starting a career path in Data warehousing
Ralph Kimball's The Data Warehouse Toolkit and his website, which has snapshots from the book covering the core fundamentals, are all you need
👍🏼
thanks!
I can't even spell the upper layer names.
What do you mean?