Hello,
I am preparing for the DP-203 and your channel is simply magical.
You explain complex concepts very simply. I really like your method with the whiteboard and the hand drawings.
Thank you very much for this quality content on your channel.
I know there's a lot of preparation work behind the final video.😊🙏
I liked that class so much!!! I'm happy to review and learn some concepts and understand real-time scenarios.
So important. Thank you!
My pleasure, I'm glad that those videos were useful.
Great content on how to structure and organize our data in the raw layer!!
Thank you, now I understand the landing zone, raw zone, etc. I used them before, but I just got how and why they are there!
Glad it was useful!
Hi Tybul! Question: when you said that the bronze layer is immutable, does that rule out the tables having some kind of incremental update? For example, I had to extract from an on-premises database via CDC, scheduling it with the Azure CDC resource (in preview). Would that be wrong? In addition to that process, I only update the update date of the bronze tables in that layer.
Hi, the approach depends on how you manage incremental updates. If you ingest only the changed data (the increments) without any immediate processing, it should go into the raw, immutable layer. From there, you would process the data to apply the changes - such as insert, update, or delete - and store the result in another layer. This resulting layer would essentially be a 1:1 copy of your source data, stored in Delta Lake format, assuming you are implementing SCD Type 1 and do not need to maintain historical records.
Alternatively, if your ingestion process directly updates the target data, it would be better to store the data directly in the next layer since it is being continuously updated.
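To make that concrete, here is a minimal sketch in PySpark with Delta Lake. The storage paths, the order_id key and the "operation" column on the CDC feed are all hypothetical, so treat it as an illustration of the SCD Type 1 pattern rather than a ready implementation:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read one day's CDC increment exactly as it was ingested (immutable raw layer).
changes = spark.read.parquet(
    "abfss://raw@mylake.dfs.core.windows.net/sales_orders/ingestion_date=2024-05-01/")

# Apply the inserts, updates and deletes to the next layer (SCD Type 1, no history kept).
target = DeltaTable.forPath(
    spark, "abfss://silver@mylake.dfs.core.windows.net/sales_orders")
(target.alias("t")
    .merge(changes.alias("c"), "t.order_id = c.order_id")
    .whenMatchedDelete(condition="c.operation = 'delete'")
    .whenMatchedUpdateAll(condition="c.operation = 'update'")
    .whenNotMatchedInsertAll(condition="c.operation = 'insert'")
    .execute())

This way the raw files are never touched again, yet the next layer converges to a 1:1 copy of the source.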
Hi Piotr,
It would be nice to see a video about how to verify the quality of the data in the different layers before it reaches the end user.
Great video as always, please keep it up!
As a "Data Engineer" member of my channel, you’ll have the special privilege of suggesting topics for new videos and voting on them. If you have a topic in mind, I’d love for you to join as a member. I’ll be setting up the first poll once I complete the DP-203 course.
Hi Tybul,
The content that you are delivering is awesome!!! Can you also please make a video on data partitioning, its types, and implementation?
What do you have in mind?
You are explaining the complex concepts in a nice way, so I thought it would be great to hear the partitioning concept from you, because I found it somewhat confusing when I started learning it by myself.
15:58 You mention that we can process all data from scratch. Is it also possible to easily process data from a certain point? For example, all data from the last 2 weeks.
It is possible to process only a subset of data - I'm mentioning this in the "Dynamic ADF" episode.
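As an illustration, assuming the raw layer is partitioned by an ingestion_date folder (the path and column name here are hypothetical), a minimal PySpark sketch would be:

from datetime import date, timedelta
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
cutoff = (date.today() - timedelta(days=14)).isoformat()

# Thanks to partition pruning, only the folders from the last 2 weeks are actually read.
recent = (spark.read
    .parquet("abfss://raw@mylake.dfs.core.windows.net/sales_orders/")
    .where(f"ingestion_date >= '{cutoff}'"))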
Hi Tybul. Nice explanation. I have a query regarding PII: can we anonymise the PII in the raw data itself, or do we anonymise the PII during transformations?
It depends on your requirements and what your legal team says, e.g. you might not be able to store PII data in the raw layer at all. Then what? I can see three basic options:
1. Don't ingest PII data at all (if possible).
2. Get rid of PII data on the fly, before writing it to the raw layer (see the sketch after this list).
3. Add an additional zone (raw-PII) with tight security measures, dump your raw data there, then read from it, get rid of the PII data and save the outcome in the regular raw layer. Optionally, set automatic removal of files from the raw-PII layer after a few days or so.
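For option 2, a minimal PySpark sketch - the column names and paths are hypothetical, just to show the idea of cleaning in-flight:

from pyspark.sql import SparkSession
from pyspark.sql.functions import sha2, col

spark = SparkSession.builder.getOrCreate()
df = spark.read.json("abfss://landing@mylake.dfs.core.windows.net/customers/")

cleaned = (df
    .withColumn("email", sha2(col("email"), 256))  # pseudonymise but keep joinability
    .drop("phone_number", "home_address"))         # or drop PII columns outright

# Only the cleaned data ever lands in the raw layer.
cleaned.write.mode("append").parquet(
    "abfss://raw@mylake.dfs.core.windows.net/customers/")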
@TybulOnAzure thanks for the detailed explanation.
Hi Tybul. I am training to become a data engineer on Azure and I was planning on joining the club at the "Junior section" tier. However, I could not find what I was looking for.
For a fee, would you be able to conduct interviews for real job scenarios? Would that be something you would consider making part of your service package?
Your tutorials are great and they give me confidence, great work!
Due to YouTube's membership policy, I can't offer 1:1 meetings. However, I'm thinking about introducing a new membership tier that would include a monthly group call. In these sessions, we could cover different topics, brainstorm ideas, do live training or interviews, consult, or just have a casual chat. Please note, though, it would be a group setting.
@TybulOnAzure thanks for replying. That group setting would be a good start.
Hello Sir,
The question I will ask may not be relevant to the topic of the video. Is there a specific reason to partition our Sales Orders dataset by the ingestion date?
Yes - just to know when a given set of data was ingested from the source.
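For illustration, a minimal PySpark sketch of such partitioning (the paths are hypothetical): stamping each load with its ingestion date gives you one folder per run, which also makes it easy to reprocess only recent data later.

from datetime import date
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()
orders = spark.read.json("abfss://landing@mylake.dfs.core.windows.net/sales_orders/")

# Each run lands in its own folder, e.g. .../ingestion_date=2024-05-01/
(orders.withColumn("ingestion_date", lit(date.today().isoformat()))
    .write.mode("append")
    .partitionBy("ingestion_date")
    .parquet("abfss://raw@mylake.dfs.core.windows.net/sales_orders/"))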
Is the raw layer also called "staging"? I think that term is used for the silver layer.
You can call it whatever you want, e.g. staging, raw or bronze. The important thing is to make everyone aware what it means and what kind of data it stores.
I talked more about data lake zones in 30th episode.
Hello Tybul, this course is very good. It was what I wanted to complement my data architecture master's. I'm really not clear on how to load the same database every day without repeating the same data over and over again, with increasing daily cost. Can you give a real example of how to face and solve this problem?
Sure. Basically you would write your data extraction SQL queries in an incremental way.
Take a look here (learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview) for more details.
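In short, the tutorial uses a watermark: remember the highest change timestamp you have copied so far and extract only the rows beyond it. A minimal Python sketch of that pattern (table and column names are hypothetical; in ADF the same steps map to Lookup and Copy activities):

import pyodbc

# Connection details elided - fill in your own server and credentials.
conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=...")
cur = conn.cursor()

# 1. Read the watermark persisted after the previous run.
old_wm = cur.execute(
    "SELECT WatermarkValue FROM dbo.watermarktable WHERE TableName = 'sales_orders'"
).fetchone()[0]

# 2. Extract only the rows changed since then - the daily increment.
rows = cur.execute(
    "SELECT * FROM dbo.sales_orders WHERE LastModifiedTime > ?", old_wm
).fetchall()

# 3. Advance the watermark so the next run starts where this one ended.
cur.execute(
    "UPDATE dbo.watermarktable SET WatermarkValue = "
    "(SELECT MAX(LastModifiedTime) FROM dbo.sales_orders) "
    "WHERE TableName = 'sales_orders'")
conn.commit()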
@TybulOnAzure Thanks Tybul
Wonderful
Can you please explain the medallion architecture?
It is mentioned in future episodes.
I agree with Christian.
🤙 Thanks
Wonderful
Thanks