Thanks a lot. We are building our data warehouse system and have encountered this issue. We are planning to go ahead with the "insert a dummy value" option for the moment. The way you explained all the methods was really helpful. Thanks a lot.
I am glad I could help. Parking the record works well :)
One of the most beautifully explained videos on this! Thanks, man.
Thanks Ranjan :)
Bravo! Outstanding video with great examples of COAs. I’ve used the parking method in my jobs.
Thanks Mike for the kind words :)
You are a savior! Thanks!! I will go through all your videos..
Very Well explained, thanks.
Thanks a lot, buddy. Please consider subscribing to my other channel as well; it would really help me :)
th-cam.com/video/gTg6nCUuYO8/w-d-xo.html
Nice video. Thank you.
I have used the 2nd option. We used to keep data in the staging source table for 40 days and process it every day along with the incremental data. If the employee is assigned to a team (say, for example, 10 days after the first load), we load the data along with the team ID into the stage fact table. If we don't get the employee after 40 days, we ignore the record. I believe this is the clean way of processing, because this way we get the correct dimension key and fact data.
If you are not getting data for 40 days, that's a huge lag.
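A minimal sketch of the park-and-retry flow described in the comment above, assuming parked fact rows are plain dictionaries and the employee-to-team lookup is a dictionary. The names `process_parked`, `team_by_employee`, and the row layout are hypothetical; only the 40-day window comes from the comment.

```python
from datetime import date, timedelta

MAX_PARK_DAYS = 40  # retention window mentioned in the comment above


def process_parked(parked, team_by_employee, today):
    """Retry parked fact rows against the (hypothetical) employee-to-team lookup.

    Returns three lists: rows that can now be loaded, rows that stay parked,
    and rows that are ignored because they exceeded the retention window.
    """
    loaded, still_parked, dropped = [], [], []
    for row in parked:
        team_id = team_by_employee.get(row["employee_id"])
        if team_id is not None:
            # The dimension value has arrived: load the fact with its team ID.
            loaded.append({**row, "team_id": team_id})
        elif (today - row["first_seen"]).days > MAX_PARK_DAYS:
            # Still unresolved after the window: ignore the record.
            dropped.append(row)
        else:
            # Keep the row parked and retry on the next daily run.
            still_parked.append(row)
    return loaded, still_parked, dropped


today = date(2024, 3, 1)
parked = [
    {"employee_id": "E1", "amount": 100, "first_seen": today - timedelta(days=10)},
    {"employee_id": "E2", "amount": 50, "first_seen": today - timedelta(days=5)},
    {"employee_id": "E3", "amount": 75, "first_seen": today - timedelta(days=41)},
]
team_by_employee = {"E1": "T7"}  # E1 was assigned to a team after the first load

loaded, still_parked, dropped = process_parked(parked, team_by_employee, today)
```

Running this once per day alongside the incremental load mirrors the schedule in the comment. Whether 40 days is an acceptable lag depends on the source system, as the reply points out.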
This is the only video which explains this concept so easily. Great job sir!!
Thanks a lot, Sandesh, for the kind words. I will upload a lot of new videos soon, so please keep watching :)
@TechCoach We will await your videos, sir.
The new video on index-organized tables is live now. Check it out here.
Happy learning :)
th-cam.com/video/mIo44Ydnd3o/w-d-xo.html
nicely explained !!!
Thanks Buddy :)
Good work, keep it up, and thanks for sharing your knowledge. Looking forward to learning more from you.
Sure, AK1007, I will be creating new videos soon :)
Very Well explained. Thank you!!
Thanks a lot buddy for the kind words :)
Awesome thank you
We do both park-and-retry and dummy entries. First comes the dummy entry. Whenever a dummy entry is created it has to be repaired, so we park the record and send a notification to get it fixed. Once it is repaired, we use the new value in the parked record and process it.
Nice Mayank, that's a good approach.
Thanks for the detailed explanation👍🏻
Thanks, Priyanka, for the kind words.
I have a small request: I am working on this new channel and would really appreciate it if you watched and subscribed to it.
th-cam.com/video/GnVn3mPBRz4/w-d-xo.html
Nicely explained. I have seen approach 3 being used more often. With it, data integrity, accuracy and quality are maintained. It also helps improve the quality of the OLTP systems vis-à-vis the business process at that layer.
Thanks, Saugata, for the kind words. My apologies for the delayed reply.
Nice video!! Thanks
I agree conceptually, but we know that most data warehouses also use surrogate keys. So with option 4, you have to not only insert into the dimension but also get back the generated surrogate key and apply it to the fact. Your examples do not distinguish between natural and surrogate keys, which is quite crucial from an ETL perspective. I think this step complicates things, especially if you are doing it across all dimensions. Can you tell me how you would handle this?
Also, with option 3, again assuming we have surrogate keys, you would need to heal the key in the fact. One way would be to hold both the natural and the surrogate key in the fact and run an update process on the fact at periodic intervals, fixing only the rows where the key is the dummy.
Can you share some thoughts on these?
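One way to picture the extra step raised for option 4 is the sketch below: look up the surrogate key by natural key, insert a placeholder dimension row if none exists, and reuse the generated key on the fact. It uses Python's built-in `sqlite3` with an in-memory database. The tables `dim_team` and `fact_sales`, the `is_inferred` flag, and the helper names are assumptions for illustration and do not come from the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_team (
        team_key    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        team_id     TEXT UNIQUE NOT NULL,               -- natural key
        team_name   TEXT,
        is_inferred INTEGER NOT NULL DEFAULT 0          -- 1 = placeholder row
    )""")
conn.execute("""
    CREATE TABLE fact_sales (
        team_key INTEGER NOT NULL REFERENCES dim_team(team_key),
        amount   REAL NOT NULL
    )""")


def get_or_create_team_key(conn, team_id):
    """Return the surrogate key for team_id, inserting a placeholder if needed."""
    row = conn.execute(
        "SELECT team_key FROM dim_team WHERE team_id = ?", (team_id,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO dim_team (team_id, is_inferred) VALUES (?, 1)", (team_id,)
    )
    return cur.lastrowid  # rowid of the new row, which is the surrogate key here


def load_fact(conn, team_id, amount):
    """Resolve the surrogate key first, then insert the fact row with it."""
    team_key = get_or_create_team_key(conn, team_id)
    conn.execute(
        "INSERT INTO fact_sales (team_key, amount) VALUES (?, ?)", (team_key, amount)
    )
    return team_key


k1 = load_fact(conn, "T7", 100.0)  # T7 is unknown: a placeholder row is created
k2 = load_fact(conn, "T7", 50.0)   # the second fact reuses the same surrogate key
```

When the real dimension record arrives later, the placeholder row can be updated in place (name filled in, `is_inferred` reset), so the facts never need to be touched. Repeating this for every dimension does add lookups to the load, which is the complication the comment raises.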
Good comment...I know you posted this over a year ago but I'd just like to chime in with my thoughts on your suggestion regarding option 3. I agree that holding the natural key in the fact will allow for easy updates. You are suggesting to only update the fact if a surrogate key becomes available for rows which had the unknown dimension member assigned (-999). I'd be inclined to update the fact if any of the underlying natural keys have changed in the interim for whatever reason. Doing it this way would cover the above scenario and also all other corner cases where things change. I'm just thinking out loud here so feel free to disagree :)
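The two healing strategies discussed in this exchange can be compared in a small sketch, assuming the fact table carries both the natural key and the surrogate key. The table layout and the function names `heal_dummy_only` and `heal_any_mismatch` are hypothetical; the -999 unknown member is taken from the reply. The second function is one reading of "update if anything changed": re-resolve every fact row whose stored surrogate key no longer matches the dimension.

```python
import sqlite3

UNKNOWN_KEY = -999  # dummy "unknown member" key, as in the reply above

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_team (team_key INTEGER PRIMARY KEY, team_id TEXT UNIQUE);
    CREATE TABLE fact_sales (team_id TEXT, team_key INTEGER, amount REAL);
    INSERT INTO dim_team VALUES (-999, NULL), (1, 'T1');
    INSERT INTO fact_sales VALUES ('T1', 1, 10.0), ('T2', -999, 20.0);
""")

# Correlated subquery: current surrogate key for the fact row's natural key.
RESOLVE = "(SELECT d.team_key FROM dim_team d WHERE d.team_id = fact_sales.team_id)"


def heal_dummy_only(conn):
    """Fix only rows still pointing at the dummy key (first suggestion)."""
    cur = conn.execute(
        f"UPDATE fact_sales SET team_key = {RESOLVE} "
        f"WHERE team_key = ? AND {RESOLVE} IS NOT NULL",
        (UNKNOWN_KEY,),
    )
    return cur.rowcount


def heal_any_mismatch(conn):
    """Fix every row whose stored key differs from the current lookup (reply)."""
    cur = conn.execute(
        f"UPDATE fact_sales SET team_key = {RESOLVE} "
        f"WHERE {RESOLVE} IS NOT NULL AND team_key <> {RESOLVE}"
    )
    return cur.rowcount


# The late dimension row for T2 finally arrives, then the periodic job runs.
conn.execute("INSERT INTO dim_team VALUES (2, 'T2')")
fixed = heal_dummy_only(conn)
```

`heal_any_mismatch` also repairs the dummy rows, so it covers the first case plus any other drift, at the cost of comparing every fact row on each run.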
Super sir good explanation
Thanks, Mahendar, for the kind words. I have a small request:
I am working on this new YouTube channel, and I would love it if you watched and subscribed to it as well.
th-cam.com/video/GnVn3mPBRz4/w-d-xo.html
Can you please make a video on fast-changing / rapidly changing dimensions?
Sure, Kanishk, I have noted it in my backlog. I will work on it soon.
@TechCoach Thank you so much 😊😊
Hi Kanishk, I have created the video and scheduled it for the day after tomorrow at 11 AM IST :)
@TechCoach Thank you so much ❤️
blessings !
thank you
Thanks Juan :)
Hi there,
I have a small doubt. You said it will give an error when we try to insert a null value into a foreign key column. But we can insert a null value into a foreign key column.
I am not from a DWH background, so I am not sure about this.
Could you please clear up my doubt?
NULL is acceptable in an FK column, but if a non-NULL FK value has no matching PK value in the parent table, the insert throws an error.
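The behaviour in this reply can be checked with a short sketch using Python's built-in `sqlite3`. The tables `dim_team` and `fact_sales` and the helper `try_insert` are made up for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other databases enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs without this
conn.execute("CREATE TABLE dim_team (team_key INTEGER PRIMARY KEY, team_name TEXT)")
conn.execute("""
    CREATE TABLE fact_sales (
        team_key INTEGER REFERENCES dim_team(team_key),  -- nullable foreign key
        amount   REAL
    )""")
conn.execute("INSERT INTO dim_team VALUES (1, 'Alpha')")


def try_insert(team_key):
    """Insert one fact row and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (team_key, 10.0))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# 1 matches a dimension row, None is stored as NULL, 42 has no matching PK.
results = {key: try_insert(key) for key in (1, None, 42)}
```

Here the matching key and the NULL are accepted, while 42 is rejected. A NULL is rejected only if the fact column is also declared `NOT NULL`, which many warehouse schemas do for dimension keys.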