You won't believe it, I was learning this same concept from your Python course just this morning.
Great stuff. A must-learn for every data enthusiast.
Great way of explaining SCD types
You are too good. Very, very nice explanation.
Thank you for the appreciation
Great, Ankit, thanks. I am completely new to this concept and it's very useful.
Great video! Need more data modelling and data engineering videos, man!
@ankit bansal: Great job explaining the concept. Quick question: instead of setting the end date to "forever", would it make sense to keep it as NULL and add another column, such as is_current_value, as a boolean flag? When someone wants to track history in a report, an analyst can filter on start_date with end_date IS NOT NULL and is_current_value = 'n' to look at previous records, or on start_date with end_date IS NULL and is_current_value = 'y' to get the current record. You could even combine the conditions with an OR operator under the structure I'm proposing. IMHO, using "forever" as the end_date is frowned upon in the data warehousing world.
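A minimal sketch of the NULL end date plus flag layout proposed above, run through Python's built-in sqlite3 so the SQL can be tried as-is. The table name product_dim and the sample rows are hypothetical; is_current_value is stored as 'y'/'n' text to match the comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_id       INTEGER,
    price            INTEGER,
    start_date       TEXT,
    end_date         TEXT,   -- NULL while the row is the current version
    is_current_value TEXT    -- 'y' for the current version, 'n' for history
);
INSERT INTO product_dim VALUES
    (1, 40000, '2024-01-01', '2024-01-19', 'n'),
    (1, 30000, '2024-01-20', NULL,         'y');
""")

# Current version: open-ended row with the flag set.
current_rows = conn.execute(
    "SELECT product_id, price FROM product_dim "
    "WHERE end_date IS NULL AND is_current_value = 'y'"
).fetchall()

# Historical versions: closed rows with the flag cleared.
history_rows = conn.execute(
    "SELECT product_id, price FROM product_dim "
    "WHERE end_date IS NOT NULL AND is_current_value = 'n'"
).fetchall()

# Point-in-time lookup: a NULL end_date needs explicit handling,
# because a plain BETWEEN would silently drop the open-ended row.
as_of = "2024-01-25"
as_of_rows = conn.execute(
    "SELECT price FROM product_dim "
    "WHERE product_id = 1 AND start_date <= ? "
    "AND (end_date IS NULL OR end_date >= ?)",
    (as_of, as_of),
).fetchall()
```

The trade-off shows up in the last query: the NULL layout avoids a magic "forever" date, but every date-range filter has to carry the extra IS NULL branch.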
If only there was a way to love your videos and not just like them. Learning a lot, Ankit. Thanks!
Cheers 🥂
Very useful. Thank you!
Superb explanation 👌 👏 👍
Yay, I learned this just yesterday. Thanks!
Great explanation!
Thanks for the video Ankit
Awesome, bro.
Very good information, and thanks for the content. How do you create the staging tables in the first place?
Best video. Thanks! If possible, please make videos on SQL performance tuning or launch a course.
Hi Ankit, great explanation.
How do we handle the SCD Type 2 scenario where an insert, an update, and a delete for the same record all arrive together in staging?
Assume we are using CDC to track changes and using the CDC info to update the dim tables.
Then you need to create one more temp table when the script runs, filtering on timestamp in stg_table > max(timestamp) in dim_table, so that only the changed records land in the temp_stg table.
Now temp_stg holds only the latest records.
dim_table still has the old records at this point (we have not performed any transformations yet).
Now follow Ankit's procedure to keep the history.
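The filtering step described above can be sketched as follows, using Python's sqlite3 to keep it runnable. The column name last_update and the sample rows are assumptions for illustration; the idea is only that staging rows newer than the dimension's latest timestamp go into temp_stg.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_table (product_id INTEGER, price INTEGER, last_update TEXT);
CREATE TABLE stg_table (product_id INTEGER, price INTEGER, last_update TEXT);

INSERT INTO dim_table VALUES
    (1, 40000, '2024-01-10'),
    (2, 15000, '2024-01-10');

INSERT INTO stg_table VALUES
    (1, 40000, '2024-01-10'),   -- already loaded earlier
    (1, 30000, '2024-01-20'),   -- changed since the last load
    (3,  9000, '2024-01-20');   -- new since the last load

-- Keep only staging rows newer than the latest timestamp in the dimension.
-- COALESCE covers the very first load, when dim_table is still empty.
CREATE TEMP TABLE temp_stg AS
SELECT *
FROM stg_table
WHERE last_update > (SELECT COALESCE(MAX(last_update), '') FROM dim_table);
""")

changed = conn.execute(
    "SELECT product_id, price FROM temp_stg ORDER BY product_id"
).fetchall()
```

ISO-formatted date strings compare correctly as text, which is why the plain > comparison works here. If several changes for one key arrive in the same batch, they would still need to be ordered or collapsed before applying them.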
@ankit Bhaiya, instead of doing the manual work with queries, we could also create an insert/update trigger, which would be good automation.
What do you say, brother? ☺
That would add too much load, because the trigger fires for each row.
I needed this video 6 months ago... But a friend and I worked it out together at the office back then 😀😺 using SQL.
Thank you Ankit Bro
Sir, can we implement SCD1 via a MERGE statement? What I mean to ask is: is a MERGE statement essentially just SCD1?
Thank you for creating such quality content.
I have a question:
is it possible to implement this SCD2 using MERGE (where an update and an insert are both involved to maintain history, as in the example described in the video)?
Thanks in advance.
It can be done, but the MERGE operation can have performance issues.
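For comparison, here is a rough sketch of the two-statement alternative to MERGE (one UPDATE to close the old version, one INSERT to open the new one), run through Python's sqlite3. The table names, the '9999-12-31' stand-in for "forever", and the dates are assumptions for illustration, not the exact script from the video.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_id INTEGER, price INTEGER,
                          start_date TEXT, end_date TEXT);
CREATE TABLE product_stg (product_id INTEGER, price INTEGER);

INSERT INTO product_dim VALUES (1, 40000, '2024-01-01', '9999-12-31');
INSERT INTO product_stg VALUES (1, 30000), (2, 15000);
""")

load_date = date(2024, 1, 20)
prev_day = (load_date - timedelta(days=1)).isoformat()

with conn:  # run both statements in one transaction
    # Step 1: close the open row of every key whose staged price differs.
    conn.execute("""
        UPDATE product_dim
        SET end_date = ?
        WHERE end_date = '9999-12-31'
          AND EXISTS (SELECT 1 FROM product_stg s
                      WHERE s.product_id = product_dim.product_id
                        AND s.price <> product_dim.price)
    """, (prev_day,))

    # Step 2: open a new row for every staged key that has no open row,
    # which covers both brand-new keys and the keys closed in step 1.
    conn.execute("""
        INSERT INTO product_dim (product_id, price, start_date, end_date)
        SELECT s.product_id, s.price, ?, '9999-12-31'
        FROM product_stg s
        WHERE NOT EXISTS (SELECT 1 FROM product_dim d
                          WHERE d.product_id = s.product_id
                            AND d.end_date = '9999-12-31')
    """, (load_date.isoformat(),))

rows = conn.execute(
    "SELECT * FROM product_dim ORDER BY product_id, start_date"
).fetchall()
```

The order matters: the INSERT relies on the UPDATE having already closed the changed rows. Unchanged keys keep their open row and are skipped by both statements.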
Thank you so much Ankit ❤😊
My pleasure 😊
My question is: if we connect the data in Power BI Desktop, do we need to do this SCD2 manually, or will it be updated automatically?
Hi Ankit sir, will you start a data engineering course?
In SCD1, once the first insert was completed we emptied the stg table. How can we apply changes to update the dim without emptying the stg table after the first insert?
The 1st table is an upsert, not a truncate-and-load, right?
Million Thanks
Can't we use merge to perform the SCD2 implementation?
Performance is not good with merge.
Thanks @ankit
Can't we use a MERGE statement instead of two separate INSERT and UPDATE statements?
Performance is not good with MERGE.
Sir, which video should I start with in this course? I am starting my career, please help me.
th-cam.com/video/ejdIgYPfcV4/w-d-xo.html
Can't we implement it using a MERGE statement?
What if the same record comes into the staging table again? How do we handle it? @ankit
That is the case of duplicate records. We can check whether the key and value are the same and, if so, ignore them.
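One way to sketch that key-and-value check, again through Python's sqlite3: keep only staged rows that are new or whose value differs from the dimension, and collapse repeats within staging. The table names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_id INTEGER, price INTEGER);
CREATE TABLE product_stg (product_id INTEGER, price INTEGER);

INSERT INTO product_dim VALUES (1, 40000), (2, 15000);

INSERT INTO product_stg VALUES
    (1, 40000),   -- same key and value as the dimension: ignore
    (2, 12000),   -- price changed: keep
    (2, 12000),   -- repeated inside staging: keep only once
    (3,  9000);   -- new key: keep
""")

# DISTINCT collapses repeats inside staging; the LEFT JOIN filter drops
# rows whose key and value already match the dimension.
to_process = conn.execute("""
    SELECT DISTINCT s.product_id, s.price
    FROM product_stg s
    LEFT JOIN product_dim d ON d.product_id = s.product_id
    WHERE d.product_id IS NULL OR d.price <> s.price
    ORDER BY s.product_id
""").fetchall()
```

With more tracked columns, the comparison would need one inequality per column (or a hash of the tracked columns), and NULL values would need explicit handling.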
ELT and ETL approaches are different in operations
You have written ELT as extract, transform, and load. It's extract, load, and transform.
❤❤❤
Not sure why you have not implemented this using a MERGE statement.
For MySQL, the query changes slightly:
set @updated_date='2024-01-20';
UPDATE product_type1_dim a, product_stg b
SET a.price = b.price, a.last_update = @updated_date
WHERE a.product_id = b.product_id ;
How can we keep only the last three changes for an id? Suppose id 1 iPhone12 40000
changes to id 1 iPhone12 30000,
changes to
id 1 iPhone12 25000,
changes to id 1 iPhone12 20000.
In the final table I want only the last 3 changes, meaning I don't want the first version, when the price was 40000. That first record should be ignored. Please explain.
Output data:
20k
25k
30k
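A possible way to get that output, sketched with Python's sqlite3: for each row, count how many newer versions exist for the same id and keep the row only if fewer than three do. The table layout and start dates are assumptions; the prices are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_id INTEGER, product_name TEXT,
                          price INTEGER, start_date TEXT);
INSERT INTO product_dim VALUES
    (1, 'iPhone12', 40000, '2024-01-01'),
    (1, 'iPhone12', 30000, '2024-02-01'),
    (1, 'iPhone12', 25000, '2024-03-01'),
    (1, 'iPhone12', 20000, '2024-04-01');
""")

# A row survives if fewer than 3 rows for the same id are newer than it,
# so only the 3 most recent versions per id remain.
last_three = conn.execute("""
    SELECT d.price
    FROM product_dim d
    WHERE (SELECT COUNT(*)
           FROM product_dim newer
           WHERE newer.product_id = d.product_id
             AND newer.start_date > d.start_date) < 3
    ORDER BY d.start_date DESC
""").fetchall()
```

On databases with window functions, ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY start_date DESC) <= 3 expresses the same filter. To drop the older versions from the table itself, the same condition can be inverted inside a DELETE.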
Bro, please slow the pace down. You speak too fast.
OK, next time. You can also reduce the playback speed from the settings.
Thank you Ankit