Thank you for this detailed, simple explanation. You are a godsend.
Glad that it was useful for you
Please Subscribe and share. Happy Learning 👍
Thanks a million for the detailed explanation. Please keep doing the great stuff.
Thank you, keep supporting. Happy learning
Very precise explanation, thank you sir😊
Thank you Prashant
Please Subscribe and share
Happy Learning 😊
One of the best SCD videos 🙂
Thank you, happy learning. Please Subscribe and share
Very useful... please bring more and more such content.
Thank you for the support
We use full load in the staging layer, right?
Generally yes
But is that what we call truncate and load?
@junedkhan-ge5mi when we do a full load it is usually truncate and load, unless we want to keep the existing data for some specific case
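Just as a rough illustration (the table names below are made up, not from any real project), a full load of a staging table usually comes down to two steps:

# Hypothetical full load of a staging table: wipe it, then reload everything.
full_load_steps = [
    "TRUNCATE TABLE stg_orders",                        # or DELETE FROM on engines without TRUNCATE
    "INSERT INTO stg_orders SELECT * FROM src_orders",
]

def run_full_load(conn):
    # 'conn' is assumed to expose execute() directly (sqlite3-style); adapt for your driver.
    for sql in full_load_steps:
        conn.execute(sql)
    conn.commit()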
Great explanation. Thank you
Thank you for your support. Please subscribe and Happy Learning 😊🎉
Hello, the interviewer asked what kind of transformations you have done in your project and whether you can explain them. What are the transformations we can apply in a project?
It clearly depends on the project and the domain, but generally transformations are applied to clean the data and improve data quality. Beyond that it depends on whether it is ETL or ELT, but in both cases transformations are applied by joining multiple tables and selecting only the columns required for the final report.
If you have any further questions, please feel free to mail yt.the.data.channel@gmail.com
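Just as a rough illustration of what such a cleanup-plus-join transformation can look like (all table and column names here are hypothetical, not from any specific project):

# Hypothetical transformation: join two source tables, clean values,
# and keep only the columns the final report needs.
transform_sql = """
SELECT
    c.customer_id,
    TRIM(UPPER(c.customer_name)) AS customer_name,   -- basic cleanup
    COALESCE(o.order_amount, 0)  AS order_amount,    -- handle NULLs
    o.order_date
FROM customers c
JOIN orders    o ON o.customer_id = c.customer_id
WHERE o.order_date >= '2024-01-01'                   -- row-level filter
"""

def run_transform(conn):
    # 'conn' is assumed to expose execute() directly (sqlite3-style).
    return conn.execute(transform_sql).fetchall()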
What if records are deleted from the source permanently and the same records need to be deleted from the target table? How can we do this? Please explain this as well. Thank you.
Good question. If there is a delete scenario, the logic would be a little different. If a record is deleted upstream, we can either delete the record downstream as well, or keep it with an additional column like delete_indicator and flag it as Y when the record is deleted upstream.
OK, but can you give me the logic for how to delete the records from the target table? And is it possible to implement an incremental load without using any variables or parameters in PC? This is an interview question. Thank you in advance @thedatachannel878
This answered a lot of questions I had about delta loads. Thank you
Thank you, please Subscribe and Happy Learning 👍
@thedatachannel878 what is a preprocessor while doing ingestion?
Preprocessing here means, for example, that you want to ingest only 100 records out of 1,000 based on a certain WHERE condition.
Hope this is clear; if not, please let me know, happy to explain in more detail if needed
@thedatachannel878 actually my team and I used to do daily prod ingestion. In KT one of the seniors explained that first you need to run the preprocessor for all tables, then you can start the incremental. We used to activate the preprocessor in the watcher; if the preprocessor completed, then we could start the incremental load for that table.
@vanshchauhan3910 preprocessing simply means you can apply row-level filters in the WHERE clause or column-level filters in the SELECT clause while selecting columns, or sometimes it can even mean joining two or more tables. But the important point to note when we say preprocess in ETL is that these filters are applied before the ingestion actually happens, and the compute for this is on your source system.
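A minimal sketch of the idea, assuming a sqlite3-style connection to the source system (table and column names are made up for illustration). The point is that the filters run on the source before anything is pulled across:

# Hypothetical preprocessing: the row and column filters are evaluated
# by the source system, so only the reduced result set is ingested.
preprocess_sql = """
SELECT order_id, customer_id, order_amount   -- column-level filter
FROM   orders
WHERE  order_status = 'COMPLETED'            -- row-level filter
  AND  order_date >= :load_date
"""

def extract_with_preprocessing(source_conn, load_date):
    return source_conn.execute(preprocess_sql, {"load_date": load_date}).fetchall()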
Super explanation, thanks bro
Thank you, please subscribe and share with your friends.
Happy Learning 💐
Sir, in a typical medium-sized project like a telecom domain project, at what volume is data loaded and how many records are generally loaded from source to target?
It completely depends from project to project, but I can say somewhere from 1 million to 10 million records on average
@thedatachannel878 Thank you sir
Very well explained, Thanks.
Thank you Sridhar
Hello, how do we validate incremental loading? The interviewer asked me this question, can you explain it?
There might be multiple ways of doing it. One of them would be to write a script/program which takes the count from the source and the count from the target system, then performs a count validation. You can also take some KPIs, like the sum of an amount column or the average of some numeric columns, from both source and target and compare those KPIs; if they match, your incremental load is validated to be working as expected.
Hope this helps
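A rough sketch of such a validation script, assuming sqlite3-style connections to both systems (table and column names are placeholders):

# Hypothetical validation: compare row count and simple KPIs
# (sum / average of a numeric column) between source and target.
def validate_incremental(source_conn, target_conn, table, amount_col):
    checks = {
        "row_count":  f"SELECT COUNT(*) FROM {table}",
        "amount_sum": f"SELECT SUM({amount_col}) FROM {table}",
        "amount_avg": f"SELECT AVG({amount_col}) FROM {table}",
    }
    results = {}
    for name, sql in checks.items():
        src = source_conn.execute(sql).fetchone()[0]
        tgt = target_conn.execute(sql).fetchone()[0]
        results[name] = {"source": src, "target": tgt, "match": src == tgt}
    return results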
Great learning material!
Thank you! Cheers!
Informative 🙌
Thank you...keep learning...Happy Learning.
very well explained
Thank you....keep supporting and keep learning...😊 Happy Learning 👍
Awesome explanation! Could you please also explain merge concepts?
Sure, watch this space, we will soon explain merge.
Happy Learning 🥳
Great explanation ❤
Thank you, please Subscribe and share with your friends.
Happy Learning 👍
it was really helpful... thank you
Glad to know it is helpful.. Kindly SUBSCRIBE and that will motivate us to bring more quality content like this...Thank you ...!
What if some records get deleted in the source table, how will that be handled in the incremental load?
Usually this process is only for inserts or updates. If old records get deleted and that's the scenario, I would recommend running a separate scheduled housekeeping job which checks all the records in the target, sees if they still exist in the source, and if not, deletes them from the target
You can choose to schedule this daily or hourly based on how frequent or critical the deletes are.
Hope that helps, Happy Learning 👍
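A rough sketch of such a housekeeping job, assuming sqlite3-style connections and illustrative table/column names:

# Hypothetical housekeeping job, run on a schedule.
def housekeeping_delete(source_conn, target_conn, table, key_col):
    # Keys still present in the source vs. keys currently in the target.
    src_keys = {row[0] for row in source_conn.execute(f"SELECT {key_col} FROM {table}")}
    tgt_keys = {row[0] for row in target_conn.execute(f"SELECT {key_col} FROM {table}")}
    # Anything in the target that is no longer in the source was deleted upstream.
    for key in sorted(tgt_keys - src_keys):
        target_conn.execute(f"DELETE FROM {table} WHERE {key_col} = ?", (key,))
    target_conn.commit()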
Well Explained 🙂
Thank you John
Could you post a practical code snippet for this example, please? Thanks in advance!
Sure, will bring up more practical examples
How can we use a control file instead of a control table?
A file can be used as the control instead of a table; however, it is highly recommended not to use a file-based control as it leads to a huge performance impact, especially when the file is big.
Even so, if there is still a need to use a file-based control, the recommended file format is JSON as it is easy to read and update and has performance benefits as well
@thedatachannel878 thank you for the response. Do you have any blog or reference for this?
@thedatachannel878 most data is moving to the cloud these days and I don't see a performance hit for a control file. Since the same row gets hit every day, growth of the data would be small. Do you see any other performance cost in this case? Much appreciate your response
Yes, it will still have a performance issue, and when I say that I mean the performance cost, not the performance time. Also it is not an efficient way of doing it, as every time your process has to do I/O operations on the file
But still, if you want to do it, just read the JSON control file from Python as a dictionary and write it back as a dictionary
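A minimal sketch of that read/update cycle (the file name and keys are made up for illustration):

import json

CONTROL_FILE = "control.json"   # hypothetical location of the control file

def read_control():
    # Load the whole control file into a plain dictionary.
    with open(CONTROL_FILE) as f:
        return json.load(f)

def update_control(table, last_loaded_ts):
    # Update the watermark for one table and write the dictionary back.
    control = read_control()
    control[table] = {"last_loaded": last_loaded_ts}
    with open(CONTROL_FILE, "w") as f:
        json.dump(control, f, indent=2)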
Good explanation
Thank you, please subscribe and share,
Happy Learning 🎉
What about if data gets deleted, how do we handle that in an incremental load?
One way is to have an additional column called active_indicator, which is Y if the record is active and which you set to N if the record gets deleted in the source; alternatively, you can delete it in the target as well. However, this is very rare in the case of incremental loads, as incrementals are mostly transactional in nature and generally don't get updated
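A rough sketch of the soft-delete variant, assuming the source keys are visible to the same connection as the target (all table and column names are illustrative only):

# Hypothetical soft delete: flag target records whose key has
# disappeared from the source, instead of physically deleting them.
soft_delete_sql = """
UPDATE target_orders
SET    active_indicator = 'N'
WHERE  active_indicator = 'Y'
  AND  order_id NOT IN (SELECT order_id FROM source_orders)
"""

def apply_soft_deletes(conn):
    # 'conn' is assumed to expose execute() directly (sqlite3-style).
    conn.execute(soft_delete_sql)
    conn.commit()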
Please explain an end-to-end data pipeline for data engineering
Sure Vinod, we will upload more such videos, please keep Following us
Thank you for good content
Thank you Marco ❤️
Thank you so much sir
You are welcome thank you for the support, Happy Learning
Welcome, Happy Learning 👍
Where is the lady