Implementing Vertical Sharding

Arpit Bhayani

Views: 8,936

Published on 4 Jun 2024
System Design for SDE-2 and above: arpitbhayani.me/masterclass
System Design for Beginners: arpitbhayani.me/sys-design
Redis Internals: arpitbhayani.me/redis
Build Your Own Redis / DNS / BitTorrent / SQLite - with CodeCrafters.
Sign up and get 40% off - app.codecrafters.io/join?via=...
In the video, I explained the importance of sharding in scaling databases, focusing on vertical sharding where tables are distributed across multiple servers. I discussed the transition from monolithic to microservices architecture and how vertical sharding helps in this shift. I detailed the implementation steps of moving tables between database servers, emphasizing the use of tools like Zookeeper for storing meta information and ensuring reactive updates across API servers. The process involved dumping tables, loading them into new databases, setting up replications, and performing a seamless cutover for data consistency.
Recommended videos and playlists
If you liked this video, you will find the following videos and playlists helpful
System Design: • PostgreSQL connection ...
Designing Microservices: • Advantages of adopting...
Database Engineering: • How nested loop, hash,...
Concurrency In-depth: • How to write efficient...
Research paper dissections: • The Google File System...
Outage Dissections: • Dissecting GitHub Outa...
Hash Table Internals: • Internal Structure of ...
Bittorrent Internals: • Introduction to BitTor...
Things you will find amusing
Knowledge Base: arpitbhayani.me/knowledge-base
Bookshelf: arpitbhayani.me/bookshelf
Papershelf: arpitbhayani.me/papershelf
Other socials
I keep writing and sharing my practical experience and learnings every day, so if you resonate then follow along. I keep it no fluff.
LinkedIn: / arpitbhayani
Twitter: / arpit_bhayani
Weekly Newsletter: arpit.substack.com
Thank you for watching and supporting! It means a ton.
I am on a mission to bring out the best engineering stories from around the world and make you all fall in love with engineering. If you resonate with this then follow along, I always keep it no-fluff.
Science & Technology

Comments • 47

@d4devotion 2 years ago ⁺²
I have hit my head so many times trying to understand sharding, but could never get it well. But this guy never fails to explain things in such an easy way. I am lucky that I found this channel on YT.
@AsliEngineering 2 years ago
🙌🙌
@adianimesh 2 years ago ⁺³
Binge-watching after a break! So much quality content over the last week. Thanks a lot.
@homestaysandcafes a year ago ⁺¹
Really grateful to God that I found this valuable gem of content in time♥️
Never worry about views, because some gems among music videos stay hidden too, while junk gets 1B views.
@vasusharma1192 2 years ago ⁺⁵
Maybe a dumb question, but here I go.
If the table renaming step (table to table.bak) is done after firing the Zookeeper update, wouldn't that help reduce the small database downtime (assuming Zookeeper updates happen immediately without consistency issues)?
I ask because, if we do this, the second DB server is up anyway and will take requests, and the renaming can happen later. This would also ensure that the replication is completely done.
@AsliEngineering 2 years ago ⁺¹¹
If we update the config and then rename the table, the tables will diverge, i.e. the new table will get some writes and the old one will also get some writes.
This would lead to an unresolvable conflict. For example: the old table has rows till ID 100, and the new table also has rows till ID 100. Now you update the config and it takes 100 ms to reflect on all servers, but one of the API servers got the change in 1 ms.
So for the remaining 99 ms, both tables would be accepting writes, each from a subset of the API servers.
This would lead to a divergence/conflict.
Consider an auto-increment ID column. There will come a time when the two tables have two different rows with the same ID, because two different API servers wrote to the tables in different databases.
Which is why, to have consistency and no conflict, we take a minuscule downtime, cut off the traffic, and then send the update.
Hope that helps.
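A minimal sketch of the divergence described in this reply, assuming a toy in-memory `Table` class with an auto-increment ID. The class, the server names `api-1` and `api-2`, and the row values are hypothetical and exist only for illustration; a real database assigns IDs itself.

```python
class Table:
    """Toy table with an auto-increment primary key (illustrative only)."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})

    def insert(self, value):
        # Next ID is one past the largest ID this copy has seen.
        new_id = max(self.rows, default=0) + 1
        self.rows[new_id] = value
        return new_id


# Both copies are in sync up to ID 100, as in the example above.
old_table = Table({i: f"row-{i}" for i in range(1, 101)})
new_table = Table(old_table.rows)

# The config update has reached api-1 but not yet api-2,
# and the old table was NOT renamed first, so it still accepts writes.
routing = {"api-1": new_table, "api-2": old_table}

id_from_api1 = routing["api-1"].insert("written via api-1")
id_from_api2 = routing["api-2"].insert("written via api-2")

# Same ID, different rows: the two copies have diverged.
conflict = (
    id_from_api1 == id_from_api2
    and new_table.rows[id_from_api1] != old_table.rows[id_from_api2]
)
print(id_from_api1, id_from_api2, conflict)  # prints: 101 101 True
```

Renaming the old table before the config update removes the second write path, so the stale API server gets an error instead of creating a conflicting row.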
@vasusharma1192 2 years ago
@AsliEngineering Thanks a lot for the quick reply, that clears everything up. Amazing content btw, no one covers such practical aspects of things so well. Hats off ✌🏻
@ashishtewari2162 9 months ago ⁺¹
Great content, Arpit. Very easy to understand.
Small doubt: why rename the table first and then make the Zookeeper config change? Why not update the config in Zookeeper first and then back up the table? This would reduce the availability loss.
@mukeshmahadev7419 a year ago ⁺²
Arpit bhai, you just rocked it, absolutely top-level content with no clutter for even 1 second.
This video filled me with confidence that I can handle a database in production.
Started binge-watching your channel.
Keep making content, Sir.
One thought that hit me while watching this video: this type of content will catalyse the transition of India from being an IT services hub to an IT manufacturing hub😄
@AsliEngineering a year ago
Exactly my vision. Glad you resonated ✨
@DEEPAKKUMAR-wk5pk a year ago
You nailed it, man.
@ramyakrishnan8741 5 months ago
Thanks for an amazing video. May I know the difference between federation and vertical sharding?
@mehrajuddin8798 a year ago
Thanks Arpit, Allah bless you. Top-notch content.
I have one query:
Say I have a large DB/table with indexes on some columns. When I partition my data, do the indexes also get partitioned, or do I have to index the data partition manually once it's restored on a different DB instance?
@Aditya-us5gj 2 years ago
System design cannot get any more interesting and easy than in your videos. Just keep those videos coming every day!! I've already set aside a slot in my day to watch your videos.
@AsliEngineering 2 years ago ⁺¹
Thank you so much :)
@shivamsrivastava3076 2 years ago
Just connecting the dots: is this the same way blob storage (S3/Azure) is scaled when a data node in a bucket gets hot? :)
@vighneshmahale 2 years ago
Very Informative!
@chiragrajani1606 a year ago
What about the requests that fail while the table is renamed, i.e. the `Table Not Found` part? Failed read requests are acceptable, but those write requests will be lost. Won't that be a consistency issue?
@raj_kundalia a year ago ⁺¹
thank you!
@6vikas 2 years ago ⁺¹
One of the best pieces of content on YT for vertical sharding; looking forward to the horizontal sharding video :). One question related to joins between tables in 2 databases: do we need to use a host-level join in that case?
@AsliEngineering 2 years ago ⁺²
We would not join across databases. Joins would happen locally.
Also, thank you so much for the kind words 🙌
@arunrahullakkapragada2304 a year ago ⁺¹
One doubt. While copying the binlog to shard 2, we record the last timestamp or ID up to which we copied, right? After that copy is done, we start replication, right?
The CDC or replication service then catches shard 2 up with live updates.
What about the updates that happen to the DB while we are copying the binlog?
@AsliEngineering a year ago
Already answered in the video. But still, put some more thought into it and you'll get the answer on your own.
@cnp6501 11 months ago
How is vertical sharding different from partitioning?
@aniruddhkhera510 4 months ago
Arpit, as always, amazing video, thanks for sharing. I was actually planning to join your Feb cohort but couldn't enroll before registrations closed.
I have some thoughts on this video; maybe I am missing something. I feel that migrating table t1 from one DB server to another with this approach is kind of over-engineering. I did a migration at my previous company, so let me explain my approach.
1. We don't need to store the metadata about which DB server the table belongs to in Zookeeper or any service discovery. Generally each app server has a DB configuration file (yaml, xml); we can add and maintain both DB configs in that, and the app server connects to both.
2. The cutover can happen gradually with dual writes to the table on both DB servers (a simple code change). Historic data can be migrated from a snapshot of the DB table.
3. The final cutover can be done through a flag in a remote config, which is basically a WIREON/WIREOFF (WOWO) configuration, i.e. turn off the writes to the table on the previous DB server (example: disable.writes.to.xyz := true).
Let me know your thoughts.
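A rough sketch of the dual-write cutover this comment proposes, not the approach from the video. The two dictionaries stand in for the table on each DB server, and the flag name `disable.writes.to.old_db` is a hypothetical stand-in modelled on the commenter's `disable.writes.to.xyz` example.

```python
# Hypothetical remote-config flag, named after the commenter's example.
flags = {"disable.writes.to.old_db": False}

old_db = {}  # stands in for the table on the previous DB server
new_db = {}  # stands in for the table on the new DB server


def write_row(row_id, row):
    """Always write to the new server; write to the old one until the flag flips."""
    new_db[row_id] = row
    if not flags["disable.writes.to.old_db"]:
        old_db[row_id] = row


write_row(1, "before cutover")           # lands on both servers
flags["disable.writes.to.old_db"] = True  # final cutover (wire off the old path)
write_row(2, "after cutover")            # lands only on the new server
```

Note that the two writes here are not atomic: if one succeeds and the other fails, the copies differ, so a real dual-write setup needs some way to detect and repair such gaps.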
@ujjwalsaini5830 4 months ago
Great content. Didn't feel like skipping even for a sec. Kudos!! Also, one question: how do we go about migrating a huge table from one database server to another? By huge table I mean that the table size is big and there is also a huge number of writes happening.
@AsliEngineering 4 months ago
Migrating a high-write database from one server to another is done in 6 broad steps.
0. Take a snapshot
1. Load it into a new database
2. Set up replication
3. Let it catch up
4. Stop the writes for a fraction of a second
5. Failover
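The six steps above can be sketched with a toy in-memory model. Everything here is an assumption made for illustration: the `Database` class, its list-based `binlog`, the helper names, and the use of a table rename (as discussed elsewhere in this thread) to stop writes. Real systems would use the database's own dump and replication tooling.

```python
import copy


class Database:
    """Toy database: tables are dicts, and every write is appended to a binlog."""

    def __init__(self, name):
        self.name = name
        self.tables = {}
        self.binlog = []  # ordered (table, row_id, row) entries

    def write(self, table, row_id, row):
        if table not in self.tables:
            raise KeyError(f"table {table!r} not found on {self.name}")
        self.tables[table][row_id] = row
        self.binlog.append((table, row_id, row))


def take_snapshot(db, table):
    """Copy the table and remember the binlog position the copy corresponds to."""
    return copy.deepcopy(db.tables[table]), len(db.binlog)


def catch_up(source, target, table, position):
    """Replay binlog entries recorded after `position` onto the target."""
    for name, row_id, row in source.binlog[position:]:
        if name == table:
            target.tables[table][row_id] = row
    return len(source.binlog)


source, target = Database("db1"), Database("db2")
source.tables["t1"] = {}
config = {"t1": "db1"}  # which server owns which table
source.write("t1", 1, "a")

snapshot, position = take_snapshot(source, "t1")         # 0. take snapshot
source.write("t1", 2, "b")                               # a write arrives mid-copy
target.tables["t1"] = snapshot                           # 1. load into new database
position = catch_up(source, target, "t1", position)      # 2-3. replicate, catch up
source.tables["t1.bak"] = source.tables.pop("t1")        # 4. stop writes (rename)
position = catch_up(source, target, "t1", position)      # drain anything left
config["t1"] = "db2"                                     # 5. failover

print(target.tables["t1"], config)
```

The write that arrives after the snapshot is not lost, because replication resumes from the binlog position recorded when the snapshot was taken.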
@ujjwalsaini5830 4 months ago
@AsliEngineering So the same strategy is followed whether the table is big or small? Or are there alternate practices to make the migration more efficient?
@notionmakeit2888 8 months ago
How can we get your notes? Please help.
@rahulsarkar4206 a year ago
How does the watch update the config of the API server? Are they connected over a websocket? I don't think so, generally. Please explain.
@AsliEngineering a year ago
These granular details I cover in my course, so I cannot answer it here.
@vikassrivastava7081 2 years ago ⁺¹
In-depth video! 🙏🏼
@vikassrivastava7081 2 years ago
Arpit bro, can you suggest any book for beginners like me for system design, alongside your awesome videos?
@debmalyapan53 2 years ago
amazing
@AnubhavShrivastava 2 years ago
awesome
@pranjalmishra2602 a year ago
What if a request needs to connect to two tables present in different DB servers?
@AsliEngineering a year ago
What do you mean when you say "connect"?
@pranjalmishra2602 a year ago
@AsliEngineering I meant: a request comes in which needs some data from a table in DB1 and some other data from a table in DB2.
I guess I'm still unclear :(
@sayantankundu4532 11 months ago
Hey Arpit, great video. I have a doubt here.
You mentioned that the Zookeeper watch will inform the API server when there is a change, but where will the API server store this config information?
If the API server does not store the config information, then with every request we need to hit Zookeeper first to get the config, which will surely add latency.
@AsliEngineering 11 months ago
You don't need to make a network call every time. A local copy of the config is held at the server.
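One way to picture this is the sketch below, assuming a toy in-memory `ConfigStore` that stands in for Zookeeper. The class, its `watch` and `set` methods, and the `ApiServer` class are hypothetical and are not the Zookeeper client API; a real server would register its watch through its Zookeeper client library.

```python
class ConfigStore:
    """In-memory stand-in for a coordination service such as Zookeeper."""

    def __init__(self, data):
        self._data = dict(data)
        self._watchers = []

    def get(self):
        return dict(self._data)

    def watch(self, callback):
        # Remember who wants to be told about changes.
        self._watchers.append(callback)

    def set(self, key, value):
        self._data[key] = value
        for callback in self._watchers:
            callback(dict(self._data))  # push the new config to every watcher


class ApiServer:
    def __init__(self, store):
        self.local_config = store.get()  # fetched once at startup
        store.watch(self._on_change)

    def _on_change(self, new_config):
        self.local_config = new_config   # refreshed only when the store changes

    def database_for(self, table):
        # Served from the local copy: no call to the store per request.
        return self.local_config[table]


store = ConfigStore({"t1": "db1", "t2": "db1"})
server = ApiServer(store)

before = server.database_for("t2")
store.set("t2", "db2")  # the cutover: t2 now lives on db2
after = server.database_for("t2")
print(before, after)  # prints: db1 db2
```

Each request reads `local_config` in memory, so the store is contacted only at startup and when a change notification arrives.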
@sayantankundu4532 11 months ago
Thank you, Arpit, for clarifying it.
@imdsk28 a year ago
Massive Like ❤
@deepadeshra7195 2 years ago
Maybe a silly question, but I am confused about one thing in DB sharding.
Let's say DB-1 holds T1 and T2, and there is a foreign key relationship between T1 and T2. Then we move T2 to another database server, DB-2. So T1 is in DB-1 and T2 is in DB-2. In this distributed scenario, how will data integrity be maintained?
@AsliEngineering 2 years ago ⁺²
You have to drop foreign keys. You cannot have cross-shard foreign keys.
@deepadeshra7195 2 years ago
@AsliEngineering Thank you :)
@kaustavdas1577 2 months ago
Price increased 1.8 times in 1 year
@AsliEngineering 2 months ago
In 2 years. Also, the course has changed significantly. It is much more in-depth than what I used to cover.
