One thing missing here is how they setup the monitoring system to have such analysis from the running system, which is also an important part. Would like to hear more about this.
Really good content. Just one point - writes go to both the local SSDs and the persistent disks at the same time. Write-mostly in Linux md means that reads go to the other disks, so in this case the local SSDs.
I don't think that is needed in case of sending and receiving messages, as long as data sync from the write disk come in order, we only need eventual consistency. For example, in a group chat, every body sending and receiving messages, right? As long as the messages stored correctly in the write DB, they can eventually be synced to the read DBs. I mean there is no hard consistency here.
The real beauty of ScyllDB lies in Seastar framework which bypasses OS kernel to achieve close to metal speed. Also, Cassandra 5.0 which will release in few months from now won't have this issues and will much higher throughput as it will be able to compile against Java 17 SDK and might also probably run Java 21 which has Generational ZGC garbage collector which gives you sub milliseconds latency. Cassandra also have the advantages of defining aggregate function in Java or JS and run in Casandra DB instance itself rather than fetching all data in application server which is super expensive.
@@ankeshkapil3129 How is a more efficient database more costly to run? The more performant a DB is, the cheaper it is to run, even if you're talking about small workloads
my takeaways and a question: ScyllaDB with C++ under the hood and no GC was a part of it. Request coalescing is another part. Selecting and deciding on optimal Google Cloud services i.e Persistent disk is another part. The two layered RAID setup is another part. Modifying the Linux kernel to do write on the Persistent disks and reads from the local SSDs was another part(But I'm wondering how they made the changes to the Linux kernel of the Google VMs? anyone help me out here, do they bake the changes to a VM image and deploy the image to the Google VMs?) And finally for the migration using a data migrator written in memory safe and highly performant Rust was another great decision.
I may not have enough experience with such tasks, but how do they make sure they pick the right database that fits their performance requirements (at that scale) ?
Sometimes these companies have experts (maybe one of the developers of such DB) but at such a scale it's mostly these companies, who test if these DBs can widthstand such a hight load. So I'd call it precision guess work. Compare existing solutions in terms of performance etc., maybe run a extensive tests and then just give it a go.
They didn't - there are Petabyte scale deployments used by many organisations that are larger than Discord uses. Your DB is going to be as good as its integration with the system and use case. This case study only shows discord lacks expertise or is reluctant to take in advisory from experienced people. This is true for Twitter and airbnb, too, when they switched DBs, thinking it would help but didn't. It requires a lot of experience and architectural knowledge of DB itself to make the rught call. In reality tho who make right call are usually not in like light or advertise about their milestones. So it's not an open book that can be simply explained with diagrams while implementation details would vary a lot. P.S. Even large orgs make mistakes or lack skills. They sometimes need external help, and when they dont, they often run into performance issues that are talked about in the public domain. How they pick it? It's mostly biased towards teams experience and decision makers bias towards tech they know until they are ready to move on and do all migrations to new tech, it's always moving, changing so there's no specific reciepie for that.
This actually follows some money logic. First you pick a tool that is good enough and FREE, like Cassandra. If the business develops slowly, then you can get by with this initial solution just fine. But, when you have a booming growth, as is the case with Discord, the free solution is not designed for that type of scale, so you start to think about high-availability, low-latency, and so on, and usually end up with some commercial solution, that may be even be based on your original free solution, but with enterprise features on top. In the case of Discord, they just picked a DB with better engine, and they leveraged a different storage solution in the cloud, ergo a better performing storage. Of course, the explosive growth guarantees money is no longer an issue - you can afford the enterprise solution and the better performing storage because customers are paying :). There is you logic :).
it think the fact that they changed the behavior of read and write of an SSD make the different. I dont really think changing the DB make that much different, yes it may be better but not so much without the technic of override SSD behavior. And since this is Discord, they migrated from a garbage collector language and move to a non-garbage collector language before (Go to Rust), it could also affect their choice in this case.
Inspiring story. It would be really cool to see how Telegram solves their backend challenges. They have an even greater scale, smaller team with less funding and fastest latency of any messenger. I wish they shared more of their backend, that system is engineering marvel, just like Discord.
I have literally never seen a company be happy with choosing Cassandra. I know of engineering orgs who have an entire pod of SREs dedicated to nothing but keeping the Cassandra alive.
Hmm ... As a user of ScyllaDB. This doesn't seem like the most daunting DB migration scenario, I could imagine. I mean... Had it been any other database than Cassandra, the task would have been magnitudes larger.
QUICK QUESTION: I've seen this style of YT video somewhere else too, somewhere in a tech video I remember. Is there a tool you're using to make these kind of videos? Or did you (or someone you hired) edited the video yourself?
I am confused the difference between "Request coalescing" and what your other videos call caching? Since "Request coalescing" is part of the success here I'd like to understand the difference better, thanks!
I think the boxes at th-cam.com/video/O3PwuzCvAjI/w-d-xo.htmlsi=3YXYT_VTw2QqriyG&t=343 are a bit misleading. First, probably scylla uses both persistent disks for writes and NVMe(s) for reads, so, really one box should contain both with arrows for reads and writes. Secondly, the message service is partitioned, therefore, different writes go on different instances of scylla instead of a single scylla master node or central cluster which the triangular placement of scylla in diagram suggest.
Is it ok to move older messages to a separate database periodically? So the main database would not be carrying so much data all the time. And have dedicated service to read messages from that database with historical messages.
If the writes were going to the persistent disk with high latency, then how were they made immediately available (with low latency) to all the intended members of that messaging group?
@@daviduzumaki In that case they might have to again deal with data loss issues which they were facing, and the reason why they introduced persistent storage in the first place.
An amazing video thanks man, I just have a question about the Data service part, isn't similar to redis or CDN or anyother cache mechanism? I'm just wondered why they called this name and what it does exactly
I believe is not like a CDN or Redis. I understand that is an API acting as an interface between the Monolith and the database, decoupling one form another - which allowed to use a language (Rust) more efficient to handle the high-performance data operations.
Well it doesn't actually tell HOW it stores trillions of messages (like partitioning, cache organization, hot path issues), just a high level overview of the migration and dbms
To my understanding, without a garbage collector there are no stop-the-world pauses to collect the garbage in the background and therefore more computing resources are spent on the actual work of DB.
The Garbage Collector is responsible for checking which variables aren't more needed in the program and removing than from memory, the problem is, the Garbage Collector has to run every now and then, and this process requires processing power and will add latency to the main application running, because for the most of the time, the whole application or parts of it has to be stopped and wait for the Garbage Collector to do it's job, even though this process takes a few milliseconds, in a high throughput application this can add up and slow down the whole system. Languages like C++ and Rust have a "manual" memory allocation and deallocation, the programmer is responsible to take care of it before hand, so you won't need a garbage collector at runtime, what results in faster executions ...
I can agree it is cool technology wise, but 99.9% of discord messages are garbage, and who will ever scroll back to see any of them? I admire the engineering effort it just seems to me kind of wasted effort
Hats off for the on call team during these processes 😅
This channel is underrated.. actually a lot of useful videos are available
Yes dude he has taught me so much like wow .
It is not underrated, for the subject and 500k subs is pretty well rated actually.
It is a very well known resource.
you should have discussed how they moved from mongodb to cassandra, its was a bigger engineering challenge
Can you brief about why nosql makes sense in this context instead of relational dbs?
@@gradientOscaling writes in relational db is hard or not as possible
@@gradientOthere’s a lot of overhead in maintaining the relationships in a relational database and there’s overhead in ensuring strong consistency
@@terencepan2232 You can definitely shard but the operational complexity is insane
Because its not even written in original blog post from Discord.
Amazing content man . Also your animation are really helpful in making things more understandable. Keep up the great work.
One thing missing here is how they setup the monitoring system to have such analysis from the running system, which is also an important part. Would like to hear more about this.
Really good content. Just one point - writes go to both the local SSDs and the persistent disks at the same time. Write-mostly in Linux md means that reads go to the other disks, so in this case the local SSDs.
I don't think that is needed in case of sending and receiving messages, as long as data sync from the write disk come in order, we only need eventual consistency.
For example, in a group chat, every body sending and receiving messages, right?
As long as the messages stored correctly in the write DB, they can eventually be synced to the read DBs.
I mean there is no hard consistency here.
The real beauty of ScyllDB lies in Seastar framework which bypasses OS kernel to achieve close to metal speed. Also, Cassandra 5.0 which will release in few months from now won't have this issues and will much higher throughput as it will be able to compile against Java 17 SDK and might also probably run Java 21 which has Generational ZGC garbage collector which gives you sub milliseconds latency. Cassandra also have the advantages of defining aggregate function in Java or JS and run in Casandra DB instance itself rather than fetching all data in application server which is super expensive.
On the blog post it was pretty clear, that they were tired of GC alltogheter.
@@bruterasta Generational ZGC is auto tunable.
Amazing. I have read about the features of Scylla but this a definitive proof that is an excellent database.
But costly to run
@@ankeshkapil3129 How is a more efficient database more costly to run? The more performant a DB is, the cheaper it is to run, even if you're talking about small workloads
next month I'll migrating one of my database, it's so painful to planning and flow designing of migration, but this video help a lot.
my takeaways and a question:
ScyllaDB with C++ under the hood and no GC was a part of it.
Request coalescing is another part.
Selecting and deciding on optimal Google Cloud services i.e Persistent disk is another part.
The two layered RAID setup is another part.
Modifying the Linux kernel to do write on the Persistent disks and reads from the local SSDs was another part(But I'm wondering how they made the changes to the Linux kernel of the Google VMs? anyone help me out here, do they bake the changes to a VM image and deploy the image to the Google VMs?)
And finally for the migration using a data migrator written in memory safe and highly performant Rust was another great decision.
Those animations look sick!!🔥🔥🔥🔥
kudos to make a video to the point and easy to grasp not like other channels doing long videos to hack the youtube algorithm.
what an Excellent team to be able pull off that humongous data 🥶
"discord found themselves in a bit of a pickle"....good one :)
great content!!
I’m just curious, what tools are you using to make this beautiful diagram and fancy effects 😮❤
Hi your contents are worthy, it was such a meticulous way of explanation of concepts,thank you
Scylla is a saviour ❤
Thanks for this, very helpful.
Cool, interesting to watch! learning all the time!
Discord doing the backend: 👹
Discord doing the UI/UX: 🥺
Could you make a video on the Data services part which written in Rust ? That will be a quite interesting to many folks !!
"of course. Migration production data is no joke!". Thanks for the quality contents.
Nice explainer. Can you also please link the OG blogpost in the description?
I may not have enough experience with such tasks, but how do they make sure they pick the right database that fits their performance requirements (at that scale) ?
Sometimes these companies have experts (maybe one of the developers of such DB) but at such a scale it's mostly these companies, who test if these DBs can widthstand such a hight load. So I'd call it precision guess work. Compare existing solutions in terms of performance etc., maybe run a extensive tests and then just give it a go.
I guess they have TONS of metrics which cann give them the info they exactly need on what they need and don't need to optimizr
They didn't - there are Petabyte scale deployments used by many organisations that are larger than Discord uses. Your DB is going to be as good as its integration with the system and use case. This case study only shows discord lacks expertise or is reluctant to take in advisory from experienced people. This is true for Twitter and airbnb, too, when they switched DBs, thinking it would help but didn't.
It requires a lot of experience and architectural knowledge of DB itself to make the rught call. In reality tho who make right call are usually not in like light or advertise about their milestones. So it's not an open book that can be simply explained with diagrams while implementation details would vary a lot.
P.S. Even large orgs make mistakes or lack skills. They sometimes need external help, and when they dont, they often run into performance issues that are talked about in the public domain.
How they pick it? It's mostly biased towards teams experience and decision makers bias towards tech they know until they are ready to move on and do all migrations to new tech, it's always moving, changing so there's no specific reciepie for that.
This actually follows some money logic. First you pick a tool that is good enough and FREE, like Cassandra. If the business develops slowly, then you can get by with this initial solution just fine. But, when you have a booming growth, as is the case with Discord, the free solution is not designed for that type of scale, so you start to think about high-availability, low-latency, and so on, and usually end up with some commercial solution, that may be even be based on your original free solution, but with enterprise features on top. In the case of Discord, they just picked a DB with better engine, and they leveraged a different storage solution in the cloud, ergo a better performing storage. Of course, the explosive growth guarantees money is no longer an issue - you can afford the enterprise solution and the better performing storage because customers are paying :). There is you logic :).
it think the fact that they changed the behavior of read and write of an SSD make the different. I dont really think changing the DB make that much different, yes it may be better but not so much without the technic of override SSD behavior. And since this is Discord, they migrated from a garbage collector language and move to a non-garbage collector language before (Go to Rust), it could also affect their choice in this case.
Amazing topic!!!
Only 9 days what a huge achievement
Thank you for the video!
Inspiring story. It would be really cool to see how Telegram solves their backend challenges. They have an even greater scale, smaller team with less funding and fastest latency of any messenger. I wish they shared more of their backend, that system is engineering marvel, just like Discord.
Me too, i'm really impressed on how Telegram is very responsive and optimized
Awesome content. Please, tell me what you use for the graphics and animations. They are so fluid.
Thank you for the video.
💯 Great episode! Interesting takes on this 🐘 project! 😎✌️
excellent conten.
I love your content. Do we have a discord server for this channel and this community? I would really love to engage more.
I have literally never seen a company be happy with choosing Cassandra. I know of engineering orgs who have an entire pod of SREs dedicated to nothing but keeping the Cassandra alive.
Rust is a superstar !!
Hmm ... As a user of ScyllaDB. This doesn't seem like the most daunting DB migration scenario, I could imagine.
I mean... Had it been any other database than Cassandra, the task would have been magnitudes larger.
Why?
Quality content as always, love this channel
If Discord would only create a good UX. Having may servers is a nightmare to go through..
QUICK QUESTION: I've seen this style of YT video somewhere else too, somewhere in a tech video I remember. Is there a tool you're using to make these kind of videos? Or did you (or someone you hired) edited the video yourself?
someone please explain more on the RAID 0 setup? What is the safety net under that? Am I missing/not understanding something?
That's incredible
thanks for content :)
I am confused the difference between "Request coalescing" and what your other videos call caching? Since "Request coalescing" is part of the success here I'd like to understand the difference better, thanks!
cool,I want to try in my work
Love your content and respect for your efforts
kudos to the Discord team 👏
Please make video in realtime application like figma, Google docs, or multiplayer game
Thanks for the content. A little introduction was missing what kind of database this is and why it is better, except for C++ under the hood
I think the boxes at th-cam.com/video/O3PwuzCvAjI/w-d-xo.htmlsi=3YXYT_VTw2QqriyG&t=343 are a bit misleading. First, probably scylla uses both persistent disks for writes and NVMe(s) for reads, so, really one box should contain both with arrows for reads and writes. Secondly, the message service is partitioned, therefore, different writes go on different instances of scylla instead of a single scylla master node or central cluster which the triangular placement of scylla in diagram suggest.
3:28 is that like a cache on query parameters?
What pipeline tool did they use for the data migration from Cassandra to Scylla?
That's just awesome sharing. Thanks!
wow, can i ask what after effect template is used for presentation? I would like to present my work in school.
Is it possible that there's a typo at 4:48 and following? There are two md0 devices.
What tools do you use to create these videos
What software do you use to create such presentation?
Is it ok to move older messages to a separate database periodically? So the main database would not be carrying so much data all the time.
And have dedicated service to read messages from that database with historical messages.
If the writes were going to the persistent disk with high latency, then how were they made immediately available (with low latency) to all the intended members of that messaging group?
I'm guessing data was also stored in an LRU cache when it was written to the persistent disk
@@daviduzumaki In that case they might have to again deal with data loss issues which they were facing, and the reason why they introduced persistent storage in the first place.
what do you use for these animations?
When there's a new message or edited old message, how do the app receive the changes? Is it by listening to data on the DB (like a Web Socket)?
is discord the only paltform having to store trillions of messages? or data for that matter? how have other companies solved it?
I think quite a few like meta companies products : whatsapp, insta, fb. Google : google fpm, gmail, google ads, youtube
How is their solution different from RAID10? It is essentially raid1 array mirroring a raid0 set. Am i missing something?
Can't they use AWS dynamoDB?
To the Discord devs 🥂
An amazing video thanks man,
I just have a question about the Data service part, isn't similar to redis or CDN or anyother cache mechanism? I'm just wondered why they called this name and what it does exactly
I believe is not like a CDN or Redis. I understand that is an API acting as an interface between the Monolith and the database, decoupling one form another - which allowed to use a language (Rust) more efficient to handle the high-performance data operations.
So they didn’t plan to have data warehouse?
how to run scylla db in the cloud??
Waouh!
Show!!!!
Well it doesn't actually tell HOW it stores trillions of messages (like partitioning, cache organization, hot path issues), just a high level overview of the migration and dbms
Can you make me a pro at syatem design ?? I just love this subject...
9 days - are you kidding me!
Friendly programming noob here: Can someone explain why Garbage Collection-free gives Scylla an advantage over Cassandra? Thanks in advanced.
To my understanding, without a garbage collector there are no stop-the-world pauses to collect the garbage in the background and therefore more computing resources are spent on the actual work of DB.
Joy != rust
What is a garbage collector, how it is useful for DB’s?
Cassandra is written in Java, so the garbage collector is automatically part of it.
The Garbage Collector is responsible for checking which variables aren't more needed in the program and removing than from memory, the problem is, the Garbage Collector has to run every now and then, and this process requires processing power and will add latency to the main application running, because for the most of the time, the whole application or parts of it has to be stopped and wait for the Garbage Collector to do it's job, even though this process takes a few milliseconds, in a high throughput application this can add up and slow down the whole system.
Languages like C++ and Rust have a "manual" memory allocation and deallocation, the programmer is responsible to take care of it before hand, so you won't need a garbage collector at runtime, what results in faster executions ...
@@RicardoSilvaTripcallappreciate this
So much effort for messages that nobody can ever find again anyway 😂Seriously, Discord scrollback might as well be write-only it's so unusable. :(
Scylla is basically a C++ version of Cassandra. This story is one more proof of java's inferiority. Why are people still using Java ?? 🤷🏽♂️
Because they can't learn C++ :)
👀👌
How do they store TRILLIONS of messages? Like everyone else, they use DATABASE. Hows that any different of any other services?
I can agree it is cool technology wise, but 99.9% of discord messages are garbage, and who will ever scroll back to see any of them?
I admire the engineering effort it just seems to me kind of wasted effort
Not true at all. If you have message garbage on Discord, it your server choice you have to question, not the tool or the others... 🥱
1st viewer😂
Brother, I love your content, but please, your tone is too flat. It makes me sleepy