Table Partitioning: If a single table is too large for the index, it may be a good idea to split the table (for example) per month of data. Your queries now must specify the month to select the right table(s), but each table gets its own index, instead of having one massive index, returning speed to normal.
It doesn't have to be per month - you can also select per year or day, and it doesn't have to be per date either: You could split it per category or name as well.
🎯 Key points for quick navigation: 00:00 *📈 Scaling database importance* - Why scaling is necessary as applications grow - Effects of increased load on performance - Importance of maintaining smooth operations for good user experience 01:09 *📊 Indexing for database performance* - Indexes help locate information quickly in a database - B+ tree indexes are common and efficient for various queries - Balancing indexing for improved performance without slowing down write operations 02:07 *📑 Materialized views benefits and considerations* - Materialized views store pre-computed data for faster access - Balancing data refresh frequency with performance benefits is essential - Efficiency gains from materialized views in complex query scenarios 03:01 *⬆️ Denormalization advantages and drawbacks* - Denormalization simplifies data retrieval and speeds up queries - Consistency challenges with managing redundant data during updates - Impact of denormalization on complex query executions 03:56 *💻 Vertical scaling for immediate performance improvement* - Adding resources to an existing database server to handle increased load - Addressing limitations and cost considerations of vertical scaling - The importance of redundant database configuration in vertical scaling 05:04 *🚀 Caching to reduce database load and improve response times* - Storing frequently accessed data in a faster storage layer - Addressing cache invalidation challenges for maintaining data accuracy - Implementing caching at various levels for improved performance 06:14 *🔄 Database replication for availability and fault tolerance* - Creating copies of primary databases on different servers - Configuring synchronous and asynchronous replication for data consistency - Challenges of managing data consistency and overhead with replication 07:08 *🔀 Sharding for efficient distribution of database workload* - Splitting a large database into smaller, manageable pieces called shards - Effective scalability by distributing workload across multiple servers - Challenges and benefits of horizontal scaling through sharding techniques Made with HARPA AI
Great job addressing the top strategies for scaling and efficiency! Denormalization and effective data modeling is critical for scaling data in your data warehouse!
Data archival is also an excellent technique when the system scales and table size increases apart from partitioning as discussed in some other comments as it decreases overall load on the system. It's better to shard the db, partition it, and archive the older partitions.
7 Ineffable Paradigms for Augmenting Database Scalability **1. Indexation: The Cryptic Codex of Data Retrieval** Analogous to the esoteric indices of an arcane grimoire, database indices facilitate the expeditious exhumation of information sans the necessity of scrutinizing every infinitesimal datum. * **Modus Operandi:** Indices constitute labyrinthine data structures that warehouse specific column values and indicate the corresponding rows within the tabular labyrinth. * **Exemplification:** In a repository of clientele, indexing the 'customer identifier' permits swift chronological excavation of transactional history, circumventing a comprehensive perusal of the tabular expanse. * **Taxonomies:** * **B-tree Index:** The most ubiquitous typology, suitable for a wide spectrum of inquiries, including range-bound interrogations. Proffers rapid insertion, deletion, and lookup operations within its arboreal structure. * **Advantages:** Substantially diminishes the temporal duration required for query execution. * **Disadvantages:** * May decelerate write operations as the index necessitates perpetual recalibration with each infinitesimal data transmutation. * Ascertaining the optimal equilibrium and selecting appropriate fields for indexation is a Herculean task crucial for peak performance. **2. Materialized Views: Platonic Ideals of Pre-computed Data** Materialized views constitute the pre-calculated quintessence of intricate queries, warehoused in a realm of expeditious access. * **Modus Operandi:** A materialized view entombs the query outcome, undergoing periodic metamorphosis to reflect the most recent data transmutations. * **Exemplification:** In a Business Intelligence pantheon, a materialized view can enshrine daily sales reports, generated from a voluminous dataset of Cyclopean proportions. In lieu of perpetually interrogating the entire dataset, the report can be instantaneously manifested from the view's crystallized form. * **Advantages:** Substantially enhances performance by mitigating the Sisyphean computational burden. * **Disadvantages:** * Necessitates periodic rejuvenation to maintain data congruence, a process of Promethean resource consumption. * Striking a Delphic balance between refresh frequency and performance benefits is crucial. **3. Denormalization: The Faustian Bargain of Data Redundancy** Denormalization entails the introduction of deliberate redundancy, a Mephistophelian pact of warehousing data in multiple loci to accelerate retrieval. * **Modus Operandi:** Redundant data is appended to tables with Borgesian duplication, circumventing the need for complex joins across multiple tabular realms. * **Exemplification:** Social media Leviathans frequently denormalize data to warehouse user posts and information within the same tabular expanse, expediting the Herculean task of feed generation. * **Advantages:** Substantially augments read performance by simplifying the labyrinthine process of query execution. * **Disadvantages:** * Augments storage requisites with Brobdingnagian voracity. * Necessitates meticulous, Sisyphean management of updates to maintain congruence across the database's multifarious facets. * Can engender complexities and potential issues of Gordian proportions if not handled with Solomonic wisdom. **4. Vertical Scaling: Promethean Augmentation of Silicon Titans** Vertical scaling, or "scaling up," involves the Titanic augmentation of resources to your extant database server, a process akin to bestowing godlike powers upon mortal silicon. * **Modus Operandi:** Upgrading hardware such as CPU, RAM, or storage capacity of the existing server to Olympian proportions. * **Exemplification:** An online marketplace experiencing Promethean growth upgrades its database server to contend with increased load and transaction volume of Biblical proportions. * **Advantages:** * Relatively straightforward to implement, akin to granting Herculean strength to Atlas. * Provides immediate performance enhancements without necessitating Daedalian modifications to application architecture. * **Disadvantages:** * Limited scalability due to the Procrustean constraints of hardware limitations and pecuniary considerations. * Fails to address redundancy; a single server failure can still precipitate a database apocalypse. **5. Caching: The Mnemosyne's Embrace of Ephemeral Data** Caching, the art of storing frequently accessed data in Mnemosyne's bosom, provides a stratum of expeditious retrieval and mitigated database encumbrance. * **Modus Operandi:** Frequently accessed data is ensconced in a cache (in-memory or application-level), diminishing the Sisyphean necessity to perpetually query the database. * **Exemplification:** A streaming service of Amazonian proportions caches movie metadata to expedite title display, mitigating the Herculean database load. * **Advantages:** * Drastically reduces response times for frequently accessed data to near-instantaneous levels. * Enhances user experience to heights of Elysian bliss. * **Disadvantages:** * Necessitates a cache invalidation strategy of Delphic complexity to ensure data congruence. * Stale cache data can lead to the Cassandra-like prophecy of inaccurate information. **6. Replication: The Hydra-headed Proliferation of Data Simulacra** Replication involves the creation of Hydra-like copies of the primary database on disparate servers to enhance availability, distribute load with Herculean efficiency, and augment fault tolerance to Olympian levels. * **Modus Operandi:** Data undergoes mitotic division from the primary database to replica servers. * **Taxonomies:** * **Synchronous Replication:** Ensures immediate data congruence but introduces latency akin to Achilles' pursuit of the tortoise. * **Asynchronous Replication:** Proffers superior performance but may engender temporary incongruities, a Schrödinger's cat of data states. * **Advantages:** * Enhances read performance and availability to near-omniscient levels. * Augments fault tolerance with Promethean resilience. * **Disadvantages:** * Augments storage and maintenance overhead to Brobdingnagian proportions. * Introduces complexity in maintaining data congruence, particularly in distributed systems of Borgesian intricacy. **7. Sharding: The Alexandrian Solution to the Gordian Knot of Data Magnitude** Sharding involves the bifurcation of a voluminous database into smaller, more manageable segments called shards, distributed across multiple servers with Alexandrian precision. * **Modus Operandi:** Each shard contains a subset of the data predicated on a specific sharding key, akin to dividing the world among Olympian deities. * **Exemplification:** Instagram, that Panopticon of digital narcissism, shards its database by user ID, distributing data across a multitude of servers with the efficiency of Daedalus's labyrinth, achieving a load balancing feat worthy of Atlas himself. * **Advantages:** * Permits horizontal scaling by appending more servers, akin to adding new realms to Yggdrasil. * Substantially enhances both read and write performance to near-lightspeed efficiency. * **Disadvantages:** * Introduces complexity in database design and management that would perplex even the Sphinx. * Selecting the appropriate sharding key is a task of Delphic importance, crucial for equitable data distribution. * Querying and re-sharding can be complex and resource-intensive, a Herculean labor that would make Sisyphus weep.
thanks for another excellent video....every time I see such videos to make the applications faster, I just wonder that are we pushing the hardware/software limits or we are reducing our patience & sanity limits.
Thank you Alex. wonderful video. Additionally, I believe elastic search is good option as well for scaling reads...if eventual consistency between primary db and elastic search be maintained
Hello Sir!!, big fan of your content, very knowledgeable, a quick question what tool do you use to simulate all the workflow diagrams for understanding considering all architectural diagrams
Love your content ! Just clarify a bit on Denormalization, denormalization is we don’t apply normalization techniques (2NF,3NF,..) into db, that means we accept redundancies in db storage but it’s much efficient in queries (no need to join other tables).
In a traditional RBS when talking specifically about schema design yes your correct (taking me back to my relational calc days). Speaking more generally about distributed systems denormalization is a general team meaning just having multiple views of the same data normally backed by a specific SOT (source of truth), I consider putting distinct read models into my stack as denormalization. (Redis / Elastic / etc).
Mf’s love their diagrams and PowerPoint presentations about database scaling. I got an idea, actually do it. Show us how you’d shard and partition a db for real.. no more theory, no more diagrams, no more explanations. I’m starting to think no one can actually do it in reality and people just like talking about it..
13 And no man hath ascended up to heaven, but he that came down from heaven, even the Son of man which is in heaven. 14 And as Moses lifted up the serpent in the wilderness, even so must the Son of man be lifted up: 15 That whosoever believeth in him should not perish, but have eternal life. 16 For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life. 17 For God sent not his Son into the world to condemn the world; but that the world through him might be saved. 18 He that believeth on him is not condemned: but he that believeth not is condemned already, because he hath not believed in the name of the only begotten Son of God. 19 And this is the condemnation, that light is come into the world, and men loved darkness rather than light, because their deeds were evil. 20 For every one that doeth evil hateth the light, neither cometh to the light, lest his deeds should be reproved. 21 But he that doeth truth cometh to the light, that his deeds may be made manifest, that they are wrought in God. (Jn.3:13-21)
Note to self, time stamps are:
1) Indexing: 0:58
2) Materialised Views: 2:06
3) Denormalisation 3:04
4) Vertical Scaling: 3:47
5) Database Caching: 5:00
6) Replication 6:01
7) Sharding 6:58
Table Partitioning:
If a single table is too large for the index, it may be a good idea to split the table (for example) per month of data. Your queries now must specify the month to select the right table(s), but each table gets its own index, instead of having one massive index, returning speed to normal.
It doesn't have to be per month - you can also select per year or day, and it doesn't have to be per date either: You could split it per category or name as well.
aka- Archiving data unlikely to be retrieved during normal operations, if it significantly reduces rowcount.
Isn't that sharding?
@@raptyaxa5771partitions stay on the same server
@@raptyaxa5771 Shards are separate servers.
🎯 Key points for quick navigation:
00:00 *📈 Scaling database importance*
- Why scaling is necessary as applications grow
- Effects of increased load on performance
- Importance of maintaining smooth operations for good user experience
01:09 *📊 Indexing for database performance*
- Indexes help locate information quickly in a database
- B+ tree indexes are common and efficient for various queries
- Balancing indexing for improved performance without slowing down write operations
02:07 *📑 Materialized views benefits and considerations*
- Materialized views store pre-computed data for faster access
- Balancing data refresh frequency with performance benefits is essential
- Efficiency gains from materialized views in complex query scenarios
03:01 *⬆️ Denormalization advantages and drawbacks*
- Denormalization simplifies data retrieval and speeds up queries
- Consistency challenges with managing redundant data during updates
- Impact of denormalization on complex query executions
03:56 *💻 Vertical scaling for immediate performance improvement*
- Adding resources to an existing database server to handle increased load
- Addressing limitations and cost considerations of vertical scaling
- The importance of redundant database configuration in vertical scaling
05:04 *🚀 Caching to reduce database load and improve response times*
- Storing frequently accessed data in a faster storage layer
- Addressing cache invalidation challenges for maintaining data accuracy
- Implementing caching at various levels for improved performance
06:14 *🔄 Database replication for availability and fault tolerance*
- Creating copies of primary databases on different servers
- Configuring synchronous and asynchronous replication for data consistency
- Challenges of managing data consistency and overhead with replication
07:08 *🔀 Sharding for efficient distribution of database workload*
- Splitting a large database into smaller, manageable pieces called shards
- Effective scalability by distributing workload across multiple servers
- Challenges and benefits of horizontal scaling through sharding techniques
Made with HARPA AI
This video dropped when I was considering researching this topic. Awesome !!
Great job addressing the top strategies for scaling and efficiency! Denormalization and effective data modeling is critical for scaling data in your data warehouse!
Data archival is also an excellent technique when the system scales and table size increases apart from partitioning as discussed in some other comments as it decreases overall load on the system. It's better to shard the db, partition it, and archive the older partitions.
Thanks! That was informative
You have to give us another tutorial to create those slick animations, please :). Great video btw
I would like to have a big enterprise to assign this man as my CTO
* indexing
* materialized view
* denormalization
* asynchronous replication
* database caching
* vertical scaling
* horizontal scaling / sharding
Very useful information. Can I ask what tool you use for design diagrams? Thanks
Wat abt partitioning of tables eg in postgres, tey help immensely
What software do you use to make the video?
Typo at 3:04: Recude complex joins, should be Reduce
Keep up the Good work
Thank you. Very informative 👏
clustering can be used as well
hello bytebytego can you make a series of videos explaining reverse engineering web APIs, automation
Is it even legal to make content this good!?
Reported him for too high quality of animation and generally too helpful of content. You can to! 😂
I want to Ask you what IS thé softwar you use to create this beautifull presentation
Your animations are amazing!
I must say , but content is very shallow no depth at all
@a1988ditya It's meant to be high level. Their newsletter, bytebytego site and training programs, and book go into greater depth.
0:35 - more users → more unstable = fewer users → more stable
So it is a self-balancing DB, problem solved :D
7 Ineffable Paradigms for Augmenting Database Scalability
**1. Indexation: The Cryptic Codex of Data Retrieval**
Analogous to the esoteric indices of an arcane grimoire, database indices facilitate the expeditious exhumation of information sans the necessity of scrutinizing every infinitesimal datum.
* **Modus Operandi:** Indices constitute labyrinthine data structures that warehouse specific column values and indicate the corresponding rows within the tabular labyrinth.
* **Exemplification:** In a repository of clientele, indexing the 'customer identifier' permits swift chronological excavation of transactional history, circumventing a comprehensive perusal of the tabular expanse.
* **Taxonomies:**
* **B-tree Index:** The most ubiquitous typology, suitable for a wide spectrum of inquiries, including range-bound interrogations. Proffers rapid insertion, deletion, and lookup operations within its arboreal structure.
* **Advantages:** Substantially diminishes the temporal duration required for query execution.
* **Disadvantages:**
* May decelerate write operations as the index necessitates perpetual recalibration with each infinitesimal data transmutation.
* Ascertaining the optimal equilibrium and selecting appropriate fields for indexation is a Herculean task crucial for peak performance.
**2. Materialized Views: Platonic Ideals of Pre-computed Data**
Materialized views constitute the pre-calculated quintessence of intricate queries, warehoused in a realm of expeditious access.
* **Modus Operandi:** A materialized view entombs the query outcome, undergoing periodic metamorphosis to reflect the most recent data transmutations.
* **Exemplification:** In a Business Intelligence pantheon, a materialized view can enshrine daily sales reports, generated from a voluminous dataset of Cyclopean proportions. In lieu of perpetually interrogating the entire dataset, the report can be instantaneously manifested from the view's crystallized form.
* **Advantages:** Substantially enhances performance by mitigating the Sisyphean computational burden.
* **Disadvantages:**
* Necessitates periodic rejuvenation to maintain data congruence, a process of Promethean resource consumption.
* Striking a Delphic balance between refresh frequency and performance benefits is crucial.
**3. Denormalization: The Faustian Bargain of Data Redundancy**
Denormalization entails the introduction of deliberate redundancy, a Mephistophelian pact of warehousing data in multiple loci to accelerate retrieval.
* **Modus Operandi:** Redundant data is appended to tables with Borgesian duplication, circumventing the need for complex joins across multiple tabular realms.
* **Exemplification:** Social media Leviathans frequently denormalize data to warehouse user posts and information within the same tabular expanse, expediting the Herculean task of feed generation.
* **Advantages:** Substantially augments read performance by simplifying the labyrinthine process of query execution.
* **Disadvantages:**
* Augments storage requisites with Brobdingnagian voracity.
* Necessitates meticulous, Sisyphean management of updates to maintain congruence across the database's multifarious facets.
* Can engender complexities and potential issues of Gordian proportions if not handled with Solomonic wisdom.
**4. Vertical Scaling: Promethean Augmentation of Silicon Titans**
Vertical scaling, or "scaling up," involves the Titanic augmentation of resources to your extant database server, a process akin to bestowing godlike powers upon mortal silicon.
* **Modus Operandi:** Upgrading hardware such as CPU, RAM, or storage capacity of the existing server to Olympian proportions.
* **Exemplification:** An online marketplace experiencing Promethean growth upgrades its database server to contend with increased load and transaction volume of Biblical proportions.
* **Advantages:**
* Relatively straightforward to implement, akin to granting Herculean strength to Atlas.
* Provides immediate performance enhancements without necessitating Daedalian modifications to application architecture.
* **Disadvantages:**
* Limited scalability due to the Procrustean constraints of hardware limitations and pecuniary considerations.
* Fails to address redundancy; a single server failure can still precipitate a database apocalypse.
**5. Caching: The Mnemosyne's Embrace of Ephemeral Data**
Caching, the art of storing frequently accessed data in Mnemosyne's bosom, provides a stratum of expeditious retrieval and mitigated database encumbrance.
* **Modus Operandi:** Frequently accessed data is ensconced in a cache (in-memory or application-level), diminishing the Sisyphean necessity to perpetually query the database.
* **Exemplification:** A streaming service of Amazonian proportions caches movie metadata to expedite title display, mitigating the Herculean database load.
* **Advantages:**
* Drastically reduces response times for frequently accessed data to near-instantaneous levels.
* Enhances user experience to heights of Elysian bliss.
* **Disadvantages:**
* Necessitates a cache invalidation strategy of Delphic complexity to ensure data congruence.
* Stale cache data can lead to the Cassandra-like prophecy of inaccurate information.
**6. Replication: The Hydra-headed Proliferation of Data Simulacra**
Replication involves the creation of Hydra-like copies of the primary database on disparate servers to enhance availability, distribute load with Herculean efficiency, and augment fault tolerance to Olympian levels.
* **Modus Operandi:** Data undergoes mitotic division from the primary database to replica servers.
* **Taxonomies:**
* **Synchronous Replication:** Ensures immediate data congruence but introduces latency akin to Achilles' pursuit of the tortoise.
* **Asynchronous Replication:** Proffers superior performance but may engender temporary incongruities, a Schrödinger's cat of data states.
* **Advantages:**
* Enhances read performance and availability to near-omniscient levels.
* Augments fault tolerance with Promethean resilience.
* **Disadvantages:**
* Augments storage and maintenance overhead to Brobdingnagian proportions.
* Introduces complexity in maintaining data congruence, particularly in distributed systems of Borgesian intricacy.
**7. Sharding: The Alexandrian Solution to the Gordian Knot of Data Magnitude**
Sharding involves the bifurcation of a voluminous database into smaller, more manageable segments called shards, distributed across multiple servers with Alexandrian precision.
* **Modus Operandi:** Each shard contains a subset of the data predicated on a specific sharding key, akin to dividing the world among Olympian deities.
* **Exemplification:** Instagram, that Panopticon of digital narcissism, shards its database by user ID, distributing data across a multitude of servers with the efficiency of Daedalus's labyrinth, achieving a load balancing feat worthy of Atlas himself.
* **Advantages:**
* Permits horizontal scaling by appending more servers, akin to adding new realms to Yggdrasil.
* Substantially enhances both read and write performance to near-lightspeed efficiency.
* **Disadvantages:**
* Introduces complexity in database design and management that would perplex even the Sphinx.
* Selecting the appropriate sharding key is a task of Delphic importance, crucial for equitable data distribution.
* Querying and re-sharding can be complex and resource-intensive, a Herculean labor that would make Sisyphus weep.
Sounds more like database scaling strategies by Socrates, Plato, Aristoteles et al. 😊
thanks for another excellent video....every time I see such videos to make the applications faster, I just wonder that are we pushing the hardware/software limits or we are reducing our patience & sanity limits.
great thanks
Thanks for this great video, could you share which software or framework you use to create the animations?
Illistrator and After effects
Thank you Alex. wonderful video. Additionally, I believe elastic search is good option as well for scaling reads...if eventual consistency between primary db and elastic search be maintained
He is Sahn Lam, not Alex
this,
is
amazing
thank you
Thank you!
Nice tut 🎉🎉
Nice succinct summary:)
Always informative. Thanks
Thanks
"vertical scaling does not address redundancy" - here is a mistake, it does not address resilience (or fault tolerance)
Hello Sir!!, big fan of your content, very knowledgeable, a quick question what tool do you use to simulate all the workflow diagrams for understanding considering all architectural diagrams
Very nice
golden!!
Is it possible to combine two or more strategies?
yes
Regarding the strategies of this video, you can do all of them at the same time.
What about partitioning ?
That's shardinng
Indexing, materialized views and denormalization are all conceptually the same thing
Love your content ! Just clarify a bit on Denormalization, denormalization is we don’t apply normalization techniques (2NF,3NF,..) into db, that means we accept redundancies in db storage but it’s much efficient in queries (no need to join other tables).
In a traditional RBS when talking specifically about schema design yes your correct (taking me back to my relational calc days).
Speaking more generally about distributed systems denormalization is a general team meaning just having multiple views of the same data normally backed by a specific SOT (source of truth), I consider putting distinct read models into my stack as denormalization. (Redis / Elastic / etc).
@@Netfox_thank for sharing
This video is very informative. Can you help us to understand ho to develop database for multiple user webapp ?
My god, this guy
Just stick all the data on a single big HD. Problem solved! 😋😋
But the scaling problem will occur will become single point of failure
@@yatinarora1252 then, you can focus only on that single point of failure.
Single big High definition ? I'm not sure if all my data can fit into a 1280x720 ...
You must be trolling lol
single point of failure 😂, risk if your kids or your wife throw it in garbage, let alone flooding or storm :))
Mf’s love their diagrams and PowerPoint presentations about database scaling. I got an idea, actually do it. Show us how you’d shard and partition a db for real.. no more theory, no more diagrams, no more explanations. I’m starting to think no one can actually do it in reality and people just like talking about it..
Lol literally have examples out there like instagram, pinterest etc
13 And no man hath ascended up to heaven, but he that came down from heaven, even the Son of man which is in heaven.
14 And as Moses lifted up the serpent in the wilderness, even so must the Son of man be lifted up:
15 That whosoever believeth in him should not perish, but have eternal life.
16 For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
17 For God sent not his Son into the world to condemn the world; but that the world through him might be saved.
18 He that believeth on him is not condemned: but he that believeth not is condemned already, because he hath not believed in the name of the only begotten Son of God.
19 And this is the condemnation, that light is come into the world, and men loved darkness rather than light, because their deeds were evil.
20 For every one that doeth evil hateth the light, neither cometh to the light, lest his deeds should be reproved.
21 But he that doeth truth cometh to the light, that his deeds may be made manifest, that they are wrought in God.
(Jn.3:13-21)
vhai tora namaj, porda thik moto korisss
建议不要露脸,没必要
What is the name of tool that he draw his dynamic diagrams....please🥲
Keep up the Good work