This discussion is pure gold. The right questions were asked and great answers were given. Please do such podcasts more frequently.
Thank you so much. Will surely do.
What makes a DB a DB?
It is the underlying database engine.
SQL databases are only about 5% of all the databases in use.
Myth about NoSQL: we need NoSQL because SQL doesn't scale.
SQL doesn't scale easily as long as all of its constraints are enforced.
If you shard SQL, it will scale. NoSQL is sharded by default, so people claim it is scalable.
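A rough sketch of what "shard it and it scales" means. The hash scheme, the `shard_for` helper and the four in-memory dicts standing in for four database servers are all assumptions for illustration, not something from the podcast:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Four in-memory dicts stand in for four independent database servers.
shards = [dict() for _ in range(4)]

def put(key: str, row: dict) -> None:
    # Each write goes to exactly one shard, so shards share the load.
    shards[shard_for(key, len(shards))][key] = row

def get(key: str):
    # A lookup by key touches only the shard that owns the key.
    return shards[shard_for(key, len(shards))].get(key)

put("user:42", {"name": "Asha"})
print(get("user:42"))
```

Lookups by key stay cheap because they hit one shard. Anything that spans keys (joins, cross-row constraints) is where the difficulty starts.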
SQL DB Structure
Myth: SQL always uses a B+ Tree.
You can write your own storage engine.
You can store the data anywhere.
The popular MySQL engines, MyISAM and InnoDB (the current default), use B+ Tree indexes. Reason: they give O(log n) lookup.
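To get a feel for the log(n) lookup, here is a small back-of-the-envelope sketch. The fanout of 100 keys per node is an assumed number, and `btree_levels` is a made-up helper, not any engine's API:

```python
def btree_levels(num_keys: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= num_keys."""
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels

# With an assumed fanout of 100, the tree stays very shallow.
print(btree_levels(1_000_000, 100))      # 3
print(btree_levels(1_000_000_000, 100))  # 5
```

Going from a million to a billion keys adds only two levels, which is why a lookup needs just a handful of node reads.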
NoSQL Structure
It depends on the use case, as there is no standardization.
A few types: NewSQL, In-Memory, Key-Value, Columnar DBs, Hybrid DBs.
Document DB
It is very close to a relational database, with a change in the modeling layer.
MongoDB uses the WiredTiger engine.
The underlying engine can be the same for SQL and NoSQL. If they have the same underlying engine, the difference lies in the guarantees they offer. Some can be distributed, some in-memory, centralized, or embedded.
A database system typically has several abstracted layers that handle different aspects of data management. These layers include:
The Physical Layer: This is the lowest layer and is responsible for managing the actual storage of data on disk or other storage devices. It handles tasks such as allocating space for data, reading and writing data to disk and managing data files.
The Storage Engine Layer: This layer sits above the physical layer and is responsible for managing the storage and retrieval of data. It handles tasks such as indexing data, managing data structures, and providing an API for querying data.
The Query Layer: This layer sits above the storage engine layer and is responsible for parsing and executing queries. It provides an API for querying data and translating high-level queries into operations that can be executed by the storage engine.
The Application Layer: This is the highest layer responsible for interacting with the user or application. It allows the user or application to interact with the database using a query language or an API.
These layers are abstracted from each other so that changes or updates to one layer do not affect the functionality of the other layers. All of these are plug-and-play.
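A toy sketch of the plug-and-play idea, assuming a tiny made-up query language (`SET key value` / `GET key`). `StorageEngine`, `DictEngine`, `LogEngine` and `QueryLayer` are hypothetical names for illustration only:

```python
from typing import Optional, Protocol

class StorageEngine(Protocol):
    """The contract the query layer relies on (hypothetical interface)."""
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...

class DictEngine:
    """Storage engine backed by a hash map."""
    def __init__(self) -> None:
        self._data = {}
    def put(self, key: str, value: str) -> None:
        self._data[key] = value
    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

class LogEngine:
    """Storage engine backed by an append-only log; reads scan from the end."""
    def __init__(self) -> None:
        self._log = []
    def put(self, key: str, value: str) -> None:
        self._log.append((key, value))
    def get(self, key: str) -> Optional[str]:
        for k, v in reversed(self._log):
            if k == key:
                return v
        return None

class QueryLayer:
    """Parses a query and translates it into storage engine calls."""
    def __init__(self, engine: StorageEngine) -> None:
        self.engine = engine
    def execute(self, query: str) -> Optional[str]:
        parts = query.split()
        if len(parts) == 3 and parts[0].upper() == "SET":
            self.engine.put(parts[1], parts[2])
            return "OK"
        if len(parts) == 2 and parts[0].upper() == "GET":
            return self.engine.get(parts[1])
        raise ValueError(f"unsupported query: {query!r}")

# The same query layer works unchanged on top of either engine.
for engine in (DictEngine(), LogEngine()):
    db = QueryLayer(engine)
    db.execute("SET name Asha")
    print(type(engine).__name__, db.execute("GET name"))
```

The query layer never looks inside the engine, so swapping the engine changes performance characteristics but not the queries.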
A DB is only as performant as its storage layer.
We may see JSON at the top, but beneath the layers the data is stored in a highly complicated way.
What does a node of the B+ tree contain?
In a relational DB that stores the table itself as a B+ tree (as InnoDB does with its primary key), the leaf node contains the actual row. Because the schema is known, the engine can tell how much space a row will need. However, a node does not have to hold a single row; it can hold multiple rows.
Indexing:
It works similarly in SQL and NoSQL.
It makes reads (lookups) faster.
Sparse Index: indexed value + offset, kept only for some of the records (for example, one entry per block). Smaller index size.
Dense Index: every value has its own entry in the index.
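A minimal sketch of the difference, assuming the records are kept sorted by key and grouped into blocks of 4 (the block size and the sample data are made up):

```python
from bisect import bisect_right

# Records sorted by key, as they would be laid out in a data file.
records = [(k, f"row-{k}") for k in range(0, 100, 5)]  # keys 0, 5, ..., 95
BLOCK = 4  # records per block (assumed)

# Dense index: one entry per record (key -> position).
dense = {key: pos for pos, (key, _) in enumerate(records)}

# Sparse index: one entry per block (first key of the block + its offset).
sparse_offsets = list(range(0, len(records), BLOCK))
sparse_keys = [records[pos][0] for pos in sparse_offsets]

def lookup_dense(key):
    pos = dense.get(key)
    return None if pos is None else records[pos][1]

def lookup_sparse(key):
    # Find the last block whose first key is <= key, then scan that block.
    i = bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None
    start = sparse_offsets[i]
    for k, row in records[start:start + BLOCK]:
        if k == key:
            return row
    return None

print(len(dense), len(sparse_keys))  # 20 5
print(lookup_dense(35), lookup_sparse(35))
```

The sparse index is a quarter of the size here, at the cost of a short scan inside the block.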
Why are we not able to do joins in NoSQL even if the underlying data structure is the same?
A join runs on the compute side, not on the storage engine side.
The data needs to be on the same machine to be joined.
In a sharded DB, you need to bring the data onto one machine, which is very costly (network overhead), so people say there is no join in NoSQL. Instead, people tend to do an approximate join or a partial join.
Geo-sharding: Geo-sharding is a technique used to distribute a database across multiple geographic locations to improve performance, scalability, and availability.
Master-Slave Architecture
This is done to scale the reads.
All writes go to the master.
The replicas pull the writes from the master periodically; this stream of writes is called the replication log.
Most workloads have far more reads than writes.
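A toy sketch of the pull-based replication described above. The `Master` and `Replica` classes are hypothetical, and the log is just an in-memory list; a real database ships this over the network:

```python
class Master:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered replication log of (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class Replica:
    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0  # number of log entries already applied

    def pull(self):
        # Apply only the entries that arrived since the last pull.
        for key, value in self.master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.master.log)

    def read(self, key):
        return self.data.get(key)

master = Master()
replica = Replica(master)

master.write("x", 1)
stale = replica.read("x")  # replica has not pulled yet
replica.pull()
fresh = replica.read("x")
print(stale, fresh)  # None 1
```

The gap between the write and the next pull is why reads from a replica are only eventually consistent.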
Multi-Master Architecture
Problems: conflict resolution, and how will you handle IDs?
Conflict Logic:
First Write Wins
Last Write Wins
Concat (keep both writes, merged)
Not Accept Any (reject both writes)
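The four strategies can be sketched as small functions. This assumes every write carries a timestamp that is comparable across masters, which is a big assumption in practice; the `Write` class and function names are made up for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Write:
    value: str
    timestamp: float  # assumed to be comparable across masters

def first_write_wins(a: Write, b: Write) -> Write:
    return a if a.timestamp <= b.timestamp else b

def last_write_wins(a: Write, b: Write) -> Write:
    return a if a.timestamp > b.timestamp else b

def concat(a: Write, b: Write) -> Write:
    # Keep both values, ordered by time, instead of dropping one.
    first, second = sorted((a, b), key=lambda w: w.timestamp)
    return Write(first.value + second.value, second.timestamp)

def accept_none(a: Write, b: Write) -> Optional[Write]:
    # Reject both conflicting writes; the caller has to retry.
    return None

w1 = Write("A", timestamp=1.0)  # accepted by master 1
w2 = Write("B", timestamp=2.0)  # accepted by master 2 for the same key
print(first_write_wins(w1, w2).value)  # A
print(last_write_wins(w1, w2).value)   # B
print(concat(w1, w2).value)            # AB
print(accept_none(w1, w2))             # None
```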
Distributed Databases
Masters are independent, as each one owns its own shard.
Joins in a Sharded DB: all the relevant data from the shards arrives at a single machine, and then the join happens there. That machine computes the result and sends it back. These queries are good for analytics but not ideal for real-time use cases, as they are very expensive.
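A rough sketch of that gather-then-join flow. Two dicts stand in for two shards, and the table layout and sample rows are invented for illustration:

```python
# Two "shards", each holding part of the users and orders tables.
# users: (user_id, name); orders: (order_id, user_id, item)
shard_a = {"users": [(1, "Asha")], "orders": [(101, 2, "book")]}
shard_b = {"users": [(2, "Ravi")], "orders": [(102, 1, "pen")]}

def gather(shards, table):
    """Collect one table from every shard onto a single machine."""
    rows = []
    for shard in shards:
        rows.extend(shard[table])  # in a real system this is a network transfer
    return rows

def join_users_orders(shards):
    users = dict(gather(shards, "users"))  # user_id -> name
    result = []
    for order_id, user_id, item in gather(shards, "orders"):
        if user_id in users:
            result.append((users[user_id], item))
    return sorted(result)

print(join_users_orders([shard_a, shard_b]))
# [('Asha', 'pen'), ('Ravi', 'book')]
```

Note that each order lives on a different shard than its user, so neither shard could have produced the result alone. The cost is moving both tables over the network.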
Use Cases
The strength of SQL DBs is ACID compliance. Some distributed databases also claim ACID compliance, which means they run distributed transactions, and that makes them slower. If we want strong consistency without that cost, we need a single node.
If you feel there is some inconsistency in my data, please do comment :)
I think B-trees, LSM trees, or any other type of index data structure will store the memory location and not the whole row. Can you help me fact-check that information?
Hats off to the effort. Thanks.
Learned more than in my college database course.
Thanks. Glad you found it useful.
Really engaging talk. Thanks Sukhad for bringing in Arpit !
Thanks for watching. Glad you liked it.
Just amazing!!!
I got so much knowledge for free; I didn't get this much even from very expensive courses.
I love this channel. Do bring more staff software engineers like this who have such great experience.
Will definitely bring.
Wow, what an eye-opening session!
Good podcast, love to hear Arpit talking about system design. Also a fun fact: "depends" is the most used word in the whole discussion😃.
Amazing... Arpit bhai is a gem.
Knowledge and nothing but knowledge!!! Need more such podcasts.
Coming your way
Very well articulated . Thanks for bringing this
Thank you. Glad you enjoyed it.
Quality podcast, learned a lot of new things.
Glad you liked it.
Thanks for the podcast..very quality discussion
Glad you enjoyed it!
Quality podcast! Cleared many of my sql/no-sql misconceptions!
Thank you so much.
Do more podcasts like this on decoding tech. A setup like BeerBiceps might take you to the next level.
Will do more podcasts.
This is what we need ❤ and people are still getting views on DSA vs dev.
Glad you liked the video.
really great....learned a lot.....
Glad you liked it.
Would a Postgres master-slave architecture be eventually consistent even with physical replication rather than logical replication on the storage layer? For example, an Aurora Postgres database.
Yes. In this case too, a WAL file has to be written to the destination, which takes some time.
Citus helps to distribute a Postgres DB, right?
Now does that become eventual consistency?
Yes.
Good podcast. I've been a fan of Arpit for a long time. His BitTorrent playlist was very interesting.
About the podcast, I'd prefer it if you put your video and Arpit's side by side; it would give more of a conversation feel than this.
Thank you for the feedback.
Postgres doesn't store rows in a B+ Tree; it stores them in heap files.
I think B-trees, LSM trees, or any other type of index data structure will store the memory location and not the whole row. Can you help me fact-check that information?