This guy doesn't even take a breath, man... If Eminem was a programmer he would be proud of this guy.
Gold 😂
0:40 Agenda
2:28 History of Data Processing (Anonymous' quote)
2:44 Timeline of Database Technology
5:54 Technology Adoption and the Hype Curve
7:42 Why NoSQL?
9:20 Amazon DynamoDB
12:21 Table
13:50 Partition Keys
14:40 Partition: Sort Key
15:08 Partitions are three-way replicated
16:15 Local Secondary Index (LSI)
17:12 Global Secondary Index (GSI)
18:04 How do GSI Updates Work?
18:56 Scaling NoSQL (Douglas Adams' quote)
19:06 What bad NoSQL looks like...
20:04 Getting the most out of Amazon DynamoDB throughput
21:00 Much better picture...
21:20 Auto Scaling
22:43 NoSQL Data Modeling (Grace Hopper's quote)
23:10 It's all about relationships...
23:51 SQL vs. NoSQL design pattern
26:03 Amazon DynamoDB - Key Concepts
27:24 Tenets of NoSQL Data Modeling
30:41 Complex Queries (Pablo Picasso's quote)
30:51 DynamoDB Streams and AWS Lambda
32:46 Triggers
34:51 Composite Keys (Nicolás Gómez Dávila's quote)
35:10 Multi-value Sorts and Filters
35:21 Approach 1: Query Filter
36:50 Approach 2: Composite Key
37:38 Advanced Data Modeling (Libby Larsen's quote)
37:46 How OLTP Apps Use Data
38:18 Maintaining Version History
40:20 Managing Relation Transactions
41:29 DynamoDB Transactions API
42:39 DynamoDB Table Schema
44:33 Reverse Lookup GSI
45:41 Hierarchical Data
45:47 Hierarchical Data Structures as Items
47:24 Modeling Relational Data
47:34 Modeling a Delivery Service - GetMeThat!
47:53 The Entity Model
48:14 The Access Patterns
48:42 The Relational Approach
49:21 The NoSQL Approach
52:01 The NoSQL Approach (Orders and Drivers GSI)
53:02 The NoSQL Approach (Vendors and Deliveries GSI)
53:45 A Real World Example (Philip K. Dick's quote)
53:52 Audible eBook Sync Service
54:52 Access Patterns
55:18 Primary Table
56:04 Indexes
56:28 Query Conditions
57:36 The Serverless Paradigm (Linus Torvalds' quote)
57:48 Elastic Serverless Applications
59:26 Conclusions
thank you for the index!
You're Awesome
Thanks so much for this! 👍
You rock!
The best talk on any database I've ever seen. Also one of the only talks I have to slow down, rather than speed up, to digest.
Send me action war movies.
I want a martial arts action movie, thank you.
29:20 "NoSQL is not a flexible DB, it’s an efficient DB (and especially at scale). But the data model is very much not flexible, because the more that I tune the data model to the access pattern, the more tightly coupled to that service (ie the DB service tuned to my data access pattern) I am."
Finally someone who states that clearly !
This was brilliant: I think I had my mind blown around 49:15 where he's got a dozen different access patterns supported by a single table and just two GSIs. I've never considered mixing totally different data types (e.g., customer, order, provider) in a PK, or mixing different data types in an SK. It's gonna take a while for me to internalize this, but I really appreciate this eye-opener. The example of ebook/audiobook at 53:50 is also excellent.
This comment prompted me to watch this. Specifically "dozen different access patterns supported by a single table and just two GSIs.... ". Thanks much for the _very_ concise highlight.
Since a PK can contain totally different data types, why bother creating a GSI? We could simply extend the number of data types inserted as the PK and increase the redundancy. Or is the GSI saving space by not duplicating all the attributes?
@@nemetral I think one advantage of a GSI is that its provisioned throughput settings are separate from those of its base table. That can help with scalability by avoiding throttling.
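For the curious, a minimal boto3 sketch (all names hypothetical) of how a GSI carries its own provisioned throughput, separate from the base table:

```python
# Minimal sketch: a GSI declares its own ProvisionedThroughput,
# independent of the base table's. All names here are hypothetical.
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="AppTable",
    AttributeDefinitions=[
        {"AttributeName": "pk", "AttributeType": "S"},
        {"AttributeName": "sk", "AttributeType": "S"},
        {"AttributeName": "gsi1pk", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "pk", "KeyType": "HASH"},
        {"AttributeName": "sk", "KeyType": "RANGE"},
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 100},
    GlobalSecondaryIndexes=[
        {
            "IndexName": "GSI1",
            "KeySchema": [{"AttributeName": "gsi1pk", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
            # Sized independently of the table; per the talk, if the GSI
            # cannot keep up with incoming writes, the table throttles.
            "ProvisionedThroughput": {
                "ReadCapacityUnits": 50,
                "WriteCapacityUnits": 100,
            },
        }
    ],
)
```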
A lot of the material online that "tries" to enlighten people on how to design these types of databases takes the SQL approach. Even a lot of the library support for DynamoDB gets this utterly wrong.
I had the same realization. I've also never considered putting different data types in a partition key or sort key. Mind blowing!
Seeing this talk live blew my mind and made me realize just how little I really know about properly designing Dynamo tables.
Do you mean... a (single) Dynamo table?
This is dense as hell but incredibly informative and oddly thought-provoking. This guy is a great speaker! You can tell there's nothing he doesn't know about databases
Dude... only a few minutes into your introduction I know I'm looking at an expert. Your passion is way up in the sky. Good man you are. Thanks for sharing the lecture.
After many round trips trying to get DynamoDB modelling right, I reached here. Consider these (slides and videos) the bible of modelling DynamoDB. The last 10 minutes are pure pleasure.
Best NoSQL deep dive I have ever seen. Very clear about what makes modeling different. Wonderful.
All I can say is WOW! Truly changed my perspective on NoSQL data modeling. Overloading the PK and SK with non-related datatypes has shown me the power in this approach and sent me down a new rabbit hole. Well done sir! Keep up the excellent knowledge sharing!
Yes! The overloading truly opens up a lot of possibilities!
I am glad this is my first video on dynamoDB.
This is the quality every AWS talk should have. Awesome
Easily one of the best tech talks I've ever seen. I learned so much in the last 15 minutes alone.
Rick Houlihan really seems to talk through this very complex topic quite casually. He's an expert and this was an excellent video!
Incredible talk. I've always wondered when I should use RDBMS vs NoSQL and this video has finally answered it.
yeah, postgres for everything
Finally. I've been trying to wrap my head around this stuff for a while, even after watching this (and last year's) several times. What broke the log jam?
1. Watch at .75 speed. He needs to talk fast to jam it all into the allotted time. Much more absorbable at .75.
2. Primary key doesn't uniquely identify an item. Rather, it identifies a collection of items. Might be a terminology overload between RDBMS and NoSQL though.
I think I have a better understanding of how this stuff can be used. So cool!
That's why I find partition key a better name than primary key.
This is the best video for understanding NoSQL. I always worked with RDBMSs and wondered how NoSQL handles consistency, complex relational data, etc., and had only a cursory knowledge of the concept. By the time I finished watching this video, my understanding had improved considerably. Thanks.
One of the best tech talks on DynamoDB. Learnt a lot.
Nailed it. Came here straight from the 2016 session. Never bored listening to him speak. Wish to meet him some day. Looks like he has practiced DynamoDB, all scenarios, 1M times.
Great video! For the viewers, one interesting thing to note is that while GSIs allow you to satisfy many use cases with a single table, internally they are implemented using multiple tables. This is why you need to provision for GSIs separately. So, the single table is not really a single table under the hood.
@@nononononoo689 Too meta. Can't handle.
Haven't had this kind of high-quality, in-depth information in a long time. Love this. Thanks for the talk.
Probably the only talk which correctly identifies use cases for NoSQL. So many Stack Overflow posters say it's 'flexible', and it couldn't be further from the truth. 'Flexible' complex queries are best served by an RDBMS. Use NoSQL to have 'KISS', extremely scalable applications, and just tell your product manager that plugging in random new stuff to fetch data, outside of what the main APIs were designed for, is the responsibility of analytics.
easily one of the best Re:Invent talks out there. Fabulous content. Thank you. Really opened my eyes as to what NoSQL is all about and how to do many to many relationships with DynamoDB.
**Takeaways:**
- Global secondary index should scale as fast as the table writes, or else the table writes are throttled
- In NoSQL it is an anti-pattern to concentrate queries on a small subset of the data; queries should be evenly distributed (the partitions should be created in such fashion)
- Design your schema based on your access patterns. NoSQL is designed for simple transactional queries. If your access patterns cannot be simple, use SQL databases.
- Tenets of data modeling: understand the nature of the use case (OLAP, OLTP, DSS) / understand the entity relationships / identify the data life cycle (TTL, backups, archival, etc.) / identify access patterns - data sources, aggregations, workflows / use a single table / simplify access patterns
- single table is good enough for many access patterns (talk mentions 40)
- NoSQL for OLTP, DSS at scale, SQL for OLAP or OLTP without too much scale.
I cannot praise this video enough. I've been looking for introductions to NoSQL on YouTube, particularly for how to model, and after dozens of videos I've found this. It is clear, it is full of introductory knowledge, it's got real world examples (and a presenter with real world experience), and is very well presented.
That alone makes for a quality video. But in addition, this guy has a really soothing voice (it gives me that movie vibe) and his work with the quotes gave me something else to think about! Thank you.
He speaks very confidently and keeps it interesting. Thank you.
Love these re:Invent videos... an hour passes by in a snap! Lots of information in just 1hr
The best ever talk on DynamoDB & NoSQL DBs
I think this took me about two hours to watch, with rewinding and pausing to actually understand the diagrams and what he's saying. (To be fair I haven't used Dynamo as much more than a key value store up to this point.)
But anyway, this is super enlightening about Dynamo and NoSQL in general.
Multi-access patterns using different combinations of GSIs is the most useful takeaway tip (big tip actually) from the presentation. Thank you, sir! Well explained.
Favorite session of Re:invent 2018 so far!
He glossed over the race condition at 38:17. If 2 clients are creating new versions (v1 and v2), and those 2 versions get promoted to v0 without a lock/transaction, the properties of v1 and v2 will conflict if v2 is building off v0. You would need to promote to v0 in a transaction to avoid a conflict between draft versions.
I may have left out the need for a conditional check on the current version of the v0 item when committing back to the item. Lots of steps to describe in that workflow, for sure. You could also explicitly set a lock on the v0 item and check it before inserting the new version using the TransactWrite API. Good catch.
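For readers following along, a hedged sketch of that commit step using the transactions API (table and attribute names are hypothetical, not from the talk):

```python
# Hedged sketch of the version-promotion commit described above. Both
# writes succeed or fail together, and the v0 update is conditioned on
# the version the client read, which closes the race described above.
import boto3

client = boto3.client("dynamodb")

def commit_version(doc_id: str, new_version: int, payload: str) -> None:
    client.transact_write_items(
        TransactItems=[
            {   # write the immutable history item, e.g. sk = "v2"
                "Put": {
                    "TableName": "Documents",
                    "Item": {
                        "pk": {"S": doc_id},
                        "sk": {"S": f"v{new_version}"},
                        "payload": {"S": payload},
                    },
                }
            },
            {   # promote to v0 only if nobody else committed in between
                "Update": {
                    "TableName": "Documents",
                    "Key": {"pk": {"S": doc_id}, "sk": {"S": "v0"}},
                    "UpdateExpression": "SET payload = :p, current_version = :new",
                    "ConditionExpression": "current_version = :expected",
                    "ExpressionAttributeValues": {
                        ":p": {"S": payload},
                        ":new": {"N": str(new_version)},
                        ":expected": {"N": str(new_version - 1)},
                    },
                }
            },
        ]
    )
```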
Magnificent talk. I never understood so much about Dynamo and DBs in one talk.
Wow! I went from being interested in NoSQL but having no clue where to start to knowing why we need to switch to it and a pretty good idea how to get there. Thank you
Excellent talk! 47:37 is one of the finest examples of how complex relational data with a dozen access patterns is served by a single NoSQL table with DynamoDB Global Secondary Indexes!
Really amazing presentation. I never understood the one table design but this presentation opened my eyes to a new way to data model.
This one talk gave me the information I needed to understand the benefits of NoSQL databases. NoSQL is not non-relational. The ERD still matters.
Second talk I've watched from this guy, and it's really making me understand how it works.
God damn. This guy knows stuff. Thank you very much.
That is an awesome lecture about NoSQL databases. I wish AWS would include a download link for the slides of the presentation
The best tech lecture I have gone through.
A few additional things to consider: 1. How do we plan for capacity when we have multiple access patterns for the same table; do we just SUM it up? 2. Be careful to analyse your tables for potential hotspots. I think in the exercise of trying to store multiple item types in the same table, keep an eye out for key distributions that are not ideal for DynamoDB.
ReturnConsumedCapacity is your friend when doing capacity planning. Turn it on and run some pattern load. Log the data and look at the cost of each access pattern. After that it is simple math to look at what the table will require over time. Don't just add it up, as most of your access patterns do not run in parallel. Use patterned load representative of real-world traffic, and the capacity log data will show you what the app will need as you scale.
Certainly, as you call out, make sure that you are not exceeding partition-level throughput, and if you are, then simply shard the partitions accordingly to distribute the load.
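A minimal sketch of that profiling step in boto3 (table and key names are hypothetical):

```python
# Sketch of the profiling loop: run a representative query with
# ReturnConsumedCapacity enabled and log the cost per access pattern.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("AppTable")

resp = table.query(
    KeyConditionExpression=Key("pk").eq("CUSTOMER#1234"),
    ReturnConsumedCapacity="TOTAL",
)
# e.g. {'TableName': 'AppTable', 'CapacityUnits': 12.5}
print(resp["ConsumedCapacity"])
```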
I love the fact that much of this is regurgitated from his presentation 2 years ago... and still highly relevant! This time, some great best practice tips, examples and breakdowns. Pretty obvious the NoSQL option is sound and steady, it's all about picking the right use case and how you model the data. Absolutely stellar performance, end to end... again.
What a strange (bad) choice of word
The relational database was invented because of its strength in "relationships", not speed. NoSQL predates it via hierarchical databases, which are basically what all NoSQL databases are (tree structures). Hierarchical databases were also faster than RDBMSs, which tells you speed is not the reason RDBMSs were invented. RDBMSs were adopted only when the computing power to process JOINs (relationships) became feasible.
I *have* run NoSQL clusters at scale and it is true: it's basically a full-time job unless the org has a very, very bright team doing it; you're going to deal with *all* sorts of issues: storage scaling consideration, node scaling consideration, security, to-backup-or-not-to-backup, blah blah blah. I would move to a SaaS in a heartbeat; likely *will* move to a SaaS actually lol.
Any Rick's session is a thumbs up every time!
@25:11 "Sorry I lose my voice... folks" :-). Yeah, I can tell, because you talk much faster than your brain :-). You're so good and passionate about your product. AWS is so lucky to have a guy like you. Kudos!
Perfect summary and examples for the developer associate exam. Really powerful and amazing stuff
What tool is used to generate the heat map at 19:30 and what metrics contribute to "key pressure"?
Amazing video. Great info in a nutshell. Loved the initial slides on data pressure and why NoSQL.
Mind blown: one table for multiple partition keys. It will take a lot to get to grips with this approach. I can see now why you don't need that many GSIs in DynamoDB.
Yup, in the end your data will pretty much be unreadable without the modeling documentation
Amazing how NoSQL is described here... Is there a way to find the actual design of the 2 DynamoDB examples anywhere? That would be great as a resource to wrap our heads around the concepts...
Great presentation! Definitely mind-blowing to put everything into one single table!
“Those who cannot remember the past, are condemned to repeat it.”
― George Santayana, The Life of Reason: Five Volumes in One
This is a groundbreaking article. I read a book on a similar subject that altered my perceptions. "AWS Unleashed: Mastering Amazon Web Services for Software Engineers" by Harrison Quill
Nicely defined many concepts. Thanks a lot Rick.
Aha, now I understand NoSQL: it is all about denormalizing your data for all of the query use cases, and then overlaying all of these denormalizations into one table* by clever overloading of partition key and sort key.
* Or "a few" tables, if you count GSI's as automatically-managed separate tables, which they are under the hood.
At 30:25 it's said not to create multiple tables, yet when you follow official AWS courses on DynamoDB like the one at www.edx.org/course/amazon-dynamodb-building-nosql-database-driven-applications they're creating 6 tables for a simple CRUD prototype application.
Which one makes the most sense from a compute cost perspective? Structure the data on one table and it takes one query to retrieve all the items required. Structure it on 6 tables and it will take 6 queries with multiple round trips, result set iterations, and client side grouping/ordering. The single table design is obviously much more efficient.
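To illustrate the one-query case (item and key names here are invented for the example):

```python
# Illustrative single-table read: one partition holds several item types,
# so one Query fetches everything the page needs and the client fans the
# items out by sort key.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("AppTable")

items = table.query(KeyConditionExpression=Key("pk").eq("CUSTOMER#1234"))["Items"]

profile = [i for i in items if i["sk"] == "PROFILE"]
orders = [i for i in items if i["sk"].startswith("ORDER#")]
addresses = [i for i in items if i["sk"].startswith("ADDRESS#")]
```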
My issue with this is scaling a many-to-many relationship: you're duplicating data in many places, but what if I need to update/edit this data? I would also have to do that in many places, and that sounds like a scaling issue.
This is a common objection, but it is an extremely rare requirement. The vast majority of use cases that require denormalization of N:N relationships have immutable data on one side or the other, e.g. event/status/type definitions. The ones that don't either use data from one side or the other that changes infrequently or the use case requires history to be maintained if those values change, e.g. user name or shipping address. Bottom line is if you do have the use case for maintaining N:N relationships with frequently updated data then NoSQL is probably not for you, but using that as a reason to choose RDBMS when 99% of workloads don't require that is not a valid argument.
Amazing presentation... and on a single breath too!
Totally changed my take on DynamoDB. Helped a lot. There are two doubts that I have:
1. Is it a good practice to keep a copy of the same data with different Sort Key? Doesn't it take more storage?
2. How should I handle updates to the resources? Should I use the versioning mechanism that you have shown?
Thanks
1. Yes but "storage is cheap" is the premise of NoSQL, as he says.
2. Sounds like the versioning mechanism was especially useful for simulating transactions before DynamoDB had the transaction API. But now, unless the old versions really are needed by the OLTP-style queries of the applications/services, probably not justified keeping them in the table.
The argument about storage cost vs CPU cost is a fair argument, but it's a bit of a stretch to claim that's the only consideration on whether to use SQL or NoSQL
Best session at reInvent 2018! Bravo!
Although, what happens when you add a query pattern as your application grows? That will surely lead to a re-architecture of the whole data model, and possibly of the application itself. The data model he shows at 49:23 is very specific to the query patterns he knows at the time of modeling this app.
Adding patterns almost never requires a re-architecture of the entire model. Usually this involves decorating certain items with additional attributes, adding indexes, or modifying existing values. Unless you were completely off target when you built the app the patterns you designed for are not going away.
Favorite session so far as well !
Very well presented, although I think some of the shade cast at RDBMS was overblown, and a couple of things were wrong. One, I don't believe anyone, today, is building to rigid definitions of third normal form. No one is joining to a state and country table to build a complete address. Storing duplicate values is an optimization pattern in both NoSQL and SQL. Two, some RDBMSs have storage patterns which allow content from multiple tables to be collocated on the same block. Therefore, although it looks like joins between ALL_TABLES, ALL_TAB_COLUMNS, ALL_TAB_INDEXES, ALL_TAB_*, it's not. I don't need NoSQL to collocate data from multiple entities if that's what best fits my use case.

Like the presenter said, it really helps to understand your technology before you use it. I think 99% of performance problems with existing databases come from a lack of expertise of the user, not a lack of capability of the tool. Right now, everywhere, low-skilled developers are implementing bad NoSQL applications. Simply because it can scale to greater heights of performance than an RDBMS doesn't guarantee that it will.

When I find performance issues with an RDBMS, it's almost always a seven-page-long query, often running multiple times per second and returning the same result. Clearly, if it takes seven pages to describe what you want the database to do, you've gone horribly wrong somewhere. What are the chances that someone willing to write seven pages of SQL, fully expecting it to run smoothly, then throwing up their hands in exasperation when it doesn't, will be any more successful with Dynamo?
It seems to me that data warehouse experts say relational databases (RDBs) are poor at OLAP at scale, and NoSQL experts say that RDBs are poor at OLTP at scale.
It seems to me that RDBs can't scale well; these newer technologies are each taking a portion of the RDB workload and making it scale well.
This is a really mind-blowing talk.
Great talk about data modeling with DynamoDB.
Impressive stuff. I need to soak all of this up :)
GSIs are eventually consistent. This is a major issue. I can model the data in DynamoDB, but because GSIs are NOT strongly consistent, it creates many issues/challenges for an application requiring strong consistency.
Just looking at DynamoDB Streams and Lambda as a way of building a far better-performing DB than any SQL database ever was, was an eye-opener. But the rest of the presentation... mind blown.
Learned a lot! Helped me to resolve a complex issue!
Speaker should host an auction selling database technologies, 10/10 would buy
I am wondering what happens when technology with very low-cost computing resources arrives, maybe something like truly cost-effective quantum computing. Would we need to rethink and go back to relational DBs, or something more compute-hungry but efficient?
Thanks for the talk, it was really helpful!
❤️
So, I'm now in the position that I didn't know upfront that we would need another item type, and so I named the partition key attribute something that will not be applicable at all for the new item type. For example, I'm now storing animals with a partition key like AnimalID, but I didn't know that I would also need to store the veterinarian in the same table. So probably I will need to create a new table, migrate the data, and fix references to the attribute name?
So I guess it's arbitrary and therefore not shown in any of the talk's examples, but how would you name the partition key's attribute if the key can be different things for different items?
pk is typically the best practice
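A tiny sketch of that convention (all names hypothetical):

```python
# Keep the key attributes generic ("pk"/"sk") and encode the entity type
# in the value, so a new item type like the veterinarian needs no table
# migration: the attribute name stays meaningless on purpose.
import boto3

table = boto3.resource("dynamodb").Table("AppTable")
table.put_item(Item={"pk": "ANIMAL#A123", "sk": "PROFILE", "name": "Rex"})
table.put_item(Item={"pk": "VET#V456", "sk": "PROFILE", "name": "Dr. Smith"})
```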
This man is a genius!
Excellent presentation!
I am starting to wrap my head around this concept and understand it. Yet, on the 13th, AWS launched DocumentDB, which sounds like their version of MongoDB. Is it going to replace DynamoDB, or will it have different use cases? For basic consumer apps, what will be the best direction to go with?
DynamoDB is not going away. There are strong opinions in the NoSQL community about differences between so called "Document" and "Wide Column" API's. Those who really know the technology understand that there is no difference when you are handling big data workloads and that is what NoSQL was designed for. The patterns I use apply to all NoSQL backends in some variation. AWS is providing a choice for those who believe they need to have one.
As far as which one to go with I think that comes down to the cost to support the workload. Take a look at both and see which one is the most cost effective for what you are trying to do. Depending on the workload it might be faster to develop on DocumentDB at first but you will probably be introducing scale challenges similar to what I see commonly in MongoDB apps that will force you to do things correctly sooner or later. Take the time up front to model your data for scale and then make the call based on a meaningful TCO analysis.
Access patterns change? You're building a data model which works today, but when the business requires new access patterns which no longer fit, how do you handle that?
Great explanation! Quick question: on the hierarchical data demonstration you use the USA as a partition key. Is that a good partition key in terms of uniqueness? Would that become a hot key?
Potentially the partition could become hot depending on the number of items in the tree. If you are trying to move more than 1K WCU/3K RCU in/out of the partition, you would need to shard the partition key and split the data across more than one logical key. The data would then be processed in parallel across the logical keys. To ensure uniqueness, the last item in the composite sort key should be a unique location or ID, like a desk location or userId.
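A rough sketch of that sharding idea (shard count and key format are assumptions, not from the talk):

```python
# Spread one hot logical key across N physical partition keys on write,
# then fan the reads back in across the shards.
import random
import boto3
from boto3.dynamodb.conditions import Key

SHARDS = 10
table = boto3.resource("dynamodb").Table("AppTable")

def write(country: str, item: dict) -> None:
    # e.g. pk = "USA#7"; writes spread across 10 partition keys
    item["pk"] = f"{country}#{random.randrange(SHARDS)}"
    table.put_item(Item=item)

def read_all(country: str) -> list:
    items = []
    for shard in range(SHARDS):  # in practice, issue these in parallel
        resp = table.query(
            KeyConditionExpression=Key("pk").eq(f"{country}#{shard}")
        )
        items.extend(resp["Items"])
    return items
```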
A great presentation!
Fantastic speaker.
Spent the last 5 years trying to explain this to every single developer we have assigned to use DynamoDB, and still zero get it. Relational databases have destroyed any sensibility of data storage.
This guy is insane. Wow!
Slides if you cannot read it
www.slideshare.net/AmazonWebServices/amazon-dynamodb-deep-dive-advanced-design-patterns-for-dynamodb-dat401-aws-reinvent-2018pdf
How do you generate a unique key for sorting, just like autoincrement in an RDBMS? Should we use a UUID for it?
Why would you want to sort by an arbitrary value?
12:45 The table graph is not very accurate since both partition key and sort key are required. The second row is not valid.
Nice talk! One question: how can I make my concurrent writes to DynamoDB thread-safe?
You can use optimistic concurrency with conditional writes: check that a version or lastUpdated timestamp attribute contains an expected value, or the write will fail.
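A sketch of that pattern in boto3 (table and attribute names are hypothetical):

```python
# Optimistic concurrency: the update succeeds only if the version
# attribute still holds the value this client last read.
import boto3
from botocore.exceptions import ClientError
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("AppTable")

def safe_update(key: dict, new_status: str, expected_version: int) -> bool:
    try:
        table.update_item(
            Key=key,
            UpdateExpression="SET #s = :s, #v = #v + :one",
            ConditionExpression=Attr("version").eq(expected_version),
            ExpressionAttributeNames={"#s": "status", "#v": "version"},
            ExpressionAttributeValues={":s": new_status, ":one": 1},
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # lost the race: re-read the item and retry
        raise
```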
The guy forgot that before relational databases, NoSQL already existed on mainframes based on the CODASYL model. In the 1980s, IBM and Univac had very good DBMSs based upon hash keys.
I did not forget that, in fact I often include that in my session when I have more time than an hour to talk.
Well it really is just a giant, managed hashmap
Can you please explain why empty strings aren't allowed in non-key attributes? Or at the very least why they can't be in nested attributes? forums.aws.amazon.com/thread.jspa?threadID=90137
16:30 The partition key shouldn't be OrderID; it should be CustomerID.
I am surprised nobody has mentioned that. I said OrderID but meant CustomerID, as I indicated a few seconds later.
I see how this is all working with a single table, but... This feels like abuse (from a cognitive load perspective). A little too much optimization towards implementation details.
Quite interesting though.
The only way to scale is to optimize the data model for the workload. RDBMS gives query flexibility at the cost of CPU; because of this it cannot scale cost effectively. If you do not denormalize, then queries are expensive on big data; that is just a fact. NoSQL databases are built on the idea that everything lives in the same collection/table/key space and indexes will be used to produce aggregations of items based on simple queries like "SELECT * FROM TABLE WHERE X=[someValue]". This reduces the CPU cost dramatically and enables applications to scale cost effectively.
That's the whole point of it: getting deep and dirty with your database. This is not for the faint of heart or "one man team full stack" developers. I will admit, though, there could be more community tools that abstract the difficulty of it for more common usage.
@@ppgab What difficulty? You need to think carefully about your access patterns and design your keys accordingly. This is not necessarily a very difficult task; that you may not get it right the first time is common in all things automation.
He said that "NoSQL is not flexible but very efficient", or something like that.
39:59 What would happen if I want to update the status of the game, since the PK cannot be changed?
I assume you are asking about the Composite Key section using status/date to form a faceted list of items sorted by state and date. If you look at the charts the Composite Key is defined on a Secondary Index, not the table. Key values in Secondary Indexes can be updated, so if the items are stored on the table with a Partition Key of SessionID then you can define the GSI on PK = UserID, SK = StatusDate and when the status is updated the items will be re-sorted on the GSI automatically.
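A minimal sketch of that update (key and attribute names are assumptions):

```python
# The table is keyed by SessionID; a GSI keyed on (UserID, StatusDate)
# re-slots the item whenever the composite StatusDate attribute changes.
import boto3

table = boto3.resource("dynamodb").Table("GameSessions")

def set_status(session_id: str, status: str, date: str) -> None:
    table.update_item(
        Key={"pk": session_id},
        # e.g. StatusDate = "PENDING#2018-11-28"
        UpdateExpression="SET StatusDate = :sd",
        ExpressionAttributeValues={":sd": f"{status}#{date}"},
    )
```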
@@MrShredder2011 Sorting does not scale. You mean the item is removed and a new item is inserted into the B-tree.
Great presentation, thank you. I happen to be a fullstack developer working mostly with RDBMS databases, and thus I keep thinking about how the data modelling will impact UI design. Where does DynamoDB leave the drop-down lists on the UIs? Usually drop-down list data gets sourced from a table somewhere (i.e., if not static). For example: person & country tables. You are creating a person and therefore you want to pick the person's nationality (i.e., country_id in the country table). My RDBMS would look like this: 1. person(person_id, first_name, nationality_id, ...) 2. country(country_id, name, ...). NOTE: data in the country table may keep changing (updates & inserts). DynamoDB dictates that all this data goes into one table. How will I get my countries data to present in the drop-down list? Or generally, how will the countries' UI component use the table (create, read, update)?
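One common single-table approach (a sketch under assumed names, not the only answer): keep the countries as their own item collection and read it with one Query:

```python
# Store each country as an item in a dedicated partition and load the
# whole dropdown with a single Query (sorted by sort key for free).
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("AppTable")

# maintain the list (create/update are plain PutItem calls)
table.put_item(Item={"pk": "COUNTRY", "sk": "DE", "name": "Germany"})
table.put_item(Item={"pk": "COUNTRY", "sk": "FR", "name": "France"})

# populate the dropdown: one query returns the full, sorted collection
countries = table.query(KeyConditionExpression=Key("pk").eq("COUNTRY"))["Items"]
```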
16:15 local and global secondary indexes
Awesome, mind blown!
Does `StartsWith` have any performance impact if you extensively use it to query hierarchical data?
No, not really. Using begins_with is not a significant source of latency even in the rare cases where items that satisfy that query are spread over different partitions.
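For illustration, a begins_with query in the style of the talk's hierarchical example (the exact key layout is an assumption):

```python
# Hierarchical data: country as partition key, "STATE#CITY#OFFICE" path
# as sort key; prefix-matching the path walks the hierarchy.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Locations")

# all offices in one city: prefix-match the hierarchical sort key
offices = table.query(
    KeyConditionExpression=Key("pk").eq("USA")
    & Key("sk").begins_with("WA#SEATTLE#")
)["Items"]
```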