This channel deserves to be known by more people.
Thank you so much for your kind words!
Exactly
I clearly understood GSI after watching this video. Thanks for a perfect explanation.
Most welcome!
One stop shop for AWS. THIS IS PROBABLY THE BEST CHANNEL I HAVE COME ACROSS..
Thanks so much Satya! Welcome to the channel!
I have read so many articles, watched so many videos on DynamoDB but none of them come close to the clarity of your explanation. U got yourself a new subscriber. Great job!
Thank you so much Silver. Your kind words really mean a lot to me :)
You're hands-down the best instructor on YouTube. Even complex topics are simplified and your videos are concise and straight to the point. Thank you good sir!
Really appreciate your content. I was working on a DynamoDB project with a friend a while back and we were struggling with how to handle some of these queries. We always check documentation, but videos like yours are extremely helpful for making it plain and clear. The visuals are particularly helpful.
Thanks so much for the kind words Alec! I'm so glad these videos are helping out so many.
Cheers,
Daniel
This is the most simple and clear explanation I've found. Thanks.
Glad you enjoyed!
You always talk about what developers really need to understand. Thanks, man.
I finally understood what GSIs are. Thank you so much, perfect explanation.
Great to hear!
One of the best explanations(video, audio, text) that I have seen or read. Thank you so much @BeABetterDev.
Thank you so much for your kind words Ajit!
Wow. I watch a lot of tech, cloud, Linux, etc. videos, and honestly this is one of the best videos I've seen. Clear, concise, to the point. Subscribed!
Thank you so much for your kind words!
This is one of those tutorial videos where you just KNOW that you should immediately subscribe. This channel is a huge help for me a beginner and I look forward to watching more of your videos.
I was totally confused about GSIs before watching your video, but after watching it the concept is clear. Thanks a lot for sharing the information in such a simplified manner.
Happy to help!
@@BeABetterDev Very well explained and it covers a lot of ground as well. One thing I am still confused about: can we query on a GSI partition key and another attribute from the GSI in one query, without it resulting in a scan of the table?
E.g. let's say the query is "give me all the USA accounts that were created in the last month". Will this query hit just the GSI, or the GSI + the primary key index?
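A minimal sketch of the request this question describes, assuming the GSI from the video uses OriginCountry as its partition key and CreationDate as its sort key, stored as ISO-8601 date strings. The table name Accounts and the index name OriginCountry-CreationDate-index are made up for illustration. Because both conditions are on the index's own keys, this is a Query against the GSI alone, not a scan, and it returns only the attributes projected into the index.

```python
from datetime import date, timedelta


def last_month_range(today: date) -> tuple:
    """Return (first day, last day) of the previous calendar month as ISO strings."""
    first_of_this_month = today.replace(day=1)
    last_of_prev = first_of_this_month - timedelta(days=1)
    first_of_prev = last_of_prev.replace(day=1)
    return first_of_prev.isoformat(), last_of_prev.isoformat()


def build_country_query(country: str, today: date) -> dict:
    """Build low-level Query parameters for the (hypothetical) country/date GSI."""
    start, end = last_month_range(today)
    return {
        "TableName": "Accounts",                          # hypothetical name
        "IndexName": "OriginCountry-CreationDate-index",  # hypothetical name
        # Equality on the index partition key, range condition on its sort key.
        "KeyConditionExpression": "OriginCountry = :c AND CreationDate BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":c": {"S": country},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }


params = build_country_query("USA", date(2024, 3, 15))
# With boto3 this dict would be passed as: boto3.client("dynamodb").query(**params)
```

ISO date strings sort lexicographically in date order, which is why BETWEEN works on them as plain strings.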
I had a lot of confusion and misconceptions, and your one video cleared them all up ... Thumbs up ... Thanks ...
Glad that I found your channel! Good and concise explanations for DynamoDB which saves a lot of reading of the rather exhausting documentation.
Thanks zPiranhaz!
Your comment is exactly why I decided to start this channel - I find the existing AWS documentation makes it difficult to understand what's important and what you can gloss over.
I hope to continue making videos like this one highlighting the important concepts.
Thanks again for the support.
This makes it so easy to clearly understand GSI and LSI !! Keep up the good work.
Thanks, will do!
Wow I am binge watching your videos. Tech made simple and fun. Subscribed!!
It feels great to grasp everything in a video! obviously due to simple and clear explanations!
Great to hear!
Very helpful while I was writing my Terraform script and trying to understand whether I need a GSI or not. Subscribed!
All your DynamoDB videos are informative and hit exactly what one needs to know.
Thanks so much Rovshan! I try to cover the most important topics. Keep a look out for more DynamoDB videos coming soon!
I have never seen a video with 0 Dislikes. Loved the video and very informative. Thank you so much.
Glad you liked it!
@@BeABetterDev Usually never comment on vids but your content is awesome, keep it up and if you create a paid course let us know!
This is the best explanation of GSI
Thanks so much for the support!
Very clearly explained. I have already started going through all your videos in the playlist.
Thanks so much Yogitha. If you have any recommendations of videos you'd like to see do let me know!
The author explains the concepts in a very nice way... Awesome thanks for posting nice stuff..
Thank you Satya!
This clarified a lot of misconceptions I had around GSIs. Many thanks!
Very welcome!
Wow, this is one of my best channel for learning tech. Can you please write a query to get data from GSIs using python code?
Thanks so much for your kind words Komal!
That is a great video. Thank you. One thing you might want to fix: you mentioned it is the same as having another partition key, however it would be better if you also mention that gsi does not need to be unique.
Thank you! for mentioning this
Great video! It clearly explains the concepts involved. I recommend it.
Excellent video indeed - well presented. It would be great if you would please have a video on Local secondary index and how it compares to the GSI. In addition, the concept of parallel scan and how it will remediate scan operation, etc. Well done!
Hi Shouki,
Taking into account your suggestion, I recently released a video on LSIs. You can check it out here: th-cam.com/video/Y8gMoZOMYyg/w-d-xo.html
I'm definitely going to be doing a video on parallel scans in the future. Look forward to it soon!
Great explanation! much better than the course I've bought. Tks a lot!!
Thanks Renan!
Great explanation. I do have a question though.
At 5:38, what happens when another record with originCountry = Germany is inserted? How is it stored in the OriginCountry GSI table (with OriginCountry being the partitionKey)? Does the index table store multiple records having the same partitionKey?
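A toy in-memory model of the situation in this question. It is not how DynamoDB stores data internally, and the AccountId attribute is an assumed primary key for the base table. The point it illustrates: only the base table's primary key has to be unique, while a GSI can hold any number of items sharing the same index key values.

```python
from collections import defaultdict

# Base table: the primary key (AccountId, assumed here) is unique per item.
base_table = {
    "acc-1": {"AccountId": "acc-1", "OriginCountry": "Germany", "CreationDate": "2020-01-05"},
    "acc-2": {"AccountId": "acc-2", "OriginCountry": "Germany", "CreationDate": "2020-01-05"},
    "acc-3": {"AccountId": "acc-3", "OriginCountry": "USA", "CreationDate": "2020-02-11"},
}


def build_index(table: dict, partition_attr: str, sort_attr: str) -> dict:
    """Group items by the index partition key, ordered by the index sort key."""
    index = defaultdict(list)
    for item in table.values():
        # Items missing an index key attribute are simply not indexed.
        if partition_attr in item and sort_attr in item:
            index[item[partition_attr]].append(item)
    for items in index.values():
        items.sort(key=lambda entry: entry[sort_attr])
    return dict(index)


gsi = build_index(base_table, "OriginCountry", "CreationDate")
germany_accounts = [item["AccountId"] for item in gsi["Germany"]]
```

Both Germany accounts end up under the same index partition key, even though they also share the same CreationDate.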
Thank you for making GSI so easy to understand!
Happy to help!
Smooth explanation, I will surely be checking your videos first if I'm confused about anything.
Great video... You can make these videos again and again every year as an update.
At last found answers to my questions .... great explanation 👏👏
Glad it was helpful!
Wonderfully explained! Thanks a lot
You're very welcome!
Best tutorials on dynamodb
Thanks Mej!
Excellent explanation - thanks and keep up the good work!
Thanks so much Storming Barney!
Excellent explanation. Cheers.
I loved your explanation... its so intuitive and informative. Subscribed :)
Thanks Shubho and welcome!
Great video. If we use country as the partition key, which is not in a random form, how do we guarantee uniform distribution for our GSI table?
the videos are really helpful, topics are really explained well, subscribed, thnx
Glad you like them!
Awesome explanation, thank you very much.
I am a "Better Dev" with your channel 🙃
I'm glad I can help Simone!
Stellar explanation, thank you _so_ much 🙏
fantastic content and delivery, thanks!
Amazing explanation bro.
Great content. Please keep up the awesome work 👍🏻
Thank you Likhita!
Great video man! Keep up the good stuff!
Thank you Michal! I am very glad you enjoyed
awesome!!! thanks! Really appreciate your content.
Great explanation.
Thanks Shaleen!
This is exceptional. Can't thank you enough!
You're very welcome!
Any insights on the advantage of using something like DynamoDB (which I'd consider more of a document/key-value hybrid, where you essentially have to create GSI's and stuff to query based on attributes) vs. MongoDB, which doesn't have the same restrictions? I guess I'm trying to understand a use case in which DynamoDB would be superior....
This channel is awesome
Thanks so much Jhony!
Thanks for the video. Creating a GSI will occupy double the space in the database?
Can you please explain we can use GSI for implementing Priority Queue ?
This really cleared a lot of things. Thanks.
Glad it was helpful!
This video was so good! You deserve more subs!
A quick question though: I noticed that having a GSI charges you, even if you are under the Free Tier. Since I'm early on the development phase, I did not want to be paying for it. Should I just create another table and maintain it myself instead of making use of the GSIs?
Hi Yago, you can indeed set up a separate table as an alternative until you are ready to set up your GSI.
@@BeABetterDev thank you so muuch
You're very welcome!
As mentioned in the video, it's better to use a uniformly distributed key for a GSI. The example uses country, so I guess that is not a good choice?
Hi Peng! Great point. Yes this example was more for demonstration purposes. Glad you caught on!
@@BeABetterDev Hi! Thank you for answering this. I have the same concern Peng mentioned. In this case, if I really need a GSI that lets me query on country and sort the results by lastUpdatedAt, what should I do?
Really great one!! Pro 👍👌
Thank you! Cheers!
Question on the naive approach: "FilterExpression" is applied after DynamoDB has completed the "Scan" operation, right?
correct!
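A toy simulation of that ordering, written in plain Python rather than against DynamoDB. The ScannedCount and Count names mirror fields in a real Scan response; everything else is illustrative. Read capacity is consumed for the items scanned, not for the items that survive the filter.

```python
def scan_with_filter(items, predicate):
    """Toy Scan: read every item first, then apply the filter to what was read."""
    scanned = list(items)
    matched = [item for item in scanned if predicate(item)]
    return {"Items": matched, "Count": len(matched), "ScannedCount": len(scanned)}


# 100 made-up accounts, 10 of which are from the USA.
accounts = [
    {"AccountId": f"acc-{n}", "OriginCountry": "USA" if n % 10 == 0 else "Germany"}
    for n in range(100)
]

result = scan_with_filter(accounts, lambda item: item["OriginCountry"] == "USA")
```

Here 100 items are read to return 10, which is the cost a GSI on OriginCountry avoids.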
Great explanation
Nice video man, keep going, very helpful
Thank you! I appreciate the support.
Thanks a lot for this wonderful content
Thanks for the support tony!
Thank you, this was super useful!
Good stuff! Thanks for the content! Cost is often overlooked in these design patterns. It basically doubles your storage, correct?
Amazing talk as always. Love your videos and I have learnt a lot. One quick question: I created a GSI on one of the date columns, but since I have to do a BETWEEN, I had to use a scan operation. Is this a bad way to get data? I can use the index with a query, but that won't let me do any BETWEEN. So is a table scan on a GSI an expensive operation?
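One possible way around this, sketched under stated assumptions. A Query needs an equality condition on the index partition key, and BETWEEN is only allowed on the sort key, so a GSI keyed on the date alone cannot serve a range. The sketch assumes a different, hypothetical GSI whose partition key is a coarse bucket attribute (YearMonth, e.g. "2024-01") and whose sort key is the full date (EventDate). The table, index, and attribute names are all invented. A scan of the index, by contrast, reads every item in it, so its cost grows with the size of the index.

```python
from datetime import date


def month_buckets(start: date, end: date) -> list:
    """List the 'YYYY-MM' buckets touched by the inclusive range [start, end]."""
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return buckets


def build_range_queries(start: date, end: date) -> list:
    """One Query request per month bucket, each with a BETWEEN on the sort key."""
    return [
        {
            "TableName": "Events",                     # hypothetical name
            "IndexName": "YearMonth-EventDate-index",  # hypothetical name
            "KeyConditionExpression": "YearMonth = :b AND EventDate BETWEEN :s AND :e",
            "ExpressionAttributeValues": {
                ":b": {"S": bucket},
                ":s": {"S": start.isoformat()},
                ":e": {"S": end.isoformat()},
            },
        }
        for bucket in month_buckets(start, end)
    ]


queries = build_range_queries(date(2023, 11, 20), date(2024, 1, 10))
```

The trade-off is a fan-out of one request per bucket, so the bucket size should roughly match the ranges the application usually asks for.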
As always love your vids.
what does this mean:
"GSI partition key requires uniform data distribution"
Hi Matt,
What I mean is that ideally the values of GSI should be unique - like a UUID, and not categorical, i.e. ORDERED, SHIPPED, DELIVERED. The latter will cause all records with the same value to be located on the same partition. Keep in mind there is a limited amount of throughput you can achieve per partition, so this may become a problem for some applications.
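When a categorical value such as SHIPPED really is the thing you need to query by, one commonly described mitigation is write sharding: append a small suffix to the key so that one logical value spreads over several partition key values. The sketch below assumes an order id is available to derive the suffix from; the shard count and the key format are illustrative choices.

```python
import hashlib

SHARD_COUNT = 4  # assumed; tune to the write volume of the hottest value


def sharded_key(status: str, order_id: str, shards: int = SHARD_COUNT) -> str:
    """Derive a stable shard suffix from the order id, giving keys like 'SHIPPED#<n>'."""
    digest = hashlib.sha256(order_id.encode("utf-8")).digest()
    return f"{status}#{digest[0] % shards}"


def all_shard_keys(status: str, shards: int = SHARD_COUNT) -> list:
    """Every partition key value a reader must query to see all items for a status."""
    return [f"{status}#{n}" for n in range(shards)]


key = sharded_key("SHIPPED", "order-1001")
fan_out = all_shard_keys("SHIPPED")
```

Writes are spread across the shard keys, while a reader has to issue one query per entry in fan_out and merge the results.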
Any common design patterns you can recommend for the staleness issue. That would be really appreciated
very helpful. Thank so much!
You're welcome!
Thank you for the explanation.
Glad it was helpful!
So, can you update the origincountry field itself? For example, you already had a GSI on origincountry and decided to update "Germany" to "Italy". Will it also update the GSI's partition key from Germany to Italy?
Great video. Really helped. Thank you so much for your efforts. :)
Thanks Sakshi! Glad you enjoyed.
So good. How do I send you a tip. Great content.
Thanks so much Jonathan for your kind words! If you would like to make any contribution, I have a patreon account here: www.patreon.com/awssimplified . Thanks again and have a great week!
@@BeABetterDev I'm a patron! Thanks for putting out such great content.
Thanks Jonathan, really appreciate the support!
best dynamo video
Awesome, thanks I’ve subscribed.
Welcome!
You mentioned that it is allowed only 20 GSI, is that by table or by DB?
Hi Carlos, that's per table.
Very nice video
Thanks!
Awesome, thanks!
Very useful video, thanks! But I have one question about queries by GSI. Should we always use an IndexName option in query request, if we are looking for specific value of this GSI? I thought that KeyConditionExpression is enough, but my teammate says that IndexName must be present with value of this GSI name....
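On the IndexName question: a Query without IndexName runs against the base table, so its KeyConditionExpression has to use the table's own partition key. To query by a GSI's key, IndexName has to be supplied. A small sketch of the two request shapes, where the table, index, and AccountId names are assumptions for illustration:

```python
from typing import Optional


def build_query(table: str, key_attr: str, value: str,
                index_name: Optional[str] = None) -> dict:
    """Build low-level Query parameters, targeting an index only when one is named."""
    params = {
        "TableName": table,
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": {"#k": key_attr},
        "ExpressionAttributeValues": {":v": {"S": value}},
    }
    if index_name is not None:
        # Without this entry the request is evaluated against the base table's keys.
        params["IndexName"] = index_name
    return params


by_primary_key = build_query("Accounts", "AccountId", "acc-1")
by_gsi = build_query(
    "Accounts", "OriginCountry", "Germany",
    index_name="OriginCountry-CreationDate-index",
)
```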
Need a clarity - as it is allowing up to 20 more secondary tables for a primary table, will it not consume additional space? Is that not a drawback?
Hi Swapan. It is definitely a drawback and can affect the amount dynamo ends up costing.
thanks! very helpful
You're welcome!
I guess there is no automatic sync between the primary table and the GSI when removing or updating data in the primary table?
Hi Sergey, the sync happens automatically for you behind the scenes.
Local Secondary Indexes or Global Secondary Index
Local Secondary Indexes still rely on the original Hash Key. When you supply a table with hash+range, think about the LSI as hash+range1, hash+range2.. hash+range6. You get 5 more range attributes to query on. Also, there is only one provisioned throughput.
Global Secondary Indexes define a new paradigm - different hash/range keys per index.
This breaks the original usage of one hash key per table. This is also why when defining GSI you are required to add a provisioned throughput per index and pay for it.
Hey, this is a great video. I have a use case that I am having a hard time modeling with DynamoDB. I have an application that contains a list of items in your area (let's say it scales up to millions), and each user will be shown a list of items, but each item can only be shown to them ONCE. So after a user has seen an item, he/she shouldn't be shown it again. So I need to be able to query for all items in my area that the user has not seen. DynamoDB is good for querying against values I know about, but what about unknown values? I thought about using Bloom filters to accomplish this, but I'm not sure if this is a good use case for DynamoDB. I am trying to avoid using a scan and filter expression since that would be costly. Thanks for your help.
Hey James
Thanks for the question. Bloom filter sounds like a good idea at first glance. However, it doesn't really help you too much since when you're pulling items in your area, you would need to pull the items first, followed by attempting to negate each one from your result set using your bloom filter in order to determine which ones are candidates to show. For instance, if when querying areas, you pull 10 records, you need to run your bloom filter on each one (possibly 1 by 1) to figure out if you show all 10, a subset of the 10, or none if they have all already been seen.
Thinking about this a bit more, you definitely will need to keep track of which records each user has seen. I don't think there's any way around that based on your use case. However there is a question of how you store this information for optimal lookups. The two options are a) 1 record for each customer, and a column that contains a list that indicates each one they have already seen (bad idea), or b) 1 record for each customer + area they have seen combination.
Based on this, I don't think Dynamo is the best option for your use case since you will likely need to do double queries (1 for regions, 1 to determine view candidates). This will likely incur some significant costs as your dataset grows.
If this application does not have scaling concerns, I would suggest using an RDS-based solution such as AWS Aurora + some criteria-based queries. If scaling is a concern, you should look into using AWS Elasticsearch. Elasticsearch is excellent at performing "fuzzy searches" like you described and allows you to offload much of the logic you need to perform onto the data store itself. Additionally, it is horizontally scalable so dataset growth shouldn't be too much of a concern.
Hope this was helpful.
@@BeABetterDev Yes, this absolutely is helpful. Thank you! I really enjoy your videos. Please keep it up.
You are very welcome and thank you for the support!
Race condition sounds scary. I would like to know more about them. I am sure there will be some best practices out there to properly use the API in order to not fall into it.
Excellent!
Should we not have a different table for each collection?
great explanation
Thank you so much.
wouldn't creating an index that alters primary keys potentially violate normalization rules if not adhered to?
thanks , this is really good.
Thanks for the support!
If I have to create a Global Secondary Index on map-type data or nested string data, how do I do so?
In case of using GraphQL, to avoid race conditions, probably, making separate table for each entity type is better than making GSI for each entity type.
I had a doubt. I read that GSIs do not support strong consistent reads, does that mean, we can't create indexes in dynamoDB (for it to support strong consistency)?
Hi Ankit. You are correct, GSI's do not support consistent reads.
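Indexes can still be created and used; it is only the ConsistentRead option that is unavailable on a GSI. One workaround people use when a fresh copy of an item matters is to query the GSI for primary keys and then re-read each item from the base table with ConsistentRead set. A sketch of the two request shapes, where the table, index, and AccountId names are assumptions:

```python
def build_gsi_query(country: str, consistent_read: bool = False) -> dict:
    """Query request against the (hypothetical) OriginCountry GSI."""
    if consistent_read:
        # Mirrors DynamoDB's behaviour: ConsistentRead=True is rejected on a GSI.
        raise ValueError("strongly consistent reads are not supported on a GSI")
    return {
        "TableName": "Accounts",
        "IndexName": "OriginCountry-CreationDate-index",
        "KeyConditionExpression": "OriginCountry = :c",
        "ExpressionAttributeValues": {":c": {"S": country}},
        # The base table's primary key is always projected into the index.
        "ProjectionExpression": "AccountId",
    }


def build_consistent_get(account_id: str) -> dict:
    """GetItem request against the base table, where strong consistency is allowed."""
    return {
        "TableName": "Accounts",
        "Key": {"AccountId": {"S": account_id}},
        "ConsistentRead": True,
    }


index_request = build_gsi_query("Germany")
item_request = build_consistent_get("acc-1")
```

This only guarantees that the items you re-read are current. An item written a moment ago may still be missing from the index result itself.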
This video makes it sound like GSI's sort key has to be the same as that of the primary key, while in reality a GSI's sort key can be an entirely different attribute (in fact, a GSI need not necessarily be a composite primary key)
Hi, I have one question: is there any way to implement one-to-many or many-to-many relationships in DynamoDB?
Can you do a video about sparse index?
I'll look into Sparse Indexing for future videos. Thanks for the suggestion!
THANK YOU 🙏🏻
You are very welcome!
Hi. For your GSI, your composite primary key of OriginCountry and CreationDate to me implies that there can only be one row having a given country+creationdate (since a primary key must be unique). Would this mean that someone could not create an account if somebody from the same country already created an account that day?
Hi Lagouyn,
Yes you are correct. The Partition Key + Sort Key must be unique. In the example I gave, the timestamp is just the day so the case you described is accurate. This could easily be worked around by using a more specific timestamp - i.e. Day-Month-Year-Seconds-Milliseconds, or something as simple as epoch timestamp in millis.
Thank you.