If you're aware of how caching works and LRU and came here curious about the design of how distributed caches work, start at 23:42 . And very well put content, as always :)
29:17 snapshot + log replay. Normally restore from recent snapshot first, then replay the remaining log. All previous logs before the recent snapshot timestamp can be reclaimed.
but how does snapshotting and log replaying work if node is down for a long while and a new master has completely new snapshot and other log (log is cleared after snapshotting). Is the snapshot passed to the slave which just restarted or how does this happen. Otherwise, even though the node replays the log it still has stale data comparing to its peers
@@DenisG631 For sure, slave nodes or active-active nodes will have the sync-up periodically, if the server goes down, the next-chozen server will start the latest snapshot itself have(it may be earlier than the down server's snapshot), and replay all the logs since this new server's snapshot.
Thanks for the video tutorials, thats first :), but I still hv some questions/comments, may b i am wrong, but let me tell u that..plz correct me if i m wrong...1. Not clear - how ur design got impacted with the estimation that u made at first, how ur design could get changed in case of 5 request/sec tps 2. How a value in hastable results into constant time lookup in linked list 3. Only apart from the last part, all other areas are also applicable for monolithic system., so it seems like that distributed philosophy is needed to be adderessed more
Yes. we will need to remove it from hash table as well. For this we already have the value that needs to be removed (on that node). We can pass that value through hash function and remove that value from the hash table.
Personally I don't think the hash table and linked list approach used here is good for an LRU caching data structure. Especially since with caching, you want fast access. The linked list approach described with the index positions requires O(n) access to loop through the linked list. You can use a hash table and linked list to have O(1) access by having the hash table keys mapped to each linked list node (instead of a key mapped to a chain of nodes). So if you have the table keys [1, 2, 3, 4, 5] and the linked list A B C D E, you could have the mapping [1: B, 2: A, 3: C, 4: E, 5: D]. The most recently accessed object is moved to the top of the linked list which is very simple to do for a double linked list (O(1) time). The key reference remains the same - only the node positions change. The point is as new data is added to the cache, the last node is deallocated if the limit is reached. Reading is O(1), moving to top is O(1), and deallocating is O(1).
How switching to master slave pattern solves the syncing issue between the replicas? If we are only writing to master, how does the slave get updated? By snapshot? That will still havé inconsistency right?
You accept writes at Master nodes, and let the replication log propogate to the secondary nodes. That is the basic Multi Leader replication. But but but, There seems to be multiple masters in the last explanation, in such a case even masters would need to sync up with themselves. If that sync is via network (which it ofc is), what is the point of using it in a caching system!! Better invalidate and then make DB queries.
@@jasonparraga3391 : The distributed cache has been already discussed in the other video of consistent hashing which he mentioned during this video. The link is also given!
I guess typical ideas apply. Master/Slave replication. Master for reads+writes/Slaves for reads only. Typical Leader detection. Nothing particular to distributed cache since can be applied to almost any distributed system
In the Availability section, you have shown 1 and 2 as nodes. How does data distribution happen between nodes? What is the partition strategy? How can we redirect read/writes to specific nodes where they were previously written ? How do we decide no of nodes required? Can you provide info on all these or refer me to any video ?
lack of distributed part. Lots of challenging problems are not discussed. how to partition data ? how to use consistent hashing, how to rebalancing data? replication, cluster management, hot keys issue log compaction etc.
In-memory snapshotting leads to bad performance .. so how about designing a background thread to fanout in-memory state on disk. And then generate snapshot of disk state.
ARC is probably the best cache eviction algorithm as it takes care of both recency and number of hits. Its generally not seen in open source as IBM holds a patent for the algo.
It misses very important part: how does LRU eviction work ? In your design we add to cache as many items as we want leading to out of memory. I think you shall have mentioned that LRUs are limited either by the number of elements or their total size. If adding a new element would exceed limit then we need to evict one element(*) and we pick this element from the front of linked list and based on it's key we remove entry from hash table and delete node in linked list. * - if w limit LRU by memory size we might keep removing elements till memory goes back below threshold.
Thank you so much for giving a lot around thought process and guiding the though process while designing complete system around any concept/problem. Thank you. Your in detailed content be the saviour for me in lead interview. Thank you.
Thanks for vid, one thing that seemed off, you said getting from the cache is a constant cost, O(1) but wouldn't it be based on the size of the linked list n so it would be O(n) linear search or at O(n/2) if we traverse from either end? Again top video with so much details!
the list is never accesed linearly in this case because it's only used for "sorting". getting is still O(1) because the fields in the hash table are pointing to the nodes and that's how we access them
@@vmod3985 I got that we have the association between the hash table and the linked list, but I still having problem to understanding why he said O(1). My concern is because in my mind we need to go through node by node in the linked list to find the index.
@@ythalorossy The HashTable keeps the node's address or the node itself as value against the given key (thus O(1)). The linkedList is there to maintain the Least Recently Used node at the end of it.
In my opinion this design is not entirely scalable and does not addresses the scenario about how data will be distributed across clusters deeply. Consistent Hashing will help in that scenario and it will also help in fault tolerance. You will not loop an event queue in that scenario. Also event queue does not guarantee high availability and time of get/put will be more than O(1).
It lacks many things present in distrubuted cache for example how Redis cluster shards and routes data (only a fraction of it was discussed), how redis client makes a connection to the redis cluster, etc. It focused more on storage on single node than a distrubuted cache.
Searching element in linked list by index has O(n) time complexity since we need iterate from 0 to index value, so we lose hash table O(1) time, right? In this case, why we need hash table at all, if we need to search through the linked list every time we access the value?
For log reconstruction, only write operation is needed, so we can omit read operations. As for the fault tolerance, since it's only cache, I think we can just start up a new clean server instead of doing the replication.
Imagine cache nodes being rebooted in production, and having no data. All reads to go database putting a lot of pressure on systems. High chances of increased error rate s till the time cache gets populated
Thanks for the video. I have a question. For ensuring availability, how about having a load balance which decides which server to hit. It would be more convenient design because only write operation need to be synced to both servers, whereas read operation can happen via any server. This would also ensure minimum latency for requests. Expecting your response
Great, can you make a video on "Design a Amazon locker system" and "Chicken nugget problem". I understand it takes a lot of work to make a video, and i appreciate your work a lot. It would be helpful if you can make a design interview video on locker system soon.
Your channel is very helpful. Is it possible to make video on Traffic Light System Design because I have found this questions in many Big Tech Companies Interview. Appreciated for this video on Distributed caching and expecting more from you. You are almost like celebrity for us.
How does write around cache result in a cache miss? Assume data is both in cache and DB. If you update that data and write directly to DB without going through the cache, then the cache would have stale data, and would respond with that stale data. How would a cache miss happen?
(25:20 Event Library) About event library schduling the tasks we can use tevent: tevent.samba.org/tevent_thread.html Code github.com/amitkumar50/Code-examples/blob/master/tevent/adding_events_to_queue.c
29:25 this design will result in some cache miss due to reconstruct the hashtable in memory. also some recent data will not be read, so this is an eventual consistent system. basically an AP system. Redis does the same; it uses AOF and RDB. I think it is good idea to have a configuration for user to choose if they want to use AOF or RDB or both. The reason is log replay (AOF) takes too much time resulting in bad performance. Every time you start a new node in the cluster, you have to wait for long time for it to restore the data in memory.
@@DenisG631 that is correct. Let's say one slave is in another node. Read comes and master is in construction mode and not able to serve that read req. read req will normally linear probe the next node to find the slave. This is the design I will go for.
@@blackcatcaptain2022I was thinking of a load-balancer + master/slave scenario, where all nodes are within one datacenter, otherwise, the latency may be too high
Liked your content. Partly the content is inspired (I don't mean copy paste here) by Cassandra architecture. Or in other words, we can always draw parallel (conceptually) with these distributed architecture frameworks. Thanks for raising my awareness.
When implementing LRU cache, What if there is collision in insertion? So if another (new) key also has hash value 3 and how do we insert it into double linked list?
Have a look at the OpenJDK implementation of LinkedHashMap. Every element has 3 pointers and is simultaneously in 2 different linked lists: one singly-linked list for the hash chain, and a doubly-linked list for LRU purposes.
In the Write Around pattern, how does the Read operation know that the latest data in the cache could be invalid? The writes always go around the cache, so imagine the following situation. Write to address A1, read from A1, and it causes miss and cache update, then write again to the address A1. Now if I read again from address 1, how would the cache know that it has to re-read the data even though some data is already there?
Nice. In case cache system is down, we any how have replica as we put multiple servers for availability, Why do we log file? If all master slave crashes with very low probability, then log file can also have probability to get lost,now if we replicate log file also,we are making it complex, and given the scale, log file will be too huge. I think if we lost both master slave in worst, let it auto build from and get stable, let me know what you guys think.
RAM is blocking part, in event loop design RAM would still be blocking to the threads from the pool, ie. only one tread can access it. So how is it better from other designs? Is cache is shradded and duplicates are maintained?
Very good video but certain details missed like when the node in the d list is removed what are the changes in hash map value changes? I know it's very difficult to pack everything. Never mind it's a great learning experience
Really good video .. How are typically range queries handled? Firstly they condition may vary and also the range may vary? Most probably the same condition or range may never be used for a while? Do normally range queries get cached?
@@castrothomas7127 How will you determine priority on the request? It will have the same priority so they will be used in the order they were received and basically will nothing more than just big array which we are trying to avoid. To determine priority you will have to have additional service running through the whole queue constantly or maybe adding additional data structures to store some statistics and so on...
The distributed cache design is so close to a distributed database design, I could not tell much difference there. For example, you need to increase the scalability - add shards; to increase availability - add replica; to avoid data loss during machine crash - write a log into disk. Even the modern databases (e.g. SSTable) store the fresh data inside of memory to improve the latency before saving to persistence storage. I would say, the boundary between a cache and database is really vague now (for example, Redis claims itself a database too).
when we have replicas for high availability then why do we need to worry about fault tolerance?If once fails, other will take up and failing both at same time will have less probability and if that happens in worst case,we can rebuild from database?
You are right. We need buffers to handle concurrent writes and reads, and I think we need one queue for only writes and one queue for only reads. Could you please talk more on concurrent topic ?
@@blackcatcaptain2022 why 2 buffers/queues? Why not one queue for all events? Since all events reading or writing mutate the internal data structure (moving the linked list node to the tail of the list)
How do make sure requests are served in the same order they are requested, as 2 different requests might be updating the same key. Queue could help but since there is a thread pool behind it, you can not guarantee ordering. Also how would you avoid locking issues if 2 threads are updating the same key simultaneously?
Excellent question! The way I am thinking, we could store the value as timestamp_value. If the new write is of an older timestamp, we could drop that write
Thanks for the video. This is very helpful for those like us who are starting with system design. Would you also mind sharing any resources like blogs/books etc.
@@TechDummiesNarendraL Does that mean FB mainly uses Memcached cus of its better performance than Redis? then why Twitter use Redis not Memcached? Thanks a lot!
Just one point, I am not quite clear here. You mentioned like while finding a node and delete it in constant time from Doubly linked list in LRU. Deleting and adding it at the end is fine but to get the node in doubly linked list will in O(n) time. can you please elaborate little more on that please?
@narendra . Is it possible to optimize the collison case , currently it is o(n) if there are n values for the same key. Also how do you handle the case where mulitple threads are modifying the same key ?
@piyush do you have a case when you have n values for same key? If so how do ypu distinguish which one is your value. I think you are talking about collision in hash function, when ,ultiple key hash to same value and hence ypue have multiple key value pairs, for that firstly we use a strong hash that tends to have miminum collisions and even if they collide go by approach told by @Shiv pratap to optimize to log n
I was asked a system design question. LinkedIn wants to Integrate with Dropbox. LinkedIn does not do business about document storage. However LinkedIn had 400 webservers and after integration with Dropbox, they do not want to scale up. Is it possible to make such integration without adding more servers?
Regarding log reconstruction, I suppose log data will be written to disk only after sufficient data is accumulated or a certain period has passed to reduce the disk access latency. If that is the case then how do you make sure data is not lost during that period?
Very informative videos. I have confusion and need more visibility. If we talk about the Cache write-through/write-back approach, our cache would be 1st level of storage and the application will write/read data from the cache. My confusion is that my application will fire the JDBC SQL query to fetch data and data will be responded back through the cache. How my cache having the capability to understand JDBC query ... Even if we consider the default cache implementation, data 1st write to Db, and while read request, the 1st hibernate layer will search whether the data is available with cache, if yes that respond back through cache data. In this scenario also no call will send to DB and respond back using cache data. how my cache having JDBC SQL understanding.
My concern is specific to Native SQL query, how it works if we provisioned our hibernate second level cache as write-through/write-back approach as with this cache configuration read/write happens on the cache first. How cache having capabilities to work with SQL native query
In both mechanisms of making fault tolerant, you are mentioning both as async as well as in both we are just creating a log file, so whats the difference, its quite vague!!
For the LRU policy, how are reads still going to be O(1) if the values are stored in list? When you re-access key 2 and placed the node at the end of the list you are not updating the indexes in the hash table. How does the cache know which index in the list contains my key (2) when its re-accessed without looking sequentially in the list?
Jorge Smulevici just save the key in the linked list node the hash table doesn’t keep order just need the key also the key just noved position, it didn’t change
When you delete the linked list the address are still present in the hash table. When you access the address you will get garbage value. How to overcome this?
One question Naren, on design, as we have event loop in place, suppose we have two requests put and get, so get will have to wait until put request completed and than event loop will read this. Doesn't this event loop be a multithreaded ?
A linked list is O(1) to find either of the two end elements (least recently and most recently used). The hash map's hash chain finds elements by key in O(1) time. Since the LRU list is doubly-linked, you can pull the element out of the list in O(1) time and place at at either end of the doubly-linked list in O(1) time. You're never scanning for, say the 117th most recently used item. We're always evicting the item at the least-recently-used end, removing an element we already have a pointer to via the hash map hash chain, and re-inserting the accessed element at the most-recently used end of the LRU list. Have a look at the OpenJDK implementation of LinkedHashMap if you're curious.
buckets should be indexed 0-9 instead of 1-10 for hash % 10 to land in buckets Also - at one point caching is spelled 'cashing' and patterns is spelled 'patters'
is distributed cache attached to any server .. U said 1 sever is down then we will have cache miss .. If cache was distributed then we should not lose data from cache no ???
There are no collisions in the LRU lineked list. One item is accessed, than another. There are no ties for when items were accessed. Even if there were a collision as to which was most recently used, you'd just make an arbitrary choice and treat one as older and one as younger.
How can we use thread pool if manipulating of the LRU cache is not thread-safe. Since we are mutating internal data structure (linked list) on reading and writing (moving the k/v pair to the end of the linked list), this way we can only change one node at a time. This implies that there is no benefit of thread pool in your design, one thread would be as good. Or this linked-list LRU cache explanation is just for fun and in reality, you need to use a concurrent linked list implementation of some sort? Otherwise thread pool doesn't really make sense in my opinion
I think using commit log will guarantee all concurrent reads and writes to be serialized. so no concurrent r/w will impact the in-memory data structure.
@@blackcatcaptain2022 you mean a background job/task will go through commit log and adjust the internal data structure accordingly? Meaning the rate of log entries update (append) is faster than the writer/data structure update thread which is trying to catch up? This way while processing a read request you may return data which in reality (after processing all the logs) should have been removed. Implementing concurrent data structures is a hard thing to do. Simply this implementation of LRU cache reminded me of a splay tree. Which is cool, but not concurrent-friendly
@@DenisG631 yes, read will eventually get recent write. that is why it is called eventual consistency. Using commit log guarantee all writes are persisted.
@@blackcatcaptain2022 eventual consistency makes sense in a distributed scenario. I, on the other hand, was discussing manipulations done on a single node What I was saying is regarding the reads not being blocked, i.e. 2 reads can happen at the same time in parallel and nothing will happen, but this (in my opinion) can not be done due to mutation of the internal data structure. One option is to perform reads in parallel and update the internal data structure later (what you mean with "eventual") even though I never heard of eventual consistency within one node ¯\_(ツ)_/¯
If you're aware of how caching works and LRU and came here curious about the design of how distributed caches work, start at 23:42 . And very well put content, as always :)
I have implemented these cache on system in code levels. Checkout the leetcode LRU cache problem.
You are a Hero, man! These are all the topics that we actually need. And sadly find no decent tutorial videos online! Thanks.
Hey, Thanks :)
29:17 snapshot + log replay. Normally restore from recent snapshot first, then replay the remaining log. All previous logs before the recent snapshot timestamp can be reclaimed.
but how does snapshotting and log replaying work if node is down for a long while and a new master has completely new snapshot and other log (log is cleared after snapshotting).
Is the snapshot passed to the slave which just restarted or how does this happen.
Otherwise, even though the node replays the log it still has stale data comparing to its peers
@@DenisG631 For sure, slave nodes or active-active nodes will have the sync-up periodically, if the server goes down, the next-chozen server will start the latest snapshot itself have(it may be earlier than the down server's snapshot), and replay all the logs since this new server's snapshot.
Thanks for the video tutorials, thats first :), but I still hv some questions/comments, may b i am wrong, but let me tell u that..plz correct me if i m wrong...1. Not clear - how ur design got impacted with the estimation that u made at first, how ur design could get changed in case of 5 request/sec tps 2. How a value in hastable results into constant time lookup in linked list 3. Only apart from the last part, all other areas are also applicable for monolithic system., so it seems like that distributed philosophy is needed to be adderessed more
Given the time you were on the spot, my expectation was higher! I was more expecting HA & DR as well..! As usual, you are awesome.
When you delete the head of the LRU linked list to make space in the cache, wouldn't you also need to invalidate its address stored in the hash table.
Yes. we will need to remove it from hash table as well.
For this we already have the value that needs to be removed (on that node). We can pass that value through hash function and remove that value from the hash table.
I really didnt see any distributed cache system in most of the video... the title should be changed to working of cache.
You need to check the big picture and assemble the puzzles here. He has spit the design into several parts.
Personally I don't think the hash table and linked list approach used here is good for an LRU caching data structure. Especially since with caching, you want fast access. The linked list approach described with the index positions requires O(n) access to loop through the linked list. You can use a hash table and linked list to have O(1) access by having the hash table keys mapped to each linked list node (instead of a key mapped to a chain of nodes). So if you have the table keys [1, 2, 3, 4, 5] and the linked list A B C D E, you could have the mapping [1: B, 2: A, 3: C, 4: E, 5: D]. The most recently accessed object is moved to the top of the linked list which is very simple to do for a double linked list (O(1) time). The key reference remains the same - only the node positions change. The point is as new data is added to the cache, the last node is deallocated if the limit is reached. Reading is O(1), moving to top is O(1), and deallocating is O(1).
who else noticed 'cashing' in the beginning whiteboard ?
Great content bro
How switching to master slave pattern solves the syncing issue between the replicas? If we are only writing to master, how does the slave get updated? By snapshot? That will still havé inconsistency right?
You accept writes at Master nodes, and let the replication log propogate to the secondary nodes. That is the basic Multi Leader replication. But but but, There seems to be multiple masters in the last explanation, in such a case even masters would need to sync up with themselves. If that sync is via network (which it ofc is), what is the point of using it in a caching system!! Better invalidate and then make DB queries.
I think most of the talk was on cache rather than distributed cache. May be some parts of sharding/quorum aspects might also need to be included here.
you shard cache? never heard of it
@@fenggeliu4241 yes we can, and such cache systems are called distributed cache systems.
Agree! Would appreciate more focus on the distributed aspect.
@@jasonparraga3391 : The distributed cache has been already discussed in the other video of consistent hashing which he mentioned during this video. The link is also given!
I guess typical ideas apply.
Master/Slave replication. Master for reads+writes/Slaves for reads only.
Typical Leader detection. Nothing particular to distributed cache since can be applied to almost any distributed system
In the Availability section, you have shown 1 and 2 as nodes. How does data distribution happen between nodes? What is the partition strategy? How can we redirect read/writes to specific nodes where they were previously written ? How do we decide no of nodes required? Can you provide info on all these or refer me to any video ?
The best thing i like in this course is your style of wearing the cap in presentation.
bhai koi style nhi h...... baal nhi h iss bhai k .. mera bhi same scene h and m bhi cap lagata hu
lack of distributed part. Lots of challenging problems are not discussed. how to partition data ? how to use consistent hashing, how to rebalancing data? replication, cluster management, hot keys issue log compaction etc.
In-memory snapshotting leads to bad performance .. so how about designing a background thread to fanout in-memory state on disk. And then generate snapshot of disk state.
@@blackcatcaptain2022 Should also be a combination of both log and snapshot.. log should only maintain state from last snapshot
@@y5it056 yes of course.
LINKIN PARK T-shirt??? I love this video already!
The bucket range will be from 0-9 at 6:00, it will be never coming to 10 (x%10 range from 0-9)
ARC is probably the best cache eviction algorithm as it takes care of both recency and number of hits. Its generally not seen in open source as IBM holds a patent for the algo.
It misses very important part: how does LRU eviction work ? In your design we add to cache as many items as we want leading to out of memory. I think you shall have mentioned that LRUs are limited either by the number of elements or their total size. If adding a new element would exceed limit then we need to evict one element(*) and we pick this element from the front of linked list and based on it's key we remove entry from hash table and delete node in linked list. * - if w limit LRU by memory size we might keep removing elements till memory goes back below threshold.
Thank you so much for giving a lot around thought process and guiding the though process while designing complete system around any concept/problem. Thank you. Your in detailed content be the saviour for me in lead interview. Thank you.
Thanks for vid, one thing that seemed off, you said getting from the cache is a constant cost, O(1) but wouldn't it be based on the size of the linked list n so it would be O(n) linear search or at O(n/2) if we traverse from either end? Again top video with so much details!
the list is never accesed linearly in this case because it's only used for "sorting". getting is still O(1) because the fields in the hash table are pointing to the nodes and that's how we access them
@@vmod3985 I got that we have the association between the hash table and the linked list, but I still having problem to understanding why he said O(1). My concern is because in my mind we need to go through node by node in the linked list to find the index.
@@ythalorossy The HashTable keeps the node's address or the node itself as value against the given key (thus O(1)). The linkedList is there to maintain the Least Recently Used node at the end of it.
Really a great explanation. 🎉
In my opinion this design is not entirely scalable and does not addresses the scenario about how data will be distributed across clusters deeply. Consistent Hashing will help in that scenario and it will also help in fault tolerance. You will not loop an event queue in that scenario. Also event queue does not guarantee high availability and time of get/put will be more than O(1).
But an event queue would be needed because you do not want to lose your requests in case there is throttling.
It lacks many things present in distrubuted cache for example how Redis cluster shards and routes data (only a fraction of it was discussed), how redis client makes a connection to the redis cluster, etc. It focused more on storage on single node than a distrubuted cache.
Brother your videos are fantastic. I just have a suggestion to use a better microphone for more clear recording. Thanks for this video.
thanks bro!! it was an awesome video, i was really searching for an good caching video!!, it was explanatory!!
Does the write around pattern deletes the entry in the cache before updating db? otherwise the cache read would return inconsistent data. 4:14
How is key collision handled in your 2 nd scenario when double link list is maintained?
Wow!! One video and you cleared thousands of doubts !!
Great Info sir 👏
Searching element in linked list by index has O(n) time complexity since we need iterate from 0 to index value, so we lose hash table O(1) time, right?
In this case, why we need hash table at all, if we need to search through the linked list every time we access the value?
Very nicely explained. Could you come up with a video on 'Designing Distributed Scheduler' ?
For log reconstruction, only write operation is needed, so we can omit read operations. As for the fault tolerance, since it's only cache, I think we can just start up a new clean server instead of doing the replication.
Imagine cache nodes being rebooted in production, and having no data. All reads to go database putting a lot of pressure on systems. High chances of increased error rate s till the time cache gets populated
Thanks for the video. I have a question. For ensuring availability, how about having a load balance which decides which server to hit. It would be more convenient design because only write operation need to be synced to both servers, whereas read operation can happen via any server. This would also ensure minimum latency for requests. Expecting your response
Thank you for the AWESOME tutorial
Again Very nice Video.. thanks sir :)... waiting more videos like this....
Great, can you make a video on "Design a Amazon locker system" and "Chicken nugget problem". I understand it takes a lot of work to make a video, and i appreciate your work a lot. It would be helpful if you can make a design interview video on locker system soon.
Hey can we discuss amazon locker
Your topics are great and your explanations also.
But it’s hard to hear!
Just to clarify
So, the two architectures used in high availability section is a) active-active cluster and b) active passive cluster?
Your channel is very helpful. Is it possible to make video on Traffic Light System Design because I have found this questions in many Big Tech Companies Interview. Appreciated for this video on Distributed caching and expecting more from you. You are almost like celebrity for us.
Thanks @Bipin, this means a lot.
In LRU in hashing bucket what will happen if the collision happens which one linked list address will it keep
Time:20:51 Removal from doubly LL is not O(1) it same as singly LL. We need to search element, and we are only aware of Head & tail pointers.
Can be done in O(1) if hashmap stores pointer to dll node as value
Is linked list used for distributed system as well, wouldn't that be too slow if used in multi-threaded environment?
How does write around cache result in a cache miss? Assume data is both in cache and DB. If you update that data and write directly to DB without going through the cache, then the cache would have stale data, and would respond with that stale data. How would a cache miss happen?
due to TTL. key is valid only for a certain period of time
(25:20 Event Library) About event library schduling the tasks we can use tevent:
tevent.samba.org/tevent_thread.html
Code github.com/amitkumar50/Code-examples/blob/master/tevent/adding_events_to_queue.c
29:25 this design will result in some cache miss due to reconstruct the hashtable in memory. also some recent data will not be read, so this is an eventual consistent system. basically an AP system. Redis does the same; it uses AOF and RDB. I think it is good idea to have a configuration for user to choose if they want to use AOF or RDB or both. The reason is log replay (AOF) takes too much time resulting in bad performance. Every time you start a new node in the cluster, you have to wait for long time for it to restore the data in memory.
more discussion on AOF and RDB: redis.io/topics/persistence
you are the MVP. Thanks for the link
cache miss will occur only if there are no slaves that back up the master, right?
@@DenisG631 that is correct. Let's say one slave is in another node. Read comes and master is in construction mode and not able to serve that read req. read req will normally linear probe the next node to find the slave. This is the design I will go for.
@@blackcatcaptain2022I was thinking of a load-balancer + master/slave scenario, where all nodes are within one datacenter, otherwise, the latency may be too high
Liked your content. Partly the content is inspired (I don't mean copy paste here) by Cassandra architecture. Or in other words, we can always draw parallel (conceptually) with these distributed architecture frameworks. Thanks for raising my awareness.
When implementing LRU cache, What if there is collision in insertion? So if another (new) key also has hash value 3 and how do we insert it into double linked list?
How do you handle Cache collision in LRU eviction mechanism, since you are using single entry in bucket, pointing to doubly-linked list?
+1
Keeping a list of addresses in the hash table should still work.
Have a look at the OpenJDK implementation of LinkedHashMap. Every element has 3 pointers and is simultaneously in 2 different linked lists: one singly-linked list for the hash chain, and a doubly-linked list for LRU purposes.
Hi Narendra - great videos indeed, pls keep them coming! By the way how to do handle concurrency (writes) issues?
In the Write Around pattern, how does the Read operation know that the latest data in the cache could be invalid? The writes always go around the cache, so imagine the following situation. Write to address A1, read from A1, and it causes miss and cache update, then write again to the address A1. Now if I read again from address 1, how would the cache know that it has to re-read the data even though some data is already there?
Nice. In case cache system is down, we any how have replica as we put multiple servers for availability, Why do we log file? If all master slave crashes with very low probability, then log file can also have probability to get lost,now if we replicate log file also,we are making it complex, and given the scale, log file will be too huge. I think if we lost both master slave in worst, let it auto build from and get stable, let me know what you guys think.
RAM is blocking part, in event loop design RAM would still be blocking to the threads from the pool, ie. only one tread can access it. So how is it better from other designs? Is cache is shradded and duplicates are maintained?
Very nice video and very good content. Thanks a lot 😊
Very good video but certain details missed like when the node in the d list is removed what are the changes in hash map value changes? I know it's very difficult to pack everything. Never mind it's a great learning experience
Thanks and I have already made a video on consistent hashing. Please take a look at description of the video.
Really good video ..
How are typically range queries handled? Firstly they condition may vary and also the range may vary? Most probably the same condition or range may never be used for a while? Do normally range queries get cached?
Hey, can someone help me to understand when someone would create their own cache when we already have the systems like redis and memcached ?
"Cashing Best Practices" @ 2min. Best section title ever.
Dude what if instead of doubly linked list you use priority queue... And the priority is given depending on how often it is used
@@castrothomas7127 How will you determine priority on the request? It will have the same priority so they will be used in the order they were received and basically will nothing more than just big array which we are trying to avoid. To determine priority you will have to have additional service running through the whole queue constantly or maybe adding additional data structures to store some statistics and so on...
The distributed cache design is so close to a distributed database design, I could not tell much difference there. For example, you need to increase the scalability - add shards; to increase availability - add replica; to avoid data loss during machine crash - write a log into disk. Even the modern databases (e.g. SSTable) store the fresh data inside of memory to improve the latency before saving to persistence storage. I would say, the boundary between a cache and database is really vague now (for example, Redis claims itself a database too).
hi. Can you make some video on how to handle heavy traffic ,loadbalancing ,zookeeper. how all these factors helps to increase response time
Good one. Do we need to talk about the updates to distributed hash map ? like locking the buckets etc ?
when we have replicas for high availability then why do we need to worry about fault tolerance?If once fails, other will take up and failing both at same time will have less probability and if that happens in worst case,we can rebuild from database?
28:44 need buffer ? Log is already appended log to receive all W/R serialized.
You are right. We need buffers to handle concurrent writes and reads, and I think we need one queue for only writes and one queue for only reads. Could you please talk more on concurrent topic ?
@@blackcatcaptain2022 why 2 buffers/queues? Why not one queue for all events? Since all events reading or writing mutate the internal data structure (moving the linked list node to the tail of the list)
awesome explanation.
Won't the keys fall back to 2 when 1 fails in the case of consistent hashing? 30:05
How do make sure requests are served in the same order they are requested, as 2 different requests might be updating the same key. Queue could help but since there is a thread pool behind it, you can not guarantee ordering. Also how would you avoid locking issues if 2 threads are updating the same key simultaneously?
Excellent question!
The way I am thinking, we could store the value as timestamp_value. If the new write is of an older timestamp, we could drop that write
Do we cache images and how we do that and doesn't that increases size of database?
Thanks for the video. This is very helpful for those like us who are starting with system design. Would you also mind sharing any resources like blogs/books etc.
here you go dzone.com/
highscalability.com/
medium.com/netflix-techblog
eng.uber.com/
martinfowler.com/
www.oreilly.com/feed/four-short-links
news.ycombinator.com/
ai.google/research/
hackernoon.com/
machinelearningmastery.com/
www.analyticsvidhya.com/
www.kaggle.com/
www.codingthearchitecture.com/
Great work! Wonder why Facebook chose memcached over Redis...
Memcache uses multithreading but not redis.
@@TechDummiesNarendraL Does that mean FB mainly uses Memcached cus of its better performance than Redis? then why Twitter use Redis not Memcached? Thanks a lot!
Just one point, I am not quite clear here. You mentioned like while finding a node and delete it in constant time from Doubly linked list in LRU. Deleting and adding it at the end is fine but to get the node in doubly linked list will in O(n) time. can you please elaborate little more on that please?
The hashtable maps the key to the node. That's how you get the node in constant time.
Nice Video..Thank you
Thanks for this! Great content across your channel.. I feel stronger on system design :)
nice to see LP on your t-shirt.
@narendra . Is it possible to optimize the collison case , currently it is o(n) if there are n values for the same key. Also how do you handle the case where mulitple threads are modifying the same key ?
@piyush do you have a case when you have n values for same key? If so how do ypu distinguish which one is your value. I think you are talking about collision in hash function, when ,ultiple key hash to same value and hence ypue have multiple key value pairs, for that firstly we use a strong hash that tends to have miminum collisions and even if they collide go by approach told by @Shiv pratap to optimize to log n
I was asked a system design question. LinkedIn wants to Integrate with Dropbox. LinkedIn does not do business about document storage. However LinkedIn had 400 webservers and after integration with Dropbox, they do not want to scale up. Is it possible to make such integration without adding more servers?
Regarding log reconstruction, I suppose log data will be written to disk only after sufficient data is accumulated or a certain period has passed to reduce the disk access latency. If that is the case then how do you make sure data is not lost during that period?
Nice explanation. Thanks for the awesome videos. :)
Kindly create a separate video for event driven architecture
One question - in lru when you delete node from end of linked list, how it's reference is removed from hashtable
In linked list node you have key and value (the address of itself). You search for such key/value in hash table and remove it.
In ur implementation of lru..u missed to tell us wht wud happen in case of hash collision... How wud it impact the linked list
That would still work as it is. The size of linked list will increase more than no of buckets
Awesome it will be great help if you can talk more about LFU thanks in Advance :)
Very informative videos. I have confusion and need more visibility. If we talk about the Cache write-through/write-back approach, our cache would be 1st level of storage and the application will write/read data from the cache. My confusion is that my application will fire the JDBC SQL query to fetch data and data will be responded back through the cache. How my cache having the capability to understand JDBC query ... Even if we consider the default cache implementation, data 1st write to Db, and while read request, the 1st hibernate layer will search whether the data is available with cache, if yes that respond back through cache data. In this scenario also no call will send to DB and respond back using cache data. how my cache having JDBC SQL understanding.
My concern is specific to Native SQL query, how it works if we provisioned our hibernate second level cache as write-through/write-back approach as with this cache configuration read/write happens on the cache first. How cache having capabilities to work with SQL native query
would have been better if you move LRU in a separate video. Not much content on distrubuted Redis Cache
Could you use a bitmap instead of a link list to determine the LRU faster?
Dude, that is cool!
I think LFU better choice than LRU
Thanks for video
In this scenario definitely. But it is harder to implement and to explain
In both mechanisms of making fault tolerant, you are mentioning both as async as well as in both we are just creating a log file, so whats the difference, its quite vague!!
For the LRU policy, how are reads still going to be O(1) if the values are stored in list? When you re-access key 2 and placed the node at the end of the list you are not updating the indexes in the hash table. How does the cache know which index in the list contains my key (2) when its re-accessed without looking sequentially in the list?
Jorge Smulevici just save the key in the linked list node the hash table doesn’t keep order just need the key also the key just noved position, it didn’t change
Thanks a lot +Narendra. This is an invaluable video.
When you delete the linked list the address are still present in the hash table. When you access the address you will get garbage value. How to overcome this?
By deleting it from the hash table, it's kinda obvious isn't it?
One question Naren, on design, as we have event loop in place, suppose we have two requests put and get, so get will have to wait until put request completed and than event loop will read this. Doesn't this event loop be a multithreaded ?
Why not a min heap w/ update ability on Timestamp of last access?
Linked List is still linear to find the actual element.
A linked list is O(1) to find either of the two end elements (least recently and most recently used). The hash map's hash chain finds elements by key in O(1) time. Since the LRU list is doubly-linked, you can pull the element out of the list in O(1) time and place at at either end of the doubly-linked list in O(1) time. You're never scanning for, say the 117th most recently used item. We're always evicting the item at the least-recently-used end, removing an element we already have a pointer to via the hash map hash chain, and re-inserting the accessed element at the most-recently used end of the LRU list. Have a look at the OpenJDK implementation of LinkedHashMap if you're curious.
Thanks for the explanation, helped a lot!
I was looking forward to the part where the data to be cached is so large, that it can't fit in 1 host. And we can scale that horizontally.
It is explained in the consistent hashing video.
Queue should be FIFO correct? Otherwise LRU would go for toss
It says distributed cache? I did not find the distributed part neither how eviction will be handled in case of distributed cache serves???
buckets should be indexed 0-9 instead of 1-10 for hash % 10 to land in buckets
Also - at one point caching is spelled 'cashing' and patterns is spelled 'patters'
good demo, but title shall not called Redis design, Only 1 line is talked about Redis, which is "Regular Interval Snapshot". Nice session
is distributed cache attached to any server .. U said 1 sever is down then we will have cache miss .. If cache was distributed then we should not lose data from cache no ???
WIth LRU linkedlist, how can we handle collision. Will it be Linkedlist of linkedlist.
There are no collisions in the LRU lineked list. One item is accessed, than another. There are no ties for when items were accessed. Even if there were a collision as to which was most recently used, you'd just make an arbitrary choice and treat one as older and one as younger.
How can we use thread pool if manipulating of the LRU cache is not thread-safe.
Since we are mutating internal data structure (linked list) on reading and writing (moving the k/v pair to the end of the linked list), this way we can only change one node at a time. This implies that there is no benefit of thread pool in your design, one thread would be as good.
Or this linked-list LRU cache explanation is just for fun and in reality, you need to use a concurrent linked list implementation of some sort? Otherwise thread pool doesn't really make sense in my opinion
I think using commit log will guarantee all concurrent reads and writes to be serialized. so no concurrent r/w will impact the in-memory data structure.
@@blackcatcaptain2022 you mean a background job/task will go through commit log and adjust the internal data structure accordingly? Meaning the rate of log entries update (append) is faster than the writer/data structure update thread which is trying to catch up? This way while processing a read request you may return data which in reality (after processing all the logs) should have been removed.
Implementing concurrent data structures is a hard thing to do. Simply this implementation of LRU cache reminded me of a splay tree. Which is cool, but not concurrent-friendly
@@DenisG631 yes, read will eventually get recent write. that is why it is called eventual consistency. Using commit log guarantee all writes are persisted.
@@blackcatcaptain2022 eventual consistency makes sense in a distributed scenario. I, on the other hand, was discussing manipulations done on a single node
What I was saying is regarding the reads not being blocked, i.e. 2 reads can happen at the same time in parallel and nothing will happen, but this (in my opinion) can not be done due to mutation of the internal data structure.
One option is to perform reads in parallel and update the internal data structure later (what you mean with "eventual") even though I never heard of eventual consistency within one node ¯\_(ツ)_/¯