Adding a cache is not as simple as it may seem...

  • Published Jan 15, 2025

Comments • 218

  • @dreamsofcode
    @dreamsofcode  10 months ago +66

    Big shout out to everyone in the comments on this video for asking GREAT questions

    • @MohammadJairumi
      @MohammadJairumi 8 months ago

      Can you share your neovim distro?

    • @dreamsofcode
      @dreamsofcode  8 months ago

      @MohammadJairumi You can find it on GitHub at elliottminns/dotfiles

  • @nathaaaaaa
    @nathaaaaaa 10 months ago +167

    Usually, instead of write-through, I just DEL the relevant keys and force a new cache miss. Looks very reliable to me.

    • @dreamsofcode
      @dreamsofcode  10 months ago +38

      That's a cool idea. I imagine it makes things a little simpler and can work for more advanced aggregations!
      I like it!

    • @xorlop
      @xorlop 10 months ago +14

      lol, just left a long-winded comment about this

    • @dreamsofcode
      @dreamsofcode  10 months ago +9

      @xorlop I'm glad you did!

    • @daleryanaldover6545
      @daleryanaldover6545 10 months ago

      I would do the same 😊

    • @123mrfarid
      @123mrfarid 10 months ago

      Good idea. Thank you.
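A minimal Rust sketch of the delete-on-write idea from this thread, assuming two in-process `HashMap`s as hypothetical stand-ins for PostgreSQL and Redis (a real version would issue a Redis `DEL` after the database write). Reads use cache-aside: a miss loads from the database and repopulates the cache.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: `db` plays the role of PostgreSQL, `cache` the role of Redis.
pub struct Store {
    db: HashMap<u64, String>,
    cache: HashMap<u64, String>,
    pub db_reads: u32, // number of reads that fell through to the database
}

impl Store {
    pub fn new() -> Self {
        Store { db: HashMap::new(), cache: HashMap::new(), db_reads: 0 }
    }

    // Cache-aside read: serve from the cache; on a miss, load from the DB and repopulate.
    pub fn get(&mut self, id: u64) -> Option<String> {
        if let Some(v) = self.cache.get(&id) {
            return Some(v.clone());
        }
        self.db_reads += 1;
        let v = self.db.get(&id).cloned()?;
        self.cache.insert(id, v.clone());
        Some(v)
    }

    // Write the DB first, then delete the cached entry (the equivalent of Redis DEL)
    // instead of writing the new value into the cache.
    pub fn update(&mut self, id: u64, value: &str) {
        self.db.insert(id, value.to_string());
        self.cache.remove(&id);
    }
}
```

Because the write path never builds the cached value itself, the same approach carries over to cached aggregations: the next read simply recomputes them.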

  • @o11k
    @o11k 10 months ago +300

    "There are only two hard things in Computer Science: cache invalidation and naming things" ~Phil Karlton

    • @hansenchrisw
      @hansenchrisw 10 months ago +77

      And off by one errors 😉

    • @Rundik
      @Rundik 10 months ago +12

      And cache invalidation

    • @tacticalassaultanteater9678
      @tacticalassaultanteater9678 10 months ago +3

      @hansenchrisw And scope bloat

    • @af43bacc
      @af43bacc 10 months ago +7

      Concurrency and intermittent bugs: "Am I a joke to you?"

    • @markhaus
      @markhaus 10 months ago +1

      @Rundik And cache invalidation

  • @xorlop
    @xorlop 10 months ago +52

    What a cool video! So many great ideas.
    Another idea for write-through caching: delete the key instead! I think this could be good because whenever you update the entry, you reset its LRU value, which might not be accurate or helpful. There are cases where a DB write is not aligned with access of the key from the cache. What if a user writes a spell but doesn't use it right away, for example? By deleting it, you are saying that saving is not the same as using, which might be better aligned with a spell store. Newly updated spells might not be that popular. It also helps minimize the overall cache size, which probably helps Redis's LRU algorithm, since it is only an approximate LRU.

    • @dreamsofcode
      @dreamsofcode  10 months ago +6

      This is a great idea!
      Deleting does make a lot of sense when it comes to frequently accessed data. I think the time you'd want to use write-through would be when the data itself takes a long time to populate. But even then, you'd likely use some sort of expiration/deletion-based resync.
      Another approach would be to not extend the expiration, which I believe you can do with another Redis SET option (KEEPTTL).

    • @daleryanaldover6545
      @daleryanaldover6545 10 months ago +2

      Yes, the principle is: delete the cache on every operation except GET requests; that's where we store the cache!
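To make the LRU point concrete, here is a deliberately naive LRU in Rust (hypothetical `Lru` type with O(n) recency updates; this is not how Redis implements its approximate LRU). Overwriting a key on write counts as a use and can push out a genuinely hot key, whereas deleting it leaves the recency order of the other keys alone.

```rust
use std::collections::{HashMap, VecDeque};

pub struct Lru {
    cap: usize,
    map: HashMap<String, String>,
    order: VecDeque<String>, // front = least recently used, back = most recently used
}

impl Lru {
    pub fn new(cap: usize) -> Self {
        Lru { cap, map: HashMap::new(), order: VecDeque::new() }
    }

    // Move `key` to the most-recently-used position.
    fn touch(&mut self, key: &str) {
        self.order.retain(|k| k != key);
        self.order.push_back(key.to_string());
    }

    pub fn get(&mut self, key: &str) -> Option<String> {
        let v = self.map.get(key).cloned()?;
        self.touch(key);
        Some(v)
    }

    // Inserting or overwriting counts as a "use": the key moves to the back,
    // and the least recently used key is evicted if we are over capacity.
    pub fn put(&mut self, key: &str, value: &str) {
        self.map.insert(key.to_string(), value.to_string());
        self.touch(key);
        if self.map.len() > self.cap {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
    }

    // Delete-on-write: drops the key without bumping anything else.
    pub fn remove(&mut self, key: &str) {
        self.map.remove(key);
        self.order.retain(|k| k != key);
    }

    pub fn contains(&self, key: &str) -> bool {
        self.map.contains_key(key)
    }
}
```

With capacity 2, writing `a`, `b`, then updating `a` in place and inserting `c` evicts `b`, even though only the write touched `a`; deleting `a` instead keeps `b` cached.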

  • @Cranked1
    @Cranked1 10 months ago +28

    Making the cache write independent of the request's completion can be dangerous, because if the cache write fails, you have a big problem. It could also happen that the user moves on to another request before the cache was written by the previous one, which results in wrong data. This can be completely inconsistent, and you have no guarantees (unlike, e.g., database ACID). Even a database transaction won't save you, because the system can still fail between the cache write and the transaction commit.

    • @dreamsofcode
      @dreamsofcode  10 months ago +5

      Yep, 100%.
      In a distributed system, this is even more challenging, as you'd likely need to lock the key in the cache as well, which adds even more complexity.

    • @DryBones111
      @DryBones111 10 months ago +3

      @dreamsofcode Eventual consistency is both a blessing and a curse. Just like how async colours your functions, eventual consistency colours your whole system.

  • @rodemka
    @rodemka 10 months ago +11

    Video checklist:
    ✓ Editor - Neovim
    ✓ DB - PostgreSQL
    ✓ Cache - Redis
    ✓ gRPC
    - Full-text searches: meilisearch/sonic/typesense and postgres tsvector/tsquery
    - Authentication and authorization - oauth2, saml, openid, jwt, etc. endless list
    - full axum course from "todo app" -> "url shortener app" -> "pocketbase like app"
    - template engines + the hype about htmx
    - reports from db - rust/go + db -> pdf creation
    Thank you for the inspiring high quality videos!

    • @dreamsofcode
      @dreamsofcode  10 months ago +2

      Thank you for the great suggestions!

  • @thienlacho860
    @thienlacho860 9 months ago +6

    With write-through, you may face the dual-write problem. There may be a successful write to the database but a timeout on the Redis call, in which case stale data is still in Redis. In my application, I use Debezium to capture changes in the database and produce them to a Kafka topic; a background process then consumes those changes and invalidates the cache. In my opinion, deleting from the cache is safer than changing it, as one cache entry may be affected by many different actions, and those actions may arrive concurrently. If you change the cache in the wrong order due to async, the cache may end up with the wrong result. Just delete the key, for safety and memory efficiency.

    • @dreamsofcode
      @dreamsofcode  9 months ago +1

      Debezium is pretty great. I wanted to showcase it in this video but it blew the scope out way too much!
      CDC caching is dope.
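A sketch of the CDC-style invalidation described above, with a `std::sync::mpsc` channel standing in for the Kafka topic and a hypothetical `RowChanged` event standing in for what Debezium would emit. The consumer thread only ever deletes keys, so out-of-order events cannot write a wrong value into the cache. The `table:id` key format is an assumption.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical change event, published after a row is committed.
pub struct RowChanged {
    pub table: &'static str,
    pub id: u64,
}

pub type Cache = Arc<Mutex<HashMap<String, String>>>;

// Background consumer: every committed change becomes a cache delete, so the
// request path never performs the second (cache) write itself.
pub fn spawn_invalidator(cache: Cache, events: mpsc::Receiver<RowChanged>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // The loop ends once every Sender has been dropped.
        for ev in events {
            let key = format!("{}:{}", ev.table, ev.id);
            cache.lock().unwrap().remove(&key);
        }
    })
}
```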

  • @penguindrummaster
    @penguindrummaster 10 months ago +5

    I like the final takeaway saying caching is not your first step, and that database optimization should always be a consideration. I've seen too many people complicate their tech stacks just to avoid tackling an otherwise simple problem. Much like C, just because SQL is old doesn't mean it isn't really good at certain tasks.

  • @nedimkulovac6394
    @nedimkulovac6394 9 months ago

    Man, this video is awesome. By far the best and clearest explanation I've come across. Thanks a ton!
    I would like to see more videos explaining cache strategies and when to use caching and when not to use it.

  • @giuliopimenoff
    @giuliopimenoff 10 months ago +49

    I just removed the Redis cache from my project because I figured out it created more issues than benefits. Databases are already fast as heck, so use caches with intention. I use Redis for session tokens, for example, but nothing else rn.

    • @dreamsofcode
      @dreamsofcode  10 months ago +13

      I think this is the right choice. Session tokens are a good use of caching.

    • @eveleynce
      @eveleynce 10 months ago +2

      Honestly, I almost never use an external cache for anything I've written, because it helps me considerably more to think about how and WHY I want to cache information for each particular type of data. Some of it never needs a cache at all, and some only needs a few bits cached. It also gives you a hint about how to tell when your cache is stale, since you know exactly what you're caching and when, rather than caching all data all the time.

    • @giuliopimenoff
      @giuliopimenoff 10 months ago +6

      Also, when data is relational, caching just triples the effort of keeping it properly synced.

    • @anthonycavagne4880
      @anthonycavagne4880 10 months ago

      I don't exactly understand. Do you store the userId as the key and the token as the value? Why is this better than using a cookie?

    • @giuliopimenoff
      @giuliopimenoff 10 months ago

      @anthonycavagne4880 I use cookies, and in the cookie I store the session ID. Then in Redis I have a hash map with the user ID and session ID, so I can get the session data quickly and can also invalidate all sessions if needed.
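One possible layout for the session scheme described above, sketched in Rust with in-process `HashMap`/`HashSet` instead of Redis hashes or sets (the type and method names are hypothetical). The cookie carries only the session ID; the per-user index is what makes "invalidate all sessions" a single operation.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Default)]
pub struct Sessions {
    by_id: HashMap<String, u64>,            // session id -> user id
    by_user: HashMap<u64, HashSet<String>>, // user id -> that user's session ids
}

impl Sessions {
    pub fn create(&mut self, user_id: u64, session_id: &str) {
        self.by_id.insert(session_id.to_string(), user_id);
        self.by_user.entry(user_id).or_default().insert(session_id.to_string());
    }

    // Called on each request with the session id read from the cookie.
    pub fn user_for(&self, session_id: &str) -> Option<u64> {
        self.by_id.get(session_id).copied()
    }

    // "Log out everywhere": drop every session belonging to one user.
    pub fn revoke_all(&mut self, user_id: u64) {
        if let Some(ids) = self.by_user.remove(&user_id) {
            for id in ids {
                self.by_id.remove(&id);
            }
        }
    }
}
```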

  • @RomanKornev
    @RomanKornev 10 months ago +12

    In the write-through caching case, what happens when the concurrent cache write takes slightly longer than expected? The client now assumes the data was updated, but reading it back becomes a race condition, and the value might be stale.

    • @dreamsofcode
      @dreamsofcode  10 months ago +5

      You're correct, that's one drawback of concurrency, as it can introduce a race condition.
      The only way to solve it would be to lock the cache and pass that lock through to the concurrent task.
      The other approach is to do write through caching with the cache being the first target, although this can lead to some weird state if the database operation fails.
      Either way adds complexity!

    • @NotherPleb
      @NotherPleb 10 months ago

      I was thinking something similar: if the DB or the cache fails, you need a way to sync the state again. I think the easiest solution is to spawn 2 tasks, one for the DB and one for the cache, await the results of both, and handle those cases. However, the response time is then that of the slower of the two.

    • @dreamsofcode
      @dreamsofcode  10 months ago +2

      @NotherPleb Even spawning two tasks can be complicated, however, especially if one fails and the other doesn't. You then need to reconcile afterwards.

    • @NotherPleb
      @NotherPleb 10 months ago

      @dreamsofcode Yes, but I guess you always need to wait for the results of both in the handler when you mutate data; you can't just "set and forget" as an optimization.

    • @EduarteBDO
      @EduarteBDO 9 months ago +1

      I think one workable flow would be: lock the cache key > update the database > (if the update fails > unlock the cache key) if the update succeeds > delete the cache entry and let a cache miss happen in the future.
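A sketch of that lock > write > delete flow in Rust, using in-process `Mutex`es as hypothetical stand-ins for the database, the cache, and a per-key lock (a horizontally scaled setup would need a distributed lock instead). The `db_ok` flag only simulates a failed database write. Readers that miss also take the key lock, so a value read before an update cannot be written back into the cache after that update's delete.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

pub struct Service {
    pub db: Mutex<HashMap<u64, String>>,    // stand-in for PostgreSQL
    pub cache: Mutex<HashMap<u64, String>>, // stand-in for Redis
    key_locks: Mutex<HashMap<u64, Arc<Mutex<()>>>>,
}

impl Service {
    pub fn new() -> Self {
        Service {
            db: Mutex::new(HashMap::new()),
            cache: Mutex::new(HashMap::new()),
            key_locks: Mutex::new(HashMap::new()),
        }
    }

    // Fetch (or create) the lock object for one key.
    fn lock_for(&self, id: u64) -> Arc<Mutex<()>> {
        let mut locks = self.key_locks.lock().unwrap();
        let lock = locks.entry(id).or_insert_with(|| Arc::new(Mutex::new(())));
        Arc::clone(lock)
    }

    // lock key > write DB > on success delete the cache entry. The key lock is
    // released when `_guard` goes out of scope, on both the Ok and Err paths.
    pub fn update(&self, id: u64, value: &str, db_ok: bool) -> Result<(), String> {
        let key_lock = self.lock_for(id);
        let _guard = key_lock.lock().unwrap();
        if !db_ok {
            return Err("db write failed".to_string()); // cache left untouched
        }
        self.db.lock().unwrap().insert(id, value.to_string());
        self.cache.lock().unwrap().remove(&id);
        Ok(())
    }

    // Cache-aside read; the miss path takes the same key lock as `update`.
    pub fn get(&self, id: u64) -> Option<String> {
        if let Some(v) = self.cache.lock().unwrap().get(&id).cloned() {
            return Some(v);
        }
        let key_lock = self.lock_for(id);
        let _guard = key_lock.lock().unwrap();
        let v = self.db.lock().unwrap().get(&id).cloned()?;
        self.cache.lock().unwrap().insert(id, v.clone());
        Some(v)
    }
}
```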

  • @underflowexception
    @underflowexception 10 months ago +14

    If you're using PHP and Laravel, you can use the dispatchAfterResponse function to save to the cache.

  • @hominusprogramming
    @hominusprogramming 24 days ago +1

    I want to add the dirty-bit technique, which makes this kind of caching more efficient than write-through (strictly speaking, it is write-back caching). When you modify data in the cache, instead of writing it to the DB directly, you set a "dirty bit" to 1. Then you subscribe a function to the key-deletion event. In that function you check whether the dirty bit is 1; if it is, you write to the DB, otherwise you simply let the key be deleted without any action.
    This is how caching works in modern CPUs too, btw: they only write a cache line back to RAM if it was modified (and OS page caches do the same for disks).
    The reason this is efficient is that even if a piece of data is modified multiple times, you only write the last modification.
    (This is useless if you want to keep track of every modification.)
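A minimal Rust sketch of the dirty-bit (write-back) idea, assuming an in-process `HashMap` as a hypothetical backing store and an explicit `evict` call in place of a Redis key-deletion notification. Repeated writes to one key cost a single backing write when the entry is evicted. The trade-off: dirty entries that have not been evicted yet are lost if the process crashes.

```rust
use std::collections::HashMap;

pub struct WriteBack {
    entries: HashMap<String, (String, bool)>, // key -> (value, dirty)
    pub backing: HashMap<String, String>,     // stand-in for the database
    pub backing_writes: u32,
}

impl WriteBack {
    pub fn new() -> Self {
        WriteBack { entries: HashMap::new(), backing: HashMap::new(), backing_writes: 0 }
    }

    // Entry loaded from the DB: clean, nothing to write back.
    pub fn load(&mut self, key: &str, value: &str) {
        self.entries.insert(key.to_string(), (value.to_string(), false));
    }

    // Modification: only the cache changes; the dirty bit records that.
    pub fn write(&mut self, key: &str, value: &str) {
        self.entries.insert(key.to_string(), (value.to_string(), true));
    }

    // Deletion/eviction hook: persist the final value only if the entry is dirty.
    pub fn evict(&mut self, key: &str) {
        if let Some((value, dirty)) = self.entries.remove(key) {
            if dirty {
                self.backing.insert(key.to_string(), value);
                self.backing_writes += 1;
            }
        }
    }
}
```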

  • @kartik180rajesh1
    @kartik180rajesh1 9 months ago +3

    If you use Redis as a cloud-hosted service, isn't that defeating the purpose of the cache? The cache should ideally be as close to your backend service as possible, either in your network or in the same instance's memory.

  • @posteisnoob5763
    @posteisnoob5763 10 months ago +2

    Thanks for the great video!! I would really like to see your take on when / when not to cache

  • @yuu-kun3461
    @yuu-kun3461 10 months ago +7

    After watching the recent PostgreSQL video by "The Art Of The terminal", it would seem to me that adding Redis to PostgreSQL is not needed for most projects.
    Additionally, as presented in the blog post by martinheinz, a cache can be achieved by creating UNLOGGED tables. And if the key-value functionality of Redis is that important, the video mentioned covers that too.

    • @dreamsofcode
      @dreamsofcode  10 months ago +6

      100% agree. In most use cases it's not needed, and the complexity can often outweigh the benefit.
      I don't know if I would consider UNLOGGED tables a viable alternative. I did some benchmarking on them for another video I had planned, and they're nowhere near as fast. There are also a lot of caveats with using them, and if you're unaware a table is unlogged, mistakes can happen.

    • @peppybocan
      @peppybocan 10 months ago +1

      I think Redis is more performant than Postgres' unlogged tables, simply because Redis is a specialized tool optimized as an in-memory store. Postgres, OTOH, has layers of abstraction on top of simple store-and-retrieve functionality. Use the correct tools for the correct uses.

    • @DanniDuck
      @DanniDuck 10 months ago

      @dreamsofcode *The complexity often does not outweigh the benefit. It's extremely simple and will make everything 100x faster. For example, say you have a bunch of base64-encoded images stored in PG: you can make it so each image (likely ~10 KB or so) gets stored in memory in its result format, making anything involving images significantly faster. It can make things super fast if you use it right, e.g. big queries for a product's info or whatever.

    • @OneShore
      @OneShore 10 months ago

      @peppybocan Yeah, the difference is that Redis is very lightweight. If you're looking to throw more RAM & CPU at a DB problem, then Redis starts to make sense, because 8GB Redis + 8GB Postgres is going to outperform 32GB Postgres in many cases.

    • @peppybocan
      @peppybocan 10 months ago

      @OneShore Not necessarily, it depends on the workload. If you have transaction-heavy processing, there is no way around it. E.g. if you are PayPal and you need strong ACID guarantees, you may find yourself in a pickle. Storing payment information in memory is fine as long as you have resiliency built into the application.

  • @alirahimi4477
    @alirahimi4477 10 months ago +3

    Most of the time, that one select by ID isn't the bottleneck; rather, it's a complex query that returns a possibly big result, and caching that can be a real pain or plainly impossible. When you query by X but update by Y, there is no clear way to use the write-through method to update your cache, because you don't even know which cache keys you should be updating!

    • @dreamsofcode
      @dreamsofcode  10 months ago +2

      Yep, you're correct!
      Apologies if I didn't make that clear in the video, I didn't want to complicate the caching implementation itself so went for a simple query

  • @marcing5380
    @marcing5380 10 months ago +1

    One major thing to remember about caching (or, in general, anywhere you have two separate sources of truth/data) is that you'll always run into eventual consistency, so you shouldn't use it in every possible scenario. I.e. there is a non-zero time for the cache and DB to sync up, during which the data is inconsistent but still readable. The only way to avoid it, I think, is explicit locking, but that slows the whole thing down quite considerably: when you make an update to a table, you lock everything related to it and unlock after the write and the cache update have been completed.

  • @EvanEdwards
    @EvanEdwards 10 months ago +1

    Best walk-in on a codebase I ever did was realizing they had left passthrough on in their cache layer. The cache was invalidated with every request and passed through, presumably as a one-line debug/development short-circuit. I pointed it out, they deleted the one line, and responsiveness improved vastly. They were nearly three months post-launch. (I was working on a loosely coupled service; a consulting company operating as a separate department, essentially. I found it when looking into connecting to their database for some features.)

  • @WanderingCrow
    @WanderingCrow 10 months ago +20

    Great video, very clear and articulate!
    Caching is among my biggest weaknesses, I think, after authentication and token management, so I'd be interested to learn more about it, and how/when to use it effectively 🤔

    • @dreamsofcode
      @dreamsofcode  10 months ago +2

      Thank you!
      They definitely have their use cases, but they're not that simple to implement, and there's a lot to consider, more so than I even cover in the video!

    • @GreatTaiwan
      @GreatTaiwan 10 months ago +4

      external IDP with SAML2 & OIDC (PKCE) is my biggest weakness

    • @SeanLazer
      @SeanLazer 10 months ago

      My advice is squeeze as much perf as you can out of your primary data store before you add a caching layer! Your RDBMS can take you a lot further than some people realize.

  • @wcrb15
    @wcrb15 10 months ago +3

    Too many people reach for caching as a mechanism to improve performance when actual performance tuning of the application is the more appropriate action. A cache isn't going to save you if your application is over-fetching or inefficiently grabbing data from the DB. But when it's used correctly, caching is awesome!

  • @LauriePoulter
    @LauriePoulter 10 months ago +1

    Any tips for avoiding stale data when dealing with a third-party service that can be updated by other actors?

  • @CrypticConsole
    @CrypticConsole 10 months ago +3

    Why do you need to cache this in Redis? Could you not just use master slave database scaling for read heavy workloads?

    • @dreamsofcode
      @dreamsofcode  10 months ago

      Read replication is a decent solution in many use cases, especially read-heavy ones, as you mentioned.
      Just like caching, it's a tradeoff, so it depends on what your data model / system looks like.

  • @n0kodoko143
    @n0kodoko143 10 months ago +3

    Awesome video. I would love to see a "when to and not to cache" video.

    • @dreamsofcode
      @dreamsofcode  10 months ago +1

      Thank you! I shall do one then! 😁

  • @Sarwaan001
    @Sarwaan001 10 months ago

    I work on a team that handles very large amounts of data, and we usually take a "best tool for the job" approach. E.g. with a graph database as the ground truth, we use it for simple O(1) queries and use a search DB for search at O(log(n)); a trigger sends data from the graph database, obtains the full object by performing a walk, and sends the object to the search DB.
    I feel like this is technically like caching, but it's still very fast, and we think of cache databases more as a crutch to buy more time than as a solution.

  • @petar567
    @petar567 10 months ago +1

    Great video. Thanks for the information, also I would appreciate it if you make a video of when to use cache and when not.

  • @fahimferdous1641
    @fahimferdous1641 10 months ago +1

    What would be an example usecase for the random eviction policy?

    • @I25mI25
      @I25mI25 10 months ago +1

      LRU comes with a small overhead since you have to somehow store/maintain a "list" of which items were last accessed. In many "normal" cases, it is likely that an item that was recently accessed will be accessed again, so keeping the newer/most frequently used ones in cache is worth the overhead. If your access patterns on the other hand are mostly random, keeping track of usage patterns isn't really worth it, so you can just delete any random entry. You might still want to use a cache even in random use cases when the occasional random cache hit might still give a big enough boost/save you money in bandwidth/storage access cost to make the added complexity of a cache worth it.

    • @dreamsofcode
      @dreamsofcode  10 months ago

      This is a great explanation.
      As for specific use cases, it's hard to really describe any that would fall into this. But it would be any data / queries that have no discernible pattern, or a system where the likelihood of needing a key is the same across your data set.
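For comparison with LRU, a random-eviction cache keeps no access metadata at all: reads update nothing, and a full cache drops an arbitrary key. This Rust sketch uses a hypothetical `RandomCache` type, and its xorshift generator is only a dependency-free stand-in for a real RNG (not suitable for anything security-related).

```rust
use std::collections::HashMap;

pub struct RandomCache {
    cap: usize,
    keys: Vec<u64>, // dense list so a random victim can be picked in O(1)
    map: HashMap<u64, String>,
    rng: u64,
}

impl RandomCache {
    pub fn new(cap: usize, seed: u64) -> Self {
        assert!(cap > 0, "capacity must be at least 1");
        RandomCache { cap, keys: Vec::new(), map: HashMap::new(), rng: seed.max(1) }
    }

    // xorshift64: a tiny PRNG; the state must never be zero.
    fn next_rand(&mut self) -> u64 {
        let mut x = self.rng;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng = x;
        x
    }

    // Reads do not touch any bookkeeping, unlike LRU.
    pub fn get(&self, key: u64) -> Option<&String> {
        self.map.get(&key)
    }

    pub fn put(&mut self, key: u64, value: String) {
        if self.map.contains_key(&key) {
            self.map.insert(key, value);
            return;
        }
        if self.map.len() >= self.cap {
            // Evict a uniformly chosen (modulo bias aside) existing key.
            let i = (self.next_rand() % self.keys.len() as u64) as usize;
            let victim = self.keys.swap_remove(i);
            self.map.remove(&victim);
        }
        self.keys.push(key);
        self.map.insert(key, value);
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```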

  • @legobuildingsrewiew7538
    @legobuildingsrewiew7538 9 months ago

    Instantly subscribed! Great video.

  • @zshanahmad2669
    @zshanahmad2669 10 months ago +2

    Great video. My biggest problem with caches in bigger projects is dealing with related data.
    For example, I cached the blogs API: GET blogs/blog_ID. This API returns JSON containing the blog and information about the author.
    When data about the author changes, e.g. their name, I have to invalidate blogs/blog_ID too, otherwise users will get the old author data.
    I know I could return only the blog data in the blogs/blog_ID request, but I can't change the frontend, which expects the user data inside the response.
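One common way to handle this related-data problem is a reverse index from each dependency to the cached keys that embed it. This is a Rust sketch with hypothetical names, using in-process maps; with Redis, the per-author index could be a set maintained with `SADD` and read with `SMEMBERS` before a `DEL`.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Default)]
pub struct ResponseCache {
    responses: HashMap<String, String>,       // "blogs/{id}" -> cached JSON body
    by_author: HashMap<u64, HashSet<String>>, // author id -> keys that embed that author
}

impl ResponseCache {
    // Cache a blog response and record that it depends on `author_id`.
    pub fn put_blog(&mut self, blog_id: u64, author_id: u64, json: &str) {
        let key = format!("blogs/{}", blog_id);
        self.responses.insert(key.clone(), json.to_string());
        self.by_author.entry(author_id).or_default().insert(key);
    }

    pub fn get(&self, key: &str) -> Option<&str> {
        self.responses.get(key).map(String::as_str)
    }

    // Call after the author row changes; returns how many responses were dropped.
    pub fn invalidate_author(&mut self, author_id: u64) -> usize {
        let keys = self.by_author.remove(&author_id).unwrap_or_default();
        for k in &keys {
            self.responses.remove(k);
        }
        keys.len()
    }
}
```

The same index also answers the earlier "query by X, update by Y" problem, as long as every cached result records which rows it was built from.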

  • @fahimferdous1641
    @fahimferdous1641 10 months ago +1

    I legit thought today's sponsor was Docker XD
    Are you using the embedded terminal though?

  • @Biowulf21
    @Biowulf21 10 months ago

    Love your videos man. Keep it up!

  • @kennedydre8074
    @kennedydre8074 10 months ago

    I would really love to see a video of when to cache and when not to cache, thank you.

  • @michaelhenze877
    @michaelhenze877 9 months ago

    Would really like to see a comparison between NvChad and your current Neovim config.

  • @hunorportik5618
    @hunorportik5618 9 months ago

    Useful info, well described.
    One important thing was left out IMO: using concurrency might actually reintroduce the stale-data issue, since one of the writes might fail due to a non-transient (or improperly handled transient) issue.

    • @dreamsofcode
      @dreamsofcode  9 months ago

      That's correct! This issue becomes even more problematic in a distributed system as well if we horizontally scale our app.

  • @nexovec
    @nexovec 10 months ago +2

    I just realized you can literally ship a product that's just static files and a Postgres server. Curb your stack, please.

  • @pieter5466
    @pieter5466 10 months ago

    8:14 Makes you wonder whether there is *ever* a good use case for "random order"

  • @saywaify
    @saywaify 10 months ago

    Can you please share your nvim setup (or at least the colorscheme) ?? It looks so fine

    • @perz1val
      @perz1val 10 months ago

      Colorscheme looks like Catppuccin

  • @Daniel-i8v2i
    @Daniel-i8v2i 8 months ago

    What video editing software do you use? It looks like you're on Linux.

  • @PiesekLeszek90
    @PiesekLeszek90 10 months ago

    Write-through cache sounds like you just have 2 databases running at the same time, but I assume that's because of the simplicity of the example? I'd imagine you only cache the prepared API response, with all its relations and after applying logic, and not "raw data" as it is in the main database?
    This doesn't sound too optimal when you update one record that applies to many users, but each user needs its own cached version?

  • @ebukaume
    @ebukaume 10 หลายเดือนก่อน

    What happens when the spawned task compeletes with an error? It seems we didn't completely solve the stale data problem.

    • @dreamsofcode
      @dreamsofcode  10 months ago +1

      Correct, and in a distributed system, this is even more difficult!

  • @fadhilinjagi1090
    @fadhilinjagi1090 10 months ago

    What if you deleted the cache entry right before you updated/deleted the record in the DB? Would this prevent the race condition?

    • @fadhilinjagi1090
      @fadhilinjagi1090 10 months ago

      I think that's optimistic mutation, if I'm not wrong.

  • @betoharres
    @betoharres 10 months ago

    Why did you make the write-through cache concurrent? There's a chance that two concurrent requests return mismatched values compared to what's in the database; maybe I'm missing something here.

    • @dreamsofcode
      @dreamsofcode  10 months ago

      You're correct.
      However, even with it being serial, there's no guarantee in a distributed / horizontally scaled system that a race condition won't occur.
      With a cache, it's almost impossible to guarantee consistency without locking the actual cache itself. In a distributed system, that's going to be even more complex.

  • @archip8021
    @archip8021 10 months ago +1

    I have a table of about ~20 items that I need very, very often, and it rarely changes.
    Is this a good use case for caching? Can a whole table be cached like this?

    • @dreamsofcode
      @dreamsofcode  10 months ago +3

      I think for that size, you're likely not going to need caching.
      Caching is more for when you have slower queries, such as aggregations or hitting an API that has poor performance.
      Adding in a cache adds in complexity and it's probably not worth the performance gain you might receive.

    • @arturpendrag0n270
      @arturpendrag0n270 10 months ago +1

      Can't you load them at the start of the request and use some singleton, or put them in some "global" variable, so you won't have to request them unless needed?
      Even if that's not the case, the DB usually has caching mechanisms for repeated queries, so for such a small number of records it's probably unnecessary.
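A sketch of the "load once into a global" suggestion in Rust, using `std::sync::OnceLock` (stable since Rust 1.70). The `Item` type and `load_items_from_db` function are hypothetical stand-ins for the real table and query. Changes to the table are not picked up until the process restarts, which is only acceptable because the data rarely changes.

```rust
use std::sync::OnceLock;

// Hypothetical row type for the ~20-row lookup table.
#[derive(Debug, Clone)]
pub struct Item {
    pub id: u32,
    pub name: String,
}

// Stand-in for the real `SELECT ... FROM items` query.
fn load_items_from_db() -> Vec<Item> {
    (1..=20).map(|id| Item { id, name: format!("item-{}", id) }).collect()
}

static ITEMS: OnceLock<Vec<Item>> = OnceLock::new();

// The first caller runs the query; every later call returns the same in-process
// slice, with no external cache and no further database round trips.
pub fn items() -> &'static [Item] {
    ITEMS.get_or_init(load_items_from_db)
}
```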

  • @skr-kute1677
    @skr-kute1677 9 months ago

    Thanks for the vid
    Informative and simple

  • @krateskim4169
    @krateskim4169 10 months ago

    I would like to know when to cache and when not to, please.

  • @mementomori8856
    @mementomori8856 10 months ago

    Crazy that you released this the same day I started implementing Redis from scratch.

  • @TR1XT3RZ360
    @TR1XT3RZ360 9 months ago

    Can you share your terminal setup?

  • @vinii2815
    @vinii2815 10 months ago

    Hey, sorry, this is off-topic for the video, but will you make a new video about NvChad configuration? Their new file structure is very confusing, and I haven't seen anyone with an updated tutorial for it yet.

    • @dreamsofcode
      @dreamsofcode  10 months ago +1

      I'll be redoing the Neovim content soon! I recommend staying with NvChad 2.0 in the meantime!

  • @theblckbird
    @theblckbird 10 months ago +3

    In Rust, you can do the following to convert a Result to an Option:
    let my_result = action_that_returns_a_result(); // Result<T, E>
    let as_option = my_result.ok(); // Option<T>, the error value is discarded
    It works the other way around as well:
    let option = Some("foo"); // Option<&str>
    let as_result = option.ok_or(0); // Result<&str, i32>
    let option = Some("foo"); // Option<&str>
    let as_result = option.ok_or_else(|| 3 * 3 / 9); // Result<&str, i32>; the closure only runs if option is None

    • @dreamsofcode
      @dreamsofcode  10 months ago +2

      This is much simpler! Thank you.

    • @nathanoy_
      @nathanoy_ 10 months ago

      Awesome write up. I was about to write a similar comment. Now I write this reply to push this one. 👌

    • @Maxelya
      @Maxelya 10 months ago

      I still consider myself lacking experience with Rust, but somehow I knew about these "ok" methods and was about to point it out after watching the vid ^^'.

  • @neelg7057
    @neelg7057 10 months ago

    Which font is that in your nvim? :)

  • @TheTwober
    @TheTwober 10 months ago +1

    Now imagine you program in Java and all those problems are already solved. :)
    Just use a SoftReference that will be cleared by the GC if it needs memory; the attached ReferenceQueue can be drained by a background thread with its blocking remove(), so your cache gets informed whenever something was removed by the GC. A near-perfect cache is nowadays literally 3 lines of code in Java.

  • @youtube_user9921
    @youtube_user9921 10 months ago

    Hi. Can you also post tutorial lectures on Nix?

    • @dreamsofcode
      @dreamsofcode  10 months ago

      Absolutely! I'll likely do it on my other channel, which is more focused on Linux and FOSS. I've been playing with NixOS more on there.

    • @youtube_user9921
      @youtube_user9921 10 months ago

      Can you tell me which channel it is?

  • @its_maalik
    @its_maalik 9 months ago +1

    Adding a cache should be the last resort to achieving good performance. Majority of applications will do just fine without a cache if they nail the data modeling and query optimizations.

  • @foreverexpanding
    @foreverexpanding 9 months ago

    Why not update the cache when we update the DB? In that case there would be no need to worry about it being stale.

  • @Avanta1
    @Avanta1 10 months ago

    I'm not very familiar with async Rust, but is there any chance of a race condition when updating the cache? What if a thread that was spawned later acquires the lock before an earlier-spawned thread?

    • @dreamsofcode
      @dreamsofcode  10 months ago

      You're correct. There absolutely is a chance. A race condition is introduced by making the update concurrent with the response.
      If you want to ensure 100% consistency, then performing the update synchronously would be preferable!

    • @Avanta1
      @Avanta1 10 months ago

      @dreamsofcode Cool, thanks for replying!

    • @dreamsofcode
      @dreamsofcode  10 months ago

      @Avanta1 Thanks for asking the question!
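One way to keep a late-running spawned task from overwriting newer data, sketched in Rust: tag each cached value with a version that increases on every write (for example an `updated_at` or row-version column, which is an assumption here) and ignore writes older than what the cache already holds. In Redis, this check-and-set would itself have to be atomic, for example in a Lua script or a `WATCH`/`MULTI` transaction.

```rust
use std::collections::HashMap;

// Hypothetical versioned cache: each value carries the version of the row it came from.
#[derive(Default)]
pub struct VersionedCache {
    entries: HashMap<u64, (u64, String)>, // id -> (version, value)
}

impl VersionedCache {
    // Apply the write only if it is newer than what is cached; returns whether it was applied.
    pub fn set_if_newer(&mut self, id: u64, version: u64, value: &str) -> bool {
        let stale = matches!(self.entries.get(&id), Some((current, _)) if *current >= version);
        if stale {
            return false; // a later write already landed; drop this late one
        }
        self.entries.insert(id, (version, value.to_string()));
        true
    }

    pub fn get(&self, id: u64) -> Option<&str> {
        self.entries.get(&id).map(|(_, v)| v.as_str())
    }
}
```

This makes the final cache state independent of the order in which the spawned tasks finish, but readers can still briefly see the older value before the newer write arrives.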

  • @Myrkytyn
    @Myrkytyn 7 months ago

    When to cache?

  • @FinlayDaG33k
    @FinlayDaG33k 10 months ago +1

    There is a major issue tho... If your key expires, and suddenly 1K requests come in, you're now hitting the database with all 1K requests and may overload the database anyways.
    Not exactly ideal.

    • @dreamsofcode
      @dreamsofcode  10 months ago +2

      You're correct, this is known as the thundering herd problem.
      You can solve this by using something like a single-flight mechanism or connection pooling; again, it's more complexity, though.

    • @mind.journey
      @mind.journey 10 months ago +1

      I don't know if it's optimal, but what I usually do is never let the key expire, and instead just create a cronjob (or something similar) that periodically refreshes the key with updated data.

    • @FinlayDaG33k
      @FinlayDaG33k 10 months ago

      @mind.journey This works, depending on the goals, yes.
      If it's data that you know will be highly sought after by your code, it can definitely work.
      However, you are now burdened with the task of guessing which data would benefit from it.
      It can also lead to you having a lot of data in the cache that you may only need once in a blue moon, thus wasting resources fetching it and keeping it cached.
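A minimal single-flight sketch in Rust using only `Mutex` and `Condvar`: when many callers miss on the same key at once, only the first runs the loader and the rest wait for its result, so an expiring hot key produces one database hit instead of a herd. Errors, timeouts, and lock poisoning are ignored for brevity, and `SingleFlight` is a hypothetical type, not any particular library's API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

// Shared slot for one in-flight load: the result (once ready) plus a wakeup signal.
type Slot = Arc<(Mutex<Option<String>>, Condvar)>;

#[derive(Default)]
pub struct SingleFlight {
    inflight: Mutex<HashMap<String, Slot>>,
}

impl SingleFlight {
    pub fn get_or_load<F: FnOnce() -> String>(&self, key: &str, load: F) -> String {
        // Join an existing flight for this key, or become its leader.
        let (slot, leader) = {
            let mut map = self.inflight.lock().unwrap();
            match map.get(key) {
                Some(slot) => (Arc::clone(slot), false),
                None => {
                    let slot: Slot = Arc::new((Mutex::new(None), Condvar::new()));
                    map.insert(key.to_string(), Arc::clone(&slot));
                    (slot, true)
                }
            }
        };
        let (lock, cvar) = &*slot;
        if leader {
            let value = load(); // the single database hit
            *lock.lock().unwrap() = Some(value.clone());
            cvar.notify_all();
            self.inflight.lock().unwrap().remove(key);
            value
        } else {
            // Wait until the leader has stored the value (loop guards against spurious wakeups).
            let mut guard = lock.lock().unwrap();
            while guard.is_none() {
                guard = cvar.wait(guard).unwrap();
            }
            (*guard).clone().expect("leader stored a value")
        }
    }
}
```

Callers that arrive after the leader finishes start a new flight; in practice the cache would already be repopulated by then.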

  • @M3t4lstorm
    @M3t4lstorm 10 months ago

    Note: In the write-through example, if your application crashes/errors/gets killed before the cache update is written to Redis (after the DB write), you will have stale data forever.

    • @dreamsofcode
      @dreamsofcode  10 months ago

      You mean until the TTL?

    • @liu-river
      @liu-river 10 months ago

      Yeah, but if you do it synchronously, updating Redis after a successful DB write, then you sacrifice speed. I guess you could implement some kind of rollback if either fails?
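The TTL is what bounds the damage in that crash scenario. A small Rust sketch (hypothetical `TtlCache`, with `now` passed in explicitly to keep it deterministic) shows that an entry left stale by a crash is served at most until its expiry, after which it is treated as a miss and the next read goes back to the database.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

pub struct TtlCache {
    ttl: Duration,
    entries: HashMap<u64, (String, Instant)>, // id -> (value, expires_at)
}

impl TtlCache {
    pub fn new(ttl: Duration) -> Self {
        TtlCache { ttl, entries: HashMap::new() }
    }

    pub fn set(&mut self, id: u64, value: &str, now: Instant) {
        self.entries.insert(id, (value.to_string(), now + self.ttl));
    }

    // Expired entries count as misses and are removed lazily on access.
    pub fn get(&mut self, id: u64, now: Instant) -> Option<String> {
        let expired = match self.entries.get(&id) {
            Some((_, expires_at)) => now >= *expires_at,
            None => return None,
        };
        if expired {
            self.entries.remove(&id);
            return None;
        }
        self.entries.get(&id).map(|(v, _)| v.clone())
    }
}
```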

  • @SlavomirDanas
    @SlavomirDanas 10 months ago +1

    Woah, woah, woah! Just 8 seconds into the video and I see infographics with the cache layer in the completely wrong spot.

  • @rando521
    @rando521 10 months ago

    So I have a question, since I am new to Rust and axum:
    the AppState is some amalgamation of Arc/Mutex,
    and you lock it every time you want to access the DB or the Redis cache.
    Wouldn't this just mean you are making the asynchronous runtime semi-synchronous?

    • @dreamsofcode
      @dreamsofcode  10 months ago

      It's a great question.
      My implementation in the video is naive, mainly due to simplifying the code as much as possible. The requests are still asynchronous but you're correct the lock would prevent concurrent requests both accessing the shared state.
      An improved implementation would be to use either a RW lock, or have a better abstraction of the state to only lock when needed (rather than at the start of the request)

    • @rando521
      @rando521 10 months ago

      thanks kinda new to rust and async
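A sketch of the `RwLock` alternative mentioned in the reply, using `std::sync::RwLock` and OS threads to stay self-contained; the `AppState` shape and `read_many` helper are hypothetical. Many readers can hold the read lock at once, and only writers are exclusive. In async axum handlers that must hold a lock across an `.await`, `tokio::sync::RwLock` would be the usual choice instead.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

#[derive(Default)]
pub struct AppState {
    pub spells: RwLock<HashMap<u64, String>>,
}

// Look up several ids in parallel; each reader takes a shared (read) lock,
// so readers do not block one another.
pub fn read_many(state: Arc<AppState>, ids: Vec<u64>) -> Vec<Option<String>> {
    let handles: Vec<_> = ids
        .into_iter()
        .map(|id| {
            let state = Arc::clone(&state);
            thread::spawn(move || {
                let spells = state.spells.read().unwrap(); // shared lock
                spells.get(&id).cloned()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Writes would call `state.spells.write()`, which waits for current readers to release the lock and blocks new ones while it is held.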

  • @Fanaro
    @Fanaro 10 months ago

    Please make a video on how you edit your videos!

  • @egemengol306
    @egemengol306 10 months ago

    For the life of me, I don't understand the need for Redis.
    When I need caching, I always reach for in-memory caching libraries right in my codebase, reducing latency as well as development and deployment complexity, while staying featureful.
    If the language is memory-hungry, in-memory SQLite works really well for most cases.
    If I want centralized state, I reach for the database itself; Postgres is excellent.
    Under which circumstances would Redis be the first choice?
    Edit 1: Multiple instances caching mutable data would be one, I suppose.

  • @allroni
    @allroni 10 months ago

    Great video, as usual! 🙂

  • @ordinarygg
    @ordinarygg 10 months ago +2

    90% of issues are missing indexes, or crappy backend code where 99% of the time is spent in the backend and 1% in the DB. So before you say the DB is slow, please benchmark your API and DB independently. A simple 8-core Ryzen machine can handle 300k selects/sec and 60k inserts/sec using PostgreSQL. 256 cores and 1 TB of RAM will solve a lot of issues in a single instance. People don't even reach the limits of vertical scaling before they start scaling horizontally, which is a huge mistake for small and mid-sized businesses and startups.

    • @dreamsofcode
      @dreamsofcode  10 months ago +1

      Yep, I agree with you (I believe I stated something similar at the end).
      There are certain use cases where caching applies, but in general, optimizing your database queries is the correct approach.
      There is still a case for horizontal scaling over vertical, especially with respect to availability. But even then, you can use read replication to improve that.

    • @kriffos
      @kriffos 10 months ago

      If you want a really fast cache, it is a good idea to scale the cache together with your application and avoid a network request to the cache. Most of the time spent getting the data is probably network overhead. I think a cache as a microservice is, most of the time, a bad idea.

  • @animanaut
    @animanaut 10 months ago

    If you want to enable client-side caching, there are also the ETag request/response headers (ETag and If-None-Match). A whole other topic, but I believe they use hashes to let the backend decide whether to send a potentially big payload over the network or, if the client's hash matches what is already present in the server's DB/cache, to return HTTP 304 (Not Modified) instead.

    • @hansenchrisw
      @hansenchrisw 10 months ago +1

      +1, though If-Modified-Since is a bit simpler and usually sufficient.

    • @hansenchrisw
      @hansenchrisw 10 months ago

      @CesarLP96 search for HTTP conditional requests

    • @animanaut
      @animanaut 10 months ago

      The developer pages from Mozilla, also known as MDN, would be one recommendation from me.

    • @animanaut
      @animanaut 10 months ago

      @CesarLP96 The Mozilla developer pages would be one place; just search for ETag.

    • @animanaut
      @animanaut 10 months ago

      FYI, I've answered multiple times now, but YT refuses to show it for some strange reason. Not sure you will see this comment either. One example would be the Mozilla Developer Network.
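The ETag flow described above can be sketched in a few lines. This is a framework-agnostic sketch with a hypothetical `respond` function that takes the response body and the client's `If-None-Match` value. It ignores ETag lists, `*`, and weak validators. It also uses `std`'s `DefaultHasher` only for brevity. That hasher's output is not guaranteed to stay the same across Rust releases, so a real server would more likely derive the ETag from a stable content hash or a row version.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal stand-in for an HTTP response (hypothetical type).
#[derive(Debug, PartialEq)]
struct Response {
    status: u16,
    etag: String,
    body: Option<String>,
}

/// Derive a quoted ETag value from the response body.
fn etag_for(body: &str) -> String {
    let mut hasher = DefaultHasher::new();
    body.hash(&mut hasher);
    format!("\"{:x}\"", hasher.finish())
}

/// Return 304 Not Modified (no body) if the client's cached ETag still matches,
/// otherwise 200 OK with the full body and its ETag.
fn respond(body: &str, if_none_match: Option<&str>) -> Response {
    let etag = etag_for(body);
    if if_none_match == Some(etag.as_str()) {
        Response { status: 304, etag, body: None }
    } else {
        Response { status: 200, etag, body: Some(body.to_string()) }
    }
}

fn main() {
    let body = r#"[{"id":1,"name":"alice"}]"#;

    // First request: the client has nothing cached yet.
    let first = respond(body, None);
    println!("{} {}", first.status, first.etag);

    // Second request: the client sends back the ETag it received.
    let second = respond(body, Some(&first.etag));
    println!("{} body={:?}", second.status, second.body);
}
```

With If-Modified-Since, the same check compares timestamps instead of hashes. That is simpler when the data already has an `updated_at` column.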

  • @ivan_adamovich
    @ivan_adamovich 9 months ago

    I didn't understand one thing: 50 ms is certainly a good response time, but for the simplest API written in Rust, it seems like a lot, don't you think? (I use Go in my projects, so I'm a noob in Rust.)

    • @illyias
      @illyias 9 months ago

      You won't need caching in a simple project; your database will be able to handle the load fine.

  • @peppybocan
    @peppybocan 10 months ago +2

    Unless you handle 1000s of concurrent users and they pay you nothing (free users), you don't need to worry about caches. The right DB design with the right-sized DB node can handle 1000s of concurrent users. Once you start handling tens of thousands of users, then you think about caching, but at that point it should be fairly easy to scale the particular parts of your DB, because you *know* what is slow.

    • @dreamsofcode
      @dreamsofcode  10 months ago +1

      100%
      Profiling your queries and using well-placed indexes is always a better option.
      Sometimes it's not possible (such as when hitting a remote API), but if you have control of the database, then it's always the better option.

    • @parkourbee2
      @parkourbee2 10 months ago

      Even then, do I really need a cache? Why not just index what needs to be indexed?

    • @peppybocan
      @peppybocan 10 months ago

      @@dreamsofcode Yeah, absolutely! Those limits are external, and that's when it matters. I think Redis is a viable option in that case.
      Even things like session authentication can be done with a silly in-memory LRU cache, and it will get you 90-95% of the way there.
      But people tend to be very quick to stuff the project with a billion dependencies.

    • @dreamsofcode
      @dreamsofcode  10 months ago +1

      @@parkourbee2 If you don't have access to the database? I'm thinking more of remote APIs etc. where you don't control the data at all.
      For example, we had an API that hit NIST for CVEs and was incredibly slow; in that case, caching was a good solution.

    • @luca4479
      @luca4479 10 months ago

      Postgres has built-in caching, which is already crazy performant.
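For the "silly in-memory LRU cache" mentioned in this thread, here is a minimal, std-only sketch. The `LruCache` type is hypothetical, not a particular crate's API. To stay short, it tracks recency in a `VecDeque`, which makes each access O(n). Production implementations usually pair a hash map with a linked list to get O(1) operations.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal LRU cache: evicts the least recently used key when full.
struct LruCache {
    capacity: usize,
    map: HashMap<String, String>,
    // Front = least recently used, back = most recently used.
    order: VecDeque<String>,
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        LruCache { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `key` to the most recently used position.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_back(k);
        }
    }

    fn get(&mut self, key: &str) -> Option<String> {
        let value = self.map.get(key).cloned()?;
        self.touch(key);
        Some(value)
    }

    fn put(&mut self, key: &str, value: &str) {
        if self.map.contains_key(key) {
            self.touch(key);
        } else {
            if self.map.len() == self.capacity {
                // Evict the least recently used entry.
                if let Some(oldest) = self.order.pop_front() {
                    self.map.remove(&oldest);
                }
            }
            self.order.push_back(key.to_string());
        }
        self.map.insert(key.to_string(), value.to_string());
    }
}

fn main() {
    // e.g. session token -> user id
    let mut sessions = LruCache::new(2);
    sessions.put("token-a", "user-1");
    sessions.put("token-b", "user-2");
    sessions.get("token-a"); // token-a is now the most recently used
    sessions.put("token-c", "user-3"); // evicts token-b
    println!("{:?}", sessions.get("token-b")); // None
}
```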

  • @Amejonah
    @Amejonah 10 months ago

    There is one big question I've had for a long time: how do distributed microservices work? In particular, how is scaling of certain services achieved? What role do buses/message brokers play in it?
    You might be the one who can address these questions using simpler terms.

  • @hosamhamdy258
    @hosamhamdy258 10 months ago

    Great video!
    Can you make a video on when to cache and when not to, too?
    Thanks in advance.

  • @saxtant
    @saxtant 10 months ago

    You do know your hardware is pretty much taking care of this already?

  • @watzyh
    @watzyh 9 months ago

    I never use Redis for caching. It's a database. Besides using it as a simple key-value store, I use it for handling the time dimension in the program (rate limiting, task and job queues), which is very useful for web servers where each request runs separately.
    For caching, that's the job of a web server like nginx. It's far, far more efficient and performant. I've never had any issues with cache invalidation or with using custom cache keys. You can control the nginx cache programmatically, just like a Redis cache.

  • @fra4897
    @fra4897 10 months ago

    Amazing video, will check out Aiven for sure.

  • @Cal97g
    @Cal97g 10 months ago

    It’s not stale it’s just eventually consistent

  • @JuanPabloCisneros2207
    @JuanPabloCisneros2207 10 months ago

    Caching is always tricky. In the lazy loading presented, you can end up hitting the dual-write problem, as Postgres is way slower than Redis. If the system needs concurrency, it could be a tricky bug to solve, I think.
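Here is one common mitigation for this race, sketched under assumptions. A hypothetical `VersionedCache` stores each value with the version of the row it came from (for example an `updated_at` or version column in Postgres). A write is accepted only if its version is newer. A slow lazy-loading read that fetched an old row then can no longer overwrite a fresher value. This assumes the write path sets the new versioned value rather than only deleting the key. With a real Redis, the compare-and-set would also have to be atomic, for example via a Lua script or `WATCH`/`MULTI`.

```rust
use std::collections::HashMap;

/// Cached value tagged with the version of the row it was read from.
#[derive(Debug, PartialEq)]
struct Entry {
    version: u64,
    value: String,
}

/// Hypothetical in-memory stand-in for a versioned cache.
#[derive(Default)]
struct VersionedCache {
    entries: HashMap<String, Entry>,
}

impl VersionedCache {
    /// Compare-and-set: store `value` only if `version` is newer than what is cached.
    /// Returns true if the cache was updated.
    fn set_if_newer(&mut self, key: &str, version: u64, value: &str) -> bool {
        // A missing key counts as older than any version.
        let is_newer = self
            .entries
            .get(key)
            .map_or(true, |existing| version > existing.version);
        if is_newer {
            self.entries
                .insert(key.to_string(), Entry { version, value: value.to_string() });
        }
        is_newer
    }

    fn get(&self, key: &str) -> Option<&Entry> {
        self.entries.get(key)
    }
}

fn main() {
    let mut cache = VersionedCache::default();

    // A writer updates the row to version 2 and writes it through to the cache.
    cache.set_if_newer("user:1", 2, "alice (v2)");

    // A slow lazy-loading reader that fetched version 1 before the update
    // finally tries to populate the cache, and is rejected.
    let accepted = cache.set_if_newer("user:1", 1, "alice (v1)");
    println!("stale write accepted: {accepted}");
    println!("{:?}", cache.get("user:1"));
}
```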

  • @arcadierosca9818
    @arcadierosca9818 10 months ago

    Can you create a video on how to make videos like that? It's amazing!!!

  • @Affax
    @Affax 9 months ago +2

    Welp, time to move to KeyDB or DragonflyDB, at least they both are redis API compatible haha

  • @Zutraxi
    @Zutraxi 8 months ago

    Don't forget the retry policy for when your concurrent write to the cache fails. What if the API crashes while the write is happening?
    Better to use a fault-tolerant outbox pattern.
    Suddenly caching is slower than accessing the database.

  • @neliosantos4014
    @neliosantos4014 7 months ago

    Amazing!! 😄

  • @CottidaeSEA
    @CottidaeSEA 10 months ago

    Cache is all fun and games until the cache is automatically invalidated due to a timer and everyone hits the same slow query at the same time.

    • @dreamsofcode
      @dreamsofcode  10 months ago

      This is a good one! The thundering herd problem.
      There are ways to solve it, using something such as singleflight, although again it adds more complexity. It's much harder to solve across a distributed system.
      I'll probably do a video on it, as a few people have mentioned it!

    • @CottidaeSEA
      @CottidaeSEA 10 months ago

      @@dreamsofcode Last time I had that issue, I solved it by forcibly fetching and caching with cron. A bit of a hacky, anti-pattern way of solving it, but it works really well.
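The single-flight idea from this thread can be sketched with the standard library alone. This is a minimal, in-process sketch: a hypothetical `SingleFlight` type, threads instead of async tasks, and no panic or error handling. The first caller for a key runs the slow loader. Callers that arrive while it is running wait on a `Condvar` and reuse its result instead of hitting the database again. Go's `golang.org/x/sync/singleflight` package implements the same pattern. Across several instances you would need a shared mechanism instead, such as a lock held in Redis.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Result slot shared by every caller waiting on the same key.
type Slot = Arc<(Mutex<Option<String>>, Condvar)>;

/// Hypothetical in-process single-flight group.
#[derive(Default)]
struct SingleFlight {
    in_flight: Mutex<HashMap<String, Slot>>,
}

impl SingleFlight {
    /// Runs `load` at most once per key among overlapping callers;
    /// callers that arrive while a load is running wait for its result.
    fn get<F: FnOnce() -> String>(&self, key: &str, load: F) -> String {
        let (slot, is_leader) = {
            let mut map = self.in_flight.lock().unwrap();
            match map.get(key).cloned() {
                Some(existing) => (existing, false),
                None => {
                    let slot: Slot = Arc::new((Mutex::new(None), Condvar::new()));
                    map.insert(key.to_string(), Arc::clone(&slot));
                    (slot, true)
                }
            }
        }; // the map lock is released here, before the slow load

        let (lock, cvar) = &*slot;
        if is_leader {
            let value = load(); // the slow query, run by exactly one caller
            *lock.lock().unwrap() = Some(value.clone());
            cvar.notify_all();
            // Forget the key so a later call (e.g. after the cache TTL expires) loads again.
            self.in_flight.lock().unwrap().remove(key);
            value
        } else {
            let mut result = lock.lock().unwrap();
            while result.is_none() {
                result = cvar.wait(result).unwrap();
            }
            (*result).clone().unwrap()
        }
    }
}

fn main() {
    let flight = Arc::new(SingleFlight::default());
    let db_hits = Arc::new(AtomicUsize::new(0));

    // Eight "requests" miss the cache at roughly the same moment.
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let flight = Arc::clone(&flight);
            let db_hits = Arc::clone(&db_hits);
            thread::spawn(move || {
                flight.get("popular-query", || {
                    db_hits.fetch_add(1, Ordering::SeqCst);
                    thread::sleep(Duration::from_millis(200)); // the slow query
                    "result".to_string()
                })
            })
        })
        .collect();

    for h in handles {
        assert_eq!(h.join().unwrap(), "result");
    }
    println!("db hits: {}", db_hits.load(Ordering::SeqCst));
}
```

The cron approach from the reply avoids the herd differently. It refreshes the entry before it expires, so requests rarely see a miss at all.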

  • @ShimoriUta77
    @ShimoriUta77 10 months ago

    Rust code is so beautiful.

    • @dreamsofcode
      @dreamsofcode  10 months ago

      😅
      It's not known for its beauty.

  • @Pipu748
    @Pipu748 3 months ago

    Good content

  • @PhilfreezeCH
    @PhilfreezeCH 10 months ago

    Who ever thought caching was simple?
    It's one of those rare things that's hard on all levels. It's very difficult in hardware development, difficult in software, and ridiculously difficult in networking; it's just brutal.
    Plus, it always requires a ridiculous amount of benchmarking and verification to make sure you don't accidentally degrade performance on certain workloads or, even worse, mess up data.

  • @livingfreely
    @livingfreely 3 months ago

    Love Neovim, love jq, but perhaps Rust is not the best language to demonstrate this. I think Python is ideal for such video content, because it abstracts away all the noise, so it's easier to focus on the actual caching. That said, I personally wouldn't use Python in the backend unless I was doing some AI-related tasks.

    • @dreamsofcode
      @dreamsofcode  3 months ago +1

      I actually prefer to use Go instead these days! It's as concise as Python but more realistic for backend development as well.

    • @livingfreely
      @livingfreely 3 months ago

      Even better :D

  • @backupmemories897
    @backupmemories897 10 months ago

    Sometimes adding a cache slows things down xD but it scales better xD, because whenever you do something, you call that cache system... another step.

    • @dreamsofcode
      @dreamsofcode  10 months ago

      Absolutely!
      That's the problem in the case of inserts at first. It improved the performance of the reads on a cache hit, but caused the timings to increase by 66% on a cache miss.

  • @perz1val
    @perz1val 10 months ago

    Looking at the comments, I think you should've used a request that queries multiple tables of normalized data into a single object. For example, /user/2/permissions is: user + user_role + role_permission + permission (a list of permission names). Then the benefits are clear. Using a cache to store the result of SELECT * FROM table; is a bad example.

    • @dreamsofcode
      @dreamsofcode  10 months ago

      Yeah, that's fair. I wanted to keep the interface as simple as possible so as not to distract from the caching itself.
      My original setup was doing a string search across 10M rows, but that added more complexity to the examples (and at that point an index is still a better solution).

  • @shady4tv
    @shady4tv 9 months ago

    Ironic that a video about Redis comes out just before everyone drops it for going closed source.

    • @dreamsofcode
      @dreamsofcode  9 months ago +3

      Bad timing!
      Although, to be fair, this can apply to any caching solution. Plus, I get to review all of the forks that are coming.

    • @shady4tv
      @shady4tv 9 months ago

      @@dreamsofcode Honestly, the timing is perfect! Redis is hot in the news cycle right now, and you're right: this video isn't really about "Redis" per se. But it's actually a great introduction for people who are uninformed about the software and want to get up to speed on all that is happening with it right now. I hope you get hella views from this, bud! :)

    • @HUEHUEUHEPony
      @HUEHUEUHEPony 9 months ago

      I mean, it is only closed source if you are a big company.

    • @mrmelon54
      @mrmelon54 9 months ago

      @@HUEHUEUHEPony No? The new licensing doesn't fall under the definition of open source and isn't accepted by the Open Source Initiative.

  • @Ca1vema
    @Ca1vema 10 months ago

    Dunno what you're talking about, to add cache all I need is to put 2 lines in framework settings 🙃

  • @evccyr
    @evccyr 10 months ago +11

    I will do no push-ups for every like this comment gets. I'm sore from the last time.

    • @foziezzz1250
      @foziezzz1250 10 months ago

      Would like to join you in this

    • @martin4ata933
      @martin4ata933 9 months ago

      LETS GOO

  • @user-qr4jf4tv2x
    @user-qr4jf4tv2x 5 months ago

    The perfect cache is when you can natively plug a cache into a database.

  • @eveleynce
    @eveleynce 10 months ago

    The number one question you should be asking yourself when setting up a data layer is "does it matter HOW my data is stored?"
    If you discover that it doesn't matter at all, a flat JSON file is a decent option. If you discover that you need to connect several devices together, then a fast but scalable network database like Postgres or RavenDB will work JUST fine. If you discover that you need to request the data far more frequently than your systems are able to handle, THEN you need a cache.

    • @hansenchrisw
      @hansenchrisw 10 months ago

      +1, engineers often overcomplicate things. I think it was Knuth who said premature optimization is the root of all evil.

  • @lemonking4076
    @lemonking4076 10 months ago +1

    Nice video! But I don't understand why a dev would torture themselves with Rust.

    • @dreamsofcode
      @dreamsofcode  10 months ago +1

      🤣🤣🤣😭😭😭

    • @lemonking4076
      @lemonking4076 10 months ago

      @@dreamsofcode It's just way too verbose and not easily readable 😂🙀
      I hope this comment doesn't turn into a flamewar!!!

  • @temie933
    @temie933 9 months ago

    Can you create a how-to video on Arch? Showing how you configured Arch Linux.

  • @blender_wiki
    @blender_wiki 2 months ago

    Misleading video. It's very easy to add a cache nowadays. One rule: if you don't have the knowledge to do it, don't implement it.
    🤷🏿‍♀️

  • @siya.abc123
    @siya.abc123 10 months ago +1

    Rust syntax 😭😭😭🤢🤢🤢🤢

    • @dreamsofcode
      @dreamsofcode  10 months ago +2

      I'm with ya.
      I think I'm gonna use Go more for demonstrating anything non language specific in the future!

  • @sieunpark2160
    @sieunpark2160 10 months ago +3

    first place!

    • @tqwewe
      @tqwewe 10 months ago

      Ok

    • @itsme3217
      @itsme3217 10 months ago

      Is this your life's achievement?

    • @pythagoran
      @pythagoran 10 months ago

      Congratulations and/or I'm sorry to hear that

    • @sieunpark2160
      @sieunpark2160 10 months ago

      @itsme3217 yeah my mom is proud of me 😁

  • @bavidlynx3409
    @bavidlynx3409 10 months ago

    The comments and the video made me realise that caching is rather unnecessary and creates a lot of overhead and issues, so imma stay away from it.

    • @TheHTMLCode
      @TheHTMLCode 10 months ago

      I don’t think that’s necessarily the best decision; if you don’t cache, you will incur performance issues under certain circumstances. The example illustrated in this video is a very simple one which could have been solved by efficiently indexing your database, but as you scale or encounter more complex problems, you may want to consider caching for latency-sensitive functionality. At work we have a workflow that requires our operators to pick orders in a warehouse, and generating the pick list (all the instructions to carry out the picking of an order) takes around 500ms. The pick list reflects the entire state of the current pick journey and uses a write-through caching strategy to update the cached pick list after every scan in the warehouse. Without a cache, the front end would need to rebuild the list from the database every time it retrieved the next instruction. Waiting 500ms after completing a stop to fetch the next one would suck; fetching from the cache and having a result in 30ms is far better. The tradeoff here is maintaining the complexity of the cache in order to achieve the performance SLO (service level objective) we promised to our consumers (warehouse staff). For simple applications you may be able to keep away from caching, but I’d definitely learn it and keep it as a tool in your toolbox; I’m sure sometime in your career it’ll be useful :) Hope that helps!

    • @Amejonah
      @Amejonah 10 months ago

      I currently use caching (through Postgres; I should really switch to Redis) to make values persist across restarts of the application, as requesting the data takes a lot of time and consumes rate-limit tokens.

  • @A_Me_Amy
    @A_Me_Amy months ago

    nice try rusty.

  • @IS2511_watcher
    @IS2511_watcher 10 months ago

    4:26 `.unwrap_or(None)` can be shortened to `.ok()` for `Result`, more idiomatic too.
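One caveat is worth checking here. Assume the value at 4:26 is a `Result<Option<T>, E>`, which is what a Redis `GET` into an `Option<String>` would return. Then `.unwrap_or(None)` has type `Option<T>`, while `.ok()` alone gives `Option<Option<T>>`. The exact equivalents are `.ok().flatten()` or `.unwrap_or_default()`. `.ok()` by itself is the right replacement when the `Result` wraps a plain value. The small check below uses a hypothetical `cache_get` function in place of the real Redis call.

```rust
/// Stand-in for a cache lookup: Ok(None) = cache miss, Err = Redis error.
/// (Hypothetical function; real code would call the Redis client here.)
fn cache_get(key: &str) -> Result<Option<String>, String> {
    match key {
        "hit" => Ok(Some("cached".to_string())),
        "miss" => Ok(None),
        _ => Err("connection refused".to_string()),
    }
}

fn main() {
    for key in ["hit", "miss", "error"] {
        let a: Option<String> = cache_get(key).unwrap_or(None);
        let b: Option<String> = cache_get(key).ok().flatten();
        let c: Option<String> = cache_get(key).unwrap_or_default();
        // `.ok()` alone keeps the nested Option: Option<Option<String>>.
        let d: Option<Option<String>> = cache_get(key).ok();
        assert_eq!(a, b);
        assert_eq!(a, c);
        println!("{key}: {a:?} / .ok() alone: {d:?}");
    }
}
```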