Set Associative Caches 1: What is a Set Associative Cache?

  • Published on 29 Sep 2024
  • Support What's a Creel? on Patreon: / whatsacreel
    Office merch store: whats-a-creel-...
    FaceBook: / whatsacreel
    This is the first video in a two-part series on set associative caches, the type of cache used in AMD and Intel CPUs. In this video, we look at how a set associative cache works and explain the terms set associativity, cache line size, tag, offset, and index. We also show how the CPU addresses RAM and stores cache lines in a set associative cache.
    Software used to make this vid:
    Blender:
    www.blender.org/
    Audacity:
    www.audacityte...
    OBS:
    obsproject.com/
    DaVinci Resolve 16:
    www.blackmagic...
    OpenOffice:
    www.openoffice...
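To make the tag, index, and offset terms from the description concrete, here is a minimal sketch of how an address could be split into those three fields. The function name `split_address` and the default geometry (64-byte lines, 64 sets, as in a 32 KiB 8-way cache) are assumptions chosen for illustration, not values taken from the video.

```python
def split_address(address, line_size=64, num_sets=64):
    """Split an address into (tag, index, offset).

    Assumes line_size and num_sets are powers of two (hypothetical defaults).
    """
    offset_bits = line_size.bit_length() - 1   # log2(line_size)
    index_bits = num_sets.bit_length() - 1     # log2(num_sets)
    offset = address & (line_size - 1)         # byte position inside the cache line
    index = (address >> offset_bits) & (num_sets - 1)  # which set the line maps to
    tag = address >> (offset_bits + index_bits)        # remaining high bits
    return tag, index, offset


tag, index, offset = split_address(0x01230123)
print(hex(tag), index, offset)  # prints: 0x1230 4 35
```

The low bits select a byte within the line, the middle bits select the set, and the remaining high bits form the tag that is stored alongside the line so it can be matched later.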

Comments • 82

  • @madokalover
    @madokalover 4 years ago +36

    You are way better at explaining this than my university teachers were! The graphics are a huge help. Thank you, this is helping tons of people!

  • @giladreich810
    @giladreich810 4 years ago +118

    You really went on another level! Those new simulations stimulated my brain to the point where I stored all this information in my L1 cache. Thanks for the great video once again!

    • @WhatsACreel
      @WhatsACreel 4 years ago +13

      Hahaha! Cheers for watching :)

    • @StefanReich
      @StefanReich 4 years ago +6

      What is it with the Reichs and their interest in computing

  • @lakshyagoyal5560
    @lakshyagoyal5560 4 years ago +6

    Great video! I was not expecting to see animations in this, and that was a pleasant surprise! Helped with the explanation a lot too!

  • @0xggbrnr
    @0xggbrnr 4 years ago

    Fucking amazing!

  • @Alex-op2kc
    @Alex-op2kc 3 years ago

    Part 2: th-cam.com/video/tde8lhFdczI/w-d-xo.html

  • @uncoherentramblings2826
    @uncoherentramblings2826 4 years ago +38

    Omg. So clearly explained. This is very good teaching material. Good job and thank you very much!

    • @WhatsACreel
      @WhatsACreel 4 years ago +3

      Thank you SirUniverse :)

  • @deud1eskrub503
    @deud1eskrub503 4 years ago +22

    Great stuff, keep it up!

    • @WhatsACreel
      @WhatsACreel 4 years ago +3

      Cheers mate! Thanks for watching :)

  • @mhfarhadi4376
    @mhfarhadi4376 4 years ago +12

    This video was literally more useful than my entire semester... I'm speechless.

    • @ipotrick6686
      @ipotrick6686 3 years ago +1

      Then you didn't listen.

  • @stendall
    @stendall 4 years ago +4

    Awesome vid btw, your CUDA tutorials got me thru a semester. Set associative seems like a cuckoo table without hashing.

  • @sakari_n
    @sakari_n 4 years ago +7

    Also, real CPUs have to synchronize between cores. Consider a situation where core0 holds data from address 0x01230123 and core1 stores to an address in the same cache block as 0x01230123. Now core0 has invalid/old data in its cache. What happens next depends on the ISA (how relaxed the memory model is, and so on), but if I remember correctly, on x86 the invalid/old cache data needs to be reloaded into the cache from main memory by core0 when it tries to access it. The C/C++ memory model (more relaxed than x86) also has some opinions about this, which affects how compilers are allowed to generate code for loads and stores.

    • @WhatsACreel
      @WhatsACreel 4 years ago +2

      They do indeed! Synchronization between cores is a great topic!

    • @ngissac3411
      @ngissac3411 2 years ago

      @WhatsACreel Actually, there is a protocol for multi-core CPUs: the MESI protocol. Intel and AMD each have their own protocol based on MESI.
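The core0/core1 scenario in this thread can be sketched with a heavily simplified MESI-style state machine for a single cache line. The names `Line`, `read`, and `write` are hypothetical, and the model ignores data movement, write-backs, and bus details; it only tracks the Modified, Exclusive, Shared, and Invalid states, so treat it as an illustration rather than how any particular CPU behaves.

```python
class Line:
    """One core's copy of a single cache line, tracked only by MESI state."""
    def __init__(self):
        self.state = "I"  # Invalid until the core first reads or writes


def read(cores, who):
    """Core `who` reads the line (simplified: no data is actually moved)."""
    if cores[who].state != "I":
        return  # hit: the state does not change
    others = [c for i, c in enumerate(cores) if i != who]
    shared = any(c.state != "I" for c in others)
    for c in others:
        if c.state in ("M", "E"):
            c.state = "S"  # a real Modified line would also be written back
    cores[who].state = "S" if shared else "E"


def write(cores, who):
    """Core `who` writes the line: every other copy is invalidated."""
    for i, c in enumerate(cores):
        if i != who:
            c.state = "I"
    cores[who].state = "M"


cores = [Line(), Line()]
read(cores, 0)    # core0 loads the line: Exclusive
write(cores, 1)   # core1 stores to it: core0's copy becomes Invalid
print([c.state for c in cores])  # prints: ['I', 'M']
read(cores, 0)    # core0 must fetch the line again; both end up Shared
print([c.state for c in cores])  # prints: ['S', 'S']
```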

  • @thehen101
    @thehen101 4 years ago +5

    This video is great, although YouTube's low bitrate kind of ruins those nice 3D renders. Perhaps you could render at a higher res? Cheers

  • @PrivateSi
    @PrivateSi 3 years ago +1

    Better compilers could probably eliminate the hardware automatic caching system and precache code and data in an optimised way. Same for the OS / app runtime dynamic memory manager. Currently it isn't possible to access a cache directly (on x86 at least), but it is possible to precache data. If you could access a cache directly, it would save the memory address translation step the hardware has to perform.

  • @gearstil
    @gearstil 4 years ago +3

    Is it better for you if I let the advertising run all the way to the end?

    • @WhatsACreel
      @WhatsACreel 4 years ago +2

      Ha! I'm not sure... Nice of you to think of that tho! Thanks for watching :)

  • @gideonmaxmerling204
    @gideonmaxmerling204 4 years ago +3

    In the next vid, will you teach about dirty bits and how the CPU is notified of a change made to RAM by another component, e.g. the GPU or the disk?

    • @WhatsACreel
      @WhatsACreel 4 years ago +1

      Cheers mate! I actually recorded one long video, but decided to split it into two because the second half was different. It's just a chat about some specs from Intel and AMD CPUs. It would be fun to continue with some more info on caches, dirty bits, exclusive vs inclusive, victim caches, etc. And the instruction cache, which is a different beast altogether! Anywho, thanks for watching :)

  • @cezarcatalin1406
    @cezarcatalin1406 4 years ago +3

    The one dislike is from Intel 😆

    • @WhatsACreel
      @WhatsACreel 4 years ago +1

      Wow, your icon is animated in the notifications... Is it a gif? How did you do that? Hahaha :)

  • @willofirony
    @willofirony 4 years ago +5

    Wow! Gilad Reich wrote all that needs writing about the awesome presentation. Can't wait for part 2. I am hoping you might conclude with an alignment strategy to get the most efficient use of the caches. We are certainly not in Kansas anymore, Toto. Great video.

    • @WhatsACreel
      @WhatsACreel 4 years ago +3

      Really glad you liked it! I actually recorded both vids in a single take, but split it into 2 because the second half seemed like a different video. It's just a commentary on a handful of hardware specs. It would be fun to discuss alignment strategies, and particularly access patterns! Maybe in an upcoming video? Thanks for watching :)

  • @NeilRoy
    @NeilRoy 4 years ago +3

    When you have L1, L2 and L3 cache, isn't data from L1 pushed into L2 when new data comes in? And if the data in L2 gets old, it is moved to L3? Something like that anyhow. My memory on this is fuzzy. Anyhow, I've seen some great videos on coding your programs to maximize cache hits. The code to do this can often look slower, with more code, but the end result will be a huge speed increase. I forget where I saw the video now, but it was REALLY fascinating to see normal code versus code which has been designed to maximize cache hits.

    • @WhatsACreel
      @WhatsACreel 4 years ago +7

      Yes, the caches generally evict to higher levels. It might be fun to make a video on exclusive vs inclusive and victim caches! All that stuff is great :)
      Techniques called cache tiling/blocking are great! Keep the data being processed in the L1!!
      Cheers for watching mate :)
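As an illustration of the cache tiling/blocking idea mentioned in the reply above, here is a sketch of a tiled matrix transpose. The function name `transpose_tiled` and the tile size are assumptions. Python lists of objects will not show the real hardware speedup, so this only demonstrates the loop structure; in C or similar, the same shape keeps each small tile resident in L1 while it is processed.

```python
def transpose_tiled(matrix, tile=4):
    """Transpose a rectangular list-of-lists matrix, one tile x tile block at a time."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0] * n_rows for _ in range(n_cols)]
    # The outer two loops walk over blocks; the inner two stay inside one block,
    # so reads and writes touch only a small region of each matrix at a time.
    for r0 in range(0, n_rows, tile):
        for c0 in range(0, n_cols, tile):
            for r in range(r0, min(r0 + tile, n_rows)):
                for c in range(c0, min(c0 + tile, n_cols)):
                    out[c][r] = matrix[r][c]
    return out


m = [[r * 10 + c for c in range(6)] for r in range(5)]
print(transpose_tiled(m) == [list(col) for col in zip(*m)])  # prints: True
```

In a compiled language, the tile size would typically be chosen so that the source and destination blocks together fit comfortably in the L1 data cache.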

  • @wesleymesquita8380
    @wesleymesquita8380 3 years ago +4

    You did a really good job combining the animations, didactics and enthusiasm for this subject! Thank you!

  • @jake_3745
    @jake_3745 3 years ago +1

    brilliant

  • @mrkrisey4841
    @mrkrisey4841 3 years ago

    I don't understand: in the start animation he has 4 sets and 4 ways. Is one yellow block a cache line, or do all the yellow blocks together make up a cache line?

  • @marceloguzman646
    @marceloguzman646 3 years ago

    I'm curious about the 'Valid Bit'. I was told that there must be one valid bit too. Could someone tell me what happened to it? haha

  • @tythedev9582
    @tythedev9582 4 years ago +2

    Incredible material. Many thanks.

  • @him21016
    @him21016 4 years ago +2

    My guy

  • @user-ym4yt9bo2u
    @user-ym4yt9bo2u 3 years ago

    THANK U now i actually understand this 4 my final

  • @georgewright1093
    @georgewright1093 2 years ago

    I know this is going to sound silly, but could you work in something about throwing a shrimp onto a barbie?

  • @regulus8518
    @regulus8518 2 years ago

    What happens to the cache line that gets evicted from L1? Does it get written into L2, and what does that process look like?

  • @MexicanRaptorJesus
    @MexicanRaptorJesus 4 years ago +2

    Awesome video man! You're going to blow up with such fantastic content!

  • @MrSpikegee
    @MrSpikegee 3 years ago

    What is your accent? English pirate? Great content btw

  • @Filaxsan
    @Filaxsan 2 years ago +1

    Amazingly done! A very clear explanation, thanks Creel! :D

  • @duydianvu5466
    @duydianvu5466 3 years ago

    Could you turn on subtitles for this video? Thanks

  • @jiannickW
    @jiannickW 3 years ago +1

    Really useful and awesome video. Clear and concise, with good examples!

  • @Vi0lad0r
    @Vi0lad0r 1 year ago

    This video is absolutely brilliant.

  • @sandraviknander7898
    @sandraviknander7898 3 years ago

    These are not the cache lines you’re looking for.

  • @sabriath
    @sabriath 4 years ago +4

    You missed the policy of a "dirty" cache line, where data was written to the cache but hasn't been synced with RAM by the time it's evicted... but other than that, pretty much got it.
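The dirty (and valid) bits raised in the comments can be sketched with a toy write-back cache that holds a single line in front of a dictionary standing in for RAM. The class `TinyCache` and its fields are hypothetical, and the write-allocate behaviour is an assumption made to keep the example short.

```python
class TinyCache:
    """A one-line write-back cache in front of a dict that stands in for RAM."""

    def __init__(self, ram):
        self.ram = ram
        self.valid = False   # does the line hold anything meaningful yet?
        self.dirty = False   # has the line been written since it was loaded?
        self.addr = None
        self.value = None
        self.writebacks = 0

    def _load(self, addr):
        if self.valid and self.addr == addr:
            return  # hit: nothing to do
        if self.valid and self.dirty:
            # Evicting a dirty line: RAM is stale, so write the line back first.
            self.ram[self.addr] = self.value
            self.writebacks += 1
        self.addr, self.value = addr, self.ram.get(addr, 0)
        self.valid, self.dirty = True, False

    def read(self, addr):
        self._load(addr)
        return self.value

    def write(self, addr, value):
        self._load(addr)     # write-allocate (assumption): bring the line in first
        self.value = value
        self.dirty = True    # RAM is now out of date until eviction


ram = {0: 1, 1: 2}
cache = TinyCache(ram)
cache.write(0, 99)
print(ram[0])   # prints: 1  (RAM not updated yet, the line is dirty)
cache.read(1)   # evicts address 0, which triggers the write-back
print(ram[0])   # prints: 99
```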

  • @skilz8098
    @skilz8098 4 years ago +1

    Another thing that is similar but different within CPU ISA's is when it comes to their function - virtual - routine tables from accessing data from the disk drive... There is sort of a cache structure there as well, except the information can be hashed into a virtual lookup table.

  • @mohamed_khoudjatelli9349
    @mohamed_khoudjatelli9349 4 years ago +1

    I can't describe how grateful I am.
    Thank you, prof!

  • @TomStorey96
    @TomStorey96 4 years ago +1

    🤯 such a brilliant explanation. Never knew how caches worked, and not sure when I will ever need to know, but it's fascinating stuff.

  • @Morimea
    @Morimea 2 years ago

    Thank you! Great video!

  • @Psykorr
    @Psykorr 2 years ago

    That's a really great explanation!

  • @johnyoungquist6540
    @johnyoungquist6540 4 years ago +3

    great explanation!

    • @WhatsACreel
      @WhatsACreel 4 years ago +1

      Glad it was helpful!

  • @romanemul1
    @romanemul1 4 years ago +3

    Thanks for this.

    • @WhatsACreel
      @WhatsACreel 4 years ago +1

      Welcome, cheers for watching :)

  • @franzlyonheart4362
    @franzlyonheart4362 2 years ago

    0:56, there. And 3:55 also.

  • @dannggg
    @dannggg 2 years ago

    how did the offset read 9?

  • @thespourieye8590
    @thespourieye8590 2 years ago

    Amazing video!

  • @AmaroqStarwind
    @AmaroqStarwind 3 years ago

    I'd love to be able to use Ternary Content Addressable Memory (TCAM) for everything.
    I just wish TCAM wasn't so expensive and power hungry, and that the storage densities were actually half-decent.

  • @TheYmBProduction
    @TheYmBProduction 2 years ago

    king

  • @ahmadk5844
    @ahmadk5844 2 years ago

    THANK U !!!

  • @RupertBruce
    @RupertBruce 1 year ago

    The Disruptor circular buffer makes use of cache characteristics for speed. Thank you for this great explanation of the process!

  • @CodingJesus
    @CodingJesus 3 years ago

    This was an amazing explanation!

  • @captainbodyshot2839
    @captainbodyshot2839 4 years ago +2

    If my program makes a sequential access from beginning to end of some large array, can the CPU predict that it will need data from more than just one cache line and start loading the following ones in advance?

    • @skilz8098
      @skilz8098 4 years ago +1

      That depends on a few other things... It isn't just the hardware and its opcodes, but it also depends on the OS and on your Compiler - Interpreter and how they convert your source code to either assembly, byte codes, or opcodes... There are many optimizations that your Compiler - Interpreter will make depending on your compiler's - interpreter's command-line options and settings... Then it comes down to the architecture and its hardware design for which features are available. After that, it then depends on your Operating System and how it handles the calls to the underlying hardware such as reading and writing to disk, creating threads and semaphores, reading and writing to ports, etc.

    • @captainbodyshot2839
      @captainbodyshot2839 4 years ago +3

      @skilz8098 ...Are you sure you know what you're talking about? Never mind, I found out that modern x86 processors do, in fact, have automatic prefetch mechanisms which can detect linear access patterns.

    • @WhatsACreel
      @WhatsACreel 4 years ago +3

      I think they call it smart prefetch at AMD or hardware prefetch at Intel? They certainly do this with the instruction cache too! Compilers will use software prefetch if they're clever enough! Certainly an interesting topic! Cheers for watching :)

    • @skilz8098
      @skilz8098 4 years ago

      @captainbodyshot2839 I wasn't trying to be too explicit because you would have to read the datasheets and the ISA manuals to get all of the details. The available features and techniques that can be used vary by architecture (CPU), platform (OS), and compiler.
      Take, for example, you and I could have the same exact hardware and operating system except I could be using Visual Studio and you could be using GCC or Clang for C++. They all work very similarly and they usually implement 98%+ of the C++ standard, but they may do so in different manners.
      Compiler A might use register X with instruction 1 where Compiler B might use register Y with instruction 2 to generate the same algorithm.

    • @elliott8175
      @elliott8175 4 years ago +1

      @skilz8098 Pre-fetching happens at the hardware level, not the software level. An executable/assembly can't tell a processor where to put data in the caches. Different compilers might result in different assembly, which may result in the processor handling memory differently among the caches. However, processors are either using this technique or they're not, regardless of your assembly code. These days most processors do it.
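The sequential-access question in this thread can be illustrated with an idealized next-line prefetcher. The function `count_demand_misses` is hypothetical, and the model assumes an unbounded cache with no eviction and prefetches that complete instantly; real hardware prefetchers detect streams and strides and have latency, so this only shows why a linear scan can avoid most demand misses.

```python
def count_demand_misses(addresses, line_size=64, prefetch_next=False):
    """Count demand misses for a sequence of byte addresses.

    Assumptions: unbounded cache (no eviction), instant prefetches.
    """
    cached = set()   # line numbers currently in the cache
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line not in cached:
            misses += 1
            cached.add(line)
        if prefetch_next:
            cached.add(line + 1)   # pull in the following line ahead of use
    return misses


sequential = range(0, 64 * 16, 8)   # walk 16 cache lines in 8-byte steps
print(count_demand_misses(sequential))                      # prints: 16
print(count_demand_misses(sequential, prefetch_next=True))  # prints: 1
```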

  • @davidprock904
    @davidprock904 4 years ago

    The architecture I'm working on gets rid of the cache principles; your entire storage space would be more like level 0, faster than L1.

  • @MagnusTheUltramarine
    @MagnusTheUltramarine 3 years ago

    Really an amazing and clear explanation, great animations too
    In this example we assume there isn't any virtualization right? All those addresses would be physical addresses

  • @tamaracousineau2329
    @tamaracousineau2329 2 years ago

    Very nice visualization. Super helpful!

  • @abeygi5615
    @abeygi5615 3 years ago

    Awesome graphics and to the point explanation! Thanks!

  • @Alex-op2kc
    @Alex-op2kc 3 years ago

    Holy visualizations, Batman! This is great!

  • @dominiccatherin6661
    @dominiccatherin6661 3 years ago

    Fantastic video. Thank you Creel!

  • @robertfaney4148
    @robertfaney4148 3 years ago

    Wow, so good! Have you done anything on virtual memory, please?

  • @NicosLeben
    @NicosLeben 4 years ago

    How exactly does the comparison with tags work? If a set is full, are all these tags going to be compared in parallel or does it work like a binary search?

    • @chainingsolid
      @chainingsolid 1 year ago

      Given how hardware is naturally parallel I would assume parallel.
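To picture the tag comparison discussed here, the sketch below models one set of a set associative cache. The class `CacheSet` is hypothetical, and the least-recently-used replacement policy is an assumption (real CPUs often use cheaper approximations). Python checks the ways one after another, whereas the reply above assumes hardware compares all ways of the set at once.

```python
class CacheSet:
    """One set of an N-way set associative cache, tracking tags only."""

    def __init__(self, ways=4):
        self.ways = ways
        self.tags = []   # ordered from least to most recently used

    def access(self, tag):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.tags:          # stands in for one comparator per way
            self.tags.remove(tag)
            self.tags.append(tag)     # mark as most recently used
            return True
        if len(self.tags) == self.ways:
            self.tags.pop(0)          # set is full: evict the least recently used
        self.tags.append(tag)
        return False


s = CacheSet(ways=2)
print([s.access(t) for t in [1, 2, 1, 3, 2]])
# prints: [False, False, True, False, False]
```

Because an address can only live in the one set chosen by its index bits, only that set's few tags need comparing, which is what keeps the lookup cheap compared with searching the whole cache.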

  • @poojasinha1943
    @poojasinha1943 3 years ago

    It helped a lot. Thank you.

  • @WiseWeeabo
    @WiseWeeabo 4 years ago

    love the skeletor thing

  • @damian_smith
    @damian_smith 3 years ago

    Beautiful - thank you!

  • @ZedaZ80
    @ZedaZ80 2 years ago

    This was so well made and explained!