Cache, from History to the Future of CPU Memory

  • Published on 18 Jan 2025

Comments • 691

  • @jerrywatson1958
    @jerrywatson1958 6 years ago +418

    Your long format videos are the best! I know it's a lot of work, but your work/content is better than a commercial TV show. I would go as far as to say it's documentary-level writing with very high production values. Thank you Jim, do what you need to do. We will wait.

    • @jordanwharton5286
      @jordanwharton5286 6 years ago +6

      I also agree. I've learned so much from your analyses and I always get excited to see what you'll uncover next! Keep up the great work!

    • @CoccoUri
      @CoccoUri 6 years ago +5

      agree :)

    • @Velkanis
      @Velkanis 6 years ago +6

      you, my dear internet stranger, nailed my thoughts dead on.

    • @_BangDroid_
      @_BangDroid_ 6 years ago +3

      I know this is totally random but Jerry Watson sounds like the coolest name I've ever heard. Sounds like a cool jazz cat from back in the day.
      Totally agree, thoroughly enjoyed the video also.

  • @Peds013
    @Peds013 6 years ago +272

    Your dad started coding at 60...
    My boss still can't use a mouse :-/

    • @MarikHavair
      @MarikHavair 6 years ago +35

      @calistorich Reminds me of one of my favorite quotes.
      "It is of the nature of man to err, and to blame it on someone else shows management potential."

    • @shznn
      @shznn 6 years ago +10

      Someone very smart would here say, "Hmm, your comment, sir, explains everything that's wrong with society, hmm hmm." :)

    • @rpmTweeK
      @rpmTweeK 6 years ago +2

      I'm more in awe of the fact that his dad had him at 60 or so years, then started coding. What a legend!

    • @oldtimergaming9514
      @oldtimergaming9514 6 years ago +1

      So the hashtag #LearnToCode does apply to ex coal miners? Who would have thought it possible.
      My dad loved coding, building circuit boards and anything electronic, but that was his job; he was not a coal miner. I am impressed. I miss him. Etching circuit boards with him is one of my fondest memories.
      I cut my teeth on a Honeywell 6000 mainframe and learned COBOL, FORTRAN and BASIC programming. A staggering 256K of core memory!

  • @SporkOfDestruction
    @SporkOfDestruction 6 years ago +196

    Fantastic content. I am an IT pro, and one of the concepts even colleagues struggle with is cache memory and how it's used. I've never heard it explained in such an understandable way - thank you! Now I have somewhere to point them!

    • @erikboesephoto
      @erikboesephoto 6 years ago +4

      ITPro here as well. Definitely a fantastic explanation!

    • @milkman9055
      @milkman9055 6 years ago +3

      Yeah, this was a good one!

    • @EditioCastigata
      @EditioCastigata 6 years ago

      You'd have to have heard the term 'memory wall' at least once in college.

  • @mitchellwheeler7107
    @mitchellwheeler7107 6 years ago +38

    I'm an embedded software engineer (I live and breathe microarchitecture and memory optimisation, so I deal an awful lot with optimising software around cache usage).
    Note: I did my best to make this as concise as possible, but it's unavoidably a complex topic, so it's a wall of text regardless.
    The 'weird'/flawed runs you're seeing are quite common, and while the causes can indeed be many things (all of which are difficult to diagnose), the most common cause in my experience is poor page table colouring. Sometimes things 'go wrong' with this optimisation depending on the OS, and it results in this kind of behaviour for the entire run of the process (or, if you're lucky and the OS doesn't cache page table allocations, it lasts only until you re-allocate the memory, without needing to re-create the process).
    You'd have to understand how virtual memory / paging works to get a strong grasp of what's going on, but the short version is that when working within operating systems (or indeed any software system dictated by a kernel with a concept of virtual memory), memory is allocated by the kernel in 'pages' (due to the processor's MMU supporting only certain page sizes for virtual memory, and/or due to the kernel optimising around the size of the TLB).
    Some kernels and C runtime library implementations are pretty simple, and when you malloc some memory, you're basically given an entire page (not always true). Even if you're not, in benchmarks especially, you're often working with chunks of memory that are multiples of the page size. So in an awful lot of cases, you're literally working with memory aligned to a virtual memory page.
    Something I don't think you covered in your video, though, is that most modern CPU caches are set-associative (see: en.wikipedia.org/wiki/CPU_cache#Associativity), which means there's a limited number of entries in the cache, but it still has to be capable of caching 'any' memory address despite its limited entries... This results in a compromise where N cache entries are responsible for caching potentially M memory addresses (where M is far greater than N). Also see en.wikipedia.org/wiki/CPU_cache#Cache_entry_structure on how this works (tl;dr - all memory addresses that share the same index bits map to the same set of cache entries; which address bits form the index depends on the cache's line size and number of sets).
    At the start of my comment I referred to 'page table colouring'. This is an optimisation made by kernels to 'avoid' this problem, by attempting to ensure that contiguous virtual memory pages get put into 'different' cache entry sets, to make the most use of the processor cache.
    HOWEVER (this is where it all comes together), these two concepts can collide in unfortunate ways. It's very rare, but very possible, that subsequent memory allocations made by a process happen to share those cache entries, either due to a lack of, or a failure of, page table colouring. (How/why it can fail is another wall of text, but long story short: non-hard-realtime kernels (which includes Windows, non-RT Linux, and macOS) can't easily/efficiently enforce this, due to the non-determinism of scheduling in non-realtime scenarios, lest they serialize everything and bring the performance of multi-threaded memory allocation to a crawl.)
    In the scenario where this problem occurs, you can/will often see things similar to what you're seeing in your weird/flawed results. It's likely that subsequent chunks of memory in the 8-16MiB of memory allocated by the benchmark have a poor distribution of memory addresses across the CPU cache, resulting in poor cache utilization. Due to the non-deterministic nature of consumer operating systems (as they don't have hard-realtime/deterministic schedulers/memory-allocators/etc), it happens only sometimes, and restarting the process (which ensures the memory is completely re-allocated) makes the problem go away.
    Well-written microbenchmarks can avoid this by ensuring the memory they're given has a statistically even distribution across the cache, but most applications don't bother to check this; they just blindly use the memory they allocate and hope for the best.
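
To make the set-collision point above concrete, here is a minimal sketch. The cache geometry (64-byte lines, 1024 sets, 8 ways, 4 KiB pages) and the helper names `set_index` and `sets_touched` are assumptions chosen for illustration, not the parameters of any particular CPU. It compares 16 pages whose physical addresses are all one "way size" apart (badly coloured) with 16 physically consecutive pages (well coloured).

```python
# Hypothetical cache geometry, assumed purely for illustration.
LINE_SIZE = 64      # bytes per cache line
NUM_SETS = 1024     # number of sets in the cache
WAYS = 8            # cache lines per set (associativity)
PAGE_SIZE = 4096    # bytes per memory page

# One "way" covers LINE_SIZE * NUM_SETS bytes; addresses that far apart share a set.
WAY_SIZE = LINE_SIZE * NUM_SETS


def set_index(addr: int) -> int:
    """Return the set an address maps to: the bits just above the line offset."""
    return (addr // LINE_SIZE) % NUM_SETS


def sets_touched(page_bases, span=PAGE_SIZE):
    """Collect the distinct sets covered by `span` bytes starting at each base."""
    touched = set()
    for base in page_bases:
        for offset in range(0, span, LINE_SIZE):
            touched.add(set_index(base + offset))
    return touched


# Badly coloured: every page starts a whole way size after the previous one,
# so all 16 pages land on the same 64 sets.
bad_pages = [i * WAY_SIZE for i in range(16)]
# Well coloured: 16 consecutive pages spread across every set.
good_pages = [i * PAGE_SIZE for i in range(16)]

bad = sets_touched(bad_pages)
good = sets_touched(good_pages)

lines_needed = 16 * (PAGE_SIZE // LINE_SIZE)
print(f"lines needed: {lines_needed}")
print(f"bad colouring:  {len(bad)} sets, {len(bad) * WAYS} usable slots")
print(f"good colouring: {len(good)} sets, {len(good) * WAYS} usable slots")
```

Under these assumed numbers, the badly coloured pages need 1024 lines but can only ever occupy 512 slots, so they keep evicting each other even though most of the cache sits empty.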

    • @PanduPoluan
      @PanduPoluan 6 years ago +1

      Awesome explanation! Fortunately I'm well-versed enough in CPU intricacies to understand your explanation (many thanks to BYTE Magazine; I still grieve the loss of that great publication).
      On the flip side, not having much experience in writing software that takes all the vagaries of cache management into consideration, do you think slightly reducing the size of the dataset being tested will help? For example, rather than testing with a 16 MB dataset, we use just a 15 MB dataset, giving a leeway of 1 MB for a 16 MB-sized cache?

    • @RobBCactive
      @RobBCactive 4 years ago

      @PanduPoluan watching the video it struck me that the simple natural doubling of data set size is a bit too coarse. It would be interesting to test double and double +/- ½ & +/- ¼ on the runs with jumps in latency, to investigate the behaviour crossing these boundaries where Ln-1 affects the behaviour of Ln-size sets.
      I am not sure what you mean by "help", but basically with a victim cache the data sizes add, and a benchmark is so dominant in CPU usage on a quiescent system that your results are not affected; when they are, the whole run needs to be discarded because some CPU-intensive operation interrupted the benchmark. This can mostly be avoided by increasing the priority of the benchmark.
      Note that in practice, fast programs tend to operate over memory with sequential accesses, which allows anticipatory speculative loads that hide the main memory latency almost completely. I have used that to process data sets the full size of system memory, which behave close to L3 cache bandwidth even though the virtual memory system is loading pages from disk.
      I can recommend a series of articles on lwn.net about cache effects on modern programs, if you are still interested in the subject.

  • @nikolaangelov3583
    @nikolaangelov3583 6 years ago +112

    Man your job is very hard, but it must be very fulfilling too. And you keep learning new stuff every day. That's truly beautiful. You keep getting smarter every day. That's fun

    • @adoredtv
      @adoredtv 6 years ago +13

      This is true!

    • @alexmarin7897
      @alexmarin7897 6 years ago

      Shame though that the video is filled with poorly made future projections. Suggesting you would get 64MB of L3 cache in Ryzen 3000 series (32MB per octacore die, 16MB per CCX or 4MB per core) just screams lack of understanding about basic aspects of computer science.

    • @adoredtv
      @adoredtv 6 years ago +3

      @alexmarin7897 I guarantee you that Ryzen 3000 has that cache layout (except not "4MB per core" as you erroneously put it) and I guarantee you that you're the one who lacks basic understanding.

  • @issaciams
    @issaciams 6 years ago +264

    Alright got my food. I'm ready. Go.

    • @V4zz33
      @V4zz33 6 years ago +3

      Haha, I just had my breakfast;))))

    • @CaveyMoth
      @CaveyMoth 6 years ago +6

      Did you bring back any tomweapondamage while you were out?

    • @gustavb3673
      @gustavb3673 6 years ago

      You mean "goto 10" right ;)
      Seriously i needed to take a food break in the middle of the video.
      I found this video hard to watch especially the first part since i kept remembering things and dreamed away and had to rewind again and again and again.....
      \o/

  • @anaximanderification
    @anaximanderification 6 years ago +47

    Aside from the IAC methods, you just compressed about 2 semesters of CompSci on architecture into a neat package.
    Very good job sir, hats off.

    • @Chuckiele
      @Chuckiele 6 years ago +2

      yep. my head's smoking but it was well worth it.

  • @dastardly740
    @dastardly740 6 years ago +38

    I skimmed the replies and didn't see this mentioned (could have missed it): 1ns is the period for 1GHz. So, your 2700X running at around 4GHz has an L1 cache that takes about 4 clocks to return data. The engineering sample that you called a regression is running at around 3.5GHz, and 4 clocks there is about 1.14ns, so it is not a regression but exactly as expected. My R5 1600 was 1.25ns, which at 4 clocks would be 0.3125ns per clock, or 3.2GHz. So we can be pretty sure that L1 on 1XXX and 2XXX is a 4 clock cache.
    Presumably, the engineers at AMD can fiddle with that L1 multiplier, so maybe they decided to try 3 clocks on the slower chip. At 3.2GHz a clock is 0.3125ns, so 3 clocks would be 0.9375ns, not quite as fast as the benchmark, but not that far off. Maybe the actual clocks during the cache test were a bit higher. But if I were an engineer testing chips, this is probably a very important test. 5GHz at 4 clocks is 0.8ns. Maybe the engineering sample won't reach 5GHz, but they need to know whether the cache could reach 4.8-5GHz at 4 clocks. So, they downclock the chip and set the L1 multiplier to 3 clocks to see what the L1 is capable of, and 0.8-0.9ns means the L1 should allow for those high 4 to 5GHz clock speeds from your leaks.
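
The arithmetic in this comment is just latency = cycles / clock frequency. A small sketch that reproduces the figures quoted above (the function names `latency_ns` and `implied_clock_ghz` are made up for this example):

```python
def latency_ns(cycles: int, clock_ghz: float) -> float:
    """Latency in nanoseconds of an access taking `cycles` clocks at `clock_ghz`."""
    # One clock at 1 GHz lasts 1 ns, so a clock at f GHz lasts 1/f ns.
    return cycles / clock_ghz


def implied_clock_ghz(measured_ns: float, cycles: int) -> float:
    """Clock speed implied by a measured latency, assuming a fixed cycle count."""
    return cycles / measured_ns


print(latency_ns(4, 4.0))            # 2700X at ~4 GHz, 4-clock L1
print(latency_ns(4, 3.5))            # engineering sample at ~3.5 GHz
print(implied_clock_ghz(1.25, 4))    # R5 1600 measured at 1.25 ns
print(latency_ns(3, 3.2))            # the speculative 3-clock case at 3.2 GHz
print(latency_ns(4, 5.0))            # 4 clocks at 5 GHz
```

The 3-clock case is the commenter's speculation; the sketch only checks that the numbers are self-consistent, not that any chip actually behaves this way.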

    • @PanduPoluan
      @PanduPoluan 6 years ago +3

      Nice analysis! You must be one of the "helpful folks" Jim mentioned in the video :-)
      Hmmm... it seems that AMD's Zen 2 has quite a bit of headroom there... so when AMD finally launches Ryzen 3k, and Intel as expected tries to counter (with great difficulty), AMD can wait until the right moment and totally take the wind out of Intel's sails with another push into the Zen 2 headroom. Intel will then do a Hail Mary move, and AMD delivers the killing stab.
      I can totally see AMD owning the market for the next 2, maybe 3 years. In 2022 maybe Intel will start to become competitive again, but at that moment we will have 2 gorillas duking it out, neither with the clear dominance Intel had in the past decade, and consumers will profit greatly.

    • @RobBCactive
      @RobBCactive 4 years ago +1

      @PanduPoluan they have indeed announced some desperate-looking moves, including a big/little design in a rectangular package to mitigate excessive power consumption. Their 10nm laptops have lower battery life and performance than AMD Renoir, but the problem is finding designs with AMD in them. Most of the market doesn't seem to care and just accepts the inertia of the OEMs and ODMs, who are the real laptop manufacturers.

    • @RobBCactive
      @RobBCactive 4 years ago

      Hmmmm, IIRC these low level caches are synchronous with core, so I think the silicon design determines the cache cycle time not a configurable multiplier. A key role of the Infinity Fabric is bridging the mismatch of cache speeds with main memory.
      Without memory in the system how can asynchronous operations function? You would have to stall CPU registers for variable times to permit variable cycle L1 accesses, if it's fixed synchronously the values can be held in the circuit transistors after the store micro op is initiated and the transfer flow through L1 and into L2.

    • @PanduPoluan
      @PanduPoluan 4 years ago

      @RobBCactive The AMD models are starting to trickle in. It's quite understandable from the POV of laptop makers to not immediately jump in with both feet; they need to see someone jump in first (Asus did it), and then after they saw how brilliantly AMD Renoir performed, they started to get on the bandwagon and design their systems.
      I think Papermaster did allude to this; he did expect Renoir pickup to be slow but steady. And as we started to see "enterprise" laptops with AMD coming from the likes of Lenovo and HP, I think what he had surmised back then is now proven.

  • @Trinitos
    @Trinitos 6 years ago +42

    How many programmers do you need to change a bulb? - none, it's a hardware problem ¯\_(ツ)_/¯

  • @Velkanis
    @Velkanis 6 years ago +20

    Everyone can make a video, and everyone can try making something entertaining, but rarely do I see someone making something longer than 20 minutes that will make me sit and listen no matter what the final content is. That's the quality level of Jim. For me, someone who really enjoys knowing how things work and daydreams about what the future has in store for us, having someone take a down-to-earth, reasonable look at the future is a feast for my eyes and ears (for example, that was also the case with the path tracing video).
    I can only be in awe of these videos because of how meticulously crafted and wonderful they are; it's a joy for the mind.
    Thanks, Jim, for how long you have stuck in here against popular opinion, and for the immeasurable effort put into these videos! Glad to be a supporter! Cheers and have a magnificent day!

    • @adoredtv
      @adoredtv 6 years ago +3

      Cheers bud.

  • @eubikedude
    @eubikedude 6 years ago +120

    23:18 16MB RAM eh? ;) An easy slip when you are discussing all the older stuff and cache sizes. :)

    • @The0Gizmo
      @The0Gizmo 6 years ago +5

      Caught that also, lol.

    • @wewillrockyou1986
      @wewillrockyou1986 6 years ago +23

      He made a mistake, whole video must be complete bullshit ;)

    • @cybercat1531
      @cybercat1531 6 years ago +6

      Well... Cache is just fast SRAM.

    • @ec1021501
      @ec1021501 6 years ago +8

      This is what will happen if you run out of cache and think you could reduce the latency by not looking at your script.

    • @AscendingApsolut
      @AscendingApsolut 6 years ago +2

      wrong time, it is 23:02 instead

  • @redhaze8080
    @redhaze8080 6 years ago +27

    My dad was a rigger but a few of his mates were coal miners here in Wollongong. One of them was mad into his Macintosh 128K right till he dropped from coal dust and asbestos. He was a tough old bugger and had never done anything like that before, but he was bloody inspiring. I was 10 and he was better at coding than me.

  • @ADR69
    @ADR69 6 years ago +50

    I know this took forever to make but it was worth it. Thanks for sharing, this was really interesting.

  • @michaelkregnes9119
    @michaelkregnes9119 6 years ago +79

    I noticed this channel because of the Ryzen 3000 series leaks. I got here from UFD Tech, and from that time to now I have watched at least 40 of your videos. Keep up the great content, and to top it off I love your accent.. can't go 1 month without "Aritte guyz howsit goin" :)

  • @EldaLuna
    @EldaLuna 6 years ago +5

    All these years I've seen these cache sizes and never really knew how they functioned.. for the first time ever I now understand exactly how they work and why.. very impressive, I must say.

  • @Starchface
    @Starchface 6 years ago +54

    Cracking video Jim! Brilliant. Enjoy your rest. You've earned it.

  • @bigogle
    @bigogle 6 years ago +39

    Brilliant. I was enthralled the whole way through.

  • @Elusivehawk
    @Elusivehawk 6 years ago +139

    Jim, you really hate my sleep schedule, don't you?

  • @mike-barber
    @mike-barber 6 years ago +11

    Really good video Jim, again. Being a coder involved in doing some fairly fast stuff, I do know how caches work in moderate detail, but found this to be a really good explanation for everyone. I think you did a great job of keeping the detail at just the right level (without going into all the extra stuff like cache lines, associativity, prefetching etc).
    Also really enjoyed seeing what is going on with Zen. I hadn't clicked that it was a victim cache, and definitely interesting to consider how this affects different CCX's. 16MB x 4 CCX is still just 16MB if you're doing stuff on one thread. Interesting stuff for both application and kernel devs.
    Thanks again. Your videos rock. Keep up the good work.

    • @adoredtv
      @adoredtv 6 years ago +2

      Cheers, yeah I've been surprised by just how many people said they didn't know Zen's L3 was victim cache!

  • @dionamuh
    @dionamuh 6 years ago +46

    Did you know UserBenchmark now has a link to this video at every System Memory Latency Ladder graph? Pretty cool. 😎
    Very interesting stuff btw!

    • @PanduPoluan
      @PanduPoluan 6 years ago +4

      They did? Wow... Jim's truly well on his way to success.
      All the best wishes. His analyses are always the greatest.

    • @RobBCactive
      @RobBCactive 2 years ago

      Ironic!! I wonder if Userbenchmark users stumble onto Jim's exposé of the unreliable, unprincipled world of slanted benchmarking.
      Last time I tried Userbenchmark with a Ryzen it ludicrously recommended a dual-core i3.

  • @ΒασίλειοςΜπεσλεμές
    @ΒασίλειοςΜπεσλεμές 6 years ago +71

    If adored studios launch a new game it will be for sure optimised for the ccxs xD

    • @Numenor76
      @Numenor76 6 years ago +3

      Looking forward to the game then ;)

  • @____5837
    @____5837 6 years ago +2

    The only thing I would add to your explanation of how caches work at 15:22 is that what gets deleted doesn't just depend on how long it has been since the data was last read; it is also affected by how frequently that data was previously read. So even if bobhealth hasn't been read for a while, it might not be deleted if it was previously read more frequently than everything else.
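
The difference between the two eviction rules can be sketched with a toy three-entry cache. This is an illustration only: real CPU caches use hardware approximations rather than these exact policies, and apart from `bobhealth` and `tomweapondamage` (which appear in this thread) the variable names and the functions `run_lru` and `run_lfu` are invented for the example.

```python
from collections import OrderedDict


def run_lru(accesses, capacity):
    """Evict whichever entry was used least recently."""
    cache = OrderedDict()
    for key in accesses:
        if key in cache:
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # drop the least recently used entry
            cache[key] = True
    return set(cache)


def run_lfu(accesses, capacity):
    """Evict whichever resident entry has been used least often."""
    counts = {}  # resident key -> number of accesses while resident
    for key in accesses:
        if key in counts:
            counts[key] += 1
        else:
            if len(counts) >= capacity:
                # Ties go to the entry that was inserted earliest.
                victim = min(counts, key=counts.get)
                del counts[victim]
            counts[key] = 1
    return set(counts)


# bobhealth is read five times early on, then three other variables arrive.
accesses = ["bobhealth"] * 5 + ["tomweapondamage", "janemana", "orcspeed"]

lru_result = run_lru(accesses, capacity=3)
lfu_result = run_lfu(accesses, capacity=3)
print("recency only:   ", sorted(lru_result))
print("frequency aware:", sorted(lfu_result))
```

With recency alone, `bobhealth` is the oldest entry and gets evicted; once frequency is counted, it survives and a less-used entry goes instead.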

  • @kevinglennon7864
    @kevinglennon7864 6 years ago +5

    As a scientist, proper interpretation of errors is incredibly important to me. It is not correct to say "This value was lower than the other, but they were within margin of error." If the values are within 1 standard deviation, the only thing we can say is "the numbers could not be measured to be different." We honestly just have no idea which one is actually higher than the other. The number which is perceived as higher may actually be the lower number, and was perceived as higher only by random chance.
    Although people often publish at just 1 SD, you should really be comparing numbers at 2 SDs (95% of the area under the Gaussian) if you're trying to determine if they are the same.
    Great video, you make it easy to learn about something entirely new.
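
A rough sketch of the check described above, using made-up latency samples. The sample values and the function name `distinguishable` are hypothetical, and comparing mean ± k·SD intervals is a crude screen, not a formal significance test.

```python
import statistics


def distinguishable(a, b, k=2.0):
    """True if the mean ± k·SD intervals of the two sample sets do not overlap."""
    mean_a, sd_a = statistics.mean(a), statistics.stdev(a)
    mean_b, sd_b = statistics.mean(b), statistics.stdev(b)
    return abs(mean_a - mean_b) > k * (sd_a + sd_b)


# Hypothetical latency measurements in ns (five runs each).
run_a = [70.1, 69.8, 70.4, 70.0, 69.7]   # mean 70.0
run_b = [70.9, 70.5, 71.2, 70.6, 70.8]   # mean 70.8
run_c = [75.1, 74.8, 75.4, 75.0, 74.7]   # mean 75.0

print(distinguishable(run_a, run_b, k=1))  # separated at 1 SD
print(distinguishable(run_a, run_b, k=2))  # but not at 2 SD
print(distinguishable(run_a, run_c, k=2))  # clearly different
```

Here `run_a` and `run_b` look different at 1 SD but overlap at 2 SD, which is exactly the case where it is not safe to say which one is really faster.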

  • @TheOblacek
    @TheOblacek 6 years ago +6

    Damn Jim I noticed that on Userbenchmark under System Memory Latency Ladder as explanation they have posted a link to this video. Congrats!
    It's a very informative video I enjoyed it a lot!

    • @PanduPoluan
      @PanduPoluan 6 years ago

      No kidding! Another commenter mentioned this, and so I just _have_ to check it out... and it's gloriously awesome!
      Jim, you're really making your way up to become one of the Internet's greatest sources. Congrats!

  • @Healtsome
    @Healtsome 6 years ago +5

    Your channel is the only one that could help me battle my attention span loss. Thank you.

  • @blackheart004
    @blackheart004 6 years ago +9

    At the 3 minutes mark I got SUCH A HUGE NOSTALGIA PANG :O
    Back in 1991 when I was like 7 yrs old, my mom bought me a CIP-03, which was a Romanian produced Sinclair Spectrum (I live in Romania btw) with 48 KB of memory. AH THE DAYS of learning to code in BASIC!

  • @rick-potts
    @rick-potts 6 years ago +3

    Couple of years older than you Jim, and some of my fondest memories of "me and my dad" were the hours we used to spend together programming and "gaming" on the Spectrum.

  • @dbzssj4678
    @dbzssj4678 6 years ago +4

    Above the charts on userbench they've added a tidbit at the end as a link :D "L1/L2/L3 CPU cache and main memory (DIMM) access latencies in nano seconds (explanation by AdoredTV)."

  • @The_Nihl
    @The_Nihl 6 years ago +43

    Sup Jim!
    Any mention of processor uArchs and I'm wet.
    Highly educational content, and explained in such a great way! I really love how you break down the cache's purpose and principle of operation in such an easy and understandable way, even for people who aren't exactly PC-hardware literate.
    Cache size in the Zen 2 processor is a really interesting beast. A bigger cache is always useful, as the difference in latency and bandwidth between the L-caches and the DDR main memory banks is a monstrous bottleneck. Many modern processors have long struggled with waiting or idle cycles while memory is addressed and instructions are fetched.... just look at these CAS latencies today! 14 to 21 cycles? Yugh.... More cache on silicon would allow more data and instructions to be stored, creating a significant reduction in cycles wasted waiting on memory, especially with doubled cache... 32MB per chiplet? Oh my.... I still have my old Cyrix with kilobytes of cache haha
    I really liked this video. The level of free education here is an absolute champ!
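
For a sense of scale, CAS cycle counts like the ones mentioned above can be converted to nanoseconds. This sketch assumes DDR memory, where the memory clock is half the quoted data rate; the function name `cas_latency_ns` and the DDR4-3200 figures are example inputs, not measurements from the video. CAS latency is also only one component of the total trip to main memory.

```python
def cas_latency_ns(cas_cycles: int, data_rate_mts: float) -> float:
    """Convert a CAS latency in memory clock cycles to nanoseconds."""
    # DDR transfers data twice per clock, so the clock in MHz is half the data rate.
    clock_mhz = data_rate_mts / 2
    # cycles / MHz gives microseconds; multiply by 1000 for nanoseconds.
    return cas_cycles / clock_mhz * 1000


# Example: hypothetical DDR4-3200 kits at CL14 and CL21.
print(cas_latency_ns(14, 3200))
print(cas_latency_ns(21, 3200))
```

Even the tighter of these is several times longer than the roughly 1ns L1 latencies discussed elsewhere in this thread, which is the gap the comment is pointing at.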

    • @glenwaldrop8166
      @glenwaldrop8166 6 years ago +3

      As the size increases so does the latency though.
      I imagine AMD has done the math.
      I am, however, certainly looking forward to the day that we have 16MB L1, though I imagine code will be so inefficient we'll need it.
      Ever notice that no matter how fast computers get Windows is always slow? Yeah, that's the only part about massive CPUs that worries me. MS doesn't seem to understand we got the massive computer to run something other than the OS.

    • @brunogm
      @brunogm 6 years ago

      @glenwaldrop8166 There are some papers on this, e.g. "Hybrid Memory Cube in Embedded Systems": basically, HMC as main memory is better than an LPDDR3 + L2 cache configuration.

    • @dreadlock17
      @dreadlock17 6 years ago

      Lmao good to see you here lukasz

  • @mattsmechanicalssi5833
    @mattsmechanicalssi5833 6 years ago +70

    Back in the day, AMD and Intel CPUs used to fit in the same socket. Cyrix too! What if an engineer is using an Intel board (albeit heavily modified) in order to match their performance levels? Just a thought.
    Great work Jim. And I love the story of your childhood. NostraScotsman is human after all!

    • @bgk8890
      @bgk8890 6 years ago +11

      This sounds like way too much work

    • @glenwaldrop8166
      @glenwaldrop8166 6 years ago +11

      @bgk8890 it would be a good way to run proper apples-to-apples testing.
      Gotta wonder.
      With the IO chip they could drop the CPU back to AM3+ if they wanted to.
      There would be a performance hit, obviously, but I would jump on a Ryzen upgrade chip in a second.

    • @pleasedontwatchthese9593
      @pleasedontwatchthese9593 6 years ago

      I could see this being a thing

    • @darven
      @darven 6 years ago +4

      That would be a nice thing. But... I doubt Intel would like it. They would probably completely rearrange the pins and whatnot with every release just to make AMD waste time.
      But I am all up for flashing the BIOS to be able to run Ryzen and vice versa.

    • @soylentgreenb
      @soylentgreenb 6 years ago +20

      Matt Christie The reason why AMD used the same socket as Intel was that AMD started making x86 CPUs as a second source for Intel. Now why would anyone want another company to act as a second source for their product, reducing profits? Because many of the early customers were military and mission-critical, and companies like IBM demanded that there was someone else able to supply a compatible product to replace Intel if they failed to deliver or went out of business or something. AMD produced exact, identical 8088, 8086, 286 and so on chips under agreement with Intel. With the 386, Intel thought it was so revolutionary they basically said screw it and refused to let AMD be a second source; this delayed IBM's use of the 386, but other companies like Packard Bell and whatnot made "PC compatibles" with the 386. AMD didn't truly make their own separate product until the K5, and they didn't really succeed in making a competitive product until the K6, which was based on the NexGen Nx686 (AMD bought NexGen).
      AMD was remarkably successful in the late 90s and early 00s. If not in sales, then in competitive performance. The Pentium Pro was a huge leap for Intel; the K7 Slot A Athlon let AMD catch back up and slightly surpass Intel in floating point (which was the new thing since Quake that every 3D game needed). The Athlon 64 beat Intel's Pentium 4 badly; this was because Intel expected their process engineers to pull another rabbit out of the hat and make the Pentium 4 run cool enough at 8 GHz (double-pumped ALU running at 16 GHz) by 2003, and that was just not possible.
      In the late 90s AMD managed to equal Intel's performance while on average being a process node behind. That was really very impressive.

  • @jkd7799Yann
    @jkd7799Yann 6 years ago +3

    I have never seen any other YouTuber out there going so thoroughly into such details; that's why I remain a strong subscriber.

  • @Orochimarufan1900
    @Orochimarufan1900 6 years ago +44

    You know, the BASIC line numbers aren't as useless as they seem. They're basically the predecessor of jump/goto labels. They allow you to later add things between lines (hence you'd usually count 10 20 30 etc. instead of 1 2 3) without messing up all your jumps. (Trust me, having to go through your whole program and fix up every jump just because you added a line somewhere is not fun, especially since one's likely to repeat it a lot when debugging.) At the time there were also few sophisticated text editors available, so replacing a line just by entering the same number again would have been easier than trying to edit the old one. This is especially true on (semi-) write-once media, though I'm not sure how much use this last point was in practice.
    Overall great video though.
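
The two behaviours described above (inserting between existing lines, and replacing a line by re-entering its number) can be mimicked with a tiny sketch. This is a Python model of the idea, not how any real BASIC interpreter stored programs; `program`, `enter` and `listing` are hypothetical names.

```python
# The stored program: line number -> statement text.
program = {}


def enter(line_no: int, text: str) -> None:
    """Typing a numbered line stores it; reusing a number replaces the old line."""
    program[line_no] = text


def listing():
    """Like LIST: show the program sorted by line number."""
    return [f"{n} {program[n]}" for n in sorted(program)]


enter(10, 'PRINT "HELLO"')
enter(20, "GOTO 10")
enter(15, 'PRINT "WORLD"')   # slots in between 10 and 20; GOTO 10 is untouched
enter(10, 'PRINT "HI"')      # same number again: replaces the old line 10

for line in listing():
    print(line)
```

Counting in tens leaves nine free numbers between every pair of lines, which is why the later insert at 15 needs no changes to the existing jump.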

    • @dralord1307
      @dralord1307 6 years ago +2

      On my Commodore 64 it also made debugging a hell of a lot easier :D

    • @RolandSxxx1
      @RolandSxxx1 6 years ago +2

      It also taught you the 10 times table...

    • @pleasedontwatchthese9593
      @pleasedontwatchthese9593 6 years ago +1

      I think it was laziness. Labels could have been used for jumps, and inserting a new line of code could have been based on a relative generated line number when printing to the screen. I think they were happy that it was working and did not care how well it worked.

    • @adriankelly_edinburgh
      @adriankelly_edinburgh 6 years ago +3

      Don't forget that back then, BASIC code on these machines was interpreted rather than compiled, so getting an error such as 'Syntax error at line 40' made it much easier to pinpoint your inevitable typing mistakes. I had an Acorn Electron back then, which ran BBC BASIC. It supported auto line numbering as you typed and also had a renumber command, which could redo all your line numbers if inserted code meant that you were in danger of running out of space between existing lines.
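
A renumber command has to do two things: hand out fresh, evenly spaced numbers, and rewrite every jump so it still points at the same statement. The sketch below is a hypothetical Python model of that idea, not BBC BASIC's actual implementation; it only handles `GOTO` and `GOSUB` followed by a literal number, and would wrongly rewrite such text inside a string literal.

```python
import re

# Matches GOTO or GOSUB followed by a literal line number.
JUMP = re.compile(r"\b(GOTO\s+|GOSUB\s+)(\d+)")


def renumber(program: dict, start: int = 10, step: int = 10) -> dict:
    """Return a copy of `program` with evenly spaced line numbers and fixed-up jumps."""
    old_numbers = sorted(program)
    mapping = {old: start + i * step for i, old in enumerate(old_numbers)}

    def fix(match):
        target = int(match.group(2))
        # Leave jumps to non-existent lines unchanged.
        return match.group(1) + str(mapping.get(target, target))

    return {mapping[old]: JUMP.sub(fix, program[old]) for old in old_numbers}


# A program that has run out of room between lines 10 and 12.
old_program = {
    10: 'PRINT "A"',
    11: 'PRINT "B"',
    12: "IF X > 0 THEN GOTO 11",
    20: "GOTO 10",
}
new_program = renumber(old_program)
for number in sorted(new_program):
    print(number, new_program[number])
```

After renumbering, the jump that used to target line 11 targets line 20, and there is room to insert nine new lines between any two statements again.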

    • @joker927
      @joker927 6 years ago +1

      There were no alpha line labels? Couldn't say jump to "loop1?" The line label was the line number?

  • @phillipcrowley7541
    @phillipcrowley7541 6 years ago +4

    I caught that subtle hint at the end. Next video hopefully won't take "7" days to release. Radeon VII video incoming

  • @CRBarchager
    @CRBarchager 6 years ago +1

    An instant like for your video as always. These in-depth videos are really the best thing about your channel and loving the knowledge and the details that goes into them. Thank you for this and looking forward for the next one!

  • @bakadeshi_aunstudios
    @bakadeshi_aunstudios 6 years ago +2

    I used to also write text-based RPG games like that in BASIC back in my high school days, thanks for the trip down memory lane Jim! Great informational video. Most of this I already knew, but you break things down in an easy-to-digest manner for those that don't, while still managing to make it interesting for those of us that do. Don't know how you do it.

  • @RoyTelling
    @RoyTelling 6 years ago +6

    KIITOS - THANK YOU....
    You have managed to get me to understand cache a lot better, at a level I could understand (my 58 year old brain is not as quick as it used to be LoL)
    Shared this on my FB page because I think many people may like this

    • @adoredtv
      @adoredtv 6 years ago +1

      Cheers Roy!

  • @ImTheSlyDevil
    @ImTheSlyDevil 6 years ago +1

    I really appreciate this extensive explanation, Jim. Also, I'm glad to see that Userbenchmark has added a link to this very video to explain their cache/memory latency section of the benchmark.

  • @chefov
    @chefov 6 years ago +2

    The mad lad strikes again. What an awesome surprise to get a video while at work. Now I'll have something decent to do! Cheers!

  • @TrueThanny
    @TrueThanny 6 years ago +3

    The 486 also supported a L2 cache, but it was not on the CPU package. It was on the motherboard. This was also the case with the Pentium line, as well as the Cyrix and AMD clones with both 486 and 586 class chips. The cheapest motherboards had no L2 cache sockets. The cheaper ones had the sockets but no chips. And the mildly cheap ones had either half the sockets occupied, or all occupied with small chips (making a cache upgrade much more expensive). The impact on performance between having an L2 and not was immense. The impact of its size was less dramatic, but still significant. Having it on the motherboard also meant more difficult troubleshooting. Rather than just having to check for bad memory SIMMs as causes of frequent crashing, you had to check the SRAM chips as well.
    Intel moved the cache onto the CPU package with the Pentium Pro, Pentium II, and first version of the Pentium III. The second P3 revision finally had on-die L2 cache. It was beaten to the punch by AMD with its K6-III chip, which was also the last AMD x86 chip to use the same socket and motherboard as Intel chips. Next came the Athlon, and the beginning of AMD's rise to nearly half the market, before Core 2 cut them off at the knees.

  • @introvertplays6162
    @introvertplays6162 6 ปีที่แล้ว +4

    Every time I see an AdoredTV video in my subscription box I shout out loud: "YES!!!" and some family member comes over to my room to ask what happened. XD

  • @mcDragoon
    @mcDragoon 6 ปีที่แล้ว +10

    Very interesting article, well done!! This is why I'm a loyal subscriber lol.
    Would it be possible to do a video about memory speeds and latency with Ryzen CPUs? I see you have 3400 MHz memory, and I'm really curious to know what kind of timings you have and whether it shrinks the gap between the 2700X and Intel CPUs. I've seen other articles about this, but they won't compare to one made by you.

  • @Half_Finis
    @Half_Finis 6 ปีที่แล้ว +8

    How dare you release this while I'm at work? I need to comment first!!!!

  • @felixonyango5409
    @felixonyango5409 6 ปีที่แล้ว +3

    Wow. Wow. Wow...... Thank you for a brilliant explanation on how cache memory works. Will be using this to educate my programmers.

  • @Jimster481
    @Jimster481 6 ปีที่แล้ว +2

    Another great video!
    I am a low level software engineer and I have written many small algorithms that directly benefit in terms of caching.
    In fact, when designing algorithms that will run almost constantly it's important to be mindful of the average cache size of CPUs to be able to achieve as much performance as possible.
    As is the case with most AMD hardware... these large caches won't be utilized immediately and it will take some years of optimizations / software progress to truly speed up the most common of algorithms.
    Although I think that the Ryzen "AI/Intelligent" cache is actually scanning through applications in real time and trying to figure out what data is the best to cache. I have noticed some weird behavior on programs that I develop for my company where specific algorithms (especially those which are heavily multi-threaded) are much slower on my Ryzen vs on my older Intel parts.
    So much so that my old skylake i7 U XPS 13 can beat my 1700x by a full second (or sometimes more) in my data randomization software when targeted on one of my production products.
    The total "processing time" comes out to around 6 seconds on my Ryzen, but using the same 8 threads (or even 16) results in the performance being very much the same while the same task on the i7 U can take only 4 seconds even using the 16 threads.
    The design of my application has a single controller/dispatching thread, which fills up the rest of the threads with work and then waits for them to complete (not the best design since it has to wait, but I cba to re-design it since it's already more than fast enough)...
    Something about the AMD Ryzen Cache + IF penalties make this task so very much slower vs older monolithic intel designs.
    I hope that with Zen 2 that the IF performance is increased again or that the caching is improved to reflect an increase in performance in my specific application (not that the performance needs to be better, but I also use it as a sort of a benchmark).
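    The controller/dispatcher layout described in this comment could be sketched roughly as follows. This is only an illustration, not the commenter's code: `shuffle_chunk`, `run_dispatch`, and the chunk sizes are hypothetical names and values chosen for the example, and Python's `concurrent.futures` stands in for whatever threading library the real application uses.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def shuffle_chunk(chunk, seed):
    """Hypothetical unit of work: return a shuffled copy of one chunk."""
    rng = random.Random(seed)
    out = list(chunk)
    rng.shuffle(out)
    return out


def run_dispatch(data, workers=8, chunk_size=4):
    """Controller thread: split the data, hand chunks to workers, wait for all."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming map() here blocks the controller until every worker
        # has finished, mirroring the "dispatch, then wait" design.
        results = list(pool.map(shuffle_chunk, chunks, range(len(chunks))))
    return results


if __name__ == "__main__":
    print(run_dispatch(list(range(16))))
```

    Each worker only touches its own chunk, so keeping chunks small enough to stay in a core's private cache is the sort of consideration the comment alludes to. Whether that explains the Ryzen timings reported here is not something a sketch like this can show.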

  • @jonavin
    @jonavin 6 ปีที่แล้ว +8

    All you people nitpicking at his coding sample: he's just dumbing it down so that the average person can understand it. It's not really important that any strings in the sample would also need a trip to memory. If you want to be technical, the integers would be loaded into registers before the operation. You'll just confuse most people if there's too much detail. I think it was the right level of detail for understanding multilevel caches.

    • @yottaXT
      @yottaXT 6 ปีที่แล้ว +2

      Haven't read any comment in that regard, at least not yet, but yeah, as you said, he did a very simple example so everybody could follow the explanation, and a very good one to be fair. I'm a Software Developer myself and found it very on point. I wish I'd had a teacher with that kind of devotion and tact to explain things that easily back in my university days.

  • @shznn
    @shznn 6 ปีที่แล้ว +23

    Used to be an Intel fanboy. Thought they had it all figured out. Heck, I was thinking of spending 500Euro on a 9900k on Black Friday, I didn't and am now waiting on Ryzen 3. I thought Zen was a "value proposition" when in fact it is superior to Intel's obsolete engineering in every way, from architecture to a viable cooler included. I'll venture to say that I used to not pay attention to power consumption, until after watching Adored and Coreteks. Now I understand that efficiency = power. I also used to think that nVidia is good, no matter what. Thanks guys. Now I'd only like a 9series Intel if it's for free :)

    • @grizzly6699
      @grizzly6699 6 ปีที่แล้ว +3

      Jim has great analyses in his vids. I found Coreteks several months ago and does similar content to Adored. Maybe they should collaborate, sounds like a plan :)
      I thought AMD was the greatest, until they released Bulldozer in 2011 and I turned to Intel and never looked back... until Ryzen arrived in 2017. Now I'm planning a Ryzen 3000 system later this year or the next. I can't wait to see what unfolds this year in the tech space.

    • @bartbroekhuizen5617
      @bartbroekhuizen5617 6 ปีที่แล้ว +2

      @@grizzly6699 Yeah, Coreteks also explained in his video the energy it requires to move something from one point to another. Jim's analysis perfectly fits Coreteks' explanation. Here is his video: th-cam.com/video/oU-NNV2pYTQ/w-d-xo.html

    • @PanduPoluan
      @PanduPoluan 6 ปีที่แล้ว

      If you can get a 9series Intel for free, please inform me as well.
      Despite my Grand Dream of owning a 12- or 16-core Ryzen 3000, I definitely won't turn down the opportunity of owning a 9series Intel... for free xD

    • @RobBCactive
      @RobBCactive 4 ปีที่แล้ว

      But a free i9 is worthless without an expensive Z series mobo and if you use it for long periods (perhaps gaming) those extra watts add up, especially in warm summer with a/c.

    • @RobBCactive
      @RobBCactive 4 ปีที่แล้ว

      @@grizzly6699 But the AMD 64 X2 was already outperformed by Core Duo years before 2011. The Phenom / Bulldozer was disappointing because it meant AMD's new arch hadn't caught up with Intel's, condemning them to discounting until their next CPU generation.
      But they really weren't so terrible, people were able to buy 2, 4 (even 3) core chips cheaper than Intel's ... It was in the reviewers' main app they looked bad, benchmarks mainly due to floating point which wasn't very relevant if you had a GPU for fp offload.
      That arch spawned Jaguar used in consoles and some integrated APU designs which performed well in daily use, allowing 3D games to run faster and smoother than the competition.
      The tech press often totally dismisses designs aimed at markets they have little experience of ... at one time it was "But can it run Lotus?" whenever a higher performance CPU was introduced, due to a power user obsession with how they used their PC. I remember a 32bit workstation I worked on having a 16bit so-called accelerator added so it could run MS-DOS apps, when I could actually knock up scripts or write C faster than I could use the productivity suite of that era.

  • @j.b.6855
    @j.b.6855 6 ปีที่แล้ว

    Nice informational video. It gives a basic understanding of what's going on with memory. Something I really didn't have a clue about. Very interesting and well presented.

  • @KuraIthys
    @KuraIthys 6 ปีที่แล้ว +19

    Since I still mess around with these early 8 and 16 bit systems, yeah, memory speed really became an issue.
    Especially the 6502 family had issues.
    See, the 6502 is a memory + register design, where say even the 8086 and Z80 are register + register designs.
    You might ask, so what?
    Well, it means that the majority of operations on a register+register design revolve around combining values from two registers and storing the result in one of those registers.
    That means with careful coding, you can keep a lot of stuff in the registers, and keep memory accesses down.
    The Memory + Register design means almost all instructions that exist are ones where one of the values operated on is in memory, and one is in the CPU.
    That means everything accesses memory all the time.
    As long as the memory can keep up, that's fine.
    But as the CPU speeds outpaced memory speeds, it became more and more problematic.
    And the full extent of how bad this could get could be seen in the 16 bit console wars, where you had the Motorola 68000, which is a Register+Register design, vs the 65816.
    Now, the 68000 has a lot of registers to work with, so you really can cut down on memory accesses. In fact, the processor only does one memory access every 4 cycles, so it's even more critical to avoid frequent memory access.
    But it has another consequence too - since memory access is only on 1 in 4 cycles, the CPU can run 4 times faster than the memory without causing problems.
    Contrast this with the 65816, which performs single cycle memory access. Great. Amazing even. IF you have memory fast enough.
    And in fact, the 65816 makes things even worse, because a design quirk means the memory has to respond in half a cycle to prevent CPU instability...
    So guess what, that 3.58 MHz 65816 requires memory rated for something like 120 nanoseconds (e.g. fast enough for roughly 7.16 MHz single cycle access).
    Meanwhile that 7.16 MHz 68000 requires 480 nanosecond access (e.g. fast enough for 1.79 MHz).
    See the issue yet? Keep in mind that faster memory is more expensive than slower memory, and that was even more true in the 80's and early 90's...
    So, the 65816 system needs 4 times faster memory than the 68000 running at twice the clock speed!
    Does that mean it has 4 times the memory bandwidth? Well, no. The 68000 uses 16 bit memory, while the 65816 uses 8 bit memory.
    But the cost of memory is more closely related to its speed than to the bit width.
    Also due to that unfortunate half-cycle requirement, the actual rate at which a 65816 CPU accesses memory is half the speed its memory would suggest. Because of that, in spite of having 4 times the memory speed, the 3.58 MHz 65816 has the same memory bandwidth as the 7.16 MHz 68000.
    By now I'm sure you know which systems I'm referring to. And you can see how awful the SNES's memory performance requirements are, which would have driven up prices of RAM and ROM.
    If you know your 16 bit console hardware, you might also know that the SNES CPU drops to 2.68 MHz fairly often.
    And why is that? Because the system's main RAM, and most of the ROM chips seen early in its life, weren't fast enough!
    So owing to high cost/low availability of sufficiently fast RAM, the SNES is actually operating at 2.68 MHz much of the time, not its hypothetical 3.58 MHz.
    And that's roughly a 25% speed reduction (put the other way, 3.58 MHz is about 33% faster than 2.68 MHz).
    So is there an upside to this apparent weakness? Well, yes. The 6502 family can be said to have a very high IPC. That is, for a processor from that era it performs its calculations in a rather low number of cycles.
    That's all well and good if memory speeds keep pace with CPU speeds, but of course, they didn't. And that became a problem.
    Because the CPU quickly started to outrun memory speeds, and that's very bad news for a Memory+Register design.
    Plus, where a Register+Register design lends itself well to cache schemes, a Memory + Register design is much less viable if you need a cache.
    Certainly, it's by no means impossible to use a cache with a memory + register design. And the 6502 family in particular could benefit enormously from the first 64k of memory being quite a bit faster than the rest of the memory, owing to the way it uses its stack, and the zero page/direct page logic that treats the first 256 bytes (or a specified 256 byte range in the first 64k for the Direct Page version) of memory as something akin to an extended register file.
    However, on the whole Memory + Register designs are still relatively poorly suited to cache memory schemes.
    And thus, they largely fell out of favour.
    Although, ironically perhaps, the 6502 family is one of two main 80's designs that is still widely available as newly produced chips, and has even increased in speed. The standard modern 65816 chip runs at 14 MHz, and can easily be overclocked to 20 MHz without much problem. FPGA implementations have even managed to hit 200 MHz.
    The other design that's still widely available is the Z80. Though that's not particularly faster than the early 80's versions.
    These chips are largely relegated to use in embedded devices, but the fact that they're still easily available when stuff like the 68000 hasn't been manufactured in something like 20 years now, says a lot about the enduring (if niche) benefits of these two designs...
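    The clock and memory-timing arithmetic in this comment can be checked with a few lines of code. This is a rough sketch under the comment's own assumptions (the 65816 needs a response within half a cycle and accesses memory once per cycle; the 68000 accesses memory once every four cycles over a 16 bit bus). The function names are made up for the example. The raw clock periods come out at roughly 140 ns and 559 ns, so the 120 ns and 480 ns ratings quoted in the comment are somewhat tighter than the bare minimum, but the 4:1 ratio and the equal bandwidth both hold.

```python
def period_ns(mhz):
    """Length of one clock cycle in nanoseconds for a clock given in MHz."""
    return 1000.0 / mhz


def access_window_ns(cpu_mhz, cycles_per_access):
    """Time the memory has to respond, given how many cycles one access spans."""
    return period_ns(cpu_mhz) * cycles_per_access


def bandwidth_mb_s(accesses_per_us, bytes_per_access):
    """Bytes moved per microsecond, which equals MB/s (decimal megabytes)."""
    return accesses_per_us * bytes_per_access


# 65816 at 3.58 MHz: memory must answer within half a cycle.
w65816_window = access_window_ns(3.58, 0.5)
# 68000 at 7.16 MHz: one memory access every four cycles.
m68000_window = access_window_ns(7.16, 4)

# Accesses per microsecond: one per cycle (65816), one per four cycles (68000).
w65816_bw = bandwidth_mb_s(3.58, 1)      # 8 bit bus, 1 byte per access
m68000_bw = bandwidth_mb_s(7.16 / 4, 2)  # 16 bit bus, 2 bytes per access

# Relative slowdown when the SNES CPU drops from 3.58 MHz to 2.68 MHz.
slowdown_pct = (1 - 2.68 / 3.58) * 100

print(f"65816 window: {w65816_window:.1f} ns, bandwidth: {w65816_bw:.2f} MB/s")
print(f"68000 window: {m68000_window:.1f} ns, bandwidth: {m68000_bw:.2f} MB/s")
print(f"Drop from 3.58 MHz to 2.68 MHz: {slowdown_pct:.0f}% slower")
```

    By this calculation the drop from 3.58 MHz to 2.68 MHz is about a 25% reduction in clock speed.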

    • @Leyvin
      @Leyvin 6 ปีที่แล้ว +5

      www.digikey.com/products/en?keywords=MCF54455VR266 (266MHz 90nm 68060 w/MMU/DSP/FPU/Cache/Superscalar)
      They do still produce the Legacy 68K Processors as well (MC68SE as the Search Key) … and while they're not strictly speaking "End-of-Life" as of yet, they are only produced to order now, and they've been threatening to discontinue them for the past 3 years.
      There haven't been any "New" Developments on the Architecture since 2004, and no new Revisions since 2010-2012... something like that.
      Don't count out the 68K just yet... it's a trooper, especially in the Automotive Industry.

    • @RobBCactive
      @RobBCactive 2 ปีที่แล้ว

      That is over-simplistic; it ignores the practical, cost-effective performance of 6502 micros. The 8086 & 8088 were 16bit processors, and the PCs were massively more expensive. Registers were not sufficient as CPU speed improved: RAM was a bottleneck, requiring caching. The Z80 had higher frequency but multi-cycle instructions, and registers need transistors so add cost.
      The 6502 has an accumulator register but also fast zero page memory access without 2 cycles for a 16bit address. That is elegant and economical on transistors, the instructions operating in 1 or 2 cycles. The memory at the time operated at CPU frequency, so the solution matched the ecosystem using cheap 8bits for 16bit addressable memory. Furthermore memory mapped i/o avoided special but inflexible i/o instructions. The 6502 has a partial pipeline interleaving RAM access with computation.
      Moving on, engineers at Acorn dissatisfied with the available 16/32bit designs were able to make the ARM1 taking advantage of 32bit bandwidth with the CPU coupled with RAM speed by a load, process, store pipelined architecture. It was very fast, cheap and power efficient and outperformed the efforts of far better resourced teams.
      The downsides were: firstly the coupling had to be broken introducing caches as CPU frequency increases outstripped memory speeds, but that broke some software which had relied on the unbuffered RAM characteristic.
      Secondly requiring 32bit RAM was too expensive for embedded application and code density insufficient to meet cost targets of designs that wanted the power efficiency offered. So a 16bit instruction mode was added and it was adapted to minimise RAM chips.
      The point is designs meet a market and cost constraints, the elegant highly optimised becomes unsuitable when conditions change.

  • @Loundsify
    @Loundsify 6 ปีที่แล้ว +2

    A lot of schools would find content like this really useful for teaching computing.

  • @acidstorm001
    @acidstorm001 6 ปีที่แล้ว

    Jim, no one else covers anything like this on TH-cam to this extent. It's one of the reasons why I love this channel. 42 minute video, and I didn't even flinch. Most 20 minute videos I watch, I'll jump ahead on. Your videos just flow so well that I do not feel a need to jump ahead. In reality, you can't anyway. You would miss something important leading into your conclusions. Great stuff as always, keep it up!

  • @mik310s
    @mik310s 6 ปีที่แล้ว +2

    Great video as always Jim. Please don't stop the analysis videos, they are the most interesting :)

  • @Najvalsa
    @Najvalsa 6 ปีที่แล้ว +3

    Perfect thing to be presented with after work.
    Thanks for the continued work, Jim. :)

  • @dycedargselderbrother5353
    @dycedargselderbrother5353 6 ปีที่แล้ว +2

    Line numbers in programs made a lot of sense in the "goto" paradigm, where a goto statement would just jump to a line. This was eventually replaced by structured, function-based programming, where code was separated into modules. This was a bit heavy for these old computers, though, because you needed to utilize a call stack to keep track of the functions you were entering and exiting. goto was faster, though it tended toward incomprehensible spaghetti code that no one but the original designer could understand or contribute to.
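    To make the difference concrete, here is a minimal sketch of a line-numbered interpreter. It is a toy invented for illustration, not any real BASIC: the instruction names and the `run` function are assumptions. A GOTO just overwrites the program counter, while a GOSUB has to push a return position onto a call stack so that RETURN can find its way back.

```python
def run(program):
    """Run a tiny line-numbered program; return the list of printed values."""
    lines = sorted(program)                      # line numbers in ascending order
    index = {n: i for i, n in enumerate(lines)}  # line number -> position
    pc = 0                                       # position in `lines`
    call_stack = []                              # return positions for GOSUB
    output = []
    while pc < len(lines):
        op, arg = program[lines[pc]]
        pc += 1                                  # default: fall through
        if op == "PRINT":
            output.append(arg)
        elif op == "GOTO":
            pc = index[arg]                      # jump, remembering nothing
        elif op == "GOSUB":
            call_stack.append(pc)                # remember where to come back to
            pc = index[arg]
        elif op == "RETURN":
            pc = call_stack.pop()
        elif op == "END":
            break
    return output


program = {
    10: ("PRINT", "start"),
    20: ("GOSUB", 100),
    30: ("PRINT", "back in main"),
    40: ("GOTO", 60),
    50: ("PRINT", "skipped"),
    60: ("END", None),
    100: ("PRINT", "in subroutine"),
    110: ("RETURN", None),
}

print(run(program))
```

    The extra push and pop per call is the bookkeeping cost the comment refers to; on a machine with a few kilobytes of RAM and a slow CPU, that overhead was not negligible.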

  • @kcvriess
    @kcvriess 6 ปีที่แล้ว

    An absolutely awesome production Jim! Now I fully understand the articles I read a long time ago! T-Bird and Coppermine era. L2 was off die before that, I think? LOL Thanks :) Now you've got me fantasizing about Zen4 indeed...

  • @kopasz777
    @kopasz777 6 ปีที่แล้ว +5

    As a CS graduate, this was a nice recap. You explaining it in such detail made me realize how the "curse of knowledge" affected me, just assuming what I know all others know too.
    But I believe most of your subscribers are more tech-savvy than the average guy.
    Edit: sry, this came out a bit pretentious.

    • @adoredtv
      @adoredtv  6 ปีที่แล้ว +3

      No you're right in that you are in that higher expertise range on this topic. ;)
      But most of my subscribers have no idea about this stuff. Even at this "entry" level, it's far beyond anything most have been taught. This is actually one of my major strengths - understanding when it's gone too far for most to comprehend, and toning it back.
      I could have gone a lot deeper (I'm no expert on cache and never will be), but had I done so it would have been alienating to the average viewer.

  • @beamtech3412
    @beamtech3412 6 ปีที่แล้ว +6

    Seems interesting, you should do more of these technical videos

  • @giovannip.1433
    @giovannip.1433 6 ปีที่แล้ว +2

    Thank you for your informative and, to me at least, entertaining discussion on cache. When questions popped up in my mind, you explained things in your video so that I could grasp what is going on... How does a CPU 'know' what is in its cache and where it is? How does the program 'know' to assign data to which cache and for how long? Registry? As the caches get bigger, aren't more resources required to manage and record where the data is..? I'm surprised that YouTube doesn't put ads on your videos to compensate you in some fashion on their site. It is content like yours that has resulted in me switching off the TV to watch content like yours.

    • @Viewer19
      @Viewer19 6 ปีที่แล้ว

      The memory controller checks what's in the cache upon request for data or instruction. The program does not control cache.

    • @zusammenarbeitfurerfolg6962
      @zusammenarbeitfurerfolg6962 6 ปีที่แล้ว +4

      +Giovanni P.
      To answer your questions:
      _How does a CPU 'know' what is in its cache and where it is?_
      Every cache has a tag cache which stores the original addresses of each cache block (which are requested by instructions, i.e. known) as tags.
      You can see that tag cache in the video at 17:17 in blue between the two blocks of "L2$" named "L2$ Tags". Depending on cache associativity, the cache logic has to search through one to every tag, but if it finds the tag, it immediately knows that some data is in the cache and in which cache block it is. If it doesn't find the tag, it knows the data isn't there and goes to the next level.
      _how does the program 'know' to assign data in which cache and for how long?_
      The cache is entirely managed by the cache logic which sits next to the cache within the CPU. No program has direct control over any data location within the cache, which is quite an advantage since no old program needs to be rewritten and the cache logic usually has more accurate information as to which data will be used or not.
      _Registry?_
      The Windows registry also has no knowledge of the cache. Not even the registers inside the cores know anything about the cache. This is some beauty about the cache - it's invisible except for its effect.
      _As the caches get bigger aren't more resources required to manage and record where the data is..?_
      Yes, this is true. The tag search takes longer, the bigger the cache is. You also have to consider the cache's associativity, since too high means too many tags to search, too low means too much wasted space, with cache blocks remaining unused for a long time. Higher associativity needs more space for logic or adds latency, lower one needs more space for storage.
      On the other hand, the time saved due to more data being nearer than memory can have huge benefits in some cases.
      It is quite a balancing game: too much cache and the performance penalty is bigger than the speed-up, too little and the cache speeds nothing up. At the same time, you use area which you could use for actual compute logic, which in turn could be left unused due to being memory starved. Tough design decision.
      I hope I could answer some of your questions. Should you encounter new ones, please let me know, I will make my best efforts to answer them correctly.
      Sincerely,
      ZfE
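      The tag lookup described in this reply can be modelled in a few lines. The following is a toy sketch, not how any particular CPU implements it: the class name, the sizes, and the LRU replacement policy are all assumptions made for the example (the reply does not name a replacement policy). An address is split into a set index and a tag, and only the tags within that one set need to be compared.

```python
class SetAssociativeCache:
    """Toy cache model: tracks tags only, with LRU replacement in each set."""

    def __init__(self, num_sets=4, ways=2, block_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # One list of tags per set, ordered from least to most recently used.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the block is then filled)."""
        block = address // self.block_size  # drop the offset within the block
        set_index = block % self.num_sets   # which set the block must live in
        tag = block // self.num_sets        # identifies the block within the set
        tags = self.sets[set_index]
        if tag in tags:                     # compare against each way in the set
            tags.remove(tag)
            tags.append(tag)                # mark as most recently used
            return True
        if len(tags) == self.ways:          # set full: evict least recently used
            tags.pop(0)
        tags.append(tag)
        return False


cache = SetAssociativeCache(num_sets=4, ways=2, block_size=64)
trace = [0, 8, 256, 0, 512, 256]
results = [cache.access(a) for a in trace]
print(results)
```

      In this trace, addresses 0, 256 and 512 all map to the same set, so with only two ways the third block evicts one of the first two. Raising `ways` removes that conflict at the cost of more tag comparisons per lookup, which is the associativity trade-off described above.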

    • @giovannip.1433
      @giovannip.1433 6 ปีที่แล้ว

      Thank you very much for your time in explanation. @@zusammenarbeitfurerfolg6962

  • @TheBIOSStar
    @TheBIOSStar 6 ปีที่แล้ว +47

    23:02 Your system has 16MB of DDR4 RAM? :>

  • @stale2665
    @stale2665 6 ปีที่แล้ว +6

    when you start the video and your speakers are off and you rewind just to make sure you catch the "alright guys how's it going"

  • @gabumoh
    @gabumoh 6 ปีที่แล้ว +10

    This was a fun and educational video...

  • @pvalpha
    @pvalpha 6 ปีที่แล้ว

    Once again a very excellent video. Take your time, we understand. :) As someone who started on a TI 99/4A when they were 8 years old, I certainly understand that early computer intro and why it's so important to understanding the evolution of computer systems to the modern day. This is one of the clearest explanations I've ever watched and listened to. Thank You.

  • @RepsUp100
    @RepsUp100 6 ปีที่แล้ว +1

    Informative as always, thank you!

  • @bdhale34
    @bdhale34 6 ปีที่แล้ว +2

    My first x86 computer was a Tandy 1000 HX had the ram upgrade card and both size external floppy drives. The CPU was a dual speed 4.77MHz/7.16MHz so blazing fast. My very first home computer was the Tandy TRS-80 Model 2 color computer not sure what speed it ran at off the top of my head.

  • @personaldronerepair6141
    @personaldronerepair6141 6 ปีที่แล้ว

    Fantastic explanation !!
    That was time well spent watching .
    Thank you for the time in .

  • @jamespieske5246
    @jamespieske5246 6 ปีที่แล้ว +5

    Yes! A 42 minute masterpiece that dropped between dinner & bedtime instead of 3am!
    Earbuds definitely in whilst I do the dishes tonight!

  • @Frostie3672
    @Frostie3672 6 ปีที่แล้ว +1

    Mentioning those old programming languages brought back memories. I was using Fortran where I was working back in the early 90s. I started off writing code on the C64 in 1988, using BASIC first but then moving on to assembly code. I remember the program in the C64 manual where you moved a balloon with the Commodore logo around the screen using the joystick. I wrote that all in assembly code, and was well chuffed with myself lol.

    • @fishclaspers361
      @fishclaspers361 6 ปีที่แล้ว

      You probably have lots of stories and wisdom to impart. Spill the beans.

  • @Mattski_83
    @Mattski_83 6 ปีที่แล้ว

    I just watched this and I would have to say that this is one of your best videos yet. I learnt so much and I enjoyed every second of it. You seem to be getting better every video you make and I've only been watching for a year and a bit. Keep up the good work, I eagerly await your next video.

  • @ryankraidich4533
    @ryankraidich4533 6 ปีที่แล้ว +1

    @AdoredTV/Jim Userbenchmark now has a link back to this video for the System Memory Latency Ladder section!

  • @AlmightyGTR
    @AlmightyGTR 6 ปีที่แล้ว +2

    Next week prof. Jim will help us understand LRU, LFU and FIFO. Jim, you are tremendous at ELI5, a born guru.

  • @Mrfiufaufou
    @Mrfiufaufou 6 ปีที่แล้ว +1

    Finally a new video, always a delight!

  • @M00_be-r
    @M00_be-r 6 ปีที่แล้ว +1

    Great video Jim, 43 minutes felt like the blink of an eye. Love this in-depth approach.

  • @catsspat
    @catsspat 6 ปีที่แล้ว +2

    You made me look up info about the first computer I ever had. GoldStar (LG's ancient ancestor of sorts) FC-150 (FC supposedly meant Family Computer). I couldn't even remember the model number and only got to it after some searching. It was apparently a weird clone of Japanese Sord M5, which itself was likely a clone of something else. Kind of like MSX, but not really. I even had this weird printer (plotter?) attachment that printed using special ball-point pen like insert. It would control whether the pen was pushed against the paper or not, and then move the paper up-down or pen left-right to draw. Insane! Of course, I also had a dedicated cassette recorder attachment to save BASIC programs.
    Nostalgia explosion!

    • @adriankelly_edinburgh
      @adriankelly_edinburgh 6 ปีที่แล้ว +2

      Didn't LG actually stand for Lucky Goldstar?

    • @catsspat
      @catsspat 6 ปีที่แล้ว +1

      @@adriankelly_edinburgh
      Yes, they did for a while (Lucky was another big Korean company that dealt with chemicals and stuff). Come to think of it, the merger was between a chemicals company and an electronics company? Weird. I don't know when LG switched to "Life is Good."
      GoldStar sort of made more sense since they competed directly against Samsung (literally ThreeStar). I miss GoldStar's old ad always ending with the statement, "a moment's decision determines 10 years of outcome" or something like that, meaning you choose an appliance for the long haul.

    • @snetmotnosrorb3946
      @snetmotnosrorb3946 6 ปีที่แล้ว +1

      I still have a GoldStar microwave oven. I believe it's almost 30 years old. It's still working.

  • @Falkkos
    @Falkkos 6 ปีที่แล้ว +1

    Adored uploads a video, i'll watch it at 5am in the morning. Time to get some breakfast and get comfortable with a 40+ minute video. I enjoy your videos Jim!

  • @iseverynametaken
    @iseverynametaken 6 ปีที่แล้ว +1

    I love your channel. This episode hit me a little when you talked about your dad. Hope you go into detail in regards to Basic, python and C.

  • @TheythinkimNinja
    @TheythinkimNinja 6 ปีที่แล้ว

    Thank you for making these long videos. I listen to these while at work and they are really entertaining to listen to and very informative. Keep up the good work.

  • @rubenschaer960
    @rubenschaer960 6 ปีที่แล้ว +6

    Jim, the cache results don't quite match the q740, but they do very closely match the i7-870, another popular H55 era CPU, including the peculiar latency spike when transitioning from L3 to system memory. Also, congratulations: Userbenchmark now links directly to this video in their benchmark results, under the "System Memory Latency Ladder" header :D

  • @TheJamieRamone
    @TheJamieRamone 6 ปีที่แล้ว

    Note: the 486 was the first one to have cache INSIDE the processor package. 386 systems had cache in separate chips.

  • @ZZstaff
    @ZZstaff 6 ปีที่แล้ว

    Thank you for taking the time to create, produce and upload this video. Interesting. At the same time, it took me down memory lane to very early hardware. Intel 4004, Z80 that was used by home builders if I remember correctly, including computers I used and/or sold, but not owned by me, 8086 and follow on Intel and AMD [that for a time built CPUs for Intel]. I worked with a friend that kept trying to get me to purchase a computer for myself, a Commodore was his recommendation I believe. I told him, "When PCs could do something useful I would get one". I think it was around 1983 when I made my first purchase for personal use. I believe it was an Atari 130-XE. I purchased an external 5 1/4" floppy drive [it had no internal drive, floppy or hard], SubLOGIC Flight Simulator II because I had been a pilot, a word processor with spell checker and a printer. I also purchased a 13" black and white TV as a monitor. In other words, a computer that was capable of doing something worthwhile, at least in my opinion.

  • @dawienel1142
    @dawienel1142 6 ปีที่แล้ว +2

    Great video as always, now I understand how caches work, well at least somewhat.

  • @blazbohinc5735
    @blazbohinc5735 6 ปีที่แล้ว +2

    I watched this in a documentary state of mind. Good shit Jim. Outstanding. Keep them coming :)

  • @junkerzn7312
    @junkerzn7312 6 ปีที่แล้ว +2

    Yah, but you could count cycles! Ah yes, I remember those days. I actually built a 2400 baud modem with an ADC, a DAC, trig tables [256], and very, very carefully cycle-counted code. I could produce perfect waveforms and I could perfectly decode the receive waveform. The sucker actually worked!
    -Matt

  • @TheBIOSStar
    @TheBIOSStar 6 ปีที่แล้ว +17

    Old CPU: *takes value from on-board memory*.
    Modern CPU: "That wasn't very cache memory of you"

    • @phenomanII
      @phenomanII 6 ปีที่แล้ว +4

      I knew I wouldn't regret scrolling this far down the comment section.

  • @ADR69
    @ADR69 6 ปีที่แล้ว +11

    Ah, the history of cache at 0430 in the morning. Yes please

  • @MKeehlify
    @MKeehlify 6 ปีที่แล้ว

    My grandfather was a coal miner, my dad a software developer, me I'm a crypto miner 8-) ... j/k I'm a software developer too! I wrote my first useful program in 2005. Most programmers from my and later generations have completely different ideas of high and low level programming. It's awesome to get glimpses of the past. The simple and personal introduction was a great way to lead us into the main content. Thanks for the video Jim!

  • @joker927
    @joker927 6 ปีที่แล้ว +1

    I'm glad I didn't skip b/c of the easy stuff; the goods came in the second half. Excellent video.

  • @SleepyRulu
    @SleepyRulu 6 ปีที่แล้ว +2

    Today is my birthday, and here's more AdoredTV feeding us knowledge, making us better-informed consumers.

  • @andyp123456
    @andyp123456 6 ปีที่แล้ว

    Thanks for the lesson (and little bit of family history), Jim. Love all the analysis, but also looking forward to seeing what you come up with next if it's less analytical than the last few videos.

  • @SpinStar1956
    @SpinStar1956 6 ปีที่แล้ว +1

    Well thank you for all the hard work. Your analysis is fantastic guided by a "zen-like" intuition, which I'm sure is backed by a lot of theoretical knowledge. Albeit tempting you throwing a pie in my face, (and given your self-proclaimed love of programming) it would be neat to see a real programming example of taking advantage of the AMD specific architecture to show how software can be optimized/or-not. Anyway, huge hats-off to you and your community!

  • @vladdragos5881
    @vladdragos5881 6 ปีที่แล้ว +7

    Got my coffee, I'm ready for 46 min of pure IT stuff :))

  • @pascalleblanc9017
    @pascalleblanc9017 6 ปีที่แล้ว +1

    Another great video Jim! It answered many questions, some of which I didn’t know I had, but somehow it left me with even more…
    1. Which kinds of workload benefit most from large caches?
    2. Was the rumor of cache on the I/O die confirmed or disproved? If it is still possible, do we expect an L4 cache pushing the system memory latency further away, or the L3 cache to be moved to the I/O die, freeing up costly die silicon? I don't see any signs of a large L4 cache in the engineering sample benchmarks.
    3. If cache can be added to the I/O die, could we ever expect CPUs differentiated specifically by the inclusion of a large L4 cache (...Crystalwell)?
    4. Do various cache levels scale differently in terms of manufacturing process?
    5. The benchmarks show latency varying from 1 ns to around 60 ns, going from L1 cache hits to system memory trips. I am wondering whether "memory calls" represent a significant power draw. Do the various "memory call scenarios" vary significantly in power draw?
    I have more, but I will sum it up by saying I would not mind a part 2! :)

  • @MrGunnarPower
    @MrGunnarPower 6 years ago

    Nice one, I spend all week just waiting for your next video. I get excited when I see the notification pop up that you uploaded again.

  • @alexparkish
    @alexparkish 6 years ago

    Great video... phenomenal even... This is what I will show my 3 girls when they are older to explain how it all works. Hundreds of hours of editing and research, but my god, you put it together and explain it so well!

  • @mjaminian
    @mjaminian 6 years ago +2

    Now I know why I like you. You were a ZX Spectrum kid like me!
    There are invisible affinities in this world, a fascinating phenomenon...

  • @quittessa1409
    @quittessa1409 6 years ago +1

    Love hearing all the latest in my native accent - Gonna enjoy brekky with this. Cheers Jim :)

  • @btw8798
    @btw8798 6 years ago

    This video was very informative. Thank you for taking your time to explain all of this important information in a simple manner.

  • @masjter
    @masjter 5 years ago

    So 38mins into the video I realized how long I've been sitting and watching it. This is so interesting and insightful, job done really well.

  • @NicoNice24
    @NicoNice24 6 years ago

    Another great video! It's seriously phenomenal how much I learnt by watching your content.

  • @PanduPoluan
    @PanduPoluan 6 years ago

    VERY educational, good sir!
    I kept forgetting to watch this vid, but I'm totally not regretting spending 43 minutes of my weekend on this.
    Keep up the good work!

  • @elvintp10
    @elvintp10 6 years ago

    Your detailed insights into microprocessors are astounding. This is a quality, educational channel.