Since dependencies between timings seem well defined, it would be cool to have a tool that identifies the limiting timings in a memory controller configuration. That way it would be easier to know why lowering a certain timing would not affect performance at all. Maybe there could be some autofill as well. However, considering that there are so many timings and some of them are manufacturer-dependent, it could get pretty nasty quickly. Seems doable though. It's a shame that knowing a timing config is okay requires a stupidly long stress test. And the bigger capacities get, the worse it is. Cool video once again.
Hmm, interesting idea. Basically the software would check best- and worst-case scenarios for certain timing values. I'm not sure, but does the L3/L4 cache try to spread allocations so that it can utilize all banks and bank groups when it fetches data into a cache line or writes it back?
We need a program that tests every timing one by one and reports the lowest value before the memory fails; that could at least give us an upper limit to aim for.
@@adama7752 Meh, just attach it to an FPGA. You could brute-force the whole thing, with no need for system reboots or anything like that. Just intelligent sweeps, like a binary search for each timing. You could end up with a “this is the best you’ll get” set of numbers. Would a reasonable number of people pay a small fee to have their RAM characterized?
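The per-timing binary search suggested here can be sketched in a few lines. `is_stable` is a hypothetical stand-in for whatever stress test or FPGA harness decides pass/fail. The sketch assumes stability is monotonic (a value that passes also passes at every looser value) and sweeps one timing while the others stay fixed, which may not hold given how interdependent the timings are.

```python
from typing import Callable


def min_stable_timing(lo: int, hi: int, is_stable: Callable[[int], bool]) -> int:
    """Binary-search the smallest value in [lo, hi] that passes is_stable().

    Assumes stability is monotonic (if a value passes, every larger/looser
    value passes too) and that is_stable(hi) is True.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if is_stable(mid):
            hi = mid  # mid passes: the answer is mid or tighter
        else:
            lo = mid + 1  # mid fails: the answer must be looser than mid
    return lo


def fake_test(value: int) -> bool:
    """Hypothetical stand-in for a real stress test: pretend the kit needs >= 17."""
    return value >= 17


print(min_stable_timing(10, 30, fake_test))  # 17
```

Each probe stands for one full stress-test run, so a 10 to 30 range needs about 5 runs instead of up to 21.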
This video series is fascinating, thanks for putting it together. I never realised there were specific timings for CPU vendors, I just assumed they differed in their optimisations or integration with the CPU. I'm curious as to what would break on Ryzen if the AMD engineers didn't include the tRC timing.
I think the reason for all the specific timings is the randomness of access. It's so that other devices/cores can read/write before/after another device/core has finished. This video's example is based on reads and writes to the same bank, but the controller should avoid that as much as possible. Most often, a bank will be accessed by a different instruction that is unaware of the previous or next operation, so it needs the different timing options just in case. They could, however, calculate these timings from the other timings and abstract the details away from the user, but without knowing how RAM will improve in the future, hardcoding timings relative to one another would restrict capability. It's better to let each one be set individually and simply build XMP profile capabilities, customized by each manufacturer, for future enhancement. It's confusing, but so is a manual transmission compared to an automatic, and there's a reason we race manuals: you get more control and more stable performance from a cheaper system that way, and you can adapt to conditions with a bit more expertise.
I find this very interesting and informative, slight tip or suggestion though: Could you "freeze" the top/column label row in the spreadsheets? Helps to keep track of what column means what as you're scrolling.
Well explained. I wish you had started with the order of commands instead of leaving it for the end of the video. Lool, but that was a great save in the end. Btw, after those two parts I fixed my timing issue and got better results. Thank you 😊
Yes, totally not an idiot here rewatching the video for a 3rd time trying to keep up, yupyup! In all seriousness, great video! This stuff gets complicated FAST!
I really appreciate you making these videos! I love learning this stuff. I do have a question, so feel free to flame me. As you said in the video tRAS and tRC don't really make sense. At least in this scenario the intervals between each command are already well defined by other timings. Does it ever make sense to have a tRAS that's greater than tRCD+tRTP or is it just a completely redundant timing? I think the same could be said about tRC as well since it's just tRCD+tRTP+tRP. Presumably you could just set both of them to the minimum value and ignore them completely
The whole point of tRAS is to make sure that memory chips that can't handle short ACT to PRE delays don't end up getting precharged after tRCD+tRTP. Though I've never actually encountered a memory chip where ACT to PRE timing is a problem. It's usually just ACT to ACT that's a problem.
When they planned the standard they weren’t sure if they’ll need these redundant timings. In the end they didn’t, but you can’t add them after the fact. Better have them.
Very nicely done, thanks for enduring the torment of trying to make us less stupid. I really do enjoy learning as much as possible, and teaching what I know to others... this reminds me of trying to explain ethernet packet captures and protocols at a previous job. Cheers.
I think you've explained it about as well as it could be explained, and it's a mess indeed, but it's an insightful mess. Keep it up, you're doing a great job with it.
Not only are the acronyms not informative, they're also mostly remixes of the same 4 or 5 letters, making them really easy to confuse. Colour coding the rows would probably make the spreadsheet easier to look at. And I think the way you reordered the timings at the end helps clarify things.
Thanks Buildzoid! @ 20:41 I think what you describe as inserting custom rows exists. Select a block of cells and right-click on them. You will have an option "Insert cells", followed by a choice to shift the selected cells and everything after them to the right or down. In your case, I think you wanted to select something like B16 to G17, choose "Insert Cells", then "Insert Cells and shift down", and that would add 2 rows (since you selected 2 rows' worth of cells). Regarding the existence of tRC... I think it makes sense. When you have one activate with a single read, then precharge, then another activate, then yeah, it's useless. But if you have activate, read, then another read (with whatever other timings are required, not sure if other commands too), then precharge and activate, then tRAS and tRP can sum to less than tRC. This would allow squeezing out a few extra clock cycles when there are multiple read commands per activation. So for non-Samsung dies, you could have the sweet sweet tRAS of 21 and tRP of 14 with the required tRC of 66, and achieve a 31-clock-cycle reduction vs. having tRAS and tRP that have to sum to 66. Unless the physical construction doesn't actually allow something like that (dunno, maybe a high tRC requirement means equally high tRAS and tRP, I don't really know). To give some hopefully helpful feedback, this video was a bit harder to follow, a bit all over the place at times. But if you pay proper attention, you can follow along. There's absolutely no need to reshoot it. Whoever is struggling can simply rewatch the relevant part(s) of the video.
This whole series needs more views. RAM timing latency affects emulation much more than regular games, it would be amazing if you teamed up with Level1techs and DigitalFoundry to discuss that! My timings are 14-13-13-12-24-30 @ 3800mhz, ToTK finally runs at a nearly locked 60fps.
Higher clock (3200MHz) requires longer timings (tRAS), so minimum operation time (ACT to PRE, ns) is retained. Operation time can be treated as constant for manufacturer's batch. So, having higher clock gives us much more granularity, more perfectly timed occasions when ordering the operations? CL is a constant, independent delay - so going low with CL is like reducing ping, and all other timings influence throughput?
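The granularity idea can be checked with a little arithmetic. This sketch treats "3200MHz" as the usual DDR data rate in MT/s (two transfers per clock, so one clock lasts 2000 / data rate ns) and assumes a hypothetical 32 ns minimum ACT-to-PRE time; it is not a figure from any real datasheet.

```python
import math
from fractions import Fraction


def clock_period_ns(data_rate_mts: int) -> Fraction:
    """DDR moves data twice per clock, so the clock runs at data_rate / 2 MHz
    and one clock lasts 1000 / (data_rate / 2) = 2000 / data_rate ns."""
    return Fraction(2000, data_rate_mts)


def cycles_for(min_ns: Fraction, data_rate_mts: int) -> int:
    """Smallest whole number of clocks that is at least min_ns long."""
    return math.ceil(min_ns / clock_period_ns(data_rate_mts))


# Hypothetical chip limit: ACT to PRE must take at least 32 ns.
MIN_ACT_TO_PRE_NS = Fraction(32)
for rate in (3200, 3600):
    cycles = cycles_for(MIN_ACT_TO_PRE_NS, rate)
    actual = cycles * clock_period_ns(rate)
    print(f"DDR4-{rate}: {cycles} cycles = {float(actual):.2f} ns")
```

This gives 52 cycles (32.50 ns) at 3200 and 58 cycles (32.22 ns) at 3600: more cycles at the faster clock, but landing closer to the same nanosecond floor.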
I'm enjoying this so much! Also, very comparable to the timings of your brain, and why a concise technical analysis turns into a GIGAnerd rant every video, as your brain is moving way too fast for your tongue to keep up XD
It makes sense. It made so much sense that now I'm even more at ease setting my kit to XMP and just leaving it as is. I thought I wanted to tinker with this kit, but once I got it I realized it's not B-die, left XMP on and never looked back. Ignorance is bliss. But I did love the video series; I learned something I never knew before, so thanks for that.
I just want to say that like... finally I'm getting videos on how to understand RAM timings and how they work from one of my favorite dudes to watch. Even if it's pain 😕😂
Hmmm very interesting. Could it be possible to set tRC to a very low number, then lower the tRAS to find out the lowest possible timing for it? Then raise tRC to match it?
The memory controller knows when you set an impossible value for a timing and uses a valid arbitrary value that you don't know. The best thing you can do is keep tRC at a valid, stable value that is greater than or equal to tRAS+tRP, while decreasing tRAS and increasing tRP by 1, and vice versa.
Great video. I have a question though. I have a tRAS of 31 and a tRP of 21 on Samsung C-die RAM with a Ryzen 3600 CPU. However, if I set tRC to any number below 64, it won't boot. I can't do 51 or 52 or 53 or even 60. It has to be 64. Why is that?
I might only understand like 5% of what he's saying, but I still love it. If I learn 1 thing every vid I'm still learning, and I appreciate the effort he's putting into these vids. Ty so much for explaining and giving us all the info you, sir, learned over many years.
Really digging this series of videos. I've recently been introduced to the amdmemtweak software and the GPUOpen suite of tools. This is my kind of tinkering. Hoping my understanding is correct, but it would appear from my limited knowledge that tRC is a failsafe in case of an ACT failure. A last-resort second chance of getting things rolling.
"Some of you ain't gonna get this, because you all stupid" Hahahha I love this man! After about 2 weeks of reading and watching (I'm stupid btw lol) and rewatching and rewatching I'm starting to finally get it. My conclusion so far, is that this vastly complicated world of memory timings and instruction sets seems horribly inefficient and the only reason why it is the way it is, is because making smart memory chips is too expensive for retail
Does any of this change with DDR5? Obviously until Zen4 and X670 is detailed, we can't know what "optimisations" are to come for Ryzen, also XMP/EXPO support.
Timings are generally higher, Gear 2 is there and also 1 read operation (same for write) takes 8 clock cycles (not 4) of transmitting data (1 DIMM is divided into 2x 32bit subchannels to compensate - keeps one operation equal to 512 bit = 64 byte = 1 cache line)
Thank you for doing this, even though I can see why this won't be as popular as other videos. EEVblog touched on why he continues to make tutorials despite them not being popular. I would love to make a calculator where you can enter the delay between commands and it calculates the timings, or vice versa. I'll join Patreon for a month or buy a hoodie to try to reimburse the lost view count and make up for your annoyance :P
2:30, it feels off to me. At clock 32: "tRAS from ACT because tRCD+tCL+tRP < tRAS". From what I get from your explanation, tCL should not be there. Sorry for asking late.
RAM is really stupid indeed, totally as you said in the first video; now it all becomes so blatantly apparent. Like, it can read a different bank no problem, but if you read the same bank again, suddenly there's a problem and you need to waste a bunch of time until it can finally do that read 🤣 My rev.E at 3600Mbps is using a tRC of 58, which is a ton. Maybe I could tighten it down, but for daily use it would be more stability testing for really negligible gains. Anyhow, great video, only like 20+ timings and other parameters left to go over :D
So basically on AMD tRC=tRAS+tRP is aspirational (unlike Intel where it's actual) and you might have to use tRC=tRAS+tRP+tZENMCTAX...? That clears up a lot of confusion on my part, thank you for illustrating this so clearly. Your estimate of my intelligence is spot on btw! :)
A few days ago I was learning about ferrite core memory and thought it was a bit silly that destructive read was just accepted. I am not happy with what I have learned in this video
Ok, but how does the PC know where the data it wants is? Or is this the thing for registers? And why doesn't an HDD work the same way? Is it again because of the registers? Edit: at 10:03, "for the next 31 cycles": there's a mistake. If it was 32 cycles, you would issue the command in cycle 33, not 32. Edit no. 2: my timings are 18-19-19-39, how can that be? I mean 19+18, so it should place data on the data bus on cycles 37, 38, 39 and 40, so it should be 41, no? So why 39? And can you issue a precharge before/during the read? Is that even safe, and wouldn't it harm stability? Is it dependent on chip type/manufacturer?
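A sketch of the same-bank read sequence for the 18-19-19-39 timings in this comment, read in the usual CL-tRCD-tRP-tRAS order and counting cycles from 0 at the ACT. tRTP is not given, so 12 is a hypothetical placeholder, and the 4-clock burst assumes DDR4 (as in the comment's cycles 37 to 40). Following the definition of tRTP given elsewhere in this thread (the minimum delay between a READ and a PRE), the PRE here waits for tRAS and tRTP only, not for the data burst; tRC is ignored.

```python
def read_timeline(tCL: int, tRCD: int, tRP: int, tRAS: int, tRTP: int,
                  burst_clocks: int = 4) -> dict[str, int]:
    """Cycle numbers for ACT -> READ -> PRE -> next ACT on one bank.

    Cycle 0 is the ACT. burst_clocks=4 assumes DDR4 (8 transfers = 4 clocks).
    tRC is not modeled.
    """
    act = 0
    read = act + tRCD                   # READ must wait tRCD after ACT
    data_first = read + tCL             # first data beat arrives tCL after READ
    data_last = data_first + burst_clocks - 1
    pre = max(act + tRAS, read + tRTP)  # PRE waits for tRAS and tRTP, not for the data
    next_act = pre + tRP                # same bank may ACT again tRP after PRE
    return {"ACT": act, "READ": read, "data_first": data_first,
            "data_last": data_last, "PRE": pre, "next_ACT": next_act}


# 18-19-19-39 read as CL-tRCD-tRP-tRAS; tRTP = 12 is a made-up placeholder.
for name, cycle in read_timeline(18, 19, 19, 39, tRTP=12).items():
    print(f"{name:>10}: cycle {cycle}")
```

With these numbers the PRE lands on cycle 39 while data is still on the bus (cycles 37 to 40), because tRAS is the later of the two limits; only a tRTP above 20 would push the PRE later.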
25:09 But you said that tRC is specifically for the same bank; that's the reason why it exists. You also said that there is memory which is specifically sensitive to that, and that the other timings you mentioned are for all commands, not just for the same bank.
I have a question for understanding: in the case where we have to read e.g. 8 bits again and again from the same address in memory, does DDR4-3200 CL16 perform this faster than DDR5-5600 CL32? So is it right to say that DDR5 is better at handling bigger ranges of read and written data due to its higher MHz, while for smaller actions DDR4 performs better due to its smaller timings? Command rate etc. should be the same on both.
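For a read that hits an already-open row, the two kits in this question can be compared by converting CL to nanoseconds (one clock = 2000 / data rate ns) and adding the time to stream one 64-byte burst (BL8 on DDR4, BL16 on a DDR5 subchannel). This is a rough sketch: it ignores the memory controller, Gear mode, command rate and CPU caches.

```python
from fractions import Fraction


def cas_latency_ns(data_rate_mts: int, cl: int) -> Fraction:
    """CL is counted in clocks; one clock is 2000 / data_rate ns."""
    return Fraction(2000 * cl, data_rate_mts)


def burst_ns(data_rate_mts: int, burst_length: int) -> Fraction:
    """Time to stream burst_length transfers at data_rate MT/s."""
    return Fraction(1000 * burst_length, data_rate_mts)


# (name, MT/s, CL, burst length): BL8 for DDR4, BL16 for DDR5.
kits = [("DDR4-3200 CL16", 3200, 16, 8), ("DDR5-5600 CL32", 5600, 32, 16)]
for name, rate, cl, bl in kits:
    cas = cas_latency_ns(rate, cl)
    total = cas + burst_ns(rate, bl)
    print(f"{name}: CAS {float(cas):.2f} ns, "
          f"64-byte burst done after {float(total):.2f} ns")
```

This gives 10.00 ns / 12.50 ns for the DDR4 kit and about 11.43 ns / 14.29 ns for the DDR5 kit, so for this single small read the DDR4 kit does come out ahead.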
Super interesting how these are interdependent! You can easily have a situation where a low value that doesn't do anything suddenly causes issues when you lower another timing, making the too-low timing suddenly the limiting factor and causing data corruption, even though it was some other timing being modified and that timing in itself works totally ok. Good luck figuring out optimum timings without knowing how that works :D Yah, RAM tuning sucks :D
Keeping things simple in most engineering scenarios is usually the best way to build. Unfortunately for Ram Developers, KISS seems to be a foreign descriptor. I am in tune with your sentiments for DDR operations. Thanks again for your efforts to give us all a thorough understanding of Ram Operations.
How common are 2 reads from different rows in the same bank? The latency for this seems quite high. Is it less common with dual rank dual channel compared to single rank single channel, because there are more banks to store data and the IMC is smart enough to avoid this if it can? Or is it completely random where data is stored in memory? Or does where data is stored depend on how the program works?
The IMC does some distributing; that's why dual rank is faster than single rank, though I don't know how it does that. The program gets memory in 4096-byte chunks called pages and can do whatever it wants with them.
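Since the actual mapping isn't described in this thread (real controllers use their own, generally more complex address mappings), here is a deliberately simplified, hypothetical one that just rotates consecutive 64-byte cache lines through the banks. The bank count and row size are made-up values. It only shows why one 4096-byte page can touch every bank, and what a same-bank, different-row conflict looks like.

```python
LINE_BYTES = 64        # one cache line per memory access
NUM_BANKS = 16         # hypothetical bank count
ROW_BYTES = 8192       # hypothetical bytes per row in one bank
LINES_PER_ROW = ROW_BYTES // LINE_BYTES
PAGE_BYTES = 4096      # OS page size mentioned in the comment


def toy_map(addr: int) -> tuple[int, int]:
    """Hypothetical mapping: consecutive cache lines rotate through the banks."""
    line = addr // LINE_BYTES
    bank = line % NUM_BANKS
    row = (line // NUM_BANKS) // LINES_PER_ROW
    return bank, row


def is_row_conflict(a: int, b: int) -> bool:
    """Same bank but different row: the slow case that needs PRE + ACT."""
    (bank_a, row_a), (bank_b, row_b) = toy_map(a), toy_map(b)
    return bank_a == bank_b and row_a != row_b


page_banks = {toy_map(a)[0] for a in range(0, PAGE_BYTES, LINE_BYTES)}
print(len(page_banks))                            # 16
print(is_row_conflict(0, 64))                     # False: different banks
print(is_row_conflict(0, NUM_BANKS * ROW_BYTES))  # True: bank 0, row 1
```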
thx BZ for another informative video. I have a normie question: what's special about DDR5 timings that causes longer latency? Is it more setups, longer commands, or something else? thx again.
Really good information. Does the L3 cache also try to utilize all banks/bank groups when it does its line reads/writes to DDR? I think it tries to read/write a whole cache line.
1 single operation - no, it uses 1 bank group and 1 bank. 1 cache line is exactly 64 bytes on x86-64: the memory controller executes the same operation across 8 x8 memory chips (4 x16 or 16 x4 memory chips are also possible) to get the following: 2 (double data rate) × 64 (total width of memory chips) × 4 (clock cycles to get the data) = 512 bits = 64 bytes = 1 single cache line
DDR5 - the result is the same, but it is computed a bit differently: 2 (double data rate) × 32 (width of 1 subchannel of memory chips) × 8 (clock cycles to transfer data) = 512 bits = 64 bytes = 1 cache line. DDR2, DDR3 - I don't know exactly. DDR3 - I suppose it is the same as DDR4 (prefetch is also 8n in DDR3)
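The cache-line arithmetic in these two replies fits in a small helper: two transfers per clock, times the bus width, times the number of clocks. In both cases the product is 512 bits, i.e. 64 bytes.

```python
def bytes_per_burst(bus_width_bits: int, clocks: int) -> int:
    """Double data rate: two transfers per clock, each bus_width_bits wide."""
    transfers = 2 * clocks
    bits = transfers * bus_width_bits
    return bits // 8


# 8 x8, 16 x4 or 4 x16 chips all make up the same 64-bit DDR4 bus.
ddr4 = bytes_per_burst(bus_width_bits=64, clocks=4)  # one 64-bit channel, BL8
ddr5 = bytes_per_burst(bus_width_bits=32, clocks=8)  # one 32-bit subchannel, BL16
print(ddr4, ddr5)  # 64 64
```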
You're getting actually good at explaining stuff concisely. Also, i have the feeling a flowchart would be very suitable for this exact topic. But you're probably already done recording this stuff. Meh.
The explanation is the best thing out there on the internet. It's a shame you seem so frustrated; I don't feel like the explanation needs to be so long-winded or come with so many examples. It's just a "maximum". Maximum is a very simple concept. 19:15 I don't understand the frustration. If it's 4pm, and you have a turkey that has 2 hrs left, gravy that has 15 min left, and a casserole that has 1 hr left, dinner is served at 6pm! From my knowledge of Thanksgiving cooking, actually delaying the cooking of the gravy and casserole to time it well is 1000x harder than "figuring out" that dinner is served at 6pm. All our great mothers put so much effort into Thanksgiving, but my mother doesn't know anything about math or computer science and she'll tell you dinner is at 6pm. It's such a simple concept, it's not so frustrating; you don't have to move the whole spreadsheet up and down so many times to show the bazillion different combinations of timings. We get it, we're just waiting for a turkey to bake 😂😂. If I start the gravy at 5 before 6, then I'll delay dinner by 10 min, very simple! You don't have to actually show the entire hypothetical with the gravy to explain the concept; if you tried that on my mother she'd find it a great insult, as it's just so simple. Anyway, great explanation, but it's a lot of hullabaloo for something that someone who's never done math in their life should find very simple. It's just a maximum; you don't need to be so flustered and move the spreadsheets up and down for 30 min. Just 5 min of explaining what each number limits, with ONE example, is more than fine. Gosh, I just feel 2nd-hand frustrated as well with all of the moving and copy-pasting and adjusting the spreadsheet and... well, anyway. I don't remember what any of the timings are because of the way things were moved around so often.
I'll look again to just hear the clear definition of what exact constraint each timing is demanding, that's all that matters, the particular timing of a sequence of commands is just limited by the maximum constraint.
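The "it's just a maximum" view can be written down directly: each timing is a minimum distance from some earlier command, and the next command goes out at the latest of those ready times, like dinner waiting for the slowest dish. This sketch also reports which constraint set that time, i.e. the limiting timing. The example uses the tRCD 14 / tRTP 8 / tRAS 21 values from the summary comment in this thread and ignores every other timing.

```python
from typing import NamedTuple


class Constraint(NamedTuple):
    name: str   # which timing this is
    after: int  # cycle of the earlier command it is measured from
    delay: int  # minimum distance in cycles


def earliest_issue(constraints: list[Constraint]) -> tuple[int, list[str]]:
    """Earliest cycle that satisfies every constraint, plus the limiting one(s)."""
    ready = [c.after + c.delay for c in constraints]
    cycle = max(ready)  # like dinner: served when the slowest dish is done
    limiting = [c.name for c, r in zip(constraints, ready) if r == cycle]
    return cycle, limiting


# PRE after an ACT at cycle 0 and a READ at cycle 14 (tRCD = 14).
print(earliest_issue([Constraint("tRAS", 0, 21), Constraint("tRTP", 14, 8)]))
print(earliest_issue([Constraint("tRAS", 0, 24), Constraint("tRTP", 14, 8)]))
```

This prints `(22, ['tRTP'])` and then `(24, ['tRAS'])`. Lowering a timing that is not in the limiting list would not move the command at all.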
So if tRAS is lower than tRCD and tRTP, that means the latter are limiting perf. And the opposite (tRAS higher) means tRAS is the limiter, and tRTP only happens on tRAS (?) Does that mean tRTP should divide evenly into tRAS? For example, I have 10-13-13-36, and my motherboard set tRTP to 9 (and tWR is auto with no number shown, which I've heard is supposed to be tRTP x2?? I'm on DDR3). Does that mean tRTP 9 fits nicely into tRAS 36 and lowering it would be bad? And my tRCD and tRTP are already far lower, meaning tRAS is what's limiting performance. It's a bit hard to wrap my head around. I can't see any difference in AIDA64 and Cyberpunk benchmarks between tRTP 8, tWR 16, tRAS 32 vs tRTP 8, tWR auto, tRAS 36.
BuildZoid is probably the only person on Earth i'd ever expect to hold grudges against specific RAM timings.
I used to buy RAM by the size and price until I bumped into his channel. Buildzoid may be a bit of a freak but the World needs him :)
I'm not 100% sure but I remember reading something about how some of the timings' names like CAS and RAS, which I believe stand for "column/row access strobe" respectively, are relics from much older types of memory and the name was just retained despite the actual purpose and behavior of the timing changing significantly.
Magnetic core memory, introduced in the early 1950s, uses most of the same primary timing acronyms. It's possible they predate even that.
@@christopherjackson2157 they probably date back to the mercury delay line technology.
@Sumiko Mei - RAS and CAS (Row Address Strobe and Column Address Strobe) were used on the first DRAM chips that used a multiplexed address bus - the MK4096 and MK4027 (4 k bits each chip) - and simply latched the row and column addresses into the chip on the same six address lines. For what it's worth, I worked with these devices around 1978. They remained unaltered for 16k, 64k, 256k, 1M and 4M DRAM chips; the number of address lines on the chip increased by one for each generation. All of those DRAM chips were asynchronous and did not normally have timings linked to a clock. Then the first SDRAM (Synchronous DRAM) chips came along at 4M and 16M (I think), and so clock frequency for RAM became a thing and it all started to get more complicated. DDR2, DDR3, DDR4, DDR5 etc. have just added further layers of complexity since then, but traces of the original architecture and signal names remain.
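The multiplexed addressing described here can be sketched as arithmetic: a chip with 2^n one-bit cells needs n address bits, and sending the row half and then the column half over the same pins needs n / 2 lines. That gives six lines for 4K and one extra line per 4x generation. The sketch assumes a x1 part, and putting the high bits in the row half is an arbitrary choice for illustration.

```python
def address_pins(bits_in_chip: int) -> int:
    """Pins needed when the row and column halves share the same lines.

    Assumes a x1 part whose capacity is a power of two (one address per bit).
    """
    address_bits = (bits_in_chip - 1).bit_length()
    return (address_bits + 1) // 2  # round up: row and column take turns on the pins


def split_address(addr: int, pins: int) -> tuple[int, int]:
    """Row half goes out first (latched by RAS), then the column half (CAS).

    Using the high bits as the row is an assumption for illustration.
    """
    mask = (1 << pins) - 1
    return (addr >> pins) & mask, addr & mask


for name, size in [("4K", 2**12), ("16K", 2**14), ("64K", 2**16),
                   ("256K", 2**18), ("1M", 2**20), ("4M", 2**22)]:
    print(name, address_pins(size))  # prints 6, 7, 8, 9, 10, 11 pins in turn
print(split_address(2709, 6))         # (42, 21)
```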
@Christopher Jackson - Magnetic core memory requires that a single row and a single column are given a current pulse at exactly the same time in order to access the required bit on a plane of cores. There may have been row and column addresses at a particular level, but they would be heavily embedded within the control logic. Also, of course, with core memory there is no requirement for memory refresh cycles. While core memory had control circuits with tricky timing requirements, they were vastly different from later DRAM architectures and varied massively from one manufacturer to another.
@@GodmanchesterGoblin Great to hear from someone who has first-hand knowledge!
I must have confused the addressing for timings based on the familiar acronym. I forget what book I read that in. My bad.
@@christopherjackson2157 No worries - I worked with core memory hardware in the mid-1970s (at the component level) and also worked with most memory chip types from 1 k bit upwards right into the mid 90s before I changed my career direction a bit. They were interesting times!
It made perfect sense. You're not accountable for the complexity of DRAM. If it is complex then the explanation will... be complex! I'm sorry you hate doing these because I like the series a lot :)
The algorithm: More exasperated Buildzoid -> enjoyable vid
Ditto lol I've never met anyone who could explain this so precisely
@@paxdriver It mostly is just a matter of reading documentation and datasheets. You could do this on your own, or you could listen to Professor Buildzoid's lectures.
One of the best overclockers in the world, unfortunately most of them won't share their knowledge.
Agreed. This series is awesome!
"Some of y'all ain't gonna get this" You're god damned right lmao. Thanks for the video
tRC exists to slow ACT to ACT on the same bank without slowing ACT to ACT on other banks, where the vast majority of memory operations happen. If you don't have the ability to raise tRC, you must instead raise tRAS, which slows all of that other stuff. This creates 5%+ Linpack performance losses on CPUs which do not have tRC when using most DDR4 ICs.
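The trade-off described here can be sketched for a single read per activation. The hypothetical requirement is a same-bank ACT-to-ACT of at least 66 cycles (the tRAS 21 / tRP 14 / tRC 66 example from another comment in this thread), with tRCD 14 and tRTP 8 taken from the summary comment. With tRC available, the 66 is enforced only on the same bank; without it, tRAS has to grow until ACT-to-PRE plus tRP reaches 66, which lengthens ACT-to-PRE on every bank. Other-bank timings are not modeled.

```python
def act_to_pre(tRAS: int, tRCD: int, tRTP: int) -> int:
    """PRE needs tRAS after ACT and tRTP after the READ issued at tRCD."""
    return max(tRAS, tRCD + tRTP)


def same_bank_act_to_act(tRAS: int, tRCD: int, tRTP: int, tRP: int,
                         tRC: int = 0) -> int:
    """Next ACT to the same bank: tRP after PRE, and no sooner than tRC
    after the previous ACT (tRC=0 models a controller without tRC)."""
    return max(tRC, act_to_pre(tRAS, tRCD, tRTP) + tRP)


# Hypothetical requirement: same-bank ACT to ACT must be at least 66 cycles.
with_trc = (21, 14, 8, 14)     # tRAS, tRCD, tRTP, tRP; tRC covers the 66
without_trc = (52, 14, 8, 14)  # tRAS raised so ACT->PRE + tRP reaches 66
print(act_to_pre(*with_trc[:3]), same_bank_act_to_act(*with_trc, tRC=66))
print(act_to_pre(*without_trc[:3]), same_bank_act_to_act(*without_trc))
```

Both configurations give the same 66-cycle same-bank ACT-to-ACT, but ACT-to-PRE is 22 cycles with tRC versus 52 without it.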
Buildzoid being frustrated at this entire topic made it so much more engaging. I've never laughed at such a complex topic whilst learning so much at the same time!
Read->Pre is dependent on the rise times of the internal sense amplifiers and the latency to latch into the buffer.
There will be a finite minimum time, for a given voltage and temperature, before the data is amplified; before then, the signals can still be in flux.
Once they are ready, however, activating the circuitry that charges the memory capacitors back up with the data should be fine.
So in summary:
CR=1
Command rate, signals memory what to do by holding the command 1 or 2 ticks.
tRCD=14
Time between ACTivation of memory and a read/write
tCL=14
after tRCD it takes some time to read the data off the memory
tRTP=8
PREcharge operation, for which specifically: during the read operation, memory must be written back into memory.
Can be shorter than tCL.
tRAS=21
time between ACT and next PRE commands.
Competes with tRCD+tRTP; the longer one overrules the other.
tRP=14
buffer time to let the PRE finish
tRC = tRAS+tRP = 35
for Ryzen chips mostly, on same banks
lengthens ACT to ACT time in this scenario.
personal note:
This feels like college xD I am taking notes!
From tRAS so far, it begs the question: if I were to OC this memory, I would tighten up tRCD, followed by tRTP, and test if tRAS = tRCD+tRTP. If that fails, I have to lengthen tRAS. But if that works, is there a point to pushing tRAS any shorter? Maybe in other scenarios in future videos?
your tRTP is wrong. tRTP is just the minimum delay between a READ command and PREcharge
@@ActuallyHardcoreOverclocking Thank you! Looking forward to the next parts in the series - I can't stress enough how much I appreciate you making this, despite the frustrating nature!
These have been the best vids on how to understand how RAM works, absolutely amazing. Thank you! I've tried reading about it and it never made sense; now it does.
Me: Click on "How RAM Timings work" ... watching 60 sec ... thinking "let's save it for later, I need a fresh mind for this" ... Two weeks later .... holy 💩 - there are 2 more ... need more energy drinks.
Thank you very much for the explanation!! Learned a lot!
Lol, he sounds soooo pissed. 😂
I, for one, DEFINITELY appreciate this video. I'm actually beginning to understand these concepts.
I learned about timings more than ever by watching this video. I like the way you tell it. Plus side is that it was entertaining. Keep posting more like this !
I learned a lot of this stuff before I became an engineer, but since DDR2, I’ve been lost since it wasn’t part of my work. This is great! Thanks for making these parts. The commentary is pretty spot on too. :)
Now on to this tricky DDR5 situation….
Hi BZ, thanks for yet another great vid, I think you're explaining this in a very intuitive way, easy to follow and understand. I'm finding this series very interesting, entertaining and educational. I got a kick out of how you put the timings in logical order and how you basically told tRC where to go. Very much looking forward to more parts to this series.
@buildzoid Another really informative video. Great explanation given the complexity. Personally, I prefer having the "idle" results in there so you know exactly what it's doing at any point in time (for us newbs).
Appreciate you doing the video. I know you don't enjoy it, but I enjoy learning this stuff.
Thanks BZ. I’ll probably be referring to both of these videos for some time to come. 👍🏼
It makes sense. I wish there was more of this info around during the DDR2 to DDR3 days. I'm still optimizing my old X58 build for the fun of it.
I think that you can download some documentation for free from the JEDEC site.
There have only been 2 platforms I've run where I really struggled to convince myself to part with them: AMD 939 and Intel 1366. However, the difference between NF4 and X58 is that I eventually did part with my DFI SLI-DR Expert, but I just could not bring myself to part with my ASUS Rampage II Extreme. It's running an undervolted Xeon X5675 6c/12t @ 3.8GHz and serving as my home NAS/media server. I have a screenshot saved of a 345-day uptime; I think it's the most dependable rig I've ever owned. You'd have to pry my X58 from my cold dead hands, and considering how well it's apparently aging, I'm starting to think that could be a distinct possibility.
Since the dependencies between timings seem well defined, it would be cool to have a tool that identifies the limiting timings in a memory controller configuration. That way it would be easier to know why lowering a certain timing doesn't affect performance at all. Maybe there could be some autofill as well. However, considering that there are so many timings and some of them are manufacturer-dependent, it could get pretty nasty quickly. Seems doable though. It's a shame that knowing a timing config is okay requires a stupidly long stress test, and the bigger capacities get, the worse it gets. Cool video once again.
Hmm, interesting idea. Basically the software would check best- and worst-case scenarios for certain timing values. I'm not sure, but does the L3/L4 cache try to spread allocations so that it can use all banks and bank groups when it's fetching data into the cache or writing it back?
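A minimal sketch of the "which timing is limiting" idea in Python, restricted to the single same-bank ACT, READ, PRE, ACT sequence from the video. Real controllers juggle many more constraints, and the helper name and inputs are hypothetical. The example uses the DDR3 values quoted further down the thread (10-13-13-36 with tRTP 9). tRC isn't given there, so it is assumed equal to tRAS+tRP.

```python
def limiting_timings(tRCD: int, tRTP: int, tRAS: int, tRP: int, tRC: int) -> dict:
    """Hypothetical helper: ACT at cycle 0, one READ to the same bank,
    then PRE and the next ACT. Reports the cycle of each and the
    constraint that set it (on a tie, the first one listed wins)."""
    read = tRCD
    pre_options = {"tRAS": tRAS, "tRCD+tRTP": read + tRTP}
    pre_by = max(pre_options, key=pre_options.get)
    pre = pre_options[pre_by]
    act_options = {"PRE+tRP": pre + tRP, "tRC": tRC}
    act_by = max(act_options, key=act_options.get)
    return {"PRE": (pre, pre_by), "next ACT": (act_options[act_by], act_by)}


# DDR3 numbers from the thread: tRCD 13, tRTP 9, tRAS 36, tRP 13.
# tRC is not given there; 49 (= tRAS + tRP) is an assumption.
print(limiting_timings(tRCD=13, tRTP=9, tRAS=36, tRP=13, tRC=49))
# {'PRE': (36, 'tRAS'), 'next ACT': (49, 'PRE+tRP')}
```

Here tRCD+tRTP is only 22, so tRAS=36 decides when PRE can go out. In this access pattern, lowering tRTP would change nothing.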
We need a program that tests every timing one by one and reports what is the lowest before the memory fails, that at least could give us an upper limit to aim for.
You could, but it's a factorial problem, as some of the timings interfere with each other.
@@adama7752 Meh, just attach it to an FPGA. You could brute-force the whole thing. No need for system reboots or anything like that. Just intelligent sweeps, like a binary search for each. You could end up with a “this is the best you’ll get” set of numbers. Would a reasonable number of people pay a little fee to have their RAM characterized?
@@float32 Why an FPGA? The boot loader that already exists on your motherboard is capable.
This video series is fascinating, thanks for putting it together.
I never realised there were specific timings for CPU vendors, I just assumed they differed in their optimisations or integration with the CPU. I'm curious as to what would break on Ryzen if the AMD engineers didn't include the tRC timing.
I think the reason for all the specific timings is the randomness of access. It's so that other devices/cores can read/write before/after another device/core has finished.
This video's example is based on reads and writes to the same bank, but the controller should avoid that as much as possible. Most often, a bank will be accessed by a different instruction that is unaware of the last or next operation, so the controller needs the different timing options just in case.
They could, however, calculate these timings from the other timings and abstract the details away from the user. But without knowing how RAM will improve in the future, hardcoding timings relative to one another would restrict capability, compared to letting each one be set and simply building XMP profile capabilities, customized by each manufacturer, for future enhancement.
It's confusing, but so is a manual transmission compared to an automatic. There's a reason we race manuals: you get more control and more stable performance from a cheaper system that way, and you can adapt to conditions with a bit more expertise.
Buildzoid: "Y'all stupid"
Me: "That's a bit harsh but let me watch the video anyway"
(5 seconds later)
Me: "Okay I'm pretty stupid"
This man is blasting all of us commenters. In all fairness he's right, most of us are pretty stupid.
Anyone watching this video has realized they're dumb and is here trying to fix that :)
I find this very interesting and informative, slight tip or suggestion though:
Could you "freeze" the top/column label row in the spreadsheets? Helps to keep track of what column means what as you're scrolling.
Thank you for this, I couldn't watch till the weekend, but this journey is really interesting and we're only getting started
I appreciate you being willing to do it anyway, despite how much of a mess it's going to become.
this one running a bit long isn't that big a deal.
Enjoying this series, I stopped doing memory designs before DDR came in, so it's great to catch up on the more modern stuff. Keep on going.
Been setting my tRAS wrong all this time, brought my latency down quite significantly after adjusting it to tRCD+tRTP thanks buildzoid!
Well explained. I wish you had started with the order of commands instead of leaving it for the end of the video, lol, but that was a great save in the end. BTW, after those two parts I fixed my timing issue and got better results. Thank you 😊
Yes, totally not an idiot here rewatching the video for a 3rd time trying to keep up, yupyup! In all seriousness, great video! This stuff gets complicated FAST!
It's not complicated, but it is. It's like math: simple concepts that need an understanding of previous content to contextualize what you're seeing.
I really appreciate you making these videos! I love learning this stuff. I do have a question, so feel free to flame me. As you said in the video, tRAS and tRC don't really make sense. At least in this scenario, the intervals between each command are already well defined by other timings. Does it ever make sense to have a tRAS that's greater than tRCD+tRTP, or is it just a completely redundant timing? I think the same could be said about tRC as well, since it's just tRCD+tRTP+tRP. Presumably you could just set both of them to the minimum value and ignore them completely.
The whole point of tRAS is to make sure that memory chips that can't handle short ACT-to-PRE delays don't end up getting precharged right after tRCD+tRTP. Though I've never actually encountered a memory chip where ACT-to-PRE timing is a problem. It's usually just ACT to ACT that's a problem.
@@ActuallyHardcoreOverclocking I see. Thanks for the answer :)
When they planned the standard, they weren’t sure if they’d need these redundant timings. In the end they didn’t, but you can’t add them after the fact. Better to have them.
Thx for the second installment! Eagerly waiting for more complexity as this is just child's play :D
Very nicely done, thanks for enduring the torment of trying to make us less stupid. I really do enjoy learning as much as possible, and teaching what I know to others... this reminds me of trying to explain ethernet packet captures and protocols at a previous job. Cheers.
I think you've explained it about as well as it could be explained, and it's a mess indeed, but it's an insightful mess. Keep it up, you're doing a great job with it.
Now I love memory OC even more. Wow, thank you.
Thank you very much mate, hats off. Very educational for someone interested in the topic.
Not only are the acronyms not informative, they're also mostly remixes of the same 4 or 5 letters, making them really easy to confuse.
Colour coding the rows would probably make the spreadsheet easier to look at. And I think the way you reordered the timings at the end helps clarify things.
Isn't RAS Row Access Strobe? ACT to PRE just describes what it does.
Thanks Buildzoid!
@ 20:41 I think the "insert custom rows" feature you describe exists. Select a block of cells and right-click on them. You will get an option "Insert cells", followed by a choice to shift the selected cells and everything after them to the right or down. In your case, I think you wanted to select something like B16 to G17, then choose "Insert Cells" and "Insert Cells and shift down", which would add 2 rows (since you selected 2 rows' worth of cells).
Regarding the existence of tRC... I think it makes sense. When you have one activate with a single read, then a precharge, then another activate, then yeah, it's useless.
But if you have an activate, a read, then another read (with whatever other timings are required; not sure if other commands too), then a precharge and an activate, tRAS and tRP summed can be lower than tRC. This would let you squeeze out a few extra clock cycles when there are multiple read commands per activation. So for non-Samsung dies, you could have the sweet, sweet tRAS of 21 and tRP of 14 with the required tRC of 66, and achieve a 31-clock-cycle reduction vs. having tRAS and tRP that must sum to 66. Unless the physical construction doesn't actually allow something like that (dunno, maybe a high tRC requirement means equally high tRAS and tRP; I don't really know).
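The arithmetic in this comment can be checked with a simplified Python model. It assumes the first ACT is at cycle 0, the next same-bank ACT is gated only by PRE+tRP and by tRC, and PRE goes out at tRAS after a single read, or later when several reads keep the row open. The values 21/14/66 come from the comment. The "several reads" PRE time of 60 is a hypothetical. This only shows when tRC binds, not whether real dies allow the split.

```python
def next_same_bank_act(pre_cycle: int, tRP: int, tRC: int) -> int:
    """Earliest next ACT to the same bank, with the first ACT at cycle 0."""
    return max(pre_cycle + tRP, tRC)


tRAS, tRP, tRC = 21, 14, 66

# One read: PRE goes out as soon as tRAS allows, so tRC is the limiter.
single = next_same_bank_act(pre_cycle=tRAS, tRP=tRP, tRC=tRC)
print(single)              # 66
print(tRC - (tRAS + tRP))  # 31 cycles of tRC beyond tRAS + tRP

# Several reads keep the row open; a PRE at cycle 60 (hypothetical)
# pushes PRE + tRP past tRC, so tRC no longer matters.
several = next_same_bank_act(pre_cycle=60, tRP=tRP, tRC=tRC)
print(several)             # 74
```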
To give some hopefully helpful feedback, this video was a bit harder to follow, like a bit all over the place at times. But if you pay proper attention, you can follow along. There's absolutely no need to reshoot it. Whoever is struggling can simply rewatch the relevant part(s) of the video.
This whole series needs more views. RAM timing latency affects emulation much more than regular games; it would be amazing if you teamed up with Level1techs and DigitalFoundry to discuss that! My timings are 14-13-13-12-24-30 @ 3800MHz, and ToTK finally runs at a nearly locked 60fps.
"some of ya'll ain't gonna get this" ROFLMAO
A higher clock (3200MHz) requires longer timings (tRAS) in cycles, so that the minimum operation time (ACT to PRE, in ns) is retained. The operation time can be treated as a constant for a manufacturer's batch.
So a higher clock gives us much more granularity, i.e. more precisely timed opportunities when ordering the operations?
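The idea that the minimum time in nanoseconds stays fixed while the cycle count scales with the clock can be sketched in Python. Assumptions: "3200MHz" means DDR4-3200, i.e. 3200 MT/s on a 1600 MHz clock, so one cycle is 2000/3200 = 0.625 ns. Rounding up with a plain ceiling is a simplification. The 3800 figure just reuses the speed quoted earlier in the thread.

```python
import math


def cycle_time_ns(mts: int) -> float:
    """Clock period in ns for a DDR data rate in MT/s (clock = MT/s / 2)."""
    return 2000 / mts


def cycles_to_ns(cycles: int, mts: int) -> float:
    return cycles * cycle_time_ns(mts)


def ns_to_cycles(ns: float, mts: int) -> int:
    """Smallest whole number of cycles covering `ns` (plain ceiling)."""
    return math.ceil(ns * mts / 2000)


t_ras_ns = cycles_to_ns(21, 3200)    # tRAS=21 at 3200 MT/s
print(t_ras_ns)                      # 13.125
print(ns_to_cycles(t_ras_ns, 3800))  # 25 cycles for the same ns at 3800 MT/s
```

The same 13.125 ns takes 21 cycles at 3200 MT/s but 25 at 3800 MT/s, which is the "longer timings at higher clock" effect in cycle terms.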
CL is a constant, independent delay - so going low with CL is like reducing ping, and all other timings influence throughput?
I so love this. Awesome!
I'm enjoying this so much!
Also, very comparable to the timings of your brain, and why a concise technical analysis turns into a GIGAnerd rant every video, as your brain is moving way too fast for your tongue to keep up XD
Thanks a lot for explaining this all
"Some of you arent gunna get this, but I can't help that y'all are stupid"
-BZ 2022 Lol
It makes sense. It made so much sense that I'm now even more at ease setting my kit to XMP and just leaving it as is. I thought I wanted to tinker with this kit, but once I got it I realized it's not B-die, so I left XMP on and never looked back. Ignorance is bliss. But I did love the video series; I learned something I never knew before, so thanks for that.
Thanks again for the great explanation
just perfect! part 1 and 2 are perfectly understandable
Thanks BZ! 🙌
"because y'all stupid" raises hand...
Signature "anyway" right after had me breathing out audibly through my nose.
Thanks for making a part 2 for a particular GamersNexus video about RAM.
1:16
Buildzoid: tRAS on the other hand is weird
Subtitles: tear ass...
This was actually excellent
I just want to say that like... finally I'm getting videos on how to understand RAM timings and how they work from one of my favorite dudes to watch. Even if it's pain 😕😂
Great Video, thanks for the explanation.😅
Hmmm very interesting.
Could it be possible to set tRC to a very low number, then lower the tRAS to find out the lowest possible timing for it? Then raise tRC to match it?
The memory controller knows when you set an impossible value for a timing and uses a valid arbitrary value, which you don't know. The best thing you can do is keep tRC at a valid, stable value that is greater than or equal to tRAS+tRP, while decreasing tRAS and increasing tRP by 1, and vice versa.
Great video. I have a question though. I have a tRAS of 31 and a tRP of 21 on Samsung C-die RAM with a Ryzen 3600 CPU. However, if I set tRC to any number below 64, it won't boot. I can't do 51 or 52 or 53 or even 60. It has to be 64. Why is that?
because C-die sucks.
@@ActuallyHardcoreOverclocking 😭
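The rule from the tRC reply above (keep tRC ≥ tRAS + tRP and trade cycles between tRAS and tRP) fits in a tiny Python check, applied here to the C-die numbers in this exchange. It only tests the arithmetic relationship. It can't predict why that particular kit still needs tRC=64 to boot, and the helper names are hypothetical.

```python
def trc_covers(tRAS: int, tRP: int, tRC: int) -> bool:
    """True if tRC is at least tRAS + tRP."""
    return tRC >= tRAS + tRP


def same_sum_splits(tRAS: int, tRP: int, steps: int) -> list:
    """(tRAS, tRP) pairs reached by shifting cycles between the two
    timings, one at a time, so that tRAS + tRP stays constant."""
    pairs = []
    for k in range(-steps, steps + 1):
        if tRAS + k > 0 and tRP - k > 0:
            pairs.append((tRAS + k, tRP - k))
    return pairs


print(trc_covers(31, 21, 64))      # True: 31 + 21 = 52 <= 64
print(trc_covers(31, 21, 51))      # False: 51 < 52
print(same_sum_splits(31, 21, 1))  # [(30, 22), (31, 21), (32, 20)]
```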
I might only understand like 5% of what he's saying, but I love it still. If I learn 1 thing every vid, I'm still learning, and I appreciate the effort he's putting into these vids. Ty so much for explaining and giving us all the info you, sir, learned over many years.
Really digging this series of videos. I've recently been introduced to the amdmemtweak software and the GPUOpen suite of tools. This is my kind of tinkering. Hoping my understanding is correct, but from my limited knowledge it would appear that tRC is a failsafe in case of an ACT failure: a last-resort second chance of getting things rolling.
Classic Buildzoid, telling his viewers they might be stupid. He is not wrong.
The bit at 14:40 is a very good summation.
"Some of you ain't gonna get this, because you all stupid"
Hahahha I love this man!
After about 2 weeks of reading and watching (I'm stupid btw lol) and rewatching and rewatching I'm starting to finally get it.
My conclusion so far is that this vastly complicated world of memory timings and instruction sets seems horribly inefficient, and the only reason it is the way it is, is that making smart memory chips is too expensive for retail.
Does any of this change with DDR5? Obviously until Zen4 and X670 is detailed, we can't know what "optimisations" are to come for Ryzen, also XMP/EXPO support.
Timings are generally higher, Gear 2 is there, and 1 read operation (same for write) takes 8 clock cycles (not 4) of data transfer. To compensate, 1 DIMM is divided into 2x 32-bit subchannels, which keeps one operation equal to 512 bits = 64 bytes = 1 cache line.
10:44 Exactly!
Edit: TY for the video!
Thank you for doing this, even though I can see why it won't be as popular as other videos. EEVBlog touched on why he continues to make tutorials despite them not being popular.
I would love to make a calculator where you can enter the delay between commands and it calculates the timings or vice versa
I'll join patreon for a month or buy a hoodie trying to reimburse lost view count and make up for your annoyance :P
inb4 BZ loses his mind about FAW
Oh and some of these timing names actually come from the time before DRAM...which means magnetic core memory.
RAM timings are black magic to most normal people
2:30, it feels off to me. At clock 32: "tRAS from ACT because tRCD+tCL+tRP< tRAS". From what I get from your explanation, tCL should not be there. Sorry for asking late.
This explains why I never got performance from tRAS on Ryzen, but it was basically the only thing that seemed to make a difference on Intel.
Thank you 🎉
Great vid, makes sense!
RAM is really stupid indeed, just as you said in the first video, and now it all becomes so blatantly apparent... like, it can read a different bank no problem, but if you read the same bank again, there's suddenly a problem and you need to waste a bunch of time until it can finally do that read 🤣 My rev.E at 3600Mbps is using a tRC of 58, which is a ton. Maybe I could tighten it down, but for daily use it would mean more stability testing for really negligible gains.
Anyhow, great video, only like 20+ timings and other parameters left to go over :D
Excellent, thank you
No, you're good mate, it makes sense.
So basically on AMD tRC=tRAS+tRP is aspirational (unlike Intel where it's actual) and you might have to use tRC=tRAS+tRP+tZENMCTAX...? That clears up a lot of confusion on my part, thank you for illustrating this so clearly. Your estimate of my intelligence is spot on btw! :)
A few days ago I was learning about ferrite core memory and thought it was a bit silly that destructive read was just accepted.
I am not happy with what I have learned in this video
OK, but how does the PC know where the data it wants is? Or is this the thing for registers? And why doesn't an HDD work the same way? Is it again because of the registers? Edit: 10:03 "for the next 31 cycles", there's a mistake. If it was 32 cycles, you would issue the command in cycle 33, not 32. Edit no. 2: my timings are 18-19-19-39, how can that be? I mean, 19+18, so it should place data on the bus on cycles 37, 38, 39 and 40, so it should be 41, no? So why 39? And can you command a precharge before/during the read? Is it even safe, and wouldn't it harm stability? Is it dependent on chip type/manufacturer?
tRC can be used to help prevent the "Row Hammer" Exploit.
25:09 but you said that tRC is specifically for same bank, that's the reason why it exists
you also said that there is memory which specifically is sensitive to that
and other timings you mentioned are for all commands, not just for same bank
I have a question for understanding: if we have to read e.g. 8 bits again and again from the same address in memory, does 3200 DDR4 CL16 perform this faster than 5600 DDR5 CL32? So is it right to say that DDR5 is better at handling bigger ranges of read and written data due to its higher MHz, while for smaller operations DDR4 performs better due to its smaller timings? Command rate etc. should be the same on both.
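The CAS-latency part of this question is plain arithmetic: CL in cycles times the clock period, where the clock runs at half the MT/s rating. Here is a minimal Python sketch. It deliberately ignores tRCD, command rate, burst length, and controller overhead, so it only compares CAS latency in nanoseconds, not whole-access latency.

```python
def cas_latency_ns(cl: int, mts: int) -> float:
    """CAS latency in ns: CL cycles at a clock of mts/2 MHz."""
    return cl * 2000 / mts


ddr4 = cas_latency_ns(16, 3200)  # DDR4-3200 CL16
ddr5 = cas_latency_ns(32, 5600)  # DDR5-5600 CL32
print(f"DDR4-3200 CL16: {ddr4:.2f} ns")  # DDR4-3200 CL16: 10.00 ns
print(f"DDR5-5600 CL32: {ddr5:.2f} ns")  # DDR5-5600 CL32: 11.43 ns
```

On CAS latency alone, the DDR4 kit answers about 1.4 ns sooner. How much that matters for the repeated small reads in the question depends on the other timings the sketch leaves out.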
Super interesting how these are interdependent!
You can easily have a situation where a low value that doesn't do anything suddenly causes issues when you lower another timing, making the too-low timing suddenly the limiting factor and causing data corruption. And because it was some other timing that was being modified, that other timing looks like the culprit, even though it works totally OK on its own. Good luck figuring out optimum timings without knowing how that works :D
Yah, ram tuning sucks :D
Keeping things simple in most engineering scenarios is usually the best way to build. Unfortunately for RAM developers, KISS seems to be a foreign descriptor. I am in tune with your sentiments about DDR operations. Thanks again for your efforts to give us all a thorough understanding of RAM operations.
tRAS is probably Row Activation Sustain
tRC probably Row Cycling
You can have a row active for hundreds of clock cycles.
How common are 2 reads from different rows in the same bank? The latency for this access pattern seems quite high. Is it less common on dual-rank dual-channel compared to single-rank single-channel memory, because there are more banks to store data and the IMC is smart enough to avoid this pattern if it can?
Or is it completely random where data is stored in memory? Or does it depend on how the program works?
The IMC does some distributing; that's why dual rank is faster than single rank. I don't know how it does that.
The program gets memory in 4096-byte chunks called pages and can do whatever it wants with it.
That's what tRAS is named for, "random ass shit"
Thx BZ for another informative video. I have a normie question: what's special about DDR5 timings that causes longer latency? Is it more setups, longer commands, or something else? Thx again.
Can you do a dumbed-down version? I have no idea what's going on.
Hugs...
No one on this plane of existence has explained RAM timings so well.
Really good information. Does the L3 cache also try to use all banks/bank groups when it does its line reads/writes to DDR? I think it tries to read/write a whole cache line.
1 single operation: no, it uses 1 bank group and 1 bank. 1 cache line is exactly 64 bytes on x86-64; the memory controller executes the same operation across 8 x8 memory chips (4 x16 or 16 x4 memory chips are also possible) to get the following:
2 (double data rate) × 64 (total width of memory chips) × 4 (clock cycles to get the data) = 512 bits = 64 bytes = 1 single cache line
DDR5: the result is the same, but it is computed a bit differently:
2 (double data rate) × 32 (width of 1 subchannel of memory chips) × 8 (clock cycles to transfer data) = 512 bits = 64 bytes = 1 cache line
DDR2, DDR3 - I don't know exactly.
DDR3 - I suppose it is the same as DDR4 (prefetch is also 8n in DDR3)
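The two cache-line calculations above can be checked in Python. The inputs are the ones given in the reply (a 64-bit DDR4 channel for 4 clocks; a 32-bit DDR5 subchannel for 8 clocks). The helper name is hypothetical.

```python
def bits_per_access(bus_width_bits: int, clocks: int, transfers_per_clock: int = 2) -> int:
    """Data moved by one burst; DDR transfers twice per clock."""
    return transfers_per_clock * bus_width_bits * clocks


ddr4 = bits_per_access(bus_width_bits=64, clocks=4)  # 64-bit DDR4 channel
ddr5 = bits_per_access(bus_width_bits=32, clocks=8)  # 32-bit DDR5 subchannel
print(ddr4, ddr4 // 8)  # 512 64 -> 512 bits = 64 bytes = 1 cache line
print(ddr5, ddr5 // 8)  # 512 64
```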
@@volodumurkalunyak4651 Excellent answers.
You're getting actually good at explaining stuff concisely.
Also, I have the feeling a flowchart would be very suitable for this exact topic. But you're probably already done recording this stuff. Meh.
4:10 "You are not gonna get it because you all stupid" ... lol
Love this
So for performance, is it better to have tRP higher than usual (25+) and a very low tRC (32)? Or a low tRP (14 to 18) and a higher tRC of 60+?
Thanks for such a quality explanation. Could you please share the spreadsheets used in this video series?
The explanation is the best thing out there on the internet. It's a shame you seem so frustrated; I don't feel like the explanation needs to be so long-winded or come with so many examples. It's just a "maximum". Maximum is a very simple concept.
19:15 I don't understand the frustration. If it's 4pm, and you have a turkey that has 2 hrs left, gravy that has 15 min left, and a casserole that has 1 hr left, dinner is served at 6pm! From my knowledge of Thanksgiving cooking, actually delaying the cooking of the gravy and casserole to time them well is 1000x harder than "figuring out" that dinner is served at 6pm. All our great mothers put so much effort into Thanksgiving, and my mother doesn't know anything about math or computer science, but she'll tell you dinner is at 6pm. It's such a simple concept. It's not so frustrating; you don't have to move the whole spreadsheet up and down so many times to show the bazillion different combinations of timings. We get it, we're just waiting for a turkey to bake 😂😂. If I start the gravy at 5 before 6, then I'll delay dinner by 10 min, very simple! You don't have to actually show the entire hypothetical with the gravy to explain the concept; if you tried that on my mother, she'd find it a great insult, as it's just so simple.
Anyway, great explanation, but it's a lot of hullabaloo for something that someone who's never done math in their life should find very simple. It's just a maximum; you don't need to be so flustered and move the spreadsheets up and down for 30 min. Just 5 min of explaining what each number limits, with ONE example, is more than fine.
Gosh, I just feel second-hand frustrated as well with all of the moving, copy-pasting, and adjusting of the spreadsheet. Well, anyway. I don't remember what any of the timings are because of how often things were moved around. I'll look again just to hear the clear definition of the exact constraint each timing imposes; that's all that matters. The timing of a particular sequence of commands is just limited by the maximum constraint.
So if tRAS is lower than tRCD+tRTP, that means the latter are limiting perf. And the opposite (tRAS higher) means tRAS is the limiter, and tRTP only happens within tRAS (?). Does that mean tRTP should be a multiplier of tRAS?
For example, I have 10-13-13-36, and my motherboard set tRTP to 9 (and tWR is auto with no number shown, which I've heard is supposed to be tRTP x2?? I'm on DDR3). Does that mean tRTP 9 fits well into the 36 tRAS and lowering it would be bad? And my tRCD and tRTP are already far lower, meaning tRAS is what's limiting performance?
It's a bit hard to wrap my head around.
I can't see any difference in AIDA64 and Cyberpunk benchmarks with tRTP 8, tWR 16, tRAS 32 vs. tRTP 8, tWR auto, tRAS 36.
I wonder if a program can cause the IMC to read and not precharge, for example if a program reads the data and destroys it immediately.
There should be a "read destructively" command for occasions where data is constantly swapped for new data. Lots of time to be saved. Good point!
There may or may not be the link to part one somewhere 😄
He could not resist but revealed how it all works at 35:30