Check out the new ASUS Vivobook S 15: asus.click/vbs_anastasi
Please explain why we don't have quantum computers with Ning Li's room temperature superconductor?
@Anastasi In Tech - What about using I2L (integrated injection logic) and/or vacuum-channel FETs, possibly on chiplets? I2L seemed very promising when first introduced, but its power consumption was high since transistors were all large at that time. As a bipolar technology it will not suffer from gate-leakage problems. Are there any other reasons why it might not work? As for "vacuum" channel FETs, they are 10 times faster or more, partly because they use free electrons. They also benefit from nanoscale features, are extremely radiation resistant, and can operate comfortably at temperatures up to hundreds of degrees Celsius. Also, they don't actually require a vacuum when built at small nanoscales.
@@YodaWhat This is about Anastasi's Asus Vivobook commercial she boldly snuck into her main content?
@@fluiditynz - I left my comments and question here because it is the most likely place for her to see them. Nothing to do with the laptop she's promoting.
When I first started programming, and RAM was off-chip and typically a few KB, we'd spend a lot of dev time working out how to do as much as possible in as little RAM as possible and as few clock cycles as possible. These days the demands to cut development time and get new features out, driven more by senior management and Product Owners than by real customer demand, seem to have ditched those ideas. If it's too slow the customer is expected to just buy a higher-spec machine, and new developers are taught ways to shorten development time but not execution time. I think that this is a false economy. About 10 years ago I was able to shorten a big data-processing job from 3 days to under 20 minutes, on the same hardware, by applying the techniques I'd learned back in the 1980s to key functions. It took me 5 days, but when this is something that has to be run every week the saving soon stacks up.
You are absolutely right. Once I participated in a service job to get a power station running. The problem was to bring the gas engines up and running as fast as possible. After a few days a programmer was flown in and looked for alternative assembler instructions to save a clock cycle here and a clock cycle there.😁
Wirth's Corollary to Moore's Law:
Any improvement in hardware performance will be negated by code bloat at an equivalent rate.
Kinda like traffic in London.
It's not a false economy, just a different emphasis due to the change in price structure.
In the old days, memory was expensive, so we tried to economize its use. Today's memory is so cheap that software development time has become the most expensive part of a system.
@@gorilladisco9108 the cost of memory is largely immaterial. It’s the cost of execution time. Say you’ve got a transaction that currently takes 10 minutes to complete but if the code was optimised would take 7 minutes. To optimise the code would take the developer an extra 5 days effort and the developer earns £30 an hour (that’s the mid-point for a developer where I work), so that’s about £1100 wage cost but once it’s done that cost is done. Once rolled out the application is used by 200 people paid £16 an hour (I have some specific applications we use in mind here). Saving 3 minutes per transaction means either those same staff can process 30% more transactions or we can lose 60 staff at a saving of just over £7000 a day. That extra development time would repay in a little over an hour on the first day and after that would be pure cost saving.
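A quick back-of-the-envelope version of that arithmetic, sketched in Python. The money figures are the ones quoted above; the 7.5-hour working day is my own assumption:

```python
# Payback sketch for the optimisation example above. Money figures come
# from the comment; the 7.5 h working day is an assumption.
HOURS_PER_DAY = 7.5

dev_cost = 5 * HOURS_PER_DAY * 30   # 5 days at £30/h -> £1,125
users, staff_hourly = 200, 16
fraction_saved = 3 / 10             # 3 minutes saved per 10-minute transaction

daily_saving = users * staff_hourly * HOURS_PER_DAY * fraction_saved
payback_hours = dev_cost / (daily_saving / HOURS_PER_DAY)

print(f"one-off cost  £{dev_cost:,.0f}")                   # £1,125
print(f"daily saving  £{daily_saving:,.0f}")               # £7,200
print(f"payback after {payback_hours:.1f} working hours")  # ~1.2 h
```

Which matches the comment's claim of repaying itself in a little over an hour on the first day.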
You really have a knack for making complex topics engaging and easy to follow for everyone! Breaking down the challenges of SRAM and introducing phase change memory in such a clear manner is no small feat. Excited for more content like this!
👍🏽💚🌴☀️🌏
Has datsbus ended?
@@Panacea_archive 😂😅no..my English is bad🐪☀️
@Magastz love💚and peace 🌏
Not bad on the eyes either
You explain things so well, thanks for a well thought out presentation
The problem with chiplet design is heat management.
Since every layer is active, it burns energy and produces heat, and this isn't good.
A secondary problem is the bus interconnect: stacking requires shared lanes, so memory layers sit in parallel on the same bus, making the interconnect a bottleneck.
Last but not least is signal strength and propagation time: stacking layers requires precise alignment and adds vertical hops for signals, so propagation delay, noise and eventual errors become a potential limiting factor. This isn't much of a problem if the system is built around it, but it is still a limiting factor.
There are solutions: since there's one master and multiple slaves there's no risk of collisions, so you can make a lot of assumptions on the drawing board... but buses are going to become wider and more complex, and that will add latency where you don't want it.
My 2 cents.
- I wonder if they run veins of metal in between the layers to send the heat to a radiator.
- They put the L3 cache on the second layer, which by its nature is fairly removed from the logic circuits.
Heat, latency, voltage regulation, signal integrity, etc.... Stacked dies have never been simple, which is why there aren't many of them.
Thanks!
Thank you
I'd be curious about the thermodynamic side effects of phase change memory during transitions, as crystallisation would release heat while amorphization would absorb it.
Science communicators who actually are professionals in their field are always welcome. Thank you Anastasi
I didn't even know she was from the field, I thought she was just smart. But I guess that makes sense
@@nicholasfigueiredo3171 Given the way she talks, I would have guessed her field was steering investors into various markets. The technical rundown is useful, but the whole discussion is still clearly framed like "guess what's going to make a lot of money in the near future?".
She is a technology communicator. Learn what science is, please. I'm guessing you are a Republican, so I wouldn't expect you to understand the difference.
The point "good endurance 2*10^8 cycles" prohibits its use for cache memory. But it's really a viable and competitive option as a replacement for Flash memory!
Been waiting for your vid.... Love the content
Thank you Anastasi - great presentation!
Subscribed... Always interested in intelligent people. You understand what you are saying and are not just spewing words. Fascinating.
I greatly admire the passion you infuse into your presentations. Your work is outstanding, please continue this excellent effort. Thank you!
Thanks. Amazing video. It's kind of interesting how it always comes down to the same principles. First shrinking the size in 2D, then layering stuff, and eventually going into the 3rd dimension. And when that reaches its limits, then change the packaging and invent some hybrid setup. Next, change the materials and go nano or use light etc. instead. Even the success criteria are usually similar: energy consumption, speed or latency, size and area, cost of production, reliability and defect rate, and the integration with the existing ecosystem.
And then go into the 4th dimension too :D
And after that, when all resources and possibilities are exhausted, go quantum and use "entanglement" to avoid the heat and space limits...
Amazing!
This girl researched exactly what I wanted to know.
Thanks.
Very interesting, I like the way you present info clearly and concisely
Very comprehensive and interesting video. Thanks Anastasi! 👍
😮😮😮 Really liked this info. Informative, to the point, and exciting.
Well said. Excellent video Anastasi!
Very interesting. Thanks for sharing your expertise. There is always something interesting in your videos. At least in the three or four I have seen so far.😊
So, the two biggest old school technologies that are slowing progress seems to be memory and batteries.
Yup!
Also, a shortage of railways.
Such a complex technology, explained super well.
Nice idea. Very similar to Nantero NRAM, which also uses the Van der Waals effect to provide resistive cells using carbon nanotubes for SSD/DRAM universal memory.
I've been waiting for NRAM for 20 years, and it is only now beginning to make its way into the data centre. Let's hope that this technology takes less time to mature.
I used to use magnetic memory... when I worked on the DEC PDP-8e. It was called core memory; you could actually see the core magnets and the wires that were wrapped around the cores.
That was a great video very informative. You're right, it is an exciting time to be alive with all the evolving technology.
I worked on Micron/Intel's PCM, Optane, for a few years. While we were making progress on some of the problems you mentioned, the venture ultimately failed due to the economics of producing the chips as well as a lack of customers. Would be cool to see it make a comeback in the future
I am shocked she failed to mention Optane as well - "new technology" lol.
Had they held on till CXL was here, imo it could have taken off; it had great promise, it was just on the wrong interfaces.
I thank you for your service. When Intel announced that they were ending Optane, I bought 6 of those PCIe drives; I caught a fire sale. Those drives are the fastest drives I have for doing some disk-intensive studio work. I wish they could've gotten the price down to around $100-$200 for the good stuff. I actually got 6 Optanes for $45 a piece. I lucked out and bought a box.
PCM memory chip technology has been in R&D since the mid 2000s. Intel, STMicroelectronics and Ovonyx were in the game together in a joint development starting around 2005. Samsung was also doing research in PCM. I believe the biggest player now is Micron Technology. And you are correct about all the advantages of PCM. I believe the two big challenges are being able to program the device into two or more distinct, well-defined resistance states reliably, coupled with manufacturing very small structures with precise dimensions. Nvidia is talking about PCM.
Very well explained. Thanks
We need more journalism with this clarity, presenting to the public the real challenges and advancements of technology.
It's interesting that you talk about your experience in chip design. Maybe you could make a video about that experience?
I love your explanations. Nice work! 👍
This sort of tech is very interesting, because depending on how it advances, it stands to change the computing landscape in one or more different ways. If Phase-Change Memory is fast enough and gets good enough density, it can replace SRAM in L3 cache. If the speed cannot get high enough, it could still find use as an L4 cache or a replacement for DRAM. If all else fails, I bet it could give Flash storage a run for its money.
It has been discussed for decades that close stacking of chips has advantages of speed and size. The issue is heat generation, hence the push to reduce the total charge (electron count) per bit. New memory technology is required with far smaller charge transferred per operation.
This was an excellent and very informative episode!
This helps me immensely with my DD into the tech & companies involved in the memory sector, Thank you very much Anastasi!
Thank you for your presentation. I found it fascinating. The phase change memory, going from amorphous back to a uniform crystal array, seems like the mental models used to explain demagnetization around the Curie point.
Your channel has really improved over the 2 or so years Ive followed you. Im impressed!
Thank you for being here
As always, fantastic work. I am not so enthusiastic right now about the new technology: an endurance of 2E8 is amazing for something like storage, but a computer will go over that in no time for cache. Even a microprocessor that is not superscalar and runs in the GHz range will be accessing memory on the order of 10^9 times per second. Clearly, that access is per cell and not for the full memory, but they need to improve that number a lot.
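To put rough numbers on that concern, a small sketch; the 2E8 endurance figure is the one quoted, while the clock rate and write rates are assumptions for illustration only:

```python
ENDURANCE = 2e8   # rated write cycles per cell, as quoted above
CLOCK_HZ = 1e9    # a modest, non-superscalar 1 GHz core (assumed)

# Worst case: one hot cache line written on every cycle
print(f"hot cell dead after {ENDURANCE / CLOCK_HZ:.1f} s")    # 0.2 s

# Friendlier case: writes spread evenly over a 32 MB cache of 64 B lines,
# assuming 10% of cycles write somewhere in the cache
lines = 32 * 2**20 // 64                    # 524,288 lines
writes_per_line = 0.1 * CLOCK_HZ / lines    # ~191 writes/s per line
days = ENDURANCE / writes_per_line / 86400
print(f"evenly worn cache dead after ~{days:.0f} days")       # ~12 days
```

Even with perfect wear-leveling, the cache would hit its rated limit within weeks under these assumptions, which supports the point that the endurance number needs to improve a lot before cache use is realistic.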
The words "dynamic" and "static" are a reference to the powering method between state changes. You kind of hinted at this with the TTL logic diagram, but didn't expand. Static is faster because it doesn't have to wait for the re-fresh cycles before it can change state. Static also runs hotter and consumes more power- there are no free lunches ;-)
Not exactly. DRAM consumes power all the time, because it needs constant refresh to preserve contents. SRAM only consumes power during state change. Both consume some leakage current though, and with that, SRAM consumes more due to having more transistors per bit cell. DRAM also consumes considerable current to change state, because of its larger gate capacitance. Overall, DRAM tends to consume more power per bit but costs less and is more compact, which is why we use it for main memory and reserve SRAM for cache and internal registers.
My fave memory joke: Stand in the nutritional supplement section of a store and look at the products with a confused expression. When someone else is nearby, ask "Do you remember the name of that one that's supposed to help memory?"
Paul Schnitzlein taught me how to design static RAM cells. This video speaks to me. Yes, the set/clear and sense amps are all in balance. It is an analog-ish type of circuit that can burn a lot of power when being read.
loved that memory zinger, ur so awesome!
Great stuff. As someone who built their own desktops through computer conventions in the 90s I appreciate you bringing me up to date on where we stand now in personal computer development😊
Thank you for sharing this new & exciting development 😊
Thank you for this video. It's great. My two issues: (1) heat dissipation is not addressed (over cycles the heat-affected zone grows); (2) one thing I heard about and remember vaguely was an attempt at self-healing logic (rather, materials + control circuitry), aimed at reducing the need for redundancy in elements at the core of the chip (the hottest and fastest environment) and at improving the chip lifetime (cycles 'til dead). I would be grateful if you could address both.
Well done, an excellent and very informative video 👍
This is an excellent explanation of the current state of IC memory. Thanks.
This was an unexpectedly good video. It's the first video I've watched from this channel.
Great video - thank you Anastasi :-) I think if we stack much more memory as 3rd-level cache chiplets on top of CPUs, we may reach gigabyte-sized 3rd-level caches. That could eliminate the external DIMMs on the mainboard, which would make future notebooks and PCs cheaper again and reduce not just the complexity of the mainboard but also of the operating system, drivers and firmware, because data could be loaded directly from fast PCIe-connected SSDs into the 3rd-level cache.
Excellent analysis 👏🏾 👍🏾 👌🏾
Linked to my substack, title: "The very definition of brilliant". That means you, Anastasi. 😊
One of the chief benefits I can see in going to optical computing is the ability to have associative addressing through polarization and multiple concurrent optical reading/writing heads for RAID-like processing.
Loved the graph you put together with the memory pyramid (access time vs where is used, with volatility information)!!
P.S. Your accent is also becoming easier and easier to understand!
I love the way you explain the topic; it gets me thinking even though I have no idea. Like possibly folding the memory and interconnecting the layers to form cubes, because I always see dies represented in 2D. Like I said, not my field.
Thanks for the updates, really informative... I was working on OTP memory designs, and this new type of glass memory looks similar to the concept of OTP memory; maybe we'll see this kind of evolution on the OTP memory side as well.
Great information 😊
Cool video. Perhaps in some future, memory is controlled by shadow? 🤔
Stacking silicon...who woulda thought ...now it makes perfect sense for chip real estate. Thank you for your brilliant assessment of the latest chip technology. You have expanded my knowledge regularly.
It's quite bizarre that you thought PCM memory is a future replacement for SRAM, as it has a switching speed of 40 ns (on par with DRAM), according to the paper you cited. This is an order of magnitude slower than SRAM. The only currently viable option to replace SRAM is SOT-MRAM, which TSMC is working on. Go research SOT-MRAM😁
It is good enough for cache application but very bad for register memory.
It also involves a physical change to the medium, which means wear and limited number of writes.
I believe a similar principle has been around since at least the 90s. I used to have a CD-R/W type device that used a laser to heat up spots of a special metallic medium, changing it from smooth to amorphous. Could be rewritten some number of times.
I will say though, your point is probably good and valid, but could have been made more constructively.
@@kazedcat it's not good enough for cache; modern caches are at most in the low dozens of ns, and 40 ns is DRAM-level latency
This is true. PCM is totally useless as SRAM replacement and doesn’t have sufficient speed or rewrite resilience. Honestly, she really failed to understand its use case. It’s a great alternative to floating-gate FLASH memory, not SRAM!
What about 4DS memory? 4.7-nanosecond write speeds
Thank u for pointing this out!👍 Not just on-chip SRAM, but operating memory in general has a lot of catching up to do with the compute logic, and not only because of the limits on further shrinking SRAM and the demand from AI workloads. Operating memory has historically been left behind the compute logic, and in a way we ignored "nature's" way of doing things (brain neurons), where memory is one with the compute/processing and just as fast while having sufficient capacity. Maybe PCM or other memory technologies will deliver that in the future. However, I agree with u that L1 cache will most definitely continue to use SRAM for the foreseeable future, and L2/3/4 with larger capacities will most likely go first with stacked SRAM before moving to new technology like PCM or resistive memory.
Thank you! Just wanted to say there's a mistake in the figure you show (e.g. 12:38): it labels the latency of flash memory as milliseconds (1/1,000 s) when, as you say in the audio, the latency is in microseconds (1/1,000,000 s).
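For orientation, the rough order-of-magnitude latencies behind that pyramid; these are ballpark textbook values, not the exact numbers from the video's chart:

```python
# Typical memory-hierarchy latencies, ballpark values for orientation.
latencies_ns = {
    "SRAM (L1 cache)":    1,            # ~1 ns
    "DRAM (main memory)": 100,          # ~100 ns
    "NAND flash (SSD)":   100_000,      # ~100 us: microseconds, not ms
    "Hard disk":          10_000_000,   # ~10 ms
}

for tier, ns in latencies_ns.items():
    print(f"{tier:20s} ~{ns:>12,} ns")
```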
This does remind me of a mechanical (robot-related) movement solution.
They used the same idea in a mechanical way.
It works like muscle cells.
In addition to learning heaps about memory, I really enjoyed hearing you say SRAM lots.
An NVMe drive slotted into the DDR5 slot - direct-access storage, skipping a whole layer of memory. The system boots from storage/memory; slots 2, 3, 4 are real RAM, just a stone's throw away.
I had thought of building memory (and the whole IC) in 3D 10 years ago. I think I even put the idea on my website years ago. One part of my idea that is not used yet is using microfluidics to cool chips that stack transistors in 3D, since stacking restricts heat transfer. The channels could run through many levels, and of course they need fluid-tight connections (a big problem). And use optics to communicate instead of a bus: possibly LED or laser tech.
You are an amazing vlogger and I love your accent :D
Ms. Anastasia is so lovely it's hard to concentrate on her narration, let alone on a subject that isn't easy to understand.😊
My memory is so fragmented I can't tell which particle remembered me.
😂😂😂
@@guidedorphas10 Alterra, also included in a game I enjoyed for a very long time. SubNautica. Thanks for the extra smiles. On my face that is.
My memory is fine. Only problem is having the parity bit in a Schrödinger box.
@taurniloronar1516 damned light. Kick the box and listen for giggles. Good one.
I invented stacking when I was 3.
Astro blocks.
@Sven_Dongle
Was that you!?
I thought it was David!
Good job 👍😉😆
I enjoy your work.
Yups.. but it keeps bulking
Not bad. My kid at 2 would stack boxes to make a stair to get over the gate. Necessity is the mother of invention.
Oh hey Al Gore… when did u change your name? Lol
I remember hearing about the SRAM scaling issue some time before the Zen 4 release, but then haven't heard anything even though I kept hearing about shrinking nodes. Been curious what was coming of that. I was thinking that since it's not benefiting from the scaling, it may even have been counterproductive regarding degradation etc. I wonder if that is what is happening with the Intel 13 and 14K SKUs? I guess we will find out soon enough. Thanks for the update, I'm glad they are on top of it!
The BCM2837 SoC uses stacked RAM. The Raspberry Pi Foundation released the Pi Zero 2 W in 2021 using it. So who stacked first? Regardless of who, it's great to hear the designers are finding solutions to such huge (microscopic) problems!
I very much appreciate your videos and recommend them to every engineer I know !!
Thank you
Cooling the buried cores may present a problem in the future
Great explanation
"And here I wanted to make a memory joke, but I don't remember which one"😂
I bought a book on how to improve my memory. But I forgot where I put it.
😂😂😂😂😂
Xilinx's (now AMD) HBM products were combining FPGAs with DRAM chiplets on a silicon interconnect substrate back in 2018.
Altera released similar tech a year later.
You are brilliant! Great content. Thanks for this. ;)
Nicely done.
The content, awesome. The jokes, not so much, lol. Thanks for sharing!
Great video. Loved your humor and I learned so much. Thank you!
I believe that down the line we will need a different processor architecture than the von Neumann one we use today (i.e. logic and memory separated): an architecture that instead has a compute-in-memory design, or perhaps a mix of the two.
In the end, the speed of light makes it hard to compute over longer distances (i.e. cm or even mm), especially when the frequency goes up and the data becomes even larger.
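The distance argument is easy to sanity-check. A small sketch, where the ~0.5c on-chip propagation factor is my own assumption (a common rule of thumb; real wire delay is far worse once RC effects are included):

```python
C = 3e8               # speed of light in vacuum, m/s
ON_CHIP_FACTOR = 0.5  # assumed effective on-chip signal speed, ~0.5c

for ghz in (1, 3, 5):
    period_s = 1 / (ghz * 1e9)                     # seconds per clock cycle
    reach_mm = C * ON_CHIP_FACTOR * period_s * 1e3
    print(f"{ghz} GHz: signal travels ~{reach_mm:.0f} mm per cycle")
```

At 5 GHz that's about 30 mm one way, before any gate delays, so centimetre-scale round trips really do cost whole cycles.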
So basically smart RAM chips with shaders?
Thanks dear, it's informative
Awesome explanation…. Thanks 😊
It should be mentioned that process node names like N3 or N5 are density measures, not actual transistor sizes. Intel 10nm was roughly equivalent to TSMC 7nm; the vendors average over different areas and use different shapes, so the numbers can't be compared directly, let alone with the size of a silicon atom, which is only about 0.1 nm in "size".
One way of attacking the Memory Wall hierarchy is from the top: use RLDRAM, which has been around for >25 years but only in NPUs (network processing units), since it offers DRAM cycle rates closer to 1 ns but a latency of 10 ns, i.e. 8 clocks. Since it is highly banked, 16-64 banks working concurrently allow for 8 memory accesses every 8 clocks, so throughput is 2 orders better than conventional DRAM. Of course in single-threaded use there's not much benefit, and keeping that many threads in flight requires that threads select banks pseudo-randomly and not hit the same bank successively. This could be used as an extra layer between normal DRAM on slow DIMM packages and the first SRAM cache level; this RLDRAM layer would sit in CAM modules or be soldered down. We are substituting a Thread Wall for the Memory Wall here, but we are already used to having a dozen threads these days. The RLDRAM model could be applied one level lower down in an RLSRAM version, which would be perhaps several times faster, with bank cycles and latency near 1-2 ns but still 8 clocks and 16 banks.
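A toy simulation of the banking idea, to show why pseudo-random bank selection matters; the bank count and busy time are illustrative choices, not RLDRAM datasheet values:

```python
import random

BANKS = 16
BANK_BUSY = 8          # clocks a bank stays busy per access
SIM_CLOCKS = 100_000   # one access attempt per clock

busy_until = [0] * BANKS   # clock at which each bank becomes free
served = conflicts = 0

for clock in range(SIM_CLOCKS):
    bank = random.randrange(BANKS)     # pseudo-random bank selection
    if busy_until[bank] <= clock:
        busy_until[bank] = clock + BANK_BUSY
        served += 1
    else:
        conflicts += 1                 # hit a bank that was still busy

print(f"served {served}, conflicts {conflicts} "
      f"({served / SIM_CLOCKS:.0%} of one-access-per-clock demand)")
```

Purely random selection still collides roughly a third of the time here, which is the Thread Wall trade-off the comment describes; a scheduler that steers requests away from busy banks would do better.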
Suggest captions. I think I’d like Anastasi in Tech even more.
So fancy! I think I want that laptop
Interestingly, computing is still based upon an electron pump system when the Spherical Photon Well combines logic and data storage in one system that moves at the speed of light.
I worry about using non-volatile memory for primary or cache memory because of the security aspect. If the information remains after power is interrupted, quite a few "secrets" will be in clear text, and a determined and well-equipped "bad actor" will be able to extract surprising amounts of information from a system.
My industry has to issue letters of volatility with everything we produce, and for anything with NVM, the sanitization procedure usually involves removing the part with non-volatile storage and destroying it. The only exception is when it can be proven that the hardware is incapable of writing to that NVM from any component present on the assembly, even if malicious or maintenance software is loaded onto the device. This phase change memory built in the same package as the CPU logic could not be provably zeroized without some sort of non-bypassable hold-up power, and that would increase the cost and size of the chip package.
I think this is very promising for secondary addressable storage, but I don't see it replacing main memory in most applications.
Another banger video. Do you have a Discord channel to reach out to?
Telegram, probably, if she's Russian
NO COPILOT! NO RECALL! This future is PRISONPLANET! NO WORK NON-STOP!
What about keeping the heat down? Sure, lower power is required in some cases, but stacking should also increase the need for improved cooling, perhaps?
Many years ago I wondered why transistors and memory were not stacked in 3D layers. I figured it was because of heat. My solution to that was microfluidics, possibly with sodium, to carry it away. I also thought light pipes (lasers) could replace the metal bus. A lot of work to make it to production, as the hardware needs to accommodate new kinds of connections.
I'd love to see a AIT and High Yield collab someday :D
Non-volatile and low-latency at the same time, coupled with scalability and hopefully cost-effectiveness in manufacturing, would be a huge technological leap. Thank you for the information.
Great news! You are my crystal ball when it comes to predicting the future of these tiny miracles. Thanks for sharing.
Bus speed is an issue but have a look at IBM z mainframes and the high-speed optical link that they use to combine and share the L1 through L4 memory caches between the different die
Smart and beautiful, well this is something new.
I hope you become a trend, so that our kids can stop following brainless influencers
I have to wonder about the operational durability of phase change material memories. Decomposition and/or malformation seems very likely.
What really excites me about this new PCM technology is its analogue compatibility. I really think APUs will catch on within the next 10 years or so, and this type of RAM is perfect for that application
My concern with the phase change memory is just the lifetime and reliability. Do the cells grow oxides or change chemistry over time? Can they be ruined by ripple or electrical noise at scale that hasn't been discovered yet? Etc. Love your videos!
Although I do not comprehend all the things you mentioned, what I do understand I find very fascinating. Your videos and those of others help me decide which companies and technologies to invest in (= gambling) at the Wall Street Casino. Investing in stock is like playing blackjack: the more you know, such as via "card counting", the better your chances of winning. For me, your advice is akin to card counting when it comes to gambling on stock purchases. Thanks for your insight in this realm.
BTW, my 1st computer was an Atari 800XL which I purchased in 1985. I also wrote code in Atari Basic and in HiSoft Basic. Ten years later, I used the program I wrote to analyze the data for my Master's degree in Human Nutrition. With the Windows computers, writing code now has become too complicated for me, so I have given up on that endeavor.