Why Lunar Lake changes (almost) everything
- Premiered Jun 25, 2024
- Can x86 beat ARM? Chip & architecture deep-dive of Lunar Lake, Lion Cove, Skymont & Xe2 Battlemage.
Follow me on X/Twitter: x.com/highyieldYT
Become a supporter on Patreon: www.patreon.com/user?u=46978634
0:00 Intro
0:52 Lunar Lake SoC / on-package memory
3:16 compute tile / chip analysis
6:08 LNL architecture / Lion Cove P-cores
8:30 Lion Cove IPC / Hyper Threading
11:18 Skymont E-cores
12:03 Skymont IPC / SLC / Memory Side Cache
14:06 Skymont vs Raptor Cove
15:01 Xe 2 GPU / Battlemage / GPU TOPS
16:10 Intel Neural Processing Unit / NPU4
17:31 Lunar Lake & x86 vs ARM
Lunar lake looks compelling, but just like Zen 5, I'll believe it when I see it.
The only way you should do it.
Yup. But I am really glad we are seeing competition in this segment and companies are back to testing out new ideas and being bold with their designs, not just being afraid of the unknown and stagnating. Testing out new stuff and improving older stuff is never a bad thing.
Well, with regards to IPC anyway, AMD has been on the money with every iteration of gains they said they would get. Intel on the other hand... well, let's just say they have fallen short each and every time.
Intel still has to prove that they can do efficient compute (but this step closer to Apple design might help)
Amd still has to prove they can do efficient platform power, where Apple and Meteor Lake are miles ahead in true achievable battery life.
Will see when notebooks are out.
@@Sam-jx5zy Yeah, AMD has been making headway in power gating but still isn't near Intel's way of doing it. As for Apple, well, that's all based on ARM, which is very efficient, but ARM is by its very nature simpler in how it processes requests and can't handle the long, complex instructions that x86 can. On the other hand they sip power, so pros and cons.
respect the heck out of the multiple 'this is the boundary of my expertise' comments. in my experience when someone says that it just reaffirms that everything up to that point is trustworthy, or at least honest. it makes your speculation more interesting.
not in the market for a laptop, but this tech space has been so cool to follow! i love where intel (and amd) is going.
The more things I learn, the more I realize how little I really know
@@HighYield Yep... but keep it up! :D Awesome to see great content like this on here.
High Yield explaining chip layouts is like music! I just nerd-out. This man is so good at the niche he is in. I wish this would be as financially lucrative as the value of the knowledge he espouses!
Yeah, this guy is the real deal. Really enjoy his content, even when it’s on x86
nerds
Much better detail than Asianometry, that's for sure
@@aerohk Jon is much smarter than me, he's just looking at the bigger picture most of the time. Like a certain technology, instead of a specific chip.
one characteristic of high IQ people is they pinpoint intelligent aspects in other people 🎩🧤🥂
It's insane that this channel went from small youtuber with sub-1000 views to being invited to international trade shows by one of the biggest chip makers in less than three years!
As someone who's here since the VTFET video I am amazed but also not surprised because the quality was there from the start.
Congratulations, and continued best wishes on your journey, which will hopefully take you far! 🍻
Thank you. I'm still surprised
@@HighYield Understandable, but you've definitely earned it.
I can’t wait until real high-res die shots of these chips will be available
Get a scanning electron microscope
I think they're doing some very interesting stuff with their SOCs. I'm happy they flew you out to the event, you're a great creator.
CPUs are going through somewhat of an architectural revolution. The days of simply adding more cores is over. The real innovation has begun.
It seems a CPU architecture revolution is underway. The days of simply adding more cores are over. Intel and AMD are now innovating a lot more than they have in the last decade or two, ARM CPUs are reaching insane performance levels, and neural processing is becoming much more prevalent. It seems that we have reached a turning point. It reminds me of the innovation that occurred during the move from single-core processors with huge pipelines to multicore processors with reduced pipelines.
ARM processors are still RISC-based while x86 is CISC. ARM does well for certain tasks, but it is a reduced instruction set
Frankly speaking, after 4 years of inertia Intel made something based on Apple M1 ideas 🤔😉
@@_EyeOfTheTiger A reduced instruction set is better, not worse, for performance per watt and compiled-code efficiency. A RISC set is enough for any task complexity, and much of the modern workload is vectorised, so RISC vs CISC makes little difference there; it depends more on the vector extensions. Big, complex decoders only waste chip space and energy in CISC, and afterwards everything is done in µops anyway (so Intel and AMD chips are really RISC chips with on-chip CISC-to-RISC translation added, since the i686 days 🤷♂️😂)
The problem is, all of this innovation is only coming because we haven't been able to make gains by upping core counts and building die on smaller process nodes, like you said.
This is because we're reaching a point where we simply can't gain more benefits from those methods. So after all of this restructuring of the Chip and optimization is done, my layman opinion is that we're going to plateau. These sorts of organizational innovations can't happen forever, at a certain point you've reached peak efficiency for the tech you have available
I think a lot of us would be surprised to know how many of these 'new' technologies were actually patented decades ago. It's just a matter of culture and economics I think. No?
Excited for this video!
Congrats on getting a press invite dude! You deserve it.
I love the idea of on package memory. It's fantastic to get the perspective of someone who sees this idea as an opportunity for improved efficiency and cost, rather than just a lack of upgradability.
First found High Yields channel 3 months back with the Zen 6 video and have become a fan ever since and have watched lots of his previous videos as well. Great content!
For a while I was expecting a big reveal of the Adamantine L4 cache... alas, it ended up being the side cache
I was hoping for that too
Adamantine is a separate cache tile that goes between the base tile and active tiles, so it can’t be on Lunar Lake with only 3 tiles. It’s possible for it to be on Arrow Lake as the tile implementation isn’t revealed, but I am doubtful of that.
@@dex6316 Adamantine is an active silicon base die which was rumoured to contain L4 cache. It is not a die that goes in between.
The battle is far from over, x86 still has a lot of bullets left to fight with
haha. this is an advanced arm soc copy at every level of the design, except for the instructions decoder.
@@PaulSpades yeah, just like how Snapdragon ditched the low-power cores for the laptops
Intel used to make great ARM chips in their XScale series, up until their Atom SoC push in mobile. But they still hold on to their ARM architecture license.
AMD also based their first competitive x86 products on their am29k RISC architecture.
x86 (or more accurately AMD64) is just a layer of backwards compatibility and nothing more.
They just directly copied the ARM SoC approach to x86, but just like for Qualcomm, it took 4 years just to copy the M1, so those bullets come out too slowly. Also, Apple can scale their chip to desktop level, but check the Snapdragon X Elite: if you increase the power consumption by 250% (from 23W to 80W), the extra performance gained is just 10%. So good luck making a desktop chip with that. The chip itself doesn't mean a lot if it's limited to laptops only, since the laptop market is only a small part of the PC market.
@@PaulSpades If it means backwards compatibility without emulation... I am not buying a Mac in everything but name. And if it costs like current ARM solutions, it will be reasonable even as the core of an actual PC. But the lack of RAM expandability is still a bit meh.
This is the best explanation of these things I’ve seen so far! I also don’t fully understand everything but I feel like you made it really easy and enjoyable to follow in one video. Thank you!
I suspect the L0 naming scheme does a few things. Allows for L2 to keep the same name since it has the same performance and size as previous gen and allows L3 to not be named L4, which may have some negative thoughts among the media (plus the potential for direct comparisons to AMD)
And naming their L1 L1.5 likely creates unnecessary problems on the SW side
Also it must be using virtual addresses like L1, the larger L1 still has to start physical address translation for cache misses.
So this small cache for recent data not in the register file may save energy by (usually) avoiding work.
My first thought was that the L0 is not shared per core, or micro code references to cache start at 0 and they're being more cohesive with naming.
@@mikeb3172 Cache coherency requires any cached data to be available to another core. But that requires physical addresses to obtain a read only or exclusive write to the memory cacheline.
If L0 is read only with L1 able to invalidate entries then the simpler faster cache type fits.
It sounds like a Jim Keller style idea that questions prevailing assumptions.
There used to be an intel generation during the stagnation years that had an L4 cache. DDR4 wasn't ready yet and the cores were memory-starved, so they put some cache on.
If 32 GB of soldered LPDDR5X RAM were the standard on every chip, then no one would have an issue with upgradability.
That is what they said 50 years ago about 640kB. You have no idea how much memory we might need 5 or 10 years from now. Maybe even only 2 or 3 years from now.
@@TheRealEtaoinShrdlu In the Windows 7 days, 2 GB of RAM was required (8 GB for the best experience). Now at least 8 GB is required for daily tasks and non-under-engineered games. So 2-4 times the RAM in 14 years, I'd say.
But 32 GB max... I'd say it may not be enough for ultra-professional 3D artists / music producers. But who knows
Two RAM options: 16/32 GB
Allow a user to add more RAM and use the "onboard" RAM as cache, or allow users to replace the SoC like they do on desktops.
That, or CXL 3.1 can access more RAM via a PCIe-enabled port and device.
There are already plenty of mobile use cases that don't need massive compute power but do need more than 32GB of RAM.
It's an understandable compromise at times but it would be nice if there were more memory options.
High Yield with the early Computex coverage!!!!
My first, but hopefully not my last Computex.
@@HighYield Aw, many more, High Yield. After all, I'm sure you've met many great people in the industry. More to come, I say! Also, will we get some Strix or even Turin content? Skymont seems very impressive. I feel like AMD is sitting on Zen5c, whose IPC is on par with Zen5; I'm saddened AMD didn't talk about it at all (perhaps at a future Hot Chips). They've left 8 Zen5c cores for consumer and the rest for Turin (dense). From what I've heard it's also a unified CCX, so no split cache, so much better latency (like Zen 2 to Zen 3); I don't know why they're sitting out on the design.
That said, with Turin dense the CCDs look massive, and I don't think it'll fit on AM5. I'm really interested to know why the Zen5c CCDs look larger than Bergamo's Zen4c CCDs. My thoughts lead me to it having 12 CCDs instead of the 8 in Bergamo. Could it be more GMI links, to fit more CCDs on package? Is that the reason why it's bigger? Could a 12-CCD Zen5c config fit onto the AM5 package socket?
Great video!
Thanks! Was really nice meeting you in person :)
Great analysis! Really a real breakthrough.
i really like ur transparency
M4 and Lunar CPU fight is going to be interesting.
Hopefully Intel becomes competitive against the M4 Max with Arrow Lake.
@@PKperformanceEU There is no way Intel will reach the M4 Max that quickly. Intel is good, but the last few years haven't been kind to Intel, or the last 10 years at that.
@@GlobalWave1 Most likely. But it'd be nice to have an alternative; if the M4 Max is more expensive than the M3 Max, I won't buy it
@@GlobalWave1 lunar lake is not a competitor to m4 max. its a competitor to m4
Amazing content!
Thanks for the disclaimer early in the video. A perfect example of why you are exemplary and trustworthy.
Great job on the video as usual bro! Thanks for the info and looking forward to seeing Lunar Lake AND Arrow lake hopefully later this year 🤞. Congrats on having Intel fly you out there too!
Technically you can upgrade the memory after purchase. You just have to be really good at soldering 😁.
I only mention this because I knew someone who did this to his MacBook. Bought an 8GB model and with some patience and skill it became a 32GB model 😅.
But did it help perf
@@wawaweewa9159 I don't know, but I know the device worked afterwards. I lost contact with him after his internship ended. But I think he wouldn't have done it if it hadn't improved performance.
Hahaha it's gonna be a fun DIY project
That works when the memory is soldered to the motherboard, not when it's on the CPU package.
@@mattbosley3531 very true but this was still an Intel Mac with soldered memory.
Interesting, the L0 pre-L1 may avoid work.
L1 caches use process-specific virtual addresses, with an address translation needed in parallel to validate that the tag isn't a clash with data from another thread. (There are some great CPU engineering lecture videos on YT that explain how L1 operates.)
The tiny pre-L1 shouldn't imply the real L1 is an L2 accessed by physical addresses shareable between processes.
So a pre-L1 cache ends up as L0 to avoid confusion.
Now for some speculation: this could be why Lion Cove dropped HT.
Without HT, a cache using logical virtual addresses could make energy-saving simplifications. If it is entirely flushed on thread changes and not shared between threads, no logical-to-physical address translation validation seems necessary.
The small size may mean it can be looked up fast enough to pre-filter L1, with misses going to L1 afterwards, or with the energy-inexpensive fast cases going to L1 simultaneously to complete faster, with later validation on an L0 miss.
So perhaps the underlying truth of the leaks was that HT went away to enable, in effect, a cache of the register file and the most recently used logical addresses, accelerating L1.
The address translation is needed to figure out which cache line in the selected set has the desired virtual address (and all caches still store the full physical address each line is for, regardless of whether they're initially addressed virtually). The reason for traditional L1 being virtually-addressed is specifically to allow doing the translation (aka TLB lookup) in parallel.
The reason such L1-s are so tiny is because they (ab)use the low 12 bits of physical and virtual addresses being the same (due to 4KB pages), and extend to 32KB or 48KB or whatever via just reading all 8 or 12 (aka associativity) possible matches, and selecting between them when the TLB result is gotten. A 192KB virtually-addressed cache would imply it reading an entire 48 possible cachelines (each being 64 bytes) on each access, which is utterly crazy.
That said, assuming that L0 and L1 accesses aren't done in parallel, by the time the L0 concludes that it doesn't have the asked-for data, the TLB lookup will have finished anyway, and thus the L1 will be addressable physically with no additional delay, like it would with a traditional L2.
@@dzaima The point is that in L1 the virtual address can be looked up, with the physical address translation done in parallel for validation to ensure it's from the right process. Two different physical cachelines can share the same logical page bits.
You don't want the latency penalty of translating virtual addresses first because it's slow.
Figuring out which cache line has the virtual address is back to front; process virtual addresses are mapped to physical memory via address mapping.
The question "what virtual address does this physical memory have" is meaningless, because it depends on which processes are sharing the memory page; you have a 1:n mapping. But the running process thread has a 1:1 translation.
All the code I compiled tried to use relative addresses with relocatable code to minimise such problems.
@@RobBCactive Virtual to physical address mapping isn't a 1:1 translation even within one process - it can be n:1, as a process can map the same page to multiple locations in its own single virtual address space (and this is useful - see "Mesh: compacting memory manager"). Thus, addressing a cache by a full virtual address is impossible to do correctly without still having some physical mapping check somewhere.
@@dzaima just another reason to avoid the need for it, I think you are ignoring the possibility of a read only cache that writes through via L1 with its translation.
Actual processing writes mostly to registers and then store operations.
If you include all the L1 features what is the benefit of the L0 cache? The address translation isn't going to magically complete faster.
@@RobBCactive I'm saying that it'd be pretty reasonable for the Lion Cove L0 to function exactly like traditional L1-s, and its L1 can largely function exactly like a traditional L2.
Haswell (2014) has a 32KB L1 with 4-cycle latency and 256KB L2 with 12-cycle latency, and it seems very possible to me that, with 10 years of process node improvements, similarly-structured caches (with the higher bandwidth of course) can map to Lion Cove's L0 and L1; and then the difference ends up being the modernly-sized extra level before the very-slow L3.
I suppose it would be possible that Lion Cove's L0 leaves write ops to L1, but that'd obviously result in a rather larger write latency (though perhaps that doesn't matter too much given store forwarding).
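The 12-bit page-offset trick discussed in this thread reduces to simple arithmetic: a virtually-indexed, physically-tagged (VIPT) cache stays alias-free only while its set-index bits fit inside the page offset, i.e. size ≤ page_size × associativity. A quick sketch, using the Haswell-style numbers from the thread above (not confirmed Lion Cove figures):

```python
# VIPT constraint: the set-index bits must fit inside the page offset,
# so the largest alias-free cache size is page_size * associativity.
def max_vipt_size(page_size=4096, associativity=8):
    return page_size * associativity

# Classic 32KB 8-way L1 fits exactly within 4KB pages:
assert max_vipt_size(4096, 8) == 32 * 1024

# A 48KB L1 needs 12 ways to stay VIPT:
assert max_vipt_size(4096, 12) == 48 * 1024

# A 192KB virtually-indexed cache would need 48-way associativity,
# i.e. reading 48 candidate cachelines per access:
print((192 * 1024) // 4096)  # -> 48
```

Which is why, as noted above, a 192KB L1-like cache addressed virtually would be impractical, and treating Lion Cove's L1 like a traditional (physically-addressed) L2 is plausible.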
I love that I came across this channel. I can't believe how good the quality of content you have while still being a sub 50,000 sub channel.
Pretty cool. Looks interesting on paper. We will see how it performs in real life.
Good effort from former IMG guys (among others). Congrats to the team
I remember watching your channel back when you had fewer than 1000 subscribers. It's good to see you getting big enough to be invited to Intel events
I've seen your username around for a while now, thanks for sticking with me
Look at you now! This is crazy, I remember watching your videos when you had 700 subscribers, and now you're getting invited to these events. Congrats!
Thanks! Tbh, it still feels like a dream to me. I'm enjoying it as much as I can.
@@HighYield I'm glad :) and it's well deserved! Your hard work is definitely paying off
Tbh the NPU is taking up an absurd amount of space. I think they have drunk far too much of the AI Kool-Aid
perfectly explained for me. thx
Lunar Lake looks like the biggest improvement for Intel in over a decade. In terms of performance per watt and GPU performance, it looks like Lunar Lake will beat Zen 5 and Qualcomm's X Elite. The only downside is that Lunar Lake is focused exclusively on thin-and-light laptops and handhelds; it's not their highest-performance mobile or desktop product. That is Arrow Lake, which looks great for performance but will lose some of the efficiency and iGPU gains Lunar Lake brings.
I love this channel so much
Exciting stuff! Great video, as usual. I do have one question, though. Is it certain that a server implementation (or any) of Lion Cove would have SMT? Also, different implementations of the same architecture sounds more like a standard vs Dense Zen situation to me, and I think that it could get expensive to develop lots of just slightly different cores
Yes, Lion Cove in Xeon will have SMT. And yes, LNC is also more flexible. Not really a “LNCc”, but there will be size differences.
@@HighYield Nice! Excited to see what they will come up with
Good job @Highyield, love your detailed reviews of this silicon. With respect to this video: Intel is finally catching up with the various ARM platforms, including Apple's M series and the Snapdragon X series.
Intel is back on the innovation track. I really want to build my next PC with Intel. And an Arc GPU 😅
They are copying apple/arm and using tsmc, like AMD. Can't wait to see them do interesting things besides 14nm++++++++
"innovation track --> build my next pc" reads like "nVidia has -90 class GPUs that are great, let me build my next PC with GT 1030 (DDR4)"
yes, they are looking for new ways to get us on another decade of 4 cores is more than enough
I'm seriously considering an LNL mini PC as an upgrade from my current 5600G mini PC and 7220U laptop. I feel like this thing can do it all with much lower power and heat output (which will make it more portable than my current AM4 mini PC). 32 gigs is seriously enough for me since that's where I'm at right now, and the heaviest thing I run is probably just War Thunder and RPCS3
It looks weird that the media and display engines are separated. They could switch the display engine and the 8MB side cache, but the media engine does need some cache (not 8MB)
crazy that you can soon build a tiny 4" by 4" workstation, totally fine for 3D, illustration work and code. this is the way forward.
Hello, great video. I wanted to ask you, now that both mobile laptop cpus from amd and intel are announced, which cpu do you think is superior overall? Taking everything into account would you go with lunar lake or strix?
Thanks
I think Lunar Lake has a real shot at the efficiency crown, but it does launch later, in Q3, while AMD will launch sooner. Always wait for reviews, but for battery life I think LNL will be best. Strix Point should win in raw performance with up to 12 cores.
@@HighYield AMD is usually late with actual shipping laptops though, no?
@@HighYield Thanks for replying. Patiently waiting on the Arrow Lake desktop reveal, as that is what I'm really interested in. I'm looking to upgrade to a new desktop with an RTX 5090, gonna go with whatever is faster, AMD 9000 or Arrow Lake. The vanilla Ryzen 9000 series kinda disappointed me a bit tbh, pretty much the same gaming performance as the previous gen. Have to see what the 9000 X3D chips have to offer.
I don't think they are comparable. AMD doesn't have something to compete with lunar lake given the low power target of lunar lake, and Intel has not announced what their answer to Strix Point is (though we all know it's going to be some variation of arrow lake).
Intel will win the efficiency battle against Strix Point, and it's very likely that their GPU will be very competitive with Hawk Point at lower power, but it is unlikely it will be able to touch Strix Point in GPU performance given that Strix Point has 16 CUs.
Overall, more and more excited for lunar lake. I think in a handheld form factor it's going to be very interesting.
@@sloanNYC The shipping may not be late, but the real issue is supply. Here in my country you're only able to find Phoenix Point/Hawk Point easily in gaming laptops, while the thin & light category is dominated by Intel.
I'm really glad to hear intel seems to be going as wide as possible. It seems like that is why Apple chips are so fast and efficient, not ARM doing magic or something
Apple have caches v. close to the CPU, reducing latency and energy for data flows.
Going wide doesn't help a lot of code, it inherently has serialising data dependencies.
Not sure if it's going wide that's helping here. From what I know, the efficiency of Apple chips came from 4 things:
- better manufacturing node (M1 was N5, everybody else was on 7nm. M3 was on N3, everybody else was on N5. With Lunar Lake, we're finally on even grounds here)
- on-chip RAM (while I hate non-upgradable RAM, I'm glad that Intel did this with Lunar Lake. There is a segment which clearly want battery life much more than upgradeable RAM)
- non bloated OS (nothing to comment here, Windows (and Microsoft) sucks, Linux doesn't have enough support to be perfectly tuned yet)
- laptop and motherboard design - this is much more subjective. The thing is that Apple actually prioritises battery life, while on the PC side it's usually the benchmarks. Which is why many laptops are much louder and warmer. I also know that simply having some extra ports - that is, merely having them exist, with nothing connected to them - can increase the minimum power required for the laptop to be on. Apple is famous for not having enough ports - I think this is also a reason for its efficiency
Edit: forgot to add, M chips being on ARM also help on efficiency ... but not so much as most people claim (as if it's the only thing). My gut feeling is that it helps like 5-10%.
As for the M chips being so fast ... other than the big memory bus width (up to 16!! channels on the Ultra chips) I'd say is also because of better manufacturing node. If you take the N5 and N4 and 5nm and 4nm generation of chips, Intel and AMD are better than M1 and M2. I mean, if you exclude the efficiency, Intel's Raptor Lake and Raptor Lake refresh which are on 10nm++++ are quite competitive even with M3 chips. Still, overall, the difference is not that big usually. The M cores/chips are clearly well designed.
I will neither understand, nor remember most of this, but it was interesting.
The core layout with everything right next to the memory controller makes sense, and I'm glad to see intel moving in this direction. It'll be super interesting to see how x86 power consumption improves with this layout!
This guy got me high this morning. He got the sample
Wake up babe High Yield posted 🔥🔥🔥
it is also interesting to see that the NPU has roughly similar TOPS per area as the GPU, so expect it to be very power efficient. That might also mean someone finds a way to overclock it, since hardware optimized for efficiency sometimes has insane headroom for overclocking.
Which are the channels mentioned at 10:40? Do you guys have channels related to these topics? I would love to follow more channels like this, I love them
Chips and Cheese and Anandtech. Both are websites.
I'm still waiting for a chip that integrates 32GB of HBM3e as an on-package L4 cache within the same SoC, while also supporting the addition of DDR5 memory modules with ECC capability, rather than being limited to just integrated memory.
bruh
the hell you talking about man lmfao
Do I understand correctly that a 128-bit memory bus is the same bandwidth as what we'd get with dual-channel DDR modules? Since each module (channel) is usually 64-bit? Just trying to understand the overall memory bandwidth; I know we also get latency benefits, and I'm not downplaying that.
Yes, it's 128-bit, like most other consumer chips.
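To put rough numbers on the question above (assuming LPDDR5X-8533, the speed widely reported for Lunar Lake): peak theoretical bandwidth is bus width in bytes times the transfer rate, and a 128-bit bus works out the same as two 64-bit channels:

```python
# Peak theoretical bandwidth = (bus width in bytes) * (transfers per second).
def peak_bw_gbs(bus_bits, mt_per_s):
    return bus_bits / 8 * mt_per_s / 1000  # GB/s

# 128-bit LPDDR5X-8533 (reported Lunar Lake config; an assumption here):
print(round(peak_bw_gbs(128, 8533), 1))      # -> 136.5

# Identical to two 64-bit channels at the same data rate:
print(round(2 * peak_bw_gbs(64, 8533), 1))   # -> 136.5
```

So bandwidth-wise a 128-bit bus really is equivalent to dual-channel; the on-package latency and power benefits come on top of that.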
X86 is not dead yet. Pat also seemed very excited for Panther Lake...
A cynic would point out Pat was excited about Raptor and Meteor Lake and even Sapphire Rapids.
These presentations have not been a reliable guide to what's delivered and when in recent years.
@@RobBCactive
Raptor lake was great
@@technewseveryweek8332 if you read the tech news, you'd know better.
he was happy when he offered all their unused fabs to amd and nvidia because they are going extinct and nvidia seems to have listened
@@betag24cn To be fair, the Xeon Sierra Forest server chip is on the Intel 3 node and, with 144 E-cores, has some advantages over Bergamo's Zen4c.
Will you do a video on Zen 5 and 5C? I'm interested in the capabilities of Zen 5C versus Lunar Lakes E-Cores and how the different paths they took paid off now.
C cores are wayyyy better than e cores
For sure, but idk when I'll find the time. Soonish I think.
Is this where the power and signal have been separated (on opposite sides)?
No, it’s on a TSMC node which doesn’t have backside power.
Depending on the details of moving data between the NPU and the GPU, using both at the same time could work really well for some AI workloads. Training a QLoRA, where the main weights are only used for 4-bit inference that could run on the NPU, while the backpropagation is done only for a low-rank adaptor in fp32 or fp16 on the GPU, could potentially work well. It won't be faster than a dedicated GPU; even a 3060 should outperform it. Memory bandwidth will likely also severely limit its performance. But often the issue with GPUs is not speed but available memory. Also, this should be much more power efficient.
It all will depend on software support, that is usually the issue with most non nvidia AI hardware.
Gunna slap this into a new rig once it's released 🙏
Always wait for final silicon reviews. Something can look great on paper and be meh in reality
Thank you for this educational content! Underdog Intel is striking back with a mean kick! This is an amazing SOC! Its real competitor is the Apple M4!
I think they should have used the empty silicon left in the die to make the gpu more powerful
We love competition between Intel and AMD
Seems interesting stuff coming ;)
Something not related to ai and NPUs :/
Does this mean the gpu can access all the memory like the m series aka unified memory?
It can access the 8MB GPU L2 cache and the 8MB memory side cache.
every intel iGPU can access all of system RAM
Speaking of, we REALLY need the dynamic iGPU memory allocation that Apple has. On Windows' side I can see why it's not implemented and why nobody talks about it, as Microsoft couldn't give 2 flying Fs about Windows, especially in the performance side. If it's not ads or tracking the user, then it's priority 7384, to be done in 15 years from now.
On Linux side I hope we'll see something, but usually GPU stuff comes from the manufacturer, so it would be Intel or AMD here for the iGPUs. And they're both busy on other areas, like the actual GPUs being competitive. And the drivers for Windows. Linux comes after that. Sigh.
@@Winnetou17 you do have dynamic igpu allocation on windows, on intel all of memory is accessible to the igpu (unlike ryzen lol)
@@sowa705 Oh, ok. I was under the impression that it's settled at boot time. I wonder then why Apple presented it (and people were wowed) as something new. I guess it was new for them.
Is Arrow Lake actually getting the 20A node, or, just like this, will it be using TSMC all around?
There will be some ARL parts on 20A.
I'd love to see something like this for desktops where I can get an entire SOC with 32-64GB of ram all bundled together. I know there are upgradeability concerns but the performance benefits if you over spec could be really good, especially for ram heavy applications.
Not even 50k subs and you're already getting free Computex trips? Damn, balling
I work for a major computer vendor and you're spot on. Your conclusion 110% speaks my mind and matches exactly what I've been saying since Intel presented us LNL 3 weeks ago. I said that if LNL nearly matches the battery performance of Qualcomm's ARM chips, this is going to be another Windows RT. ARM for Windows doesn't really offer a difference.
We already have more performance than needed, and NPUs are available en masse thanks to NVIDIA. It's just MS that, for now, firewalls the marketing bullshit storytelling about Copilot and blocks NPUs other than embedded ones from being recognized by Copilot, but this will probably change next year and they'll have to open the gates. What's left? Battery performance. Ok, but if this gets matched, what's the point of having the whole industry shift away from x86? Zero...
ARM will be the thing that made Intel rethink its architecture and, from there, its power efficiency, and that's a good thing.
Chad skymont!!!
I think we'll have to wait for the release of the Lunar Lake laptops and the benchmark scores, but if you simply multiply the Meteor Lake-H's 3DMark Time Spy and Fire Strike graphics scores by 1.5, you get TS: 5250, FS: 13800. In terms of desktop GPUs, that's close to GTX 1660 performance. In the country where I live there are several articles saying it's 50% better in performance than Meteor Lake-U, but if you multiply the GPU performance of Meteor Lake-U by 1.5, it only matches Meteor Lake-H's GPU performance. On a different note, is the presence or absence of hyperthreading related to the high single-thread performance of Apple silicon?
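The ×1.5 arithmetic in the comment above, written out (the Meteor Lake-H base scores of roughly TS 3500 / FS 9200 are back-calculated from the commenter's results, not official figures):

```python
# Scale Meteor Lake-H 3DMark graphics scores by the claimed 1.5x Xe2 uplift.
# Base scores are back-calculated from the comment's results (an assumption).
base = {"Time Spy": 3500, "Fire Strike": 9200}
scaled = {name: int(score * 1.5) for name, score in base.items()}
print(scaled)  # -> {'Time Spy': 5250, 'Fire Strike': 13800}
```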
I have two guesses as to why Meteor Lake and Lunar Lake (Intel 4 and later processes) are all mobile-oriented and not desktop: 1) the processes aren't suited to high-performance operation but do get better power efficiency; 2) managing fab-process capacity.
Not gonna lie, I was expecting Intel to showcase their own fabrication process, but they're still relying on TSMC. That's not bad (technically it's better), but it'd be interesting to see how much their own chip-making has improved.
There should be some Arrow Lake SKUs produced in Intel 20A.
@@HighYield Also, they're utilizing their Intel 3 process on Xeon 6, lest we forget. Server-market demand is orders of magnitude larger than consumer demand. I've seen so many AMD fanboys proclaiming that Intel's fabs are dead because Lunar Lake relied on TSMC, but that's not actually the case.
I'd be very interested in the thermal performance, since they're using TSMC manufacturing.
All we need now is to add a cache chip over everything and we will have amazing performance at ultra low power consumption.
Cool to see a nice bit of cache on the side to minimize DRAM access. L4 foresight on desktop? Probably not, but I love what I'm seeing from Intel this year, very exciting in more ways than expected. Maybe not quite leadership just yet, but at least on par. The whole E-cores thing is evolving into something, and I won't be surprised if it eventually gets to a point like Zen dense. So far it still looks like a split design mentality but a high-IPC philosophy, so the ability to use E-cores for most tasks will get the best out of the efficiency. Last time I was this excited was Alder Lake?
for *Corporate America's Automation of Middle Management,* e-cores are more than enough.
I'm glad we have such a vast choice of mobile CPUs these days: Apple M1/M2/M3, Intel Meteor/Arrow/Lunar Lake, AMD Hawk Point/Strix Point, and a new player, Snapdragon X Elite, on its way. We've never had a more difficult choice.
If those E cores are getting so good, I wouldn't mind having a budget option with just 6 E cores!
Those things are basically an 8th-gen i5 mixed with an Atom. If you want that, go buy one, don't wait for the future.
@@betag24cn That was the first generation of E-cores. Did you watch the video? Skymont E-cores have similar IPC to Raptor Cove (Raptor Lake), while being vastly more efficient.
@@__aceofspades Doesn't matter, the concept is stupid, it's fake. You don't glue together two CPUs unless you're in a panic. It's a dumb idea and points to the fact that your designs are terrible at not generating heat, thanks to absurd levels of power consumption. Doesn't matter.
That packaging is impressive. Certainly more complex than what AMD is doing.
Bro, it's been a while but I still have that love for Intel
The NPU is pretty gigantic compared to, for example, what Apple does. Curious about the performance, because Apple's are ridiculously fast for their size.
It will be interesting to see actual laptops.
What are the channels mentioned here?
You mean the memory channels?
@@HighYield You mentioned a couple of YouTube channels for more detailed technical explanations.
@@sirinath Anandtech and Chips and Cheese are publications on the internet with their own websites.
@@sirinath Like @dex6316 said, they're websites: Chips and Cheese and Anandtech.
I'm currently using a Zen 4 AMD CPU and have been an AMD user for several years now, but honestly Intel's next-gen stuff looks more compelling than Zen 5 and up. That is, if they can pull it off. I really like thread director being in hardware. Dealing with a gazillion cores and core types has been a weak point in software as of late, so I hope this can help.
Hey look at you doing better disclosure than LTT
Is there a good chance that all these architectural improvements will help Intel make much more efficient desktop CPUs in the near future? I'm really interested in the Small Form Factor space, and I think Intel has had a bit of a hard time competing there in recent years with their processors.
AMD was working on the same design. Supposedly, the NPU portion of Phoenix and Hawk Point was originally meant for cache, something called MALL cache, and it got axed because Microsoft requested an NPU.
SMT has been a perennial source of security vulnerabilities. It may still be useful for pure number-crunching like CFD or chemistry, but it's too much of a liability for multi-tenant use. Client PCs should be considered multi-tenant because they are always running code from the internet through JavaScript and the like.
How did they make it blue?
It only happens once in a blue moon!
While all of this is great stuff, here I am being hyped the most about hardware VVC decode 😅
I know this is a video about Lunar Lake but this video gets me really really excited for Battlemage and desktop products like Arrow Lake
If Intel could figure out a V-Cache competitor and commit to multiple years of support for a motherboard platform, they could make AMD straight-up unattractive on desktop. I say that as someone with a 7950X3D and invested in AM5!
I can't wait to see the next few years
Shouldn't Intel be exceeding, not matching, AMD's current offering to make AMD unattractive?
Or are the standards different for different companies?
@@aravindpallippara1577 Well, currently Lion Cove is projected to have higher single-threaded performance than Zen 5 cores. That single-thread lead will help with everything, including gaming. AMD has the biggest advantage in gaming rn with V-Cache, platform support and efficiency.
With Skymont, Intel has a real chance of a huge performance-per-watt uplift, particularly in multi-threaded loads, which is where Intel sucks down a comically large amount of power.
That's why I specified V-Cache and platform support would make AMD unattractive on desktop. Because Intel already has a decent chance of having class leading single threaded performance, adding V-Cache to an intel CPU would surely boost performance considerably (especially in games that love v-cache like Factorio, or Kerbal Space Program)
And platform support like we have with AM5 would be really great. Having to upgrade every 2 gens is a huge downside compared to AMD's offerings and commitment to 2027+ support and why I personally went for the 7950x3D and AM5. V-Cache and platform support is just great
The V-Cache is a solution for the slow memory controller on Zen processors. When you glue the RAM this close, you don't have that AMD problem, so there's no need for the same solution. The mystery cache is probably enough if Intel's engineers did their job well.
@@impuls60 Imo that's an F-tier comment. L3 cache is going to beat out faster memory just by virtue of the insanely high bandwidth and lower latency. There's a reason Intel loses in those games that favor 3D cache.
@@impuls60 Agreed with the commenter above. A CPU cache and RAM have vastly different uses: cache is very raw and hence very fast, as opposed to RAM, which has to go through the memory controller and OS-level checks before being accessed. Cache is still the king for single-thread performance.
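The cache-vs-faster-RAM argument in this thread can be made concrete with a simple average-memory-access-time (AMAT) model. All latency and hit-rate numbers below are illustrative assumptions, not measurements of any real chip:

```python
# Back-of-the-envelope AMAT model: why a larger L3 (higher hit rate)
# can beat faster DRAM. Numbers are assumed for illustration only.
def amat(l3_hit_rate, l3_latency_ns, dram_latency_ns):
    """Average latency for accesses that reach the L3 cache."""
    return l3_hit_rate * l3_latency_ns + (1 - l3_hit_rate) * dram_latency_ns

base = amat(0.70, 10, 90)       # normal L3, slow DRAM:   34.0 ns
fast_dram = amat(0.70, 10, 70)  # faster on-package DRAM: 28.0 ns
big_l3 = amat(0.90, 12, 90)     # V-Cache-style big L3:   19.8 ns

print(base, fast_dram, big_l3)
```

Under these assumed numbers, enlarging the L3 helps more than speeding up DRAM, which is the gist of the reply above; real results depend on workload hit rates.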
I foresee this mattering a great deal in the mobile-chip war. It's radical, but completely acceptable for the needs of laptops, handhelds, or mini PCs. But the lack of key specs like clock speeds or benchmarks suggests desktop performance might be on the disappointing end.
So which is better for gaming, Lunar or Meteor Lake?
Lunar Lake should be quite a bit stronger. But like always, wait for benchmarks.
So... it won't be possible to even add DRAM to the system afterwards? What the actual hell...
It's an SoC for laptops, but they might offer something for desktops later, or might not. Intel is like a kid lost in the forest at this point, copying everybody but scared all the time.
Looking forward to getting a windows laptop similar to the Macbook Air. I would love to have laptop without the need for a CPU fan, or maybe one that runs only during high workloads.
So there's no Intel process node in Lunar Lake? All from TSMC nodes?
The base tile is manufactured by Intel, and they also do testing + packaging. But yes, all the active silicon is TSMC.
@@HighYield If I'm not wrong, aren't they going back to in-house manufacturing for 2025? Basically everything is going to use Intel nodes.
You think it would be possible to do some gaming on this chip?
obv
@@XashA12Musk Better than the 780M in the Ryzen 7840HS/8840HS?
Games from 2010 and before, via those horrible Intel GPU drivers, hopefully.
@@hypocrisy5336 Not even by accident.
Those are laptop and tablet low-power APUs, old by now. Intel is trying to catch up but doesn't know how to run. Perhaps in 5 years, if Intel still exists by then.
@@hypocrisy5336 Yes. Meteor Lake already beats the 780M in various benchmarks and games. Lunar Lake has +50% performance per Xe core over Meteor Lake.
Chadmont 🫡
Instead of two LPDDR5X-8533 DRAM modules, they should've used two HBM modules, or a single stack. The bandwidth would be revolutionary and exactly what the iGPU needs to perform at console levels.
Intel missed a serious opportunity on that one.
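For rough context on the bandwidth claim above, here's the peak-bandwidth arithmetic, assuming Lunar Lake's 128-bit LPDDR5X-8533 interface and, purely as an illustration, one HBM2e-class stack (1024-bit at 3.2 Gb/s per pin), which is not something Intel announced for this chip:

```python
# Peak theoretical bandwidth = transfer rate x bus width / 8 bits per byte.
# Both configurations below are assumptions for illustration.
def peak_bw_gbs(transfers_per_s, bus_bits):
    return transfers_per_s * bus_bits / 8 / 1e9

lpddr5x = peak_bw_gbs(8533e6, 128)      # Lunar Lake-style 128-bit LPDDR5X
hbm2e = peak_bw_gbs(3200e6, 1024)       # one hypothetical HBM2e stack

print(round(lpddr5x, 1))  # ~136.5 GB/s
print(round(hbm2e, 1))    # ~409.6 GB/s
```

So a single HBM2e-class stack would offer roughly 3x the peak bandwidth, which is the commenter's point, at the cost of the interposer and packaging expense the reply below raises.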
HBM interposers are expensive. They're already soldering RAM modules onto the SoC and paying for one of the best nodes TSMC can offer, and these chips are supposed to end up in mid-range-priced laptops; you can't hit those prices while adding more manufacturing cost. That said, I agree that even a 256-bit bus would be great. And while HBM is designed as a low-power, very efficient memory, this chip is designed to use as little energy as possible, and I don't know if HBM is still the best choice at very low power. Maybe at ultra-low power it's unstable or draws too much energy compared to normal LPDDR.
I'm wondering why they would make it on TSMC N3B when N3E is already in production.
Intel had to take what was available when the order was made would be my bet.
Yup.
Sir, make a video on the 14900K and the upcoming Ultra 9 290K: whether it has an NPU and TOPS (in the CPU and integrated GPU), whether it has built-in RAM, and any other differences from Lunar Lake.
All I want to know is whether Intel can produce an APU good enough for me to ditch my Ryzen 7700/Radeon 6700 XT desktop.
that's a gen or two away...don't expect all that this soon lol
Their top GPU right now doesn't even reach 6700 XT performance, let alone an APU from this coming gen...
@@ofon2000 Oh well. Have to lug my combo around again... No, not seriously. LOL.
Keep in mind that AMD pushed APUs for decades, and look how far they got. Intel catching up will not happen soon, if it ever happens.
Lol. Thats a good one
Since they're using TSMC N3B, they're going to be very good and efficient.
Can't wait for the comparison with the Snapdragon.
Snapdragon will for sure be worse than this. Not comparable products.
@@impuls60 Depends on whether Intel decides to push high clock speeds or not; their last laptop chip has a 120 W peak power draw.