Hi Theo! IDK if you dig through the YouTube comments, but I saw this and really felt like people might come away with the wrong idea. You make a few good points, but I'd like to set a few things straight about RISC/CISC/Intel/ARM. Here are two helpful bits of info, in no particular order:
- x86 implementations very rarely have ALUs that completely implement every version of the 3,000-odd instructions; that's what microcode is all about. Under the covers, a modern Intel chip is a RISC-like system (kind of, it's complicated, asterisk-asterisk) running its own ""emulation"" (kind of, it's complicated, triple-asterisk) of the entire complex instruction set. YES: this does make the architecture, in at least some measurable ways, "more complicated" than ARM. Intel has really struggled on some architectures to optimize the tradeoff between "what do we build an accelerator for in silicon" vs. "what do we emulate in 7 clock cycles of microcode", but they're always at least trying to innovate here. NO: this is not new; I believe (would need to fact-check and I do not have my textbooks handy) that even the 8086 had a few complex instructions that were ""emulated"" in that way. They took several clock cycles because they were stepped processes like add/carry/jump etc.
- One of the main tradeoffs that my computer design textbook lists as The Difference between CISC and RISC is NOT speed; i.e. it is NOT necessarily about fewer OVERALL clock cycles. The book gives the example that while in theory a complex instruction (like multiply) COULD be implemented in one clock cycle on CISC, in practice it basically never is. The BIG DIFFERENCE the book points to (Computer Architecture and Design, I have an edition from 2016, not sure the edition number) is CODE SIZE, i.e. code density. In RISC, complex operations have to be _written out in your code_ as multiple instructions, whereas in CISC some sequences can be represented extremely compactly, and it becomes the processor's job to translate those individual instructions into ops during decoding. Why is this important? Cache lines! Denser code theoretically means higher throughput / fewer cache misses / more code resident in RAM, etc. This was EXTREMELY important back in the day when memory was an expensive commodity! That was ((at least one of)) the reason(s) that Intel made this tradeoff in the design of the 80-series: back when it was designed, you could literally fit more code into the same memory, and memory was PRECIOUS.
[Citation: while not a computer architect in practice, I studied low-level hardware in college under a professor who seriously knew his shit. We used Computer Architecture and Design as a textbook, and Prof Whose Name Shall Not Be Shared Lest I Get Doxxed was a prominent member of the IEEE, a practitioner in the 70s and 80s, and pulled stunts like bringing in an actual board of hand-wound core memory from an ancient PDP for us to look at. He knew his shit.]
I will happily go back through my class notes and textbook for actual references and a proper bibliography, if anybody cares lol
A proper bibliography would be nice 👍 I'd also like to know whether the same memory concerns are valid today. E.g. is fetching instructions from CPU cache still so much faster than from RAM that x86's denser code is still beneficial today?
@@BenLewisE i am sure someone will make that argument, but the real tradeoff today is between more, faster cache, and more cores. due to the relatively huge die sizes for cisc, they have to optimise for cache, whereas risc designs also get the option of having more cores and less cache. as this option is only available on risc, we need to wait and see which will be better in practice, but risc has a lot of other advantages, so in the long term, risc is going to win, the same way x86-64 beat x86.
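To make the code-density tradeoff above concrete, here's a tiny sketch (not from the video; the function is invented, and the assembly in the comments is just typical gcc -O2 output, which varies by compiler and flags). The same line of C becomes one variable-length read-modify-write instruction on x86-64, but three fixed 4-byte instructions on a load/store ISA like AArch64:

```c
#include <stdio.h>

/* Hypothetical example: one line of C, two very different instruction counts. */
void bump(long *a, long i)
{
    /* x86-64:   addq $1, (%rdi,%rsi,8)      ; one variable-length instruction,
                                               load + add + store folded together
       AArch64:  ldr x8, [x0, x1, lsl #3]    ; load
                 add x8, x8, #1              ; modify
                 str x8, [x0, x1, lsl #3]    ; store: three fixed 4-byte instructions */
    a[i] += 1;
}

int main(void)
{
    long a[4] = {10, 20, 30, 40};
    bump(a, 2);
    printf("%ld\n", a[2]);  /* prints 31 */
    return 0;
}
```

Denser encodings mean more of a hot loop fits in the instruction cache, which is the upside the textbook point describes; the cost is the variable-length decoding overhead other comments bring up.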
Wow, I couldn't watch that whole thing. Sometimes I forget Theo understands hardware worse than he understands Rust. This is an interesting topic, and thanks for bringing it to my attention, but I'm not going to get my information on the subject here. I get the feeling that Theo wants to do to X86 what JSSugar wants to do to JS because complexity he doesn't understand doesn't seem justified? Really common mistake, but you can avoid it by either sticking to where you are an expert, or actually learning more about things you talk about.
"x86 is in a rough state" No, I'd argue that "the state" of x86 is still "the best". When it comes to high-performance desktops, workstations, servers etc. there is really only one option right now, and that is x86. Software support is also still best on x86, including things like toolchains, compiler(-optimization) support, legacy applications, etc. - It's only in certain niches that other architectures are even a thing. Of course x86 still has it's problems, including that it is ancient, huge and has quite a bit of "cruft" - That's what I assume that x86 Ecosystem advisory board is about. But pretending that "x86 is in a rough state", and that it desperately needs saving when it is clearly still the best and default option is misleading.
The fact that there are 3500+ instructions means compilers completely ignore the vast majority of them. If you want to use them, you need to code in asm directly, and you need to know that they exist. If you have a couple hundred instructions, the compiler backend may use most of them; if you have thousands, 90% of them will never be used by the compiler. That's why Linus is on board.
@@lolilollolilol7773 In general, I would say compilers don't ignore instructions. Compilers typically have a "cost" value associated with instructions, and make decisions on which instruction to choose based on the lowest cost of a possible generated code snippet. For example, on x86 you often have indirect addressing in instructions, which can save you loads/stores and offset calculations, but means your instructions are larger, and possibly slower to execute compared to a separate load/store + ALU op. Your compiler is very clever about selecting the correct instructions (ideally, anyway), and regards the context as well, picking either the smallest instruction (-Os) or the fastest to execute (-O2). This takes into account the entire code block, so that if, for example, you need the offset into memory in a register at a later time, an instruction that under other circumstances might be less ideal gets chosen (in this example, a separate load and offset calculation, to have the offset in a register for a later operation, might be better than the shorter and usually ideal combined load-from-offset complex instruction). In fact, instruction selection in compilers is way more complicated than even this; it also considers pipelines, cache usage, etc. That's why it's important to tell your compiler about your CPU when performance really matters.
You are right that compilers often don't regard the entire instruction set, but that is mostly because you asked them to: if you use a generic CPU target, your binary will run on any x86 CPU, but won't make use of special instructions only some processors support (e.g. AVX, newer SSE versions, etc.). I'm sure there are instructions used so rarely that compilers didn't bother writing the selection/generation code, but I would guess that's quite rare. That's why you should compile performance-critical applications yourself with e.g. "-march=native", "-mtune=native", or manually specify the supported instructions.
This is BTW the same on ARM or RISC-V. Thumb(2), for example, is not included on every ARM core, and your compiler needs to know, or a slower fallback might be used even when support is available. RISC-V has extensions, for example, for multiply/divide, SIMD vector operations, virtualization, etc. Some of that is mitigated by automatically selecting the correct implementation at runtime, but this is not possible everywhere, and has its own problems.
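A minimal sketch of the "select the correct implementation at runtime" idea mentioned above, using GCC/Clang's __builtin_cpu_supports on x86 (the kernel names here are invented for illustration; real code would usually cache the check in a function pointer):

```c
#include <stddef.h>
#include <stdio.h>

/* AVX2 path: the attribute lets the compiler use AVX2 in this one function
   even if the rest of the binary targets generic x86-64. */
__attribute__((target("avx2")))
static long sum_avx2(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)   /* the compiler may auto-vectorize this with AVX2 */
        s += a[i];
    return s;
}

/* Fallback that runs on any x86-64 CPU. */
static long sum_scalar(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

int main(void)
{
    int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    /* __builtin_cpu_supports queries the running CPU, so the AVX2 path is
       only taken where it actually exists. */
    long s = __builtin_cpu_supports("avx2") ? sum_avx2(data, 8)
                                            : sum_scalar(data, 8);
    printf("sum = %ld\n", s);
    return 0;
}
```

This is the practical middle ground between a lowest-common-denominator build and -march=native: one generic binary that still uses newer extensions where the running CPU has them.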
That's not how instructions are handled at all. They get broken down internally into other, simpler operations, so no, you don't have separate handling for every instruction baked into the silicon.
Not for everything, but CISC does contain instructions that RISC does not. If you're compiling for a RISC-based architecture, you need to break those down into multiple simpler instructions.
7:20~ The number of instructions doesn't say much. Instruction decoding takes little to no silicon space compared to everything else, like cache, multiple ALUs, multiple FPUs, SIMD stuff and the pipeline, etc. (and this is PER CORE; add glue logic and multiply by the number of cores). Pretty much all the IPC gains we've seen since the 90s are because of smarter pipelining/branch prediction and the number of operations a core is able to do per clock cycle, not the ISA or how simple/complex it is. One could probably argue that a bigger decoder adds a very minuscule amount of additional draw, but all the other crap is the real culprit. More performance == more ctrl+c, ctrl+v of individual components inside each core.
you got so much wrong here. go watch that prime + casey video you mentioned, since it's relevant. the instruction set doesn't matter, since it all goes down to uops anyway.
@@si4745 Their chip fab business is uncompetitive with TSMC, and it's holding back the processor design business right now. But they can't split them apart because of patent encumbrances with AMD around x86-64 (and other shared instructions) which don't transfer through a sale. So by making an open deal to standardize the instruction set, the patent encumbrances are relieved and you can now split up Intel and sell pieces.
When you paraphrase it as "arm is eating our lunch now", I read an additional level: "If we had done this any earlier, we would've had the anti-trust whistle blown at us."
Decade? More like two if I remember correctly. I played Skyrim when it launched on Intel integrated graphics, which was on-chip. It really took 8 minutes to load a map.
x86-64 and ARM both decode instructions into micro-operations that are actually executed, so a complex instruction might actually become several loads, adds, multiplies, stores, etc. that get shoved down the execution pipeline. It is true, though, that x86 needs much more active silicon to do this decoding. A core piece of the inefficiency here is that each instruction may have a different byte width, which complicates the decoding logic; I don't believe this is true for ARM or RISC-V. It is crazy to me that AMD and Intel CPUs still support 8- and 16-bit memory modes in 2024. On Apple's video decoder: both Intel and AMD have dedicated accelerators on chip to do this work too, see Intel's Quick Sync released in 2011. Similarly, QuickAssist (QAT) offloads a lot of network packet processing.
^ THIS! I literally came directly from watching the one-and-a-half-hour video "The Magic of ARM w/ Casey Muratori" published by Prime just a few hours ago to this uninformed rambling.
Theo, I usually like your videos, especially your coverage of new JS implementation tech, but it's apparent in this one that you don't really know what you're talking about when it comes to the hardware. Virtually every statement or comparison of the architectures of x86/ARM/RISC-V in this video is misinformed somehow.
LOL they are not scared of ARM as much as you believe. The Qualcomm laptops are not selling well. Sure, they recognize ARM is a threat, but right now that threat is not even gaining a significant market share. It could very well become a flop, like the first time MS tried Windows on ARM. I think assaulting Nvidia's AI dominance is what is driving this cooperation.
Most computers sold each year are servers. ARM is becoming competitive with x86 on servers. x86 is not really used for AI workloads. This is about CPUs.
Business > consumer. You have apple silicon and arm on the consumer side but most are going to be x86 and x64. On the business side arm and risc are huge with aws, azure, gcp and even things like alibaba cloud all having arm servers and x86 ones.
Nvidia is the real threat for them. Nvidia is a manufacturer, not just a licensor, going at the heart of their business. Intel and AMD both tried to get a share of the upcoming AI market (selling shovels). They made a big bet on FPGAs by buying Altera and Xilinx respectively; turns out that was a dead end, at least so far.
That’s not how any modern processors work. They are basically Jit compilers for its variant of the assembly language. The actual operations are called uops which are summing, division, etc
6:30 You're comparing apples to oranges; ALL usable CPUs have built-in division! You're comparing a CPU that's used to blink LEDs with one that runs in a data server. Any serious RISC CPU (this includes RISC-V, look at all its possible variants) that runs in a server will have around 1000 instructions (integer operations, floating point, int/float vector operations, hundreds of operations for the OS, etc.), not to mention that some RISC chip manufacturers can add custom instructions to speed up certain operations.
Yeah, SIMD, system management, and security-related instructions put a relatively high floor on the number of instructions needed to run a modern fully featured operating system (Linux, Windows).
14:25 "wired instructions" there is noting weird here, this is same logic as adding dedicated instructions to better support AI. This is why old x86 had lot of operations to speed up string operations but now when memory is bottleneck most of them not make lot of sense.
x86 isn't a rough license. You literally can't license it. At this point the only reason AMD has an x86 license is because they made good products during the Athlon 64 days and ushered in 64-bit computing for the platform, and Intel licensed their design. They then did a perpetual cross-licensing deal and they've been tied together ever since. I am making this distinction because Intel blocked Nvidia from licensing x86 in the recent past, which is what led Nvidia to license Transmeta tech and produce interesting ARM-based architectures starting with the "Denver" cores. The lore is hella deep, Theo.
Love your vids Theo, but I'm getting increasingly tired of listening to "thing everyone does that doesn't work well", "which is why Apple does it this way", when they were far from the first to do so. To echo others, RISC != power efficiency by default. Modern CISC processors do not have silicon dedicated to old instructions that just takes up space doing nothing; those same transistors also execute the common, useful instructions. The best analogy I've seen: an instruction manual that comes in 15 languages when you only need one doesn't make it take any longer to put the thing together.
The take on VIA tells me this guy really needs to learn more before spouting off. There were a lot of points of know-it-all ignorance in this video, but the VIA one ruffled my feathers. VIA's mini-, nano- and pico-ITX boards running VIA's low-powered chips were awesome for embedded applications. I built a 15" touchscreen computer into the dashboard of my truck back in the early 2000s; VIA was the only game in town. I've used them in robotics, Nintendo emulators and so many other projects. VIA is still around, and they produced x86 up into 2021, before Intel purchased Centaur. They still have the license and are currently working with the Chinese company Zhaoxin to produce chips for the Chinese market. There are mini PCs on the market with the new chips in them now.
VIA is actually *a very old player* in this world - they've been manufacturing north- and southbridges for motherboards _since the 386 era._ If you owned a computer between 1995 and 2005, there's _a very good chance_ its motherboard had a VIA chipset linking the CPU to the peripherals like RAM+AGP (nb), IDE+USB+PCI/ISA slots (sb). Of particular note are the VIA Apollo chipset which functioned as the flagship stuff for Intel Pentium/Pentium II/Pentium III and Celerons of the era, and the VIA KT series which did the same for AMD Athlon/Duron/Sempron CPU lines. They kinda fell out of the spotlight once the northbridge got "sucked" into the CPU itself, and the dev of PCIe kinda made southbridges mostly irrelevant as they were.
Sorry Theo but Arm has way more than 232 instructions. ARMv8 has well over a thousand. Can you explain what exactly is in the x86 64 ISA that would prevent someone from building a dedicated chip for encoding video? And how ARM is somehow better for this..?
@@insanemal Are you talking about the microcode? I guess I never considered that to be RISC based, since it's not the interface that is targeted by compilers, nor assemblers. x86 architecture may be physically built around a RISC cores, but it still seems to me that the x86_64 instruction set is still CISC
@@MNbenMN Well you'd be disagreeing with the creator of the first x86 chip to work like this. But hey, if you're willing to disagree with him, I'm willing to point and laugh at you. So win, win I guess
@@insanemal We are clearly talking about different things. The instruction set for x86_64 is still as complex as ever, no matter how the internals of the microcode decoder translation works. The core may be simplified, but the instruction set is not. Maybe you are an RISP?
Theo, if you haven't seen some of the interviews Jim Keller has done where he gets into technical details (namely the one with Lex Fridman, if I remember correctly), you should go watch some. The ISA war is a distraction from the microarchitecture and fabrication-capability wars.
2:32 "seamless interoperability across hardware and software platforms" - x86 is worse than ARM? But I can have a bootable USB that works on 99% of x86 machines, whereas for ARM you need images specific to your device! "delivering superior performance" - I know Apple has paid for that really nice newest TSMC node again, but where's the rush to get AAA games running on ARM? My gaming machine is x86, AMD has V-Cache, and the Xbox and PlayStation are both x86. And it's not purely for porting-to-ARM reasons; many big engines have an export-to-Android/iOS button, and my current small gaming project has code to handle Linux vs. Windows, not x86 vs. ARM. I let the compiler do that!
6:18 Well, I'll give Theo points here, he's not completely wrong. However, there are advantages to having more instructions. And you can't have any realistic conversation on this without understanding micro-ops. ALUs on x86 haven't seen x86 instructions for possibly decades... there have been more instructions than ALUs for a long time.
6:38 Others have already called out div being a bad example, but those complicated x86 instructions will use more I-cache on ARM after translation. And if you have I-cache problems, you have problems at the start of the CPU's pipeline, not the middle/end like with D-cache. A way better example would be the variable-length instruction problem x86 has.
8:11 Why are we still having the CISC vs. RISC debate? Both sides try to move to an in-between point once they've been making chips long enough. ARM added stuff because of JS! And x86 feeds more RISC-like instructions into the backend.
11:18 Um, Intel Quick Sync exists... Everyone making a general-purpose compute chip has included (or required you to include a separate GPU for) video decode hardware.
15:32 The CPU manufacturer doing weird stuff (out-of-order, speculative execution, branch prediction...) is how all of them get performance now. Apple is at the same risk as Intel and AMD. Heck, Prime even covered an article where Apple chips have a security issue because they were looking at bit patterns in cache, in hardware, and prefetching them regardless of whether it was a pointer or not...
15:44 FYI, Spectre and Meltdown affected almost every CPU made in the 1-2 decades before their discovery, as long as the chip designers cared about performance.
Honestly, one of x86's biggest benefits is the duopoly, so seeing them come together should worry ARM and RISC-V; anything x86 already does better (looks at UEFI...) could become a real problem for them.
Many (probably most) x86 instructions are now "micro-coded": they get turned into and pipelined as several simpler ops, meaning x86 CPUs don't actually have that many ALUs.
This entire video is utter nonsense.
1 - Yes, RISC, as the name implies, is a Reduced Instruction Set Computer. Both ARM and RISC-V are RISC architectures. ARM is a proprietary one; RISC-V is open.
2 - Having a reduced instruction set on your CPU is far worse in terms of speed AND development. You have way more to program, and that will hinder your computational output. This is why x86 is to this day faster than any RISC CPU, be it ARM or RISC-V. Meaning: if the CPU does not have, say, a vector arithmetic operation that exists in x86-64, then the user-space software has to make that calculation in multiple steps, which drags the CPU response down. Imagine a 10 MB input (say 10 bytes per data row), i.e. 1 million operations: at 3 GHz (single core, one operation per cycle, to simplify) that is about 3.3e-4 seconds. If the RISC CPU does not have the instruction for that operation, and the user-space software's algorithm needs, say, 40 instructions to make the same calculation, the response on that RISC CPU running at the same 3 GHz clock will be about 3.3e-4 x 40 = 0.013 seconds (minimum, there are more overheads)!
3 - Yes, x86 chips are going to be more difficult to produce and therefore more expensive and less energy efficient; you need to keep all those billions of extra gates fed. They are indeed more complex because they are more feature-rich.
4 - There is of course a caveat about speed: if an application is strictly single-purpose, for example server work that does not need a big set of arithmetic instructions, and the platform can always add external facilities (SSL/encryption offload, for example), then in that particular scenario a RISC-based CPU (ARM or RISC-V) can indeed compete with x86, because for the same lithographic area there will be more cores on a RISC-based CPU than on an x86-64 one. That is indeed an advantage for RISC processors. For generic computing, no, x86-64 is going to win.
5 - The idea that RISC is lower power again depends on how the RISC chip is used.
6 - x86 can always develop chips for custom-purpose applications with a reduced instruction set. I know that does not sound good, but it is possible to remove some instruction sets from the x86 platform to make it more efficient. And the range of applications is actually broader than for RISC CPUs.
7 - The big, huge, insurmountable problem is that actually no one needs the sophistication of x86. And that is the big problem for both AMD and Intel: everyone uses a smartphone, and they are plenty sufficient for most user-space applications these days. The fact remains that RISC is capable of providing an excellent user experience without needing the 1,350-something instructions printed on a CPU. And that is the big problem Intel and AMD are facing... hence the panic mode.
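For what it's worth, here's what the vector-vs-scalar gap in point 2 looks like in code; a rough sketch with SSE intrinsics (the data and function names are made up, and ARM's NEON and RISC-V's V extension provide the same kind of instruction, as other comments note):

```c
#include <immintrin.h>
#include <stdio.h>

/* Scalar fallback: four separate adds, one element at a time. */
static void add4_scalar(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}

/* SIMD version: one ADDPS instruction adds all four lanes at once.
   (On ARM the equivalent would be NEON's vaddq_f32.) */
static void add4_sse(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

int main(void)
{
    float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, s[4], v[4];
    add4_scalar(a, b, s);
    add4_sse(a, b, v);
    printf("scalar: %.0f %.0f %.0f %.0f\n", s[0], s[1], s[2], s[3]);
    printf("sse:    %.0f %.0f %.0f %.0f\n", v[0], v[1], v[2], v[3]);
    return 0;
}
```

The "one instruction vs. many" gap is real whenever one side lacks a vector unit (or the compiler can't use it); with NEON or the RISC-V vector extension available, both sides get the single-instruction version.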
the main point i would pick you up on is your use of advanced cisc instructions vs single core risc, which is largely a straw man argument but useful as an illustration in some circumstances. the real battle now is between a small number of fast but power hungry cisc processors which need lots of cooling, and much higher core count risc processors which are cheaper to design, build, and run. this is why your average smartphone already has at least 10 cores, while your average desktop does not. yes, i know multithreading inflates the thread count, but the results are in, and the downsides are enough that it is starting to go away.
@@grokitall I mentioned the "Server usage" scenario ... lots of cores with specific simple tasks = more performance. Core count wins. In that case risc wins. On the mobile side obviously the less power hungry platform also wins. There is not even a serious attempt from the X86 platforms to that sort of environment. I think we both agree on the technical issues. My point though was not even the technical issue but the Usage issue. The computing world is turning to Mobile in a huge way, has been for more then a Decade. The total number of smartphones sold per year is up to 900 Million, a huge number. The number of all PC's (laptops and desktops) tops at about 300Million ... that's 1/3 the number of mobile devices... (add to this Smart_tv (ARM inside 99% of all models sold) , network equipment (home routers), car multimedia and so many other applications ... and the number of risc (basically ARM) cpu's to X86 is about 4-1 ..maybe 5-1 ... Both Intel and AMD understand that if the RISC cpu's starts to reach the PC "ecosystem" ... it's a Big Big problem. Specially because the most sold computers are the Low-spec ones .. exactly were RISC can have a Huge role ... And we can see that happening at ever great numbers .. Hence this never dreamed before "alliance" between Intel and AMD. It is indeed Panic mode.
@@John_Smith__ i largely agree, but it is not just mobile where risc wins due to power, but the data center too. the issue here is that cisc uses a lot of power, and produces a lot of heat. moving to a larger number of slower processors which use less power overall, and do not produce anywhere near as much heat saves the data center a fortune. as most of these loads are platform neutral, and you can just recompile linux for arm on the server, it does not have the windows lock in. even worse, as more devices go risc, the software moves to apps and websites, further killing the need for windows. and of course for a lot of tasks, the linux desktop is already good enough, so you have a triple threat. i cannot wait to see the companies panic as they realise that people outside the west cannot upgrade to 11 and 12 due to costs being too high, and scramble to find some other solution.
@@grokitall Dear friend, I've used Linux since 1997, full time as my single installed OS since 2001! I still remember those days: VMware had an awesome client that let you mirror entire PCs with extreme ease... that was the end of windoze for me on any of my computers. From that day on I've used Linux as my main OS. I witnessed the birth and evolution of Linux from almost day 1! Something epic! And the example you gave, I mentioned and forecast more than a decade ago! Server-side data center work is basically two things, database access and web serving, along with some other simple programming, and all of that is already present and highly optimized on RISC (mainly ARM). I even go further in my predictions: I think it is just a matter of time until both Intel and AMD start to make very heavy investments in RISC silicon. Yes, it sounds insane, but Intel (and AMD) can always go the way of RISC-V; I joke that they can rename it CISC-V 😀. This of course if they take a hit on their server-side offerings. I've been following the ARM server offerings now and then, and I remember the 128-core ARM server parts from a couple of years ago... yes, it is coming in full force. And of course we agree on the cost factors in data centers: power and heat. When loading a data center with, say, 20,000 servers, it has a huge impact on cost whether each server runs at 500 W with 64 threads or 300 W with 128 threads; the TCO is drastically in favor of RISC, hands down. But again, server-side RISC is not quite there yet, although they already own, or at least could own, the let's-call-it "mid-power" PC market, especially laptops.
There are a lot of limitations on CPU performance that make the actual instruction sets less important. However, from an open source or even just open market perspective, simplification allows more unsophisticated entries into the market, faster. Having said that, depending on the software, combinations of instructions can be more efficient time- or energy-wise; some tasks such as video encoding are a specific use case to optimize. The biggest gain is actually Apple's Rosetta, which translates and caches conversions from one instruction set to another. This allows software to move separately from hardware; x86 dominates partially because of Windows' dependency on it. It also allows all of Apple's software to benefit from hardware optimizations without recompiling for a specific CPU. Android has something similar, with Java bytecode as the ISA: apps are both JITed and compiled optimally for the phone's ISA.
Gotta hand it to the comment section, they've generally had the right reaction to a bunch of things here. No hate; I don't think your reaction is bad, just the details and why this is interesting are worth noting.
1. x86 and ARM both have the same problem in that they are licensed. ARM, though, has a single company (ARM itself) steering it. You were right to highlight the fractured nature of AMD and Intel, but they have also added features like video encoding to their chips in the same way ARM vendors have, and that is done not through ISA improvements but through libraries like libva accessing those blocks.
2. The number of ISA instructions shouldn't really matter to you as a dev, but it has a lot of implications from a design-complexity standpoint. The compiler decides how to generate code for x86, RISC-V and the ARM variants. Stuff like Box64 and Rosetta just do that after the fact; it shouldn't be a huge issue for applications overall, but it does affect speed. If the processor has more cores because the complexity is lower, it can do more things at once; if the ISA is more complex, it might not need as many instructions to complete a task. There are tradeoffs at both ends of the spectrum.
3. RISC-V isn't open source, it's an open standard, and it can run into the same fragmentation issues as other ISAs; you just don't have to license it and can propose extensions to the foundation. Another point about standards: CUDA already competes with an open standard, OpenCL, which both AMD and Intel support. The ISA discussion here would be more about introducing specific instructions to accelerate those workloads, but there is no real hint that a new CUDA alternative is coming down the line. Also, AMD can support CUDA as long as it isn't using Nvidia libraries or code (see the Google v. Oracle case on Java use in Android: you can't copyright APIs). Similarly, WINE on Linux does conversion of calls.
My hot take is that this is going to be a great cleanup of x86-64 from both Intel and AMD: they will push those changes to compilers after they disable those instructions, and it will give a bit more longevity to the platform. But I think if you asked both whether they see ARM on desktop being a primary platform in the future, they would probably say yes regardless of the partnership. A key point, though, which no one seems to mention, is who the partners are in this effort: they are all cloud providers. Cloud is still and will still be x86-64 based for at least another few decades because of how slowly server compatibility moves. CISC makes a lot more sense in that domain anyway, because it is less about power usage for your household or battery life on a laptop and way more about complex workloads and efficiency. So I don't see much movement in cloud at all, other than maybe a bunch of edge stuff going to ARM or RISC-V in the medium term.
you are fundamentally wrong about cloud. while cloud dumped x86 in favour of x86-64 well before the desktop did, it was because we could see well in advance the problems that would kill it. the problem facing x86-64 is it is only relatively cheap due to economies of scale due to windows, and for the data center it uses too much power, and needs too much cooling. windows is gradually going away, which is obvious to everyone outside the usa, and to microsoft which will shrink the market even further. the data centers have already decided to move to risc with linux to solve the power and cooling problems, and the only major question is how fast.
"arm can barely be called reduced instructions these days" - something I heard twice in two days. "I just know exactly enough about architecture" - theo. doesn't sound like he knows enough
“RISC-V” is typically pronounced as “risk five.” The “RISC” part refers to “Reduced Instruction Set Computer,” which is pronounced “risk,” and the “V” represents the Roman numeral for 5, indicating it’s the fifth iteration in the RISC architecture lineage.
Go watch Prime's video with Casey Muratori. I kinda like Theo, but the amount of straight-up misinformation here is shocking. If you have no clue whatsoever about something, why do you talk about it? For example, if one were to implement all the optional ARM instructions, an ARM core would also end up in the range of 2-3k instructions. The difficulty in translating/emulating the x86 instruction set on ARM has nothing to do with the number of instructions. It's simply a) because the instructions need to be translated to begin with and b) x86 instructions have variable lengths vs. fixed length on ARM, so decoding them introduces a lot of overhead. ARM is not easier or cheaper to manufacture either. ARM has also had microcode for a veeery long time. ARM cores are very beefy these days and come with all the features x86 chips have, such as cryptography.
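To illustrate the fixed-vs-variable-length point with real numbers, here are a few actual encodings printed by a toy program (nothing here is from the video; the immediate values are arbitrary):

```c
#include <stdio.h>

int main(void)
{
    /* x86-64: instruction length varies from 1 to 15 bytes, so the decoder
       can't know where instruction N+1 starts until it has chewed on
       instruction N. */
    unsigned char nop[]     = {0x90};                               /* nop               (1 byte)   */
    unsigned char mov_eax[] = {0xB8, 0x2A, 0x00, 0x00, 0x00};       /* mov eax, 42       (5 bytes)  */
    unsigned char movabs[]  = {0x48, 0xB8, 1, 0, 0, 0, 0, 0, 0, 0}; /* movabs rax, imm64 (10 bytes) */
    printf("x86-64 lengths: nop=%zu, mov=%zu, movabs=%zu bytes\n",
           sizeof nop, sizeof mov_eax, sizeof movabs);

    /* AArch64: every instruction is exactly 32 bits, so boundaries are
       known up front and many decoders can run in parallel trivially. */
    unsigned int a64_nop = 0xD503201Fu;  /* nop */
    unsigned int a64_ret = 0xD65F03C0u;  /* ret */
    printf("AArch64: every instruction is %zu bytes (nop=%08X, ret=%08X)\n",
           sizeof a64_nop, a64_nop, a64_ret);
    return 0;
}
```

That boundary-finding work, not the raw count of instructions, is where the extra x86 decode effort people keep mentioning actually comes from.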
RISC and CISC have nothing to do with performance. In RISC, every opcode is either a 16-bit (Thumb) or 32-bit wide instruction; in CISC, the instruction size depends on the registers and on which instruction is used, and so on.
the reason this matters is that in modern synchronous designs, all of the instructions are slowed down to fit the time needed for the slowest part, which is slower on cisc. by dumping some of the slower instructions, risc gets a speedup for free. when the research was done on asynchronous arm chips in the 90s, an improvement was seen, but the funding went away and we never got to find out if synchronous clocks were fundamentally a bad idea, or just a simplification which was not expensive enough to dump.
@@grokitall Sorry makes no sense, instruction Sets are numbers not transistors , old instruction sets can be for example be converted or translated to newer instruction set, the alu unit can be renewed And so on, this why i say RISC or CISC has nothing to do with performance. They are just numbers, and even if they slow down the instruction set for,the slowest part, then that is a fault on intel design , but not the instruction set.
@@jasonbuffalo6845 if it was just logically equivalent to having a firmware x86 / x86-64 to risc microcode jit, then i would agree with you, but even some of the cisc processor designers say that the larger instruction set increases complexity, which in turn increases transistor counts. that is before you consider the failure of these people to work with compiler writers to make some of these more complex instructions into compiler macros instead, reducing the need to keep them on the chip. this is also a problem for testing, which is historically why you can recognise which intel x86 cpu you are running on by spotting which hardware bugs need working around.
@@grokitall Here is the truth about why ARM is better than Intel: it's the modular design. Companies can add extra functionality to their CPUs, which they can't do with Intel. Apple and Qualcomm are the best examples, with their video decoding engines and their NPUs, and they use them to good effect. What does Intel have? Nothing: no NPU, no extra video decoding hardware, so companies like Dell or HP can't bring anything to the table the way Apple or Android vendors can. And all of this has nothing to do with the instruction set: these extra modules increase the instruction set, the complexity and the transistor count, but the performance keeps getting better. Intel was sleeping on its high horse and now they have catching up to do, and the instruction set is not the problem, it's Intel's engineers.
@@jasonbuffalo6845 not sure i agree with that. because the extra hardware does not report its presence, it is hard for the os to detect, and thus use automatically, which can cause all sorts of issues. i agree intel got complacent, they do that every so often.
When Microsoft decides they don't want to support x86, Intel and AMD will die the same day. And Microsoft is building their own chips, so it's a matter of time. I see no future for x86.
Hi, I just want to add my two cents. I do think the comparison of instruction counts is a very popular fact, but it's very misleading.
Cent 1: the presented count for ARM might be correct, but it's for the 32-bit version (with 16-bit support); the version that gets traction now is the AArch64 variant, which is quite different.
Cent 2: because the instruction count is smaller, there are times when there are more instructions per line of C code. Also, both instruction sets mainly use just a handful of instruction types in day-to-day operation, and I suspect there are some tricks applied at the silicon level that make those instructions faster to decode for both architecture types.
My personal opinion is that ARM (AArch64) might start to get some bigger traction in the consumer space, especially the laptop market, but enterprise will be a very slow adopter (honestly, I don't see it coming until Microsoft makes a huge effort to port and stabilise absolutely everything in their business suite), and AArch64 servers might just be a novelty rather than the norm. I am also curious about the evolution of RISC-V and AArch64 in the future; it would be quite funny to see a future where these two end up with a considerably higher number of instructions because of needs they weren't actually designed for.
11:15 If you go to the spec pages for most modern CPUs from AMD/Intel, they explicitly mention the extensions they have, e.g. AES or codec encode/decode. They've been doing it for ages; it's why things like Quick Sync let you run a Plex server pretty efficiently.
The instruction set doesn't necessarily make a chip more efficient. x86 turns the ops into uops, so the CISC vs. RISC differences aren't really there. Uarch is distinct from instruction set. ARM is a good example: compare ARM Neoverse to ARM Cortex, or compare the new X4 cores as seen in the Dimensity 9400 to the cores in the latest Snapdragon X Elite or Apple M4. Same ISA, different IPC metrics.
RISC-V has open instruction slots anyone can assign to new tasks. That's what ARM is, RISC-V with a proprietary set of a bunch of specific instructions tacked on to the end, I think. Meta is also designing FPGAs and ASICs on RISC-V, along with Google, Apple, MSFT and Nvidia.
The explanation is so incredibly false it might as well be misinformation. Some things I noticed that are completely untrue:
- RISC vs. CISC is a stupid talking point; not a single hardware person is talking about it. Modern CPUs, including the first x86 CPUs, execute things in a pipeline, and the instruction set only affects the implementation of the decoder. Admittedly, x86 isn't easy to decode due to the variable instruction length, but that has nothing to do with the ALUs or the number of instructions.
- "Apple is so big-brain they know how to add a different chip." Dude, tell me which x86 CPU instruction is for Quick Sync? None! Because Quick Sync is its own separate unit.
- "ARM doesn't have instructions for division." How do you think ARM chips calculate division, then? An instruction doesn't need to be literally called "division" to be considered an instruction for division.
- "GPUs have much simpler instructions." No, they don't. Most newer GPUs these days have tensor cores which can do tensor operations, and I'm pretty sure that's not a thing for CPUs. Also, it's really not about the complexity of the instructions, it's about whether you need them or not. It's absolutely stupid to add an instruction for mouse and keyboard on a GPU, for example, but it's not that stupid an idea for x86 to have it.
- "CISC makes it harder for companies to deal with the hardware." It's literally the opposite. x86 is much easier to work with than ARM when you are doing real programming.
risc vs cisc is not being talked about directly, but it is fundamental to whether the future is cisc with bigger caches, or risc with way more cores and smaller caches. and the answer to that is we don't have the data yet to know which way the future will jump, but the answer will come from smartphones and risc in the data center.
11:45 Wrong, x86 has dedicated video encoding hardware. Not a separate chip like Apple uses, but embedded in the existing chip. Intel has its Quick Sync core, which supports H.264 and AV1; AMD has Video Core Next, which supports H.264, H.265, VP9 and AV1. I have heard of the x86S effort; it's interesting, and I know AMD had a similar idea a few years ago to drop 8-bit and 16-bit support, but had pushback from customers that ARE STILL USING 8-BIT AND 16-BIT, like WHAT THE HECK. The fact that there are still customers using new AMD and Intel products to run in 8-bit and 16-bit mode is wild.
I think they are wasting their time. Unless x86 gets forbidden by law, it isn't going anywhere anytime soon. Much like ICE gasoline and diesel fuel! I repeat, x86 is not going anywhere anytime soon!
x86-64 is being kept in power by windows, which is only relevant on the decreasingly used desktop. netbooks died because smartphones and tablets increasingly displaced the form factor, and chips got good enough that entry level laptops crashed in price. x86-64 is in a similar position. linux has killed windows in the data center, which are now moving to risc because of power and cooling. it never got anywhere in smartphones, which are all risc, and because of this more things are moving to web apps in data centers. then you have the excessive hardware needs of windows 11 and 12, which mean that developing countries will not be able to afford even pirated copies of windows, accelerating the existing trend to ditch it in favour of linux. nobody ever got fired for buying ibm, until they lost the market to the pc they helped create. microsoft are diversifying away from windows as they can see which way the wind is blowing, so the only remaining question is how amd and intel pivot when the wintel duopoly dies.
@@zalnars7847 so you are seriously claiming that the addition of the tpm chip for windows 11 and the ai chip for 12 are not increasing the price for the base windows hardware, and that this will not be an issue for anyone needing to replace multiple machines, nor for all of the people outside the west, who mainly run pirated windows as they currently cannot pay more than a months wages for the existing machines, and thus will not be able to run even pirated versions of 11 and 12. india already has 15% linux desktop usage for these reasons, and 11 and 12 will make this grow faster. just because some wintel fanboys in the west can afford the latest microsoft tax does not mean everyone can. the other points have similar supporting reasons.
Funny how yet again, they'll do anything except lower prices or give more features. It took forever to get more than 4 cores, AVX-512, VNNI, legible naming, profiling tool compatibility, etc. Also, remember how all CPUs have been overvolted at stock for the past couple of years?
actually it was getting 2 cores to work right, then getting os support for it which took forever. it only really took off when they started hitting clock speed limits, so going dual core was the only option.
actually you are wrong. if the processors get much higher clock rates, you have to stop sending the signals down wires, and start sending them down plumbing, which is why they stopped chasing clock numbers and moved to something else. the main thing keeping the x86 & x86-64 alive is the wintel duopoly, which is slowly dying. data centers need cooler less power hungry processors, which they can move to now that linux is getting better risc support. the desktop is being eaten by smartphones and tablets on one end, and by web apps on the other, neither of which need windows. the windows market will get further punished due to the hardware requirements for 11 and 12 making it so that developing countries cannot even use pirated versions, as they cannot afford the hardware. the writing is on the wall for both windows and cisc, as cisc is only being kept alive due to windows. i hope they get a plan b before it is too late.
Please dig into the topic more next time; you completely missed every point. x86 has a thing called micro-ops; it's like ARM underneath, but you use a higher-level abstraction over it, so everything in the video is wrong.
Clock speed scaling fell out of favor in the Pentium 4 days because clock speeds above 5-6 GHz require exotic materials; we've hit the limits of silicon. That's why cores have been getting "slower but wider" and other strategies were introduced, such as ILP/pipelining, SIMD, multicore, and currently chiplets. We left scale-up for scale-out, but scale-out hits limits due to Amdahl's law. So it's almost pick your poison; maybe fixed-function will make a comeback. I speculate we are near the end for deterministic processors and the next step will be fuzzy processors, where we run statistical models to determine the closest answer and, depending on the precision needed, that inference might be sufficient.
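Since Amdahl's law came up, a quick back-of-the-envelope sketch (the 95% parallel fraction is just an assumed number for illustration) of why scale-out stops paying off:

```c
/* speedup(n) = 1 / ((1 - p) + p / n), where p is the parallel fraction. */
#include <stdio.h>

static double amdahl(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    double p = 0.95;                      /* assume 95% of the work parallelizes */
    int cores[] = {1, 2, 8, 64, 1024};
    for (int i = 0; i < 5; i++)
        printf("%4d cores -> %.1fx speedup\n", cores[i], amdahl(p, cores[i]));
    /* Tops out near 1/(1-p) = 20x no matter how many cores you add. */
    return 0;
}
```

Even a generously parallel workload caps out quickly, which is the pressure behind fixed-function blocks and other "pick your poison" strategies mentioned above.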
Sure, is this good? Yes. But it's a patch, ultimately. The problem with x86 is that it's OLD; it goes back to 1978. I'm sure there has been some cleanup, but it still carries decades of backwards compatibility. The advantage of ARM, RISC-V or any new architecture is that you don't have all the dead weight, and it's easier to create from scratch than it is to go through and trim the fat. And with as much legacy code and as many legacy systems as exist, how many companies will even be able to use a stripped-down x86 standard?
except that arm was developed at acorn in the 1980s, so "it's old" is the wrong argument. actually, the idea behind cuda also goes back to the 1980s, as does the work on multicore processor systems.
guys.. he did say the way he simplified architecture would hurt. I'm fairly certain Theo knows better. This is what Terry Pratchett would have called "lies to children". I'm curious, does anyone here have a better explanation that does not require previous knowledge or more explaining?
I have two questions (although I know the answer, I want you to think about it). Given an ARM and an x86-64 system, which is going to perform better on web servers? And here I'm talking about the performance-to-efficiency ratio (for a server, performance is all about serving user requests after processing; by efficiency I mean power consumption, not CPU efficiency). Which does better? And the second question: would you choose a Mac (ARM-based) or an ARM-based Windows machine for 4K gaming at 120 fps, or rather Windows on x86-64? If you consider this, I think AMD/Intel are in good condition. And I have one question of my own: is there any way to make x86-64 more power efficient, I mean without adopting ARM principles?
I don't think the ISA determines power usage. ARM has historically been power efficient because they target the mobile market and so have gone in that direction, but recently we've seen Intel getting quite good power usage. The microarchitecture of the chip is what matters for power.
I'm not surprised to see Linus supporting this; he's been an ARM hater for a long time. His take on this can be justified because ARM is a clusterf*ck at the architecture level (custom ISAs), with many vendors deriving from the ARM standard specs, making supporting, maintaining and optimising an OS a real nightmare. Funny how the same problem plaguing the OS landscape at the software level (i.e. Linux distro fragmentation) can now exist at the hardware level and alienate its creator.
PSA: I am a hardware engineer and there is an astonishing amount of misinformation in the hardware part of this video. I know it's coming from a JS dev so no flame, just thought I'd let you know.
(P.S. microarchitecture determines power efficiency **NOT** instruction set)
any others?
Thanks, can you (or anyone else) provide any details on what exactly in that section is misinformation?
Nice way to say very little with so many words.
Just remember, millions of software engineers are deploying multi-megabyte websites that display less than a kilobyte of text every day...
also shame on him for not mentioning the OISC architecture 💔
I'm kind of shocked at how much stuff is wrong in this video.
5:40 This is incorrect, all of those instructions are decoded into microcode that's pipelined on more granular execution units, like those ALUs. Each unique instruction doesn't imply a unique hardware unit. Both x86 and modern ARM processors decode to micro-ops, which is why there's no inherent difference between them after decoding. The only meaningful difference worth debating is decoding fixed vs. variable length instructions, which hasn't been brought up at all in this video.
6:50 There's no dedicated chip or coprocessor in the M-series processors for Rosetta. They use just-in-time and ahead-of-time translation, and simply map x86 instructions to the ARM equivalents. Some x86 extensions (like SSE) also map to the ARM variants (NEON). The instructions that don't directly map are emulated with multiple instructions. Of course, there's a ton of other sneaky tricks Apple engineers added (like with memory ordering), but AFAIK there's no special silicon for hardware-accelerating dynamic binary translation like what's being implied here.
7:20 I struggle to understand how this benefits power efficiency. What do you mean "add more chips"???
8:35 "More chips" is confusing here, again. I feel like you're trying to imply a focus on instructions per clock (IPC) rather than clock speed? IPC isn't related to the ISA at all, that's up to the design teams and microarchitecture. You can't just pump up IPC by simply adding ALUs, either. If only it was that easy....
11:00 Intel and AMD have provided hardware-based codecs in their integrated graphics for a long time now. Instead of hardware encoding, you can do software encoding which *can* be accelerated with SIMD instructions, which Apple also supports. There's nothing Apple is doing in this regard that's better than Intel. It's incredibly wasteful to provide a specialized instruction for something like video encode, so it should be decoupled from the ISA as much as possible. Instead, it's better to add generic instructions for generic encode/decode algorithms, like SSE/AVX.
12:28 AMD can't officially support ZLUDA (the project I assume you're referencing), but not because it's "not an official standard". They were funding it, but their legal department chose to axe it, I assume because they anticipated legal action from Nvidia. Instead, they've been making efforts with HIP for a while now, which is an abstraction layer on top of CUDA and ROCm (ROCm being AMD's solution vs. CUDA).
12:36 AMX isn't a direct competitor to CUDA. I wouldn't even consider any CPU extension to compete with CUDA, they're just not comparable.
13:44 I don't see how looking at this block diagram demonstrates how bloated x86 is. AMD could easily adapt this microarchitecture for ARM. Notice how the bulk of the diagram is after the decode stage. This article has a ton of other issues...
14:09 MPSADBW is an instruction for sums of absolute differences, which ironically is a very useful and relevant instruction for video encoding 😅. This is a good example, because it's generic enough to be used for other digital signal processing applications, not just video encoding.
15:22 Spectre/Meltdown isn't related to the ISA at all. There wasn't a particular x86 instruction that was problematic, it was instead a massive oversight related to implementations of branch prediction and speculative execution. This affected all CPUs at the time, even some ARM CPUs. This opened up an entire class of security vulnerabilities, and we're still getting newly discovered CVEs to this day, even affecting Apple's M-series.
Overall I think there's a fundamental lack of understanding of how a CPU works, and other false assumptions in the hardware space. There are a few other things I missed, which I see some other folks in the comments section pointed out. You seem to have a big platform, so I hope you redo or amend this video to mitigate the misinformation.
Thanks.
Ooof. That first point you mention is something I learned in undergraduate school in Computer Science, lol. How the heck does someone who built a YT channel on programming not know about microcode?
11:20 this is false. Intel does put dedicated video encoding hardware on x86 CPUs. And they did it way before Apple even thought about it.
😂😂😂😂
I assume he's speaking about SoCs, in which case there are x86-based SoCs.
Quicksync absolutely rules. Video encode/decode is one thing where Intel straight up wins if they can leverage the hardware. Both on CPU’s and on ARC (for the 8 people who bought one, me being one of them)
@@jonathanjones7751 Yeah, the absolute power it has. I use an N100 mini PC ($120) as a media server and it can transcode 4K video while sipping just 7W of power. That's insane value.
Quicksync is so underrated. I run BlueIris servers on N100 mini pcs at different business locations and it's amazing how many video feeds I can stream with very little power.
I know this is coming from a JS dev, but almost everything about the architectural comparison between x86, ARM and RISC-V in this video, other than the bare numerical facts, is wrong... Maybe he should have done a little more research or let someone with a lot more knowledge of the field explain...
You want Theo to be informed about a topic before talking about it, that’s asking a bit too much no?
He asks for forgiveness. He gets to keep live-streaming his misinformed thoughts, keeping his work minimal while maximizing the number of people he reaches as a JS dev.
He has a bit of an ego, and I still find it funny that he tried to make dark mode a feature and thought he would be praised.
Reminds me of how I used to wonder how people lived off of URL shorteners, since it's such a simple tool, and now I can see. Enough fools and you support another fool.
Hi Theo! IDK if you dig through the YouTube comments, but I saw this and really felt like people might come away with the wrong idea. You make a few good points, but I'd like to set a few things straight about RISC/CISC/Intel/ARM. Here are two helpful bits of info, in no particular order:
- x86 implementations very rarely have ALUs that completely implement every version of the 3,000-odd instructions, that's what microcode is all about. Under the covers, a modern Intel chip is a RISC-like system (kind of, it's complicated, asterisk-asterisk) running its own ""emulation"" (kind of, it's complicated, triple-asterisk) of the entire complex instruction set. YES: This does make the architecture overall, in at least some measurable ways, "more complicated" than ARM. Intel has really struggled on some architectures to optimize the tradeoff between "what do we build an accelerator for in silicon" vs "what do we emulate in 7 clock cycles of microcode". But they're always at least trying to innovate here. NO: x86 does not implement every instruction in dedicated silicon, and this isn't new; I believe [would need to fact-check and I do not have my textbooks handy] that even the 8086 had a few of the complex instructions that were ""emulated"" in that way. They took several clock cycles because they were stepped processes like add/carry/jump etc.
- One of the main tradeoffs that my computer design textbook lists as The Difference between CISC and RISC is NOT speed; i.e. it is NOT necessarily about fewer OVERALL clock cycles. The book gives the example that while in theory complex instructions [like multiply] COULD be implemented in one clock cycle on CISC, in practice they basically never are. The BIG DIFFERENCE the book (Computer Architecture and Design, I have an edition from 2016, not sure the ed number) points to is CODE SIZE (density). In RISC, complex operations have to be _written out in your code_, whereas in CISC some segments can be represented extremely compactly and then it becomes the processor's job to translate those individual instructions into ops during decoding. Why is this important? Cache lines! Denser code theoretically means higher throughput / fewer cache misses / more code resident in RAM, etc. This was EXTREMELY important back in the day when memory was an expensive commodity! That was ((at least one of)) the reason(s) that Intel made this tradeoff in the design of the 80-series. Back when it was designed, you could fit literally more code more compactly into the same memory, and memory was PRECIOUS.
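To make the code-density point above concrete, here's a minimal sketch. The assembly in the comments is hypothetical codegen; real output depends on compiler, flags and register allocation.

```c
/* Same C source, different encodings. A CISC compiler can fold the memory
 * access into the arithmetic instruction, so the hot loop body is denser in
 * the instruction cache; a load/store RISC spells the steps out.
 *
 *   x86-64 (illustrative):   add  eax, DWORD PTR [rdi+rsi*4]   ; one variable-length insn
 *   AArch64 (illustrative):  ldr  w8, [x0, x1, lsl #2]         ; two fixed 4-byte insns
 *                            add  w9, w9, w8
 */
int sum_indexed(const int *table, long i, int acc) {
    return acc + table[i];
}
```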
[citation, while not a computer architect in practice, I studied low-level hardware in college under a professor who seriously knew his shit. We used Computer Architecture and Design as a textbook, and Prof Whose Name Shall Not Be Shared Lest I Get Doxxed was a prominent member of the IEEE, a practitioner in the 70s and 80s, and pulled stunts like bringing in an actual board of hand-wound Core Memory from an ancient PDP for us to look at, he knew his shit]
I will happily go back through my class notes and textbook for actual references and a proper bibliography, if anybody cares lol
that citation part is pretty foking impressive ngl
A proper bibliography would be nice 👍 I’d also like to know if the same memory concerns are valid today? E.g. is storing the instructions in CPU cache significantly faster than RAM and thus x86 still beneficial today?
Great 👍🏻
Thanks for this. I was going to make a similar comment, but you went into much more detail than I would've. 💯
@@BenLewisE I am sure someone will make that argument, but the real tradeoff today is between more, faster cache, and more cores. due to the relatively huge die sizes for cisc, they have to optimise for cache, whereas risc designs also get the option of having more cores and less cache.
as this option is only available on risc, we need to wait and see which will be better in practice, but risc has a lot of other advantages, so in the long term, risc is going to win, the same way x86-64 beat x86.
Wow, I couldn't watch that whole thing. Sometimes I forget Theo understands hardware worse than he understands Rust. This is an interesting topic, and thanks for bringing it to my attention, but I'm not going to get my information on the subject here. I get the feeling that Theo wants to do to X86 what JSSugar wants to do to JS because complexity he doesn't understand doesn't seem justified? Really common mistake, but you can avoid it by either sticking to where you are an expert, or actually learning more about things you talk about.
… and when he opens his mouth about AI/ML
"x86 is in a rough state"
No, I'd argue that "the state" of x86 is still "the best". When it comes to high-performance desktops, workstations, servers etc. there is really only one option right now, and that is x86. Software support is also still best on x86, including things like toolchains, compiler(-optimization) support, legacy applications, etc. - It's only in certain niches that other architectures are even a thing.
Of course x86 still has its problems, including that it is ancient, huge and has quite a bit of "cruft" - That's what I assume that x86 Ecosystem Advisory Board is about.
But pretending that "x86 is in a rough state", and that it desperately needs saving when it is clearly still the best and default option is misleading.
The fact that there are 3500+ instructions means compilers completely ignore the vast majority of them. If you want to use them, you need to code in asm directly. And you need to know that they exist. If you have a couple hundred instructions, the compiler backend may use most of them, if you have thousands, 90% of them will never be used by the compiler. That's why Linus is onboard.
@@lolilollolilol7773 In general, I would say compilers don't ignore instructions.
Compilers typically have a "cost" value associated with instructions, and make decisions on which instruction to choose based on the lowest cost of a possible generated code snippet. For example, on x86 you often have indirect addressing in instructions, which can save you loads/stores and offset calculations, but means your instructions are larger, and possibly slower to execute compared to a separate load/store + ALU op, for example. Your compiler is very clever about selecting the correct instructions (ideally, anyway), and regards the context as well, picking either the smallest instruction (-Os) or the fastest to execute (-O2). This takes into account the entire code block, so that if, for example, you need the offset into memory in a register at a later time, an instruction that under other circumstances might be less ideal gets chosen (in this example, a separate load and offset calculation, to have the offset in a register for a later operation, might be better than the shorter and usually ideal combined load-from-offset complex instruction).
In fact, instruction selection in compilers is way more complicated than even this, it also considers pipelines, cache usage, etc.
That's why it's important to tell your compiler about your CPU when performance really matters. You are right that compilers often don't regard the entire instruction set, but that is mostly because you asked them to: If you use a generic CPU target your binary will run on any x86 CPU, but won't make use of special instructions only some processors support(e.g. AVX, newer SSE versions, etc.). I'm sure there are instructions that are used so rarely that compilers didn't bother writing the selection/generation code, but I would guess it's quite rare.
That's why you should compile performance-critical applications yourself with e.g. "-march=native", "-mtune=native", or manually specify the supported instructions. This is BTW the same on ARM or RISC-V. Thumb(2) for example is not included on every ARM core, and your compiler needs to know or a slower fallback might be used even when support is available. RISC-V has extensions, for example, for multiply/divide, SIMD vector operations, virtualization, etc.
Some of that is mitigated by automatically selecting the correct implementation at runtime, but this is not possible everywhere, and has its own problems.
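A small illustration of the point about telling the compiler what CPU you have. The flags shown are standard GCC options, but the exact instructions emitted are not guaranteed; this just shows how the target flags change what the backend is allowed to select.

```c
/* scale.c
 *   gcc -O2 -c scale.c                -> baseline x86-64 codegen (SSE2 only)
 *   gcc -O2 -march=native -c scale.c  -> the auto-vectorizer may emit AVX/AVX2
 *                                        (or AVX-512) forms if the host supports them
 * The C source never names an instruction; instruction selection is entirely
 * the compiler's cost-model decision, as described above. */
void scale(float *dst, const float *src, float k, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * k;   /* simple loop, a good auto-vectorization candidate */
}
```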
@@Maxjoker98 Ok mr smarty-pants!!
That's not how instructions are handled at all. They get broken down internally into other, simpler operations, so no, you don't have separate handling for every instruction baked into the silicon.
This, there’s the decode phase. There might be a slight slowdown on that phase but instructions can run in little chunks
Not for everything, but CISC does contain instructions that RISC does not. If you're compiling to a RISC based architecture, you need to break the instructions into more simple instructions
7:20~ Number of instructions doesn't say much. The instruction decoding takes little to no silicon space compared to everything else like cache, multiple ALUs, multiple FPUs, SIMD stuff and pipeline etc (This is PER CORE, add glue logic and multiply by number of cores).
Pretty much all the IPC gains we've seen since the 90's are because of smarter pipelining/branch prediction and the number of operations a core can do per clock cycle, not the ISA or how simple/complex it is.
One could probably argue that a bigger decoder adds a very minuscule amount of additional draw, but all the other crap is the real culprit. More performance == a bigger number of ctrl+c, ctrl+v copies of individual components inside each core.
Also: x86 hasn't been x86 since the Pentium Pro or something. They chop everything up into microinstructions that are fairly similar to RISC.
Some videos should not exist, this is one of them
you got so much wrong here. go watch that prime+casey video you mentioned since its relevant. the instruction set doesnt matter since it goes down to uops anyway.
This isn’t about saving x86. This is about intel getting their AMD patent agreements undone so they can sell the company for parts
??? Intel is trying to sell their company ?
@@si4745 Their chip fab business is uncompetitive with TSMC, and it's holding back the processor design business right now. But they can't split them apart because of patent encumbrances with AMD around x86_64 (and other shared instructions) which don't transfer through a sale. So by making an open deal to standardize the instruction set, the patent encumbrances are relieved and you can then split up Intel and sell the pieces.
When you say there’s a paraphrase of “arm is eating our lunch now”
I read an additional level:
“If we had done this up until now we would've had the anti-trust whistle blown at us”
Hahahaha that's pretty funny. Maybe that's actually the reason hahaha
I guess Theo wasn't aware of integrated graphics that have existed for a decade at this point?
Yeah AMD's APUs and Intel Iris Graphics are nothing new.
A decade? More like two if I remember correctly. I played Skyrim when it launched on Intel integrated graphics, which was on-chip. It really took 8 mins to load a map.
Screw x86, all my homies use x64
x86 comes in 2 flavors. 32- and 64-bit.
.. don't be scared when you discover your "64-bit" CPU only uses 48-bit addresses in reality.
Not the students on tiananmen square
U mean x86-64? Its the same shit 😂
Haaaaaha
x86_64 and ARM both decode instructions into micro-operations that are actually executed. So a complex instruction might actually become several loads, adds, multiplies, stores, etc. that get shoved down the execution pipeline. Very true though that x86 needs much more active silicon to do this decoding. A core piece of the inefficiency here is that each instruction may have a different byte width, which complicates the decoding logic, whereas I don't believe this is true for ARM or RISC-V. It is crazy to me that AMD and Intel CPUs still support 8- and 16-bit memory modes in 2024.
On Apple's video decoder: both Intel and AMD have dedicated accelerators on chip to do this work too. See Intel's Quick Sync, released in 2011. Similarly, QuickAssist (QAT) offloads a lot of network packet processing.
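A toy sketch of why the variable byte width mentioned above complicates decoding. This is not a real decoder; insn_length here is a placeholder stub.

```c
#include <stddef.h>

/* Placeholder: real x86 length determination has to parse prefixes, opcode
 * maps, ModRM/SIB and immediates before the length is known. */
static size_t insn_length(const unsigned char *bytes) {
    (void)bytes;
    return 1; /* stub so the sketch compiles; not real decoding */
}

/* The start of instruction i+1 depends on the length of instruction i, so a
 * wide decoder has to guess byte boundaries and throw away wrong guesses.
 * With fixed 4-byte instructions (AArch64, or base RV32/RV64 without the
 * compressed extension), the boundaries are known up front and the stream
 * splits trivially across decode slots. */
void walk_instruction_stream(const unsigned char *code, size_t n) {
    for (size_t off = 0; off < n; ) {
        size_t len = insn_length(code + off); /* serial dependency on the prior insn */
        /* ...hand bytes [off, off+len) to the next pipeline stage... */
        off += len;
    }
}
```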
I’d recommend having a chat with Casey Muratori (he’s been on prime’s stream) think you could learn a lot from him, he’s deep in the x86 world.
^ THIS! I literally came directly from watching one and a half hour video "The Magic of ARM w/Casey Muratori" published by Prime just a few hours ago to this uninformed rambling.
Theo, I usually like your videos, especially your coverage of new JS implementation tech, but it's apparent in this one that you don't really know what you're talking about when it comes to the hardware. Virtually every statement or comparison of the architectures of x86/ARM/RISC-V in this video is misinformed somehow.
LOL they are not scared of ARM as much as you believe. The Qualcomm laptops are not selling well. Sure, they recognize ARM is a threat, but right now that threat is not even gaining a significant market share. It could very well become a flop, like the first time MS tried Windows on ARM. I think assaulting Nvidia's AI dominance is what is driving this cooperation.
Then why do you think the two parties in a duopoly are joining forces? Nvidia.
Most computers sold each year are servers. ARM is becoming competitive with x86 on servers.
x86 is not really used for AI workloads. This is about CPUs.
Business > consumer. You have apple silicon and arm on the consumer side but most are going to be x86 and x64. On the business side arm and risc are huge with aws, azure, gcp and even things like alibaba cloud all having arm servers and x86 ones.
Nvidia is the real threat for them. They are a manufacturer, not just a licensor, going at the heart of their business. Intel and AMD both tried to get a share of the upcoming AI market (selling shovels). They made a big bet on FPGAs, buying Altera and Xilinx respectively; turns out that was a dead end, at least so far.
Think about it. Intel could start cranking out RISC-V chips tomorrow if the market demands it, why should they be worried?
That's not how any modern processor works. They are basically JIT compilers for their variant of the assembly language. The actual operations are called uops, which are adds, divisions, etc.
6:30 You're comparing apples to oranges; ALL usable CPUs have built-in division! You're comparing a CPU used to blink LEDs with one that runs in a data server.
Any serious RISC CPU (this includes RISC-V, look at all its possible variants) that runs in a server will have around 1000 instructions (integer operations, floating point, vector operations in int/float variants, hundreds of operations for the OS, etc.), not to mention that some RISC chip manufacturers can add custom instructions to speed up certain operations.
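For what it's worth, the division point is easy to check: on 64-bit ARM a plain C division compiles down to a single hardware divide instruction, same as on x86-64. The assembly in the comments is illustrative; exact output depends on compiler and flags.

```c
/* x86-64:  cqo + idiv x          (sign-extend, then divide)
 * AArch64: sdiv x0, x0, x1       (single instruction)
 * Both ISAs have hardware integer divide; "no division" only really applied
 * to some older or smaller ARM cores. */
long quotient(long a, long b) {
    return a / b;
}
```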
Yeah, SIMD, system management, and security-related instructions put a relatively high floor on the number of instructions needed to run a modern fully featured operating system (Linux, Windows)
14:25 "wired instructions" there is noting weird here, this is same logic as adding dedicated instructions to better support AI. This is why old x86 had lot of operations to speed up string operations but now when memory is bottleneck most of them not make lot of sense.
x86 isn't a rough license. You literally can't license it. At this point the only reason AMD has an x86 license is because they made good products during the Athlon 64 days and ushered in 64-bit computing for the platform, and Intel licensed their design. They then did a perpetual cross-licensing deal and they've been tied together ever since. I am making this distinction because Intel blocked Nvidia from licensing x86 in the recent past, which is what led Nvidia to license Transmeta tech and produce interesting ARM-based architectures starting with "Denver" cores. The lore is hella deep Theo.
Love your vids Theo, but I'm getting increasingly tired of listening to "thing everyone does that doesn't work well", "which is why Apple does it this way", when they were far from the first to do so.
To echo others, RISC != power efficiency by default. Modern CISC processors do not have silicon dedicated to old instructions that just takes up space doing nothing. Those same transistors can also execute common, useful instructions.
The best analogy I've seen: an instruction manual that comes in 15 languages when you only need one doesn't make it take any longer to put the thing together.
The take on VIA tells me this guy really needs to learn more before spouting off. There were a lot of points of know-it-all ignorance in this video, but the VIA one ruffled my feathers. VIA's mini-, nano- and pico-ITX boards running VIA's low-powered chips were awesome for embedded applications. I built a 15" touchscreen computer into the dashboard of my truck back in the early 2000's. VIA was the only game in town. I've used them in robotics, Nintendo emulators and so many other projects. VIA is still around and they produced x86 up until 2021, before Intel purchased Centaur. They still have the license and are currently working with the Chinese company Zhaoxin to produce chips for the Chinese market. There are mini PCs on the market with the new chips in them now.
VIA is actually *a very old player* in this world - they've been manufacturing north- and southbridges for motherboards _since the 386 era._ If you owned a computer between 1995 and 2005, there's _a very good chance_ its motherboard had a VIA chipset linking the CPU to the peripherals like RAM+AGP (nb), IDE+USB+PCI/ISA slots (sb). Of particular note are the VIA Apollo chipset which functioned as the flagship stuff for Intel Pentium/Pentium II/Pentium III and Celerons of the era, and the VIA KT series which did the same for AMD Athlon/Duron/Sempron CPU lines. They kinda fell out of the spotlight once the northbridge got "sucked" into the CPU itself, and the dev of PCIe kinda made southbridges mostly irrelevant as they were.
Changing the default page size from 4 KB to 16 KB would go a long way towards giving x86 some runway for the future
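If you want to see what you're actually running with, the page size is one sysconf call away. On typical x86-64 Linux this prints 4096; Apple Silicon macOS uses 16384; AArch64 Linux kernels can be built for 4K, 16K or 64K pages.

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* _SC_PAGESIZE is POSIX; the value is a kernel/ABI choice, not an ISA one */
    printf("page size: %ld bytes\n", sysconf(_SC_PAGESIZE));
    return 0;
}
```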
Sorry Theo but Arm has way more than 232 instructions. ARMv8 has well over a thousand. Can you explain what exactly is in the x86 64 ISA that would prevent someone from building a dedicated chip for encoding video? And how ARM is somehow better for this..?
Yep that x86 needs to die article is freaking wrong in MANY ways.
x86 has been RISC based for quite some time.
The x86 instruction set is reduced how?
@@MNbenMN x86_64 isn't directly executed.
Hasn't been for ages. x86_64 is decoded and executed by a RISC engine.
@@insanemal Are you talking about the microcode? I guess I never considered that to be RISC based, since it's not the interface that is targeted by compilers, nor assemblers. x86 architecture may be physically built around a RISC cores, but it still seems to me that the x86_64 instruction set is still CISC
@@MNbenMN Well you'd be disagreeing with the creator of the first x86 chip to work like this.
But hey, if you're willing to disagree with him, I'm willing to point and laugh at you.
So win, win I guess
@@insanemal We are clearly talking about different things. The instruction set for x86_64 is still as complex as ever, no matter how the internals of the microcode decoder translation works. The core may be simplified, but the instruction set is not. Maybe you are an RISP?
Theo, if you haven't seen some of the interviews Jim Keller has done where he gets into technical details (namely the one with Lex Fridman, if I remember correctly), you should go watch some. The ISA war is a distraction from the microarchitecture and fabrication-capability wars.
2:32, 'seamless interoperability across hardware and software platforms': x86 is worse than ARM? But I can have a bootable USB that works on 99% of x86 machines, whereas for ARM you need images specific to your device!
'delivering superior performance': I know Apple has paid for that really nice newest TSMC node again, but where's the rush to get AAA games running on ARM? My gaming machine is x86, AMD has V-Cache, and the Xbox and PlayStation are both x86. And it's not purely for porting-to-ARM reasons; many big engines have an export-to-Android/iOS button, and my current small gaming project has code to handle Linux vs Windows, not x86 vs ARM; I let the compiler handle that!
6:18 Well, I'll give Theo points here, he's not completely wrong. However there are advantages to having more instructions. And you can't have any realistic conversation on this without understanding micro-ops. ALUs on x86 haven't seen x86 instructions for possibly decades... there have been more instructions than ALUs for a long time.
6:38 So others have already called out div being a bad example, but those complicated x86 instructions will use more I-cache on ARM after translation. And if you have I-cache problems you have problems at the start of the CPU's pipeline, not the mid/end like with D-cache. A way better example would be the variable-length instruction problem x86 has.
8:11 Why are we still having the CISC vs RISC debate? Both sides move toward an in-between point once they've been making chips long enough. ARM added stuff because of JS! And x86 feeds more RISC-like operations into the backend.
11:18, um, Intel Quick Sync exists..... Everyone making a general-purpose compute chip has included video decode hardware (or required you to include a separate GPU).
15:32 The CPU manufacturer doing weird stuff (out-of-order, speculative execution, branch prediction...) is how all of them get performance now. Apple is at the same risk as Intel and AMD. Heck, Prime even covered an article where Apple chips have a security issue because they were looking at bit patterns in cache, in hardware, and prefetching them regardless of whether it was a pointer or not......
15:44 FYI, Spectre and Meltdown affected almost every CPU made in the 1-2 decades before their discovery, as long as the chip designers cared about performance.
Honestly one of x86's biggest benefits is the duopoly, so seeing them come together should worry ARM and RISC-V; anything x86 already does better (looks at UEFI....) could become a real problem for them.
So, no more AVX-8192 extension that down-clocks CPU by 90% while executing its instructions?
Many (probably most) x86 instructions are now "micro-coded": they are turned into and pipelined as many simpler ops, meaning there aren't that many dedicated x86 ALUs.
This entire video is utter nonsense.
1- Yes RISC, as the name implies, is a Reduced Instruction Set Computer. Both ARM and RISC-V are RISC architectures. ARM is a proprietary one. RISC-V is Open Source.
2- Having a reduced instruction set on your CPU is far worse in terms of speed AND development. You have way more to program and that will hinder your computational output. This is why x86 is to this day faster than any RISC CPU... be it ARM or RISC-V. Meaning, if the CPU does not have, say, a vector arithmetic operation that exists in x86_64, then the user-space software has to make that calculation itself, which will drag the CPU response down. Imagine, say, a huge 10MB input (at 10 bytes per data row) from a piece of software that is instantly calculated on x86: that is 1 million operations, which at, say, 3GHz is about 3.3e-4 seconds (single core only, to simplify). If the RISC CPU does not have the instructions to calculate that operation... because they have way fewer operations... and if the user-space software algorithm requires, say, 40 steps to make the calculation... the response on that RISC CPU running at the same 3GHz clock speed will be 3.3e-4 x 40 = about 1.3e-2 seconds (minimum, there are more overheads)! ...
3- Yes, x86 chips are going to be more difficult to produce and therefore more expensive and less energy efficient... you need to keep all those billions of extra gates fed... they are indeed more complex because they are more feature-rich.
4- There is of course a caveat about speed... if an application is strictly single-purpose, for example server work, it does not need a big instruction set of arithmetic operations, and the platform can always add, say, encryption (SSL for example) offload externally... in that particular scenario a RISC-based CPU (ARM or RISC-V) can indeed compete with x86, because for the same amount of lithographically printed area there will be more cores on a RISC-based CPU than on an x86-64. That is indeed an advantage for RISC processors.
For generic computing ..no , X86_64 is going to win.
5- The idea that RISC is lower power again ... depends on How the RISC Chip is used.
6- x86 can always develop chips for custom-purpose applications with a reduced instruction set... I know that does not sound good... but it is possible to remove some instruction-set extensions from the x86 platform in order to make it more efficient. And the range of applications is actually broader than for RISC CPUs.
7- The big, huge, insurmountable problem is that actually no one needs the sophistication of x86. And that is the big problem for both AMD and Intel... Everyone uses a smartphone and they are plenty sufficient for most user-space applications these days. The fact remains that RISC is capable of providing an excellent user experience without the need for the 1350-something instructions printed on a CPU!
And that is the big problem Intel and AMD are facing ... hence the Panic mode.
the main point i would pick you up on is your use of advanced cisc instructions vs single core risc, which is largely a straw man argument but useful as an illustration in some circumstances.
the real battle now is between a small number of fast but power hungry cisc processors which need lots of cooling, and much higher core count risc processors which are cheaper to design, build, and run. this is why your average smartphone already has at least 10 cores, while your average desktop does not.
yes, i know multithreading inflates the thread count, but the results are in, and the downsides are enough that it is starting to go away.
@@grokitall I mentioned the "Server usage" scenario ... lots of cores with specific simple tasks = more performance. Core count wins.
In that case risc wins. On the mobile side obviously the less power hungry platform also wins. There is not even a serious attempt from the X86 platforms to that sort of environment.
I think we both agree on the technical issues.
My point though was not even the technical issue but the usage issue. The computing world has been turning to mobile in a huge way for more than a decade. The total number of smartphones sold per year is up to 900 million, a huge number. The number of all PCs (laptops and desktops) tops out at about 300 million... that's 1/3 the number of mobile devices... (add to this smart TVs (ARM inside 99% of all models sold), network equipment (home routers), car multimedia and so many other applications)... and the ratio of RISC (basically ARM) CPUs to x86 is about 4-1... maybe 5-1...
Both Intel and AMD understand that if RISC CPUs start to reach the PC "ecosystem"... it's a big, big problem. Especially because the most-sold computers are the low-spec ones... exactly where RISC can have a huge role... And we can see that happening in ever greater numbers...
Hence this never dreamed before "alliance" between Intel and AMD. It is indeed Panic mode.
@@John_Smith__ i largely agree, but it is not just mobile where risc wins due to power, but the data center too. the issue here is that cisc uses a lot of power, and produces a lot of heat. moving to a larger number of slower processors which use less power overall, and do not produce anywhere near as much heat saves the data center a fortune. as most of these loads are platform neutral, and you can just recompile linux for arm on the server, it does not have the windows lock in.
even worse, as more devices go risc, the software moves to apps and websites, further killing the need for windows. and of course for a lot of tasks, the linux desktop is already good enough, so you have a triple threat.
i cannot wait to see the companies panic as they realise that people outside the west cannot upgrade to 11 and 12 due to costs being too high, and scramble to find some other solution.
@@grokitall Dear friend, I have used Linux since 1997, full-time as my single installed OS since 2001! I still remember those days: VMware had an awesome client that made it extremely easy to mirror entire PCs... that was the end of windoze for me on any of my computers. From that day on I have used Linux as my main OS. I witnessed the birth and evolution of Linux from almost day 1! Something epic! And the example you gave, I mentioned and forecast more than a decade ago! Server-side data center work is basically two things, database access and web serving, along with some other simple programming... All of that is already present and highly optimized on RISC (mainly ARM). I even go further in my predictions... I think it is just a matter of time until both Intel and AMD start to make very, very heavy investments in RISC silicon... yes, it sounds insane, but for example Intel (and AMD) can always go the way of RISC-V. I joke that they can rename it CISC-V 😀 . This of course if they take a hit on their server-side offerings. I've been following the ARM offerings on the server side now and then... and I remember the 128-core ARM server chips from a couple of years ago... yes, it is coming in full force. And of course we agree on the cost factors in data centers: power and heat... When loading a data center with, say... 20,000 servers, it makes a huge difference in cost whether each server runs at 500W with 64 threads or 300W with 128 threads... the TCO is drastically in favor of RISC, hands down...
But again... server-side RISC is not quite there yet... but they already own, or at least could own, the (let's call it) "mid-power" PC market, especially laptops.
Outta curiosity, do you invest in all these tech companies directly, or by proxy through a mutual fund?
There are a lot of limitations on cpu performance that make the actual instruction sets less important. However from an open source or even just open market perspective, simplification allows more unsophisticated entries into the market, faster. Having said that, depending on on the software, combinations of instructions can be more efficient time or energy wise- some tasks such as video encoding are a specific use-case to optimize.
The biggest gain is actually Apple's Rosetta 2, which translates and caches conversions from one instruction set to another. This allows software to move separately from hardware; x86 dominates partially because of Windows' dependency on it. It also allows all of Apple's software to benefit from hardware optimizations without recompiling for a specific CPU. Android has something similar, with Java-style bytecode (DEX) as the ISA-neutral format: apps are both JITed and compiled optimally for the phone's ISA.
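A very rough sketch of the translate-and-cache idea described above. This is not Apple's implementation, just the general shape of a dynamic binary translator's block cache; translate_block is a hypothetical hook standing in for the expensive JIT/AOT step.

```c
#include <stddef.h>
#include <stdint.h>

/* A translated block: host-native code generated from guest (x86) code,
 * callable as a function. */
typedef void (*translated_block_fn)(void);

struct cache_entry {
    uint64_t guest_pc;             /* address of the original x86 block */
    translated_block_fn host_code; /* ARM code produced for it */
};

#define CACHE_SLOTS 4096
static struct cache_entry cache[CACHE_SLOTS];

/* Hypothetical translator, left undefined here; in a real system this is
 * the expensive translation work done ahead of time or on first execution. */
extern translated_block_fn translate_block(uint64_t guest_pc);

/* Hot guest code is translated once and reused from the cache afterwards,
 * which is why translated apps pay most of the cost up front or on first run. */
translated_block_fn lookup_or_translate(uint64_t guest_pc) {
    struct cache_entry *e = &cache[guest_pc % CACHE_SLOTS];
    if (e->guest_pc != guest_pc || e->host_code == NULL) {
        e->guest_pc = guest_pc;
        e->host_code = translate_block(guest_pc);
    }
    return e->host_code;
}
```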
Gotta hand it to the comment section, they've generally had the right reaction to a bunch of things here. No hate, because I don't think your reaction is bad; it's the details, and why this is interesting, that are worth noting.
1. x86 and ARM both have the same problem in that they are licensed. ARM, though, has a single company (ARM itself) steering it, and you were right to highlight the fractured nature of AMD and Intel. But they have also added features like video encoding to their chips in the same way ARM vendors have, and that is done not through ISA improvements; it is done through libraries like libva accessing those blocks.
2. The number of ISA instructions shouldn't really matter to you as a dev, but it has a lot of implications from a design-complexity standpoint. The compiler decides how to generate code for x86, RISC-V and ARM variants. Stuff like Box64 and Rosetta just does that after the fact; it shouldn't be a huge issue for applications overall, but it does have an effect on speed. If the processor has more cores because the complexity is lower, it can do more things at once; if the ISA is more complex, it might not need as many instructions to complete a task. There are tradeoffs at both ends of the spectrum.
3. RISC-V isn't open source, it is an open standard, and it can run into the same fragmentation issues as other ISAs; it's just that you don't have to license it and can propose extensions to the foundation. Another point about standards: CUDA is already competing with an open standard, OpenCL, which both AMD and Intel support. The ISA discussion here would be more about introducing specific instructions to accelerate those workloads, but there is no real hint that a new CUDA alternative is coming down the line. Also, AMD can support CUDA as long as it isn't using Nvidia libraries or code; see the Google v. Oracle case on Java use in Android, you can't copyright APIs. Similarly, WINE on Linux also does conversion of calls.
My hot take is that this is going to be a great cleanup of x86-64 from both Intel and AMD. They will push those changes to compilers after they disable those instructions, and it will give a bit more longevity to the platform, but I think if you asked both whether they see ARM on desktop being a primary platform in the future they would probably say yes regardless of the partnership. A key point, though, and no one seems to mention this, is who the partners in this effort are: they are all cloud providers. Cloud is still, and will still be, x86-64 based for at least another few decades because of how slowly server compatibility moves. CISC makes a lot more sense in that domain anyway, because it is less about power usage for your household or battery life on a laptop and way more about complex workloads and efficiency. So I don't see much movement in cloud at all, other than maybe a bunch of edge stuff going to ARM or RISC-V in the medium term.
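On the point that the instruction count shouldn't matter to an application dev: libraries usually hide it behind a one-time CPU probe plus dispatch, roughly like the sketch below. The two kernels are made-up stand-ins; __builtin_cpu_supports is a real GCC/Clang builtin on x86 targets.

```c
#include <stddef.h>

/* Portable fallback path. */
static void scale_scalar(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Stand-in for a hand-written AVX2 kernel; a real library would put
 * _mm256_* intrinsics here, guarded by a target attribute or a separate TU. */
static void scale_avx2(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* The application calls this; which ISA extension actually runs is decided
 * once at runtime, so the "3000+ instructions" never surface in user code. */
void scale_dispatch(float *dst, const float *src, float k, size_t n) {
    if (__builtin_cpu_supports("avx2"))
        scale_avx2(dst, src, k, n);
    else
        scale_scalar(dst, src, k, n);
}
```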
you are fundamentally wrong about cloud.
while cloud dumped x86 in favour of x86-64 well before the desktop did, it was because we could see well in advance the problems that would kill it.
the problem facing x86-64 is it is only relatively cheap due to economies of scale due to windows, and for the data center it uses too much power, and needs too much cooling.
windows is gradually going away, which is obvious to everyone outside the usa, and to microsoft which will shrink the market even further.
the data centers have already decided to move to risc with linux to solve the power and cooling problems, and the only major question is how fast.
Tomorrow: Apple, ARM, RISC-V, Linus Sebastian, Epic Games team up to play Fortnite on Mac.
"arm can barely be called reduced instructions these days" - something I heard twice in two days.
"I just know exactly enough about architecture" - theo. doesn't sound like he knows enough
“RISC-V” is typically pronounced as “risk five.”
The “RISC” part refers to “Reduced Instruction Set Computer,” which is pronounced “risk,” and the “V” represents the Roman numeral for 5, indicating it’s the fifth iteration in the RISC architecture lineage.
"risk vee" is killing me. Literally first words on the wiki page, "RISC-V (pronounced 'risk-five')"
"Nothing unites humans like a common enemy."
Go watch prime's video with Casey Muratori. I kinda like Theo, but the amount of straight out misinformation here is shocking. If you have no clue whatsoever about something, why do you talk about it? For example - if one were to implement all the optional ARM instructions, an ARM would also end up in the range of 2-3k instructions. The difficulty in translating/emulating x86 instruction sets on ARM has nothing to do with the amount of instructions. It's simply a) because the instructions need to be translated to begin with and b) x86 instructions have variable lengths vs. fixed length on ARM. So decoding them introduces a lot of overhead. ARM is not easier or cheaper to manufacture either. ARM also had microcode for a veeery long time. They are very beefy these days and come with all the features like x86 chips, such as cryptography.
RISC and CISC have nothing to do with performance. In RISC every opcode is either a 16-bit (Thumb) or 32-bit wide instruction; in CISC the instruction size depends on the registers and the instruction used, and so on.
the reason this matters is that in modern synchronous designs, all of the instructions are slowed down to fit the time needed for the slowest part, which is slower on cisc. by dumping some of the slower instructions, risc gets a speedup for free.
when the research was done on asynchronous arm chips in the 90s, an improvement was seen, but the funding went away and we never got to find out if synchronous clocks were fundamentally a bad idea, or just a simplification which was not expensive enough to dump.
@@grokitall Sorry, that makes no sense. Instruction sets are numbers, not transistors; old instruction sets can, for example, be converted or translated to a newer instruction set, and the ALU can be renewed
And so on. This is why I say RISC or CISC has nothing to do with performance. They are just numbers, and even if they slow down the instructions to match the slowest part, that is a fault of Intel's design, not of the instruction set.
@@jasonbuffalo6845 if it was just logically equivalent to having a firmware x86 / x86-64 to risc microcode jit, then i would agree with you, but even some of the cisc processor designers say that the larger instruction set increases complexity, which in turn increases transistor counts.
that is before you consider the failure of these people to work with compiler writers to make some of these more complex instructions into compiler macros instead, reducing the need to keep them on the chip.
this is also a problem for testing, which is historically why you can recognise which intel x86 cpu you are running on by spotting which hardware bugs need working around.
@@grokitall
Here is the truth about why ARM is better than Intel, and that is because of the modular design. Companies can add extra functionality to their CPUs, which Intel cannot; Apple and Qualcomm are the best examples,
with their video decoding engines and their NPUs, which they use to good effect. Intel has nothing comparable, no NPU, no extra video decoding hardware, so companies like Dell or HP cannot
bring anything to the table like Apple or the Android vendors. And all of this has nothing to do with the instruction set; these extra modules increase the instruction count, the complexity and the transistor count,
but the performance keeps getting better. Intel was sleeping on its high horse and now they have catching up to do, and the instruction set is not the problem, it's Intel's engineers.
@@jasonbuffalo6845 not sure i agree with that. because the extra hardware does not report its presence, it is hard for the os to detect, and thus use automatically, which can cause all sorts of issues.
i agree intel got complacent, they do that every so often.
When Microsoft decides they don't want to support x86, Intel and AMD will die the same day. And Microsoft is building their own chips, so it's a matter of time. I see no future for x86.
linus is on board because of targeting. how do you target for 3800 instructions / extensions? this is why arm is so much easier.
Hi, just want to add my two cents, I do think that the comparison between instruction counts is a very popular fact but it's very misleading.
Cent 1: the presented count for ARM might be correct, but it's for the 32-bit version (with 16-bit support); the version that's getting traction now is the AArch64 variant, which is quite different.
Cent 2: because the instruction count is smaller, there are times when there are more instructions per line of C code. Also, both instruction sets mainly use just a handful of instruction types in day-to-day operation, and I suspect there are some tricks applied at the silicon level that make those instructions faster to decode for both architecture types.
My personal opinion is that ARM (AArch64) might start to get some bigger traction in the consumer space, especially in the laptop market; but enterprise will be a very slow adopter (honestly, I don't see it coming until Microsoft makes a huge effort to port and stabilise absolutely everything in their business suite) and AArch64 servers might just be a novelty rather than the norm. I am also curious about the evolution of RISC-V and AArch64 in the future; it would be quite funny to see a future where these two end up with a considerably higher number of instructions because of needs they weren't actually designed for.
X86 has an instruction that prompts your mom to water your plants. Very useful stuff!
11:15 if you go to the spec pages for most modern CPUs from AMD/Intel they explicitly mention the extensions they have, e.g. AES or codec encode/decode. They've been doing it for ages; it's why things like Quick Sync let you run a Plex server pretty efficiently.
I'm in the top 10 baby!! that's what we here for!
The instruction set doesn't necessarily make a chip more efficient. x86 turns ops into uops, so the CISC vs RISC differences aren't really there. Uarch is distinct from instruction set. ARM is a good example: compare ARM Neoverse to ARM Cortex, or compare the new X4 cores as seen in the Dimensity 9400 to the cores in the latest Snapdragon X Elite or Apple M4; same ISA, different IPC metrics.
BTW: The name x86 comes from the 8086 processor released by Intel.
RISC-V has open instruction slots anyone can assign to new tasks. That's what arm is, RISC-V with a proprietary set of a bunch of specific instructions tacked on to the end, I think.
Meta is also designing FPGA and ASICS on RISC-V, along with Google, Apple, MSFT and Nvidia.
NO lol. RISC-V is much newer than ARM.
I think Microsoft will be an important member here, since major changes to x86 can only move as fast as Windows does.
I'm actually really enjoying the sponsored segments lately, really spotlights cool tools and resources I didn't know about
The explanation is so incredibly false it might as well be misinformation.
Some things I noticed that are completely untrue:
RISC vs CISC is a stupid talking point, not even a single hardware person is talking about it.
Modern CPUs, including the first x86 CPUs, execute things in a pipeline, and the instruction set only affects the implementation of the decoder. Admittedly, x86 isn't easy to decode due to the variable instruction length, but that has nothing to do with the ALUs or the number of instructions in the set.
aPpLE is so big brain they know how to add a different chip. Dude, tell me which CPU instruction on x86 is for Quick Sync? None! Because Quick Sync is a separate block.
"ARM doesn't have instructions for division." How TF do you think ARM chips calculate division then? An instruction doesn't need to be literally called "division" to be considered an instruction for division.
"GPUs have much simpler instructions." No, they don't. Most newer GPUs these days have tensor cores which can do tensor operations, and I'm pretty sure that's not a thing for CPUs. Also, it's really not about the complexity of the instructions, it's about whether you need them or not. It's absolutely stupid to add an instruction for mouse and keyboard on a GPU, for example, but it's not that stupid of an idea for x86 to have it.
"CISC makes it harder for companies to deal with the hardware." It's literally the opposite. x86 is much easier to work with than ARM when you are doing real programming.
@@HolyMacaroni-i8e actually writing things related to hardware like a compiler or some hardware driver instead of simply hooking 100 npm packages up.
risc vs cisc is not being talked about directly, but is fundamental to if the future is cisc with bigger caches, or risc with way more cores with smaller caches.
and the answer to that is we don't have the data yet to know which way the future will jump, but the answer will come from smartphones and risc in the data center.
11:45 Wrong, x86 has dedicated video encoding hardware. Not a separate chip like Apple uses, but it's embedded in the existing chip. Intel has Quick Sync, which supports H.264 and AV1 among others; AMD has Video Core Next, which supports H.264, H.265, VP9 and AV1.
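If anyone wants to verify those encoder blocks on Linux, something like this libva sketch works on most Intel/AMD iGPUs. Assumptions: libva and libva-drm are installed, you link with -lva -lva-drm, and the render node path below may differ on your machine.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <va/va.h>
#include <va/va_drm.h>

int main(void) {
    int fd = open("/dev/dri/renderD128", O_RDWR); /* assumption: path varies per system */
    if (fd < 0) { perror("open"); return 1; }

    VADisplay dpy = vaGetDisplayDRM(fd);
    int major = 0, minor = 0;
    if (vaInitialize(dpy, &major, &minor) != VA_STATUS_SUCCESS) {
        fprintf(stderr, "vaInitialize failed\n");
        close(fd);
        return 1;
    }

    int max = vaMaxNumEntrypoints(dpy);
    VAEntrypoint entrypoints[max];   /* C99 VLA sized to the driver's maximum */
    int count = 0;
    /* Ask the driver what it can do with the H.264 Main profile. */
    if (vaQueryConfigEntrypoints(dpy, VAProfileH264Main, entrypoints, &count) == VA_STATUS_SUCCESS) {
        for (int i = 0; i < count; i++)
            if (entrypoints[i] == VAEntrypointEncSlice)
                printf("hardware H.264 encode is exposed by this GPU\n");
    }

    vaTerminate(dpy);
    close(fd);
    return 0;
}
```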
I have heard of the x86S effort; it's interesting, and I know AMD had a similar idea a few years ago to drop 8-bit and 16-bit support, but had pushback from customers that ARE STILL USING 8-BIT AND 16-BIT, like WHAT THE HECK. The fact that there are still customers using new AMD and Intel products to run in 8-bit and 16-bit modes is wild.
What would even be examples of those 8-bit and 16-bit users? AMD and Intel should just make a special CPU line for them at 10x price.
I called this back when apple first announced their move away from x86. I'm definitely no genius, this was just such an obvious eventuality
This is going to be quite interesting...
Looks like a good direction for an antimonopoly investigation.
ARM is RISC... it's literally an acronym for "Advanced RISC Machines"
I think they are wasting their time. Unless x86 gets forbidden by law, it doesn't go anywhere anytime soon. Much like ICE gasoline and diesel fuel! I repeat, X86 does not go anywhere anytime soon!
x86-64 is being kept in power by windows, which is only relevant on the decreasingly used desktop.
netbooks died because smartphones and tablets increasingly displaced the form factor, and chips got good enough that entry level laptops crashed in price.
x86-64 is in a similar position. linux has killed windows in the data center, which are now moving to risc because of power and cooling.
it never got anywhere in smartphones, which are all risc, and because of this more things are moving to web apps in data centers.
then you have the excessive hardware needs of windows 11 and 12, which mean that developing countries will not be able to afford even pirated copies of windows, accelerating the existing trend to ditch it in favour of linux.
nobody ever got fired for buying ibm, until they lost the market to the pc they helped create. microsoft are diversifying away from windows as they can see which way the wind is blowing, so the only remaining question is how amd and intel pivot when the wintel duopoly dies.
None of these are true. @@grokitall
@@zalnars7847 so you are seriously claiming that the addition of the tpm chip for windows 11 and the ai chip for 12 are not increasing the price for the base windows hardware, and that this will not be an issue for anyone needing to replace multiple machines, nor for all of the people outside the west, who mainly run pirated windows as they currently cannot pay more than a months wages for the existing machines, and thus will not be able to run even pirated versions of 11 and 12.
india already has 15% linux desktop usage for these reasons, and 11 and 12 will make this grow faster. just because some wintel fanboys in the west can afford the latest microsoft tax does not mean everyone can.
the other points have similar supporting reasons.
It sounds like a group coming together to save web components from dying.
Regarding compatibility, ARM and RISC-V are a wild west compared with x86 and PCs
Funny how yet again, they'll do anything except lower prices or give more features.
Took forever to get more than 4 cores, AVX512, VNNI, legible naming, profiling tool compatibility, etc..
Also, remember that time when all CPUs have been overvolted stock for the past couple of years?
actually it was getting 2 cores to work right, then getting os support for it which took forever. it only really took off when they started hitting clock speed limits, so going dual core was the only option.
Feels a bit like the association the lead battery manufacturers "Consortium for Battery Innovation" set up in 2019. Maximising inertia.
My prediction is that ARM and RISC will eventually kill off the x86 standard.
We didn't hit the limits on speed as much as we hit limits on temperature.
actually you are wrong. if the processors get much higher clock rates, you have to stop sending the signals down wires, and start sending them down plumbing, which is why they stopped chasing clock numbers and moved to something else.
the main thing keeping the x86 & x86-64 alive is the wintel duopoly, which is slowly dying.
data centers need cooler less power hungry processors, which they can move to now that linux is getting better risc support.
the desktop is being eaten by smartphones and tablets on one end, and by web apps on the other, neither of which need windows.
the windows market will get further punished due to the hardware requirements for 11 and 12 making it so that developing countries cannot even use pirated versions, as they cannot afford the hardware.
the writing is on the wall for both windows and cisc, as cisc is only being kept alive due to windows. i hope they get a plan b before it is too late.
Please dig into the topic more next time; you completely missed every point. x86 has a thing called micro-ops; it's like ARM underneath, with a higher-level abstraction over it, so everything in the video is wrong.
Sounds like a campaign to save internal combustion cars.
OMG the hardware part is cringe and completely wrong
Clock speed scaling fell out of favor in the Pentium 4 days because clock speeds above 5-6GHz require exotic materials, as we've hit the limits of silicon. That's why cores have been getting "slower but wider", and other strategies were introduced such as ILP/pipelining, SIMD, and multicore, with the current strategy being chiplets. We left scale-up for scale-out, but scale-out hits limits due to Amdahl's law. So it's almost pick your poison; maybe fixed-function will make a comeback. I speculate we are at the end for deterministic processors and the next step will be fuzzy processors, where we run statistical models to determine the closest answer, and depending on the precision needed that inference might be sufficient.
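Since Amdahl's law keeps coming up: the scale-out ceiling mentioned above is just this formula. A quick sketch, nothing x86-specific about it.

```c
#include <stdio.h>

/* Amdahl's law: with a fraction p of the work parallelizable over n cores,
 * best-case speedup = 1 / ((1 - p) + p / n). The serial remainder dominates
 * quickly, which is the limit on "just add more cores". */
static double amdahl_speedup(double p, double n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    /* Example: 95% parallel work caps out around ~20x no matter the core count. */
    printf("p=0.95, n=8   -> %.1fx\n", amdahl_speedup(0.95, 8));    /* ~5.9x  */
    printf("p=0.95, n=64  -> %.1fx\n", amdahl_speedup(0.95, 64));   /* ~15.4x */
    printf("p=0.95, n=1e6 -> %.1fx\n", amdahl_speedup(0.95, 1e6));  /* ~20.0x */
    return 0;
}
```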
Fun fact: 98% of software uses only 6 instructions
How can they put google and broadcom when those orgs have their own enterprise ARM offerings which compete with x86?
My eyesssss, those white screens are so bright
Sure, is this good? Yes. But it's a patch ultimately. The problem with x86 is it's OLD; it goes back to 1978. I'm sure there has been some cleanup, but it still carries decades of backwards compatibility. The advantage of ARM, RISC-V or any new architecture is that you don't have all the dead weight, and it's easier to create from scratch than it is to go through and trim the fat. And with as much legacy code and as many legacy systems as exist, how many companies will even be able to use a stripped-down x86 standard?
except that arm was developed at acorn in the 1980s, so its old is the wrong argument.
actually, the idea behind cuda also goes back to the 1980s as well, as does the work on multicore processor systems.
The only thing I took from this is that gpu's might be affordable once again
Bring on the RISC-V revolution!
this video was brought to you by an x86 chip
AMD was funded by Intel’s CEO back in the day so now history comes full circle
How do ASICs fit into all this?
guys.. he did say the way he simplified architecture would hurt. I'm fairly certain Theo knows better. This is what Terry Pratchett would have called "lies to children".
I'm curious, does anyone here have a better explanation that does not require previous knowledge or more explaining?
I have 2 questions (although I know the answers, I want you to think about them).
Given an ARM and an x86-64 machine, which is going to perform better for web servers? And here I'm talking about the performance-to-efficiency ratio (for a server, performance is all about serving user requests after processing; by efficiency I mean power consumption specifically, not CPU efficiency). Which does better?
And the second question: would you choose a Mac (ARM-based) or an ARM-based Windows machine for 4K gaming at 120 fps? Or rather choose x86-64 Windows?????
If you consider this, I think AMD/Intel are in good shape?
And I have one question of my own: is there any way to make x86-64 more power efficient, I mean without adopting ARM principles?
You can't revive a dead horse.
"Hey guys, I'm designing a processor and I found this video..."
(reads comments)
S'all good.
I don't think the ISA determines power usage. ARM has historically been power efficient because they target the mobile market and so have gone in that direction, but recently we've seen Intel getting quite good power usage too. The architecture of the chip is what matters for power.
Did anyone notice? Theo heard us, so the videos are shorter 😮
And Theo's so scared that a non-ironically fine architecture will endure that he made this theatrical video about it.
I'm not surprised seeing Linus supporting this, he's been an ARM hater for a long time. His take can be justified because ARM is a clusterf*ck at the architecture level (custom ISAs), with many vendors deriving from the ARM standard specs, making supporting, maintaining and optimising an OS a real nightmare.
Funny how the same problem plaguing the OS landscape at the software level (i.e. Linux distro fragmentation) can now appear at the hardware level and alienate its creator.
What's up with the light level? 😮
Got so many memories of x86. Would hate to see it go.
Is that Excalidraw in Obsidian?
7:37 it's funny how an Indian institute was at the top of the search results for an American user
Wait. That cap looks dope af. I want one
AI doesn't run on x86! Anyone who says it's bad is an LLM
X86 is dead for me since the m1
You must say goodbye to PC games then.
@@dyto2287 bye bye 👋 there is prob already an effort to get a compatibility layer in place.
do I need to buy a new laptop if x86 is dying? or can I keep my x86 laptop???
Linus Torvalds is pronounced Leenoos Toorvalds.
Oh, they had a "come to jesus" meeting and something actually came out of it... That's actually, pretty cool!
Snapdragon X Elite rattled Intel and AMD. Well done, Qualcomm ❤❤❤
> Skip this if you know cpu architecture
he didn't even talk about the holes
also kinda sorta technically all chips add numbers*