When can we expect to see the P6 video?
This series got me thinking. How complicated are the ARM, MIPS, and RISC-V frontends compared to x86? Seems a RISC frontend should be straightforward, right?
Correct, RISC frontends can be more straightforward, and they have a much easier time decoding multiple instructions in parallel. MIPS, for example, could take a 16-byte instruction buffer and feed it directly into a 4-way parallel decoder. Typically though, that 16-byte buffer is called an instruction queue and is combined with the dispatch logic (for an out-of-order backend). In that case the decoders can be more tightly coupled to the functional units / reorder buffer entries, and only accept certain instruction types. The IBM Gekko is a good example of this, and its structure / behavior is well documented in the manual (with examples).
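To make that concrete, here's a rough C sketch of why fixed 32-bit encodings make wide decode cheap (purely illustrative, not how any real MIPS core is wired up): a 16-byte fetch buffer just splits into four independent slots, and each decoder only pulls fixed bit fields out of its own word, with no length-finding step like on x86.

    #include <stdint.h>

    /* Rough sketch only: MIPS field positions are fixed, so each decoder
     * is just a set of wires/masks on its own 32-bit slot. */
    typedef struct { uint32_t opcode, rs, rt, rd, funct; } decoded_t;

    static decoded_t decode_one(uint32_t insn) {
        decoded_t d;
        d.opcode = (insn >> 26) & 0x3F;  /* bits 31..26 */
        d.rs     = (insn >> 21) & 0x1F;  /* bits 25..21 */
        d.rt     = (insn >> 16) & 0x1F;  /* bits 20..16 */
        d.rd     = (insn >> 11) & 0x1F;  /* bits 15..11 */
        d.funct  =  insn        & 0x3F;  /* bits  5..0  */
        return d;
    }

    void decode_4wide(const uint32_t buf[4], decoded_t out[4]) {
        /* The four slots are independent -- in hardware these decoders
         * sit side by side and run in the same cycle. */
        for (int i = 0; i < 4; i++)
            out[i] = decode_one(buf[i]);
    }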
The more complex x86 processors do something similar, but through an added stage after decode (so they push the micro-ops into a micro-op queue, and then dispatch them to the functional units accordingly).
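As a toy illustration of that extra stage (a sketch only -- the names and the ring-buffer structure here are mine, not any real x86 core's): decode pushes micro-ops into a small in-order queue, and dispatch pops them and steers each one to a functional unit that accepts its type.

    #include <stdint.h>
    #include <stdbool.h>

    typedef enum { UOP_ALU, UOP_MEM, UOP_FP } uop_kind;
    typedef struct { uop_kind kind; uint32_t payload; } uop_t;

    #define QDEPTH 32
    static uop_t   q[QDEPTH];
    static unsigned head, tail;   /* free-running counters, index mod QDEPTH */

    bool uop_push(uop_t u) {                    /* decode side */
        if (tail - head == QDEPTH) return false;  /* queue full: decode stalls */
        q[tail++ % QDEPTH] = u;
        return true;
    }

    bool uop_dispatch(bool (*unit_ready)(uop_kind)) {   /* dispatch side */
        if (head == tail) return false;           /* queue empty */
        uop_t u = q[head % QDEPTH];
        if (!unit_ready(u.kind)) return false;    /* wait for a matching unit */
        head++;
        /* ...issue u to the chosen functional unit / reservation station... */
        return true;
    }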
It's possible that Apple's M1 didn't implement that sort of queue logic and instead went for a simple 8-way decoder, and that's part of why the performance is so high.
Aside from including the dispatch logic, there are other complications though. For example, you have the RISC-V C (compressed) extension, which adds 16-bit instructions. Those get packed in alongside the 32-bit instructions, so the decoders need the capability of switching between decoding a single 32-bit instruction and 2x 16-bit instructions (and 32-bit instructions can end up only 16-bit aligned as a result).
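The length check itself is the one part the spec pins down: a 16-bit parcel whose low two bits are not 0b11 is a compressed instruction, while 0b11 means a 32-bit instruction. Here's a hedged sketch of how a decoder could walk a fetch group of 16-bit parcels (a real design does this for every parcel position in parallel, not in a loop):

    #include <stdint.h>
    #include <stddef.h>

    /* Sketch only: split a fetch group of little-endian 16-bit parcels
     * into instructions, using the RISC-V low-two-bits length encoding. */
    size_t scan_fetch_group(const uint16_t *parcels, size_t n_parcels,
                            uint32_t *insns, size_t max_insns) {
        size_t i = 0, count = 0;
        while (i < n_parcels && count < max_insns) {
            if ((parcels[i] & 0x3) != 0x3) {
                insns[count++] = parcels[i];          /* 16-bit C instruction */
                i += 1;
            } else if (i + 1 < n_parcels) {
                insns[count++] = (uint32_t)parcels[i]
                               | ((uint32_t)parcels[i + 1] << 16); /* 32-bit */
                i += 2;
            } else {
                break;  /* 32-bit instruction straddles the fetch group */
            }
        }
        return count;
    }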
In short, yes, RISC frontends can be more straightforward, but usually architects take advantage of that to add more functionality to them, making them almost as complicated... Anything to save a pipeline stage.
@RTLEngineering ARM is complicated. It has 32-bit instructions (the original ARM encoding), 16-bit ones (Thumb), and another 32-bit set (Thumb-2) that can be mixed with the 16-bit ones. So it's messy, but not as bad as x86. The small Cortex-M0/M0+/M1/M23 cores are wacky and are missing some instructions entirely, which is a fun discovery.
@thenimbo2 Thanks for the clarification with ARM! That's partly why I explicitly didn't mention it in my comment (I knew it was more complex, but wasn't sure on the specifics).
The M1 only implements AArch64 though. So that should be simpler since it doesn't need to support legacy 32-bit or the Thumb sets.
@RTLEngineering Interesting note on the M1. So you think they use a simple buffer and decode like the Gekko, but for a 64-bit ISA and with wider parallel decode? It's always a little hard to speculate about exactly what new processors are doing, even generically.
Multi-Media eXtensions?
That would make sense, but officially it doesn't stand for anything (that's Intel's official position). It was added as a quip / play on words though, regarding the fact that the front end was overkill considering the backend limitations.
I was about to post "multi media extensions" too. I remember when these CPUs came out in the mid-90s and how this instruction extension was aimed at video decoding (around the Windows 95/98 era). IIRC it included some useful performance-counter / timing instructions too. Excellent video btw!