Quake, Floating Point, and the Intel Pentium

RTL Engineering

Views: 74,675

Published on 15 Jun 2024
The transition from mostly 2D games into immersive 3D environments was brought on by none other than the original Quake, making computer game history. At the same time, computer architecture began a pivotal shift from single pipelines to superscalar designs. These two simultaneous changes shook up the PC processor market, with architectural ramifications that persist to this day. This video looks at some of the optimization techniques used by Quake and why it ran so much better on the Intel Pentium than on the AMD K6 and Cyrix processors.
Chapters:
0:00 Setting the Stage
2:50 Quake's Shaky Optimizations
4:22 The Point of Floating-Point
7:03 FXCH Swaps the Stack
8:57 More Instructions per Cycle
10:12 FPU Pipelining
13:19 FDIV Overlap
15:28 K6 Deadlock
17:22 The Proof in the Pudding
Science & Technology

Comments • 300

@unfa00 1 year ago ⁺²⁴⁹
I think I would prefer a poor-sounding human reading the script over a speech synthesizer...
@unfa00 1 year ago ⁺¹⁰
PS: I see that it's been done for anonymity's sake.
@RTLEngineering 1 year ago ⁺⁴⁸
It's been done mostly for making production easier. This video would probably have taken an extra month to make (considering the length), if I had to record and edit my own voice. Some people enjoy doing that sort of production / editing, but I am unfortunately not one of them. The speech synthesis made it so that I could spend most of my production efforts on research and presentation.
@Ivan-pr7ku 1 year ago ⁺³⁸
The voice synth used here is actually quite "natural" sounding. Reading from a script for a long time is a skill that not everyone is good at, on top of the wasted production time. Also, for a dry technical topic like this I would prefer steady and clear machine voice narration rather than dealing with weird accents and uneven pacing.
@deth3021 1 year ago
@@Ivan-pr7ku Amazon Polly is the best I've tried.
@Falcrist 1 year ago ⁺²¹
Agreed. This is an interesting topic, but I can't deal with the robot voice for 20 minutes.
@GatsuRage 1 year ago ⁺⁸²
I remember my family bought a used Pentium 1 PC back in the day... and boy, I was so stupidly happy when, a few months later, I found a random folder I knew nothing about called "quake", and the joy I felt when I opened it... 18 years later and I will never forget that feeling! (And yes, I beat it on every difficulty setting with just the keyboard because it had no mouse support. Fun fact: literally 2 days after I beat it on the hardest difficulty, a friend of mine told me how to drop down the console and type "cheats" lol)
@AndreyKalikin 1 year ago ⁺⁷
It had mouse support, but you needed to run a mouse TSR before launching.
@TorazChryx 1 year ago ⁺⁴
@@AndreyKalikin and then +mlook in the console within Quake itself
@GatsuRage 1 year ago ⁺⁵
@@AndreyKalikin Yeah, there was no way 15yo me knew that. At the time I barely knew how to right click lol
@SuperSkandale 1 year ago
So 2005?
@GatsuRage 1 year ago ⁺⁴
@@SuperSkandale It was actually 2001 or 2002, I think, around the Pentium 3/4 era. We got a Pentium 1 and got scammed: the guy who sold it told us it was a Pentium 3 lol. At the time neither I nor anyone in my family knew anything about computers.
@Ivan-pr7ku 1 year ago ⁺⁶⁷
This is probably the only video presentation on YT that properly deconstructs the Quake software rendering performance, in contrast to pretty much all other attempts that still play on old industry myths and oversimplified technicalities.
@Ojref1 1 year ago ⁺³
Well, in my opinion the information presented is a bit more ancillary to the topic, plus the presenter using an artificial voice really devalues the overall quality. Dare I say click-bait like?
@JathraDH 11 months ago ⁺¹
@@Ojref1 Not sure what you are talking about here, it's a pretty clear-cut case. The robot voice does devalue the video presentation quality somewhat, but not the content. So much stalling on a per-clock-cycle basis is horrid and absolutely the problem.
@mikosoft 1 year ago ⁺¹³
I've had courses in University about processor architecture and design. Even though we never got to any of these advanced techniques it's still a nice throwback for me to see the uOp diagrams and discuss pipelines and operation overlap. That FDIV explanation is perfect!
@ph1losopher 1 year ago ⁺⁸
Was not expecting such a thorough and comprehensive deep-dive. Outstanding.
@SaraMorgan-ym6ue months ago ⁺¹
Imagine if you got the blueprints for the Pentium MMX, built your own chips, and made your own brand-new Pentium computer today.
@TheVanillatech 1 year ago ⁺⁵
I got ripped off as a kid. Took my entire savings account of £1500, which my Dad had put aside for me, to a computer store when I was 15 years old, and asked for a Pentium 133 machine. They handed me the quote, which said "Pentium 133", 16MB RAM, 1MB ET4000AX, etc ... and told me to come back in 2 weeks to pick it up. The entire machine, with a 14" monitor, cost me £1400.
So I went back with my Dad to pick it all up. I grabbed a copy of PC Zone on the way home, which had a Diablo demo on the cover CD, and I fired the machine up. POST told me I had an AMD-PR-133 (@100Mhz), which was instantly confusing. Was it 133? Was it 100? Was it an INTEL????
So Diablo ran fine, looked amazing, and had me hooked for maybe two entire days. Then I got round to installing Quake, the very reason I had bought the computer, so I could play my friend over null serial on a weekend. To my horror, Timerefresh gave me 7.9fps on the start screen. And the game ran AWFUL. It actually ran worse than my friend's Pentium 75Mhz with a crappy 512kb Trident. I called the shop and asked them why I had terrible performance. They told me "We did you a favour, we gave you an AMD PR 133 which, although only 100Mhz, is actually FASTER and BETTER than a Pentium 133!". They fucking ripped me off. A kid. Took ALL my money, and gave me a CPU that cost them just 1/2 the price of a P133, but charged me for the P133. I couldn't argue, he was a grown man and I was just a kid. So I hung up and told my Dad. He blamed me for rushing into things. But I was devastated. I had to play Quake in a small window, on my brand new PC.
Fast forward 7 months, it was December of that year, and I went back to the store with a friend. While he talked to the guys about a 56k modem and drew them to the far end of the store, I leaned over the counter, stole a brand new boxed Pentium 200 and walked out. I installed it that night, and Quake suddenly ran 3x faster, ultra smooth on all deathmatch maps, and I finally got my own back.
@SuperSkandale 1 year ago
You stole the CPU chip and installed it on an AMD motherboard?
@TheVanillatech 1 year ago ⁺¹
@@SuperSkandale No. Back in the 1980's AMD had a subcontractor partnership with Intel, allowing them to make x86 CPUs as an alternative source for industry buyers. Even though that deal fell apart and went through a judicial marathon, AMD continued to make Intel compatible CPUs into the 90's, including their take on the 586, or "Pentium". All those CPUs used Socket 7. So you could drop an AMD PR-133 (aka a fake Intel Pentium 133) into the same motherboard that would take a real Intel Pentium, or an IDT Winchip, or a Cyrix, etc.
AMD stuck with Socket 7 (and Super Socket 7) long after Intel moved on with their Pentium 2 and 3 "SLOT" type motherboards. AMD released super high clocked Socket 7 CPUs like the K6-2 and K6-3 550Mhz. But then AMD would produce the Athlon and move to their own platform, no longer sharing space with Intel but as a direct competitor. Which they remain to this day.
The problem with the original AMD PR / K5 Intel clones was that they SUCKED at floating point computations compared to Intel. A true Intel Pentium 133Mhz was around TWICE as fast as an AMD PR 133 in Quake. Same story with Cyrix CPUs. The only exception was the rare and expensive IDT Winchip 2 CPUs, which were also Socket 7 compatible, but used some kind of RISC architecture (I think) and were very good Quake performers, though much more expensive than an Intel Pentium.
@PaulSpades 12 days ago
This is the most heart warming theft I've ever read about. Good on you!
@TheVanillatech 12 days ago ⁺¹
@@PaulSpades I think many computer stores did similar things back then, usually preying on uninformed and older people - but also on children, like me. It was a goldmine, selling PC's back in the 90's. They were making enough money. But I guess bad people are just bad people, and always wanted more.
Yup, I got my own back. But I was definitely in the super minority, and had to risk a criminal record to do so.
But hey, as a Quake player, and for justice itself ... they left me no choice!
@idontwantahandlepleasestop 2 years ago ⁺¹⁷
I'm impressed, this video gets it almost entirely right. Good job, most people spread misinformation when talking about quake optimizations.
one thing i'd note, though, is that the integer/fp execution overlap was not new to the Pentium, and in fact existed all the way back to the 8086/8087. The only x86/x87 compatible processors i know that did *not* have that were some very early Cyrix models. (i maintain a version of Quake that is hand-reoptimized for 486s and other non-Pentium CPUs, and take much heavier advantage of the original release - i've had to do a lot of research on this)
@RTLEngineering 2 years ago ⁺¹
Thanks!
That was one of the motivations for making the video, the misinformation. Although, I have to admit that I thought a major factor was the FXCH serialization, but that didn't add up when I looked into it.
Thanks for the clarification. I did actually know that the FP co-processors could overlap, but talking about that would have been extra information that could have added confusion (it also wasn't relevant to the P5 vs K6 vs Cyrix 5x/6x story). To me, it was implied by "the 486 [..] used a floating point coprocessor model", although not explicit. I can make sure to explicitly discuss that in the longer version though.
Since you have background knowledge of the coprocessors, would you be willing to clarify a few things?
- Did the x87 coprocessors sniff the bus for x87 instructions, or was there an explicit signal hand-shake method for the co-processor, where the x86 would issue a command?
If it did sniff the bus, how would the x87 know that the data being read was an instruction vs data?
- How did the x87 maintain consistency with the integer registers, and how could it maintain that if it operated in parallel? I know the integer registers couldn't be used as direct arguments, but they could be used for memory addresses, correct?
- When the x87 was running asynchronously to the instruction stream, would debugging be more complicated, given that the x87 and x86 instruction pointers may not agree?
@idontwantahandlepleasestop 2 years ago ⁺⁵
@@RTLEngineering to my understanding, the FPU does sniff the bus directly to know what operation to perform, but waits for a signal from the CPU before actually beginning any operation. the FPU does not interact directly with any integer registers and does not maintain a copy of them. it also does not access system memory in any way, and does not have a concept of an instruction pointer. there is a period of a few cycles at the start of each operation where the two are not executing asynchronously - during these cycles, the CPU does the work of fetching all relevant data from memory (direct or indirect) and transferring it to the FPU (exact method varies based on the generation but usually relies on i/o port transfers at offsets that can't normally be used from assembly code). once the FPU has that data, it is free to operate asynchronously while the CPU does other integer work. for operations like stores to memory, the FPU cannot run asynchronously - the CPU and FPU operate in lockstep for these instructions. debugging can be slightly more complicated - FPU instructions are treated as synchronizing points, which means that any results from one FPU instruction will generally not be available (including things like exceptions) until the next FPU instruction occurs. this is what the FWAIT instruction is for - it is basically a NOP, but acts as a synchronization point between the processor and the coprocessor, so that you can wait, and handle any exceptions that might have occurred before operating further on, e.g, results in memory that might not actually be valid yet.
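The overlap and synchronization behaviour described above can be illustrated with a toy cycle-count model. This is only a sketch: the cycle counts, the function names, and the single hand-off cost are made-up assumptions for illustration, not measured x87 coprocessor timings.

```python
def serialized_cycles(handoff, fpu_latency, integer_work):
    """CPU waits for the FPU to finish before starting its integer work."""
    return handoff + fpu_latency + integer_work


def overlapped_cycles(handoff, fpu_latency, integer_work):
    """CPU hands the operation over (lockstep), then runs integer work while
    the FPU computes. A final FWAIT-style sync stalls only if the FPU is
    still busy when the integer work ends."""
    return handoff + max(fpu_latency, integer_work)


# Hypothetical numbers: 4-cycle hand-off, 40-cycle FPU op, 30 cycles of integer code.
HANDOFF, FPU_LATENCY, INTEGER_WORK = 4, 40, 30
print(serialized_cycles(HANDOFF, FPU_LATENCY, INTEGER_WORK))  # 74
print(overlapped_cycles(HANDOFF, FPU_LATENCY, INTEGER_WORK))  # 44
```

In this model the integer work is effectively free as long as it is shorter than the FPU operation, which is why interleaving integer code between FPU instructions pays off.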
@RTLEngineering 2 years ago
Fascinating. I had considered that using the I/O range might make sense to transfer commands. What you said implies that the address calculation is done by the x86 and not the x87, at which point, the x87 could just load / store the data directly with the x86 driving the address/control lines, but not driving the data bus (if that's the method that was used instead of an extra I/O transfer).
So then how did this work with the original 8086? Were the FP instructions already added to it (given that the 8087 came out 2 years later), or was the 8087 a special case, and what you described for the 286 to 486?
@idontwantahandlepleasestop 2 years ago ⁺¹
@@RTLEngineering yes, the x86 does the address calculations and at least in the 486 does loads and stores itself (served directly from its onboard cache which is faster than the fpu registers...). an entire range of instructions in the x86 instruction set is allocated for coprocessor use. the coprocessor interface was defined when the 8086 was released, even if the 8087 wasn't, and while i dont know the specifics, i know it is designed such that the CPU does not have to know what instructions the coprocessor supports and what they do. you could design an entirely different coprocessor with a completely unrelated instruction set that still works on that interface. if i am remembering correctly, there was at least one company that made a competing FPU like that, which was not compatible with the x87 instruction set but plugged into the same slot.
@RTLEngineering 2 years ago
I didn't realize that the x87 instructions were technically just general co-processor instructions, but that makes sense. It's interesting, because that implies that the initial co-processor ISA model used by the 8086 limited the form of the implementation. Meaning that all instructions had to take the same form of the base instructions and could not instead utilize a different pattern. For example, you couldn't have a 3-byte opcode (including the Dx escape). It also means that you couldn't have immediate values, since that would cause the PC to become unaligned. That does make me wonder if my reason for using a stack model is correct, or if it was a specific limitation imposed by the co-processor encoding in the first place. Sort of along the lines of Intel said "we don't know what a co-processor might be, or how it would work... but here's some free bits, use them as you please" as opposed to "there aren't a lot of bits free for a co-processor, so it will likely have to be stack based" (the latter is what I had originally assumed, and the former is another example of the engineers involved not thinking ahead).
@GraveUypo 1 year ago ⁺¹⁰
This just reminded me that as an 8-year-old I should have listened to the seller who told my mom to buy the Pentium 100 machine for just 100 bucks more than the 100 MHz 486 DX4 I picked. I didn't trust him, but he just wanted us to get the better deal, for real. Ah... if only I had listened. My first computer would have been twice as fast.
@ReadersOfTheApocalypse 1 year ago ⁺³
Nice video. Never got such a solid explanation. Didn't mind the AI narrator at all.
Once played Quake on my secondary machine on a LAN party. That machine had a Cyrix 486dx4 or something like that. It worked, but was an abysmal experience... 5 fps on minimized viewport or so.
@kasimirdenhertog3516 1 year ago ⁺¹⁵
Very interesting! I’m curious how the 486 architecture would handle that piece of Quake code and how many cycles that would take.
Oh, and John Carmack is a proper genius - I was under that impression already, but seeing how intricate the programming towards new technology is… 🤯
@migorigor 1 year ago ⁺⁵
Yep, he is. But the assembler optimizations for Quake were mostly done by Michael Abrash (assembler genius).
@kasimirdenhertog3516 1 year ago ⁺⁴
@@migorigor thanks for bringing that up! Never heard of him, but now looked him up and it's clear he's (also) a genius. How about this: 'In 1991 he introduced Mode X, a 320x240 VGA graphics mode with square pixels instead of the slightly elongated pixels of the standard 320x200 mode. At the same time, he introduced readers to a little-known part of the VGA standard allowing multiple pixels to be written at once.'
That's some serious tinkering, along the lines of Steve Wozniak figuring out how to do color on the Apple ][.
@AnalogAssailant 11 months ago ⁺⁴
Killer video. We always knew to stay away from AMD back then because of performance issues (with Quake and games running the Quake engine); now I have a detailed understanding as to why.
@flydeath1841 1 year ago ⁺¹
Amazing video @RTLEngineering, this explains so much about why the K6 struggled with FPU-heavy games like Quake. Quick question: are you planning to post a video on the Pentium 2/3 and K7 Athlon?
@RTLEngineering 1 year ago ⁺¹
I do have a video planned on the Pentium 2/3 and on the K7, but those are focused on the front end alignment, and not specifically on games.
@Sonyfreak 1 year ago ⁺⁸
Thank you for this video! Although I never knew much about the inner workings of CPUs, I could follow your explanations reasonably well. I would love to hear a similar examination and comparison of the Cyrix processors. It's fascinating that this small company was able to compete with and often even beat Intel up to the Pentium with their own CPU designs. What a pity that Cyrix is long gone by now.
@RTLEngineering 1 year ago ⁺⁶
Thanks! I'm glad you were able to follow along! Finding the balance of background knowledge is always a challenge for videos like these.
To be honest, I don't really understand the internal design of the Cyrix processors all that much. They mention some advanced features which are also shown in the block diagrams, but those should have made the 6x86 perform better than the P5, and yet it performed worse (probably some implementation quirk like with the K6). Unfortunately, I think a video like that would require micro-benchmarking on an actual chip to extract architectural details.
They did have some very clever ideas though, which Intel did not start to integrate until the Pentium Pro / Pentium II.
@robgaros2985 1 year ago
@@RTLEngineering One of the key differences was that the 6x86 was targeted as an alternative to the Pentium Pro, but with the ability to be placed on a cheaper platform and for significantly less money.
The integer speed was much faster clock for clock than a normal Pentium, so when it ultimately became a competitor for the standard user market instead, they had to introduce a PR rating, because a 150MHz 6x86 was faster at integer work than a Pentium 200. But with the FPU being the weak point, the difference got even bigger due to the PR rating, because it still ran at 150MHz while being about 25% slower at the same clock speed. So a PR200+ Cyrix ran its FPU at the equivalent of 150 - 25% = 112.5MHz compared to an Intel CPU, while its integer performance was about equal to a 210MHz P5.
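The arithmetic in the comment above can be written out as a short sketch. The function name is hypothetical, and the 25% and 210MHz figures are the commenter's estimates, not benchmark results.

```python
def pentium_equivalent_mhz(actual_mhz, relative_per_clock):
    """Clock a Pentium would need to match a chip with the given per-clock ratio."""
    return actual_mhz * relative_per_clock


ACTUAL_MHZ = 150.0    # real clock of the PR200+ part, per the comment
FPU_PER_CLOCK = 0.75  # "about 25% slower at the same clock speed" (commenter's estimate)

fpu_equiv = pentium_equivalent_mhz(ACTUAL_MHZ, FPU_PER_CLOCK)
print(fpu_equiv)  # 112.5

# Integer per-clock ratio implied by "about equal to a 210MHz P5".
integer_per_clock = 210.0 / ACTUAL_MHZ
print(integer_per_clock)  # 1.4
```

Under these assumed figures, the same chip looks like a 210MHz Pentium in integer code and a 112.5MHz Pentium in FPU code, which is the gap the PR rating hid.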
It cost Cyrix their name, even though it was still a better option for almost every user who didn't play Quake-like games, as it was significantly cheaper and faster in most applications than its Intel counterpart. But Quake became THE benchmark and basically destroyed a good competitor in the CPU space.
@louistournas120 1 year ago ⁺¹
@@robgaros2985
That's true. Cyrix was probably aiming for the general market. It was a great CPU to use in business, where I imagine, they use Outlook, MS Office or something else, Netscape, Internet Explorer.
I had a Cyrix 6x86 PR200+L. I did benchmarking with chess software and it beat a Pentium 200 MHz, maybe by 10%. Keep in mind, this Cyrix ran at 150 MHz.
Unfortunately, IT departments were not interested and still are not interested.
It is very rare to see AMD or Cyrix or IDT being used in corporate environments. IT departments base their decisions on trust and they trusted Intel since they had a long experience with buying Intel CPUs.
I have only seen 1 insurance company, the largest in Canada, use AMD Geo CPUs in mini PCs. These ran Win 7 and they are for connecting to a VM using Citrix. They did not have many. It was kind of a project to cut cost.
Even today, people who buy AMD CPUs are mostly home users. I’m sure there are some that buy servers and perhaps the occasional 3D rendering guys who want a high core count.
About 80% of gamers use Intel. Look at the Steam info.
@robgaros2985 1 year ago
@@louistournas120 True, but AMD has basically taken over the high-end desktop CPU space with the Threadripper line and has a couple of huge supercomputer deals (Department of Defense and Department of Energy) with their EPYC line. Most Formula 1 teams use those too. It's just the vast majority of run-of-the-mill companies that still cling to Intel.
@TheMasterofComment 11 months ago ⁺¹
I do hope you post more videos; the content is interesting and quite niche on YouTube. The Quake title likely attracted many casual viewers who expected more infotainment-type content. It's unlikely they would understand it, and therefore they're not your target audience. Do not be disheartened by those who are bothered by the synthesized voice; after all, with 68k views at least some haters are expected. Many of us focus on the content.
@jirikajzar3247 1 year ago ⁺²
Interesting comparison, simple and coherent. I would love to see later CPUs being compared in same way.
@DFX4509B 1 year ago
There's a couple parallels between Pentium vs. K6 in this vid and Bulldozer vs. Sandy Bridge, specifically regarding the FPU (Pentium had a self-contained FPU integrated into the silicon vs. the K6 using a shared module, similarly, Bulldozer shared an FPU between cores where each core in Sandy Bridge had fully independent resources between each other).
@DarrenRockwell 1 year ago ⁺²
Damn dude this video was so interesting thank you for taking the time and putting this together.
@wilburdog4508 1 year ago ⁺¹²
Hi, im doing a research paper on the effects of Quake on the CPU market, your video has been very helpful to me, however, i was hoping you could help to find credible sources i could cite in my paper. Preferably the sources you used to curate your video. Thanks in advance - wilbur.
@rsdyeahh 1 year ago ⁺⁵
Very interesting. It would be better without a text-to-speech voice that has some misses (like "id").
@cropstar 1 year ago ⁺⁹
Anybody try Quake on a Cyrix? I did. It was not a pleasant experience. I've seen graphic novels with higher frame rates! Thanks for a great video, I think I understood some of it.
@louistournas120 1 year ago
No but I did run Duke Nukem 3D on my Cyrix 6x86 P200+L and it was definitely slower than a P200. It was like having a P120.
It was playable. I estimate I had 20 to 30 FPS .
@nurbsivonsirup1416 1 year ago ⁺²
Buying a K6-2/350 on Socket7 for gaming, when I could have had a Celeron A/300 on Slot1/BX440 for the same money was the worst mistake I ever made ...
@AnIdiotAboard_ 1 year ago ⁺²
Ahhh, memories of my first computer
A80486SX-25 Intel CPU @ 25 MHz with, I believe, 8 KB combined cache.
4 Meg Of whatever the memory was back then
A Truly massive 100 Meg Hard Drive
Mono Graphics
And MIDI Sounds.
Oh how times have changed
@RTLEngineering 2 years ago ⁺³
For anyone wondering about the longer version, it has not been uploaded yet. I will upload it soon, so please keep an eye out for it.
@Anonymous______________ 2 years ago
Seriously?!? Using that craptastic voice does your channel a great disservice.
@suncrafterspielt9479 1 year ago
Hey, will it come after gpu June?
@RTLEngineering 1 year ago ⁺²
That's the plan. All that's left is editing, but it's a little over 50 minutes long, so that will take time.
@igg3937 1 year ago
Wow I hadn't heard of or thought of 'Cyrix' for a very long time. Trip down memory road!
@yiannos3009 2 years ago ⁺³⁰
This video is outstanding. Where can I find the longer version?
@RTLEngineering 2 years ago ⁺¹⁴
Thanks! I haven't uploaded it yet - I will probably upload it next week.
@yiannos3009 2 years ago ⁺⁴
@@RTLEngineering I look forward to it (but no rush, please invest the time it deserves)
@mbe102 1 year ago ⁺⁵
@@RTLEngineering new here, but hearing there is a long version, I'm jonesing for it!!!!
@WhoLover 1 year ago ⁺¹
@@RTLEngineering Any update on this?
@snakeplissken1754 1 year ago ⁺²
Back in the day I had a Cyrix 5x86 133, which was quite an upgrade from my former 486 DX4 100. But Quake wasn't a friend of it. Other games ran fine, even great, but that one... nah.
Thankfully I wasn't that big into this type of game back then anyway. I have always been more of a 4X or RTS type of person, and for that the Cyrix was fine.
A bit of a shame that Cyrix went belly up, given how long I ran the Cyrix 5x86 and how little money I paid for it compared to other options. The Pentium was way ahead, sure, but it also cost an arm and a leg in comparison.
The Cyrix 5x86 was also the first CPU/mainboard/RAM setup I ever bought with my own money as a kid. All the previous PC stuff I got gifted from an uncle who worked in IT. So I was sort of fortunate in my childhood when it came to access to PC stuff. I sure could have waited a bit and maybe gotten a Pentium for Christmas, but I wanted to get something with my own money.
@Roxor128 1 year ago
10:27 - Minor error: the "Compute" highlight doesn't cover the Exponent Module, but does cover the Temporary Registers, which should be in the "Control and Regs" highlight.
@RTLEngineering 1 year ago
Thanks! I think that was mostly a limitation from trying to draw a simplified shape. Although, I wouldn't really consider the exponent module as part of the "compute" given that it's implemented as a small adder - the real compute happens in the main units which could do up to FP80 computation (64-bit mantissa).
@Saturn2888 1 year ago ⁺²
I always wanted to know more about older computer hardware rather than the hearsay I heard as a kid. I'm very happy you made this video!
@daddyturbo1 1 year ago
OMG THANKS SO SO MUCH THIS HELPED!!!
@tomswift3938 1 year ago ⁺¹
"The transition from mostly 2D games into immersive 3D environments was brought on by none other than the original Quake"
Descent came out a year before Quake.
@RTLEngineering 1 year ago
Descent wasn't nearly as popular as Quake though, and the engine under Quake was used in several other popular games.
@jimmyhirr5773 11 months ago
I'd argue that Super Mario 64 also played a role in that, as it was released around the same time as Quake. That said, it's true that many more games used Quake's engine.
@dabombinablemi6188 1 year ago ⁺²
This showed up in recommended after I installed the shareware version of Quake on my Windows ME machine.
@khoroshen 1 year ago
Subscribed! Great content, even though the artificial voice is a bit difficult to follow, I understand the reason behind it.
@supabass4003 1 year ago ⁺¹
Perspective correct texturing was the problem for the K6, makes sense.
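For readers wondering why perspective-correct texturing leans so heavily on the FPU: the texture coordinate at each pixel is (u/z) / (1/z), which needs a divide. A common trick in software renderers of that era is to do the exact divide only once per short span of pixels and interpolate linearly in between, so the slow divide can run while integer code draws the previous span. The sketch below shows only the arithmetic; the span length of 16, the function names, and the sample gradients are assumptions for illustration, not Quake's actual code.

```python
def exact_u(x, uoz0, duoz, ooz0, dooz):
    """Perspective-correct u at pixel x: (u/z) / (1/z), both linear in x."""
    return (uoz0 + duoz * x) / (ooz0 + dooz * x)


def subdivided_u(width, uoz0, duoz, ooz0, dooz, span=16):
    """One exact divide per span; linear interpolation inside each span."""
    us = []
    x = 0
    u_left = exact_u(0, uoz0, duoz, ooz0, dooz)
    while x < width:
        n = min(span, width - x)
        # The expensive divide: on a Pentium this could overlap with the
        # integer code drawing the previous span.
        u_right = exact_u(x + n, uoz0, duoz, ooz0, dooz)
        # In fixed-point code this would be a shift when n is a power of two.
        step = (u_right - u_left) / n
        for i in range(n):
            us.append(u_left + step * i)
        u_left = u_right
        x += n
    return us


# Hypothetical scanline: 40 pixels, u/z grows by 1 per pixel, 1/z by 0.01.
GRADIENTS = (0.0, 1.0, 1.0, 0.01)
approx = subdivided_u(40, *GRADIENTS)
worst = max(abs(a - exact_u(x, *GRADIENTS)) for x, a in enumerate(approx))
print(len(approx), worst < 1.0)
```

The approximation is exact at span boundaries and drifts slightly in between, which trades a small texture error for far fewer divides per scanline.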
@vincentvoillot6365 2 years ago ⁺⁴
Very interesting, I like it when the frontier between hardware and software is blurry.
Maybe one of the last examples where good knowledge of the hardware led to programming wonders.
I didn't know the K5 was out-of-order. I had to look it up, and what a strange bird it was: an x86 front-end with a RISC core.
I'm starting to wonder how well an x86 front-end with a modern RISC-V core would perform.
Seeing the benchmark, clearly the P6 was one of the finest architectures at the time (but I would have sold my soul for a DEC Alpha in the late 90's).
Since your last video, I have done some digging on 3dfx (found a site with technical papers and source code).
Voodoo 1 had two chips (texture and raster), each with dedicated 64-bit EDO RAM at 50 MHz, so 800 MB/s in total.
Voodoo Rush was a Voodoo 1 with some 2D chips battling for the PCI bandwidth.
Voodoo 2 had two texture units, so 3 x 64 bits at 90 MHz, 2160 MB/s.
And the Banshee was a half-Voodoo 2 (only one texture unit) with a 2D core.
I suppose you would have to adapt the architecture to a single memory system? Can the color blending and overlay be shared with the 2D part?
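The bandwidth totals quoted above follow from bus width times clock times number of buses. A minimal sketch of that arithmetic, assuming one transfer per clock and 1 MB = 10^6 bytes (the function name is made up for illustration):

```python
def peak_bandwidth_mb_s(buses, bus_width_bits, clock_mhz):
    """Peak transfer rate in MB/s, assuming one transfer per clock per bus."""
    return buses * (bus_width_bits // 8) * clock_mhz


print(peak_bandwidth_mb_s(2, 64, 50))  # 800  (Voodoo 1: texture + raster chip)
print(peak_bandwidth_mb_s(3, 64, 90))  # 2160 (Voodoo 2: raster + two texture units)
```

These are theoretical peaks; sustained throughput on real EDO RAM would be lower.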
@RTLEngineering 2 years ago
Indeed, the span from 1993 to 2001 was probably the most dramatic change in hardware and software architecture. Before 1993 most hardware/software looked similar, and after 2001 most hardware/software looks similar. That's probably due to it being the inflection point where hardware started to become fast enough that it could handle almost any software thrown at it, instead of requiring heavy assembly based optimization.
- I was originally surprised that the K5 was OoO too, but that's probably because I was equating it to the Cyrix 5x86. There are a lot of detailed patents on the K5/K6, which have a very similar architecture. It was the K7 that was a complete overhaul (due to clock speed scaling on the K6).
- All OoO processors use a RISC like back-end. How RISCy it is depends on the architecture though. The K5/K6 were pure RISC, meaning that it was a load-store architecture. The K7+ and P6+ used a semi-RISC back-end, which was load-store based, but could fuse a load or a store with an ALU operation (to save in micro ops).
- You wouldn't be able to implement x86 with a RISCV backend efficiently, because of complications with the ISA. And a RISC-like backend becomes a nightmare when dealing with x87 FP (I talked about that in the longer video, which hasn't been published yet). The RISC back-end of the K5/K6 was highly specialized around these quirks of the x86 ISA, so other than being load-store and register-register based, it doesn't resemble any other RISC like ISA.
- The P6 was very powerful, but I think it was outmatched by the K7 in many cases (regardless of the benchmarks). There have been many improvements to the P6 though, starting with Core, and those allow it to far surpass the K7. Although AMD has made many improvements to the K7 base architecture too (even Zen is similar to K7).
- What you described with the Voodoo's is correct, except for the Banshee. Perhaps you are thinking of the Rush? The Banshee was combined like the Voodoo3+, and shared a single memory bus (it may have only had one TMU though, I would have to look that up). For a FPGA, the architecture would depend on the underlying hardware platform. If your platform only had 1 memory system, then you would need to have the framebuffer, texture memory, and 2D memory share the same system. Actually, 1 memory system would mean also having the CPU and sound card share too. Ideally, you would at least have 2, where the graphics have a dedicated system and the sound/CPU get their own. If you instead had 3 memory systems, then it could make sense to split the 2D and 3D graphics, and 4 could split them again. But I think 2 is a reasonable target if the latency and bandwidth requirements can be met, since that's what the Voodoo3+ (and Banshee) used. And yes, the 2D system would share the same memory. The Voodoo cards actually store their 2D framebuffers in a specific way that allows it to be compatible with the 3D framebuffer, though I haven't looked into how the windowing was done.
@vincentvoillot6365 2 years ago
@@RTLEngineering
The rush (SST-96) was a revised voodoo 1 (SST-1) with a memory @45Mhz.
The Banshee was a Voodoo 2 (SST-2) but with only one texture unit (TMU) instead of two. Hence the poor score in multi-texturing compared to the Voodoo2 (or the TNT).
I remember the Banshee well, but the Rush not so much, beside the fact it was bad, "S3 Virge" bad.
Indeed the K7 was a beautiful chip compared to the PIII. The K7 brought 3DNow! and the PIII gained SSE, both alongside MMX with completely reworked FPU/vector units.
@RTLEngineering 2 years ago ⁺²
I had to go back to the spec to verify, but my original comment about the Banshee is correct. The graphics core in the Banshee is still the second generation SST1, but the overall system configuration is almost identical to the Voodoo3+. If anything, I would call it the Voodoo2.5. When I say system configuration, I mean at the top level. In fact, the Banshee, Avenger (Voodoo3), and Napalm (Voodoo 4/5), have the exact same top level diagram in their specs. In that architecture, they share the same PCI and command buffer interface, as well as a common memory controller. Also, as you said, the Banshee had 1 TMU, the Avenger had 2 TMUs, and Napalm had 2 TMUs but 2 Raster pipelines (so 1 TMU per pipeline - I can't recall if Napalm could combine both TMUs for tri-linear filtering though, or if it needed 2 passes like with the Banshee). Anyway, the point was the shared memory controller in all of those cases.
For some reason I thought that all of the K7s had SSE, but only the later ones did. The SSE operations were effectively a subset of 3DNow, but twice as wide. So modifying a 3DNow unit to support SSE wouldn't have been too difficult. The same thing can be said regarding SSE2 and the existing MMX units.
@vincentvoillot6365 2 years ago
@RTLEngineering Sorry to be so psychotic about the memory. I have been interested in hardware design for a couple of years and I'm still searching for a good platform to begin with, meaning one that is extensible enough and has as many or more LEs than the MiSTer FPGA.
I didn't see any dual-channel DDR capabilities in the "low"-cost FPGAs. From what I understood, the MIG needs specific logic and clocking for the double data rate. The Artix can go as wide as 72 bits, but with only one channel. So I concluded that some kind of custom board with SDRAM is the only way to go, besides selling my organs (and DDR layout seems insanely difficult).
My first idea was to tap the PL pins between a Kria module (2x240 pins) and its carrier board through a breakout PCB and add some RAM (depending on how many pins I can get), leaving the PS ones connected.
Sadly, I didn't follow 3dfx past the Voodoo 2; I had bought a GeForce 256 (T&L acceleration ^^).
Anyway, I'll wait for your next video. Very interesting subject.
PS: the AI voice is pretty good.
@RTLEngineering 2 years ago ⁺²
No need to apologize about the memory concerns. This is one of the major limiting cases for FPGA emulation - memory technologies, bandwidth, and latency. It's actually still a problem today, but even more so for FPGA emulation, since the hardware, and potentially the software, was dependent on specific parameters that can't be achieved with modern memory. I did start working on a video about the philosophy of FPGA emulation, and it has a section discussing that problem in more detail.
I'm not sure what you mean by dual channel DDR though. Dual channel would mean 2x memory controllers, and therefore 2x the wires/pins. As far as I am aware, there are no FPGA development boards that have such a configuration (other than the ultra-high-end ones, or the ones that use HBM). The MIG is needed to talk to any high speed memory, or at least something similar (the LiteDRAM controller also works, but it's not quite as fast). I would have to check, but I believe the Artix is limited to 64-bits, the extra 8-bits are for ECC, which I believe is built into the MIG (if enabled), so you would only have a 64-bit output (really 512-bit for DDR3, since it's 8n prefetch). SDRAM would certainly be easier to route, but it may be harder to control at high speeds (it may be easier to route DDR3-800 than SDRAM-200). Another option is to use LPDDR through the MIG, which is point-to-point instead of bussed (fly-by to T is what makes DDR so hard to route), but it comes at a higher read latency.
Tapping the pins on the Kria module is going to end in disaster though. Even at 100 MHz, you're talking about high speed signals where impedance matters (that's partly why MiSTer has such a hard time getting the SDRAM to run that fast). Your best bet would be either 1) if the Robot starter board connects the Pi header, use that for SDRAM (it still won't be enough bandwidth for a Voodoo1 though), or 2) design a carrier board for the Kria SoM (not the one from the starter kit, you would have to use the commercial SoM), or 3) go with a different system.
I may work on a GeForce 256 at some point (or more specifically an NV2A), but there's a lot more information on the Voodoo cards, and hardly anything on the 256. I would probably have to rely on Nouveau or the open source NVIDIA drivers for that.
Thanks for the comment about the AI voice. It has worked out for the most part, though I have trouble getting it to say certain words or to keep a consistent vocal tone (often requiring the sentences to be restructured). The good news is that using that method provides a perfect transcript to upload for subtitles, and the pronunciation is clear enough that it can still be understood at 2x speed. Overall, I guess I would rate it a 7/10, which I think is "good enough".
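The DDR3 figures mentioned above (a 64-bit data bus with 8n prefetch giving 512 bits per burst) can be checked with a few lines of arithmetic. This is only a minimal sketch: the function names are made up for illustration, and the DDR3-800 number is a theoretical peak for a 64-bit bus, not a measured value for any particular FPGA board or memory controller.

```python
def bits_per_burst(bus_width_bits: int, prefetch: int) -> int:
    """Bits delivered by one burst: bus width times prefetch depth."""
    return bus_width_bits * prefetch


def peak_bandwidth_mb_s(mega_transfers_per_s: int, bus_width_bits: int) -> int:
    """Theoretical peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * bus_width_bits // 8


# 64-bit bus with 8n prefetch, as described for DDR3 above.
print(bits_per_burst(64, 8))
# DDR3-800 (800 MT/s) on an assumed 64-bit bus.
print(peak_bandwidth_mb_s(800, 64))
```

With these inputs the burst works out to 512 bits and the theoretical peak to 6400 MB/s. Real controllers lose some of that to refresh, bank conflicts, and read/write turnaround, which is where the latency concerns in the thread come in.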
@jmxtoob 1 year ago ⁺³
I'm pretty sure the K6 was priced so you'd get a few more MHz for the same money (maybe it was the K6-2 when that became the case), so it depends on what you're comparing: the architecture, or what a person spending the same money on a computer would experience.
@RTLEngineering 1 year ago ⁺²
That was the point that I was making. The K6 could only overcome the architectural limitations through brute force (higher clocks). Most people wouldn't know the difference in the underlying technology, so it would be a more subjective comparison, which the framerates are a good proxy for. If the game has 20% more frames, then it will play smoother and be a better experience, regardless of the CPU clock.
@jmxtoob 1 year ago ⁺²
@RTLEngineering I get your point, though at that time 'being able to hit a higher clock' *was* an architectural feature. I think it's historically relevant if the equivalent Pentium couldn't reach the clock speed, or cost twice as much (e.g. only from the highest binned CPUs), or required exotic cooling. The whole Pentium 4 NetBurst architecture was a deliberate choice to drop the instructions per clock to achieve a much higher clock speed, inspiring AMD's performance rating (PR) for its Athlon XP - lower-clocked Athlon chips could do the work of a higher-clocked Pentium 4.
@RTLEngineering 1 year ago ⁺¹
You do have a point there, though it could be argued that it's currently still an architectural feature. I am used to looking at it from the perspective of "we did X, and it can reach clock Y". A minimum Y may have been set as a design parameter, but the novel part of the architecture is X not the fact that it can reach clock Y. That's even more relevant now, considering that the number of possible technology nodes is far greater than in the 1990s. So unless you are targeting the most cutting edge node, you always have the option to drop down in size if you can't meet the minimum clock requirement.
NetBurst was an exception, where there was an emphasis on hitting a high minimum clock, which, as we saw, didn't really help performance all that much due to the ultra-deep pipeline.
@jmxtoob 1 year ago ⁺¹
@RTLEngineering You're right, and it kinda depends on what you're trying to demonstrate. It ends up less important when you can get either on eBay for $5 - I guess I'm biased by my bang-for-buck purchasing philosophy. And one point I think we can all agree on is what a fail NetBurst was haha
@musaran2 1 year ago
The accompanying chipset is a factor too.
Intel has a bad habit of owning the whole chain.
@azazelleblack 1 year ago ⁺²
I haven't finished the video yet, but so far it seems well-researched. However, it's odd to see you say that the Cyrix 6x86 PR233 was "233 MHz", since that chip only ran at 188 MHz. I'll leave any more comments I have after watching the rest.
@azazelleblack 1 year ago ⁺¹
Hmm, I'm not quite sure I followed your explanation of why you believe the "ST" K6 processors weren't benefiting from their on-board cache. Just looking at the specifications, it seems intuitive to me that the performance-per-clock of the K6 "ST" models scales with their L2 cache. I suppose we would need to test to see for sure.
@RTLEngineering 1 year ago ⁺²
The PR233 was used as a point of comparison, because Cyrix claimed their CPU was equivalent to a 233 Pentium. There's really no "fair" way to compare the architectures, given their optimal speeds and underlying implementations were different.
As for the ST variant, I think you have that confused. I said that the ST cores DID benefit from the L2 cache, and that's why they matched the Pentium MMXs (when scaled for frequency). It was the K6 CXT cores that did not, because they lacked the L2 cache. Unless you are questioning the difference between the 128K and 256K versions? That's likely because the 256K could cover more of the DRAM penalty by simply being larger (reduced misses).
@azazelleblack 1 year ago ⁺²
@RTLEngineering Hmm, I'm not very good at listening, so maybe I did just hear wrong. I'll have to listen again sometime when I have time. ^^ Thanks for replying.
@iLeno 1 year ago
This channel is GOATED
@Saturn2888 1 year ago ⁺²
Is that you talking or a computer? Sounds very robotic to me like one of those top-10 channels, but the information is sound.
@thebestspork 1 year ago ⁺³
Definitely an AI voice.
@RTLEngineering 1 year ago ⁺³
I answered a similar comment in another video, but it is AI voice synthesis. It's to lighten the production load which would have otherwise been a very long and tedious process. I am one of those people who enjoys assembling the information and presenting it, but not spending hours recording 15 minutes of dialog and then several additional days editing the audio together. If I ever find a more efficient workflow in the future, then I will go back to recording the audio with my voice.
@nopadelik9286 1 year ago
@RTLEngineering Thanks for this very telling answer to the "why" behind using voice synthesis. I wrapped my brain around that a while ago, and your answer makes much sense to me.
@st.john_one 1 year ago
A K6-2 300 MHz was my first CPU, bought with the first money I earned at my first job :)
@ccanaves 6 months ago
What about the 6x86? How does it differ from the K6?
@WizardNumberNext 1 year ago ⁺²
You failed to mention how many instructions the K6 could theoretically execute per cycle.
The answer is:
The AMD K6 has dual instruction decoders,
2 integer units (a slightly better solution than Intel's),
floating-point, load, and store units, and an absolutely massive (8192 entries) branch prediction unit.
It has 32 KB of L1 data, 32 KB of L1 instructions, and a 20 KB predecoded instruction cache. Even the predecoded instruction cache is bigger than any L1 cache on any Pentium ever.
The AMD K6 could issue up to 4 instructions in the same cycle.
@RTLEngineering 1 year ago ⁺¹
That was discussed at 9:45 in the video.
The K6 actually has 4 instruction decoders, but can only use the two simple decoders in parallel. That's in contrast to the P5, which has 2 decoders with some preconditions that effectively make it 1 simple + 1 complex decoder.
The K6 and P5 both have two integer units, so there's no difference there. Although, the integer units on the P5 were more advanced, since they could implement stack operations together.
The branch prediction entries aren't going to make much of a difference here. If anything, they help make the gap narrower than it would have otherwise been.
You're correct about the cache size, but again, that would only help narrow the gap - the K6 still performed worse than the P5 and P55C for Quake. Also note that the predecode bits weren't significantly more accurate on the K6 than the P55C's length predecoder - e.g. 80% accuracy vs 74% accuracy. They were mainly needed to ensure that the complex and vector decoders didn't cause the fetch stage to deadlock too often. And they also hindered self modifying code, which would require the entire cache to be flushed and a refill was slowed by the predecoding recalculation.
As for the last statement, the K6 could issue 4 MICRO-OPS per cycle, not full instructions. That was described in another section of the video, and ended up being the entire reason why Quake was slower on the K6.
@jfftck 1 year ago ⁺¹
Wolfenstein 3D ran fine on a 286, so it wasn't optimized for the 486, and it was released as an enhancement to the previous game using the same engine, Catacomb 3D.
@RTLEngineering 1 year ago ⁺¹
Just because it ran fine on the 286 does not mean that it wasn't optimized for the 486. Optimized usually implies code efficiency, so it ran faster on the 486 than the 286 after accounting for the clock speed differences.
@fredoverflow 1 year ago
4:45 Actually, ASCII is only 7 bits and does not include a copyright symbol.
@ahmetyagzdemirhan9069 1 year ago
Thanks so much
@DG-sy3rv 1 year ago ⁺³
I remember the K6-2 destroyed the Pentium II the moment AMD released a patch for Quake 2 back in the '90s.
@Ivan-pr7ku 1 year ago
Celeron 300A is what "destroyed" P-II, and it didn't need any game patches for that.
@DG-sy3rv 1 year ago ⁺³
@Ivan-pr7ku Trust me, I know what I am saying. I was a gamer at that time.
@PaulSpades 1 year ago ⁺¹
I remember the Pentium II systems being prohibitively expensive and mostly targeted at businesses in '97. So for a home Windows 95 machine you either got an AMD K6, an MMX, a Celeron, or a Cyrix system (from best performance to least). Lots of people still had Amiga and Atari ST systems, which ran 2D games and media software.
If it was 3D games you wanted, you got a PlayStation or Nintendo 64.
But for gaming it was less important what x86 CPU you ran in '97/'98, because graphics cards like the Riva TNT, ATI Rage, Matrox, and S3 were coming out. Even the mostly-2D accelerators ran software 3D better than the CPUs (though 3D APIs only got good with Win98), and the T&L cards after that first gen were something else entirely. Massive FP multiply and divide operations have simply been done through the 3D APIs since then, because the GPUs have the better hardware.
In retrospect, AMD's implementation of out-of-order execution was more significant than Intel's fast FPU switching trick. And Quake was less significant as a game and more significant as a basis for most 3D engines at the time (well, the OpenGL version, not this software-rendered one).
@dmaifred 11 months ago
I remember buying Quake when it came out, at Future Shop in Park Royal mall, West Van.
@Ojref1 1 year ago
The problem and irony with the Cyrix 6x86 FPU was the fact that it was essentially the FasMath 387 with very few changes. What earned Cyrix its reputation back in the 386/486 era, having one of the fastest add-in coprocessors, wound up becoming a millstone around its neck because of the emergence of Quake as a popular game. Besides, the whole point of using the FPU was to make use of as many computing resources as possible to make Quake one of the most futuristic game engines of its time. Other applications would not utilize the FPU to such an extent outside of scientific or supercomputing purposes. Thus AMD and Cyrix chose not to put as many development resources into the FPU pipelines, in order to cost-optimize and focus more on integer ops, which is what more desktop consumer users would perceive as value. Cyrix unfortunately took that to an extreme and got caught out.
@RTLEngineering 1 year ago
Thanks for the additional information! I do recall reading about that, and it makes sense from their perspective - why spend money upgrading the FPU architecture when it's not often used and they had limited funds due to competition with AMD and Intel.
Although I disagree with your assessment that Quake's goal was to make the most futuristic game possible. The goal was likely to produce a 3D game without resorting to pseudo-3D techniques like in previous generations - a simple goal which comes with many performance strings attached. Luckily, at the time of development, the Pentium could just barely manage to achieve that goal, so they proceeded with development. It was more that the bleeding edge of consumer technology finally caught up to their goals, rather than them trying to push the goals as far as the hardware could allow.
@wilsard 5 months ago
The Cyrix 6x86 PR233 ran at 188 or 200 MHz depending on the version and bus speed.
@divo4957 1 year ago
Works well!! Thanks!
@iforth64 1 year ago
Very good analysis. However, FP division is something that every software developer knows to avoid like the plague. What was the reason that Quake was written using FP division to a degree that it was measurably influencing game performance?
@RTLEngineering 1 year ago ⁺¹
This is a necessary operation for performing perspective-correct texture mapping. Think about the texture mapping on the original PlayStation vs the N64 - the PlayStation textures would warp and distort, which was because each sample was not being divided by the depth of the perspective transform (which is what allows a 3D object to appear 3D on a 2D screen).
So essentially id had the choice between distorted textures and relying on FDIV performance, and it turns out the Pentium at the time could sufficiently handle the FDIV load from the software renderer.
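To make the division in the comment above concrete, here is a minimal sketch contrasting affine (no divide) interpolation of a texture coordinate with perspective-correct interpolation across one span. It is a generic textbook formulation, not code taken from Quake; the function names and the sample endpoint values are assumptions for illustration.

```python
def affine_u(u0: float, u1: float, t: float) -> float:
    """Affine mapping: interpolate u directly in screen space (no division)."""
    return u0 + (u1 - u0) * t


def perspective_u(u0: float, z0: float, u1: float, z1: float, t: float) -> float:
    """Perspective-correct mapping: interpolate u/z and 1/z, then divide."""
    u_over_z = u0 / z0 + (u1 / z1 - u0 / z0) * t
    one_over_z = 1.0 / z0 + (1.0 / z1 - 1.0 / z0) * t
    return u_over_z / one_over_z  # this per-sample divide is the expensive part


# A span running from depth 1 to depth 4, sampled at the screen-space midpoint.
print(affine_u(0.0, 1.0, 0.5))
print(perspective_u(0.0, 1.0, 1.0, 4.0, 0.5))
```

At the midpoint the affine version returns 0.5 while the perspective-correct version returns 0.2, which is the kind of error that shows up as warping textures. When both endpoints are at the same depth the two functions agree, so the divide only matters for surfaces that recede from the viewer.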
@SimonBuchanNz 1 year ago
There are always alternatives, but, e.g., using integer operations and/or table lookups means you're now using the resources that the following ops would have been able to use in parallel with the FDIV.
There was the famous Doom self-modifying code loop to texture correctly and quickly on the 386 and 486, but perhaps that approach didn't play well with the newer operation decoders or the more general cases that Quake needed.
@RTLEngineering 1 year ago ⁺¹
That's true, but integer operations and table lookups wouldn't perform quite as well as FP ops (precision and interpolation issues). I would imagine that an integer solution was tested and found to be insufficient for their design goals.
As for self-modifying code, that would have performed far worse on the K6. The K6 actually pre-decodes the instructions when they stream into the instruction cache, this means that self-modifying code would necessarily require the instruction cache to be flushed and re-pre-decoded every time a modification was made. The Pentium on the other hand did pre-decode the instructions as well, but that occurred in the fetch path (post-cache).
@ostrov11 2 years ago ⁺¹
Thank you.
@edgarbonet1 1 year ago ⁺¹
Hi! Thanks for this well researched and very informative video!
A couple of nitpicks:
- The symbols of the transcendental functions “sin”, “cos”, “tan” and “exp” are always written in lower case.
- The function exp() is not “exponent”: it is the “exponential”.
@RTLEngineering 1 year ago
Thanks, and thanks for pointing that out! It's quite pedantic, but you are correct.
@Oxxyjoe 11 months ago
So, the Pentium 233 processor could perform one "free" floating-point operation every 16 cycles, whereas the K6 and the one mentioned at the start of the video had to pay for each one. And this meant that code optimized around that pattern of 1 FP operation per 16 cycles took full advantage of the Pentium 233's special feature.
@RTLEngineering 11 months ago ⁺¹
Essentially, yes. Although it wasn't just any FP operation, it was FP divide. Other operations like FP add could be performed every cycle on both the K6 and Pentium (the K6 might have been every-other cycle due to the decoder though, I would have to look back at that, and the Pentium would require a specific sequence of instructions to also take advantage of that).
I wouldn't say this was necessarily a special feature of the Pentium (certainly not special to the 233 MHz variant), more of a quirk in the micro-architecture - it was a feature on the K6 which turned into a bug, and a missing feature on the Pentium which turned out to perform better.
@Oxxyjoe 11 months ago
@RTLEngineering Oh ok. By the way, I know basically nothing about processors.
What I meant by a special feature was what you talk about at 14:30: the exception logic, which made it possible to have an FP division running in the background.
@RTLEngineering 11 months ago ⁺¹
Oh, the exception logic is implemented on all of the processors - it's required by the instruction set architecture (ISA) specification (x86 in this case), which effectively is the agreement that the hardware makes with software / compilers allowing code to compile and execute. The difference between the processors was in how this logic was implemented - the ISA states what should happen, but not how it should happen. Modern processors take advantage of that distinction to execute more instructions in parallel and therefore faster overall. Anyway, I think it would still be more precise to refer to the difference between the processors as a side-effect rather than a feature, since the implementation is ideally transparent to the software (it should have no idea what's going on under the hood). AMD's solution was perfectly valid, it just had unintended consequences for Quake. Also, aside from the FDIV, I believe the rest of the Quake code actually ran faster on the K6 (because of the parallel execution - called Out-of-Order execution), but it wasn't enough to make up for the loss caused by FDIV.
I hope that made sense / clarified what I meant in the previous comment.
@Oxxyjoe 11 months ago
@RTLEngineering Yes, it made good sense. I did not get it as fully as you might wish, but that is down to my own illiteracy on the subject. Like, I'll take your word for it. But I find your explanation to be well stated.
So, the question I have next is:
could an engine have been developed that ran just as well on the K6 as it did on the Pentium? Was there a design limitation that the Pentium happened to satisfy accidentally, but which the K6 was not made for?
Like stepping backwards in time, so to speak,
what would it have been like if they had developed Quake with the AMD processors in mind?
You say the software is supposed to be blind to the internals of the processor,
yet it will always be the case that knowing how the processor works means knowing how to utilize it better,
like going to a tailor, who will measure you and make you clothes that fit you specifically.
@RTLEngineering 11 months ago ⁺¹
An engine could have been developed that ran around the same on both processors, but it would have had to give up FDIV, or use some division approximation. The DOOM engine would have been a good example of that, but it came with limitations, and the Quake 3 engine introduced the rsqrt approximation that would have been much faster (and sufficient for Quake), although it's possible that the trick had not occurred to anyone until that point. So to answer your other question, the design limitation was the need to do perspective correction, which requires the FDIV, and the Pentium happened to satisfy it.
Developing with AMD in mind would have led to a different solution: restricting the 3D movement like in DOOM, using an approximation, or using a small look-up table (that's how the N64 did it - in hardware).
And your point is exactly correct. Blind in this sense means that the code will execute, which is true. You can execute the Quake code on an AMD or Cyrix processor, but it won't perform as well. You can always tailor the code and / or the compiler to the specific processor, but then you give up portability to other systems. Although I don't think Quake could have been tailored to the K6 or Cyrix in such a way that it performed better than the Pentium without altering it to the point of being a different game.
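As a rough illustration of the "approximation" and "small look-up table" options mentioned above, the sketch below computes 1/d from a 16-entry seed table refined by Newton-Raphson steps, which need only multiplies and subtracts. This is a generic technique shown under my own assumptions (table size, iteration count, names); it is not how the N64 hardware, Quake, or any AMD-targeted engine actually did it.

```python
import math

TABLE_BITS = 4                    # 16-entry table; size chosen only for illustration
TABLE_SIZE = 1 << TABLE_BITS
# One seed per bucket of the mantissa range [0.5, 1): reciprocal of the bucket midpoint.
SEED_TABLE = [1.0 / (0.5 + (i + 0.5) / (2 * TABLE_SIZE)) for i in range(TABLE_SIZE)]


def approx_reciprocal(d: float, iterations: int = 2) -> float:
    """Approximate 1/d for d > 0 without using a division at run time."""
    if d <= 0.0:
        raise ValueError("this sketch only handles positive divisors")
    # Split d into mantissa * 2**exponent with mantissa in [0.5, 1).
    mantissa, exponent = math.frexp(d)
    index = int((mantissa - 0.5) * 2 * TABLE_SIZE)
    x = SEED_TABLE[index]                    # coarse guess for 1/mantissa
    for _ in range(iterations):
        x = x * (2.0 - mantissa * x)         # Newton-Raphson refinement step
    return math.ldexp(x, -exponent)          # undo the power-of-two scaling


for d in (0.5, 3.0, 7.25):
    print(d, approx_reciprocal(d), 1.0 / d)
```

Each refinement step roughly squares the relative error, so the table size and the number of iterations trade memory against multiplies. Whether such a scheme would have been fast and precise enough for a software renderer of that era is exactly the open question discussed in the thread.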
@prycenewberg3976 11 months ago
Regarding the use of a synthetic voice, this one isn't bad. I didn't notice any weird errors made by the voice, which made it much more tolerable. If you decide you want a human voice, though, I'd be willing to send you a sample reading of my voice.
Either way, I've heard much worse voice synthesizers and I wouldn't worry too much about it.
@RTLEngineering 11 months ago ⁺¹
Thanks! That was my take as well, otherwise I wouldn't have used it - I also found it comprehensible at 2x speed, which is rare for other YouTubers unless they are speaking incredibly slowly (which some do). I wonder if this is like cilantro, where some people have a genetic predisposition for it to taste like soap.
Thank you for your offer, but I don't really have time to work on more videos at the moment. By the time that I do, AI voices will probably be much better - ElevenLabs just released a new version that's significantly more natural sounding.
@viscountalpha 3 months ago
I remember buying a Pentium 166 MMX chip and thinking it was perfect price/performance back then.
@foxdavion6865 11 months ago
The Celeron 300A scored the highest; I guess it is because it was the final chip made by Intel which used the OG Pentium architecture along with having the same floating-point logic. Considering it was Intel's budget low-end chip, I just find that fact very amusing. Weirdly, Intel still makes Celerons, but not for the consumer market; they're now used in those redundant-architecture industrial computers. The role Celerons were used for has been replaced by i3 processors.
@semicuriosity257 9 months ago ⁺¹
Celeron 300A "Mendocino" used a Pentium II P6 core with an on-chip 128KB L2 cache.
@japekto2138 1 year ago
Ugh. 320x200 was so pixelated even on a 14" monitor. Most of us wanted to run at least 640x480. No can do, even with the latest Pentiums of the time. Depending on the Pentium model, frame rates were mainly in the teens or 20s. It took 3dfx and Rendition to push Quake frame rates to a playable 30s and 40s at 640x480.
@CharlesLeCharles 1 year ago
You forgot Quake (and some other games) had optimized executables for each processor brand.
@RTLEngineering 1 year ago
Even with an optimized executable, you wouldn't be able to get around a hardware implementation "bug". You could potentially reduce the impact slightly, but not eliminate it.
@catfree 1 year ago ⁺¹
Honestly, hearing a TTS voice is better than some curly-haired teenager with Voicemeeter.
@fabiosemino2214 1 year ago ⁺²
I remember being affected by this in the later stage of my K6-2 550's life. I had a Kyro 2 GPU, but Pentium II 450 users smoked me in RTCW. For that game, switching to a Duron 1300 while keeping SDRAM was one of the biggest boosts ever for me.
@RTLEngineering 1 year ago ⁺¹
The Duron (K7) was a huge improvement over the K6, and marked the beginning of AMD's architecture shift which is still seen in the modern Zen processors.
@krzysztofb9279 11 months ago
I remember getting my Pentium 233 with MMX and selling it off a few weeks later because Intel Pentium CPUs had a hardware design flaw that caused random crashes... lol. Replaced it with an AMD K6-2 300.
@mitchellschoenbrun 1 year ago
Wolfenstein and Doom were not targeted to the 486 processor. They both ran on a 286, although jerkily. On a 386 Doom was fine.
@RTLEngineering 1 year ago
Targeted implies the configuration in which the game runs the way the developer intended. Unless the developer intended the game to run jerkily, the 286 would not be considered the target processor. Same thing for Doom (you would have to reduce settings to run on a 386). Regardless though, if you look at the system requirements, Wolfenstein says a 386 is recommended and Doom lists the 486. These recommendations are the targeted system, and scale according to the computational workload that each of the games demand.
@mitchellschoenbrun 1 year ago
@RTLEngineering I guess we have to disagree. I went through all 3 levels of Doom on a 386. It was quite smooth. There were no settings to reduce. There were no 386 processors when Wolfenstein was first released in 1981. The 386 came along in 1985.
@RTLEngineering 1 year ago
I think you are confusing games, which is my fault for not being clear. The Wolfenstein I was referring to in the video and the comment above is Wolfenstein 3D, which was released in 1992. As for Doom, the minimum requirements were a 386, but it ran smoother on the 486. As for settings to reduce, if I recall correctly, you can reduce the render window to make it run smoother (a smaller window means fewer 3D pixels to draw each frame).
Anyway, the disagreement may come from a colloquial and technical definition of "targeted system", where I was referring to the technical one in the video.
@mitchellschoenbrun 1 year ago
@RTLEngineering You are correct about reducing the window size, something I never felt a need for on a 386. I didn't even know it would increase smoothness, though that also makes sense. I've never played Wolfenstein 3D so I am sure you are right about it.
@MadScientistsLair 5 months ago
I need to make a video on the total disaster that was my first "real" PC made from actually new parts. It absolutely hauled for productivity and web browsing (back when page rendering speed mattered even on 56k!) but was an absolute dog at games. I picked pretty much the worst combo I could have back then for performance and stability: a K6-2, an ALi Aladdin V chipset mobo, and an NVIDIA TNT2. I'd have been better off with a PPGA Celeron, 66 MHz FSB and all, and the cost difference would have been almost nil. Quake engine titles suffered the worst, as expected, but Unreal engine stuff wasn't exactly amazing either, though the latter DID benefit from 3DNow! without AMD making a special patch like they did for Quake II.
I stayed with AMD for the next rig I built for my 16th birthday....Athlon Tbird 1000 AXIA stepping OC'd to 1400 and a Geforce 2 Pro on a KT133A board. That was a proper rig though it combined with the barely 68% efficient PSUs at the time kept my room rather warm. I learned a lot in between those two rigs.
@8bvg300 1 year ago ⁺¹
Yo, is this a robot talking or actually your voice? Anyone know?
@RUTHAN667 1 year ago
Isn't the difference that the Cyrix and K6 actually ran at lower frequencies than the ones in their names? It was that performance rating (PR)...
@RTLEngineering 1 year ago
The performance rating (PR) was meant to match integer performance (the dominant workload at the time); however, the Pentium would still outperform the Cyrix and K6 processors in Quake at the same speed. The floating-point division operations are performed frequently during software rendering, and completely stalling the pipeline each time hindered performance regardless of the core clock speed.
@TranceParadise 11 months ago
I had a choice to have Pentium MMX 200 MHz or AMD K6 233 MHz. I went with AMD and now I regret it.
@charleshines2142 11 months ago
And to think back in those days people thought nothing ever really took advantage of a math coprocessor. They all thought it was only for highly specialized niche applications that no home user was ever likely to do. Back in those days a lot of motherboards had a second socket that looks like it is for a CPU but really that is the math coprocessor socket. To a lot of people it seemed trivial to have a math coprocessor as they thought that it was only for the kind of math on a chalk board or piece of paper that may have frustrated so many people (I am thinking advanced stuff not just addition and subtraction you morons!!)
@thomasvennekens4137 3 months ago
The WinChip did well, but it wasn't widely known.
@markmental6665 2 months ago
It was cheaper, but kind of slow.
@donwald3436 1 year ago
This text to speech engine is really good. It still gives me a headache though.
@nwobhm1992 11 months ago
So ok... the K6 couldn't decode complex integer instructions in parallel with floating point, while the Pentium can, and the Pentium has writeback, so load and store instructions are not needed for writing a result back to the L1 cache. So if FDIV isn't pipelined, the K6 would stall if there are plenty of complex integer instructions after the FDIV, while the Pentium wouldn't, because it can do FPU + complex integer in parallel. If only AMD had made one of SDec0 or SDec1 a complex decoder...
@RTLEngineering 11 months ago
I think you may have gotten that confused. The FDIV in the K6 was "pipelined" from the rest of the processor, meaning it could execute FP and integer instructions in parallel. The problem was the re-order mechanism in the K6, which would fill up during an FDIV operation and have to wait for the FDIV to complete before it could allow more instructions to enter. The Pentium didn't have that issue as it did not use an out-of-order mechanism.
@nwobhm1992 11 months ago
@@RTLEngineering OK, but I think that due to OoOE, FDIV can execute in parallel with integer instructions on the K6.
@RTLEngineering 11 months ago ⁺¹
That's correct, FDIV can execute in parallel on both the K6 and the Pentium P5. The problem was not the parallel execution, it was the Re-order Buffer/Instruction Queue on the K6.
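The stall described in this thread can be sketched with a toy model. This is only an illustration under stated assumptions, not a cycle-accurate K6 or Pentium simulation: the function name `simulate`, the window size, and the FDIV latency are hypothetical values chosen for the example. It assumes one instruction enters the window per cycle, entries retire in order, and nothing queued behind the FDIV can leave the window until the FDIV completes.

```python
def simulate(n_following, fdiv_latency, window_size):
    """Toy model of one FDIV followed by n_following single-cycle instructions.

    Returns (total_cycles, stall_cycles). All parameters are hypothetical;
    real processors are far more complicated than this.
    """
    in_window = 1   # the FDIV itself, issued at cycle 0, holds one slot
    issued = 0      # following instructions that have entered the window
    stalls = 0      # cycles in which nothing could enter
    cycle = 0
    while issued < n_following:
        cycle += 1
        fdiv_pending = cycle < fdiv_latency
        if fdiv_pending and in_window == window_size:
            # Window is full and its oldest entry (the FDIV) is not done,
            # so in-order retirement cannot free a slot.
            stalls += 1
            continue
        issued += 1
        if fdiv_pending:
            # This instruction must wait behind the FDIV before retiring.
            in_window += 1
    return cycle, stalls


if __name__ == "__main__":
    print(simulate(30, 20, 8))    # small window: fills up and stalls
    print(simulate(30, 20, 64))   # window large enough to never fill
```

With these made-up numbers, the 8-entry window stalls for 12 of 42 cycles, while the 64-entry window issues all 30 instructions in 30 cycles. The second case only stands in for "the window never fills"; it is not a claim about how the Pentium is built.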
@raghul1208 1 year ago
best channel
@dualboy24 1 year ago
Mistake early in the video: you said they all have the same clock speed, which is not the case. The Cyrix and AMD chips use performance ratings.
@RTLEngineering 1 year ago
They all have the same effective clock speed (as measured by integer performance), that's what the PR is supposed to indicate. The Cyrix and K6 had a higher IPC than the Pentium, meaning they could achieve a similar integer performance at lower clock speeds.
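The relationship between a performance rating and the actual clock can be written out as simple arithmetic. The sketch below assumes the simplified model "integer performance is proportional to IPC times clock speed"; the function names and the example numbers are hypothetical and are not measurements of any real chip.

```python
def implied_ipc_ratio(rated_mhz, actual_mhz):
    """IPC relative to the reference chip that a rating implies.

    Assumes performance = IPC * clock, so a chip rated as equal to a
    reference at rated_mhz while running at actual_mhz needs
    rated_mhz / actual_mhz times the reference IPC.
    """
    return rated_mhz / actual_mhz


def clock_for_equal_performance(reference_mhz, ipc_ratio):
    """Clock needed to match a reference chip given a relative IPC."""
    return reference_mhz / ipc_ratio


if __name__ == "__main__":
    # Hypothetical chip: rated "200" but clocked at 150 MHz.
    print(implied_ipc_ratio(200, 150))
    # Hypothetical chip with 1.25x the reference IPC matching 200 MHz.
    print(clock_for_equal_performance(200, 1.25))
```

This only covers the integer workload the rating was based on. It says nothing about floating-point code, which is where the chips differed in Quake.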
@gui2peg 1 year ago ⁺²
your voice sounds like that robot from civvie
@Luix 1 year ago
Out-of-order execution, something so common these days.
@jmi2k 11 days ago
It saddens me to see all the fuss about the speech synth thing. If you are into these kinds of things, this video is outstandingly good, going deep into what's going on.
I can understand a difference in opinion about the perceived quality of the video (which is subjective anyways) but the claims I've read that the video is low-effort or that the author is lazy are hurtful and unfair, especially taking into account it's done for free and publicly available.
@ThePeterCorne 1 year ago
id Software released a lot more before Wolfenstein and Doom.
@BigT5 1 year ago
Design is very human voice
@ricardoalas743 1 year ago
I have run that on a 286.
@Casper_Min 1 year ago
I started crying when I saw the word RTL on YouTube 😂😂😂
@SilentOnion 1 year ago ⁺²
Is this text to speech?
@adriangorzelski6931 1 year ago
Was that Carmack's genius, Intel's, or money sent to the compiler owners?
@RTLEngineering 1 year ago ⁺²
It wasn't from the compilers, as the code to do this is written in assembly (it's in the id Software GitHub repo if you want to look at it).
I would say it's a combination of the other two. Carmack knew that the operations could overlap, so he could rearrange the computations to take advantage of that. Intel knew that stalling the pipeline would be bad for performance, so they let it continue as soon as the exception condition could be assured. And the fancier Out-of-Order execution in the K6 was ahead of its time (for this application), where bugs like this had not been worked out.
@adriangorzelski6931 1 year ago
@@RTLEngineering Thanks for your reply!
@Tamperkele 1 year ago
Why am I watching this when I have absolutely no idea what they're talking about?
@Capybarrrraaaa 1 year ago ⁺³
Is this voiced by an AI?
@RTLEngineering 1 year ago ⁺¹
Yes, the audio is AI voice synthesis. It was not researched, written, and edited by an AI though.
@Lilithe 3 months ago
Why is this done in TTS? I guess if you just like writing PowerPoints for YouTube...
@RTLEngineering 3 months ago
Or that I really dislike editing my own audio. AI voice generation took hours rather than days. That is on top of researching, writing the script, and creating the visuals, which took several weeks. Then following that up with tedious spectrogram work is quite an unpleasant experience. The primary content is not the audio, it is only one component to the medium.
@Delo997 1 year ago ⁺⁴
The video seems very interesting and the topics I love, but the narration is robotic and off-putting
@dotplan 1 year ago
Wolf3d and Doom did not target the 486. They targeted the 286 and the 386 respectively.
@ccanaves 6 months ago
Doom did in no way target the 386.
@dotplan 6 months ago
@@ccanaves A 286 did not have reasonable performance, and most of them did not have enough RAM to play Doom.
@ccanaves 6 months ago
@@dotplan Doom was targeted at a 486. It ran at 90% of full speed on a DX2-66 with a VLB card. On a 386-40 (which is how I played it) it's super slow, even with low detail and a shrunken screen. OFC a 286 was not the target. Wolf3D was more of a 286-386 hybrid. It did run well on a 286-20, but ideally you want a 386-33.
@MrBca009 1 year ago
🎉
@pom2924 1 year ago ⁺⁵
This is a pretty decent video, but please stop using text to speech generators. There is more value when an actual person is explaining.
@RTLEngineering 1 year ago ⁺¹
Thanks for the feedback! I agree that there is more value when an actual person is speaking, but one should compare the value against nothing instead.
Regardless, I have implicitly stopped using speech synthesis as I have not uploaded any videos in almost a year.
@Imnotrealyouneedtowakeup 1 year ago
okay so I think i got most of that
@jnbsp3512 1 year ago
wow I have finally found a voiceover more annoying than the upbeat-inflection tiktok AI voice :O
This video is really cool tho, I guess I would have loved reading it as an article instead of captions while muting it.
@nothingelse1520 1 month ago
my first PC was a Pentium 100, I got Quake right after it launched......didn't run that great lol
@CrocoDylianVT 11 months ago ⁺¹
ah yes, when Pentium was the top of the line instead of almost the bottom of the barrel
@Philfluffer 11 months ago
They're all the same architecture... however, developers can optimize games by using special instructions only available from a specific manufacturer.
@RTLEngineering 11 months ago ⁺¹
They are all the same instruction set architecture (ISA), yes. But they are not all the same micro-architecture. The micro-architecture is what made the difference here.
As for special instructions, that didn't apply at the time. The Pentium P5, K6, and Cyrix 6x86 all had the same instructions as far as I know. So that means a programmer could not optimize games by using special instructions.
This changed when Intel released MMX and AMD their 3DNow! extensions, but those came after Quake was developed.
@macbaryum 1 year ago
So dinner is ready … NOW … if you are not hungry. Wait an hour.🤭🤭🤭🤭
@Funj0b 1 year ago
I wanted to like but the counter is on 486 so i couldn't 🤗
@alejandromoran4590 1 year ago
Even a K6-3 @ 550 MHz has problems running this game
@baslifico 1 year ago ⁺¹
This is normally the sort of content I'd enjoy but the choppy auto-generated voiceover makes it a real struggle to watch.
@UXXV 1 year ago ⁺¹
Is this an AI voice? Hard to tell
@Pinipon.Selvagem 1 year ago ⁺¹
The only bad side of this video is the voice.
@SaraMorgan-ym6ue 1 month ago
Quake, Floating Point, and the Intel Pentium because it was a pentium and not a pentium 4🤣🤣🤣🤣🤣
@jamesclark2663 1 year ago
Decent video. I enjoyed watching it but... Please, please, please would videos like this stop measuring performance in FPS! It's a nearly useless metric. Use time-per-frame instead. As a developer I'm telling you: THIS is the way you need to measure these things or you're going to get a skewed understanding every time.
@nopadelik9286 1 year ago ⁺¹
Comparing performance using the fps metric is what came included with the game; that's probably why it's the common metric. But I'm interested and have no clue: would you mind explaining why time-per-frame is better and pointing me to the differences?
@nadirjofas3140 1 year ago ⁺¹
not really
@RTLEngineering 1 year ago ⁺³
If I were to redo the benchmarking, then I could use a better metric as you suggested (time per frame, with a min, average, and 90th percentile). The point of the video was doing an analysis of the architecture and Quake using existing benchmark data (which was in FPS), to explain the performance differences. While this is probably not what you want, you can always think of FPS as the inverse of the average frame time. And for this type of analysis, comparing the average frame times is the most practical approach.
@alfo2804 1 year ago ⁺³
All you need to do to work out the time per frame is to do 1/[average fps]. Don't see how it's a useless metric.
@jamesclark2663 1 year ago
@@RTLEngineering Perhaps 'useless' is too strongly worded. Misleading would be better. The classic example I see is when people compare framerate changes erroneously or even just give the fps loss or gain with no context. For example, they say something like 'this gained me 15 fps!'. That is a completely meaningless number. Going from 15 fps to 30 fps means you've cut your frametime by a whopping 33.3 milliseconds! Huge gains! You've literally doubled the speed! But if you were already at 120 fps and you gained 15, that means you only gained about 0.9 milliseconds of performance. Not completely insignificant, but certainly not nearly the same gains. In fact, even if you doubled the framerate from 120 to 240, you still wouldn't have gained as much as in the jump from 15 to 30, only about 4.2 milliseconds.
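The arithmetic in the comment above can be checked with a few lines of Python. This is a minimal sketch; the helper names `frame_time_ms` and `saved_ms` are made up for this example, and it uses the relationship mentioned earlier in the thread that frame time is the inverse of the frame rate.

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def saved_ms(fps_before, fps_after):
    """Milliseconds of frame time saved by going from one rate to another."""
    return frame_time_ms(fps_before) - frame_time_ms(fps_after)


if __name__ == "__main__":
    for before, after in [(15, 30), (120, 135), (120, 240)]:
        print(f"{before} -> {after} fps saves {saved_ms(before, after):.2f} ms per frame")
```

The same +15 fps is worth about 33.33 ms per frame starting from 15 fps but only about 0.93 ms starting from 120 fps, and doubling 120 fps to 240 fps saves about 4.17 ms.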
