Reaction: Of cycle counting and performance

Some Assembly required

Views: 461

Published on 26 Dec 2024

Comments • 30

@otzmaanalytics4679 1 year ago ⁺²
As a person who wrote his PhD about the power one gets from longer word sizes, I appreciated this.
@CompuSAR 1 year ago ⁺¹
I'm always edgy when people who actually know what they're doing watch my videos, so I appreciate this as well.
@AndersNielsenAA 1 year ago ⁺¹
Oh how I feel your frustration. As a person who often does 32 bit addition on a 6502, I appreciated this. (1 MIPS) != (1 MIPS).
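To put a rough number on the 32-bit addition mentioned above, here is a minimal Python sketch that tallies cycles for the usual byte-by-byte add on a 6502. It assumes all operands and the result live in zero page and uses the documented NMOS 6502 timings (CLC = 2, LDA/ADC/STA zero page = 3 each). The function name and layout are illustrative only and are not taken from anyone's actual code.

```python
# Documented NMOS 6502 cycle counts for the instructions used here.
CYCLES = {"CLC": 2, "LDA zp": 3, "ADC zp": 3, "STA zp": 3}


def add_cycles_6502(width_bits):
    """Estimate cycles for result = a + b, done one byte at a time.

    Assumption for this sketch: a, b and result all sit in zero page.
    """
    n_bytes = width_bits // 8
    # Each byte needs a load, an add-with-carry and a store.
    per_byte = CYCLES["LDA zp"] + CYCLES["ADC zp"] + CYCLES["STA zp"]
    # One CLC up front clears the carry before the first ADC.
    return CYCLES["CLC"] + n_bytes * per_byte


for width in (8, 16, 32):
    print(width, "bit add:", add_cycles_6502(width), "cycles")
```

Under these assumptions the 32-bit case takes 13 instructions, which is one way to see why comparing instructions per second across the two CPUs says little.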
@CompuSAR 1 year ago
You could argue that that's a "32 bit" workload, where you'd expect 32 bit CPUs to be faster. What I'm saying is that even 8 bit workloads are often faster (fewer cycles) on the 68000.
@AndersNielsenAA 1 year ago
@@CompuSAR Absolutely! Even a NOP is 2 cycles on a 6502.
I am surprised to find the 68k requires a minimum of four cycles. At least the 6502 can do a TAX in two cycles :) Goes downhill fast from there.
@CompuSAR 1 year ago ⁺¹
@@AndersNielsenAA I wouldn't read too much into "NOP is 2 cycles". The minimal cycle count for an opcode is 2, but *for most commands* it doesn't go much higher. For most commands, if you count the number of bus accesses required to carry it out, that's the number of cycles it takes. Read-modify-write commands add one extra cycle, and stack operations do take longer, but otherwise, that's almost it. The 65ce02 even eliminated most of those extra cycles. It really is true that the 6502 is a very efficient CPU, cycle-wise.
To be fair, the same can also be said about the 68000, except there each bus cycle is 4 CPU cycles, so the resulting cycle counts are quite big. The Amiga circumvented a lot of this by using the dead time to do DMA, but you could claim that computers such as the Apple II did something similar with the 6502.
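The "count the bus accesses" rule of thumb can be sketched in a few lines of Python. This is a deliberately simplified model, not a full timing table: it assumes cycles = instruction bytes + data reads + data writes, plus one extra cycle for read-modify-write, with a floor of 2. The function name and the example table are made up for illustration. Stack operations, branches, page-crossing penalties and the more involved indirect modes are not modeled.

```python
def estimate_cycles(instr_bytes, data_reads=0, data_writes=0):
    """Rough 6502 cycle estimate from bus accesses (simplified model)."""
    # Read-modify-write instructions (a read followed by a write) cost one extra cycle.
    rmw_penalty = 1 if (data_reads and data_writes) else 0
    # No instruction completes in fewer than 2 cycles.
    return max(2, instr_bytes + data_reads + data_writes + rmw_penalty)


# (instruction bytes, data reads, data writes) for a few common cases.
EXAMPLES = {
    "NOP":      (1, 0, 0),
    "TAX":      (1, 0, 0),
    "LDA #imm": (2, 0, 0),
    "LDA zp":   (2, 1, 0),
    "LDA abs":  (3, 1, 0),
    "STA abs":  (3, 0, 1),
    "INC zp":   (2, 1, 1),
    "INC abs":  (3, 1, 1),
}

for name, accesses in EXAMPLES.items():
    print(f"{name:9s} ~{estimate_cycles(*accesses)} cycles")
```

For the cases listed, the estimates agree with the documented NMOS 6502 timings (2, 2, 2, 3, 4, 4, 5 and 6 cycles respectively).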
The whole point of my video is that the difference in performance between the 6502 and the 68000 isn't at all about how long it takes an opcode to run, but rather in what power that opcode has.
@AndersNielsenAA 1 year ago ⁺²
@@CompuSAR You are absolutely correct.
The 6502 also allows DMA in the half cycle when it's not using the buses (except that the NMOS version continuously drives the address lines).
Of course we’re also seeing the cost of “increasingly more powerful instructions” with x86 now - so of course it’s a balancing act.
But in the case of 6502 vs 68k - comparing MIPS or instructions one to one is super meaningless :)
@RudysRetroIntel 1 year ago ⁺²
Interesting! Thanks for sharing
@CompuSAR 1 year ago ⁺²
Thank you for listening to me vent :-)
@DehnusNorder 1 year ago
Really, thank you for the video, and sorry for my ranting. Things like this are a personal gripe of mine.
I just can't believe he did it again. The C65 and the Apple 2GS were computers intended for a different market than the Amiga. The Amiga was a professional piece of kit with the price to match, and thus you could do things like spreadsheets, image processing and sound editing all at the same time from a graphical user interface. But it was also a pricey machine. An Apple 2GS, the Amstrad CPC Plus range and the MSX 2(+) were all intended for a different consumer than the Amiga. An Amiga you can use as a multitasking machine for your office or your studio. And while there were MSX 2s out there with genlocking features (to help with video editing), it simply is not a comparable machine. I like the Z80A, but come on, it's simply out of its league the moment you put a 68000 in comparison, even without the extra IC the Amiga had on board to handle the aforementioned multimedia features. An Atari ST, which lacks these advanced features the Amiga had, would still wipe the floor with the earlier mentioned 8 bit machines.
We are literally talking about comparing a Z80 architecture to a 286, as that is the size of the jump from 6502 to 68000. One is based on the Motorola 6800 but was far cheaper to sell en masse to everybody. The other was Motorola's new and fancy CPU for high end workstations and servers, aimed at a far more professional market until the price came down in the mid 80s. I mean, there is a reason the Sinclair QL used a cheaper version of the chip with a far narrower address and data bus. Cuts needed to be made to make it affordable. So just as a 286 blows the Z80 out of the water (it's a similar comparison, as that is how much of a leap the 68000 was over the 6800), so does the 6502 get blown out of the water by the MC68000...
Sigh, I'm ranting again. I just wish he'd understood that the market for an Apple 2GS or MSX 2 was just different from that of an Amiga. Most folks couldn't afford an Amiga until after the Amiga 500 launched, and then usually not until the early 90s, as these were expensive machines. That SEGA put a 68000 in the Megadrive was already a huge eye opener for many. SEGA's engineers had to push for it with management, who originally just wanted another Z80. Heck, originally it had an even wider bus, but they halved the amount of memory and thus halved the bus width (to save cost). (You can still solder it on and it'll just work at this wider bandwidth :) ). But they showed management the difference it made, and just how much an MC68000 would improve the performance in comparison. It simply was no contest.
Sorry.. rambling.
Here is an interview talking about why a company like Treasure liked working with the Megadrive over the SNES (hint: it's that mighty 68000 vs that very CPU David Murray is talking about in his Apple GS comparison :P )
megadrive.me/2011/11/03/an-interview-with-treasure/
That should tell people a bit just how much of a step forward the 68000 was.
@CompuSAR 1 year ago ⁺¹
In his defense, I think the 6502 was the only CPU David programmed in assembly for. And the 6502 is a really great CPU, no argument. It's just that it's a really great CPU _for its time_, and the 68000 belongs to a different time.
@DehnusNorder 1 year ago ⁺¹
@@CompuSAR Not just a different time, a completely different target audience. The history behind the 6502 is incredibly interesting, same with the Z80, and I agree it's a really great little CPU. But it was designed with very different workloads in mind.
And I think he knows that, but just isn't able to admit it, for some weird reason.
BTW, found a fun little comparison while reading the wikipage of the 6809:
en.wikipedia.org/wiki/Motorola_6809#Market_acceptance
It not only shows just how much faster the 68000 was than the rest, but also that it did the calculations in far fewer total instructions, even when compared to the incredibly efficient 6502 and 6809. Total clock cycles taken is a nice way to adjust for the CPU clock. So it takes less than a third as many cycles to get the same result.
Sorry, I like the 68000 a lot :P. It was a gigantic leap at the time :) .
@turbo9team 1 year ago ⁺¹
Yes… as a guy that designed a 16-bit pipelined microarchitecture for the 16/8-bit 6809 instruction set architecture (see my channel for details), I can verify your explanation. I will say the 8-Bit Guy's point isn’t invalid though, just lacking the details you have provided. The 65816/6502 does have faster cycle counts per instruction than the 68000 _given_only_8_bit_data_. Once you do multi-precision data the 68000 will dominate.
@CompuSAR 1 year ago
Welcome to my channel!
But the experiment I ran in that other video of mine shows that even for 8 bit workloads, the 68000 is sometimes faster (i.e. fewer cycles per complete operation). I tried to check how long it would take to display an 8 bit number in decimal. For the 68000 that's a single opcode that takes 140 (!!!) cycles. For the 6502 it was a small program, but it took about the same number of cycles, even after I optimized it as much as I could.
Check out th-cam.com/video/uZ-_aRzENSk/w-d-xo.html for more details.
@flatfingertuning727 1 year ago ⁺¹
@@CompuSAR My attempt at a moderately-optimized 8-bit binary-to-BCD routine took 28 bytes and 133 cycles. It used a loop to execute a 3-instruction sequence 5 times, and explicitly-written-out code to repeat a 2-instruction sequence three times. Running the loop 8 times would have allowed the written-out-sequence to be eliminated at a savings of nine bytes.
Interestingly, though, the same technique could also be used on the 68000 and would work slightly better there. Load R0 with the value of interest and R1 with 0, perform ADD.B R0,R0, and then iterate eight times the sequence ABCD R1,R1 / ADDX.B R0,R0. The extra registers on the 68000 really help there.
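The shift-and-decimal-add sequence described above can be checked with a small Python simulation. This is a sketch under stated assumptions, not real 68000 code: it models only the byte-sized register contents and the X (extend) flag, keeps the comment's R0/R1 naming (on a real 68000 these would be data registers such as D0/D1), and the helper names are made up for illustration. After the loop, R1 holds the packed tens and units digits and R0 ends up holding the hundreds digit, because each decimal carry out of ABCD is shifted into R0 by ADDX.

```python
def abcd(a, b, x):
    """Model of ABCD: packed-BCD byte add with extend. Returns (result, x_out)."""
    lo = (a & 0x0F) + (b & 0x0F) + x
    carry = 0
    if lo > 9:
        lo -= 10
        carry = 1
    hi = (a >> 4) + (b >> 4) + carry
    x_out = 0
    if hi > 9:
        hi -= 10
        x_out = 1
    return (hi << 4) | lo, x_out


def addx_b(a, b, x):
    """Model of ADDX.B (and ADD.B when x is 0): 8-bit binary add with extend."""
    total = a + b + x
    return total & 0xFF, total >> 8


def binary_to_bcd(value):
    """Convert 0..255 to (hundreds, packed tens/units) using the sequence above."""
    r0, r1 = value, 0
    r0, x = addx_b(r0, r0, 0)      # ADD.B R0,R0: top bit of the value goes to X
    for _ in range(8):
        r1, x = abcd(r1, r1, x)    # ABCD R1,R1: decimal double, plus the incoming bit
        r0, x = addx_b(r0, r0, x)  # ADDX.B R0,R0: next bit out, decimal carry in
    return r0, r1


hundreds, tens_units = binary_to_bcd(255)
print(hundreds, hex(tens_units))
```

For example, 255 comes out as hundreds = 2 and packed tens/units = 0x55.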
@ZERR0R 1 year ago ⁺¹
I'm no retro CPU specialist, but why can't you load multiple programs and do some kind of multitasking with a 6502?
@CompuSAR 1 year ago
It's not "can't". It's just impractical. Multitasking requires loading several programs into memory at the same time. This means that where things are in memory isn't known at compile time. The 6502 does have indirect addressing commands, but they: 1. are much slower than direct addressing commands and 2. require the pointer to be stored in zero page.
So you only have 256 bytes (128 pointers) for *all* pointers *all* programs use. It's not an easy fit.
@ZERR0R 1 year ago
@@CompuSAR But you can use 1 pointer to point into some other pointer table somewhere in memory, and swap the other 127 in and out as you need them. I agree, that's not an easy task, but possible.
That way you can have as many pointers as you want, but can use only 127 at the same time.
@CompuSAR 1 year ago
@@ZERR0R The 6502 indirect addressing modes require that the pointer being dereferenced reside in the first 256 bytes of memory. If you want to swap a table in and out, you must actually copy the pointers back and forth.
The 65ce02 and the 65816 were better on that front, and can remap where the "zero page" resides, probably precisely because of that. Things are _more_ possible on those architectures. The original 6502? The cost would be really really high.
@ZERR0R 1 year ago
That's exactly what I was talking about. Use the first pointer to locate the table, and copy the necessary pointers into zero page.
@CompuSAR 1 year ago
@@ZERR0R Like I said many times, it's not that it's not possible. It's just really impractical. The context switch times you're describing are off the wall high.
That's before addressing the obvious problem: indirect accesses on the 6502 are _considerably_ more expensive than absolute address accesses. That low cycles/opcode ratio David was talking about? Less low if your program needs indirect accesses all the time.
Meanwhile, the 68000 doesn't have this problem, because its registers are large enough to hold addresses, and it has enough of those to not need to swap them to memory all the time. This means that in the "context switching" use case, you can hold the addresses to where things are in a register or two, and use register relative addressing to get to the data you need, without knowing at compile time where it was, with virtually zero added overhead.
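To get a feel for the context switch cost discussed in this thread, here is a back-of-the-envelope Python estimate of swapping zero page contents on a 6502. It is only a sketch under several assumptions that are not from the thread: a simple indexed copy loop (save the old byte with LDA zp,X / STA abs,X, load the new one with LDA abs,X / STA zp,X, then DEX / BNE), page-aligned tables so no page-crossing penalties apply, and a 1 MHz clock for the time figure. The stack page and CPU registers, which a real context switch would also have to save, are not counted.

```python
# NMOS 6502 cycle counts, assuming no page-crossing penalties.
LDA_ZP_X = 4       # read old byte from zero page
STA_ABS_X = 5      # save it to the outgoing task's table
LDA_ABS_X = 4      # read new byte from the incoming task's table
STA_ZP_X = 4       # store it into zero page
DEX = 2
BNE_TAKEN = 3
BNE_NOT_TAKEN = 2


def zero_page_swap_cycles(n_bytes):
    """Estimate cycles to swap n_bytes of zero page with the loop assumed above."""
    per_byte = LDA_ZP_X + STA_ABS_X + LDA_ABS_X + STA_ZP_X + DEX
    # The branch is taken on every pass except the last one.
    return n_bytes * per_byte + (n_bytes - 1) * BNE_TAKEN + BNE_NOT_TAKEN


CLOCK_HZ = 1_000_000  # assumed clock speed, for the time estimate only

for n in (2, 64, 256):
    cycles = zero_page_swap_cycles(n)
    print(f"{n:3d} bytes: {cycles} cycles, {cycles / CLOCK_HZ * 1000:.2f} ms")
```

With these assumptions, swapping all 256 bytes costs 5631 cycles, roughly 5.6 ms at 1 MHz, per switch, before any real work is done.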
