This was my introduction to your channel. You earned the sub within the first 30 seconds! But just so you know, if my boss complains about my productivity for the next few days I'm blaming you, as I will likely be binge watching these videos for the foreseeable future.
Thank you.
With that said, my older videos give me the cringe. I hope you survive the experience.
Your best video yet! I love how far your production quality has come. And the content remains top notch.
Good. You have seen a flaw and addressed it. In for a penny, in for a pound!
That could have been this channel's name.
Beautiful!
That is nuts!
I explicitly call this crazy, after all.🤣
A reasonable approach, given that your target speed is two orders of magnitude lower than the native FPGA clock, LOL. Another trick I saw in an old 1972-era minicomputer was to generate the instruction clock from an 8x oscillator (1 MHz insn cycle from an 8 MHz clock), and use a counter and decoder (74LS138 style) to generate eight distinct phases of the clock. The designers sequenced (or micro-sequenced? not quite the same as the modern concept) the flow within an insn by clocking flops or enabling latches on each of the distinct phases when they needed it. So you might increment the PC on phase 1 but load the instruction from memory in phase 2, load the register read data indexed by the opcode on phase 3, and so on. It worked quite well in practice for the mostly 74xx TTL design. But it would be a nightmare in an FPGA unless you use them strictly as clock enables in a natively 8x fully synchronous clock design, which is sort of what you're doing.
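To make the counter-and-decoder idea concrete, here is a rough Python model (my own illustration, not the original minicomputer's schematic; the phase-to-action mapping just follows the example above):

```python
# A 3-bit counter plus a 74LS138-style 1-of-8 decoder producing eight phase
# enables; each part of the instruction flow fires only on "its" phase.

PHASE_ACTIONS = {          # hypothetical assignment, per the description above
    1: "increment PC",
    2: "load instruction from memory",
    3: "read register selected by opcode",
    # ... remaining phases would drive the ALU, write-back, etc.
}

def decoder_138(count):
    """1-of-8 decoder: returns eight enables, exactly one of them high."""
    return tuple(int(i == count) for i in range(8))

counter = 0
for fast_tick in range(16):            # two 1 MHz instruction cycles at 8 MHz
    enables = decoder_138(counter)
    phase = enables.index(1)
    if phase in PHASE_ACTIONS:
        print(f"tick {fast_tick:2d}  phase {phase}: {PHASE_ACTIONS[phase]}")
    counter = (counter + 1) % 8        # 3-bit counter wraps every 8 fast ticks
```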
Actually there's an important difference between that and what I'm doing. In control theory, it's called open-loop vs. closed-loop feedback.
Doing that would generate the correct clock out of the too-fast clock. Execution rate would be constant and predictable, and you would only be able to divide the original clock by a whole number. If you don't want to muck around with DDR modules, you might even be limited to dividing by an even number.
Aside from allowing division by an odd number (and, in fact, by any rational fraction of the original clock), this technique is closed-loop feedback control. If there is an external source of delay, such as DDR being unavailable or bus contention with another unit (the Risc-V, HDMI and SPI all also need to access the memory), with your way you're pretty much resigned to losing the whole cycle. With my way you make up for the lost time, and do so at a rate adapted to the high-frequency clock.
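To illustrate, here is a minimal Python sketch of the closed-loop idea, not the actual HDL; the 75 MHz / 1 MHz figures and the stall length are just assumed for the example:

```python
FAST_HZ = 75_000_000   # fast system clock (assumed, per the thread)
SLOW_HZ = 1_000_000    # target effective 6502 clock

def throttle(ticks, stalled=lambda t: False):
    """Yield True on the fast ticks where the 6502 is allowed to step.

    `acc` accumulates "owed" slow-clock time. Because it is drained by
    subtracting one slow period rather than being cleared, ticks lost to a
    stall are made up later, and any rational ratio SLOW_HZ/FAST_HZ works.
    """
    acc = 0
    for t in range(ticks):
        acc += SLOW_HZ                      # credit earned on this fast tick
        if acc >= FAST_HZ and not stalled(t):
            acc -= FAST_HZ                  # spend one slow cycle's worth
            yield True                      # 6502 executes one cycle now
        else:
            yield False                     # 6502 waits this fast tick

# Stall the bus for 200 fast ticks around tick 1000 (say, a DDR refresh plus
# some HDMI traffic); the throttle catches up within a few ticks afterwards.
steps = sum(throttle(750_000, stalled=lambda t: 1000 <= t < 1200))
print(steps)   # 10000 -> exactly 10 ms worth of 1 MHz cycles despite the stall
```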
It turned out that all those latches are expensive. The i386 had two phases. Today, most designs are single phase. Each pipeline stage does its work. The result is captured on the edge of one clock signal. Then the new input is gated to new values for the next “task” of this block of combinatorial logic.
More phases allow you to model different gate delays through a circuit. Yeah, and overlap: pass through preliminary results and transients so as not to lose any time in the latch.
I don't think this is about latches being expensive so much as about the pipeline architecture proving superior, making everyone switch. Pipelining is very flip-flop oriented (though it is not immune to cross-stage communication).
@@CompuSAR Then explain to me why the deep Pentium 4 pipeline failed? A latch needs 7 transistors per bit. I think that the ALU in the 6502 only has like 20 transistors per bit. With 8 phases you pay more (in terms of area and power) for latches than for any real work.
@@ArneChristianRosenfeldt I will readily admit ignorance at those levels of analysis. With that said, *as far as I understand*, the main motivator to switch was the promise of more instructions/cycle, rather than power.
The Intel line are the only modern CPUs that still carry machine language defined in the CISC era, and they pay a huge price to convert it to a pipeline in terms of pre-execution processing and, in trying to save on that, instruction cache size. And it's still worth it to them because they had no hope of achieving super-scalar execution with a CISC architecture.
With that said, all of the above is my understanding of things. It's not my main area of expertise, so if I'm wrong, I'm more than happy to learn.
This is basically how I handle load-balancing within single-thread applications to tie the FPS rate to the desired position... but I adjust the individual load timing on the fly in order to maintain them rather than gate them. This way it attempts to maximize calculations until the core starts to suffer, then just raises the wait time for the following cycles (since the individual calculations performed are never known ahead of time, they are just added in as class callbacks into a list).
soooo....in short, I create my own multi-thread using count triggers lol.
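A loose Python sketch of that kind of pacing (the constants and names here are mine, and "the core starts to suffer" isn't defined above, so an overrun of the frame budget stands in for it):

```python
import time

TARGET_FRAME = 1 / 60          # desired pacing, roughly 60 fps
callbacks = []                 # work items added on the fly as callbacks in a list

def register(cb):
    callbacks.append(cb)

def run(frames):
    wait = 0.0                 # extra wait injected into the following cycles
    for _ in range(frames):
        start = time.perf_counter()
        for cb in callbacks:   # the work per frame is never known ahead of time
            cb()
        busy = time.perf_counter() - start
        if busy > TARGET_FRAME:
            wait += TARGET_FRAME / 10                 # core is suffering: raise the wait
        else:
            wait = max(0.0, wait - TARGET_FRAME / 20)  # recovered: relax it again
        time.sleep(wait + max(0.0, TARGET_FRAME - busy))

register(lambda: sum(range(10_000)))   # stand-in workload
run(120)                               # roughly two seconds of paced frames
```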
Wow, I was totally riveted to this video and I have nothing to do with hardware. Subbed!
Welcome! I hope you find the rest of my content as interesting.
So the 6502 is running with an effective clock jitter, to allow for other bus activity or the SDRAM being unresponsive. It doesn't hurt your project since all of your other subsystems have flexible timing as well, and might as well go to sleep and catch up as needed.
But I wonder how it would affect interfacing with legacy hardware, say the Commodore IEC bus and 1541 drive, or emulating the 1541. The basic IEC protocol is explicitly clocked, so an endpoint can cycle-stretch, but wouldn't it lose fastloader compatibility?
It's an excellent question, and one I don't have a ready answer to. I'm guessing no, but I might be wrong.
In more detail: a DDR refresh takes several hundred nanoseconds, so it is unlikely to cause even a single missed cycle (1MHz translates to 1us cycles, so 1000ns). The same goes for cache evictions and HDMI access, which pretty much sums up the potential causes for outside delays.
Even if the delay is longer, I can't see it affecting IEC. Even with a fast loader, the IEC is driven by the 6502 on the C1541. Its maximal granularity is 3-4 cycles assuming no loops, closer to 10 cycles with a loop. Even a 3-cycle jitter cannot possibly affect it.
Where things are a little more touch and go are things that sit on the actual 8 bit bus. If you hook up an Apple II expansion card or a C64 cartridge (which this project totally aims to support), those expect to see a 1MHz clock with bus operations. I'm still optimistic that it'll be possible to give them what they need (or, at least, close enough for things to work), but time will tell.
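A quick sanity check of that delay budget (the "several hundred nanoseconds" refresh figure and the 3-4 cycle IEC granularity come from the reply above; the concrete 400 ns number below is just an assumed ballpark, not a measurement):

```python
CYCLE_NS = 1000              # one 1 MHz 6502 cycle = 1 us = 1000 ns
ddr_refresh_ns = 400         # "several hundred nanoseconds" -- assumed ballpark
iec_granularity_cycles = 3   # tightest fastloader bit-banging, per the reply

missed = ddr_refresh_ns / CYCLE_NS
print(f"a DDR refresh costs at most {missed:.1f} 6502 cycles of bus time")
print(f"the tightest IEC fastloader timing works at a ~{iec_granularity_cycles}-cycle granularity")
# 0.4 cycles << 3 cycles, so a refresh-induced stall stays well inside the margin
```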
On a completely different note, my channel stats insist that my viewers consist of 100% males. Please tell me YT is wrong on that front.
@@CompuSAR Yes. It was already noticed by Hector Martin that someone like myself and several of his female friends who largely watch engineering-related content won't count as female viewers, and the self-specified gender is completely ignored. Don't trust the stat.
My theory is worse. I suspect you don't count *because* you watch engineering content, which is really depressing if true.
@@CompuSAR Yes, that was exactly the implication. Luckily there was never such a stigma in my family when I grew up, but then that's a family with Bulgarian roots, so that's a little different from most of the rest of the world.
I was going to upvote this, but it was at 42, so I left it as is... oh wait! Now I can. Great discussion.
Just like all Internet discussions, you can only talk when things are negative.
The venerable 6502, the original RISC design, should be the first architecture implemented in graphene.
The original RISC designs were the small efficient processors executing the µ-code of mainframe and minicomputers of the 1960s and 70s. Well before the 6502 (which isn't RISC either).
Personally I would've gone exactly the opposite way: adjust the RISC-V CPU to have a 6502-compatible async 8-bit bus, and then have the RISC-V CPU's clock just be the 6502's run through a PLL (multiplied by 50 or something). That way they're always in sync with each other and you can use some much, much simpler logic to connect them to the same bus (before the width adjustment/cache).
Plus it allows you to adjust the 6502's clock to whatever system you're emulating and the RISC-V's clock will automatically follow.
Though overall, your idea of using a large master clock for both CPUs and just letting it get 1 cycle out of a set number of cycles is way better.
Neither adjustment is particularly easy, but I think writing a synchronous 6502 is easier than writing an async RiscV. What's more, it's not just the RiscV. You'd also need to adjust the DDR controller, SPI controller, interconnect and any other component in the system. The whole system would have to be async.
I was thinking of how DEC made the Virtual Address Extension to turn the 16-bit PDP-11 into a modern-for-the-time 32-bit CPU while making sure PDP-11 code would still work. I think something similar could be done for the Zilog eZ80F917, Motorola 68060, WDC 65C832 and Vortex86DX3 to allow them to run modern 64-bit code with 256-bit vector extensions, a 64-bit address bus, cryptography acceleration, 4-way SIMD, EPIC instructions, and a hardware PCI Express controller without breaking backwards compatibility. To escalate from the original instruction set to the 64-bit expanded instruction set, there would be a VAX instruction that would be used by the bootloader to switch to the expanded instruction set and address bus.
I also think of the real mode/protected mode divide introduced with the 80286 and really only rendered fully baked in the 80386.
@@mal2ksc I am aware of that, but I favor DEC's approach with the VAX/PDP-11 architecture over the protected/real mode Intel architecture. The reason is backwards compatibility. What my proposed virtual address extension would do is allow 8-bit eZ80 code to escalate to 64-bit VAX code and take advantage of an expanded instruction set. Other than that the instruction set is the same, and if the programmer wants to they could use the AVX2, EPIC, 4-way SIMD and the hardware PCIe controller in 8-bit mode, but all the VAX instruction does is allow 64-bit code to run while 8-bit code that already exists does not need any modifications.
Fyi, the background music seems a bit too much foreground.
Yeah, I know. It's my first time trying to integrate music.
@@CompuSAR 👍 Low levels do work, it stops these types of videos from being too dry.
Would that screw up all non-Apple II usage of your FPGA 6502?
I'm sorry, I don't understand the question.
We'll have a Risc-V running at whatever clock speed it can (currently 75MHz), and a 6502 running at the same clock speed, but being throttled down to the original Apple II speeds, so effectively it runs at 1MHz.
Since all internal buses are 75MHz, no clock domain crossings need to be maintained.
The only place this requires adapters is if you want the project to allow an external enhancement card, or a Commodore 64 cartridge. This requires a 1MHz async bus, but like I said in the video, I _can_ write an adapter.
After re-reading your question, I think I understand it.
Yes, it is highly unlikely that any project other than CompuSar will find a use for a 6502 designed this way. With that said:
a. CompuSar wants to implement more than just the Apple II
b. The same can be said of pretty much all the other components in the system.
At 08:05 I mention that CompuSar isn't a stranger to crazy ideas. This is precisely what I meant. The whole project revolves around building custom modules for tasks where standard modules already exist, so that they are a more precise match and take less FPGA resources, allowing the use of cheaper FPGAs and lowering the end product's BoM.
@@CompuSAR I was referring to the halting of the clock that was done by the Apple ][. I thought that would affect compatibility with other systems, for example the Atari 800, the VIC-20 or the BBC Micro.
I am really hard pressed to think of a scenario where two 8 bit systems, whether homogeneous or heterogeneous, would communicate with each other at those clock speeds.
@@CompuSAR I think he's asking about emulating other 6502 based computers.
As an example, Atari systems ran at a (roughly) 1.79MHz clock which served the CPU and video signal generation. The exact clock speed derives from NTSC signal generation. RAM was also shared, and signals mediated the CPU or video chips accessing memory. Some advanced methods on these machines required the CPU to update memory mapped hardware registers in pixel sync with video generation.
It looks like the method you're using will produce some phase dithering in the emulated 6502 clock, but unlikely to be more than a few percent.
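A back-of-the-envelope check of that estimate (assuming the 75 MHz fast clock mentioned elsewhere in the thread, and a worst-case dither of one fast tick per emulated cycle):

```python
NTSC_COLORBURST = 3_579_545            # Hz; NTSC colour subcarrier frequency
atari_cpu_hz = NTSC_COLORBURST / 2     # ~1.79 MHz NTSC Atari 8-bit CPU clock

FAST_HZ = 75_000_000                   # assumed fast FPGA clock from the thread
fast_period_ns = 1e9 / FAST_HZ         # one fast tick is about 13.3 ns
atari_period_ns = 1e9 / atari_cpu_hz   # one emulated CPU cycle is about 559 ns

dither = fast_period_ns / atari_period_ns
print(f"{atari_cpu_hz / 1e6:.4f} MHz CPU clock, worst-case dither ≈ {dither:.1%}")
# -> about 2.4%, consistent with "unlikely to be more than a few percent"
```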
won't matter if a couple of clocks run after a longer delay... laughs in IEC bus.
Fine, I'll make a video about IEC and cycle accuracy! Happy now?
Personally, I don't care about accuracy. I'm not even sure if I would start with a 6502, but definitely I'd make refinements until it was basically a whole new chip anyway. I understand that most who pick up an FPGA are looking to make a completely accurate simulation of an old chip, and if that usage makes them happy then great. I suppose they'd label me a heretic for thinking emulation is a better option if you just want to play games, but I'd really like to design my own processor and eventually produce a physical piece of hardware from my design. I figure an FPGA is great for experimentation in that regard, especially as you scale up beyond what an emulator can adequately do.
It sure is, but that's a different project. Feel free to pursue it.
My problem with "designing my own CPU" is that a CPU, or for that matter an OS, are useless without software. If you design your own CPU, you will need to design your own software, and for that matter, probably create a backend for a compiler as well. It's a huge undertaking.
I'm not saying don't do it. If you want to prove to yourself that you can, it's a very cool project, and quite satisfying when you get it actually going. Just realize that the chances of objective success (commercial, users) are slim.
@@CompuSAR Well, I don't care about commercial uses or having more than one user anyway. The purpose I have in mind is more of a "no infrastructure existing at that point in time" kind of use. Sort of like doomsday prepping for someone that wants to preserve computers for the future. And I'm not afraid of writing my own software for such a system, I've already written multiple compilers including backend code generation. I'm in the process of writing a backend for ARM and I'm actually considering doing one for the 6502 just to see if I could adapt my language towards writing NES games.
@@anon_y_mousse Good luck. I briefly considered writing an LLVM backend for the 6502, but its lack of indirect operands except through zero page is a real bummer. You could do it by treating ZP as 128 16-bit registers, and use them for register scheduling, but the result would be *slow*.
@@CompuSAR This is why I wrote each of my code generators myself. I don't think that highly of LLVM, and I've noticed that gcc, which was hand coded, does a better job of optimizing. I was thinking for NES code generation that it'd be better to add more as statics in the ROM, and for extra space I would have to add a RAM chip to the cart. Might be considered a bit of a cheat, but there were a few carts that had extra RAM. As far as I understand it, the 6502 only had 256 bytes of stack and some of that was reserved.
@@anon_y_mousse Technically, none of the stack bytes were reserved. I think you're confusing it with page zero where, depending on platform, there were reserved addresses. Either way, good luck.
When the enhanced 6502s came out with higher clock rates and single-cycle instructions, we ditched the original. The new models had PHX and PHY, so IRQs had less overhead. We moved more data that way. Why even use the original? It was obsolete when the enhanced versions came out in the early '90s. Make something that runs at a higher speed than the RISC-V and run it with the RISC-V clock. Then people can fix the code to deal with the higher clock rate. Innovate forward.
Please do spend one minute on the channel's trailer to see what the end goal is. What you suggest is literally the opposite of what I'm trying to achieve.
Why?
Ooh, I know that one!
Because.
Someone made a 100MHz 6502 on FPGA already 😂
You do realize that's not what I'm trying to do here, right?
@@CompuSAR I wonder if their work may help you somehow 🤷♂️
Link?
Actually, the 6502 is considered one of the first RISC processors, not CISC like your video description states.
I'm sorry, but that's just not true. While there are some RISCish traits to the 6502, it is still very clearly a CISC CPU with CISC machine code.
The three main RISC characteristics are a pipeline, with all instructions taking the same number of cycles and having the same length. The 6502 has none of those.
@@CompuSAR Actually there is a little bit of true pipelining, and a lot of instructions do finish up while the next one is being fetched. An example given in WDC's programming manual is ADC#, which requires 5 distinct steps, but only two clocks' time:
Step 1: Fetch the instruction opcode ADC.
Step 2: Interpret the opcode to be ADC of a constant.
Step 3: Fetch the operand, the constant to be added.
Step 4: Add the constant to the accumulator contents.
Step 5: Store the result back to the accumulator.
Steps 2 and 3 both happen in a single clock. The processor fetches the next byte not knowing yet if it will need it or what it will be for. Steps 4 and 5 occur during the next instruction's step 1, eliminating the need for two more clocks. It cannot do steps 3 and 4 in one clock because the memory being read may not have the data valid and stable any more than a small set-up time before phase 2 falls and the data actually gets taken into the processor; so step 4 cannot begin until after step 3 is totally finished. But doing 2 and 3 simultaneously, and then doing 4 and 5 simultaneous with step 1 of the next instruction makes the whole 5-step process appear to take only 2 clocks.
Another part of the pipelining is the reason why operands are low-byte-first. The processor starts fetching the operand's low byte before the instruction decode has figured out how many bytes the instruction will have (1, 2, or 3). In the case of indexing before or without any indirection, the low byte needs to be added to the index register first anyway, so the 6502 gets that going before the high byte has finished arriving at the processor. In the case of something like LDA(abs), the first indirect address is fetched before the carry from the low-byte addition is added to the high byte. Then if it finds out there was no carry generated, it already has what it needs, and there's no need to add another cycle to read another address 256 bytes higher in the memory map. This way the whole 7-step instruction process requires only 4 clocks. (This is from the next page of the same programming manual.)
While I agree it is not a full RISC CPU by nowadays' standards, I also cannot agree that it is a full CISC CPU like you claim.
It's something between them both, but it definitely has a lower number of instructions than other CPUs of that time.
@gpisic If you want to paint with a broad brush, here is the division:
CISC: internal buses, microcode
RISC: Everything is done through one or more pipelines
The 6502 does not have a pipeline. It is true that it has some things that are characteristic of RISC, but nowhere near enough to justify the RISC label.
In particular:
The 6502 fetches the byte at the PC on the second cycle of an instruction, regardless of what that instruction is (your steps 2 and 3). So in a way, it fetches before it decodes, which is similar to what a pipeline does. The 6502, however, does not do that as part of a pipeline, not in the RISC sense of the word.
There are a few commands (ADC isn't one of them) that indeed take effect after the next command has already started executing. Again, it's something a RISC machine does too, but using a different mechanism.
So, no, saying that the 6502 is a RISC CPU is, to me, stretching the truth beyond breaking point. If you want, we can agree that it's a precursor to the RISC CPUs.
@@gpisic RISC was a strategy to break up complex instructions into smaller parts that can be implemented more simply and with redundancy removed, allowing faster execution through a higher clock speed and more efficient code, at the expense of larger code. But to simplify complex instructions, the CPU must be complex enough to have them in the first place.
The earliest CPUs were accumulator based, in which arithmetic and logic operations were carried out in a dedicated register (later several), with data from memory. Early microprocessors like the 6800 and 6502 were also accumulator based, while mini and mainframe computers had general purpose register sets with operands in registers, memory, or indirect memory. It took a while for those types of designs to become microprocessors, leading to RISC, well after the 6502’s time.
Oddly, there was an 8 bit microprocessor which did have a very RISC-like design, the RCA 1802. It had a large flexible register set, higher than average clock speed, and simplified instruction set, along with larger code size. It didn’t have much advantage over the competition, but was used in space (Galileo space probe to Jupiter’s moons).
Some things in the MOS6502 are not that great, but this processor was one of the early ones. The MOS6510 was a bit better, still not super, but millions of them were made and the people loved it. The 6502 is a bit basic; it is a processor and not much more. For a long time I did not see the strong points of the MOS6510 in comparison with the MOS6502; the extra bits were so meagre, so little, so tiny. Why did they ever bother to produce it?
The MOS6502 had mountains of limitations and piles of shortcomings. Yeah, it could have been done better, but it wasn't done back then and now it is too late to do something about it. History is written down already, no use reinventing the wheel. The MOS6502 is a bit special, like a diversity kid that has to be in a commercial on TV, whose parents must be proud that during the whole commercial he did not drool rainbows on something; that kind of special is the MOS6502 processor. It is not loved for being fantastic but loved for getting the work done despite the huge stack of quirkinesses.
How annoying, this background beating music while explanations are being narrated! Are you aware it interferes with understanding?
This was my very first attempt at including music with video. The balance wasn't always... balanced.
I'll try to do better next time.
@@CompuSAR Good luck next time. I suggest you don't include music unless your video content requires it: your watchers will surely have very varied musical tastes, and agreeing with yours will most likely be chancy.
The music is SO ANNOYING! When will people learn???
Hopefully, next video. This was *literally* the first time I tried integrating music, and mistakes were made.
Interesting content, but why this brainfucking soundtrack?
Your voiceover is what we come for; why put an annoying and distracting wall of sound underneath it?
Before anything else: I'm glad you like the content.
The short answer is that most people seem to enjoy the video better this way. With that said, this is my first time experimenting with adding music, and mistakes were made. I should have definitely done a better job of making sure it doesn't interfere with the narrator. I'll try to do better next time.
Impressive, but we need to stop canonizing technology. We did that with UNIX and look at the mess it made. It's time to move forward by exploring new ideas, not old ones.
I don't think anybody has ever tried to do this in this way before. Doesn't that make this, by definition, a new idea?
@@CompuSAR Clearly your mental CPU includes the PWN instruction