Great, I won't miss any video, keep us updated!!
Just found your channel sir, liked and subscribed, because this is extremely interesting. I've implemented the 6502 and a C64 in software, but I've always wanted to learn more about FPGAs.
Awesome!
I will definitely get the C64 at some point, but the computer kinda scares me. Unlike the Apple, it has a ton of things going on outside of the CPU, and unlike the Amiga, I don't know them. When the time comes, tips would be most appreciated.
Something that may be of interest to you: I want the C64 implementation to be able to accept C64 cartridges. I still have only a vague idea how that's going to work, though.
@@CompuSAR Yeah, there's a lot of stuff on the C64 for sure. My VIC-II implementation is by no means complete and lacks bitmap graphics modes. But I've still been able to use my emulator for a lot of cool projects. I have implemented basic cartridge support in my emulator and it's actually relatively straightforward. My plan when implementing the 6502 was always to implement more systems than the C64 and the Apple II is next on my list, so I guess I should get to it very soon.
@@HAGSLAB There are two challenges I see for implementing cartridges. The first is the number of free IO pins the FPGA has. I don't know the pin-out, but I'm guessing you'd need at least 16 address, 8 data, R/W and the clock. Those already use *all* the unassigned pins my FPGA board has.
The second challenge is that the C64 works at 5V, whereas the FPGA tops out at 3V3. This means you need a level converter, and one that has as many pins as needed. Figuring out how to do that _cheaply_ is going to be interesting.
Anyways, that's still a bit far in the future. There are plenty of challenges before we get there.
@@CompuSAR That makes a lot of sense. It's a lot more straightforward in software, haha 😅 See, this is why I'm a software guy, not so bright at the hardware stuff 😂
Wait, how is it you only got like 800 subscribers? It seems like the algorithm found your video though, 'cause I got it in my recommended and I'm now a subscriber, so I hope you'll soon be *a lot* farther up!
There's a bug in YouTube. You can only give a comment one heart. Maybe I'll frame yours.
I'm still high from that 800 number. Just a couple of days ago it was 600. For a channel that's only been active for a year or so and posting about once a month, it's not too bad, I think.
Also: twitter.com/ShacharShemesh/status/1529020221368848385
I came up with two methods to generate a 1 MHz clock with your hardware and have them reside on proper clock routing networks to appease the Vivado Place & Route:
OPTION 1.) Generate a power of 2 multiple clock (e.g. 16 MHz) and run a counter with that clock that has enough bits to count each MHz (so with a 16 MHz clock, a 4-bit counter for 16 unique count values). Test for the "all 1s" case, register that test value, and drive the CE of a BUFGCE (see page 42 of Xilinx UG472) with the registered test value. The BUFGCE will let one cycle of the 16 MHz clock through once every 16 clocks. This will produce a valid but asymmetric duty-cycle clock that will already be on a clock network.
OPTION 2.) Using the same 16 MHz clock as in the first option, test for the "all 1s" case on all but the most significant bit. Use the slower clock to drive the clock of a toggle flip-flop, and the registered test of the counter to chip enable said toggle flip-flop. This will produce a 50% duty cycle 1 MHz signal. Route the generated clock to an I/O pin, and then run that back in on a clock capable I/O pin. You could likely do this with your current divided clock as well, but this is a minimal logic solution I'm proposing.
From other information, it appears you're using a QMTECH Spartan 7 development board. If that's the case, you could run your clock out on pin F11 and back in on pin G11 by jumping the pins on JP3 (header pins 21 & 22). Looking forward to seeing more development!
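For anyone following along, here is a rough Python model (not HDL) of the counting in both options above. It assumes the 16 MHz base clock and 4-bit counter from the description. The names `simulate`, `gated` and `square` are made up for this sketch, and Option 2 is modelled as a toggle flip-flop clocked at 16 MHz with the registered test as its enable, which is only one reading of the description.

```python
def simulate(cycles=64, bits=4):
    """Step both options one 16 MHz cycle at a time (behavioural model only)."""
    mask = (1 << bits) - 1      # 0b1111 for a 4-bit counter
    low_mask = mask >> 1        # every bit except the most significant one
    counter = 0
    ce_reg = 0       # Option 1: registered "all 1s" test, would drive BUFGCE CE
    tff_ce_reg = 0   # Option 2: registered "all 1s below the MSB" test
    toggle = 0       # Option 2: toggle flip-flop output
    gated, square = [], []
    for _ in range(cycles):
        # Values visible during this cycle.
        gated.append(ce_reg)    # 1 means this 16 MHz pulse is let through
        square.append(toggle)
        # Clock edge: every register updates from the pre-edge values.
        next_toggle = toggle ^ 1 if tff_ce_reg else toggle
        ce_reg = int(counter == mask)
        tff_ce_reg = int((counter & low_mask) == low_mask)
        toggle = next_toggle
        counter = (counter + 1) & mask
    return gated, square


gated, square = simulate(64)
print("Option 1 pulses at cycles:", [i for i, g in enumerate(gated) if g])
print("Option 2 waveform:", "".join(str(b) for b in square))
```

In this model Option 1 passes one base-clock pulse every 16 cycles, while Option 2 flips its output every 8 cycles, which is where the 50% duty cycle comes from.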
Actually, I don't get any warnings about the hack I did. Maybe because I'm synthesizing the CPU separately, so it has its own clock definition, unrelated to what the rest of the circuitry actually does. I voiced genuine concern that I'll be double-edging clock transitions, which at least for now, doesn't happen.
BUFGCE seems like a good solution even based on a 1/50 divider. Why would dividing by a power of 2 be a priority in this case?
I can tell you right now that there is very little chance I'll be spending two IO pins on this solution. Not only are those an expensive resource, I don't know what routing the clock outside the FPGA will do to its phase.
Either way, thank you very much. I'm still kind of an FPGA newbie, and I find Xilinx UG documents rather overwhelming. They are okay individually, but it seems like Xilinx expects you to sit down and read them all through, which is beyond my current time budget. It's nice to have someone who actually knows what they are doing peeking over my shoulder.
@@CompuSAR I chose to go with a power of 2 clock because a counter that overflows on a power of 2 count requires no additional logic to reset the counter, whereas in your case, without dividing down the clock, you have to make a 6 bit counter and then infer/add circuitry that restricts the count to the range of 0 through 49, which is effectively a counter with extra reset logic. Also, with a much slower clock, the asymmetric clock period provides more hold time in case you need that in the future for I/O. If you were to use the BUFGCE with the 50 MHz clock, you'd have less than 10 ns hold time but almost a whole microsecond of setup time. If you attempt to work with original parts for I/O, there may be latches instead of flip-flops, and I'm going to assume they don't work faster than 1 MHz, so the closer you get to a 50/50 duty cycle, likely the better. As for the phase shift on the out and back in again method, it wouldn't matter as all your logic would essentially be segregated from the clock generation circuit despite it originating from within the same chip. The returned clock would enter, get buffered and placed onto the clocking network, and as long as you didn't try to make logic work across domains of two separate clocks, it would not matter.
I attempted to clone your project and rebuild it so I could see what resources you had available for such a clock division scheme (I'm spoiled with my Artix 7 35T and Zynq 7020 based boards having more FPGA fabric resources). Vivado has never played nice with git or really any VCS. I found that you included the Vivado project file, and if you open it in a text editor (it's just XML), it has full path information from your working machine. You can instead use the Tcl scripting capabilities along with some shell scripting to rebuild a project from scratch when a person launches Vivado. This way, you track only sources and not products of Vivado itself. Best way I found to learn these commands was to do things in Vivado and look at the Tcl Console or the .jou file generated by Vivado. Nearly everything you do in Vivado in the GUI is a Tcl command to the tool.
As for the Xilinx User Guides, I have a leg up on you as I started with Xilinx components that are now footnotes in history, and I've seen the progression of their technology as if they were logical increments in design and capabilities. However, the best place to start is the Data Sheet (e.g. DS180 for Xilinx 7 Series). This gives you the CliffsNotes version of the part family, and highlights the various topics that are broken out into the in depth User Guides (basic blocks, advanced blocks, clocking, memory, I/O, configuration, etc.). With people new to Xilinx components that want/need to get in depth with them, I like to take them back several years and point to the Virtex 4 family, as they are a direct ancestor to the 7th generation parts. Those parts had a lot less documentation, and many features were far simpler to digest. Progressing them forward up into the generation they're working with, they can see that we're really working with the same foundation Xilinx laid then.
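To put rough numbers on the pulse widths mentioned above, here is a small Python sketch. It assumes the base clock itself has a 50% duty cycle and that the BUFGCE passes exactly one base-clock cycle per output period; `gated_clock_timing` is a made-up helper name, not anything from Vivado.

```python
def gated_clock_timing(base_hz, divide_by):
    """Return (high_ns, low_ns, period_ns) for a clock made by letting one
    base-clock cycle through every `divide_by` cycles."""
    base_period_ns = 1e9 / base_hz
    high_ns = base_period_ns / 2             # one high phase of the base clock
    period_ns = base_period_ns * divide_by   # one pulse per divide_by cycles
    return high_ns, period_ns - high_ns, period_ns


for base_hz, n in ((50_000_000, 50), (16_000_000, 16)):
    high, low, period = gated_clock_timing(base_hz, n)
    print(f"{base_hz / 1e6:g} MHz / {n}: "
          f"high {high:g} ns, low {low:g} ns, period {period:g} ns")
```

Both cases give a 1 µs period, but the slower base clock leaves the gated pulse high roughly three times longer.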
@@jordanking357 First, I can't say how great it is to have someone follow along!
The project file opens fine for me on different paths. The only thing that holds the complete path is the main project directory, and Vivado seems more than capable of inferring the correct path from the file's actual location.
With that said, you would need a few extra steps. You'll need to run "git submodule update --init" in order to get the dependent projects (CPU and UART) into the work tree. You'll also need to provide the Apple 2 ROM files (I did not want to commit them to Git until I understand what the copyright status is), split them into 2KB parts, and run "make" in the srcs/roms directory to convert them to MEM files that Vivado can further use. I can confirm my design works with the ROMs in the AppleWin source tree. Last, the only real downside of using the Vivado project file is that it includes the mdp cache file in the project. You may have to erase it from the project, so that it can re-generate it. Otherwise, the project should transfer over as is.
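The "split them into 2KB parts" step could look something like the Python sketch below. The output file naming (`<name>_00.bin`, `<name>_01.bin`, ...) is purely hypothetical; the names that the Makefile in srcs/roms actually expects are not given here, so adjust them to whatever the repo uses.

```python
from pathlib import Path

CHUNK_SIZE = 2 * 1024  # 2KB parts, as described above


def split_rom(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut a ROM image into equal chunks, refusing images that do not divide evenly."""
    if len(data) % chunk_size:
        raise ValueError(
            f"ROM size {len(data)} is not a multiple of {chunk_size} bytes")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def write_parts(rom_path: Path, out_dir: Path) -> list[Path]:
    """Write each chunk to its own file; the naming scheme is an assumption."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, part in enumerate(split_rom(rom_path.read_bytes())):
        path = out_dir / f"{rom_path.stem}_{index:02d}.bin"  # hypothetical name
        path.write_bytes(part)
        paths.append(path)
    return paths
```

A 12KB image, for example, would come out as six 2KB files.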
As far as the clocks: I get your point about the "correct" division count. However, in the future I plan on dynamically scaling the clock up as far as the CPU would go, which is about 14MHz. Since that's the base clock of the original Apple 2, and since the divisor doesn't take _too_ much logic, I think it's a reasonable plan. I should point out that my current divider already has a duty-cycle of 50%, at least so long as the divisor is even.
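The "50% so long as the divisor is even" remark can be checked with a tiny Python model. This assumes a divider whose output is high for the first half of each count period (rounded down), which is only a guess at how the actual divider is written; `divided_clock` and `duty_cycle` are names invented for this sketch.

```python
from fractions import Fraction


def divided_clock(divisor, cycles):
    """Output is high for the first divisor // 2 base cycles of every period."""
    half = divisor // 2
    return [1 if (i % divisor) < half else 0 for i in range(cycles)]


def duty_cycle(divisor):
    """Fraction of one output period during which the divided clock is high."""
    wave = divided_clock(divisor, divisor)
    return Fraction(sum(wave), divisor)


for n in (50, 4, 7):
    print(f"divide by {n}: duty cycle {duty_cycle(n)}")
```

With an odd divisor such as 7 the output is high for 3 cycles and low for 4, so the duty cycle drifts away from one half.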
As for different clocks: it's not in the videos yet, but if you check the SoftSwitches.vs file, the UART is currently clocked at 50MHz to produce keyboard input, setting a register that the CPU then resets with a soft switch. As such, I'm passing signals from an always_ff block that triggers on `posedge` of the CPU clock to one that relies on the board clock. Today I think I'm okay, as they are phase locked. With some of your tricks that will no longer be true, and I'll have to handle cross clock-domain communication. As the design gets more complex, more components will need a faster clock than the 6502. Some will need clocks even faster than the 50MHz that the board provides.
I will probably have to have different clock domains anyway, as the Apple is really around 1.023MHz. I'm not sure what clock USB needs, but I doubt that will be a whole multiple of that.
Last, I really _really_ value your input, but it seems that YouTube comments provide a poor medium for this communication. Doubly so as I'm telling you things that are in the git repo, but aren't in this video, and will likely not make it to the next video either, so I'm spoiling two videos from now 🙂. If you can, it will be easier for me if you emailed me. My email is in the about page of the channel.
Thank you for your help.
Very well done!
Glad you liked it!
Excellent video! Thank you.
I'm back again a year later, rewatching and reenjoying! Lol 😂
I love me some 6502 waveforms! Nice video.
Thank you!
Great video!
Just a small question: why don't you use the work that was already done on MiSTer?
I talk about it a little in this video: th-cam.com/video/s1zQQAN6jmU/w-d-xo.html
Let's get the truth out of the way: It's much much more fun to do those things from scratch.
With that said, the MiSTer is a monster of an FPGA, with a lot more resources (and a matching price tag) than what I'm using. Maybe as a result, the cores for the MiSTer take way too many resources, which I don't have on the very small FPGA I'm using.
So the objective non-NIH answer is that the architectures are just too different.
This is pretty neat. One of these days I need to get an FPGA.
I'd have recommended the QMTech I'm using, but with the chip shortage going on, it went out of stock. I had _just_ enough foresight to buy four before that happened.
@@CompuSAR Sweet, I guess I'll have to buy more than one when I do finally pick one up.
What font is that in your logo? It looks very familiar… because I think that's a font I designed back in the 00's called Old Block. It's cool to see it pop up elsewhere!
You just saved my butt!
I was about to say that I didn't know what it was, because I only had the version where the font was elaborated to splines, and Old Blocks isn't even installed on my system. Then I found the version from before the elaboration, and it complained that it doesn't have the font, showing "Old Blocks" crossed out. Turns out I didn't transfer it from my old computer.
I don't remember where I downloaded it from, but I just searched for "free fonts", and then browsed until I saw something that matched the transformation I wanted to do with it. Thank you very much for creating it!
I think your memory problems may be related to your clock problems. The off-chip memory is going to be DRAM, which needs to be refreshed. The FPGA should implement that circuit itself, but if the clock rate isn't high enough it won't be able to refresh the RAM fast enough and you'll get memory errors. Switching to on chip SRAM avoids the problem, but it's much more limited. That said, assuming that you've got a fixed clock on the board, presumably it's been validated with the DDR module also on the board?
This video (and the stage of the project) is all focused on getting _something_ up and running. The on-board DDR3 memory is something I have not yet even thought of using. The Spartan-7/15 I'm using is supposed to have 20 36Kbit block RAM units. Even if we don't find a way to use the ECC bit, that's more than enough for 64KB of RAM. The IP wizard that was supposed to make it happen had some problem.
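The block RAM arithmetic above works out as follows, in a short Python sketch. It takes the 20-block figure from the comment at face value and assumes that each 36Kbit block carries 32Kbit of data plus 4Kbit of extra (parity/ECC) bits; the constant and function names are made up for illustration.

```python
BLOCKS = 20                      # figure quoted in the comment above
BITS_PER_BLOCK = 36 * 1024       # full 36Kbit block, extra bits included
DATA_BITS_PER_BLOCK = 32 * 1024  # assumption: 1 bit in every 9 is parity/ECC
NEEDED_BYTES = 64 * 1024         # 64KB of RAM for the 6502 address space


def capacity_bytes(blocks, bits_per_block):
    """Total capacity in bytes, counting 8 bits per byte."""
    return blocks * bits_per_block // 8


with_extra_bits = capacity_bytes(BLOCKS, BITS_PER_BLOCK)
data_bits_only = capacity_bytes(BLOCKS, DATA_BITS_PER_BLOCK)

print(f"using every bit:   {with_extra_bits // 1024} KB")
print(f"data bits only:    {data_bits_only // 1024} KB")
print(f"enough for 64KB:   {data_bits_only >= NEEDED_BYTES}")
```

Under those assumptions the data bits alone come to 80KB, so 64KB fits without touching the extra bit.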
I wouldn't dare, and it would make no sense, to try and run the DDR at 1MHz. The whole point of using that module is that it supports more than just the 6502. The chip is rated at about 800MHz, and the sample code from QMTech runs it at around 700MHz, and I'll likely use something in that vicinity, when I finally get to use it.
Doing so will require implementing a cache mechanism, which will be a fun challenge in and of itself.
@@CompuSAR I hope it didn't sound like I was criticizing your approach. I'm just trying some armchair debugging with what I can see.
This is an interesting project, and I look forward to seeing more.
@@hikingpete Glad to have you.