Join us on Discord: discord.gg/jmf6M3z7XS
Follow me on Twitter: twitter.com/WeirdBoyJim
Support the channel on Patreon: www.patreon.com/JamesSharman
You are the most consistent engineer I have ever seen.
Very kind of you to say that!
Cannot wait to see what you are introducing after this cliffhanger.
Thanks Peter! I hope it doesn't disappoint!
Yep that’s a tease James! I’m gonna guess at an SDCard reader. The other, less likely, option is some kind of PCM sound output involving I2S. But I’m leaning towards an SDcard interface because you mentioned dataflows in both directions.
@@lawrencemanning yeah... that's what I thought too... we'll have to wait and see. :)
@@lawrencemanning Or one of these ENC28J60 Ethernet modules
Hmmm... I am betting on a little flash storage chip, a Winbond beastie, or maybe an LCD panel with touch...
Excellent vid btw, I love the way we get the "brain ticking" and the typing, not edited out like others do. It still blows my gourd that this is a complete CPU (and now computer system) of your own design - it must be great to know everything on such an intimate scale - Grab yourself an ice cream.
Very generous! I’m in Rome for a few days so it may have to be gelato! The SPI is definitely moving towards storage. I occasionally just let it run slowly to watch the LEDs!
Man I wish I had your asm skills. Great video as always 🎉
😊 thank you
At this point I'd be looking at buffering SPI output, so multiple bytes can be read without the CPU needing to consume them on exact clock boundaries, and a WAIT input on the CPU so the SPI subsystem can block execution until data is ready without needing lots of explicit NOPs with super precise timing.
This build truly never ceases to amaze me. Really need to have a go at doing stuff like this myself someday 🙂
It's always a balance of circuit versus code complexity. My tightest loop here is 3 cycles of action versus 5 cycles of delay, so I would only gain 5 cycles of parallelism for each byte of FIFO (minus any management code). My sense is that I'm probably at the sweet spot for this build, but it's easy to see how that balance can shift rapidly.
It's always nice to see one of your videos pop up on TH-cam.
Good to hear you are enjoying!
Love watching what you do...but haven't a clue what you did 😊
Thanks Lol, Glad you are enjoying anyway.
a lot of what you do is well above my head but still very interesting 🙂
Good to hear you are enjoying it!
Fascinating progress. Thanks for sharing.
Thanks for watching!
Great video again James! Can't wait to see what the device is.
Thanks!
SD card? It's what I would like to add to my SBC so I don't have to load everything over a serial connection. Considering the demos you've created so far you may want to add bankable RAM, so the SD card would allow for a lot more assets in future demos.
SD card? I couldn't possibly comment 😅. I'm not 100% sure about bankable RAM. While it would be nice to have more memory, none of my plans really need it. I'd rather save a larger memory pool for a future architecture that's designed to be able to make use of it.
I can see you are still working on and expanding the architecture and continuing to improve its ISA. When you finally complete this stage of the project where you feel it's completed, do you have any plans on taking it to the next step or level of abstraction? In other words do you have any intentions on implementing your own compiler - higher level language and do you intend to build your own operating system on top of it?
A compiler would be an interesting thing to have, but code generation for 8-bit architectures tends to be pretty inefficient. I'm expecting future builds to achieve more in this regard.
Can't wait to see the next step!
Hope it doesn’t disappoint!
Why are you using 2 shift registers? If memory serves, the SPI protocol sends bits in both directions at the same time. So you only need 1 shift register along with the capability to read and write the contents of the shift register in parallel. Algorithm is:
1. Store value to transmit in shift register.
2. Toggle clock 8 times while shifting register. This both outputs 1 bit at a time from the shift register /and/ shifts in 1 bit of data from the client device into the same shift register.
3. Read value in shift register to get data from client.
Effectively, you have a 16 bit circular shift register with 8 bits on your side and 8 bits on the client side. After clocking 8 times, the values in each register are exchanged with each other.
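The exchange described in the steps above can be sketched in software. This is a minimal simulation, not the circuit from the video: it assumes MSB-first shifting, and the names `spi_exchange`, `controller` and `peripheral` are made up for illustration.

```python
from typing import Tuple


def spi_exchange(controller_byte: int, peripheral_byte: int) -> Tuple[int, int]:
    """Clock 8 bits through two 8-bit shift registers wired as one 16-bit ring.

    Assumption: bits are shifted MSB first on both sides.
    Returns (controller register, peripheral register) after 8 clocks.
    """
    controller = controller_byte & 0xFF
    peripheral = peripheral_byte & 0xFF
    for _ in range(8):
        mosi = (controller >> 7) & 1  # bit leaving the controller side
        miso = (peripheral >> 7) & 1  # bit leaving the client side
        # Each register shifts left and takes the other side's outgoing bit.
        controller = ((controller << 1) & 0xFF) | miso
        peripheral = ((peripheral << 1) & 0xFF) | mosi
    return controller, peripheral


if __name__ == "__main__":
    rx, client = spi_exchange(0xA5, 0x3C)
    print(hex(rx), hex(client))
```

After 8 clocks the two values have swapped places, which is the "circular shift register" behaviour the comment describes.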
You can indeed do it with just one shift register, but you need one of the more complex parts that can handle parallel read and write. The circuit logic is not any simpler and I felt it was easier to explain the separate functionality by handling shift in/out separately.
@@weirdboyjim 🤔so that's why 299 wasn't used
Are you going to look at an interrupt-driven handler, instead of all those NOPs?
Interrupts rarely work with low enough latency, you would want some buffering as well.
Hi James!
Did you consider making all the IO hardware memory-mapped? In this case the CPU would not depend on peripherals and would be a general-purpose CPU with simpler general-purpose ISA instructions.
Why did you decide to make IO a part of the CPU and ISA? Is it for better performance?
Thanks for the videos!
This is a discussion that almost needs a video of its own. Memory-mapped IO (I do a little of this in the VGA circuit) is really just IO implemented as memory operations without additional support from the CPU. Processors like the Z80 have dedicated IO circuitry which still works very much like memory but separates the address space out. But those two are not the only ways to do it; the "IO ports" on my build work more like some microcomputers. The bus is the internal main bus and those "ports" are just otherwise unused addresses for main bus devices, so reading or writing to a port is exactly the same as asserting or loading one of the 8-bit registers. There is no hardware at all to support the IO in the core CPU; the only thing it costs me is the instruction space. In this particular architecture the IO is indeed faster since it never touches the memory system, so it doesn't cause a fetch stall.
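As a rough illustration of "a port is just another main bus device", here is a toy model. It is only a sketch under assumptions: the class `BusDevice`, the function `move`, and the device names `A`, `B` and `PORT0` are hypothetical and do not come from the build itself.

```python
class BusDevice:
    """Anything on the main bus that can drive a value onto it or latch one from it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = 0

    def assert_onto_bus(self) -> int:
        # Drive the stored 8-bit value onto the bus.
        return self.value

    def load_from_bus(self, bus_value: int) -> None:
        # Latch whatever is on the bus, truncated to 8 bits.
        self.value = bus_value & 0xFF


def move(devices: dict, src: str, dst: str) -> None:
    """One bus transfer: the source asserts and the destination loads.

    The same operation serves register-to-register and register-to-port moves,
    and no memory access is involved.
    """
    devices[dst].load_from_bus(devices[src].assert_onto_bus())


if __name__ == "__main__":
    # Hypothetical device names: two registers and one IO port.
    devices = {name: BusDevice(name) for name in ("A", "B", "PORT0")}
    devices["A"].value = 0x42
    move(devices, "A", "B")      # register to register
    move(devices, "A", "PORT0")  # register to port, same mechanism
    print(hex(devices["B"].value), hex(devices["PORT0"].value))
```

In this model the only cost of adding a port is another entry in the device table, which loosely mirrors the point that the ports only cost instruction space.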
When the programmer inserts NOP code to move forward.
Far more common at the low level than many people realize!
Congrats! A great milestone ;)
Thank you very much!
Oh, I think I know what is coming...
Ha! I expect you probably do. What you don't know is the first big demo that's going to use it! 😉
@@weirdboyjim Well, removing bit banging serial did evoke some memories. And I know one thing that uses SPI almost exclusively.
Ohhh, demos are always fun, all parts of them. 😉
I think now would be a good time to add a DMA engine so that SPI and UART can do bulk data transfers.
DMA would require support on the CPU side that I don’t have, so you may need to wait for the next CPU build for that.
@@weirdboyjim Something similar would still be possible, as its own device with its own memory. Essentially an autonomous buffer which could operate at a different speed from the CPU and would replace the NOPs with a flag check. But this only makes sense when you plan on adding multi-threading.
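The "autonomous buffer plus flag check" idea can be sketched as a small software model. This is an illustration only, assuming a fixed-depth FIFO that drops bytes when full; `SpiReceiveBuffer`, `device_push`, `cpu_read` and `cpu_drain` are invented names, not part of any real design.

```python
from collections import deque


class SpiReceiveBuffer:
    """Toy model of a receive FIFO that fills independently of the CPU."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        self._fifo = deque()

    @property
    def data_ready(self) -> bool:
        # The flag the CPU would test instead of counting NOPs.
        return len(self._fifo) > 0

    def device_push(self, byte: int) -> bool:
        """SPI side stores a byte; returns False if the FIFO is full and the byte is dropped."""
        if len(self._fifo) >= self.depth:
            return False
        self._fifo.append(byte & 0xFF)
        return True

    def cpu_read(self) -> int:
        """CPU side removes the oldest byte."""
        return self._fifo.popleft()


def cpu_drain(buffer: SpiReceiveBuffer) -> list:
    """Read bytes for as long as the ready flag is set."""
    received = []
    while buffer.data_ready:
        received.append(buffer.cpu_read())
    return received


if __name__ == "__main__":
    buf = SpiReceiveBuffer(depth=2)
    for value in (0x11, 0x22, 0x33):
        buf.device_push(value)  # the third byte does not fit
    print([hex(b) for b in cpu_drain(buf)])
```

The CPU loop no longer depends on exact cycle counts, at the price of extra hardware and of deciding what happens when the buffer overflows.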
Are you going to interface with an SD card? Because this is the only thing I can imagine that is not only useful, but can also saturate the interface.
The ability of SD cards to talk SPI is indeed an interesting possibility...
Remember the first few iterations of the MIPS architecture had a branch delay slot that was invariably filled with a NOP by the compiler, let alone the programmer 😂
What you have will no doubt work for you, but possibly it's worth an auto clock counter so that you can trigger it and it just pumps the 8 bits. Could be triggered by the control line that currently controls the mux.
Then you can overlap writing the Rx data to ram while the next byte is being clocked in, without having to be 100% cycle count accurate in the 'filler' code.
Only worth it if the equivalent of *pointer++ = register takes more than 5 cycles or is nondeterministic.
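A minimal model of the suggested auto clock counter, assuming one SPI clock pulse per CPU cycle once triggered (an assumption made for the sketch, not a detail from the video). `AutoClockCounter` and `run_filler` are hypothetical names.

```python
class AutoClockCounter:
    """Once triggered, emits a fixed number of SPI clock pulses and then stops."""

    def __init__(self, bits: int = 8) -> None:
        self.bits = bits
        self.remaining = 0
        self.pulses_emitted = 0

    def trigger(self) -> None:
        # Start (or restart) a full byte's worth of clock pulses.
        self.remaining = self.bits

    @property
    def busy(self) -> bool:
        return self.remaining > 0

    def tick(self) -> None:
        """Advance one CPU cycle; emit a pulse only while a transfer is in progress."""
        if self.remaining > 0:
            self.remaining -= 1
            self.pulses_emitted += 1


def run_filler(counter: AutoClockCounter, filler_cycles: int) -> int:
    """Trigger a transfer, run filler code for some cycles, return pulses emitted."""
    counter.trigger()
    for _ in range(filler_cycles):
        counter.tick()
    return counter.pulses_emitted


if __name__ == "__main__":
    # Filler code of 8 or 12 cycles gives the same 8 pulses.
    print(run_filler(AutoClockCounter(), 8), run_filler(AutoClockCounter(), 12))
```

Because the counter stops by itself, the filler code only has to be long enough, not exactly the right length, which is the property the comment is after.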
Great progress!
Thanks! Glad you liked it!
The best
Thanks TwoBob!
Just a random thought, but your assembler might need some form of macros if you keep unrolling loops etc.
Indeed, macros have been on the "todo list" for a while, but as I'm sure you can imagine my backlog of things that would be handy is pretty large. It's difficult to prioritize sometimes.
@@weirdboyjim I have the same type of issue. I guess that's part of the joy of solo projects... 😃
SPI display?? :D
Nice idea, but that would overlap with the VGA project.
What’s your “bus tester”?
It started out as a few bits of common circuit I used on a breadboard. It has a D-type latch, a couple of line drivers and some debounced buttons. I was using that a lot to test bits of the circuit. I converted it to PCB for reliability. th-cam.com/video/aru482FfXlk/w-d-xo.html