I am simple man - I see ZX81 in thumbnail, I watch & like
Enjoy!
Thank you for doing this series. I was big time into hardware but never knew how the 81 did the video. Very clever people at Sinclair.
Glad you enjoyed it! It's been fun to make the series. I like how much they were able to squeeze out of so little!
Thanks for posting these technical breakdowns of the old Sinclair machines. It's nostalgic for me, as someone whose first computer was a ZX Spectrum and who now works in electronics and CAD design for the event industry. I've written some emulators for old machines over the years, but I'm now itching to write some games again for the Speccy, with the more modern IDEs we have now :)
They were actually great little machines. It's surprising how much can still be learned from them.
Why is this sponsored and not just recommended? I'm already subbed to you!
@alexandrustefanmiron7723 I usually pour the (little) money I make from YouTube back into spreading the reach of the videos, rather than keeping it. Thanks for the sub though.
Thanks for sharing that know-how. It's another great example of how a clever technique can have a huge impact on performance, something we often forget when developing for our multi-core beasts with monster clock rates.
Absolutely! They were very clever at the Sinclair farm.
The nostalgia I got from this brilliance was immense 😊 thank you mate 🤝🤝🤝
Glad you enjoyed it!!
Very interesting stuff, and great to understand the evolution from the ZX80/1. Fun to see Manic Miner and JSW - both amazing works, but which shared the unfortunate decision of keeping two "back buffers" in the lower 16K of RAM. One is the static scenery of the current room. That gets copied to the other back buffer every frame, and the moving entities are drawn over it, then the final result is copied to the screen at 0x4000. All this memory copying takes up most of the run time, and just swapping these buffers' locations with little-used data (e.g. the whole game map) in the upper, uncontended memory bank makes the whole thing run about 25% faster.
Interesting. Even if they used the same two buffers, it would have made more sense to put them in the upper 32K of DRAM!
2:25 The original 1K ZX81 is capable of true hi-res (obviously limited by the RAM)! In fact, since software can drive the hardware much more directly than on the Spectrum, the only real limit of the ZX81 is the colors!
Well, you can do pseudo high-res with enough memory. You choose a different character palette by changing the interrupt vector, and try to find part of the ROM that looks more random. You are still stuck with the closest match, but each line has to use the same palette. It’s not the same as the Spectrum, where you have absolute control over each pixel.
I was kind of hoping for an explanation of the very strange frame buffer organization in the ZX Spectrum, what with the 3 groups of 8-way interleaved scan lines that give the very distinctive loading screen patterns.
Suddenly I fear that the explanation might be hiding in plain sight: did they reuse the character display generator from the ZX81 ULA, just driven only by counters instead?
256 characters wide, 8 lines tall, folded 8 times per group. Those address calculations made writing graphics code an absolute nightmare for 11-year-old me, having learnt Z80 assembly from the "ascii" code tables in the back of the ZX Spectrum manual and trying to get faster graphics drawing than BASIC would allow.
The line count in the ZX81 was in software, so it didn’t come from there, and it used the display file instead of a buffer.
It doesn’t even make sense from a refresh perspective; it would have been better to put the line number as the lower 3 bits of the line address.
I suspect that they may have thought it made printing characters easier. If you look at the character printing code, you just increment the upper 8-bit register of the address pointer to get the next line in the character.
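For anyone who wants to poke at the layout, here is a minimal Python sketch of the address arithmetic being discussed. It assumes the standard 48K layout with the bitmap starting at 0x4000; the name screen_addr is made up for illustration and is not from the ROM.

```python
def screen_addr(x: int, y: int) -> int:
    """Address of the byte holding pixel (x, y), with x in 0..255 and y in 0..191."""
    return (0x4000
            | ((y & 0xC0) << 5)   # which third of the screen (address bits 11-12)
            | ((y & 0x07) << 8)   # pixel row inside the character cell (bits 8-10)
            | ((y & 0x38) << 2)   # character row inside the third (bits 5-7)
            | (x >> 3))           # character column (bits 0-4)

print(hex(screen_addr(0, 0)))      # 0x4000, top-left byte
print(hex(screen_addr(0, 1)))      # 0x4100, next pixel row is 256 bytes on
print(hex(screen_addr(0, 8)))      # 0x4020, next character row
print(hex(screen_addr(0, 64)))     # 0x4800, start of the second third
print(hex(screen_addr(255, 191)))  # 0x57ff, last bitmap byte
```

Stepping down one pixel row inside a character cell adds 0x100, which is why incrementing only the high byte of the pointer works; crossing into the next character row adds 0x20 instead.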
Another great video! Thanks. I wasn't aware that the Spectrum ULA read the data for two sets of 8 pixels in that way in order to free up clock cycles. But I was then puzzling over why they interleaved the reads of the pixel and attribute data rather than read two bytes of pixels followed by two bytes for the attributes. They could then have used the page-mode read cycles of the 4116 DRAM to save even more time (one RAS\ pulse with two CAS\ pulses for two adjacent bytes). But perhaps that would have added too much complexity in the ULA (extra buffering perhaps?), and since RAM availability to the CPU was already quite high they saw no need to make life more complicated.
Yeah, it's a bit tricky; none of these decisions are documented. It would make sense to use page mode in the DRAM (it would just cost a couple of registers to store the data), but I guess, what does it buy? Can we get in two memory accesses at the end of the 16 pixel block?
The page mode cycle time is 170 ns, so we could potentially save 3 pixel clocks across the 4 reads (bitmask and attribute). If so, this gives us 3.5 CPU clocks at the end of the 16 pixel block (instead of two). Looking at the Z80 timing diagram, I think we need 5 CPU clocks at a minimum to get 2 reads in that block. So I suspect that is probably the main reason. That said, they could go to a 32 pixel block.
Knowing the Sinclair team, I suspect this is the sweet spot in the $$$/performance curve.
Actually, I thought about it a bit. I think if they could get it up to 4 CPU clocks free, they could potentially get two blocks of 2 CPU clocks which would be enough for 2 reads in the 16 pixel cycle.
It means they would need to do the random access followed by the page access (min 490ns) in 4 pixel clocks (which is 572 ns), so it may be theoretically possible. Might be hard to design a state machine clocked at 14 MHz to do it though.
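To make the arithmetic in the last two comments easy to check, here is a rough Python sketch. The 170 ns and 490 ns figures are taken straight from the comments; the 7 MHz pixel clock and 3.5 MHz CPU clock are the usual divisions of the 14 MHz master clock and are assumptions as far as this thread goes.

```python
MASTER_HZ = 14_000_000        # ULA master clock mentioned above
PIXEL_HZ = MASTER_HZ // 2     # assumed 7 MHz pixel clock
CPU_HZ = PIXEL_HZ // 2        # assumed 3.5 MHz Z80 clock

pixel_ns = 1e9 / PIXEL_HZ     # length of one pixel clock in ns

# Saving 3 pixel clocks is worth 1.5 CPU clocks on top of the 2 already free.
free_cpu_clocks = 2 + 3 * CPU_HZ / PIXEL_HZ

window_ns = 4 * pixel_ns      # time available in 4 pixel clocks
needed_ns = 490               # random access plus page access, figure from the comment
margin_ns = window_ns - needed_ns

print(free_cpu_clocks)        # 3.5
print(round(window_ns, 1))    # 571.4
print(round(margin_ns, 1))    # 81.4
```

With an exact 7 MHz pixel clock the window comes out at about 571 ns (the 572 ns above uses a rounded 143 ns per pixel), leaving roughly 80 ns of slack, so it fits on paper but only just.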
@DrMattRegan I'm sure it was cost driven. In 1981/82 the lower-cost ULAs would have had a few tens of logic blocks at best, and buffering extra bytes would use those up pretty quickly. Also, as you suggest, using page mode would not save enough time to allow an additional RAM access; it would only bring forward the single access window by a couple of clock cycles or so, so it was probably just not worth the effort. Sometimes, I find myself revisiting the designs that I worked on at that time (industrial microcomputers). I can often come up with improvements, but not when I restrict myself to the mid-1980s parts catalogue. Sir Clive and his engineers were a very smart bunch.
@DrMattRegan You've clearly thought about it in greater detail than I have. 😉 You could well be right. As you say, 14 MHz might also be an issue - speed was not a strong point in those ULAs. (Early BBC Micros also used Ferranti ULAs and had heat/speed issues.)
A quirk of the ULA cycle stealing is that it would also pause the clock during I/O operations if A15 was low and A14 high. As the Z80 IN and OUT instructions put the contents of either the A or B register on the top 8 bits of the address bus when executed, it meant that timing would be screwed up. I had to add an AND gate stuck on top of the Spectrum ULA to force this not to happen during I/O ops in the network code in the IF1 ROM. This made networking between the Spectrum and QL work properly instead of not at all.
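A small Python sketch of the condition described above, in case it helps to see it as bits. The function names are invented for illustration, and the low port byte 0xEF is an arbitrary example value, not a claim about which port the IF1 network code actually uses.

```python
def port_address(high_reg: int, low: int) -> int:
    """Port address on the bus: register A or B on A8-A15, port number on A0-A7."""
    return ((high_reg & 0xFF) << 8) | (low & 0xFF)

def io_triggers_ula_stall(port: int) -> bool:
    """True when A15 is low and A14 is high, so the port looks like 0x4000-0x7FFF."""
    a15 = (port >> 15) & 1
    a14 = (port >> 14) & 1
    return a15 == 0 and a14 == 1

# Same port number, different register contents on the high byte.
for high in (0x00, 0x40, 0x7F, 0x80):
    port = port_address(high, 0xEF)
    print(hex(port), io_triggers_ula_stall(port))
```

Only the register values 0x40 to 0x7F on the high byte hit the condition, which is why the effect depends on whatever happened to be in A or B at the time.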
Yeah, I don't know why they did it that way (actually I do, it probably saved logic and $$$), but all they had to do was look at MREQBar. Then again, I would have used WAIT as well instead of stopping the clock. Still a pretty clever design though IMHO.
The other thing I didn't discuss was refresh. There is refresh occurring in the horizontal windows, but we need to scan over 32 lines to do a full 128-address refresh cycle if we are relying on the raster count. At 64 µs per scanline this takes 2.048 ms. Just slightly outside the DRAM spec, but close enough I would imagine.
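The refresh numbers above can be checked with a few lines of Python. This is only a sketch: the 2 ms refresh interval is the commonly quoted 4116 datasheet figure and is an assumption here, and the rows-per-line value is simply what 128 rows over 32 lines implies.

```python
REFRESH_ROWS = 128     # row addresses that need refreshing
LINES_PER_PASS = 32    # scanlines needed to cover them, per the comment above
LINE_US = 64           # duration of one scanline in microseconds
SPEC_MS = 2.0          # assumed 4116 refresh interval

rows_per_line = REFRESH_ROWS // LINES_PER_PASS      # rows covered per scanline
pass_ms = LINES_PER_PASS * LINE_US / 1000           # time for one full pass
overrun_pct = (pass_ms - SPEC_MS) / SPEC_MS * 100   # how far past the spec

print(rows_per_line)          # 4
print(pass_ms)                # 2.048
print(round(overrun_pct, 1))  # 2.4
```

So the raster-driven refresh would miss the assumed 2 ms figure by about 2.4%, which matches the "slightly outside" description.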
I immediately thought of using the Z80's WAIT line rather than stopping the clock; but, and I don't have my Zaks Z80 manual to hand, WAIT might not be immediate, i.e. the wait occurs at the next clock cycle (or later), as the processor releases buses etc. I'm going to be lazy and not get into the timings, but it could be that WAITing takes too long for this situation.
No, WAIT should be fine. WAIT is sampled on the falling edge of the CPU clock in T2, just before the data is read. That's when we want to stop the CPU. WAIT is designed to deal with slow memory devices, which is what the DRAM effectively becomes during scan out.
The relative speed mentioned at 18:50 made it possible to code a reasonable ZX81 emulator on the 48K ZX Spectrum
Yep, it's amazing how much of a difference a relatively small change can make.
The TRS-80 Model I didn't have a built-in speaker. The usual method of getting sound was to use the cassette port: the software would generate the sound by toggling the output data bit. You'd either put an AM radio next to the cassette port (to pick up the generated RF noise -- though it was mostly a lot of buzzing), or plug the lead that went into the MIC input of the cassette player into a separate amplifier/speaker unit (something that Radio Shack sold separately). The latter produced more satisfactory sound, though it was fiddly if you were loading programs and data from tape, due to the frequent plugging and unplugging of cables.
Thanks for that. I seem to vividly recall the Model II making 1-bit sounds. I still remember the grainy "Game over player 1". I guess I meant that there was 1-bit sound output, and the Speccy still uses the cassette port. I said 1-bit speakers instead of 1-bit sound. The problem of not having an editor!
Back in the 1980s, I built my own 2K, 8K and 32K static RAM packs for my ZX80 and ZX81. I noticed that the SAVE time increased depending on which RAM pack was attached, with the 32K taking around 10 minutes to SAVE, regardless of programme code length.
Interesting, do you remember how the 32K was memory mapped?
Fascinating thank you.
Glad you enjoyed it
Great video and a good comparison between the two machines. Chris Smith's book "The ZX Spectrum ULA" is a really good read.
Cool, thanks!
Very interesting deep dive into the secrets of '80s CPU technology. I like your videos very much. Keep up the good work. Regards, Kent
Thanks, will do! More to come!
Just found this channel; very interesting video! (subbed, etc.)
Much of the hardware design in the ZX Spectrum was done by a fellow named Richard Altwasser; he was a very clever bloke IMO. Take the layout of the screen memory, for instance: to the uninitiated it looks very odd to have the screen split in three distinct segments, but it made the hardware very efficient.
Another clever point is how text characters are printed to the screen: HL points to the top line of current print position while DE points to the first byte of the character bitmap, then we do
LD A,(DE)
LD (HL),A
INC H
INC DE
eight times, et voilà, we have our character on the screen.
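If you want to see that loop in action without a Spectrum to hand, here is a small Python simulation of the four instructions above. It is only a sketch: the glyph address 0x8000 and the bitmap itself are made-up example values, and memory is just a 64K bytearray.

```python
def print_char(mem: bytearray, hl: int, de: int) -> None:
    """Copy an 8-byte glyph at DE to the screen cell whose top row is at HL."""
    for _ in range(8):
        a = mem[de]                  # LD A,(DE)
        mem[hl] = a                  # LD (HL),A
        h = ((hl >> 8) + 1) & 0xFF   # INC H: next pixel row is 256 bytes further on
        hl = (h << 8) | (hl & 0xFF)
        de = (de + 1) & 0xFFFF       # INC DE: next byte of the glyph

mem = bytearray(0x10000)             # 64K of zeroed memory
GLYPH_ADDR = 0x8000                  # hypothetical location of the glyph data
GLYPH = [0x00, 0x3C, 0x42, 0x42, 0x7E, 0x42, 0x42, 0x00]  # example "A"-like shape
mem[GLYPH_ADDR:GLYPH_ADDR + 8] = bytes(GLYPH)

print_char(mem, 0x4000, GLYPH_ADDR)  # top-left character cell

# The eight rows land at 0x4000, 0x4100, ... 0x4700.
rows = [mem[0x4000 + 0x100 * r] for r in range(8)]
print([hex(b) for b in rows])
```

The point of the layout shows up in the INC H line: only the high byte of the pointer changes between pixel rows, so no 16-bit add is needed inside the loop.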
Welcome. Yeah i tend to talk about Sir Clive, but it was his team that did the clever engineering. The ZX80/81 were shaved to the bone. If you haven't watched them yet, you might like this series
th-cam.com/play/PLjQDRjQfW-84WG47-5UjPz1BrXxc1acvd.html
Interesting stuff. The 81% CPU utilization is on par with the Atari 800 which had a graphics coprocessor called the Antic chip. The Antic chip handled video generation, but RAM was shared and the CPU was halted when the Antic chip needed RAM access.
Hi Martin. Yeah, I thought it was pretty clever how they ended up doing it. I actually think there were more clever tricks in the Sinclair series than in the Apple II.
My rudimentary test of BASIC speed in SLOW versus FAST mode on a ZX81 with 16K (test duration in minutes, drawing a circle using SIN and COS) showed that in SLOW mode the CPU has 25% of the speed. So I was pretty close ;)
Cool, nice when theory and practice align!
I've really enjoyed this series! Thanks for all of the effort you've put in. I really want to watch the movie that you kept showing clips of. Any idea where an American can find it?
Hope you enjoy it! The movie is Micro Men and it's on YouTube. th-cam.com/video/XXBxV6-zamM/w-d-xo.html
Hey Matt, absolutely love these videos. Noddy question, forgive me, but _how_ do the resistors divide the bus? They're in series … my mind recoils in confusion! 😄 (Obviously, I have no real electronics experience beyond your and Ben Eater's videos, etc.) 🙏
Great question. When there is 1 device driving the bus, the voltage on either side of the resistors will be the same (from a digital logic perspective) and it acts as a single bus.
On a normal bus, when two devices try to drive it, there will be conflict and some non-valid voltages as a result.
But with the series resistors, we can have two devices driving the bus provided they are on opposite sides of the resistors. There will be a voltage drop across the resistors (and some current draw), but from a logic perspective, the data on either side of the resistors can be different, so in this case, it acts as 2 busses.
@DrMattRegan OMG, another ingenious trick (the "other" being abusing the Z80 refresh register). Series resistors dividing the bus ("separating" might be a better word, to avoid confusion with voltage dividing) when you're too cheap to buy tri-state buffer chips.
Exactly. It's cheaper than a tri-state buffer. The problem is that (worst case) a 5 V drop across a 470 ohm resistor is about 10 mA, so that is about 80 mA for the data bus.
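The worst-case current is just Ohm's law, sketched here in Python. It assumes all 8 data lines are in conflict at once with the full 5 V across each 470 ohm resistor, which is the pessimistic case rather than typical operation.

```python
SUPPLY_V = 5.0          # worst-case voltage across one resistor
RESISTOR_OHMS = 470.0   # series resistor value from the comment above
BUS_WIDTH = 8           # data bus lines

per_line_ma = SUPPLY_V / RESISTOR_OHMS * 1000   # I = V / R, in milliamps
bus_ma = per_line_ma * BUS_WIDTH                # all lines conflicting at once

print(round(per_line_ma, 1))   # 10.6
print(round(bus_ma, 1))        # 85.1
```

So the rounded 10 mA and 80 mA figures are slightly on the low side; the exact values are nearer 10.6 mA per line and 85 mA for the whole bus.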
Thanks for the explanations, @DrMattRegan. I can't tell you how empowering your deep dive videos are. Between you, Ben Eater, and a few others, I've gone from being completely ignorant about electronics to building a 6502 computer using a Raspberry Pi Pico for ROM/RAM and hardware monitor (which taught me far more about clock timings and bus control than I ever wanted to know!), and I'm now working on a VGA circuit with a view to eventually using an FPGA for that!
Yeah, "separating" is much better, @axelBr1. Thanks. 👍
You're on the spectrum
Very good!