These are the VGA capture devices I purchased from amazon.co.uk: VGA to HDMI Adapter (amzn.to/3fmxUMT) and USB HDMI Video Capture Dongle (amzn.to/2VgfTZI). Equivalent-looking links from amazon.com: VGA to HDMI Adapter (amzn.to/3zXCgBZ) and USB HDMI Video Capture Dongle (amzn.to/2VixaRP).
I don't think the equivalent on amazon.com you've chosen will work: This 6ft HDMI to VGA converter cable is ideal for transmitting and converting digital signal from HDMI devices(Notebook, PC, Laptop, DVD player, Nintendo Switch etc) to analog signal VGA devices(TV, projector, monitor etc).
I want to let you know that, even if this series has a really low views/quality ratio, it's extremely informative and entertaining for the ones that do watch it. Thanks
Indeed, I really liked the way it slots in nicely as an extra feature rather than something you have to design around. Without it you have a working system; with it you get a simple improvement.
@@weirdboyjim but how do you deal with the edge case of the FIFO buffer overflowing? maybe you have to have the FIFO buffer be the same size as a whole frame and it 'degenerates' into a back buffer solution?
@@Philip8888888 If the FIFO is full, you simply insert a wait state until it isn't if you're going for a glitch free display. If you want maximum speed, give priority to the CPU write and starve the video by either outputting black or repeating the previous byte for less obvious glitching. The FIFO only need be as long as one scan line's worth of processing as it will get emptied at DMA speed during horizontal blanking & sync. A 64 byte FIFO should be more than enough until James adds the vector processor unit.
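A rough Python sketch of the two full-FIFO policies described above, for anyone who wants to play with the idea. The names (`WriteFifo`, `cpu_write`, `drain_one`) are made up for illustration, and modelling "starve the video" as draining the oldest entry on a stolen video cycle is an assumption, not something taken from the actual circuit.

```python
from collections import deque


class WriteFifo:
    """Toy model of a write FIFO between a CPU and video RAM (hypothetical)."""

    def __init__(self, depth, stall_when_full=True):
        self.depth = depth
        self.stall_when_full = stall_when_full
        self.entries = deque()
        self.wait_states = 0          # cycles the CPU was held off
        self.stolen_video_cycles = 0  # cycles the video fetch was starved

    def drain_one(self, vram):
        """Commit the oldest queued write to video RAM, if there is one."""
        if self.entries:
            addr, data = self.entries.popleft()
            vram[addr] = data

    def cpu_write(self, vram, addr, data):
        """Try to queue a write; return True if the CPU may carry on."""
        if len(self.entries) >= self.depth:
            if self.stall_when_full:
                # Glitch-free policy: insert a wait state, CPU retries later.
                self.wait_states += 1
                return False
            # Max-speed policy: steal a video cycle to free one slot.
            self.stolen_video_cycles += 1
            self.drain_one(vram)
        self.entries.append((addr, data))
        return True
```

In normal operation `drain_one` would be called once per free memory slot during blanking, so with a FIFO that covers one scan line's worth of writes neither counter should ever move.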
I like doing those, I did actually prepare a bit more than you got in this video for some of the later tweaks, but there was nowhere to fit it on screen.
The obvious thing to do would be to just write through, but you should be able to design it such that it never happens. The fastest my CPU with the current instruction set can write to memory is at half the clock rate, so currently that would be 40 writes during the visible portion of the line, and there would be 160 pixels of blanking interval per line. So a 40 pixel FIFO would be big enough. If you raised the clock rate by 5x you could theoretically write faster than you could flush on a line, so you would need to be able to handle that.
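The sizing argument above can be checked with a few lines of Python. This is only a sketch under simplifying assumptions: writes arrive only during the visible part of a line, and the FIFO drains only during that line's blanking interval, one entry per pixel slot. The function name `simulate_lines` is hypothetical.

```python
def simulate_lines(writes_per_line, flush_slots_per_line, lines):
    """Return (peak FIFO occupancy, backlog left after the last blanking).

    Assumes writes only queue up during the visible part of a line and the
    FIFO is only drained during that line's blanking interval.
    """
    backlog = 0
    peak = 0
    for _ in range(lines):
        backlog += writes_per_line      # visible portion: FIFO fills
        peak = max(peak, backlog)
        backlog = max(0, backlog - flush_slots_per_line)  # blanking: drain
    return peak, backlog


print(simulate_lines(40, 160, 10))   # current clock: 40 writes, 160 flush slots
print(simulate_lines(200, 160, 3))   # 5x clock: 200 writes per line
```

With 40 writes against 160 flush slots the FIFO empties every line and never holds more than 40 entries; at 5x the write rate the backlog grows by 40 entries per line, which is the case that would need wait states or a write-through path.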
@@weirdboyjim An off the shelf FIFO with 64 slots sounds like it would handle your CPU up to 5MHz. If you stick with 4 bit multiples, call it a 256 slot FIFO. Good to 12 MHz if limited to 160 fill, 20 MHz if willing to let it flush over two scan lines. Would seriously doubt any code capable of stuffing data to the VGA that fast. And honestly, suspect that a vast majority of the VGA memory interactions would be scrolling the screen. So 2 clocks to read, 2 to write, and your FIFO has only half the work to do.
Interesting to watch you work on what was THE number one issue back in the old days.... getting a picture on the screen without strangulating the CPU. I've gotta vote for "show us the FIFO" too. :)
Amazing work and great video, thank you. Reminds me of the computer boards I was working with around 1985; the 8 inch floppy disc drive controller board alone used about 90 off 74-series chips.
Loved the work on the frame buffer, this was really fun to watch. You were competing with the 2021 Gymnastics Final but my eyes tended to stay on this, lol. I would also like to submit a vote in favor of the write FIFO. It would complete the design and be a nice fix for the corruption issues.
This series is looking like the practical complement my Digital Systems course in high school should have had in its second year. All the basics to make a computer were there, yes (mux, demux, flip-flops, etc.), but the amount of work needed on top of that knowledge without any other information would be huge. This would have had an absolutely positive impact on the overall quality of the course, and would have ensured practical know-how was achieved in much greater depth.
Thanks again for another video. I liked the discussion on how to solve the memory access issue. As someone who also balked at the price of dual ported RAM (seriously, it's 2021, how can this not be dirt cheap now?) it was nice to hear the discussion of alternatives.
Glad you liked it! It's a specialty part. If I was trying to make a specialty demonstration circuit I might have considered it, but for this I'm really trying to solve everything from the ground up.
Dual port RAM is seldom used in any volume application, therefore it tends to be expensive because there isn't volume manufacturing behind it. Sadly it's probably not likely to get any cheaper either, with complex logic moving into FPGAs which can also implement RAM and DPRAM
Seriously, at some point, you're going to run out of ways to blow my mind - the Nyan cat animation is staggeringly good! And, probably most importantly, this is the only time I haven't minded being rick rolled, haha!
Brilliant work, James. Rick's moves looked pretty fluid. I've finally added video output to my new backplane build, but using a TMS9918. Primarily for nostalgic reasons with the TI-99 being my childhood computer. I do plan on building an alternative display card soon.
Thanks Troy! I was pleased with the animations but I might put a bit of effort in soon to get it all done in the blanking interval. I didn't want to delay the video any further though.
Man, that is so cool. I remember watching Ben Eater's vids and wanting him to take the next steps to make it more efficient. So glad I found your channel.
I used to design using TTL and CMOS circuits 25 years ago... My last design was a thermal printer. You had to control every single pin in those chips according to the data sheet. Then FPGAs came along, and I would find it very hard to go back now, as I have complete freedom to structure my design not in terms of generic function chips; I can just implement what I need. But what you did brings back nice memories, especially how bulky those designs tend to be. Regards...
Glad you are finding it interesting. I plan on doing some circuits with FPGAs in future videos, but not until after finishing the pure(ish) logic CPU first.
@@weirdboyjim You are a committed person. As I am of a certain age, let me tell you: before the microcomputer revolution, all was done this way. I recall our university HP3000 system had a CPU spread over several boards; you could literally see the registers, lol. And sometimes it would fail and we had to repair it, using wand signal injectors and wand logic detectors. It was fun. It is nice to see someone like you keeping that tradition alive. Engineers now have never touched a TTL manual, at least on paper. It is one of my proudest possessions. Regards.
I loved this video. I've watched along and am making progress on my own build. Your knowledge is outstanding, James. And thank you for 'not giving up, and not letting us down'. 😁
LOL no need to apologize for the Rick Astley bit. had me laughing out loud. ;-) and brilliant job on the vga output. that works way better than I imagined it would.
Awesome as always, thanks! This feels on the same level as the first computers and consoles I started using. For the videos, what data format did you use? Just the raw data for all the frames?
The static parrot image was raw data; for the animations I had to implement some compression with support for delta frames. The first one was 12 frames and came in at 13KB. The "post credit scene" was 33 frames and I had to implement some lossiness in the compressor to fit it into memory; it came in at 41KB. In both cases I stored an extra delta frame to loop it back to the start to keep it smooth.
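For readers wondering what delta frames with a loop-back frame might look like, here is a minimal Python sketch. It is not the compressor used in the video (that format isn't described here); the `(offset, value)` list encoding and the function names are assumptions chosen purely for illustration, and there is no lossy step.

```python
def encode_delta(prev, curr):
    """List the (offset, new_value) pairs where curr differs from prev."""
    return [(i, b) for i, (a, b) in enumerate(zip(prev, curr)) if a != b]


def apply_delta(frame, delta):
    """Patch a frame buffer in place with one delta."""
    for offset, value in delta:
        frame[offset] = value


def encode_loop(frames):
    """Return the first frame plus one delta per frame.

    The final delta goes from the last frame back to the first, so the
    animation can loop without re-sending a full frame.
    """
    count = len(frames)
    deltas = [encode_delta(frames[i], frames[(i + 1) % count])
              for i in range(count)]
    return bytes(frames[0]), deltas


frames = [bytearray(b"AAAA"), bytearray(b"ABAA"), bytearray(b"ABCA")]
first, deltas = encode_loop(frames)
screen = bytearray(first)
for delta in deltas:
    apply_delta(screen, delta)   # after the last delta we are back at frame 0
```

Only the bytes that change are stored and written, which is also why this style of playback suits a slow CPU: the per-frame work scales with how much moved rather than with the screen size.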
@@weirdboyjim Thanks for asking :-) I'm not quite there yet. My project is based on the video interface of the British Nascom-2. The very short description is that a certain memory area has a double function, completely invisible to the CPU. Writing 41H to a memory cell in this area will result in the letter "A" being output to the screen. Some counters, multiplexers and an EPROM (containing the graphic representation of the ASCII charset) create the basis for the video signal. This construction makes outputting text to the screen very fast, as the CPU does not have to "draw" every single letter before text is written. This is done solely by simple hardware. My angle is replacing the special RAM block with a dual-port RAM chip. This will eliminate the need for the multiplexers. Giving the CPU and the video output circuits individual address and data lines to the same RAM block also simplifies the handling of horizontal and vertical line sync. A physical VGA end would be a very nice upgrade to the simple analog video circuits of that time :-)
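The character-mapped scheme described above boils down to a two-level lookup that the hardware does on every scanline. A hedged Python sketch follows; the 8-row glyph height, the ROM layout (`code * 8 + row`) and the sample "A" bitmap are all assumptions for illustration, not the real Nascom-2 character generator.

```python
GLYPH_ROWS = 8  # assumed glyph height; the real hardware may differ


def scanline_bytes(text_ram, char_rom, columns, scanline):
    """Return the pixel bytes the video hardware would shift out for a line.

    text_ram holds one character code per cell; char_rom holds GLYPH_ROWS
    bytes per character code, one byte per glyph row.
    """
    text_row, glyph_row = divmod(scanline, GLYPH_ROWS)
    line = []
    for col in range(columns):
        code = text_ram[text_row * columns + col]           # counters address RAM
        line.append(char_rom[code * GLYPH_ROWS + glyph_row])  # RAM data addresses ROM
    return line


# Hypothetical ROM with a single made-up glyph for "A" (41H).
char_rom = bytearray(256 * GLYPH_ROWS)
char_rom[0x41 * GLYPH_ROWS:0x41 * GLYPH_ROWS + GLYPH_ROWS] = bytes(
    [0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00])

text_ram = bytearray([0x20] * 4)   # a one-row, four-column "screen" of spaces
text_ram[1] = 0x41                 # the CPU writes 41H; that is all it has to do
```

The CPU's only job is the single byte write on the last line; everything in `scanline_bytes` is work the counters and the EPROM do for free.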
Hey, really glad to see this project progress. I feel you missed a big opportunity with your VGA. You use double RAM to combat the contention issue, but you could perhaps use it as a hardware double buffer. That way the VGA adapter would be reading from one whilst the CPU writes to the other. Obviously you lose the ability to read what is currently being drawn to screen, but you gain the ability to write as slow or as fast as you like, since the CPU would be in charge of flipping the buffers at the next available blanking. Anyway, that's just my idea; you obviously got yours to work and it's amazing.
I regret not covering this kind of double buffering in my discussion, but I don't agree with you. I think perhaps you underestimate the amount of extra logic that is necessary to implement that; you would roughly double this circuit. You'll also find a true double buffer gets in the way of doing incremental screen updates, which is the way to get fast updating graphics on an 8-bit CPU. Double buffering comes into its own when your CPU and memory architecture get to the level where you can fill the buffer with useful processing at a respectable frame rate. I plan on my games running at 60fps and I'll need every trick in the book to do that.
Thanks! I wanted to stick with the 8k ram chips, there is also another problem I'll be talking about later that I want to solve the right way rather than be tempted by the "Yeah, but we can ignore this because modern ram is insanely fast" solution.
Hi James! I really enjoy your videos, they are just awesome! I'm interested in how this memory access issue is solved in different real historical computers of different eras. Are there some articles or videos on this topic?
Good question! Lots of different ways to be honest, as many different flavors as systems. Most are variations on clocking the memory higher than the video needs to access it and/or making the cpu wait for a gap.
what about a system kind of like PCRA where the CPU writes to the shadow buffer and then the buffers swap, whereupon the write is reissued to the other RAM chip?
Double buffering like that can be effective, but it's not free. You have to double up all the address multiplexing logic (you have 2 lots of address bus to change), but it's also an impediment at times. Most games and visually impressive demos in the 8-bit era updated the screen incrementally rather than redrawing everything and double buffering means you are updating on top of the frame before last.
@@weirdboyjim first of all, huge fan. I love your build series, and if I hadn’t watched it, I might not have found so much interest in this type of hardware engineering. My idea wasn’t really a full double buffer, and after some more thought, the computer has to issue a nop on the 2nd clock after a write which guarantees on that clock that the computer won’t be accessing the vga, and it can only issue 2 writes in a row. It might be possible to use latches to store writes temporarily while putting them in the shadow ram, then trigger a swap on the NOP, which could probably be made pretty well by running it from the VGA clock and issuing the buffered write. Because the system clock is much slower than the VGA clock that runs the swap, it should be possible to pull it off. Also, only 2 writes would need to be buffered, and only one catch-up write per swap. It would certainly be quite a lot of circuitry, but it’s more like a limited write buffer where you can guarantee only 2 writes ever need be buffered like in the video than a whole redraw. While it may not be the right solution here, I don’t think it’s entirely invalid.
Our 380Z at school had a system where you could extend the blanking interval so you had full CPU access to the video RAM but the screen would blank... so you could chuck an initial display at the start of a program and you'd avoid corruption by having a completely blank screen.
Raw genius is rarely so overtly displayed (pardon the pun) as (yet again) in this video. Watching Andreas Kling working on his SerenityOS operating system evokes the same euphoria. The two people who disliked the video are clearly blind...
Congratulations! This is one fantastic project. The PCB'ed CPU looks gorgeous! Looking forward to the next improvements on this video card. Developing video games for a computer designed by yourself is the ultimate kink! ^_^
Thanks Damian! I'm looking forward to doing some more game dev once the VGA is done, but did you see my game "Snek" that I did outputting to a serial terminal? th-cam.com/video/efLzgweF958/w-d-xo.html
@ 21:54 You also have the paging approach: double memory bank, cpu reads/writes from one memory, the VGA driver reads from the second bank. Both banks are swapped on Vsync. Cons: tear-free requires careful timing or the CPU to set a hardware lock when the first bank is ready for swapping.
For many years I was writing games and demos where we had to worry about the raster timing, so that's not a worry for me. The problem with all of these "Just have double the circuitry" comments is that it doubles the circuit, and it doubles sections of the planned circuit that will appear in later videos.
@@weirdboyjim True. Note that double RAM is only one more address line, as you did in the Hybrid. On the other hand, the amount of external sync circuitry might be easier said than done indeed.
Well, after binge-watching your videos during July this is the first I catch in real-time! Fantastic work as always, and thank you for the clear explanation of several possible methods to manage the RAM addressing. One way I've thought about when trying to figure this out (I'm not building anything yet, but I'd love to) was to use several (2 or 4, maybe 8) RAM chips as "pages", that way you could read from one while writing to the others - but I doubt it would be very efficient space-wise, and small RAM chips appear hard to come by.
Glad you are enjoying it. Your suggestion is either similar to the banking system I described in passing, or true hardware double buffering. The latter is indeed another way of solving the problem, but it ends up being roughly double the memory interfacing logic of the option I presented.
@@akkudakkupl Assuming I understand you correctly, if you just switch between two pages, then presumably the display memory the CPU sees is suddenly being changed (from what the CPU had just written)? So a subsequent CPU read may not return the expected data? I don't see how this would be workable in practice, without yet another video buffer in memory as the CPU's master display memory (that would then need to be bulk copied to the "paged" output buffer).
@@gregclare You blast data into one page or the other, you keep local copy of data in CPU RAM space. You just copy modified data to video RAM pages once per frame, alternating pages, so that video hardware has access to one page for displaying, while CPU has access to the other for copying.
@@akkudakkupl Yes, I understand what you're proposing. But having the CPU need to do a full buffer block copy once per frame is a significant overhead. It would be more efficient to just implement a simple "CPU has access priority to the video buffer" solution, and then code to only update the Video buffer during blanking periods (so no visual artifacts created by the CPU priority access). ie. You just organise your code to do the non-visual processing during the display period, and synchronise video updates triggered by either the Vsync or Hsync pulses.
I want to explore a little bit on Bus request with several chunks. Instead of having chunks to be placed one after another, one can have them interleaved! So, for ex., even bytes go into first chunk, and odd bytes into the other. That will guarantee, that the CPU will never be stopped for more than one Clock cycle, as VGA will always alternate reads from chunks. To do that, one can have 2 RAM chips and select between them, but instead of doing it on the highest bit, do it on lowest. That way consecutive addresses will alternate between chips. With more chunks, even better things possible. CPU cannot access VRAM all the time - it must also fetch instructions. That creates an "opportunity window", when VGA can read VRAM. The key idea is to read all chunks simultaneously, and latch the data for later use. With enough chunks, there will be no need to ever stop the CPU!
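The interleaving idea above is easy to sanity-check in Python. This is a sketch under the comment's own assumptions: the video side reads consecutive addresses, one per cycle, and the bank is selected by the lowest address bit(s). The function names are hypothetical.

```python
def split_address(addr, bank_bits=1):
    """Select the RAM chip with the LOW address bits, not the high ones."""
    bank = addr & ((1 << bank_bits) - 1)
    offset = addr >> bank_bits
    return bank, offset


def cpu_stall_cycles(cpu_addr, video_addr, bank_bits=1):
    """Count cycles the CPU waits while the video read occupies its bank.

    The video side is assumed to step through consecutive addresses,
    one per cycle, so it moves to a different bank every cycle.
    """
    cpu_bank, _ = split_address(cpu_addr, bank_bits)
    stalls = 0
    while split_address(video_addr, bank_bits)[0] == cpu_bank:
        stalls += 1
        video_addr += 1
    return stalls


worst = max(cpu_stall_cycles(c, v) for c in range(16) for v in range(16))
print(worst)
```

Because consecutive video addresses always land in different chips, the loop can never run twice in a row, which is the "never stopped for more than one clock cycle" guarantee.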
That's good thinking, it's actually what a lot of "real" systems do. For a while I toyed with an idea very similar to this, but instead of stopping the CPU you have a set of latches to store the last write (2 for address, one for data). Essentially a one byte FIFO, but you were always guaranteed to be able to flush it within a cycle.
You're still alive! Big relief! Also lovely video again! Again this is super educational and inspiring. It really feels like you're after the holy grail of 8-bit computer graphics here. Whether you implement the FIFO or not (but you should, of course ;-)) I'm very curious to see how this will turn out. Seems to me this VGA solution should also work well connected to, say, a Z80 or 6502 CPU. Please do keep smashing those milestones! I love how every part in this series so far has ended with some great accomplishment. A Rick Roll on a breadboard, it can't get better. Thankfully there was no sound!
I think the way the NES and I think the Gameboy handle graphics is really interesting, since they don't really have a pixel framebuffer. They just have the higher level descriptions of what to display, and the pixels are calculated in real time right as they're being output. It saves on video memory, but also requires the graphics system to actually be capable of its own processing and requires the programming to be more clever to accommodate it. It also seems to be somewhat closer to a modern GPU rather than a traditional framebuffer, since all the rendering is done outside the CPU. Regarding memory access, from what I remember of my time writing an emulator, the NES allows the CPU to still run while outputting pixels, but it can't access video memory during that time.
Tile based systems have a couple of uses: it's a big reduction in the memory used compared to a framebuffer with the same pixel density, but it also allows modification of a large portion of the screen by manipulating less data.
Good stuff James! I worked out a very similar circuit to what you have here based off your last video. My design is nearly identical ( minus the framebuffer ), I should have just waited and copied! ;) keep up the good series.
As you say, in the old 8-bit days the video memory was updated during the video flyback period, when there is no output to the screen, e.g. on the PET. Nice video. As in the old days of CGA, if you wait for the flyback signal you get no flicker, but it slows down screen updates. I can still remember when I got my first Trident SVGA card.
I remember in most early VGA cards if you updated the palette you would get this kind of snow on the screen if you changed some of the registers without waiting for blanking.
@@weirdboyjim I can remember the exciting time of going from an amber screen to colour and a CGA card; it might even have been full length. I bought a motherboard without BIOS as they wanted an extra £75 for the BIOS, I think. Luckily my boss had bought one, so I copied his. I think the processor was an 8086, can't remember the speed but I expect around 1meg :-) and a single 360K floppy. What exciting times we have lived through, tech changing so fast.
You could implement that kind of Write FIFO system in software by switching to a double-buffer memory architecture. Still just two RAM chips, but you multiplex both chips to both CPU and VGA, with a way to swap the front and back buffers in hardware. Updates to one buffer can be stored in a software queue that then gets flushed after the front/back buffer swap.
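A small Python sketch of the software-queue idea above, under stated assumptions: two equally sized buffers, the CPU only ever writes the back buffer, and the queued writes are replayed into the new back buffer right after the swap. The class and attribute names are hypothetical.

```python
class DoubleBuffer:
    """Two RAM 'chips' with a hardware-style front/back swap (hypothetical)."""

    def __init__(self, size):
        self.buffers = [bytearray(size), bytearray(size)]
        self.front = 0      # index of the buffer the VGA side is scanning out
        self.pending = []   # software queue of writes made since the last swap

    def write(self, addr, value):
        """CPU write: goes to the back buffer and is remembered for replay."""
        self.buffers[1 - self.front][addr] = value
        self.pending.append((addr, value))

    def swap(self):
        """Flip front/back (e.g. at vertical blanking), then catch up."""
        self.front ^= 1
        back = self.buffers[1 - self.front]
        # The new back buffer is one frame stale: replay the queued writes
        # so incremental updates keep working on top of current contents.
        for addr, value in self.pending:
            back[addr] = value
        self.pending.clear()
```

The replay loop is the cost being discussed in this thread: every incremental update is written twice, which is why double buffering is not free on a slow 8-bit CPU.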
That kind of double buffering would require you to double up the multiplexer chips and some additional glue logic. There are indeed lots of different ways to do it, but I can only cover so many in my discussion.
Honestly, that’s what I thought you were going to do, James. I guess the downfall of that approach is that the frames have to be fully rendered before the “ping-pong” switch gets thrown, plus it eats too much address decode in main memory... hummm, never thought of your “shadow” approach with a FIFO, that’s brilliant! Kudos!
Nice job on the design decisions and "sticking to your guns" re: costing. I think the idea of pre-loading the counters to line up the start of memory to the top/left is a good idea. You are using the right counters for that job. Does your CPU/MCU board have an interrupt controller? I was not sure, but it did not seem that you were tying the start of the vertical blanking interval to an ISR... apologies if I was wrong on that. Also, is the 64KB video RAM the actual working memory, or do you copy an already "rendered" image to the RAM during blanking intervals? If so, that could be extended into a nice working DMA system later on... Cheers,
No interrupt controller, watch the design retrospective video if you want a bit more information about that but there is a build complexity tradeoff to be had and this thing already fills my desk.
I was rooting for the multiplexed solution -- my fave 8-bit system, the BBC Micro, used that approach. The 6502 CPU ran at 2 MHz, and the RAM was clocked at 4 MHz to allow the video system access to it.
It is a nice technique, but they did have the same speed conundrum designing the BBC as I have here (albeit for a different underlying reason). Those 4 MHz chips were a big chunk of extra part cost. We are spoiled for high speed parts these days!
For PC enthusiasts, it might be interesting to know that all classic IBM graphics cards (MDA, CGA, EGA, VGA) used time-division multiplexing for video RAM access, even though on the MDA the video clock (16.257 MHz) and the bus clock (4.77 MHz) were completely different. IBM added logic to insert wait states into video memory accesses until "the time has come for a CPU access time slot". This has some interesting consequences we observed on PC compatible systems: - Surprisingly, performance is not hampered as much as I expected: you get only every second memory cycle for the CPU on MDA and CGA, because they use 1:1 multiplexing. On CGA in 40-column modes (I'll get to 80-column mode later), we have a dot clock of 7.16 MHz, and as characters are 8 pixels wide, a character clock of 895 kHz. During this period of 1117 ns, the graphics card needs to load a character byte and an attribute byte, i.e. 2 cycles. As graphics card and bus interface alternate between cycles, we need to pack 4 memory cycles into the 1117 ns, so the total cycle time is 280 ns. This just fits the 270 ns cycle time of the MCM4517-12 RAM used on (some copies of) the original IBM CGA. The processor gets a chance to access video memory every 540 ns. A memory cycle in the IBM PC takes 4 clocks at 4.77 MHz, which is 838 ns, so it will miss every other opportunity, and could access one byte of video memory per 1080 ns, i.e. at a rate of 925 kHz. It's the slowness of the 4.77 MHz 8088 processor that makes this rate unobtainable, though. Keep in mind that the processor needs to fetch instructions over the same bus, and the microcode for "REP STOSW" or "REP MOVSW" is slow. In essence, the average delay caused by the synchronization will be half the slot period, so around 270 ns, which is around 1.3 processor clocks.
The MDA card (in its only mode, the 80-character mode) runs a similar scheme at a character clock of 1.8 MHz, but it uses 16-bit memory access to fetch characters and attributes at the same time, so the memory timing matches the 895 kHz CGA clock quite well. - The CGA in 80-character modes needs double the bandwidth of the CGA card in 40-character mode (or in graphics mode). The memory subsystem with its static 1:1 multiplexing is already maxed out in 40-character mode. In 80-character mode, the CGA opportunistically also uses the cycles assigned to the processor, converting the 1:1 multiplexing into a 2:0 multiplexing. It can't make the processor wait until the end of the scanline, because the active display period of a CGA scanline is around 45 µs, and you can't add wait states for such a long time. The IBM PC runs memory refresh cycles that need to happen every 15.6 µs over the same bus, so wait states must not exceed 10 µs or memory refresh will fail. As I already hinted, the CGA uses the processor cycles "opportunistically", i.e. only if the processor doesn't access the RAM. If it does access the RAM, the CGA performs the processor cycle instead of the graphics card cycle, but still uses the data that was transferred during this cycle as if the graphics card cycle had happened. This causes the well-known "snow" on CGA in high-resolution text mode. - To meet the performance required by 16-color 640-pixel graphics, IBM upped the 8-bit RAM interface of the CGA card to 32 bits on the EGA card. At the same time, the 40-character character time is no longer split into four memory cycles, but into five memory cycles. The multiplexing can be configured as 2:3 (2 cycles for the EGA, 3 cycles for the bus) or 4:1. In text mode, only one character/attribute pair is loaded in a 32-bit cycle due to how the memory is organized, so EGA is on par with the MDA here, which uses all 16 bits of its 16-bit cycle.
Text mode needs a second RAM cycle because EGA has the font in the display RAM. So high-res text mode (80 characters) needs to use the 4:1 multiplexing, while all other modes get away with 2:3 multiplexing. Display can be disabled to get a 0:5 multiplexing to the processor for increased performance. - VGA is still the same, but the increased video clocks now generate a character clock of 1.57 MHz for the 40-character character clock. In 4:1 multiplexing, the theoretical maximum rate would be 1.57 MHz, while in practice mainboards add enough wait states to 8-bit cycles for compatibility reasons to drop the rate to half of this, i.e. 780 KB/s. The advantage of the 2:3 multiplexing is thus rooted in the lower latency, but total throughput is likely only slightly higher. - MCGA got the 256-color mode to work with 8-bit video RAM instead of 32-bit video RAM (which has its rate maxed out on the VGA), because IBM decided to use VRAMs, a special kind of dual-ported RAM, in this display solution. They could get rid of time-division multiplexing that way.
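The 40-column CGA arithmetic earlier in this comment can be reproduced with a few lines of Python. It is a back-of-envelope sketch only: the clock figures are the rounded ones quoted in the comment, and the "CPU can only use every Nth slot" model is a simplification that ignores instruction fetches and refresh. The function names are hypothetical.

```python
import math


def char_period_ns(dot_clock_hz, pixels_per_char=8):
    """Time for one character cell, in nanoseconds."""
    return 1e9 * pixels_per_char / dot_clock_hz


def effective_access_period_ns(slot_period_ns, bus_cycle_ns):
    """Time between CPU accesses if it can only start on a CPU time slot.

    A bus cycle longer than one slot period makes the CPU miss slots, so
    it lands on every ceil(bus_cycle / slot_period)-th opportunity.
    """
    slots_per_access = math.ceil(bus_cycle_ns / slot_period_ns)
    return slots_per_access * slot_period_ns


period = char_period_ns(7.16e6)   # about 1117 ns per character
mem_cycle = period / 4            # about 279 ns: 2 video + 2 CPU cycles
bus_cycle = 4 * 1e9 / 4.77e6      # about 839 ns: 4 clocks at 4.77 MHz

# Using the rounded 540 ns slot period and 838 ns bus cycle from the comment:
print(effective_access_period_ns(540, 838))
```

With the comment's rounded figures this gives one access per 1080 ns, i.e. the CPU misses every other slot, matching the roughly 925 kHz ceiling quoted above.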
Obviously this was implementation specific. I had a 3rd party MDA card that would screen fuzz if you wrote too fast. I remember the VGA had its memory broken up into 4 banks; this was usually hidden from you, but you could flip some registers and take advantage of it. Sequential bytes were in different banks, but that was usually hidden from you by the memory controller. In 256 colour mode you could only have 320x200 officially, but flip a few bits and you could get 320x400 on a standard VGA card. But to access the memory you would have to set the mask that controlled the bank access differently for each vertical column of pixels. I used that mode for Swiv 3D. As a programmer I wasn't thinking in physical implementation terms back then, but now it's clear the 4 banks were for parallelising the read.
@@weirdboyjim Your 3rd-party MDA clone is likely an early cost-optimized clone. The later MDA/Hercules clones get away with 8-bit memory (remember the original MDA had 16-bit memory) without flicker, likely using faster RAM than the original MDA. A fuzzy MDA clone likely uses 8-bit RAM as the CGA card does, but isn't running the RAM fast enough to do proper 1:1 multiplexing. The 4 banks of the VGA are more commonly called planes. While it was hidden in 256-color mode that there are four planes, the EGA/VGA memory organization was very apparent in 16-color modes. VGA memory is organized as 64K x 32 bits, whereas the bus view has 64K x 8 bits. One of the most simple ways to access the 32-bit memory is to just pick one out of four 8-bit chunks that make up the whole 32 bits, and that's where the plane stuff comes from. You could also write to multiple planes simultaneously (using the "map mask register" enabling multiple planes. WTH didn't they call it plane mask register?), or do a 16-color mode color compare read instead of reading a single plane. The 256 color mode hack you talk about is most widely known as "Mode X" or "unchained mode". The idea of the standard 256-color mode is that four subsequent bytes in processor space map to the same address in VGA memory space, so the lowest two address bits are no longer used as address bit, but as plane select bits instead. Because it *chains* the *four* planes into one virtual plane, the mode is called "chain-4". 256-color mode is dependent on the four neighbouring pixels being stored in the same 32-bit word of VGA memory to, exactly as you say, read four pixels in parallel. In Mode X, you disable reinterpretation of the two lowest bits as plane select bits, so you can address all VGA memory addresses, and not just every fourth address. Then, you disable "doubleword mode" on scanout, which causes the card to only read every fourth address in VGA memory. 
This makes the whole 256K accessible, at the cost that you, as the programmer, have to fiddle around with the planes and make every byte hit the correct plane, a task the VGA card did for you in chain-4 mode. The interesting thing about chain-4 mode is that due to the presentation of the 16K x 32 of VGA memory that is used (48K x 32 is unused, as only every fourth word is used) as 64K x 8, the programmer doesn't have to interoperate with the VGA card for drawing operations. The MCGA card I already mentioned at the end of my previous comment chose a completely different hardware way to provide a 64K x 8 video buffer (by using two 64K x 4 VRAM chips), but the 256-color mode of the MCGA and the non-hacked 256-color mode of the VGA behaved identically. A further cool thing about chain-4 is that chain-4 modes allow 16-bit or 32-bit accesses to video memory by just transferring more than one plane at once, without any architecture change. Some early 16-bit ISA VGA cards support 16-bit memory cycles only in chain-4 modes, or the odd/even mode (something like a chain-2 mode used for CGA compatibility), but fall back to 8-bit only if no chaining is enabled. The fact that MCGA could downgrade from a 32-bit data bus to an 8-bit data bus while delivering the same video performance as VGA shows that VRAM, a special kind of "nearly dual-ported" RAM, is a very efficient way to implement video cards. I still consider your choice to not go dual ported in your project a valid choice. If you go VRAM, you could also use an integrated CRTC (like the ubiquitous 6845) instead of your army of counters and the lookup EEPROM. But that's the level of integration your project tries to avoid to show the basics. Cheers for keeping true to that idea! BTW: I call VRAM "nearly dual-ported" because you need to issue transfer cycles to the shift register for the secondary port through the primary port, which is usually used for CPU access.
You can't do anything sensible with the secondary port alone unless you have some control to the first port, too.
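The chain-4 versus Mode X addressing described in this comment can be written down as two small mapping functions. This Python sketch follows the description given above (low two CPU address bits select the plane in chain-4, and only every fourth VGA address gets used); the function names are hypothetical and the 320-pixel width is just the usual 256-colour case.

```python
def chain4(cpu_addr):
    """Chain-4: the two lowest CPU address bits pick the plane.

    Four consecutive CPU bytes share one VGA memory address, so only
    every fourth VGA address is ever touched.
    """
    plane = cpu_addr & 3
    vga_addr = cpu_addr & ~3 & 0xFFFF
    return plane, vga_addr


def mode_x(x, y, width=320):
    """Unchained (Mode X): the programmer picks the plane via the map mask.

    Pixel x lives in plane x mod 4; each plane holds width/4 bytes per row,
    and every VGA address is usable.
    """
    plane = x & 3
    vga_addr = y * (width // 4) + (x >> 2)
    return plane, vga_addr


print(chain4(5))           # CPU byte 5 in standard 256-colour mode
print(mode_x(5, 1))        # pixel (5, 1) in unchained mode
print(mode_x(319, 399))    # last pixel of 320x400 still fits in 64K per plane
```

In chain-4 the card computes the plane for you; in Mode X the `plane` value is what you would have to program into the map mask before each write, which is the per-column fiddling mentioned in the reply above.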
Did you consider using 41264 DRAMs? They are a type of memory that was used on VGA controllers for PCs in the 80s & 90s. One port is read/write like standard 41256 DRAMs. The other is a high speed serial-like port that is read only for the display output. Might be worth a look, although it would add some logic for accessing DRAM.
I have no idea on the pricing but it looks like a good solution that I've seen used in older early-90s computers too, like the Acorn Archimedes. Haven't dug into the data sheets but it might use free CPU cycles to fill up the FIFO, maybe using parallel banks for speed.
I had a couple of reasons to not look at those, firstly they are not really an active part. You usually end up buying them on the second hand market and I wanted to avoid supply problems. The other thing is that they are great for solving one specific problem where the second port requires just linear access. However some of the features I want to build into the cpu require more random access on the vga side and I didn't want to include multiple systems.
@@reinoud6377 I looked at the data sheet and it seems that the serial FIFO uses a 256 bit wide parallel load mechanism so as to minimize the potential for blocking the CPU.
12:25 depends on where you buy! when i used one of those exact chips in my first crappy card i bought it used for a fraction of the retail price. the downsides are the size of the chip and the speed. if you need more than ~16kB of VRAM you're gonna need a really big PCB, and at 55ns access time you won't be able to get anything that requires high bandwidth
@@weirdboyjim yea that's reasonable. i feel like i should stop commenting on these older videos and just watch the whole series to maybe get some more ideas for my own Video Card design.
Fantastic work, as always. Now that the clocks are independent, are you considering increasing the CPU's clock speed to get more done in the blanking intervals? If you are, and you go above ~4.7MHz, I'm going to have to find more performance in my emulator, so I selfishly hope the answer is 'no'. :) I was wondering how you'd handle access to memory, and this wasn't something I'd considered. I was thinking of splitting the frame buffer in half on separate chips that could be enabled independently. So top half of the screen on one, bottom half on the other. But there's probably some issue with that I haven't considered. Also, please don't apologise for the final demo; a well done surprise like that is always welcome!
Thanks Quxxy, I'll do some tests on frequency at some point, but I think I'm faster already than I really need so it's not a high priority. I calculated a while back that 4MHz was the peak with parts running inside their specification so I'm not sure going beyond that is a great idea.
Something's not adding up for me. You said in the intro video that you wanted 640x480 with 24-bit color. That works out to 900KB just for the color data, which is many times more than your computer's entire address space. Have you reduced your requirements? Are you planning on moving away from memory-mapped video later? Maybe use bank switching? Or have I completely missed something?
Maybe I could have been clearer in the goals. 640x480 with 24-bit color as a flat bitmap is a massive amount of memory as you correctly ascertain. My goal is that I output 640x480 pixels so I'm not running at a reduced resolution, but I'll be using a combination of tiles, sprites and a palette to both make the memory manageable and the update rate faster.
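For anyone who wants the numbers behind that exchange, here is a rough sketch of the arithmetic. The 8x8 tile size, the one-byte tile index and the 256-entry palette are assumptions picked for illustration, not confirmed details of the build.

```python
# Rough memory arithmetic for a 640x480 display.
# Assumptions (not from the build): 8x8 tiles, 1-byte tile index, 256-entry palette.
WIDTH, HEIGHT = 640, 480


def flat_bitmap_bytes(width, height, bits_per_pixel):
    """Bytes needed to store every pixel directly."""
    return width * height * bits_per_pixel // 8


def tile_map_bytes(width, height, tile_size=8, bytes_per_entry=1):
    """Bytes needed to store one tile index per tile-sized cell."""
    return (width // tile_size) * (height // tile_size) * bytes_per_entry


flat_24 = flat_bitmap_bytes(WIDTH, HEIGHT, 24)  # 921600 bytes, i.e. 900 KB
flat_8 = flat_bitmap_bytes(WIDTH, HEIGHT, 8)    # 307200 bytes, i.e. 300 KB
tile_map = tile_map_bytes(WIDTH, HEIGHT)        # 80 x 60 cells = 4800 bytes
palette = 256 * 3                               # 768 bytes of 24-bit colour entries

print(flat_24, flat_8, tile_map, palette)
```

Under those assumptions the tile map and palette together are a few kilobytes, which is why the indirect approach fits in a 64K address space where a flat bitmap cannot.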
James, I do believe I have a couple rails of short (1K or 2K long) byte wide IDT FIFOs if you would like some chips. Also, are you considering a RAMDAC such as the Brooktree Bt476 or Bt471? You can even find the Bt476 in 28 pin DIP and it's got everything you need to add a 256 entry palette with 24 bit colors.
I've looked at a few FIFO chips, I haven't planned out that circuit in detail yet but I'll probably take a fresh look at what is available. I made a pure logic fifo for the uart but that wouldn't be practical at scale just in terms of part count.
Nice video! I was only wondering why you have not considered using a double buffer technique, where you still have two RAM chips (one front buffer and one back buffer). The VGA would only read from the front buffer and the CPU would read and write to the back buffer. Once the CPU finishes drawing a screen it would request a buffer swap which the VGA could do in the blanking region. That way there is no timing problem and no problem of handling the write signal. I think you did something similar with the swappable registers in the CPU.
I've answered that question so many times now. I can only describe so many techniques in the videos, but I wish I had explicitly covered that one (I regarded it as derivative). Double buffering in the way you describe takes more hardware than most people seem to think; you pretty much double the circuit I built. Furthermore, 8-bit games get decent update rates by doing incremental updates on the screen, which double buffering makes more complex.
Success! ... Oh, thanks for not, uh, giving up on us. 🤣 Hey, I was excited by the Hybrid ++ idea. Sorry, I'm new and trying to get to speed on this project. So, I can't quite get my head around how you'd implement a FIFO queue with low-level logic. Would this be a cascading array of buffers or a small ram chip with a bit of clever counter logic to push and pop?
There are a bunch of ways you can do it. An array of buffers would probably be too much for this, I've calculated 40 entries is the peak you need so that would be 120 8-bit chips. Ram chips are an option but I'd have to solve the read/write multiplexing issue for those as well. There are also hardware FIFO chips that solve this in one go, I might be more willing to use those to demonstrate this as a side project as the fifo implementation isn't the focus. If you are interested, I did a raw logic fifo implementation for the UART - th-cam.com/video/1766wc7rCNg/w-d-xo.html
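The "small ram chip with a bit of clever counter logic" option from the question can be sketched in software as a ring buffer. This is only a behavioural model under assumptions of my own: the class and method names are made up, each entry is an (address, data) pair, and the depth is a power of two so the counters wrap for free. It is not the UART FIFO from the linked video.

```python
class WriteFifo:
    """Ring-buffer FIFO: one small RAM plus a write counter and a read counter."""

    def __init__(self, depth=64):
        # A power-of-two depth lets the counters wrap without extra logic.
        if depth <= 0 or depth & (depth - 1):
            raise ValueError("depth must be a power of two")
        self.depth = depth
        self.ram = [None] * depth
        # Counters carry one extra bit so "full" and "empty" can be told apart.
        self.write_count = 0
        self.read_count = 0

    def __len__(self):
        return (self.write_count - self.read_count) % (2 * self.depth)

    def is_empty(self):
        return len(self) == 0

    def is_full(self):
        return len(self) == self.depth

    def push(self, address, data):
        """Queue one pending write; returns False if the FIFO is full."""
        if self.is_full():
            return False
        self.ram[self.write_count % self.depth] = (address, data)
        self.write_count = (self.write_count + 1) % (2 * self.depth)
        return True

    def pop(self):
        """Remove the oldest pending write, or return None if empty."""
        if self.is_empty():
            return None
        entry = self.ram[self.read_count % self.depth]
        self.read_count = (self.read_count + 1) % (2 * self.depth)
        return entry
```

In hardware terms `push` would be driven by the CPU write strobe and `pop` by whatever logic owns the video RAM during blanking; the multiplexing of the RAM address between the two counters is the part the reply above says still has to be solved.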
That was really impressive! And on the technical side even the rick roll at the end xD So when you take the graphics card onto a PCB, I would like to make the video rom as a separate module, so you can make a separate board for the actual circuitry, the fifo buffer and the existing two port rom. That allows you to not cheat, make a nice improvement later but also have the highest possible cpu speed! So if I were in your situation and wanted to play and develop some games, I would hate myself if I had not used the "super cpu speed option"
@@weirdboyjim Thx for your answer, I can't wait to see all the PCB layout process and the finished PCBs! Maybe the next big project will be a led monitor from scratch xD
20:30 choose to ignore the contention to update the entire screen at the start of a level: You just need to add a master ENABLE signal, so you can turn off the output (replace the RAM-DAC output with a constant value) then do your update, then re-enable the normal operation. But, if you are going to double the RAM and have two identical copies, you are *so close* to having a proper double-buffer system! Instead of writing to both, just have complementary enable signals on the bus drivers on each side, so by changing a bit you can exchange which RAM chip is connected to the CPU and which is connected to the VGA. So, you write your updated frame and then output the signal to swap.
I'll likely have a blanking control, but a true double buffering of the type you describe is more complex than that. You basically end up doubling all the multiplexer circuitry which is what I wanted to avoid. I certainly wouldn't want to do that on a breadboard.
You've probably been told this but I think It would be cool to see you and Ben Eater get together on something. A Friendly competition, a large scale project where each of you produce interconnecting "modules" that work together, I'm sure you two could come up with far better ideas lol but even just different ways to achieve the same results would be interesting.
I'd love to, but let's be realistic. I'm a tiny channel compared to Ben so it will be of less first glance interest to regular viewers of his than regular viewers of mine.
We'll see, I want to make a bit more progress before I distract myself but there might be a nice gap to play while I'm waiting for some PCBs to be made.
Pretty cool solution. I really like it. I also guess it is in the spirit of the build. But I wonder if you considered doing double or triple buffering: 2. Two or three RAMs. One is read by the GPU, the next one is written by the CPU. On horizontal sync, and if a new buffer was finished by the CPU (controlled by the CPU), they switch / progress. If the CPU didn't write a new buffer, the current buffer used by the GPU is not changed, and continues being current. 1. Shadow RAM (writes from the CPU go there too and reads from the CPU are done from it). If you do not need to read data back from VGA, you can do with just doubling of RAM, but I would go with triple buffering. I think the FIFO approach you mention would be very hard to implement, but the double buffering should actually be quite easy to do, and make the software side easier too. I know double buffering was not really used commonly in the 8 and 16-bit era, but it is worth consideration. Oh. I found you addressed this in the next video. :)
That kind of double buffering with independent bus driving adds quite a lot of chips, basically doubles the circuit. My plan was to do most stuff with scrolling etc so it was unnecessary. My next build will have true double buffering.
I know you've already finished, but the alternative I thought of was using some kind of crossover switch, so that one RAM is filled up while the other is being read out.
That is a possibility, but hardware double buffering like that has its own drawbacks. Most 8-bit games updated the screen incrementally rather than redrawing it all each frame, with double buffering you are updating the image from 2 frames ago and it gets more complex.
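The "2 frames ago" point is easy to miss, so here is a toy model of a two-RAM swap. Everything in it is hypothetical (the class name, a single select bit standing in for the complementary bus enables); it only shows which data each side sees after a swap.

```python
class DoubleBuffer:
    """Two RAMs and one select bit: CPU sees one RAM, VGA sees the other."""

    def __init__(self, size):
        self.rams = [bytearray(size), bytearray(size)]
        self.select = 0  # index of the RAM currently connected to the CPU

    def cpu_write(self, address, value):
        self.rams[self.select][address] = value

    def cpu_read(self, address):
        return self.rams[self.select][address]

    def vga_read(self, address):
        # The VGA side always gets the RAM the CPU is NOT connected to.
        return self.rams[self.select ^ 1][address]

    def swap(self):
        # Flipping one bit exchanges the roles of the two RAMs.
        self.select ^= 1


buffers = DoubleBuffer(16)
buffers.cpu_write(0, 1)        # draw frame 1
buffers.swap()                 # frame 1 goes on screen
stale = buffers.cpu_read(0)    # CPU now holds the buffer from before frame 1
buffers.cpu_write(0, 2)        # draw frame 2
buffers.swap()                 # frame 2 goes on screen
older = buffers.cpu_read(0)    # CPU is back on the frame 1 buffer
```

When the CPU starts drawing frame 3 it is looking at frame 1's pixels, not frame 2's, so an incremental update has to replay the changes from both frames.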
Best implementation would need a Vblank interrupt, but that needs an interrupt controller. The most naive way of implementing it that comes to mind would be to have just NMI. When NMI is asserted, allow the pipeline to empty, then push everything to the stack, next go to a hardcoded address (interrupt vector) that contains a pointer to the ISR, run that and return from the branch by popping stuff from the stack. Then if you need more interrupts you can add more vectors and a priority encoder 😁
I talk about what would be needed for interrupts in the design retrospective video. "Best" is a complicated term. Neither interrupts nor the fifo that everyone seems keen on will actually change a single pixel of my final big demos, they would just make some things easier at the cost of extra circuitry.
@@weirdboyjim well yes, 'best' is relative 😅 I was thinking from programming standpoint, you do stuff that can be done whenever on the main loop, wait for Vblank to blast data into frame memory, maybe update audio at the same time. For a 'simple' system focused on video and audio using Vblank for timing stuff is 'neat' 😉 but yes, interrupt control logic would be complicated.
@@akkudakkupl The way we did it in the 1980s was to simply poll the VSync line in a loop until the blanking started. No interrupts necessary and you never had to worry about being stuck in the loop as another sync is always coming.
James, I’ve been rolling this around in my head since you posted this video, and I think I’ve got an idea that might be worth trying. Since the issue is that the video RAM needs to provide a byte on every pixel clock, that effectively shuts out the CPU except on retrace intervals. What if you doubled the video RAM width to 16 bits (two RAM chips), so that the video would only need data every other clock? You could use address bit zero from the CPU to drive a MUX to select which of the pair of video RAM chips is accessed. That would allow the 50/50 access for the CPU again. You would have to implement high byte/low byte select on each pixel clock, but that doesn’t seem difficult. The only thing in the way of this, I think, is a “stall” capability for the pipeline stage zero and the memory bridge... if the CPU tries to access the video RAM on a cycle where the VGA adapter is reading from it, then that cycle needs to be held for one beat while the video access completes. Simpler than a FIFO, I think... of course, if you have a better plan, then by all means go for it.
There are a large number of ways you can solve this contention and it wasn't possible for me to talk about all of them. Increasing the data bus width in the way you describe is very much aligned with the time division approach but you create the fetch slack with width rather than increased frequency. Either way you end up with a regular time interval where you can switch device. But you would then have twice the bus width to deal with for that switch, memory read and writes to the cpu are currently 8 bit so you need logic to deal with the bus width differently. What you propose is a big step up in component count and complexity from where my circuit is, and that circuit already handles everything I need it to.
@@weirdboyjim fair enough. Just an idea. However, I wasn’t suggesting to change the CPU main memory to 16 bits, just the portion of RAM inside the video adapter. To the CPU, it would still be 8 bits wide with the high/low byte steering logic. I do understand keeping it simple for breadboarding. Maybe I can try something like I’m thinking in my own lab and get back to you.
I’m surprised you didn’t implement a back buffer for this purpose. They work by having V-RAM that’s read/write accessible to the CPU and one that’s read accessible to the rendering hardware. Every vertical blanking interval, the 2 pieces of RAM switch roles, allowing the CPU to write the next frame to what was previously the frame buffer while the previous back buffer is being read. Such a setup could greatly simplify graphical software while eliminating artifacts like tearing and corruption entirely.
Double buffering was an option but I couldn't discuss everything in the time. I had good reasons for disregarding it though, it's more circuitry than perhaps you think and in many of my use cases it becomes an impediment. Filling the screen for every frame is asking a lot of an 8-bit cpu, what you want to be doing is carefully updating only the bits that need it and pulling trickery to appear like you are doing more. Double buffering complicates that further.
Hi James, I really like your video series of the design and build process of this diy computer! One question came to mind though, as you already realized that 25 MHz is already on the edge of reliably working on breadboards and you have to use 2 ram chips instead. The obvious solution would be to use a double wide memory data bus for the video, halving the frequency you need to read memory for vga out and effectively enabling time sharing access within the timing limits. And other systems did this very early on. The C64 had a 12 bit video memory, the 4 bit color ram was read in parallel to the main ram for bitmap data, the nec uPD 7220 had a 16 bit video data bus, designed in the late 70s and used on larger Z80 systems.
Indeed there are far more ways of solving this problem than I had time to discuss in the video. Some of the features I want to implement later wouldn't work with a parallel read strategy so I disregarded it as an option to avoid solving the same issue 2 different ways in the build.
I was wondering. It's common practise to use double buffering in software, render a frame and then display it while rendering the next. Could this be applied at a hardware level by effectively switching between a pair of memory chips so that the CPU has uncontended read/write? I appreciate this means a 1 frame delay, but it ought to mean there's no possibility of corrupting due to writes. Of course it does mean that when you swap buffers any reads from the CPU are looking at two frames ago...
Another great video and I love the RR at the end. You will be able to do so much more with a VGA output than just a UART & a terminal emulator. My VGA works almost the same way that you implemented, with RAM multiplexing. I only go up to 160x120 though. I run the CPU at 1/8th VGA speed (approx 3.15 MHz) and don't get any glitches. Synchronising the GPU and CPU simplifies things greatly and you can fire pixels at it at full speed without waiting for blanking or the VGA. I also use a FIFO for input but it's a single level FIFO (i.e. one set of latches for address and data). Can't you use your boot loader to load data straight into the Video memory space over UART, without writing any separate code?
Thanks David. I'll be using early game console style trickery to get a higher visual resolution rather than increasing the frame buffer size. Updating a big frame buffer is a big ask for an 8-bit cpu.
I know I’m late to this party, but why not double buffer? CPU has exclusive access to one buffer, VGA to the other. When CPU is ready, flip the access (ideally during vertical blank), so that VGA is accessing first buffer, CPU second.
If you are building your own circuit then it may well be a legitimate choice. It's not however a "duh, why didn't I think about that", you would need twice the memory chips on the vga circuit, twice the interfacing logic and some additional selection and control logic. It would be a notable improvement to the access simplicity but with significant additional complexity.
I wonder if you could just double the ram and have the cpu write to one while the vga reads from the other, then swap the next frame. Sort of like the technique you used with the virtual registers
Could you have a hardware counter to drive the address lines at the maximum ram clock for a software-configurable range (of lines or pixels) of the video ram? If so, you could have the cpu only read+write from its main memory then have the main memory set to read and the gpu memory to write when the counter is running. The cpu would prepare the picture at any time it likes and then run the counter (which halts the cpu) during vblank. This could even be async, the cpu primes the counter and the counter starts automatically with vblank (or the last hblank to gain some extra time). With some extra logic, the counter could also do line unpacking into gpu ram to save main memory. ok, ok, I _am_ suggesting DMA, no way to deny it...
There are a vast number of ways of solving this problem, I was only able to discuss a few on the video. I was going for a good balance between complexity and features but there will always be other ways more appropriate for other use cases.
Idea: It would be good if it were possible to write text to the screen. This is atm. not possible because of the 8x8 tiles. But what if you add a spritebuffer that initial contains the graphic of the ascii characters and a screenbuffer which contains which sprite should be shown on which place. Then you have a good ascii-screen (with little ram) but the programmer (you) can overwrite ascii-char-sprites that are not be used with other graphics. So it is also possible to use graphics on the screen without additional chips (except 1 more ram for the sprites)
How about using 2 memory chips, one in main memory and the other a small frame buffer, and connecting the small memory to main memory only during blanking so it copies the graphical memory area? You would need some kind of address translator, a ROM or PLA, which would address reads and writes on the 2 memory chips and just page over the data. Or you could use the blanking signal as an interrupt that would force the CPU to update graphical memory as a subroutine and never do it during normal program operation, eliminating the PLA element or halting the CPU while transferring data altogether. It could even disconnect the frame buffer memory from the main bus when it is done copying the frame. I may be crazy and stupid, but that was my first thought when you started explaining the possibilities of memory interfacing between 2 devices.
There are far more ways of doing this than I can cover in one video. What you suggest has a limitation in that there are more visible pixels than blanking clocks, so you would need ram that could be run faster and in that case it would be easier to just implement the time division method (like the c64 and BBC micro chose to).
I think you can dual-port the ram if you use input and output buffers for address and data and then run it at twice the pixel clock. Dump what you want into the buffers and clock data whichever way. EDIT and you just discussed it in the video as multiplexing 😉
I do extend things, and the project is still in progress but "Dedicated Video Memory" could mean different things so I'm not sure how to answer. This is Ram dedicated to video.
I know I'm way too late to actually suggest anything but why didn't you go with a double framebuffer approach? CPU accesses one RAM chip, reads and writes whenever it pleases, while VGA reads the other chip. When the frame is updated, swap chips (could even do so without waiting for blanking if you're trying to optimize for quick/low-latency updates, or just swap buffers during v-blank if you want a clean image without tearing). I believe it would be similar (though not exactly the same) as the dual program counter setup for the CPU that you use for function calls.
Weird, 2 comments in quick succession on an old video with a very similar sentiment. Here was my other reply - If you are building your own circuit then it may well be a legitimate choice. It's not however a "duh, why didn't I think about that", you would need twice the memory chips on the vga circuit, twice the interfacing logic and some additional selection and control logic. It would be a notable improvement to the access simplicity but with significant additional complexity.
You could use FPGA in schematic mode to bypass lot of work, when you get it working, replicate with chips. No need to know any VHDL, just draw as you draw :)
Just pause for a minute and look at that mess of PCBs, wires and breadboards evolving on my desk and ask yourself "Is this a man inclined to take shortcuts?".
@@weirdboyjim To be fair, I think everything in the 80s other than the ZX80 & 81 used custom chips for video (PGAs). But building the logic from discrete chips of course keeps the design accessible to those of us struggling to understand what's happening. I've never programmed an FPGA, but I imagine the logic design that goes into them won't be significantly different from using discrete chips. And you're on the slippery slope by using an EEPROM for timing rather than building from logic gates 😁😉
I really enjoy your content (I have been awaiting this video with bated breath and am looking forward to your sprite logic) but I simply can't resist a bit of gentle mockery. To wit: "Dual" not "Duel" ;D
@@weirdboyjim Please don't take offence. I meant it purely in good fun and I'm sorry if it didn't come across that way. I really stand in awe of your accomplishment.
I'm thinking about starting the pcb conversion of the vga fairly early, just to limit the "peak breadboard" issue, I'm low on desk space as you can see.
Nice, I'm interested in seeing that soon! I dislike cluttered workspaces as well. But it was cool to see how quick you spotted the mistake on the breadboards that caused the horizontal line!
@@weirdboyjim what resolution would you settle on for games? Maybe 160x120x8 would be a nice compromise, or pack 2 pixels per byte and do 320x240x4? That's 38,400 bytes though...
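A quick sketch of those two options, checking the 38,400 figure and showing what "2 pixels per byte" means in practice. Putting the left pixel in the high nibble is an arbitrary assumption made here; real hardware could equally use the other order.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Size of a flat frame buffer in bytes."""
    return width * height * bits_per_pixel // 8


def pack_pixels(left, right):
    """Pack two 4-bit pixels into one byte (left pixel in the high nibble)."""
    return ((left & 0x0F) << 4) | (right & 0x0F)


def unpack_pixels(byte):
    """Split one byte back into its (left, right) 4-bit pixels."""
    return (byte >> 4) & 0x0F, byte & 0x0F


size_160x120x8 = framebuffer_bytes(160, 120, 8)  # 19200 bytes
size_320x240x4 = framebuffer_bytes(320, 240, 4)  # 38400 bytes
packed = pack_pixels(0xA, 0x3)                   # 0xA3
```

The cost of the packed layout is that changing a single pixel becomes a read-modify-write of the byte, which matters if reads from video memory are slow or unavailable.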
Great video as always, love it. But I don't really get the advantage of the hybrid dual RAM solution. The impact on CPU performance should be the same if not higher as with the Bus Request approach except for reading. But reading from the frame buffer should basically never be necessary. What am I missing?
I think you have misunderstood, the bus request actually stops the processor so for most of the frame the CPU can't do anything. With this hybrid I don't lose anything, I just need to add some feedback timing if I want to avoid the snow on the screen. The cpu reading from video memory is actually quite common but if you didn't need that functionality you could indeed drop the shadow copy.
When he's finished, there's going to be approximately 300K bytes of memory assuming 8 bits of color data per pixel. If he goes to 24 bits of color, call it close to a megabyte. Now his main computer only has 64K of memory. Now he has two main options for programming. 1. Keep track of everything on screen in his main memory, updating the video memory as required. 2. Check the video memory to see what's being displayed, and update as needed. Given the huge difference in the amount of each type of memory, it would be better to be able to examine the video memory directly. Also given how he's using those counters, suspect he's gonna have 512K of memory of which about 200K won't be accessed by the VGA system, but will be accessible by the CPU.
@@weirdboyjim Thank you for your fast response. I assume the CPU would be stopped until the next blanking period just if it tries to read/write from the frame buffer outside of the blanking period. If the timing is good there should be no big performance penalty. With the hybrid approach, assuming image defects are not allowed, the CPU is also just allowed to write to the buffer during the blanking period. In my understanding both approaches should have similar performance. Also, I could be wrong. Thank you for your videos and keep up the good work.
I have a Multitech CGA card which somehow solves the RAM contention issue with 8 bit era technology. It uses two HM6264LP-15 chips for a total of 16kb of memory, which is what CGA has. I have no idea how it manages to avoid "snow". I see from the data sheet that there are two write cycles, but I'm not sure exactly what that is about.
There is no "one true" way to solve this issue, but I know some cga cards use the time division approach. Their frame buffers were organized such that either framebuffer or character data was read into a shift register for use so the level of contention was much lower.
@@weirdboyjim Interesting. I had wondered whether they might just be implementing a near perfect solution, rather than a perfect solution. There are a huge number of 74LS logic chips on there (even more than IBM's CGA), so it's doing something. But I didn't find time to trace the circuit to see what that is all for.
Let's be clear, it was considered but rejected. It would have made the doomed demo easier though but for the most part 8-bit era games used scrolling and incremental update as the means to get games updating quickly. Double buffering would have been a more complex circuit (If you want to separate bus access) and is a real pain for anything where you are not going to redraw the entire screen. I have plans for future builds which will include double buffering though, just not this first one.
5:00 rounding up the row and col count to a power of 2: My immediate reaction is "why not _display_ 1024 horizontally?" 1024 by 768 is the full resolution VGA. I think real cards would not have any trouble having the rows one after another, as they maintained a counter that counted up each pixel rather than having separate counters for row and column.
"full vga" is a fairly arbitrary statement these days although technically everything over 640x480 is svga (or one of the other prefixes). 1024x768 needs a 75MHz pixel clock, no hope of that on a breadboard and I'd need a much more complex parallel memory access system to pull the data out fast enough. There isn't a single "real card" way of doing this, if you use a single counter you need a more complex circuit to handle replication of lines if anything isn't run at full resolution. Display systems designed for gaming tended to lean more towards dual counters for reasons that will become obvious after the next vga video.
If you insist on letter jumbles, that's not VGA but XGA. IBM VGA uses 720x400 (text mode with 9 pixels wide characters), 640x480 (which became known as VGA) and 640x400 (allowing it to output CGA-like 640x200 and 320x200). IBM 8514 used 1024x768. Plenty of "super" VGA-type boards used much higher resolutions as well, with 1600x1200 being quite common. 1024x768 is pleasingly round, though. Potential downsides include things like requiring wider pixel counters (the blanking interval must be in there somewhere) and faster pixel clocks (e.g. 65MHz rather than 25.2MHz) or interlacing (still demanding 44.9MHz).
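The pixel clock figures follow from total pixels per line times total lines per frame times refresh rate, where the totals include blanking. A small sketch of that calculation; the totals used (800x525 for 640x480 and 1344x806 for 1024x768 at 60Hz) are the commonly published timing figures and are supplied here, not taken from this thread.

```python
def pixel_clock_hz(h_total, v_total, refresh_hz):
    """Pixel clock = clocks per line x lines per frame x frames per second."""
    return h_total * v_total * refresh_hz


# 640x480 @ ~60Hz: 800 clocks per line, 525 lines per frame (blanking included).
vga_clock = pixel_clock_hz(800, 525, 59.94)

# 1024x768 @ 60Hz: 1344 clocks per line, 806 lines per frame (blanking included).
xga_clock = pixel_clock_hz(1344, 806, 60)

print(f"640x480:  {vga_clock / 1e6:.2f} MHz")
print(f"1024x768: {xga_clock / 1e6:.2f} MHz")
```

The 75MHz figure mentioned earlier corresponds to running 1024x768 at a higher refresh rate than 60Hz; either way it is well beyond breadboard territory.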
If you ever do need fast RAM, look into SDRAMs. They’re cheaper than SRAMs, very fast, and also quite large. Also they often have parallel outputs. They do need a lot of peripheral components though, for continually refreshing the DRAM.
I'm not actually worried about faster ram, compared to what people had in the 8-bit era we can get very fast chips now. I'm more interested in using things efficiently. Adding dram refresh logic on the breadboard feels like a distraction for this project.
@@weirdboyjim yeah I agree it isn’t a good fit for this project. I was just looking about at RAM solutions for audio DSP when I discovered how cheap SDRAMs could be. At least until I realised the ESP32 had 520kB internally. Bought one anyway, maybe I’ll use it for packet radio or some other very fast sampling situation. I guess a CPU communicating to RAM that spits out its contents as radio is somewhat similar to this project in that dual port would be handy. At least it would be if the radio transmission needed to be in synch with something, which you never know with radio protocols.
In a VGA system I am working on for a homebrew computer, I am using two memories like you, but in a double-buffer-like arrangement, where one memory is read by the screen hardware and the other is attached to the CPU. When you flip the buffer they switch places. And I was thinking of making it so that on the first frame after a flip the screen one is copied into the CPU-accessible one.
There are lots of different ways of doing it. No one way is perfect. When you say "copied into" are you talking about a byte by byte copy? How long does that take? The obvious way would be while the front buffer is being read out and so it would presumably take almost 16ms?
@@weirdboyjim yea, it would copy the frame buffer byte by byte. No idea how long but i would assume around 1/60th of a second since i was just going to use the next frame to copy it. Still in the planning stages honestly.
It was surprising to use two RAM chips just for the sake of allowing the CPU to read back video data. Wouldn't it make more sense to use one RAM chip for storing data from the CPU while the other one is being read by the VGA circuit? If we could allow the CPU to swap between RAM chips, then it could freely read and write from the current RAM chip without worrying about what the VGA circuit is doing to the other RAM chip.
You have to look at the decision in context. I didn't add two ram chips, I added one for the video side and used a portion of the existing ram I already had to shadow it. To implement the scheme you suggest it would be necessary to modify the existing memory pcb (which was designed with knowledge of what I was intending to do here) and then double up the ram+multiplexor circuitry that I added in this video for the two halves (plus the additional logic necessary to manage the change). What I've built gets the functionality I want with the lowest circuit complexity, naturally I could make a better circuit by adding more complexity but it's always necessary to strike a balance.
That has been discussed heavily in the comments and on discord. In short it adds more circuitry than you think but it also can get in the way of doing incremental updates to the screen buffer which is widely used by 8-bit systems to maintain high update rates.
Hope that works out for you. I wanted to do analogue VGA as it was the technology I grew up with. There are chips that do the hard work of encoding but didn't feel right. Essentially the vga to hdmi cable I'm using to capture footage is one of those chips I suppose.
These are the vga capture devices I purchased from amazon.co.uk
VGA to HDMI Adapter amzn.to/3fmxUMT
USB HDMI Video Capture Dongle amzn.to/2VgfTZI
Equivalent looking links from amazon.com:
VGA to HDMI Adapter amzn.to/3zXCgBZ
USB HDMI Video Capture Dongle amzn.to/2VixaRP
I don't think the equivalent on amazon.com you've chosen will work:
This 6ft HDMI to VGA converter cable is ideal for transmitting and converting digital signal from HDMI devices(Notebook, PC, Laptop, DVD player, Nintendo Switch etc) to analog signal VGA devices(TV, projector, monitor etc).
I want to let you know that, even if this series has a really low views/quality ratio, it's extremely informative and entertaining for the ones that do watch it. Thanks
Glad you enjoy it! You have to remember this stuff is pretty niche, I'm pleased anyone is interested!
That write FIFO is a really elegant solution to allowing much greater software freedom over video ram, I'd love to see it implemented
Indeed, I really liked the way it slots in nicely as an extra feature rather than something you have to design around. Without it you have a working system, with it you get a simple improvement.
@@weirdboyjim but how do you deal with the edge case of the FIFO buffer overflowing? maybe you have to have the FIFO buffer be the same size as a whole frame and it 'degenerates' into a back buffer solution?
I vote for the FIFO too...
Ditto! I'd totally love to see the extra mile here. Maybe it's good to go without it first so it's possible to compare both.
@@Philip8888888 If the FIFO is full, you simply insert a wait state until it isn't if you're going for a glitch free display. If you want maximum speed, give priority to the CPU write and starve the video by either outputting black or repeating the previous byte for less obvious glitching. The FIFO only need be as long as one scan line's worth of processing as it will get emptied at DMA speed during horizontal blanking & sync. A 64 byte FIFO should be more than enough until James adds the vector processor unit.
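The wait-state policy described in the comment above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the circuit from the video: the function name `simulate_line`, the one-flush-per-clock drain rate during blanking, and the 640/160 clock split used in the example are all hypothetical choices.

```python
from collections import deque

def simulate_line(visible_clocks, blank_clocks, write_every, depth):
    """Simulate one scan line of a write FIFO with wait states.

    Assumptions (hypothetical model): the CPU tries to write once every
    `write_every` clocks, the FIFO drains one entry per clock during
    blanking only, and a full FIFO stalls the CPU until a slot frees up.
    Returns (writes_accepted, wait_states, max_fill).
    """
    fifo = deque()
    accepted = waits = max_fill = 0
    countdown = 0  # clocks until the CPU's next write attempt
    for clock in range(visible_clocks + blank_clocks):
        # Video side: flush one queued write per clock while blanking.
        if clock >= visible_clocks and fifo:
            fifo.popleft()
        # CPU side: either busy, writing, or stalled on a full FIFO.
        if countdown > 0:
            countdown -= 1
        elif len(fifo) < depth:
            fifo.append(clock)
            accepted += 1
            countdown = write_every - 1
        else:
            waits += 1  # FIFO full: insert a wait state
        max_fill = max(max_fill, len(fifo))
    return accepted, waits, max_fill

# A 64-entry FIFO never stalls at this (assumed) write rate.
print(simulate_line(640, 160, 16, 64))
# A deliberately undersized FIFO shows the wait states appearing.
print(simulate_line(640, 160, 16, 8))
```

The "maximum speed" alternative from the same comment would replace the wait-state branch with a direct write to video RAM, at the cost of a visible glitch.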
I love watching the schematic and breadboard being created at the same time
I like doing those, I did actually prepare a bit more than you got in this video for some of the later tweaks, but there was nowhere to fit it on screen.
I recently found this YouTube gem. The presentation, the technical details, the solution. We need to start promoting this channel; it is so underrated.
Kind words! Good to hear people are enjoying the series!
Vote for the FIFO.
It would be interesting to hear about handling the situation where the FIFO fills up prior to the VGA getting to a blanking interval.
The obvious thing to do would be to just write through, but you should be able to design it such that it never happens. The fastest my CPU with the current instruction set can write to memory is at half the clock rate, so currently that would be 40 writes during the visible portion of the line, and there would be 160 pixels of blanking interval per line. So a 40 pixel FIFO would be big enough. If you raised the clock rate by 5x you could theoretically write faster than you could flush on a line, so you would need to be able to handle that.
@@weirdboyjim An off the shelf FIFO with 64 slots sounds like it would handle your CPU up to 5MHz. If you stick with 4 bit multiples, call it a 256 slot FIFO. Good to 12 MHz if limited to 160 fill, 20 MHz if willing to let it flush over two scan lines. Would seriously doubt any code capable of stuffing data to the VGA that fast. And honestly, suspect that a vast majority of the VGA memory interactions would be scrolling the screen. So 2 clocks to read, 2 to write, and your FIFO has only half the work to do.
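The sizing argument in this exchange comes down to simple arithmetic, sketched below. It takes the 40 writes per visible line and the 160 blanking pixels quoted above as inputs, and assumes (this is not stated in the thread) that one queued write can be flushed per blanking pixel clock. The function name `fifo_requirements` is made up for illustration.

```python
def fifo_requirements(writes_per_line, blanking_slots, clock_multiplier=1):
    """Return (depth_needed, fits_in_one_blanking_interval).

    Worst case: the CPU writes at its maximum rate for the whole visible
    portion of the line, so the FIFO must hold every one of those writes.
    """
    depth_needed = writes_per_line * clock_multiplier
    return depth_needed, depth_needed <= blanking_slots

for multiplier in (1, 4, 5):
    depth, fits = fifo_requirements(40, 160, multiplier)
    print(f"{multiplier}x clock: depth {depth}, flushes in one line: {fits}")
```

Under these assumptions the break-even point is a 4x clock, which matches the remark that a 5x clock could outrun a single line's flush.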
Absolutely bloody brilliant! The last seconds are the icing on the cake. 🤣🤣🤣 Well done James, well done!
Thanks! That outro was actually discussed on the discord but I don't think anyone thought I was serious.
Interesting to watch you work on what was THE number one issue back in the old days.... getting a picture on the screen without strangulating the CPU. I've gotta vote for "show us the FIFO" too. :)
Indeed! There are so many ways of attacking this problem, but attacking the interesting problems is what made this project appealing.
I like the Hybrid++ design the most! Reminds me a bit of my own system, or at least very similar to how I originally envisioned everything.
Looks like there is a lot of support for doing something with that at some point.
Amazing work and great video, thank you. Reminds me of the computer boards I was working with around 1985, the 8 inch floppy disc drive controller board alone used about 90 off 74 series chips.
That sounds about right! Might be fun to pull one of those apart at some point.
Loved the work on the frame buffer, this was really fun to watch. You were competing with the 2021 Gymnastics Final but my eyes tended to stay on this, lol. I would also like to submit a vote in favor of the write FIFO. It would complete the design and be a nice fix for the corruption issues.
Seems popular, I'm thinking about how/when to fit it into my plans.
This series is looking like the practical complement my Digital Systems course in high school should have had in its second year. All the basics to make a computer were there, yes (mux, demux, flip-flops etc.), but the amount of work needed on top of that knowledge without any other information would be huge. This would have had an absolutely positive impact on the course's overall quality, ensuring practical know-how would be achieved in much greater depth.
Glad you are finding the series interesting!
Thanks again for another video. I liked the discussion on how to solve the memory access issue. As someone who also balked at the price of dual ported RAM (seriously, it's 2021, how can this not be dirt cheap now?) it was nice to hear the discussion of alternatives.
Glad you liked it! It's a specialty part, If I was trying to make a specialty demonstration circuit I might have considered it but for this I'm really trying to solve everything from the ground up.
Dual port RAM is seldom used in any volume application, therefore it tends to be expensive because there isn't volume manufacturing behind it.
Sadly it's probably not likely to get any cheaper either, with complex logic moving into FPGAs which can also implement RAM and DPRAM
It's cheaper to use an FPGA these days and make use of its internal block RAM which you can set up to be dual ported.
@@nockieboy mind-blown. i hadn't even considered using an FPGA for something as basic as sram. i know some of the lattice chips are pretty cheap.
Awesome :-)
I am curious about the FIFO as well.
It seems popular! I'll give it some thought.
Seriously, at some point, you're going to run out of ways to blow my mind - the Nyan cat animation is staggeringly good!
And, probably most importantly, this is the only time I haven't minded being rick rolled, haha!
Yooo are you here!
I came to this channel from your channel.
Brilliant work, James. Rick's moves looked pretty fluid. I've finally added video output to my new backplane build, but using a TMS9918. Primarily for nostalgic reasons with the TI-99 being my childhood computer. I do plan on building an alternative display card soon.
Thanks Troy! I was pleased with the animations but I might put a bit of effort in soon to get it all done in the blanking interval. I didn't want to delay the video any further though.
Man, that is so cool. I remember watching Ben Eater's vids and wanting him to take the next steps to make it more efficient. So glad I found your channel.
Glad you found it interesting! I suspect a lot of people who come to my channel have started out there.
Dude, you're a genius... I will have to watch this through another few times with a notepad.
Thanks! Glad you are enjoying my project!
I used to design using TTL and CMOS circuits 25 years ago... My last design was a thermal printer. You had to control every single pin on those chips according to the data sheet. Then FPGAs came along, and I would find it very hard to go back now, as I have complete freedom to structure my design not in terms of generic function chips; I can just implement what I need. But what you did brings back nice memories... especially how bulky those designs tend to be. Regards...
Glad you are finding it interesting. I plan on doing some circuits with FPGA's in future videos, but not until after finishing the pure(ish) logic cpu first.
@@weirdboyjim You are a committed person. As I am of a certain age, let me tell you that before the microcomputer revolution, everything was done this way. I recall our university HP3000 system had a CPU spread over several boards... you could literally see the registers, lol. Sometimes it would fail and we had to repair it, using wand signal injectors and wand logic detectors. It was fun. It is nice to see someone like you keeping that tradition alive. Engineers now have never touched a TTL manual, at least on paper. It is one of my proudest possessions. Regards.
I loved this video. I've watched along and am making progress on my own build. Your knowledge is outstanding, James. And thank you for 'not giving up, and not letting us down'. 😁
I wouldn't desert you!
LOL no need to apologize for the Rick Astley bit. had me laughing out loud. ;-) and brilliant job on the vga output. that works way better than I imagined it would.
Glad you enjoyed it! I want to look at my animation compression code again at some point, this was a rush job and I think I can do better.
Awesome as always, thanks! This feels on the same level as the first computers and consoles I started using.
For the videos, what data format did you use? Just the raw data for all the frames?
The static parrot image was raw data. For the animations I had to implement some compression with support for delta frames. The first one was 12 frames and came in at 13KB; the "post credit scene" was 33 frames and I had to implement some lossiness in the compressor to fit it into memory, where it came in at 41KB. In both cases I stored the extra delta frame to loop it back to the start to keep it smooth.
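For readers unfamiliar with delta frames, here is a minimal sketch of the general idea, including the extra closing delta that loops the last frame back to the first. It is not the compressor used for the video: the (offset, byte) list format and the names `encode_delta`, `apply_delta` and `encode_animation` are assumptions made for illustration, and the lossy step is not modelled.

```python
def encode_delta(prev, cur):
    """Return a list of (offset, new_byte) for every byte that differs."""
    return [(i, b) for i, (a, b) in enumerate(zip(prev, cur)) if a != b]

def apply_delta(frame, delta):
    """Patch `frame` (a bytearray) in place with the recorded changes."""
    for offset, value in delta:
        frame[offset] = value

def encode_animation(frames):
    """Store the first frame raw, then one delta per step.

    The final delta goes from the last frame back to the first, so the
    player can loop without reloading the raw first frame.
    """
    deltas = [encode_delta(frames[i], frames[(i + 1) % len(frames)])
              for i in range(len(frames))]
    return bytes(frames[0]), deltas

# Tiny 4-byte "frames" standing in for a real frame buffer.
frames = [bytes([0, 0, 0, 0]), bytes([0, 1, 0, 0]), bytes([0, 1, 2, 0])]
first, deltas = encode_animation(frames)
screen = bytearray(first)
for delta in deltas:
    apply_delta(screen, delta)
print(bytes(screen) == frames[0])  # the loop ends where it started
```

Only the changed bytes are written each frame, which is also why this style of update suits an 8-bit CPU with limited write bandwidth to video RAM.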
Awesome, a lot to digest, I'll have to watch it again. That will bring you a little closer to that one billion views ;) Thanks James, take care.
Thanks Jerril! Glad you are still finding the build interesting!
Brilliant :-) Thanks for sharing with us. It gave me some ideas that I'd like to implement on my old system.
Thanks! Hope your build is going well!
@@weirdboyjim Thanks for asking :-)
I'm not quite there yet. My project is based on the video interface of the British Nascom-2. The very short description is that a certain memory area has a double function, completely invisible to the CPU. Writing 41H in a memory cell in this area will result in the letter "A" being output to the screen.
Some counters, multiplexers, and an EPROM (containing the graphic representation of the ASCII charset) create the basis for the video signal. This construction makes outputting text to the screen very fast, as the CPU does not have to "draw" every single letter before text is written. This is done solely by simple hardware.
My angle is replacing the special RAM block with a dual-port RAM chip. This will eliminate the need for the multiplexers. Giving the CPU and the video output circuits individual address and data lines to the same RAM block also simplifies the handling of horizontal and vertical line sync.
A physical VGA-end would be a very nice upgrade to the simple analog video circuits of that time :-)
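The character-generator idea described above (screen memory holds character codes, an EPROM turns code plus glyph row into pixels) can be modelled in a few lines. This is a rough sketch only: the 8x8 glyph size, the bitmap for "A", and the names `CHAR_ROM` and `scanline_bits` are assumptions for illustration and are not taken from the Nascom-2 design.

```python
GLYPH_ROWS = 8  # assumed glyph height; real hardware may differ

# Hypothetical 8x8 bitmap for "A" (41H); one byte per glyph row.
CHAR_ROM = {0x41: [0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00]}

def scanline_bits(screen_row, glyph_row, rom):
    """Return the pixel bits for one scan line of one row of text.

    `screen_row` is the bytes the CPU wrote to screen memory; the ROM
    lookup stands in for the EPROM addressed by (character, glyph row).
    """
    bits = []
    for code in screen_row:
        pattern = rom.get(code, [0] * GLYPH_ROWS)[glyph_row]
        # Shift the pattern out most significant bit first.
        bits.extend((pattern >> (7 - bit)) & 1 for bit in range(8))
    return bits

# The CPU only stores 41H; the "hardware" produces the pixels.
for row in range(GLYPH_ROWS):
    line = scanline_bits(bytes([0x41]), row, CHAR_ROM)
    print("".join("#" if bit else "." for bit in line))
```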
Hey, really glad to see this project progress. I feel you missed a big opportunity with your VGA. You use double RAM to combat the contention issue, but you could perhaps use it as a hardware double buffer. That way the VGA adapter would be reading from one whilst the CPU writes to another. Obviously you lose the ability to read what is currently being drawn to screen, but you gain the ability to write as slow or as fast as possible since the CPU would be in charge of flipping the buffers at the next available blanking. Anyway, that's just my idea; you obviously got yours to work and it's amazing.
I regret not covering this kind of double buffering in my discussion, but I don't agree with you. I think perhaps you underestimate the amount of extra logic that is necessary to implement that; you would roughly double this circuit. You'll also find a true double buffer gets in the way of doing incremental screen updates, which is the way to get fast updating graphics on an 8-bit CPU. Double buffering comes into its own when your CPU and memory architecture get to the level where you can fill the buffer with useful processing at a respectable frame rate. I plan on my games running at 60fps and I'll need every trick in the book to do that.
I think Hybrid++ would be very interesting to investigate
I'll do something with the fifo at some point
Awesome as always. FWIW 12ns 32KB SRAMs in DIP: IDT71256SA, £3.35 on Mouser. Also available, at £1.37, in SOJ for the PCB.
Thanks! I wanted to stick with the 8k ram chips, there is also another problem I'll be talking about later that I want to solve the right way rather than be tempted by the "Yeah, but we can ignore this because modern ram is insanely fast" solution.
Hi James! I really enjoy your videos, they are just awesome!
I'm interested in how this memory access issue is solved in different real historical computers of different eras. Are there some articles or videos on this topic?
Good question! Lots of different ways to be honest, as many different flavors as systems. Most are variations on clocking the memory higher than the video needs to access it and/or making the cpu wait for a gap.
what about a system kind of like PCRA where the CPU writes to the shadow buffer and then the buffers swap, whereupon the write is reissued to the other RAM chip?
Double buffering like that can be effective, but it's not free. You have to double up all the address multiplexing logic (you have 2 lots of address bus to change), but it's also an impediment at times. Most games and visually impressive demos in the 8-bit era updated the screen incrementally rather than redrawing everything and double buffering means you are updating on top of the frame before last.
@@weirdboyjim first of all, huge fan. I love your build series, and if I hadn’t watched it, I might not have found so much interest in this type of hardware engineering.
My idea wasn’t really a full double buffer, and after some more thought, the computer has to issue a nop on the 2nd clock after a write which guarantees on that clock that the computer won’t be accessing the vga, and it can only issue 2 writes in a row. It might be possible to use latches to store writes temporarily while putting them in the shadow ram, then trigger a swap on the NOP, which could probably be made pretty well by running it from the VGA clock and issuing the buffered write. Because the system clock is much slower than the VGA clock that runs the swap, it should be possible to pull it off. Also, only 2 writes would need to be buffered, and only one catch-up write per swap. It would certainly be quite a lot of circuitry, but it’s more like a limited write buffer where you can guarantee only 2 writes ever need be buffered like in the video than a whole redraw. While it may not be the right solution here, I don’t think it’s entirely invalid.
Our 380Z at school had a system where you could extend the blanking interval so you had full CPU access to the video RAM but the screen would blank... so you could chuck an initial display at the start of a program and you'd avoid corruption by having a completely blank screen.
I did think about a blanking bit for that reason.
Raw genius is rarely so overtly displayed (pardon the pun) as (yet again) in this video. Watching Andreas Kling working on his SerenityOS operating system evokes the same euphoria. The two people who disliked the video are clearly blind...
Very high praise indeed, thank you R. Mo. Not sure that's quite deserved, but I'm glad you are enjoying the content!
Congratulations! This is one fantastic project. The PCB'ed CPU looks gorgeous! Looking forward to the next improvements on this video card. Developing video games for a computer designed by yourself is the ultimate kink! ^_^
Thanks Damian! I'm looking forward to doing some more game dev once the VGA is done, but did you see my game "Snek", which outputs to a serial terminal? th-cam.com/video/efLzgweF958/w-d-xo.html
@@weirdboyjim Yes! Fantastic!
@ 21:54 You also have the paging approach: double memory bank, cpu reads/writes from one memory, the VGA driver reads from the second bank. Both banks are swapped on Vsync. Cons: tear-free requires careful timing or the CPU to set a hardware lock when the first bank is ready for swapping.
For many years I was writing games and demos where we had to worry about the raster timing, so that's not a worry for me. The problem with all of these "Just have double the circuitry" comments is that it doubles the circuit, and it doubles sections of the planned circuit that will appear in later videos.
@@weirdboyjim True. Note that double RAM is only one more address line, as you did in the Hybrid. On the other hand, the amount of external sync circuitry might be easier said than done indeed.
Well, after binge-watching your videos during July this is the first I catch in real-time! Fantastic work as always, and thank you for the clear explanation of several possible methods to manage the RAM addressing. One way I've thought about when trying to figure this out (I'm not building anything yet, but I'd love to) was to use several (2 or 4, maybe 8) RAM chips as "pages", that way you could read from one while writing to the others - but I doubt it would be very efficient space-wise, and small RAM chips appear hard to come by.
In a perfect world 2 pages is all you ever need, theoretically. You write to one as the other is displayed, then you change the pages.
Glad you are enjoying it. Your suggestion is either similar to the banking system I described in passing, or true hardware double buffering. The latter is indeed another way of solving the problem, but it ends up being roughly double the memory interfacing logic of the option I presented.
@@akkudakkupl Assuming I understand you correctly, if you just switch between two pages, then presumably the display memory the CPU sees is suddenly being changed (from what the CPU had just written)? So a subsequent CPU read may not return the expected data? I don't see how this would be workable in practice, without yet another video buffer in memory as the CPU's master display memory (that would then need to be bulk copied to the "paged" output buffer).
@@gregclare You blast data into one page or the other, you keep local copy of data in CPU RAM space. You just copy modified data to video RAM pages once per frame, alternating pages, so that video hardware has access to one page for displaying, while CPU has access to the other for copying.
@@akkudakkupl Yes, I understand what you're proposing. But having the CPU need to do a full buffer block copy once per frame is a significant overhead. It would be more efficient to just implement a simple "CPU has access priority to the video buffer" solution, and then code to only update the Video buffer during blanking periods (so no visual artifacts created by the CPU priority access). ie. You just organise your code to do the non-visual processing during the display period, and synchronise video updates triggered by either the Vsync or Hsync pulses.
I want to explore a little bit on Bus request with several chunks. Instead of having chunks to be placed one after another, one can have them interleaved! So, for ex., even bytes go into first chunk, and odd bytes into the other. That will guarantee, that the CPU will never be stopped for more than one Clock cycle, as VGA will always alternate reads from chunks.
To do that, one can have 2 RAM chips and select between them, but instead of doing it on the highest bit, do it on lowest. That way consecutive addresses will alternate between chips.
With more chunks, even better things are possible. The CPU cannot access VRAM all the time - it must also fetch instructions. That creates an "opportunity window" when the VGA can read VRAM. The key idea is to read all chunks simultaneously, and latch the data for later use. With enough chunks, there will be no need to ever stop the CPU!
That's good thinking, it's actually what a lot of "real" systems do. For a while I toyed with an idea very similar to this, but instead of stopping the CPU you have a set of latches to store the last write (2 for address, one for data). Essentially a one byte FIFO, but you were always guaranteed to be able to flush it within a cycle.
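The difference between selecting a chip on the lowest address bit and selecting it on the highest can be shown with a short sketch. The function names and the 32KB bank size are assumptions for illustration; the point is only that low-bit selection makes consecutive addresses alternate between chips.

```python
def interleaved(address, num_banks=2):
    """Low-bit select: return (bank, offset) with banks interleaved."""
    return address % num_banks, address // num_banks

def stacked(address, bank_size=0x8000):
    """High-bit select: banks placed one after another (assumed 32KB)."""
    return address // bank_size, address % bank_size

# Consecutive addresses alternate chips when interleaved...
print([interleaved(a)[0] for a in range(6)])
# ...but stay on the same chip when the banks are stacked.
print([stacked(a)[0] for a in range(6)])
```

With interleaving, a sequential video scan-out touches each chip only every other access, which is what leaves a free slot for the CPU.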
You're still alive! Big relief! Also lovely video again!
Again this is super educational and inspiring. It really feels like you're after the holy grail of 8-bit computer graphics here. Whether you implement the FIFO or not (but you should, of course ;-)) I'm very curious to see how this will turn out. Seems to me this VGA solution should also work well connected to, say, a Z80 or 6502 CPU.
Please do keep smashing those milestones! I love how every part in this series so far has ended with some great accomplishment. A Rick Roll on a breadboard, it can't get better.
Thankfully there was no sound!
Hmm, perhaps I should do one with sound?
@@weirdboyjim Don't you dare! ;-)
I think the way the NES and I think the Gameboy handle graphics is really interesting, since they don't really have a pixel framebuffer. They just have the higher level descriptions of what to display and the pixels are calculated in real-time right as they're being output. It saves on video memory, but also requires the graphics system to actually be capable of its own processing and requires the programming to be more clever to accommodate it. It also seems to be somewhat closer to a modern GPU rather than a traditional framebuffer since all the rendering is done outside the CPU.
Regarding memory access, from what I remember of my time writing an emulator, the NES allows the CPU to still run while outputting pixels, but it can't access video memory during that time.
Tile based systems have a couple of uses: it's a big reduction in the memory used compared to a framebuffer with the same pixel density, but it also allows modification of a large portion of the screen by manipulating less data.
I don't agree that you are sorry for Rickrolling us, but...nicely done, sir.
I didn't want to let you down!
Good stuff James! I worked out a very similar circuit to what you have here based off your last video. My design is nearly identical ( minus the framebuffer ), I should have just waited and copied! ;) keep up the good series.
It's all about the journey! ;-)
As you say, in the old 8-bit days the video memory was updated during the video flyback period, when there is no output to the screen. AKA PET. Nice video. As in the old days of CGA, if you wait for the flyback signal you get no flicker, but it slows down screen updates. I can still remember when I got my first Trident SVGA card.
I remember in most early VGA cards if you updated the palette you would get this kind of snow on the screen if you changed some of the registers without waiting for blanking.
@@weirdboyjim I can remember the exciting time of going from an amber screen to colour and a CGA card, might even have been full length. I bought a motherboard without BIOS as they wanted an extra £75 for the BIOS, I think. Luckily my boss had bought one so I copied his. I think the processor was an 8086, can’t remember the speed but I expect around 1meg :-) and a single 360K floppy. What exciting times we have lived through, tech changing so fast.
You deserve 1000 times the thumbs up on this James! Many thanks.
Thanks Mike!
You could implement that kind of Write FIFO system in software by switching to a double-buffer memory architecture. Still just two RAM chips, but you multiplex both chips to both CPU and VGA, with a way to swap the front and back buffers in hardware. Updates to one buffer can be stored in a software queue that then gets flushed after the front/back buffer swap.
That kind of double buffering would require you to double up the multiplexer chips and some additional glue logic. There are indeed lots of different ways to do it, but I can only cover so many in my discussion.
Honestly, that’s what I thought you were going to do, James. I guess the downfall of that approach is that the frames have to be fully rendered before the “ping-pong” switch gets thrown, plus it eats too much address decode in main memory... hummm, never thought of your “shadow” approach with a FIFO, that’s brilliant! Kudos!
Amazing ! I vote for FIFO also.
You people don't want me to actually finish this thing do you ;-)
@@weirdboyjim We want this series to last a couple more years ! So after the VGA, the GPU, then FPU, then... :P
This is fantastic. Thank you for sharing
Glad you enjoyed it!
Nice job on the design decisions and "sticking to your guns" re: costing.
I think the idea of pre-loading the counters to line up the start of memory to the top/left is a good idea. You are using the right counters for that job.
Does your CPU/MCU board have an interrupt controller? I was not sure but it did not seem that you were tying in the start of vertical blanking interval to an ISR... apologies if I was wrong on that.
Also, is the 64KB video RAM the actual working memory, or do you copy an already "rendered" image to the RAM during blanking intervals? If so, that could be extended into a nice working DMA system later on...
Cheers,
No interrupt controller, watch the design retrospective video if you want a bit more information about that but there is a build complexity tradeoff to be had and this thing already fills my desk.
I was rooting for the multiplexed solution -- my fave 8-bit system, the BBC micro, used that approach. The 6502 CPU ran at 2mhz, and the RAM was clocked at 4mhz to allow the video system access to it.
It is a nice technique, but they did have the same speed conundrum designing the BBC as I have here (albeit for a different underlying reason). Those 4MHz chips were a big chunk of extra part cost. We are spoiled for high speed parts these days!
For PC enthusiasts, it might be interesting to know that all classic IBM graphics cards (MDA, CGA, EGA, VGA) did use Time-Division multiplexing for video RAM access, even though on the MDA, the video clock (16.257 MHz) and the bus clock (4.77MHz) were completely different. IBM added logic to add wait states to video memory access until "the time has come for a CPU access time slot". This has some interesting consequences we observed on the PC compatible system:
- Surprisingly, performance is not hampered as much as I expected: You get only every second memory cycle to the CPU on MDA and CGA, because they use 1:1 multiplexing. On CGA in 40-column modes (I get to 80-column mode later), we have a dot clock of 7.16MHz, and as characters are 8 pixels wide, a character clock of 895kHz. During this period of 1117ns, the graphics card needs to load a character byte and an attribute byte, i.e. 2 cycles. As graphics card and bus interface alternate between cycles, we need to pack 4 memory cycles into the 1117ns, so the total cycle time is 280ns. This exactly fits the 270ns cycle time of the MCM4517-12 RAM used on (some copies of) the original IBM CGA. The processor gets a chance to access video memory every 540ns. A memory cycle in the IBM PC takes 4 clocks at 4.77 MHz, which is 838ns, so it will miss every other opportunity, and could access one byte of video memory per 1080ns, i.e. at a rate of 925kHz. It's the slowness of the 4.77MHz 8088 processor that makes this rate unobtainable, though. Keep in mind that the processor needs to fetch instructions over the same bus, and the microcode for "REP STOSW" or "REP MOVSW" is slow. In essence, the average delay caused by the synchronization will be half the slot period, so around 270ns, which is around 1.3 processor clocks. The MDA card (in its only mode, the 80-character mode) runs a similar scheme at a character clock of 1.8MHz, but it uses 16-bit memory access to fetch characters and attributes at the same time, so the memory timing matches the 895kHz CGA clock quite well.
- The CGA in 80-character modes needs double the bandwidth of the CGA card in 40-character mode (or in graphics mode). The memory subsystem with its static 1:1 multiplexing is already maxed out in 40-character mode. In 80-character mode, the CGA opportunistically also uses the cycles assigned to the processor, converting the 1:1 multiplexing into a 2:0 multiplexing. It can't make the processor wait until the end of the scanline, because the active display period of a CGA scanline is around 45µs, but you can't add waitstates for such a long time. The IBM PC runs memory refresh cycles that need to happen every 15.6µs over the same bus, so wait-states must not exceed 10µs or memory refresh will fail. As I already hinted, the CGA uses the processor cycles "opportunistically", i.e. only if the processor doesn't access the RAM. If it does access the RAM, the CGA performs the processor cycle instead of the graphics card cycle - but still uses the data that was transferred during this cycle as if the graphics card cycle had happened. This causes the well-known "snow" on CGA in high-resolution text mode.
- To meet the performance required by 16-color 640-pixel graphics, IBM upped the 8-bit RAM interface of the CGA card to 32 bits on the EGA card. At the same time, the 40-character character time is no longer split into four memory cycles, but into five memory cycles. The multiplexing can be configured as 2:3 (2 cycles for the EGA, 3 cycles for the bus) or 4:1. In text mode, only one character/attribute pair is loaded in a 32-bit cycle due to how the memory is organized, so EGA is on par with the MDA here, which uses all 16 bits of its 16-bit cycle. Text mode needs a second RAM cycle because EGA has the font in the display RAM. So high-res text mode (80 characters) needs to use the 4:1 multiplexing, while all other modes get away with 2:3 multiplexing. Display can be disabled to get a 0:5 multiplexing to the processor for increased performance.
- VGA still is the same, but the increased video clocks now generate a character clock of 1.57MHz for the 40-character character clock. In 4:1 multiplexing, the theoretical maximum rate would be 1.57MHz, while in practice mainboards add enough waitstates to 8-bit cycles for compatibility reasons to drop the rate to half of this, i.e. 780KB/s. The advantage of the 2:3 multiplexing is thus rooted in the lower latency, but total throughput is likely only slightly higher.
- MCGA got the 256-color mode to work with 8-bit video RAM instead of 32-bit video RAM (which has its rate maxed out on the VGA), because IBM decided to use VRAMs, a special kind of double-ported RAM, in this display solution. They could get rid of time-division multiplexing that way.
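The CGA 40-column timing figures in the list above can be reproduced with a few lines of arithmetic. The sketch below takes the 7.16MHz dot clock, the 4.77MHz bus clock and the 270ns RAM cycle quoted in the comment as inputs; the function name `cga_40col_timing` and the returned dictionary layout are made up for illustration.

```python
import math

def cga_40col_timing(dot_clock_hz=7.16e6, cpu_clock_hz=4.77e6,
                     ram_cycle_ns=270.0):
    """Recompute the CGA 40-column figures quoted in the comment."""
    char_clock_hz = dot_clock_hz / 8          # 8 pixels per character
    char_period_ns = 1e9 / char_clock_hz
    budget_per_cycle_ns = char_period_ns / 4  # 2 video + 2 CPU cycles
    cpu_slot_ns = 2 * ram_cycle_ns            # CPU gets every other cycle
    bus_cycle_ns = 4 * 1e9 / cpu_clock_hz     # 4 clocks per bus cycle
    # A bus cycle longer than one slot means the CPU skips slots.
    slots_per_access = math.ceil(bus_cycle_ns / cpu_slot_ns)
    access_period_ns = slots_per_access * cpu_slot_ns
    return {
        "char_period_ns": char_period_ns,
        "budget_per_cycle_ns": budget_per_cycle_ns,
        "bus_cycle_ns": bus_cycle_ns,
        "access_period_ns": access_period_ns,
        "max_rate_khz": 1e6 / access_period_ns,
    }

for name, value in cga_40col_timing().items():
    print(f"{name}: {value:.1f}")
```

The computed per-cycle budget comes out just under 280ns, so a 270ns RAM cycle fits, and the CPU ends up using every second slot as the comment describes.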
Obviously this was implementation specific; I had a 3rd party MDA card that would screen fuzz if you wrote too fast. I remember the VGA had its memory broken up into 4 banks. Sequential bytes were in different banks, but that was usually hidden from you by the memory controller, though you could flip some registers and take advantage of it. In 256 colour mode you could only have 320x200 officially, but flip a few bits and you could get 320x400 on a standard VGA card. But to access the memory you would have to set the mask that controlled the bank access differently for each vertical column of pixels. I used that mode for Swiv 3D. As a programmer I wasn't thinking in physical implementation terms back then, but now it's clear the 4 banks were for parallelising the read.
@@weirdboyjim Your 3rd-party MDA clone is likely an early cost-optimized clone. The later MDA/Hercules clones get away with 8-bit memory (remember the original MDA had 16-bit memory) without flicker, likely using faster RAM than the original MDA. A fuzzy MDA clone likely uses 8-bit RAM as the CGA card does, but isn't running the RAM fast enough to do proper 1:1 multiplexing.
The 4 banks of the VGA are more commonly called planes. While it was hidden in 256-color mode that there are four planes, the EGA/VGA memory organization was very apparent in 16-color modes. VGA memory is organized as 64K x 32 bits, whereas the bus view has 64K x 8 bits. One of the most simple ways to access the 32-bit memory is to just pick one out of four 8-bit chunks that make up the whole 32 bits, and that's where the plane stuff comes from. You could also write to multiple planes simultaneously (using the "map mask register" enabling multiple planes. WTH didn't they call it plane mask register?), or do a 16-color mode color compare read instead of reading a single plane.
The 256 color mode hack you talk about is most widely known as "Mode X" or "unchained mode". The idea of the standard 256-color mode is that four subsequent bytes in processor space map to the same address in VGA memory space, so the lowest two address bits are no longer used as address bit, but as plane select bits instead. Because it *chains* the *four* planes into one virtual plane, the mode is called "chain-4". 256-color mode is dependent on the four neighbouring pixels being stored in the same 32-bit word of VGA memory to, exactly as you say, read four pixels in parallel. In Mode X, you disable reinterpretation of the two lowest bits as plane select bits, so you can address all VGA memory addresses, and not just every fourth address. Then, you disable "doubleword mode" on scanout, which causes the card to only read every fourth address in VGA memory. This makes the whole 256K accessible - at the cost that you, as the programmer, have to fiddle around with the planes and make every byte hit the correct plane, a task the VGA card did for you in chain-4 mode.
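As a rough illustration of the two addressing schemes described above, here is a minimal Python sketch. It is only a model of the address arithmetic, not driver code: the function names are made up for this example, and the Mode X layout (plane = x mod 4, 80 bytes per plane per 320-pixel line) is the commonly used convention rather than something spelled out in this thread.

```python
# Illustrative model of the address decoding described above. The function
# names are hypothetical and nothing here talks to real hardware.

def chain4_decode(cpu_offset):
    """Chain-4: the two lowest CPU address bits select the plane.

    The remaining bits address VGA memory, so only every fourth
    VGA address is ever used.
    """
    plane = cpu_offset & 0b11
    vga_address = cpu_offset & 0xFFFC
    return plane, vga_address


def mode_x_decode(x, y, width=320):
    """Unchained ("Mode X"): the programmer selects the plane per pixel column.

    Returns the plane, the VGA address, and the map mask value that
    enables only that plane for writing.
    """
    plane = x & 0b11
    vga_address = y * (width // 4) + (x >> 2)
    map_mask = 1 << plane
    return plane, vga_address, map_mask


def pixels_addressable(chained):
    """Bytes (one per pixel) reachable through the 64K CPU window."""
    window = 64 * 1024
    return window if chained else window * 4
```

With these definitions, 320x400 (128,000 pixels) does not fit in the chained 64K view but does fit in the unchained 256K view, which matches the trade-off described above: more memory, at the cost of managing the planes yourself.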
The interesting thing about chain-4 mode is that due to the presentation of the 16K x 32 of VGA memory that is used (48K x 32 is unused, as only every fourth word is used) as 64K x 8, the programmer doesn't have to interoperate with the VGA card for drawing operations. The MCGA card I already mentioned at the end of my previous comment chose a completely different hardware way to provide a 64K x 8 video buffer (by using two 64K x 4 VRAM chips), but the 256-color mode of the MCGA and the non-hacked 256-color mode of the VGA behaved identically.
A further cool thing about chain-4 is that chain-4 modes allow 16-bit or 32-bit accesses to video memory by just transferring more than one plane at once, without any architecture change. Some early 16-bit ISA VGA cards support 16-bit memory cycles only in chain-4 modes, or the odd/even mode (something like a chain-2 mode used for CGA compatibility), but fall back to 8-bit only if no chaining is enabled.
The fact that MCGA could downgrade from a 32-bit data bus to an 8-bit data bus delivering the same video performance as VGA shows that VRAM, a special kind of "nearly dual-ported" RAM, is a very efficient way to implement video cards. I still consider your choice to not go dual ported in your project a valid choice. If you go VRAM, you could also use an integrated CRTC (like the ubiquitous 6845) instead of your army of counters and the lookup EEPROM. But that's the level of integration your project tries to avoid to show the basics. Cheers for keeping true to that idea!
BTW: I call VRAM "nearly dual-ported", because you need to issue transfer cycles to the shift register for the secondary port through the primary port, which is usually used for CPU access. You can't do anything sensible with the secondary port alone unless you have some control of the first port, too.
Did you consider using 41264 DRAMs? They are a memory that was used on VGA controllers for PCs in the 80s & 90s. One port is read/write like standard 41256 DRAMs. The other is a high speed serial-like port that is read only for the display output. Might be worth a look, although it would add some logic for accessing DRAM.
I have no idea on the pricing but it looks like a good solution that I've seen used in older early-90s computers too, like the Acorn Archimedes. Haven't dug into the data sheets but it might use cpu free cycles to fill up the FIFO, maybe using parallel banks for speed.
I had a couple of reasons to not look at those, firstly they are not really an active part. You usually end up buying them on the second hand market and I wanted to avoid supply problems. The other thing is that they are great for solving one specific problem where the second port requires just linear access. However some of the features I want to build into the cpu require more random access on the vga side and I didn't want to include multiple systems.
@@reinoud6377 I looked at the data sheet and it seems that the serial FIFO uses a 256 bit wide parallel load mechanism so as to minimize the potential for blocking the CPU.
@@weirdboyjim I can understand that motivation very well. It also leaves me looking forward to seeing what you are planning!
Master of the breadboard. Congratulations!
Thank you!
12:25 depends on where you buy! when i used one of those exact chips in my first crappy card i bought it used for a fraction of the retail price.
the downsides are the size of the chip and the speed. if you need more than ~16kB of VRAM you're gonna need a really big PCB, and at 55ns access time you won't be able to get anything that requires high bandwidth
I’m trying to only use easily available parts from common suppliers.
@@weirdboyjim yea that's reasonable. i feel like i should stop commenting on these older videos and just watch the whole series to maybe get some more ideas for my own Video Card design.
Fantastic work, as always. Now that the clocks are independent, are you considering increasing the CPU's clock speed to get more done in the blanking intervals? If you are, and you go above ~4.7MHz, I'm going to have to find more performance in my emulator, so I selfishly hope the answer is 'no'. :)
I was wondering how you'd handle access to memory, and this wasn't something I'd considered. I was thinking of splitting the frame buffer in half on separate chips that could be enabled independently. So top half of the screen on one, bottom half on the other. But there's probably some issue with that I haven't considered.
Also, please don't apologise for the final demo; a well done surprise like that is always welcome!
Thanks Quxxy, I'll do some tests on frequency at some point, but I think I'm faster already than I really need so it's not a high priority. I calculated a while back that 4MHz was the peak with parts running inside their specification, so I'm not sure going beyond that is a great idea.
Something's not adding up for me. You said in the intro video that you wanted 640x480 with 24-bit color. That works out to 900KB just for the color data, which is many times more than your computer's entire address space. Have you reduced your requirements? Are you planning on moving away from memory-mapped video later? Maybe use bank switching? Or have I completely missed something?
Maybe I could have been clearer in the goals. 640x480 with 24-bit color as a flat bitmap is a massive amount of memory as you correctly ascertain. My goal is that I output 640x480 pixels so I'm not running at a reduced resolution, but I'll be using a combination of tiles, sprites and a palette to both make the memory manageable and the update rate faster.
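To put numbers on that exchange, here is a small Python sketch of the arithmetic. The flat bitmap figure follows directly from the resolution quoted above; the tile map figure is purely an assumption for illustration (8x8 tiles with one index byte per tile), not a description of the final design.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Size of a flat bitmap frame buffer in bytes."""
    return width * height * bits_per_pixel // 8


# Flat 640x480 bitmap at 24 bits per pixel, as asked about above.
flat_bytes = framebuffer_bytes(640, 480, 24)
flat_kib = flat_bytes / 1024

# Hypothetical tile map: 8x8 pixel tiles, one index byte per tile.
TILE_SIZE = 8  # assumption for illustration only
tile_map_bytes = (640 // TILE_SIZE) * (480 // TILE_SIZE)

# A 16-bit address bus can reach at most 64K bytes.
ADDRESS_SPACE = 64 * 1024

if __name__ == "__main__":
    print(f"flat bitmap: {flat_bytes} bytes ({flat_kib:.0f} KiB)")
    print(f"tile map:    {tile_map_bytes} bytes")
    print(f"flat bitmap fits in 64K: {flat_bytes <= ADDRESS_SPACE}")
```

The flat bitmap comes out at 921,600 bytes (900 KiB), far beyond a 64K address space, while a tile index map under these assumptions is only a few kilobytes.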
@@weirdboyjim Ah, okay. Thanks for the clarification.
James, I do believe I have a couple rails of short (1K or 2K long) byte wide IDT FIFOs if you would like some chips. Also, are you considering a RAMDAC such as the Brooktree Bt476 or Bt471? You can even find the Bt476 in 28 pin DIP and it's got everything you need to add a 256 entry palette with 24 bit colors.
Would you be so kind to share part numbers? I've searched a bit for FIFO ICs but found only some really expensive models
I've looked at a few FIFO chips, I haven't planned out that circuit in detail yet but I'll probably take a fresh look at what is available. I made a pure logic fifo for the uart but that wouldn't be practical at scale just in terms of part count.
Nice video!
I was only wondering why you did not consider using a double buffer technique, where you still have two RAM chips (one front buffer and one back buffer).
The VGA would only read from the front buffer and the CPU would read and write to the back buffer.
Once the CPU finishes drawing a screen it would request a buffer swap which the VGA could do in the blanking region. That way there is no timing problem and no problem of handling the write signal.
I think you did something similar with the swappable registers in the CPU.
I've answered that question so many times now, I can only describe so many techniques in the videos I wish I had explicitly covered that (I regarded it as derivative). Double buffering in the way you describe takes more hardware than most people seem to think, you pretty much double the circuit I built. Furthermore 8-bit games get decent update rates by doing incremental updates on the screen which double buffering makes more complex.
Success! ... Oh, thanks for not, uh, giving up on us. 🤣
Hey, I was excited by the Hybrid ++ idea. Sorry, I'm new and trying to get to speed on this project. So, I can't quite get my head around how you'd implement a FIFO queue with low-level logic. Would this be a cascading array of buffers or a small ram chip with a bit of clever counter logic to push and pop?
There are a bunch of ways you can do it. An array of buffers would probably be too much for this, I've calculated 40 entries is the peak you need so that would be 120 8-bit chips. Ram chips are an option but I'd have to solve the read/write multiplexing issue for those as well. There are also hardware FIFO chips that solve this in one go, I might be more willing to use those to demonstrate this as a side project as the fifo implementation isn't the focus. If you are interested, I did a raw logic fifo implementation for the UART - th-cam.com/video/1766wc7rCNg/w-d-xo.html
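For anyone trying to picture the "small RAM plus counters" option from the question, here is a minimal software sketch in Python. It is not the circuit from the video or the UART FIFO: the class name, the depth of 64 (the next power of two above the 40 entries mentioned) and the idea of queuing (address, data) pairs are all assumptions for illustration.

```python
class CounterFifo:
    """FIFO modelled as a small RAM plus a write counter and a read counter.

    Each counter is one bit wider than the RAM address, which lets
    "full" and "empty" be told apart without a separate flag.
    """

    def __init__(self, depth=64):
        if depth <= 0 or depth & (depth - 1):
            raise ValueError("depth must be a power of two")
        self.depth = depth
        self.ram = [None] * depth
        self.write_count = 0
        self.read_count = 0

    def __len__(self):
        return (self.write_count - self.read_count) % (2 * self.depth)

    def full(self):
        return len(self) == self.depth

    def empty(self):
        return len(self) == 0

    def push(self, address, data):
        """Queue a pending video write. Returns False if the FIFO is full."""
        if self.full():
            return False
        self.ram[self.write_count % self.depth] = (address, data)
        self.write_count = (self.write_count + 1) % (2 * self.depth)
        return True

    def pop(self):
        """Remove the oldest pending write, or return None if empty."""
        if self.empty():
            return None
        entry = self.ram[self.read_count % self.depth]
        self.read_count = (self.read_count + 1) % (2 * self.depth)
        return entry
```

In hardware terms, `push` corresponds to the CPU side writing into the RAM and incrementing the write counter, and `pop` to the video side draining entries during blanking. The read/write multiplexing problem mentioned above is exactly the part this software model hides.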
@@weirdboyjim Thanks! I'll go check it out.
I was going to leave a comment bursting with high praise for your efforts thus far, but then I got Rick-rolled and now you can forget it, buster.
Sorry 😜
This is what I needed for the evening 🙂
Hope it lives up to expectations!
@@weirdboyjim hit the spot exactly 😉
That was really impressive! And on the technical side even the rick roll at the end xD
So when you take the graphics card into a pcb, i would like to make the video rom as a separate module, so you can make a separate board for the actual circuitry, the fifo buffer and the existing two port rom. That allows you to not cheat, make a nice improvement later but also have the highest possible cpu speed!
So if i were in your situation and would like to play and develop some games, i would hate myself if i had not used the "super cpu speed option"
I'll try and break the pcb's up into as many modules as possible. The write fifo could be kept very separate though.
@@weirdboyjim Thx for your answer, I cant wait to see all the pcb layouting process and finished pcbs!
Maybe the next big project will be a led monitor from scratch xD
20:30 choose to ignore the contention to update the entire screen at the start of a level: You just need to add a master ENABLE signal, so you can turn off the output (replace the RAM-DAC output with a constant value) then do your update, then re-enable the normal operation.
But, if you are going to double the RAM and have two identical copies, you are *so close* to having a proper double-buffer system! Instead of writing to both, just have complementary enable signals on the bus drivers on each side, so by changing a bit you can exchange which RAM chip is connected to the CPU and which is connected to the VGA. So, you write your updated frame and then output the signal to swap.
I'll likely have a blanking control, but a true double buffering of the type you describe is more complex than that. You basically end up doubling all the multiplexer circuitry which is what I wanted to avoid. I certainly wouldn't want to do that on a breadboard.
You've probably been told this but I think It would be cool to see you and Ben Eater get together on something. A Friendly competition, a large scale project where each of you produce interconnecting "modules" that work together, I'm sure you two could come up with far better ideas lol but even just different ways to achieve the same results would be interesting.
I'd love to, but let's be realistic. I'm a tiny channel compared to Ben so it will be of less first glance interest to regular viewers of his than regular viewers of mine.
FIFO! DO IT PLEASE
We'll see, I want to make a bit more progress before I distract myself but there might be a nice gap to play while I'm waiting for some PCBs to be made.
@@weirdboyjim :)
Pretty cool solution. I really like it. I also guess it is in a spirit of the build. But I wonder if you considered doing double or triple buffering:
1. Shadow RAM (writes from the CPU go there too and reads from the CPU are done from it)
2. Two or three RAMs. One is read by the GPU, the next one is written by the CPU. On horizontal sync, if a new buffer was finished by the CPU (controlled by the CPU), they switch / progress. If the CPU didn't write a new buffer, the current buffer used by the GPU is not changed and continues being current.
If you do not need to read data back from VGA, you can do with just doubling of RAM, but I would go with triple buffering. I think the FIFO approach you mention would be very hard to implement, but the double buffering should be actually quite easy to do, and make software side also easier. I know doublebuffering was not really used commonly in 8 and 16-bit era, but it is worth consideration.
Oh. I found you addressed this in a next video. :)
That kind of double buffering with independent bus driving adds quite a lot of chips, basically doubles the circuit. My plan was to do most stuff with scrolling etc so it was unnecessary. My next build will have true double buffering.
@@weirdboyjim i love scrolling effects, so I think it is more entertaining and interesting build hardware with scrolling. :)
Nice touch at the end there! 🤣 Haven't seen Nyan Cat in ages.
Had to be done!
I know you've already finished, but the alternative I thought of was using some kind of crossover switch, so that one RAM is filled up while the other is being read out.
That is a possibility, but hardware double buffering like that has its own drawbacks. Most 8-bit games updated the screen incrementally rather than redrawing it all each frame, with double buffering you are updating the image from 2 frames ago and it gets more complex.
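The "image from 2 frames ago" point is easy to miss, so here is a tiny Python model of it. The class and method names are invented for this sketch, and a two-byte "screen" stands in for a real frame buffer.

```python
class DoubleBuffer:
    """Two frame buffers with a single select bit, as in a hardware swap."""

    def __init__(self, size):
        self.buffers = [bytearray(size), bytearray(size)]
        self.front = 0  # index of the buffer the display reads

    def back(self):
        """Buffer the CPU draws into."""
        return self.buffers[self.front ^ 1]

    def displayed(self):
        """Buffer the display hardware scans out."""
        return self.buffers[self.front]

    def swap(self):
        """Exchange roles, ideally during vertical blanking."""
        self.front ^= 1


def incremental_update_demo():
    """Draw one new byte per frame without redrawing the rest."""
    screen = DoubleBuffer(2)
    screen.back()[0] = 1   # frame 1: draw object A
    screen.swap()          # A is now visible
    screen.back()[1] = 2   # frame 2: draw object B only
    screen.swap()          # B is visible, but this buffer never received A
    return bytes(screen.displayed())
```

After the second swap the displayed buffer contains B but not A, because the buffer drawn into during frame 2 was last touched two frames earlier. Incremental updates therefore have to be applied to both buffers, which is the extra software complexity described above.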
Best implementation would need Vblank interrupt, but that needs interrupt controller.
Most naive way of implementing it that comes to mind would be to have just NMI. When NMI is asserted allow the pipeline to empty, then push everything to the stack, next go to hardcoded address (interrupt vector) that contains pointer to the ISR, run that and return from the branch by popping stuff from stack.
Then if you need more interrupts you can add more vectors and priority encoder 😁
I talk about what would be needed for interrupts in the design retrospective video. "Best" is a complicated term. Neither interrupts nor the FIFO that everyone seems keen on will actually change a single pixel of my final big demos, they would just make some things easier at the cost of extra circuitry.
@@weirdboyjim well yes, 'best' is relative 😅
I was thinking from programming standpoint, you do stuff that can be done whenever on the main loop, wait for Vblank to blast data into frame memory, maybe update audio at the same time.
For a 'simple' system focused on video and audio using Vblank for timing stuff is 'neat' 😉 but yes, interrupt control logic would be complicated.
@@akkudakkupl The way we did it in the 1980s was to simply poll the VSync line in a loop until the blanking started. No interrupts necessary and you never had to worry about being stuck in the loop as another sync is always coming.
@@Peter_S_ Also a valid way to do it ;-)
James, I’ve been rolling this around in my head since you posted this video, and I think I’ve got an idea that might be worth trying.
Since the issue is that the video RAM needs to provide a byte on every pixel clock, that effectively shuts out the CPU except on retrace intervals. What if you doubled the video RAM width to 16 bits (two RAM chips), so that the video would only need data every other clock? You could use address bit zero from the CPU to drive a MUX to select which of the pair of video RAM chips is accessed. That would allow the 50/50 access for the CPU again. You would have to implement high byte/low byte select on each pixel clock, but that doesn’t seem difficult.
The only thing in the way of this, I think, is a “stall” capability for the pipeline stage zero and the memory bridge... if the CPU tries to access the video RAM on a cycle where the VGA adapter is reading from it, then that cycle needs to be held for one beat while the video access completes.
Simpler than a FIFO, I think... of course, if you have a better plan, then by all means go for it.
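A rough Python model of the idea in this comment, to make the byte steering concrete. Everything here is a sketch under assumptions: the class name is invented, the even/odd split between the two chips follows the "address bit zero" suggestion, and the slot assignment (video on even pixel clocks, CPU on odd ones) is just one possible arrangement.

```python
class WideVideoRam:
    """Two byte-wide RAM chips read together as one 16-bit word by the video side."""

    def __init__(self, size_bytes):
        if size_bytes % 2:
            raise ValueError("size must be even")
        self.even_chip = bytearray(size_bytes // 2)
        self.odd_chip = bytearray(size_bytes // 2)

    def _chip_for(self, address):
        # CPU address bit 0 steers the access to one of the two chips.
        return self.odd_chip if address & 1 else self.even_chip

    def cpu_write(self, address, value):
        self._chip_for(address)[address >> 1] = value & 0xFF

    def cpu_read(self, address):
        return self._chip_for(address)[address >> 1]

    def video_fetch(self, pixel_address):
        """Fetch two pixels at once: (even pixel byte, odd pixel byte)."""
        word = pixel_address >> 1
        return self.even_chip[word], self.odd_chip[word]


def slot_owner(pixel_clock):
    """Who owns the RAM on a given pixel clock in this 50/50 arrangement."""
    return "video" if pixel_clock % 2 == 0 else "cpu"
```

The CPU still sees a plain 8-bit memory, while the video side only needs a fetch every other pixel clock. The stall logic mentioned in the comment would cover the case where the CPU asks for the RAM during a "video" slot.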
There are a large number of ways you can solve this contention and it wasn't possible for me to talk about all of them. Increasing the data bus width in the way you describe is very much aligned with the time division approach but you create the fetch slack with width rather than increased frequency. Either way you end up with a regular time interval where you can switch device. But you would then have twice the bus width to deal with for that switch, memory read and writes to the cpu are currently 8 bit so you need logic to deal with the bus width differently. What you propose is a big step up in component count and complexity from where my circuit is, and that circuit already handles everything I need it to.
@@weirdboyjim fair enough. Just an idea. However, I wasn’t suggesting to change the CPU main memory to 16 bits, just the portion of RAM inside the video adapter. To the CPU, it would still be 8 bits wide with the high/low byte steering logic.
I do understand keeping it simple for breadboarding. Maybe I can try something like I’m thinking in my own lab and get back to you.
46:00 Unsubbed, blocked, reported, and a complaint raised with Ofcom ;)
Seriously though this was fantastic, every minute of it, well done!
It's only a shame the audio circuit was partially disassembled at the time
I’m surprised you didn’t implement a back buffer for this purpose. They work by having V-RAM that’s read/write accessible to the CPU and one that’s read accessible to the rendering hardware. Every vertical blanking interval, the 2 pieces of RAM switch roles, allowing the CPU to write the next frame to what was previously the frame buffer while the previous back buffer is being read. Such a setup could greatly simplify graphical software while eliminating artifacts like tearing and corruption entirely.
Double buffering was an option but I couldn't discuss everything in the time. I had good reasons for disregarding it though, it's more circuitry than perhaps you think and in many of my use cases it becomes an impediment. Filling the screen for every frame is asking a lot of an 8-bit cpu, what you want to be doing is carefully updating only the bits that need it and pulling trickery to appear like you are doing more. Double buffering complicates that further.
Amazing results mate!
Thanks Seon! I was really pleased with how this came out.
Hi James, I really like your video series of the design and build process of this diy computer! One question came to mind though, as you already realized that 25 MHz is already on the edge of reliably working on breadboards and you have to use 2 ram chips instead. The obvious solution would be to use a double-wide memory data bus for the video, halving the frequency at which you need to read memory for VGA out and effectively enabling time-shared access within the timing limits. And other systems did this very early on. The C64 had a 12 bit video memory, the 4 bit color ram was read in parallel to the main ram for bitmap data, the nec uPD 7220 had 16 bit video data bus designed in the late 70s and used on larger Z80 systems.
Indeed there are far more ways of solving this problem than I had time to discuss in the video. Some of the features I want to implement later wouldn't work with a parallel read strategy so I disregarded it as an option to avoid solving the same issue 2 different ways in the build.
I was wondering. It's common practise to use double buffering in software, render a frame and then display it while rendering the next. Could this be applied at a hardware level by effectively switching between a pair of memory chips so that the CPU has uncontended read/write? I appreciate this means a 1 frame delay, but it ought to mean there's no possibility of corrupting due to writes. Of course it does mean that when you swap buffers any reads from the CPU are looking at two frames ago...
That depends on the machine, a lot of early devices didn't have double buffering in hardware so it was done in software.
Another great video and I love the RR at the end. You will be able to do so much more with a VGA output than just a UART & a terminal emulator. My VGA works almost the same way that you implemented, with RAM multiplexing. I only go up to 160x120 though. I run the CPU at 8th VGA speed (approx 3.15 MHz) and don't get any glitches. Synchronising with the GPU and CPU simplifies things greatly and you can fire pixels at it at full speed without waiting for blanking or the VGA. I also use a FIFO for input but its a single level FIFO (i.e. one set of latches for address and data). Can't you use your boot loader to load data straight into the Video memory space over UART, without writing any separate code?
Thanks David. I'll be using early game console style trickery to get a higher visual resolution rather than increasing the frame buffer size. Updating a big frame buffer is a big ask for an 8-bit cpu.
I know I’m late to this party, but why not double buffer? CPU has exclusive access to one buffer, VGA to the other. When CPU is ready, flip the access (ideally during vertical blank), so that VGA is accessing first buffer, CPU second.
If you are building your own circuit then it may well be a legitimate choice. It's not however a "duh, why didn't I think about that", you would need twice the memory chips on the vga circuit, twice the interfacing logic and some additional selection and control logic. It would be a notable improvement to the access simplicity but with significant additional complexity.
I wonder if you could just double the ram and have the cpu write to one while the vga reads from the other, then swap the next frame. Sort of like the technique you used with the virtual registers
That would absolutely work, but it would cost you nearly twice the components. It’s always a balancing act.
Could you have a hardware counter to drive the address lines at the maximum ram clock for a software-configurable range (of lines or pixels) of the video ram? If so, you could have the cpu only read+write from its main memory then have the main memory set to read and the gpu memory to write when the counter is running. The cpu would prepare the picture at any time it likes and then run the counter (which halts the cpu) during vblank. This could even be async, the cpu primes the counter and the counter starts automatically with vblank (or the last hblank to gain some extra time).
With some extra logic, the counter could also do line unpacking into gpu ram to save main memory.
ok, ok, I _am_ suggesting DMA, no way to deny it...
There are a vast number of ways of solving this problem, I was only able to discuss a few on the video. I was going for a good balance between complexity and features but there will always be other ways more appropriate for other use cases.
Thank you
Thanks Paul!
Idea: It would be good if it were possible to write text to the screen. This is atm. not possible because of the 8x8 tiles. But what if you add a spritebuffer that initial contains the graphic of the ascii characters and a screenbuffer which contains which sprite should be shown on which place. Then you have a good ascii-screen (with little ram) but the programmer (you) can overwrite ascii-char-sprites that are not be used with other graphics. So it is also possible to use graphics on the screen without additional chips (except 1 more ram for the sprites)
Long way to go with the project, wait and see ;-)
how about making 2 memory chips one in main memory the other small as a frame buffer and either connect the small memory to main memory only during blanking and make it copy graphical memory area, you would need some kind of address translator a rom or pla which would address reads and writes on 2 memory chips and just page over the data
or you could use blanking signal as an interrupt that would force cpu to update graphical memory as a subroutine and never do it during normal program operation and eliminate pla element or halting the cpu during transferring data all together it can even disconnect frame buffer memory from the main bus when it is done copying the frame
i may be crazy and stupid but that was my first thought when you started explaining possibilities of memory interfacing between 2 devices
There are far more ways of doing this than I can cover in one video. What you suggest has a limitation in that there are more visible pixels than blanking clocks, so you would need ram that could be run faster and in that case it would be easier to just implement the time division method (like the c64 and BBC micro chose to).
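The "more visible pixels than blanking clocks" point can be checked with the standard 640x480 at 60Hz timing (800 pixel clocks per line, 525 lines per frame). The Python sketch below assumes one byte copied per pixel clock and one byte per visible pixel, which is a simplification for illustration rather than a description of the actual frame buffer.

```python
# Standard 640x480@60Hz VGA timing, in pixel clocks and lines.
H_VISIBLE, H_TOTAL = 640, 800
V_VISIBLE, V_TOTAL = 480, 525

visible_clocks = H_VISIBLE * V_VISIBLE           # clocks spent drawing pixels
total_clocks = H_TOTAL * V_TOTAL                 # clocks in a whole frame
blanking_clocks = total_clocks - visible_clocks  # clocks with nothing on screen

# How much faster than the pixel clock the RAM would have to run to copy
# one byte per visible pixel using only the blanking time.
required_speedup = visible_clocks / blanking_clocks

if __name__ == "__main__":
    print(f"visible:  {visible_clocks} clocks")
    print(f"blanking: {blanking_clocks} clocks")
    print(f"speed-up needed: {required_speedup:.2f}x")
```

Under these assumptions a full-frame copy during blanking would need the RAM to run well over twice as fast as the pixel clock, which is why time division ends up being the simpler route.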
I think you can dual-port the ram if you would use input and output buffers for address and data and then run it at twice the pixel clock. Dump what you want into the buffers and clock data whichever way.
EDIT and you just discussed it in the video as multiplexing 😉
I do try and cover the obvious stuff that I'm not going to use, but it's impossible to talk about everything. This video was far too long as it is.
Hmmm, would prefer dedicated Video RAM but I suppose that'll be a later video :)
I do extend things, and the project is still in progress but "Dedicated Video Memory" could mean different things so I'm not sure how to answer. This is Ram dedicated to video.
I know I'm way too late to actually suggest anything but why didn't you go with a double framebuffer approach? CPU accesses one RAM chip, reads and writes whenever it pleases, while VGA reads the other chip. When the frame is updated, swap chips (could even do so without waiting for blanking if you're trying to optimize for quick/low-latency updates, or just swap buffers during v-blank if you want a clean image without tearing). I believe it would be similar (though not exactly the same) as the dual program counter setup for the CPU that you use for function calls.
Weird, 2 comments in quick succession on an old video with a very similar sentiment. Here was my other reply - If you are building your own circuit then it may well be a legitimate choice. It's not however a "duh, why didn't I think about that", you would need twice the memory chips on the vga circuit, twice the interfacing logic and some additional selection and control logic. It would be a notable improvement to the access simplicity but with significant additional complexity.
You could use FPGA in schematic mode to bypass lot of work, when you get it working, replicate with chips. No need to know any VHDL, just draw as you draw :)
Just pause for a minute and look at that mess of pcb's, wires and breadboards evolving on my desk and ask yourself "Is this a man inclined to take shortcuts?".
@@weirdboyjim just trying to be helpful 🙂
Oh please, Verilog not VHDL. 😉
@@weirdboyjim To be fair, I think everything in the 80's other than the ZX80 & 81, used custom chips for video, (PGAs). But building the logic from discrete chips of course keeps the design accessible to those of us struggling to understand what's happening. I've never programmed a FPGA, but I imagine the logic design that goes into them won't be significantly different from using discrete chips. And you're on the slippery slope by using a EEPROM for timing rather than building from logic gates 😁😉
I really enjoy your content (I have been awaiting this video with bated breath and am looking forward to your sprite logic) but I simply can't resist a bit of gentle mockery. To wit: "Dual" not "Duel" ;D
I'm going to make the occasional mistake
@@weirdboyjim Please don't take offence. I meant it purely in good fun and I'm sorry if it didn't come across that way. I really stand in awe of your accomplishment.
Duel RAM = bus contention
@@Peter_S_ OK, That's pretty funny!
@@Peter_S_ Pull up resistors at dawn!
Just awesome work!
Thank you! Cheers!
Extremely cool! And thanks for the rickroll :P Do you think you will make a PCB next, or do you want to build the ++ version you discussed first?
I'm thinking about starting the pcb conversion of the vga fairly early, just to limit the "peak breadboard" issue, I'm low on desk space as you can see.
Nice, I'm interested in seeing that soon! I dislike cluttered workspaces as well.
But it was cool to see how quick you spotted the mistake on the breadboards that caused the horizontal line!
@@weirdboyjim what resolution would you settle on for games? Maybe 160x120x8 would be a nice compromise, or pack 2 pixels per byte and do 320x240x4? That's 38,400 bytes though...
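For reference, the "2 pixels per byte" packing suggested here could look like the following Python sketch. The nibble order (left pixel in the high nibble) and the function names are assumptions for illustration; the byte counts are the ones quoted in the comment.

```python
def pack_pixels(left, right):
    """Pack two 4-bit pixels into one byte, left pixel in the high nibble."""
    if not (0 <= left < 16 and 0 <= right < 16):
        raise ValueError("pixels must be 4-bit values")
    return (left << 4) | right


def unpack_pixels(byte):
    """Split a byte back into its (left, right) 4-bit pixels."""
    return byte >> 4, byte & 0x0F


def packed_framebuffer_bytes(width, height):
    """Frame buffer size at 4 bits per pixel."""
    return width * height // 2
```

With this packing, 320x240 at 4 bits per pixel takes 38,400 bytes, twice the 19,200 bytes of 160x120 at 8 bits per pixel.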
Great video as always, love it. But I don't really get the advantage of the hybrid dual RAM solution. The impact on CPU performance should be the same if not higher as with the Bus Request approach except for reading. But reading from the frame buffer should basically never be necessary. What am I missing?
I think you have misunderstood, the bus request actually stops the processor so for most of the frame the CPU can't do anything. With this hybrid I don't lose anything, I just need to add some feedback timing if I want to avoid the snow on the screen. The cpu reading from video memory is actually quite common but if you didn't need that functionality you could indeed drop the shadow copy.
When he's finished, there's going to be approximately 300K bytes of memory assuming 8 bits of color data per pixel. If he goes to 24 bits of color, call it close to a megabyte. Now his main computer only has 64K of memory. Now he has two main options for programming.
1. Keep track of everything on screen in his main memory, updating the video memory as required.
2. Check the video memory to see what's being displayed, and update as needed.
Given the huge difference in the amount of each type of memory, it would be better to be able to examine the video memory directly. Also given how he's using those counters, suspect he's gonna have 512K of memory of which about 200K won't be accessed by the VGA system, but will be accessible by the CPU.
@@weirdboyjim Thank you for your fast response. I assume the CPU would be stopped until the next blanking period just if it tries to read/write from the frame buffer outside of the blanking period. If the timing is good there should be no big performance penalty. With the hybrid approach, assuming image defects are not allowed, the CPU is also just allowed to write to the buffer during the blanking period. In my understanding both approaches should have similar performance. Also, I could be wrong. Thank you for your videos and keep up the good work.
@@weirdboyjim this is another case where interrupts would be handy. I'll go hide in the corner. :)
I have a Multitech CGA card which somehow solves the RAM contention issue with 8 bit era technology. It uses two HM6264LP-15 chips for a total of 16kb of memory, which is what CGA has. I have no idea how it manages to avoid "snow".
I see from the data sheet that there are two write cycles, but I'm not sure exactly what that is about.
There is no "one true" way to solve this issue, but I know some CGA cards use the time division approach. Their frame buffers were organized such that either framebuffer or character data was read into a shift register for use, so the level of contention was much lower.
@@weirdboyjim Interesting. I had wondered whether they might just be implementing a near perfect solution, rather than a perfect solution. There are a huge number of 74LS logic chips on there (even more than IBM's CGA), so it's doing something. But I didn't find time to trace the circuit to see what that is all for.
I'm curious why double buffering wasn't considered as you're already going with double memory.
Let's be clear, it was considered but rejected. It would have made the doomed demo easier, though for the most part 8-bit era games used scrolling and incremental update as the means to get games updating quickly. Double buffering would have been a more complex circuit (if you want to separate bus access) and is a real pain for anything where you are not going to redraw the entire screen. I have plans for future builds which will include double buffering though, just not this first one.
Beautiful! And (edited) the Rick Astley bit made me laugh… 😂
Thanks Pete!
5:00 rounding up the row and col count to a power of 2: My immediate reaction is "why not _display_ 1024 horizontally?" 1024 by 768 is the full resolution VGA.
I think real cards would not have any trouble having the rows one after another, as they maintained a counter that counted up each pixel rather than having separate counters for row and column.
"full vga" is a fairly arbitrary statement these days although technically everything over 640x480 is svga (or one of the other prefixes). 1024x768 needs a 75mhz pixel clock, no hope of that on a breadboard and I'd need a much more complex parallel memory access system to pull the data out fast enough. There isn't a single "real card" way of doing this, if you use a single counter you need a more complex circuit to handle replication of lines if anything isn't run at full resolution. Display systems designed for gaming tended to lead more towards duel counters for reasons that will become obvious after the next vga video.
If you insist on letter jumbles, that's not VGA but XGA. IBM VGA uses 720x400 (text mode with 9-pixel-wide characters), 640x480 (which became known as VGA) and 640x400 (allowing it to output CGA-like 640x200 and 320x200). IBM 8514 used 1024x768. Plenty of "super" VGA-type boards used much higher resolutions as well, with 1600x1200 being quite common. 1024x768 is pleasingly round, though.
Potential downsides include things like requiring wider pixel counters (the blanking interval must be in there somewhere) and faster pixel clocks (e.g. 65MHz rather than 25.2MHz) or interlacing (still demanding 44.9MHz).
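The pixel clock figures quoted in this thread follow from total pixels per frame (visible plus blanking) times the refresh rate. A minimal sketch, assuming the commonly published VESA frame totals for each mode; the totals and the helper name are assumptions, not values taken from the comments:

```python
def pixel_clock_mhz(h_total, v_total, refresh_hz):
    """Pixel clock in MHz from total (visible + blanking) frame size."""
    return h_total * v_total * refresh_hz / 1e6


# Assumed frame totals, including blanking, for each mode.
vga_640x480_60 = pixel_clock_mhz(800, 525, 59.94)    # about 25.2 MHz
xga_1024x768_60 = pixel_clock_mhz(1344, 806, 60.0)   # about 65 MHz
xga_1024x768_70 = pixel_clock_mhz(1328, 806, 70.0)   # about 75 MHz

for name, mhz in [("640x480@60", vga_640x480_60),
                  ("1024x768@60", xga_1024x768_60),
                  ("1024x768@70", xga_1024x768_70)]:
    print(f"{name}: {mhz:.1f} MHz")
```

On these assumptions the 65MHz and 75MHz figures mentioned for 1024x768 both hold; they correspond to different refresh rates.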
If you ever do need fast RAM, look into SDRAMs. They’re cheaper than SRAMs, very fast, and also quite large. Also they often have parallel outputs. They do need a lot of peripheral components though, for continually refreshing the DRAM.
I'm not actually worried about faster RAM; compared to what people had in the 8-bit era we can get very fast chips now. I'm more interested in using things efficiently. Adding DRAM refresh logic on the breadboard feels like a distraction for this project.
@@weirdboyjim yeah I agree it isn’t a good fit for this project. I was just looking about at RAM solutions for audio DSP when I discovered how cheap SDRAMs could be. At least until I realised the ESP32 had 520kB internally. Bought one anyway, maybe I’ll use it for packet radio or some other very fast sampling situation. I guess a CPU communicating to RAM that spits out its contents as radio is somewhat similar to this project in that dual port would be handy. At least it would be if the radio transmission needed to be in synch with something, which you never know with radio protocols.
We already got Rick Astley, now we need Bad Apple and Doom.
Doom is more of an early 32bit game. I'm worried nobody would trust any video from me now talking about video playback.
In a VGA system I am working on for a homebrew computer, I am using two memories like you, but as a double-buffer-like system: one memory is read by the screen hardware and the other is attached to the CPU, and if you flip the buffer they switch places. I was thinking of making it so that, on the first frame after a flip, the screen memory is copied into the CPU-accessible one.
There are lots of different ways of doing it. No one way is perfect. When you say "copied into" are you talking about a byte by byte copy? How long does that take? The obvious way would be while the front buffer is being read out and so it would presumably take almost 16ms?
@@weirdboyjim Yeah, it would copy the frame buffer byte by byte. No idea how long, but I would assume around 1/60th of a second since I was just going to use the next frame to copy it. Still in the planning stages, honestly.
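The flip-then-copy scheme discussed in this exchange can be modelled in a few lines. This is only a behavioural sketch: the class and method names are hypothetical, and it assumes the copy rides along with the scan-out of the displayed bank, so it takes one frame period at an assumed 60Hz refresh.

```python
class FlippingBuffers:
    """Two banks: one scanned out to the screen, one owned by the CPU."""

    def __init__(self, size):
        self.banks = [bytearray(size), bytearray(size)]
        self.display = 0  # index of the bank the video hardware reads

    @property
    def cpu(self):
        """Index of the bank the CPU can read and write."""
        return 1 - self.display

    def write(self, addr, value):
        self.banks[self.cpu][addr] = value

    def flip(self):
        """Swap roles, then copy the newly displayed bank into the CPU bank."""
        self.display = self.cpu
        self.banks[self.cpu][:] = self.banks[self.display]


def frame_period_ms(refresh_hz=60.0):
    """Time for one frame, which is how long a scan-out-paced copy takes."""
    return 1000.0 / refresh_hz


fb = FlippingBuffers(16)
fb.write(0, 0x7F)
fb.flip()
print(fb.banks[fb.display][0], fb.banks[fb.cpu][0])
print(round(frame_period_ms(), 2), "ms per frame")
```

After the copy the CPU bank matches what is on screen, so incremental updates still work, at the cost of the CPU bank being stale for roughly one frame after each flip.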
Yay more vga shenanigans :D
Hope you enjoyed.
It was surprising to use two RAM chips just for the sake of allowing the CPU to read back video data. Wouldn't it make more sense to use one RAM chip for storing data from the CPU while the other one is being read by the VGA circuit? If we could allow the CPU to swap between RAM chips, then it could freely read and write from the current RAM chip without worrying about what the VGA circuit is doing to the other RAM chip.
You have to look at the decision in context. I didn't add two RAM chips; I added one for the video side and used a portion of the existing RAM I already had to shadow it. To implement the scheme you suggest it would be necessary to modify the existing memory PCB (which was designed with knowledge of what I was intending to do here) and then double up the RAM and multiplexer circuitry that I added in this video for the two halves (plus the additional logic necessary to manage the change). What I've built gets the functionality I want with the lowest circuit complexity. Naturally I could make a better circuit by adding more complexity, but it's always necessary to strike a balance.
How about using a front & back buffer configuration?
That has been discussed heavily in the comments and on discord. In short it adds more circuitry than you think but it also can get in the way of doing incremental updates to the screen buffer which is widely used by 8-bit systems to maintain high update rates.
Came for the VGA circuit...stayed for the Rick Roll. Awesome video!
Thanks! Getting the circuit to the point where I can do some animations was the obvious next step. The animations chose themselves.
I'm designing something which outputs VGA and HDMI; I also decided to double up the RAM.
Hope that works out for you. I wanted to do analogue VGA as it was the technology I grew up with. There are chips that do the hard work of encoding, but that didn't feel right. Essentially the VGA to HDMI cable I'm using to capture footage is one of those chips, I suppose.
Fantastic!
Happy Now ;-)
@@weirdboyjim Indeed I am!
You know you are doing well when you satisfy the SEKs God.
20:38 it's actually called blitting to memory (bit blit)
That's a specialist term that is usually only used for a DMA that is raster layout or pixel format aware.