UPDATE: I updated the code to optimize the routine that copies the buffer into color memory. Thanks to @ArneChristianRosenfeldt for pointing out some naive parts of my previous implementation.
NOTE on color buffer:
In the video, I forgot to explain why, during the 7 pixels of smooth scrolling, I copy the color memory into a buffer instead of shifting it in place at the eighth pixel. The reason is that an in-place shift would force me to scroll the colors from bottom to top. If I went from top to bottom in place (row 1 into row 2, then row 2 into row 3, etc.), copying row 1 into row 2 would overwrite row 2, so the next step (2 -> 3) would copy incorrect data! If I start from the bottom instead (row 22 into row 23, then row 21 into row 22), there is no problem.
You might wonder: why is it not okay to copy from bottom to top? The reason is that the raster beam is like a tsunami that rushes like the wind! When it starts drawing the background again, the CPU has not yet finished scrolling all the colors. However, if I scroll them from top to bottom and start ahead of the raster beam (when it is at the beginning of the bottom border), the first rows the beam encounters are already done, and the CPU still manages to finish the last row before the beam reaches it.
This technique is called "racing with the raster beam". It does not work if we reverse the order of the color scrolling, because the first row reached by the raster beam would be the last one updated by the CPU!
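The overwrite problem described above can be reproduced with a small sketch. This is Python rather than 6502 assembly, and it is only an illustration: the function names and the 5-row stand-in for color memory are made up for the example, not taken from the video's source code.

```python
def shift_down_top_first(rows):
    """In-place shift by one row, starting at the top (row 0 -> row 1, ...)."""
    rows = list(rows)
    for r in range(len(rows) - 1):
        # rows[r] was already overwritten in the previous step,
        # so the same data keeps getting copied downward.
        rows[r + 1] = rows[r]
    return rows


def shift_down_bottom_first(rows):
    """In-place shift by one row, starting at the bottom."""
    rows = list(rows)
    for r in range(len(rows) - 2, -1, -1):
        rows[r + 1] = rows[r]
    return rows


def shift_down_via_buffer(rows):
    """Top-to-bottom shift that reads from a copy taken beforehand."""
    buffer = list(rows)      # stand-in for the color buffer
    rows = list(rows)
    for r in range(len(rows) - 1):
        rows[r + 1] = buffer[r]
    return rows


screen = ["A", "B", "C", "D", "E"]   # hypothetical 5-row color memory
print(shift_down_top_first(screen))     # ['A', 'A', 'A', 'A', 'A']
print(shift_down_bottom_first(screen))  # ['A', 'A', 'B', 'C', 'D']
print(shift_down_via_buffer(screen))    # ['A', 'A', 'B', 'C', 'D']
```

Only the buffered version gives the correct result while still writing the rows from top to bottom, which is the order needed to stay ahead of the beam. Row 0 is left unchanged here; in a real scroller it would receive the newly revealed row.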
Thank you so much for the English language version. You are just brilliant at explaining Commodore 64 programming.
If you like these videos, please share! Thanks!
A very good explanation.
Thanks a lot for the time and effort you put into these tutorial-videos!
If you like these videos, please share them and you will help me produce more! Thank you!
For many years I used smooth scrolling on the C64 screen, but then a UFO beamed me up and ejected me on the x86 alien planet with a new command prompt. After learning the alien language I made a new vertical smooth scroller for the VGA text screen with 8x16 font size, for up and down scrolling.
If I recall correctly, on the Amiga you had to create 2 additional screens for smooth horizontal scrolling,
one on either side of the main display, in order for the hardware scroll to work (all 3 displaying the same image).
I once tried one screen which was 16 pixels larger on both sides, with both sides hidden behind the border. When it scrolled and a fresh column of 16x16 was drawn behind the border, the scroll went crazy.
I don't know how the Amiga works, I never had one. We'll see with the Commodore 64...
The Amiga documentation talks a lot about modulo. The screen can have modulo 320px or 640px, but I could not find out what a bitplane can have .. or a playfield. The blitter can deal with any modulo in all channels.
Why would you need something on “either side”? Instead of borders intruding on the screen, on the Amiga you have a bit to load a word early (but then smooth scroll delays it), so I guess they mean that you can smooth scroll the whole range 0..16, though 0=16? Or do you have to set an additional bit when you smooth scroll 1-15?
EGA and VGA with their dedicated memory just always loaded more pixels. Chunky VGA would throw away only 4 of them. 8 bit per plane EGA would only throw away 4 bytes.
Am I correct in saying this approach works because the screen will scroll regardless of player input? Conversely, if the camera is directly or indirectly controlled by the player, the screen must be scrolled on command. What are solutions for this condition?
Edit: I thought about it a bit. If a hardware scroll gets close to 0, it can start to build the backbuffer preemptively, even if it is never realized due to the camera being moved in the opposite direction before the buffers should be swapped.
In my case, the background graphics can change. I guess I need to ensure changes get propagated to both the front and back buffer.
The approach works even if the scrolling is totally controlled by the user, also because this example scrolls in only one direction (you can download the demo and check the result). However, in the video I made about horizontal scrolling, I added the ability to go back and forth. In that case it works because when I go back, if I don't have enough pixels left to fill the backbuffer, I update what's left of the backbuffer at the last pixel. This could be expensive, but it only happens when changing direction and is not noticeable (as you can see from the downloadable demo). There are also alternative approaches, like the one I used in my video on 8-way scrolling, which is used by many games like Turrican.
Yes, if the background changes you must update it carefully. In my game Planet Balls, I have free-directional scrolling with double buffer and the background graphics change (since there are a maximum of 12 balls, 7 are hardware sprites and 5 are "software" sprites made with redefined characters).
I wonder if the repeated
lda {ColBuf}+40*0,x
sta {COLMEM}+40*0,x
inx
could be generated at runtime, like we do for webpages or Wolf3d did for the column renderer.
Also why waste 2 cycles on inx here, but not here:
lda {VIDMEM0}+40*0,x
sta {VIDMEM1}+40*1,x
lda {VIDMEM0}+40*1,x
sta {VIDMEM1}+40*2,x
Ah, got it. The second example does not fit into one byte. But a general generator would probably generate code without INX for both.
I wonder why you don't align to 256-byte pages for the color buffer scrolling to save another cycle? Like you would copy 216 bytes inside a page and 40 bytes across a page, with x starting at 0 again to prevent the carry. Is the 6 important here? This does not seem to be Sonic the Hedgehog. You scroll max 8 pixels per frame.
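To make the alignment point concrete, here is a rough Python sketch of the page-crossing rule on the 6502: `lda abs,x` takes 4 cycles, plus 1 when base + x lands in a different 256-byte page than the base address, while `sta abs,x` always takes 5. The two base addresses are hypothetical, picked only to show an aligned and a badly aligned buffer.

```python
def crosses_page(base, index):
    """True if base + index is in a different 256-byte page than base."""
    return (base >> 8) != ((base + index) >> 8)


def lda_abs_x_cycles(base, index):
    """Cycles for one 'lda base,x': 4, plus 1 if the access crosses a page."""
    return 4 + (1 if crosses_page(base, index) else 0)


def total_read_cycles(base, count):
    """Cycles spent on the loads alone when x runs from 0 to count - 1."""
    return sum(lda_abs_x_cycles(base, x) for x in range(count))


ALIGNED_BUF = 0xC000     # hypothetical buffer address on a page boundary
UNALIGNED_BUF = 0xC0F0   # hypothetical buffer address 16 bytes before one

print(total_read_cycles(ALIGNED_BUF, 240))    # 960
print(total_read_cycles(UNALIGNED_BUF, 240))  # 1184
```

In this example the unaligned buffer costs 224 extra cycles per 240-byte block, because every access past the first 16 bytes crosses into the next page.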
Good points. If you try to replace lda/sta/inx with something like:
scroll:
lda {ColBuf}+40*0,x
sta {COLMEM}+40*0,x
lda {ColBuf}+40*1,x
sta {COLMEM}+40*1,x
lda {ColBuf}+40*2,x
sta {COLMEM}+40*2,x
lda {ColBuf}+40*3,x
sta {COLMEM}+40*3,x
lda {ColBuf}+40*4,x
sta {COLMEM}+40*4,x
lda {ColBuf}+40*5,x
sta {COLMEM}+40*5,x
lda {ColBuf}+40*6,x
sta {COLMEM}+40*6,x
...(repeat until line 22)...
inx
cpx #40
bne scroll
It shows artifacts. The reason is that you copy by columns and not by rows. Copying by columns is bad because you can't race the raster beam: the main reason I need the ColBuf is that it allows me to scroll the colors from top to bottom (I've posted details in my pinned comment). I forgot to mention this important fact in the video, my fault. Btw, we can reduce cycles like the following:
ldy #0
ldx #0
scroll:
lda {ColBuf}+40*0,x
sta {COLMEM}+40*0,x
lda {ColBuf}+40*0+1,x
sta {COLMEM}+40*0+1,x
lda {ColBuf}+40*0+2,x
sta {COLMEM}+40*0+2,x
lda {ColBuf}+40*0+3,x
sta {COLMEM}+40*0+3,x
lda {ColBuf}+40*0+4,x
sta {COLMEM}+40*0+4,x
lda {ColBuf}+40*0+5,x
sta {COLMEM}+40*0+5,x
lda {ColBuf}+40*0+6,x
sta {COLMEM}+40*0+6,x
lda {ColBuf}+40*0+7,x
sta {COLMEM}+40*0+7,x
ldx {add8},y ; lookup table to add 8 to x
iny
cpx #240
bne scroll
add8: !byte 8, 16, 24, 32, ....., 240
Finally, yes, it's definitely better to align the color buffer to a 256-byte page boundary to avoid the extra penalty of crossing the boundary. I will update the code. Thanks for pointing these out.
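As a rough check of the saving, the Python sketch below builds the full add8 table and counts cycles for the two variants discussed in this thread. It uses the standard 6502 timings (lda abs,x 4, sta abs,x 5, ldx abs,y 4, inx/iny 2, cpx #imm 2, taken bne 3) and assumes no page crossings; the variable names are made up for the example.

```python
# Cycle costs of the 6502 instructions involved (no page crossing assumed).
CYCLES = {
    "lda abs,x": 4,
    "sta abs,x": 5,
    "ldx abs,y": 4,
    "inx": 2,
    "iny": 2,
    "cpx #imm": 2,
    "bne taken": 3,
}

UNROLL = 8    # bytes copied per loop iteration
TOTAL = 240   # the loop ends when x reaches this value

# Full contents of the add8 table: entry y holds the next value of x.
add8 = list(range(UNROLL, TOTAL + 1, UNROLL))

copy_pair = CYCLES["lda abs,x"] + CYCLES["sta abs,x"]

# Variant from the question: lda / sta / inx repeated for every byte.
per_byte_with_inx = copy_pair + CYCLES["inx"]

# Variant from the reply: 8 pairs, then ldx add8,y / iny / cpx / bne.
per_iteration = (UNROLL * copy_pair + CYCLES["ldx abs,y"] + CYCLES["iny"]
                 + CYCLES["cpx #imm"] + CYCLES["bne taken"])
per_byte_unrolled = per_iteration / UNROLL

print(len(add8), add8[:4], add8[-1])                        # 30 [8, 16, 24, 32] 240
print(per_byte_with_inx, per_iteration, per_byte_unrolled)  # 11 83 10.375
```

The table lookup costs 6 cycles per 8 bytes (ldx plus iny), against 16 cycles for eight inx instructions, which is where the saving comes from.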
Done. Source code updated. Thanks again.
@@agpxnet ah, so it would have to be done in two parts at least (top and bottom). So using the full pages the unrolled loop would only do two copies. With more unrolling, x never becomes large and carry doesn’t happen often.
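Why the write order matters when racing the beam can be shown with a simplified timing model. The Python sketch below is not a measurement: it assumes 10 CPU cycles per copied cell, 504 cycles per character row on screen (8 raster lines of 63 cycles, ignoring cycles stolen by the VIC-II), a head start of 7056 cycles (112 raster lines), and a 23-row scroll area starting at the first visible row.

```python
ROWS, COLS = 23, 40
CYCLES_PER_CELL = 10      # assumed cost of copying one color cell
CYCLES_PER_ROW = 504      # assumed beam time per character row (8 * 63)
HEAD_START = 7056         # assumed cycles before the beam reaches row 0


def finish_times(order):
    """Cycle at which each row receives its last write."""
    done = [0] * ROWS
    for i, (row, _col) in enumerate(order):
        done[row] = (i + 1) * CYCLES_PER_CELL
    return done


def late_rows(order):
    """Rows the beam reaches before the CPU has finished writing them."""
    done = finish_times(order)
    return [r for r in range(ROWS)
            if done[r] > HEAD_START + r * CYCLES_PER_ROW]


top_down = [(r, c) for r in range(ROWS) for c in range(COLS)]
bottom_up = [(r, c) for r in reversed(range(ROWS)) for c in range(COLS)]
by_columns = [(r, c) for c in range(COLS) for r in range(ROWS)]

print(late_rows(top_down))    # []
print(late_rows(bottom_up))   # [0, 1, 2]
print(late_rows(by_columns))  # [0, 1, 2, 3]
```

Under these assumptions the whole copy takes 9200 cycles, more than the head start, yet the top-down order never falls behind because each row is complete before the beam arrives. The bottom-up and column orders finish the top rows too late, which would show up as artifacts at the top of the screen.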
Hi there! I cannot seem to open the ScrollDown.gmk64 file, which Assembler is it compatible with?
Hi, you can open it with a free program I made called "C64 Graphics Maker", which is a graphics editor for the C64. From there you can export "data" in a format suitable for your project. You can find it here (with other software of mine): agpx.itch.io/
Hi, I was more after the scrolling assembly code to accompany the graphics. Could you please provide the code for scrolling the graphics vertically? Thanks
The assembly code is inside the Scrolling.bas file (inline assembly).
What I like about this method is that it works even better on TED (Plus/4). There you have a 1.8 MHz CPU in the lower/upper border and 4 useful pages for both the characters and colors. So for all-direction scrolling we can spend a lot of memory and have 3 other pages half prepared if the camera moves close to a point where both scrollX and scrollY can wrap around... We should really squeeze every cycle out of the memCpy to at least bring this down to filling thirds of the other page with the scrolled versions. Then we only have to copy 3/2 characters. This is probably the reason why many games have a character status bar at the bottom: it just reduces the character count in the scrolling area enough to make it work. So there is no need to use sprites for this as in non-scrolling games. I admired the status overlay using sprites on NES, but really it is just due to the limitation of the hardware. If you scroll on NES, the status appears in the playfield. EGA is similarly stupid. The status screen at 0 can appear on the lower edge of the screen, but then you have to place your Xenon2 playfield beside it and waste half of the video memory .. although EGA would store sprites there, like in a time when the ISA bus was considered fast and games actually read from VRAM ...
the ai voice is awful :(
Sorry about that, mine is worse. I've published subtitles.