Speccy owns C64 in 32bit count off

  • Published on 21 Nov 2024

Comments • 22

  • @TheRealWinsletFan 7 months ago +3

    adc #0 is the obvious choice, not the weird choice, IMO.

    • @CallousCoder 7 months ago

      Except then you shouldn't execute clc before the adc #0, and that means the code can't simply be repeated for each byte in self-modifying code or even a subroutine: you would need to execute clc conditionally, since you need to clear the carry at the start of each new loop but leave it untouched when carrying into the next byte.
      This is why I tend to use the clc and adc #1 combination. It is also more explicit that one is added and that the carry is taken into account.
      But I added both versions of the code, thanks to your reply.
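To make the two variants in this exchange concrete, here is a minimal Python model of the 6502 carry logic. It is a sketch of how I read the discussion, not the code from the video or the repo: the function names are made up for illustration, the last list element stands for the low byte (like `bytes+3` elsewhere in this thread), and decimal mode is ignored.

```python
from typing import List, Tuple


def adc(a: int, operand: int, carry: int) -> Tuple[int, int]:
    """Model a binary-mode ADC: return (result byte, carry out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def increment_clc_adc1(counter: List[int]) -> None:
    """Variant 1: clc + adc #1 on each byte, leaving early while carry is clear."""
    for i in reversed(range(len(counter))):        # last element is the low byte
        counter[i], carry = adc(counter[i], 1, 0)  # clc, then adc #1
        if not carry:                              # bcc: no overflow, so we are done
            return


def increment_adc0(counter: List[int]) -> None:
    """Variant 2: one clc + adc #1 on the low byte, then adc #0 on the rest."""
    counter[-1], carry = adc(counter[-1], 1, 0)    # clc, then adc #1
    for i in reversed(range(len(counter) - 1)):
        counter[i], carry = adc(counter[i], 0, carry)  # no clc: the carry ripples up


a = [0x00, 0x00, 0xFF, 0xFF]
b = list(a)
increment_clc_adc1(a)
increment_adc0(b)
print(a, b)  # both become [0, 1, 0, 0]
```

Both variants give the same counter value. The difference is only where the carry has to be cleared: once per increment in the adc #0 version, before every byte in the adc #1 version.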

  • @TheUtuber999 4 months ago +1

    Another way to perform the calculations more quickly would be to measure the elapsed time for incrementing 16 bits, then multiply that by 65536 to extrapolate the time for the upper 16 bits that were not measured explicitly. Here is an example for the C64:
    .C:033c A9 00 LDA #$00
    .C:033e 85 FB STA $FB
    .C:0340 85 FC STA $FC
    .C:0342 E6 FC INC $FC
    .C:0344 D0 FC BNE $0342
    .C:0346 E6 FB INC $FB
    .C:0348 D0 F8 BNE $0342
    .C:034a 60 RTS
    10 ti$="000000":sys828:printti
    ready.
    run
    34
    ready.
    ? 34*2^16/60/60/60"hours
    10.3158519 hours
    Edit: Since the Z80 and Arm calculations were performed using the CPU registers instead of memory, a fairer comparison might be doing something similar with the 6502...
    .C:033c A2 00 LDX #$00
    .C:033e A0 00 LDY #$00
    .C:0340 C8 INY
    .C:0341 D0 FD BNE $0340
    .C:0343 E8 INX
    .C:0344 D0 FA BNE $0340
    .C:0346 60 RTS
    10 ti$="000000":sys828:printti
    ready.
    run
    22
    ready.
    ?22*2^16/60/60/60"hours
    6.67496297 hours
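The extrapolation in this comment can be redone in a few lines of Python. This is only a sketch of the same arithmetic: `extrapolate_hours` is a hypothetical helper name, and it assumes, as the BASIC lines do, that TI counts jiffies of 1/60 s.

```python
JIFFIES_PER_SECOND = 60  # TI advances once per 1/60 s (assumption taken from the BASIC lines)


def extrapolate_hours(jiffies_16bit: int) -> float:
    """Scale a measured 16-bit count time (in jiffies) up to a full 32-bit count, in hours."""
    total_jiffies = jiffies_16bit * 2 ** 16      # the 16-bit loop would run 65536 times
    return total_jiffies / JIFFIES_PER_SECOND / 60 / 60


print(round(extrapolate_hours(34), 4))  # memory-based loop: 10.3159
print(round(extrapolate_hours(22), 4))  # register-based loop: 6.675
```

The extrapolation leaves out the increments of the upper two bytes, which run only once per 65536 counts and so barely change the total.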

  • @Retro_Bebzon 6 months ago +1

    16:05 I think that on the M1's sophisticated operating system, just running a program adds considerable overhead from creating a process, etc. In other words, the ~1 s result you got includes more operations than just adding up to 0xFFFFFFFF. To get a more realistic timing you could, for example, run the whole addition loop 100 times and then divide the time by 100.
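The idea of repeating the loop inside one process and dividing by the repeat count could be sketched like this. It is a Python stand-in, not the ARM assembly from the video: `count_to` and `average_seconds` are hypothetical names, and the limit is kept far below 0xFFFFFFFF so the example finishes quickly.

```python
import time


def count_to(limit: int) -> int:
    """Count from 0 up to limit one step at a time, like the counter loop."""
    n = 0
    while n < limit:
        n += 1
    return n


def average_seconds(limit: int, repeats: int) -> float:
    """Time `repeats` runs inside a single process and return the mean per run."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_to(limit)
    elapsed = time.perf_counter() - start
    return elapsed / repeats   # process start-up and exit are paid once, outside the timer


print(average_seconds(10_000, 100))
```

Because the clock starts after the process is already running, process creation and the exit system call do not show up in the per-run figure.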

    • @CallousCoder 6 months ago +1

      Only slightly more, but yeah. The exit is actually a pretty heavy system call and the start isn’t trivial either.
      And I ran it many times and took the average, which is why I said in the video that it ran 1.39 seconds on average 😉
      Yeah it’s incredible the power we have these days.

  • @GrevDrake 6 months ago +1

    To visualize it, set the start address of "bytes:" to $0400, you'll see the counter in the upper left corner of the screen :)

    • @CallousCoder 6 months ago

      But that would add extra overhead if you wanted to make it show something useful. I don’t know the PETSCII character numbers.

    • @GrevDrake 6 months ago

      @CallousCoder I mean: if you set the address *=$0400 before the part with the 4 bytes, the routine would store and work with the first 4 character positions on the screen. With that, you can actually see how fast the routine is running on a C64 without using a machine code monitor.
      BasicUpstart2(main) // 10 sys4096
      *=$1000
      main:
      loop:
      clc // clear carry flag
      lda bytes+3 // read byte 3
      adc #1 // add 1 (if a = ff and 1 is added, a is set to zero, which sets the carry flag and the zero flag)
      sta bytes+3 // write to byte 3
      bcc loop // if carry flag is not set then go to loop
      // else continue
      clc // clear carry flag
      lda bytes+2 // read byte 2
      adc #1 // add 1 (if a = ff and 1 is added, a is set to zero, which sets the carry flag and the zero flag)
      sta bytes+2 // write to byte 2
      bcc loop // if carry flag is not set then go to loop
      // else continue
      clc // clear carry flag
      lda bytes+1 // read byte 1
      adc #1 // add 1 (if a = ff and 1 is added, a is set to zero, which sets the carry flag and the zero flag)
      sta bytes+1 // write to byte 1
      bcc loop // if carry flag is not set then go to loop
      // else continue
      clc // clear carry flag
      lda bytes+0 // read byte 0
      adc #1 // add 1 (if a = ff and 1 is added, a is set to zero, which sets the carry flag and the zero flag)
      sta bytes+0 // write to byte 0
      bcc loop // if carry flag is not set then go to loop
      // else continue
      rts // end program
      *=$0400
      bytes:
      .byte $00,$00,$00,$00 // display the bytes on the screen while they are going from 00 to ff
      // it's better to use $fb to $fe (zero page) for this

    • @GrevDrake 6 months ago

      @CallousCoder Me neither, at least not off the top of my head, but you'd see the counter running on the screen:
      *=$0400
      bytes:
      .byte $00,$00,$00,$00
      You set the first 4 bytes of the screen to 0, then use these addresses for the counter, so you see each number increase while it's running :)
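As a rough picture of what those four screen cells would show, here is a small Python sketch that maps counter bytes to characters. It assumes the default uppercase/graphics character set and covers only a few well-known screen codes (0 is @, 1 to 26 are A to Z, 32 is space, 48 to 57 are the digits). Everything else is shown as "?", and the helper names are made up for illustration. Note that values written straight into screen memory are screen codes, which are not the same as PETSCII codes.

```python
from typing import List


def screen_code_to_char(code: int) -> str:
    """Translate a small, well-known subset of C64 screen codes to characters."""
    if code == 0:
        return "@"
    if 1 <= code <= 26:
        return chr(ord("A") + code - 1)
    if code == 32:
        return " "
    if 48 <= code <= 57:
        return chr(code)          # digits share their ASCII values
    return "?"                    # graphics and other symbols are not modelled here


def render_counter(counter_bytes: List[int]) -> str:
    """Show what the four cells starting at $0400 would roughly look like."""
    return "".join(screen_code_to_char(b) for b in counter_bytes)


print(render_counter([0x00, 0x00, 0x00, 0x00]))  # @@@@
print(render_counter([0x00, 0x00, 0x01, 0x30]))  # @@A0
```

So the counter shows up as rapidly changing characters rather than readable hex digits, which is enough to watch the speed of each byte.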

  • @Ayush_Story_Tv 6 months ago +1

    Where are you from, bro? I am from India 🇮🇳♥️

    • @CallousCoder 6 months ago +1

      I’m from hell mwahhahahahahaaaahaaa 😜 I’m Dutch

    • @Ayush_Story_Tv 6 months ago +3

      @CallousCoder Brother, you are very good, keep it up, your channel will definitely grow.

  • @tww5773 7 months ago +2

    Out of curiosity, you could try this one; it would be cool to see if the C128 could cut the time in half:
    .const ZP = $fb
    :BasicUpstart2(Main)
    Main:
    sei // Disable interrupts (CIA IRQ steals R-Time)
    lda #$0b
    sta $d011 // Turn off screen to remove badlines
    // inc $d030 // Make use of that untapped C128 power (Disabled for now)
    lda #$00
    sta ZP
    sta ZP + 1
    sta ZP + 2
    sta ZP + 3 // Reset counter/number to zero

    Loop:
    inc ZP + 3
    bne Loop
    inc ZP + 2
    bne Loop
    inc ZP + 1
    bne Loop
    inc ZP + 0
    bne Loop
    lda #$1b
    sta $d011 // Turn the screen back on
    // dec $d030 // Switch back to 1 MHz Mode
    cli // Reenable interrupts
    rts
    There might be some gain from relocating the code to ZP, but it's not immediately obvious.
    EDIT: I also guess the CIA timers can be chained to make a 32-bit counter, so then each "inc" would take 1 cycle :)
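For a rough idea of how long the inc/bne loop above should take, the cycles can be counted in Python. This is an estimate under stated assumptions, not a measurement: it uses the standard 6502 timings (INC zero page 5 cycles, BNE 3 cycles when taken within the same page, 2 when not taken), assumes a PAL clock of 985,248 Hz, ignores the setup code and the rts, and assumes no cycles are stolen because interrupts and the screen are off. The function and constant names are hypothetical.

```python
INC_ZP_CYCLES = 5        # inc with a zero-page operand
BNE_TAKEN_CYCLES = 3     # branch taken, no page crossing (assumed)
BNE_NOT_TAKEN_CYCLES = 2
PAL_CLOCK_HZ = 985_248   # assumption: PAL C64; an NTSC machine runs slightly faster


def inc_loop_cycles(num_bytes: int) -> int:
    """Total cycles for the chained inc/bne loop to wrap a num_bytes-wide counter."""
    total = 0
    for level in range(num_bytes):                 # level 0 is the lowest byte
        executions = 256 ** (num_bytes - level)    # how often this inc/bne pair runs
        not_taken = executions // 256              # the byte wrapped to zero
        taken = executions - not_taken
        total += (executions * INC_ZP_CYCLES
                  + taken * BNE_TAKEN_CYCLES
                  + not_taken * BNE_NOT_TAKEN_CYCLES)
    return total


cycles = inc_loop_cycles(4)
hours = cycles / PAL_CLOCK_HZ / 3600
print(cycles, round(hours, 2))  # 34477639423 cycles, about 9.72 hours
```

Under the same assumptions, doubling the clock (as the commented-out `inc $d030` line is meant to do on a C128) would simply halve the estimate.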

    • @CallousCoder 7 months ago +1

      Great suggestions! It’s a shame I sold my C128 recently 🙄 I had no nostalgia for it, but now it would’ve been nice to have it. But luckily there are emulators 😉

    • @CallousCoder 7 months ago +1

      This should be faster for sure, as we just do an inc.

    • @CallousCoder 7 months ago +1

      Oh wow, I thought fd and fe were the only two available ZP addresses. I saw your code and looked up the zero page memory addresses, and there are indeed 4! I totally did not recall that :D

    • @CallousCoder 7 months ago +1

      I added your inc code to the repo too (without disabling the CIA) and am running it currently. It should be faster for sure, not only because of the ZP but also because it uses the inc alone.

    • @tww5773 7 months ago

      @CallousCoder For science! 😁