When I read "compilers speed tests" I nearly spat my coffee on my keyboard!
Thanks for watching, and I'm sorry for the mess.
With the Basic Boss, using the directives, I achieved a TIME: 1.26 secs.
The only thing faster is self-written, optimized assembly code. :-)
(I originally wrote this in German; I only noticed a little later that this is an English-language channel.)
I did do a hand-optimized ML version: th-cam.com/video/CV7CQZppeyE/w-d-xo.html
Excellent demonstration! 👍
"Thanks for watching" 😉 "and be careful out there" 😈
ok i hit like!
😂
Not a sorting routine, but here's a video from "8-Bit Show & Tell" where he optimizes a one line program in ever quicker stages on the C64. It really shows how useful optimizing your code can be: th-cam.com/video/B-Cky_2l11U/w-d-xo.htmlsi=fkxZ0IT7hw2E0Qzj A good watch if you're into this sort of thing.
It would be interesting to see a comparison of hand written assembly qsort routines.
Here it is: th-cam.com/video/CV7CQZppeyE/w-d-xo.html
I bought this BITD too, and never bothered with the type-in from Gazette. I never got more than a 3x increase using the Abacus product on the Rectan 3-D surface plotting program by Tim Colvin, from Compute May '84. With a lot of SIN, COS, and float->integer conversions, it clearly taxed the capability of the compiler, which still had to call the transcendental and floating-point math functions in the ROM.
On a 386SX PC the sin function was also slow, so I put the value for every degree in a table. That made each call just a memory read, which gave a huge speedup.
Indeed @@RazzFazz-Race, when in doubt, look it up. One place where the (addr),Y mode in assembly really puts the 6502 in its best position. Cheers.
Wow, I only knew the Austro Speed BASIC compiler. I never would have expected such speeds on 256 numbers, not even from compiled BASIC.
Oh God I'm not sober enough for this
Wonder how quick the C64 would be with this sort programmed entirely in 6510 code.
A fraction of a second (I created an ML version of this demo), it's what you see running in the beginning albeit with a swap counter slowing it down so you can see it working.
@@retrooldguy9880 Cool! but how fast was it?!
1/10 of a second or less depending on the randomness of the data.
Pure ML version: th-cam.com/video/CV7CQZppeyE/w-d-xo.html
You should have used the Sprint version for the benchmark. It didn't need to do any expression parsing, while the original did.
There's no option in Advanced Options to return to the main menu, lol. RETURN is it; trial and error.
Yes, RETURN will take you back to the previous menu
Well, quicksort is great for random lists, but for nearly sorted lists bubble sort tends to be faster.
Some things come to mind about your ML example. Did it use direct access to the screen buffer, or did it go via the KERNAL print routine? Maybe the BASIC code did the same, because I suppose the printing itself was included in the benchmark. I also wonder whether it would be possible to create a compiler that translates everything into pure assembly, even using an optimized library for everything that is otherwise called in the BASIC ROM. Because I suppose every BASIC command is still being processed in the BASIC ROM, and only the interpretation step is left out?
Good question. The BASIC program is POKE/PEEKing screen RAM directly, so it follows that the compiled programs should also. I disassembled the compiled program and found this bit of code:
213e: a9 00 lda #$00 ; screen RAM low
2140: 85 47 sta $47
2142: a9 04 lda #$04 ; screen RAM high
2144: 85 48 sta $48
2146: a5 47 lda $47
2148: 8d 3c 03 sta $033c
The screen address is being stored in zero page and other code is reading/writing through it, so I'm going to say yes, it does manipulate screen RAM directly. It's difficult to follow, as the compiled programs have a "runtime" library attached to them. There is a link in the description where you can download the files, if you want to experiment.
Yes, it would be cool if a compiler could produce a pure assembly program with no runtime library. I do have hand-optimized machine language versions of various sort routines which will run in fractions of a second. It would make for a short video of speed tests :)
I think switching to C would be a better way to gain more speed... C isn't that hard to learn if you have already mastered BASIC and, for example, Pascal.
Statically declared types like those in C/C++ are mostly what's behind the massive speed differences...
I wrote in BASIC starting at ~6 years old on the C64, switched to Turbo Pascal 3.0 and higher on the PC when I was around 10 years old, and at 11 years old my stepdad gave me a copy of Turbo C and a big book that teaches C to beginners, and after a few weeks of reading the book and learning C from it, I could write the same programs I did in BASIC and Pascal.
The things that were the hardest to get used to coming from BASIC & Pascal were pointers, the header files/libraries (even for simple things like math, strings and IO) and dynamically allocating memory with malloc() etc...
I wrote a vectorial drawing program (arrow key controlled cursor, add lines, boxes, circles, flood fills etc... as well as zooming and panning) in Turbo Pascal when I was approx 10-11 years old, with a bit of help from my stepdad, and after switching to C I rewrote it in C and the speed difference was astronomical.
I've no experience with C on the C64 though, so I couldn't say whether the few C compilers for the C64 used the KERNAL ROM routines or not (probably not; maybe they have some but not all mappings from the standard C library to C64 KERNAL routines)...
@@HPPalmtopTube A compiled C program will be faster than an interpreted BASIC program. The same might apply versus a compiled BASIC program, because of the overhead and the limits the language itself imposes, which the compiler probably can't optimize away. But you won't beat a program written directly in assembly with a C version on such machines, because assembly still allows more freedom, and this machine is quite limited for a C compiler, so you probably won't even get a good optimizing compiler for it.
In the PC world it took until around 1994 for Microsoft, for example, to have a 32-bit C compiler that could optimize reasonably well. The proof is the performance improvement of Windows NT 3.5 over Windows NT 3.1, which was achieved mainly through a significantly better C compiler.
@@OpenGL4ever I'm a C/C++ programmer for a living, so it's a bit redundant explaining all this to me... (I wrote OctaneRender before I sold it.) Didn't know about the NT 3.1/3.5 compiler-related speedup though...
@@HPPalmtopTube Okay, in that case everything should be clear. Unfortunately, on the Internet you can't tell from the nickname whether the other person knows all this, which is why I wrote the above.
Quick Sort will always be O(n log n) no matter how much you optimize it.
That test is not about optimising qsort. Qsort is just a simple task to test optimising compilers. Why make that comment?
@@oberguga You didn't watch the video? Like around 13:58
@@TheStuffMade Oh, you're talking about a more efficient BASIC routine for quicksort. That's such a minor moment I don't even remember it ))
O notation ignores constants and compares only the asymptotic behaviour of algorithms.
But on a small number of elements an O(n²) algorithm can sometimes beat an O(n log n) algorithm, so it is perfectly reasonable to ask for a more performant routine even if it has similar asymptotic complexity.
@@oberguga That sounds like a lot of cope. Maybe you just want to apologize for your rude reply, and then we can part ways.
Quicksort can devolve into O(n²) performance with a pathological dataset. If you're not careful about picking your pivot value, an already sorted list is a pathological dataset.
How would the result compare to a pure machine-code program?
An ML version takes just a fraction of a second. I created an ML version of this demo; it's what you see running at the beginning, albeit with a swap counter slowing it down so you can see it working. 1.5 seconds is pretty damn good for the compiler.
@@retrooldguy9880 very cool. 👍
Pure ML version: th-cam.com/video/CV7CQZppeyE/w-d-xo.html
those first two file names @9:05 are bs