I liked how you put Rust before Odin in the resulting graph, even though its implementation is slightly faster. Fair, we all know the pain of fighting with the Rust compiler)
Rust doesn't actually put the const array on the stack. Instead, it embeds it in the binary directly and loads just the value you want. You probably copied it somewhere, which is what actually put it on the stack.
21:35 Also, in Rust you can declare a function argument as mutable directly in the signature:
(..., mut occupancy: u64)
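A minimal sketch of the `mut` parameter point. The function names and the mask values are made up for illustration and are not taken from the engine in the video. Both functions behave the same; the first one just skips the extra local variable:

```rust
// Hypothetical helpers, not the engine's real code.

// `mut` on the parameter makes the local binding mutable.
// The caller's value is untouched because u64 is passed by value.
fn mask_with_mut_param(mut occupancy: u64, mask: u64) -> u64 {
    occupancy &= mask;
    occupancy
}

// Equivalent version that copies the argument into a mutable local first.
fn mask_with_local(occupancy: u64, mask: u64) -> u64 {
    let mut local_occupancy = occupancy;
    local_occupancy &= mask;
    local_occupancy
}

fn main() {
    let occupancy: u64 = 0xFF00_0000_0000_00FF;
    let mask: u64 = 0x0000_0000_0000_00F0;
    // Both variants must agree.
    assert_eq!(
        mask_with_mut_param(occupancy, mask),
        mask_with_local(occupancy, mask)
    );
    println!("{:#x}", mask_with_mut_param(occupancy, mask));
}
```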
How would I copy a 64 x 4096 array in a function? I literally copied the code from Go into Rust and just fixed the errors. There wasn't a stack overflow in any of the other languages and the code was basically identical.
I can't explain why I was getting a stack overflow at all. I even tried just this:
pub fn get_rook_attacks_fast(starting_square: i32, mut occupancy: u64) -> u64 {
    println!("called rook attacks");
    let converted_starting_square: usize = starting_square as usize;
    occupancy &= constants::ROOK_MASKS[converted_starting_square];
    occupancy *= constants::ROOK_MAGIC_NUMBERS[converted_starting_square];
    occupancy >>= 64 - constants::ROOK_REL_BITS[converted_starting_square];
    let converted_occupancy: usize = occupancy as usize;
    println!("before return");
    return constants::ROOK_ATTACKS[converted_starting_square][converted_occupancy];
}
fn main() {
    set_starting_position();
    print_board();
    println!("after board");
    let rook_attacks: u64 = get_rook_attacks_fast(0, 0);
}
Output:
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.81s
Running `...rust\ChessEngine\main\target\debug\main.exe`
Board:
BR BN BB BQ BK BB BN BR
BP BP BP BP BP BP BP BP
__ __ __ __ __ __ __ __
__ __ __ __ __ __ __ __
__ __ __ __ __ __ __ __
__ __ __ __ __ __ __ __
WP WP WP WP WP WP WP WP
WR WN WB WQ WK WB WN WR
White to play: true
Castle: true true true true
ep: 65
ply: 0
after board
called rook attacks
before return
thread 'main' has overflowed its stack
I tested exactly this just now and still get a stack overflow. Running in release was the only fix.
@CodingWithTom-tn7nl I've had a go at using rust out of curiosity, but decided the gain was not worth the pain. I'll stay with C++.
this is what i was gonna say
1. You can make the stack bigger
2. You can run with optimization, but still including debug symbols
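A rough sketch of point 1, assuming you are fine with running the work on a second thread: `std::thread::Builder::stack_size` lets you pick the stack size for a spawned thread. The helper name, the 32 MiB figure and the 4 MiB test array are all made up for illustration. For point 2, Cargo lets you set `opt-level` under `[profile.dev]` in Cargo.toml, which keeps debug info while turning on optimization.

```rust
use std::thread;

// Hypothetical size: a 4 MiB local array, larger than a typical 1 MiB main-thread stack.
const BIG_LEN: usize = 4 * 1024 * 1024;

// Puts a large array on the current thread's stack and sums it.
fn sum_big_local_array() -> u64 {
    let big = [1u8; BIG_LEN];
    big.iter().map(|&b| u64::from(b)).sum()
}

// Hypothetical helper: runs `f` on a new thread with the requested stack size
// and returns its result.
fn run_with_big_stack<F, T>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    // 32 MiB is an arbitrary, generous choice for this sketch.
    let total = run_with_big_stack(32 * 1024 * 1024, sum_big_local_array);
    println!("{}", total);
}
```

This only works around the symptom; it does not explain why the debug build needs that much stack in the first place.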
@CodingWithTom-tn7nl I believe that is somewhat related to the `Index` trait mixed with `const variable` semantics. "Const variable" semantics is almost the same as a C/C++ preprocessor `define`; every time you use it, it is as if you inlined the value at that spot. The `Index` trait has an `index` method that takes the implementor by reference and returns a reference with the same lifetime.
My hypothesis is that when those two semantics are used together, your code would be naively desugared like below:
```
pub fn get_rook_attacks_fast(starting_square: i32, mut occupancy: u64) -> u64 {
    println!("called rook attacks");
    let converted_starting_square: usize = starting_square as usize;
    let arr1 = constants::ROOK_MASKS;
    let v1 = arr1.index(converted_starting_square);
    occupancy &= *v1;
    let arr2 = constants::ROOK_MAGIC_NUMBERS;
    let v2 = arr2.index(converted_starting_square);
    occupancy *= *v2;
    let arr3 = constants::ROOK_REL_BITS;
    let v3 = arr3.index(converted_starting_square);
    occupancy >>= 64 - *v3;
    let converted_occupancy: usize = occupancy as usize;
    println!("before return");
    let arr4 = constants::ROOK_ATTACKS;
    let sub_arr4 = arr4.index(converted_starting_square);
    let v4 = sub_arr4.index(converted_occupancy);
    return *v4;
}
```
To check whether my hypothesis is correct, one would need to look at the HIR and/or MIR to see the desugaring.
I believe that with optimization level 1, LLVM would already see this pattern and change it to put the data in .rodata and use that address instead. IIRC I once wrote a similar pattern manually and saw the data turned into a .rodata address with either O1 or O2 in the final assembly.
Semantically, to make sure data always lives in the .rodata section, you use immutable static variables (just swap the `const` keyword for `static`), which always have those semantics. I believe this could be the first thing to try.
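A minimal sketch of the `static` suggestion. The table names are hypothetical and the big table is zero-filled; only its 64 x 4096 shape matches the one discussed in this thread. A table of that shape with `u64` entries is 2 MiB, which would be too big for a 1 MiB main-thread stack if a full copy ever ended up there. That part is still just the hypothesis above, not something this sketch proves.

```rust
// Hypothetical stand-in table: same 64 x 4096 shape as the thread's ROOK_ATTACKS,
// but zero-filled so the sketch stays short.
static ROOK_ATTACKS_DEMO: [[u64; 4096]; 64] = [[0; 4096]; 64];

// Small hypothetical mask table, values chosen arbitrarily.
static ROOK_MASKS_DEMO: [u64; 4] = [0x0E, 0x0D, 0x0B, 0x07];

// Indexing a static reads through its single fixed address;
// the whole table is never copied into a local.
fn lookup(square: usize, index: usize) -> u64 {
    ROOK_ATTACKS_DEMO[square][index]
}

// Size of the whole table in bytes, taken through a reference (no copy).
fn table_size_bytes() -> usize {
    std::mem::size_of_val(&ROOK_ATTACKS_DEMO)
}

fn main() {
    // A static has exactly one location, so two references to it are the same pointer.
    assert!(std::ptr::eq(&ROOK_MASKS_DEMO, &ROOK_MASKS_DEMO));
    println!("table size: {} bytes", table_size_bytes());
    println!("lookup(0, 0) = {}", lookup(0, 0));
}
```

With `const` instead of `static`, each use site is allowed to get its own copy of the value, so where the data ends up is left to the compiler.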
Also, ChatGPT hallucinated: the main thread's stack size is determined by the OS linker alone, and should not change, since the linker has no information about the code's optimization level when it is called. Rust even documents somewhere that it explicitly does not try to change the main thread's stack size in any way. The only way to change the default stack size is for the user to pass link arguments to the compiler that change that size.
I hope this insight helps you if you decide to investigate a little bit more on this! 😄
EDIT: About the module system: I love Rust's module system and how it deals with item privacy. I got it right away when I was learning it (but maybe that is related to my brain's bad wiring; there are quite a lot of things most people just get that are so hard for me, and the opposite as well, so maybe this is one of those things LOL. It was a recent diagnosis, so bit by bit a few weird things are starting to make sense to me 😆)
The reason cpp is faster is that you designed its algorithm the cpp way. Rust could be way faster than that if you just used enums and the trait system rather than those massive numbers.
Also, at 6:25 you made the clean code missionaries cringe :D
The fact that cpp is faster than rust by close to 150% is just amazing
high speed is the advantage of no guaranteed security
@@softwet4341 C++ also has security with smart pointers, memory sanitizers, good coding practices. And also 150% efficiency on top
I really enjoyed this video, as it introduced me to several aspects of different programming languages that I hadn't been aware of before (e.g. that Odin is like Go). However, I noticed a few points regarding Zig where you made things a bit more complicated than necessary. For instance, while Zig defaults to usize, which matches the native pointer size, you didn't actually use it in your example. I couldn't quite understand why you converted the arrays away from u64 to u6 or u8 instead of keeping them as u64. This would have saved you the additional type conversions.
It's great and absolutely correct that you explicitly handle integer overflow; after all, how should the compiler know your intent? If it were left to guess, you'd risk undefined behavior. That said, declarations like "var mutableOccupancy: u64 = occupancy;" are unnecessary since the compiler already knows that occupancy is a u64. Simply writing "var mutableOccupancy = occupancy;" is sufficient and avoids extra verbosity.
These are just small details, of course, but they do add unnecessary writing (and thinking) effort. One thing I’d be curious to know: which compiler options did you use to compile your code?
The DEBRUJN array is originally an int array, so an i32 in Zig. I wrote the Zig code many months before this video. The compiler didn't like it if I used anything other than a u6, as the arrays have 64 elements, and I didn't think of using a usize at the time.
The compiler options were fairly obvious. I tried to go for the most optimised possible.
-With Zig it was just -O ReleaseFast.
-Odin was with -o3 for maximum optimisation.
-Rust Release build with opt-level = 3
-Go has no choices, as far as I'm aware
-C# and CPP just Release build with 64 bit
-Python, no options
I will do it in Common Lisp in the future
I love your language choices. My first language was C#, so I love seeing people use it, but at the same time I’m a simp for performance, so it’s always fun to check out the lower level languages. I’ve been using Rust and Go, but I need to check out Zig and Odin; both seem solid
It was my first language too and I usually prefer to write in it.
Odin is basically the same as Go. The package system is the same, semicolons are optional, no brackets for Ifs and For loops, no while loops etc. It just has manual memory management instead.
Zig is similar in ways to Rust. Strict type system, usize for indexes, array bounds checking, strict compilers etc. Zig, however, doesn't use any macros at all and tries to be explicit with everything it does. There is no borrow checker or lifetimes and you need to manually manage heap memory but have a range of allocators.
Did not try Fortran? The modern version is worth checking out!
I might have to look into Modern Fortran then
As for rust.. it's shockingly difficult to use. I converted a C++ prime sieve to rust, just to compare speed. It took me an hour to get the C++ version working, but it was way longer to just port to rust, code that already works. And the rust version ran slower.
This is why you don't port to Rust, but write in the idiomatic way
Skill issue. You can not just use it. You have to learn it first. Rust is not C++ after all. I came from C and C#. C++ is hardly easier to use than Rust. Rust is almost C# grade provided you can beat the borrow checker. And apparently I did because it does not fly into my face all the time. Only once in a while and is easily fixed.
@@techpriest4787 We do not write in plain ASM because of "skill issues". It's a non-argument. The reason to use Rust is supposed to be that it's marketed as easier to get something working, not harder than C/C++ 😂
Making essentially the same code run in different languages is not "coding a chess engine in 7 languages" like the title says it is. Every language has its quirks that you need to keep in mind and design your project accordingly. I don't think these benchmarks mean anything.
you could put mut in the parameter of the function for occupancy in Rust version, no need for a local variable declaration
Fun video!
It seems you got frustrated with some of the languages 😂.
If you're curious, Rust's module system is pretty much inspired by functional programming functors! Its original implementation was in OCaml, and there are a lot of similarities.
Don't let syntax get in your way with languages! It's all about the semantics. Don't let the quirks get you down or you will miss out.
Got a giggle out of adding curly braces to make python more readable 😅
Syntax is very important. Makes or breaks an experience. With how many languages we have these days, we don't have to stick with an unsightly language, like Python.
So, what was your favorite and which one would you choose to have performance while also enjoying the lang?
C# was my first language through Unity and it's always the language I prefer to write in. The performance isn't the best but you don't need it for 90% of tasks.
The easiest were Go and Odin to get working. They are very similar languages and most of the time you can just start building stuff quickly without having to fight the compiler or learn 100s of mechanics and features.
CPP was actually pretty easy and that's just because I write it in a very simple way. I write it like C but with vectors and std::cout and avoid 90% of its features.
Writing it in Rust was an experience I would rather forget.
@CodingWithTom-tn7nl I'm really interested in Odin cause I did some Go in the past and loved the simplicity. For GC languages I'd go with Go and for non-GC I'd stick to Odin to replace C or Cpp.
I have seen Nim code and it looks beautiful and it's compiled unlike Python.
Can you try compiling your Python code using Codon and retest the benchmark? It should be just behind Rust. I've been having massive speedups using this method.
Unfortunately I'm on Windows and there is no one on the GitHub who was able to build it on Windows. In theory I could run this on a virtual machine but I doubt this would be a fair test doing that.
Not sure what you meant by Nim not having an unsigned 64-bit integer, because it clearly does; it's called uint64.
I misspoke at the end. I mixed up the reason with JavaScript for Nim. JavaScript and Java would require the BigInteger type to use u64. Nim has uint64 but requires me to put UL after every constant value and remove all of the tabs. With a 64 x 4096 array this is too tedious, but maybe I could get Copilot to do it.
Really nice video
but I would love to go through the code myself and see why cpp is much faster than the others! Do you have it available anywhere?
I can make it available but it's quite long with all of the massive constants.
@CodingWithTom-tn7nl You could just put it on a google drive (pleeeaaase 🥺)
I had a small difference in one part of the CPP code which I didn't think would affect results at all. In the CPP code I updated the occupancies with every move individually, which made debugging so difficult. In the other languages I simply rebuilt the occupancies each move with bitwise OR. This was for simplicity of bug finding.
I retested the CPP code with that change and:
Before - 347ms
After - 382ms
That still doesn't account for the other 100ms to get to Zig performance.
I might make a part 2 of this video to test multiple versions of all of the languages to make it as fair as possible. Using global variables vs using a board struct. Update occupancies with each move vs rebuild them each time.
I could also try to get Nim and other languages to work. After 3 days of trying to get Rust to work, I didn't want to work on other languages or test multiple variations.
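A rough sketch of the two occupancy strategies mentioned above, to make the difference concrete. The board layout (12 piece bitboards, white first) and every name here are assumptions for illustration, not the layout used in the video, and only quiet moves without captures are handled:

```rust
// Hypothetical board layout: indices 0..6 are white piece bitboards, 6..12 are black.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Board {
    pieces: [u64; 12],
    white: u64,
    black: u64,
    all: u64,
}

impl Board {
    fn new(pieces: [u64; 12]) -> Self {
        let mut board = Board { pieces, white: 0, black: 0, all: 0 };
        board.rebuild_occupancies();
        board
    }

    // Rebuild approach: OR every piece bitboard together after each move.
    fn rebuild_occupancies(&mut self) {
        self.white = self.pieces[..6].iter().fold(0u64, |acc, bb| acc | *bb);
        self.black = self.pieces[6..].iter().fold(0u64, |acc, bb| acc | *bb);
        self.all = self.white | self.black;
    }

    // Quiet move, then rebuild everything from scratch.
    fn quiet_move_rebuild(&mut self, piece: usize, from: u32, to: u32) {
        self.pieces[piece] ^= (1u64 << from) | (1u64 << to);
        self.rebuild_occupancies();
    }

    // Quiet move with incremental updates: toggle only the two affected bits.
    fn quiet_move_incremental(&mut self, piece: usize, from: u32, to: u32) {
        let toggle = (1u64 << from) | (1u64 << to);
        self.pieces[piece] ^= toggle;
        if piece < 6 {
            self.white ^= toggle;
        } else {
            self.black ^= toggle;
        }
        self.all ^= toggle;
    }
}

// Applies the same quiet move with both strategies and returns both boards.
fn demo() -> (Board, Board) {
    let mut pieces = [0u64; 12];
    pieces[3] = 1u64 << 0; // hypothetical: a white rook on square 0
    pieces[11] = 1u64 << 60; // hypothetical: a black king on square 60
    let mut rebuilt = Board::new(pieces);
    let mut incremental = Board::new(pieces);
    rebuilt.quiet_move_rebuild(3, 0, 8);
    incremental.quiet_move_incremental(3, 0, 8);
    (rebuilt, incremental)
}

fn main() {
    let (rebuilt, incremental) = demo();
    assert_eq!(rebuilt, incremental);
    println!("{:#x}", rebuilt.all);
}
```

The incremental version does less work per move but has more places to go wrong once captures, promotions, castling and en passant are added, which fits the debugging pain described above.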
@CodingWithTom-tn7nl did you use ReleaseFast or ReleaseSafe for Zig?
If that's not it, I expect it's either that the compiler couldn't vectorize some loop because of a small difference in the code, or maybe more cache misses in a very hot part of the code?
@CodingWithTom-tn7nl god dammit I made multiple answers but I think they didn't get through.
Ok so first: I would really appreciate it if you made the code available, you could just throw it on a google drive or smt thanks 😊
And then for the other comment: did you use ReleaseFast or ReleaseSafe for zig ?
If that's not it I expect it might be some loop not being vectorized by the compiler because of some small difference in the code.
Either way, in theory with Zig you shouldn't see any difference in performance compared to CPP. Rust and Odin have more overhead with their designs (bounds checking and all of that), but I would also expect them to be closer to CPP than they are in your benchmark (maybe).
If you make the code available (pretty please 🥺) I'll check with perf and a flamegraph.
Question, if you had to choose between Python and Nim which one and why?
After watching the video and searching for "Nim chess engine" I found 2 "superhuman" engines that would usually defeat Magnus Carlsen: Heimdall and Nalwald.
I would probably choose Nim for the performance improvements and because I always prefer strongly typed languages.
The main reason I didn't include Nim was that you need to write 'u64' after every value in constant arrays. It also doesn't allow 'tabs' and basically all of the constants have tabs in them. Having to add u64 after every value and remove all the tabs when you have a 64 x 4096 array is not worth the effort.
I was looking for real chess engine programming videos. Very disappointing video. Nowhere near a chess engine. This is a move generator.
Would be a fine video if I didn't expect something very different.
@@zoltankurti A Ukrainian guy has a channel called "Chess Programming". He covers a lot of stuff including bitboards. I can't post links. YT deleting :(
@@toby9999 thank you. I've already seen that channel, it was my entry to chess programming. Unfortunately last time I checked he was uploading videos where he is using engines he didn't even write to play against humans in go.
@@zoltankurti Yeah, I noticed that.
@@toby9999 It's already in the description