After watching so much "clean code", "good standards" and "being production ready" on YouTube, this feels refreshing. A down-to-earth coding session for a fun project.
nice profile photo 🤣
props to him for writing code that actually does something.
So-called "clean" code is often sparse in terms of actual things being done.
this is peek content
Glad it peeked my interest.
I think the function peak should be renamed peek in both files, since peak means the top and peek means to look ahead.
I was confused why he named it that way!!
u right
I miss the peek and poke intrinsics from C64 basic! 😅
@@henrikholst7490 that is OLD!
I wasn't born in those years of Commodore 64 and old computers but I'm fascinated by them!!
@@pixeled-yt how r u using ubuntu on windows
2:35 This does work; however, `&&` in bash only executes the second command (your echo) if the exit code of the first program is `0`. In your case the exit code is `2`. For this use case, use a semicolon (`;`) instead of `&&` to chain the commands, so your `echo` runs regardless of the first program's exit code.
19:18 Bjarne once said: "There are only two kinds of languages: the ones people complain about and the ones nobody uses"
To be honest this is a great video. I have a ton of experience from software development but I never tried to make my own compiler. This guy just goes at it. "it's just another program" love the mentality that nothing is too hard to do.
I really hope you continue this series! its been so fun to stumble along with you
In your Peek() functions, you're not using the 'ahead' value when retrieving the character/token using the .at() function. This could lead to some problems down the road, if it isn't already. Also, afaik from previous attempts at doing this sort of thing, it's a good practice to make sure there's a new-line at the end of the file when you load it (or just add one to the loaded text) so your tokenizer doesn't miss the final token. Could be worth looking into. Loving these videos though!
Also, the ahead value should default to 0, so the return value (when it isn't none) stays consistent with the current behaviour. Also, the > should be a >= when comparing index+ahead with the length, again to preserve correct behaviour.
Yeah, I was surprised this flew under the radar but I'm guessing trying to keep a train of thought while talking is making it harder for him
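A minimal sketch of the `peek` fix described above, assuming the tokenizer keeps its source in `m_src` and its cursor in `m_index` (names assumed, not taken from the video). The offset defaults to 0, the bounds check uses `>=`, and the offset is actually added when indexing. The constructor also appends a trailing newline if the file lacks one, which is one way to make sure the final token gets terminated.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical tokenizer skeleton; only the cursor helpers are shown.
class Tokenizer {
public:
    explicit Tokenizer(std::string src) : m_src(std::move(src)) {
        // Guarantee a trailing newline so a token at the very end of the
        // file is terminated like every other token.
        if (m_src.empty() || m_src.back() != '\n') {
            m_src.push_back('\n');
        }
    }

    // Look `offset` characters past the cursor without consuming anything.
    // offset == 0 is the current character.
    [[nodiscard]] std::optional<char> peek(std::size_t offset = 0) const {
        if (m_index + offset >= m_src.size()) {
            return std::nullopt;  // past the end of the input
        }
        return m_src.at(m_index + offset);
    }

    // Return the current character and advance the cursor.
    char consume() { return m_src.at(m_index++); }

private:
    std::string m_src;
    std::size_t m_index = 0;
};
```

With this version `peek()` is the current character and `peek(1)` really is one character ahead, so the bounds check and the returned character agree.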
This is *peak* () entertainment
This is one of the most interesting series I've seen on YouTube. Just perfectly paced, understandable, great presentation. Thank you.
loving this series so far
This series is so much fun and so interesting. It makes me feel so smart
This video series is a great source of learning. Thanks for making it. Keep uploading more videos like this
Such a great series ! ❤
this is a great series, i just finished the last video, excited to watch this one
Again, good video. This can be such a good series!
yo this series is awesome. i love programming
really loving this series ! 🔥
I love these vids. I just finished the last one. It was the perfect video to watch after taking the final for computer systems and architecture. The biggest part of our final project was writing an assembler, so your vid felt perfect to watch next
This guy makes my adderall sleepy
Maybe the "exit" check should include a space, i.e. "exit "? Example of the issue:
String brexit13;
would cause a crash.
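One way to avoid the `brexit13` problem without relying on a trailing space (which would also reject something like `exit(0)`): read the whole identifier first, then compare the complete word to the keyword. The sketch assumes identifiers are letters, digits and underscores; `TokenType`, `Token` and `tokenize_words` are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokenType { exit_kw, ident, int_lit, other };

struct Token {
    TokenType type;
    std::string value;
};

// Reads whole words before classifying them, so "brexit13" is a single
// identifier rather than the keyword "exit" followed by leftovers.
std::vector<Token> tokenize_words(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isalpha(c) || c == '_') {
            std::string word;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                word.push_back(src[i++]);
            }
            // Only an exact, whole-word match counts as the keyword.
            tokens.push_back({word == "exit" ? TokenType::exit_kw : TokenType::ident, word});
        } else if (std::isdigit(c)) {
            std::string digits;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
                digits.push_back(src[i++]);
            }
            tokens.push_back({TokenType::int_lit, digits});
        } else if (std::isspace(c)) {
            ++i;  // whitespace separates tokens but produces none
        } else {
            tokens.push_back({TokenType::other, std::string(1, src[i++])});
        }
    }
    return tokens;
}
```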
finally a youtuber that listens. ggs
The fun to come in parsing is that infix/prefix expressions (why would anyone do postfix in current year?) are parsed via Pratt parsing, while statements are parsed via recursive descent or GLR.
Wish you good luck and don't let the modern C++ features bring you down!
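For anyone who hasn't met Pratt parsing, a small sketch of the core idea: each binary operator has a binding power, and the loop keeps folding in operators that bind more tightly than the current minimum. To stay short it evaluates integers directly instead of building AST nodes, handles only `+ - * /`, parentheses and unary minus, and assumes well-formed input; the `Pratt` class is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Minimal Pratt-style expression evaluator over a string (illustrative only).
class Pratt {
public:
    explicit Pratt(std::string src) : m_src(std::move(src)) {}

    long parse() { return parse_expr(0); }

private:
    std::string m_src;
    std::size_t m_pos = 0;

    void skip_ws() {
        while (m_pos < m_src.size() && std::isspace(static_cast<unsigned char>(m_src[m_pos]))) ++m_pos;
    }

    char peek() {
        skip_ws();
        return m_pos < m_src.size() ? m_src[m_pos] : '\0';
    }

    // Binding power of a binary operator; 0 means "not an operator".
    static int binding_power(char op) {
        switch (op) {
            case '+': case '-': return 1;
            case '*': case '/': return 2;
            default: return 0;
        }
    }

    // Prefix position: numbers, parenthesised expressions, unary minus.
    long parse_prefix() {
        const char c = peek();
        if (c == '(') {
            ++m_pos;                 // consume '('
            const long v = parse_expr(0);
            peek();                  // skip whitespace before ')'
            ++m_pos;                 // consume ')'
            return v;
        }
        if (c == '-') {
            ++m_pos;
            return -parse_expr(3);   // binds tighter than any binary operator
        }
        long v = 0;
        while (m_pos < m_src.size() && std::isdigit(static_cast<unsigned char>(m_src[m_pos]))) {
            v = v * 10 + (m_src[m_pos++] - '0');
        }
        return v;
    }

    // Core loop: absorb operators whose binding power exceeds min_bp.
    long parse_expr(int min_bp) {
        long lhs = parse_prefix();
        for (;;) {
            const char op = peek();
            const int bp = binding_power(op);
            if (bp == 0 || bp <= min_bp) break;
            ++m_pos;
            const long rhs = parse_expr(bp);  // same bp on the right => left-associative
            switch (op) {
                case '+': lhs += rhs; break;
                case '-': lhs -= rhs; break;
                case '*': lhs *= rhs; break;
                case '/': lhs /= rhs; break;
            }
        }
        return lhs;
    }
};
```

Passing the operator's own binding power to the recursive call is what makes `10 - 2 - 3` group as `(10 - 2) - 3`.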
th-cam.com/video/8QP2fDBIxjM/w-d-xo.html
9k views with
Hey! Nice explanation dude! These vids deserve to be rated higher!
Such a well done series, looking forward to the upcoming parts. Will you ever use the ahead parameter of your peak (peek) functions?
Loving these videos; as someone who enjoys tinkering and making their own strange languages, it's fun! I would say: don't you think the nodes would look better post-fixed with Node instead of prefixed with it? Such as ExitNode, ExprNode. Also, exit in C is defined in cstdlib and is a function (which the compiler does funny things to when it is used, since there is no way to actually express an exit in the language). Maybe lexing and parsing a function call would be better in the long run here. FnCallNode could contain a vector of arguments, and then it scales a bit. Then you can just hardcode checks for the function name for now. Anyway, good video bro, keep em coming
What do you mean "then it scales a bit"?
No doubt that having a type for builtin functions that take in a vararg of "values" and return a single "value" result makes so much more sense than these raw keywords pointing to an ExitNode or SysCallNode.
@@SimGunther Well because they have just begun, I thought that having baseline functionality for functions would make life easier in the long run. Just a simple construct that takes arguments and optionally contains a return value. Then it scales because they can implement more functions without much boilerplate.
@@certidailyfacts That's a similar train of thought I had for that construct. There's a call expression I evaluate to see if the function name belongs to a builtin/intrinsic before calling it as a regular function and not an evaluation of the vector of statements with an environment localized to the non builtin function.
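A rough sketch of the call-node idea from this thread: one node type carrying a name and an argument vector, with a builtin table consulted before any user-defined function. `NodeFnCall`, `eval_call`, `add` and `neg` are hypothetical; the arguments are plain integers rather than expression nodes, and it evaluates instead of emitting assembly, purely to keep the dispatch logic visible.

```cpp
#include <functional>
#include <numeric>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// One node type for every call, with a vector of (already evaluated) args.
struct NodeFnCall {
    std::string name;
    std::vector<long> args;
};

using Builtin = std::function<long(const std::vector<long>&)>;

// Adding a builtin is a single table entry rather than a new node type.
const std::unordered_map<std::string, Builtin>& builtins() {
    static const std::unordered_map<std::string, Builtin> table{
        {"add", [](const std::vector<long>& a) {
             return std::accumulate(a.begin(), a.end(), 0L);
         }},
        {"neg", [](const std::vector<long>& a) { return -a.at(0); }},
    };
    return table;
}

// Builtins are checked first; user-defined functions would be resolved after.
long eval_call(const NodeFnCall& call) {
    if (auto it = builtins().find(call.name); it != builtins().end()) {
        return it->second(call.args);
    }
    throw std::runtime_error("unknown function: " + call.name);
}
```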
2:40 Possibly the main reason it did nothing is that the && operator executes the next command only when the left command ends successfully (exit code 0), but you are using all sorts of numbers to test it, which the shell treats as failure, so it does not call echo at all.
Awesome ❤
Wonderful! 👍
Great one. Pls keep up thnx 🥰
Nice video:) never tried doing something like this but might just try myself sometime. Some small things I noticed:
Modern C++ prefers string_views over passing const string references. This avoids constructing a temporary std::string when the caller passes a string literal or a substring.
And since this is a video about writing a compiler… splitting your header files and implementation (CLion can do that quickly for you) can speed up compilation of your compiler dramatically 😛
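A small illustration of the `std::string_view` point, with hypothetical helper names. `const std::string&` doesn't copy an existing `std::string` either; the difference shows up when the caller passes a string literal or a slice of a larger buffer, which a `string_view` can refer to without building a temporary `std::string`.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Accepts literals, std::string and slices without creating a std::string.
bool is_keyword(std::string_view word) {
    return word == "exit" || word == "let";  // example keyword list
}

// With const std::string&, a literal argument is first converted into a
// temporary std::string (which may allocate).
bool is_keyword_ref(const std::string& word) {
    return word == "exit" || word == "let";
}

// A view can name a sub-range of the source text without copying it.
std::string_view first_word(std::string_view src) {
    const std::size_t end = src.find(' ');
    return src.substr(0, end);  // npos means "to the end"
}
```

The usual caveat applies: a `string_view` does not own its characters, so it must not outlive the string it points into.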
46:08 In VSCode it's Alt+Up/Down arrows to move lines and Ctrl+L to select multiple lines before moving. Not sure if it's the same in CLion.
You can define the functions without inline. All inline does is tell the compiler to inline the function's contents directly into the place where you call it.
The only problem will appear if you compile separate object files that have the same thing defined. Use a single cpp or include them like this. You could even error out if the same cpp file is included twice. Avoid header files, as they exist just for the problem we want to avoid: multiple compilation units that share code. 😅
A better, more suitable keyword would be "static", which makes the function local to the compilation unit. I think. It has been ages since I touched C/C++
The inline keyword actually has very little to no effect on the compiler's decision to inline for the major compilers. It's more for allowing multiple definitions without violating the ODR, therefore allowing implementations of non-template functions in headers. It also has novel uses in static variable initialization
@@henrikholst7490 Tbh everything you said here tells me you never knew how to write C/C++ properly in the first place.
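A single snippet can't show several translation units, so this sketch is what a hypothetical `tokenization.hpp` might contain, annotated with how each definition interacts with the one-definition rule.

```cpp
// Pretend this is a header included from several .cpp files.
#include <string>

struct Token {
    std::string value;

    // Defined inside the class body: implicitly inline, so the header can
    // be included in many translation units without ODR violations.
    bool empty() const { return value.empty(); }
};

// A free function defined in a header needs `inline` so that one identical
// definition per translation unit is allowed. Whether calls are actually
// inlined is still the optimiser's decision.
inline bool is_digit_char(char c) { return c >= '0' && c <= '9'; }

// `static` also avoids the duplicate-symbol error, but gives each
// translation unit its own private copy instead of one shared definition.
static int helper_private_to_each_tu() { return 42; }
```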
commenting second part lessgo
didn't find any other sussy timecodes
44:20 malloc & free
29:40 Lisp languages look like a perfect AST example; they don't even need much explanation: (display (+ 1 (* 69 (- 10 2))))
11:50 void type exists for no return
10:43 yay my suggestion from part 1 was mentioned
8:40 you can return '\0' char
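To illustrate the 29:40 point that Lisp syntax is essentially a written-out AST, here is a tiny parser and evaluator for integer S-expressions. It supports only `+`, `-` and `*` (the `display` call is left out), and `SExpr`, `SExprParser` and `eval` are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// A node is either an atom ("+", "69") or a list of child nodes.
struct SExpr {
    std::string atom;          // set for atoms
    std::vector<SExpr> list;   // set for (op arg ...) forms
};

class SExprParser {
public:
    explicit SExprParser(std::string src) : m_src(std::move(src)) {}

    // Recursive descent: '(' starts a list, anything else is an atom.
    SExpr parse() {
        skip_ws();
        SExpr node;
        if (m_src.at(m_pos) == '(') {
            ++m_pos;  // consume '('
            skip_ws();
            while (m_src.at(m_pos) != ')') {  // at() throws on unbalanced input
                node.list.push_back(parse());
                skip_ws();
            }
            ++m_pos;  // consume ')'
            return node;
        }
        while (m_pos < m_src.size() && !is_delim(m_src[m_pos])) {
            node.atom.push_back(m_src[m_pos++]);
        }
        return node;
    }

private:
    std::string m_src;
    std::size_t m_pos = 0;

    static bool is_delim(char c) {
        return std::isspace(static_cast<unsigned char>(c)) || c == '(' || c == ')';
    }

    void skip_ws() {
        while (m_pos < m_src.size() && std::isspace(static_cast<unsigned char>(m_src[m_pos]))) ++m_pos;
    }
};

// (op a b ...) folds op over the arguments from left to right.
long eval(const SExpr& e) {
    if (e.list.empty()) return std::stol(e.atom);  // atom: a number
    const std::string& op = e.list.at(0).atom;
    long acc = eval(e.list.at(1));
    for (std::size_t i = 2; i < e.list.size(); ++i) {
        const long v = eval(e.list[i]);
        if (op == "+") acc += v;
        else if (op == "-") acc -= v;
        else if (op == "*") acc *= v;
        else throw std::runtime_error("unknown operator: " + op);
    }
    return acc;
}
```

Note that the parser needs no precedence rules at all: the parentheses already encode the tree.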
part 3 when?
Great video, Tsoding!
just finished watching pt1. lol
Hey, great series of videos, looking forward to this compiler series.
What's your CLion theme?
Keep up the great work, and informative series!!!😀
And what font do u use btw?
Font: Iosevka
Theme: One Dark
Lol "an abstract syntax snake". Great video, but if one is serious about wanting to learn to write programming languages, I recommend learning to compile to C or LLVM. Both are cross-platform and will generate fast code. Assembly is cool, but really niche at this point and creates countless platform headaches. Btw, for those who are scared of C++, one can still write a compiler in Python or JavaScript, and when that is done one can make the compiler self-compile. The best languages IMO for doing this are F# or Rust; both have advanced pattern matching and great debugging and testing frameworks.
I find Rust scary. C++ looks more scary than it need be due to the feature bloat added in recent years.
@@toby9999 I agree. Rust is really nice, but it takes some time to get used to.
This is great
this is peek content (bah dum tss)
I'm seriously sure you get bounds checking with array lookup of an STL vector, i.e. using []. It's the same thing.
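For `std::vector` the two lookups are not equivalent: `at()` is required to check the index and throw `std::out_of_range`, while `operator[]` performs no check and an out-of-range index is undefined behaviour (some debug or hardened standard library builds add assertions, but that is not guaranteed). A small sketch with hypothetical helper names:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// at() checks the index and reports failure with an exception.
bool at_throws(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// operator[] is only safe once the bound has been checked by hand.
int get_or(const std::vector<int>& v, std::size_t i, int fallback) {
    return i < v.size() ? v[i] : fallback;
}
```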
every time there is a big change in the code i get about 30 errors and i never know what it wants cuz my compiler is broken
Had a project to make a language in Java while in school, so this is going to be interesting to follow to see what choices you make! btw, "./out ; echo $?" should work for a one-liner. it'll run the first command before the second rather than together.
&& will only run the second command, if the first one ran successfully (returned 0).
Because his program returned 20, the && didn't run the echo.
@@Kiwi-tq2fy Accurate, better than my late night explanation :)
Can't you just include everything in a precompiled header and include the PCH everywhere (ofc while keeping pragma once)?
Why do you else after a continue at 1:05?
What exactly will be the product that will emerge at the end of these videos?
@Pixeled When you edit the test.hy file, why did you need to recompile?
28:40 when you got to this point shouldn't you have written tests to make sure your compiler stays in a working condition?
Dumb question, but what's that font for code called? I see it sometimes but forget it
Iosevka
Do all high-level languages generate parse trees? I once was in a group project writing a compiler for Prolog in Haskell, and we also had to implement SLD resolution, so that's why I wonder.
Not necessarily, though most C-style languages do. It might not be 100% necessary depending on the syntax of the language, like a functional or stack-based one
@@pixeled-yt Thanks :)
I had the same "is never used" warning when I used CLion. It was always annoying, and there doesn't seem to be a way to fix it.
12:00 A "const" method could have side effects such as modifying a "mutable" member.
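A sketch of the `mutable` case with a hypothetical `SourceView` class. A const member function can also have side effects that don't touch the object at all, such as printing or modifying globals; `const` only forbids changes to the object's own non-mutable members.

```cpp
#include <cstddef>
#include <string>
#include <utility>

class SourceView {
public:
    explicit SourceView(std::string src) : m_src(std::move(src)) {}

    // const, yet it still updates the lookup counter.
    char char_at(std::size_t i) const {
        ++m_lookups;  // allowed: m_lookups is mutable
        return m_src.at(i);
    }

    std::size_t lookups() const { return m_lookups; }

private:
    std::string m_src;
    mutable std::size_t m_lookups = 0;  // bookkeeping, not logical state
};
```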
./out && echo $?
fails because the return code of ./out is non zero.
(It doesn't execute following commands if the command fails)
What I tend to do is to put them in a bash script and just run the bash script.
Why has no one pointed out that the off-by-one in peek happens because you're checking one ahead for the end of the string/vector while actually peeking the current character/token? The amount to peek should default to 0, and you should return the character/token at index+amount. Right now the amount does nothing, and your peek only works by accident because you check whether index+amount is larger than length/size.
2:38
./out; echo $?
FYI, if you want to do ./out and echo $? on the same line, use a semicolon instead of && like this:
./out; echo $?
Making a snake from a pe*is made my day
What clion theme?
One Dark theme
When you use [], if the item in the vector/array/list doesn't exist, it'll add it to the vector/array/list. So .at is just a lot better.
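The inserting behaviour described here belongs to `std::map` (and `std::unordered_map`), not to `std::vector`, whose `operator[]` never changes the size. A small sketch of the difference (helper names are hypothetical); in a compiler it matters if, say, variables are looked up in a map with `[]`, which would silently create entries for undeclared names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// std::map::operator[] inserts a value-initialised element for a missing key.
std::size_t map_size_after_lookup() {
    std::map<std::string, int> vars;
    const int x = vars["missing"];  // inserts {"missing", 0}
    (void)x;
    return vars.size();
}

// find() (like at()) never inserts.
bool map_find_does_not_insert() {
    std::map<std::string, int> vars;
    const bool found = vars.find("missing") != vars.end();
    return !found && vars.empty();
}

// std::vector::operator[] only accesses existing elements.
std::size_t vector_size_after_index() {
    std::vector<int> v(3);
    v[1] = 7;  // in range: assigns, size unchanged
    return v.size();
}
```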
hey your compiler playlist is backwards
Fixed, thanks for letting me know
so... I have been refreshing the channel page every 20 mins on average since I watched this video, just for pt. 3.
I have pt. 1 & 2 done before and the explanation was good. JUST the last step is missing for me: how do I get the tree to do stuff...
EDIT: I do not need _optimized_ code, i need _explained_ code.
This is well explained (with some very minor hiccups, : D so cute).
[+ minor edits on the phrasing]
Check tomorrow morning ;)
@@pixeled-yt : O
°(^_^)°
let's gooooo
2:37 Using && only runs the right hand side if the left hand side was 0. You can run both on one line the way you want by using a semicolon instead, like this: ./out; echo $?
1:03:42 i feel like not inlining everything in headers could've avoided this :^) you'd only need to include the bare minimum in the headers themselves, and then can include everything you need in just the .cpp files
I am pretty sure that peak, in your context, is actually spelled peek btw
Lol, you're right
@@TigranK115 lmao
I think this stream would be 2x if he did this with pair programming. Feel free to steal my ideas.
55:34 another compiler of course
I would write my compiler in JavaScript, since then no one would have to bother with CMake and all that other stuff. There are a lot of drawbacks to using JS, but at least it is a lot simpler to run.
Been doing C++ application development on Windows for 25 years. I never use CMake. Must be a Linux thing?
By the way, that's not what the inline keyword means in c++.
you shouldn't need any of the inlines; methods defined directly in the class body are implicitly inline and shouldn't cause any ODR violations
You missed the joke about the ASS - Abstract Syntax Snake
33:10 yeah, and we thought C++ was bad; it's just a mini-boss.
Have you heard of two programs called "yacc" and "lex"? :|
I would make way more utility functions, like `consumeWord` or `consumeDigits`
Yeah the code is really dirty
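A sketch of the helpers suggested above, built on a generic `consume_while`. The `Cursor` class and its method names are hypothetical; this is not the code from the video.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

class Cursor {
public:
    explicit Cursor(std::string src) : m_src(std::move(src)) {}

    // Consume characters while pred holds and return them as a string.
    template <typename Pred>
    std::string consume_while(Pred pred) {
        const std::size_t start = m_pos;
        while (m_pos < m_src.size() && pred(static_cast<unsigned char>(m_src[m_pos]))) {
            ++m_pos;
        }
        return m_src.substr(start, m_pos - start);
    }

    std::string consume_word() {
        return consume_while([](unsigned char c) { return std::isalnum(c) || c == '_'; });
    }

    std::string consume_digits() {
        return consume_while([](unsigned char c) { return std::isdigit(c) != 0; });
    }

    void skip_whitespace() {
        consume_while([](unsigned char c) { return std::isspace(c) != 0; });
    }

private:
    std::string m_src;
    std::size_t m_pos = 0;
};
```

Each token branch in a tokenizer then becomes a single call, and an empty result signals that nothing matched.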
@12:05 "Hey it's C++, you know how it works!" - nope
17:40 petition to rename unpeek to regurgitate.
wonder what is ur keyboard setup?
I think it's QWERTY
Keychron V6, Gateron G Pro V2 brown switches
got u thx a lot!@@pixeled-yt
lol
Do you have any plans of making your programming language self compiled?
If I can get that far, then sure, that would be cool
@@pixeled-yt I see no realistic way of doing that if you stick with C++. And not utilize LLVM/binutils and existing STL libraries like libc++. For C compiler it's possible, but for C++, one person, not external dependencies and reasonable time? Yeah, it would not 'be cool' it would be freaking achievement of the century.
(well, you can be ambitious ofc!)
Well, I'm not writing a C++ compiler, I'm writing a compiler for my own custom language, which I am bootstrapping with C++
@@pixeled-yt That is current situation, so for self-compilation you need either 1) change your whole source of compiler to 'new lang'. 2) make your language at least subset of C++. I was discussing the latter.
@@mapron1 bootstrapping is just a lot of work. That's how it is. If you write a minimal compiler first that compiles a language similar in features to C, then you can bootstrap pretty early in development.
He is I guess the first person I ever watched that uses #pragma once, clean code standards are truly something for him..
it's "peek" not "peak"
Making your member functions inline when you define them directly in the class is redundant; they're already inline. And inline is not a performance-related specifier. It's not about inlining, it's about letting something be defined in multiple translation units without violating the ODR.
Inlining is also about performance if it eliminates the function call overhead.
@@toby9999 their point is that 'inline' has little to no effect on compiler inlining intrinsics for the major compilers. It's all about the ODR (one definition rule: allowing multiple definitions, i.e. definitions in headers). And the OP is right that member functions defined in the class body are already implicitly 'inline', IIRC
create a makefile to make things easier
That is what CMake is for 🥰
makefile is still much easier
Great video, but if you really want to bridge the gap to the metal, why not directly emit x86 opcode/operand bytes with an InstructionBuilder abstraction or something and construct an ELF file around that? That removes the last bit of magic imo. Also, whenever your compiler string-builds instead of going to an IR (or in the final step machine code), you are probably doing something wrong. The only people who have a license to construct compilers THAT horrible are the ML compiler people writing everything in Python 🙃🙃🙃🙃
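A sketch of what such an `InstructionBuilder` could look like (the class is hypothetical). It emits raw bytes for the Linux x86-64 exit sequence `mov eax, 60`, `mov edi, <code>`, `syscall`; building the ELF headers around the bytes is not shown.

```cpp
#include <cstdint>
#include <vector>

class InstructionBuilder {
public:
    // Register numbers as used in the x86 "opcode + register" encodings.
    enum class Reg32 : std::uint8_t { eax = 0, ecx = 1, edx = 2, ebx = 3, esp = 4, ebp = 5, esi = 6, edi = 7 };

    // mov r32, imm32: opcode 0xB8 + register number, then imm32 little-endian.
    // Writing a 32-bit register zero-extends into the full 64-bit register.
    void mov_imm32(Reg32 reg, std::uint32_t imm) {
        m_code.push_back(static_cast<std::uint8_t>(0xB8 + static_cast<std::uint8_t>(reg)));
        for (int i = 0; i < 4; ++i) {
            m_code.push_back(static_cast<std::uint8_t>((imm >> (8 * i)) & 0xFF));
        }
    }

    // syscall: 0F 05
    void emit_syscall() {
        m_code.push_back(0x0F);
        m_code.push_back(0x05);
    }

    const std::vector<std::uint8_t>& bytes() const { return m_code; }

private:
    std::vector<std::uint8_t> m_code;
};

// Machine code for: mov eax, 60 ; mov edi, code ; syscall
std::vector<std::uint8_t> emit_exit(std::uint32_t code) {
    InstructionBuilder b;
    b.mov_imm32(InstructionBuilder::Reg32::eax, 60);  // 60 = exit on x86-64 Linux
    b.mov_imm32(InstructionBuilder::Reg32::edi, code);
    b.emit_syscall();
    return b.bytes();
}
```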
28:13 almost fixed it 🙃
21:31 Same
* 29:31
Nice video, but
You messed up!
wrt #pragma once, please read C++ Core Guidelines SF.8!
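SF.8 of the C++ Core Guidelines recommends `#include` guards for header files, since `#pragma once` is widely supported but not part of the standard. A sketch with a hypothetical header and guard name; the guarded block appears twice to simulate including the header twice in one translation unit.

```cpp
// Contents of a hypothetical tokenization.hpp, using an include guard.
#ifndef TOKENIZATION_HPP
#define TOKENIZATION_HPP

struct Span {
    int begin;
    int end;
};

inline int span_length(Span s) { return s.end - s.begin; }

#endif  // TOKENIZATION_HPP

// A second "#include" of the same header: the guard macro is already
// defined, so the preprocessor skips the body and Span is not redefined.
#ifndef TOKENIZATION_HPP
#define TOKENIZATION_HPP

struct Span {
    int begin;
    int end;
};

inline int span_length(Span s) { return s.end - s.begin; }

#endif  // TOKENIZATION_HPP
```

The trade-off is that guard macro names must be unique across the project, which `#pragma once` sidesteps.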
Bro i think i am too noob. Still i will make my own compiler
for a guy who's using c++ he seems to really hate c++
that’s just anybody who uses c++
tsoding from wish kinda goes hard ngl.
Ur so cute
when I see him using C++ I realize that Java would be perfect if it compiled to an executable... 😅
I hate java. It's not the best at anything in my opinion.
@@toby9999 at least you can tell what the "FileInputStream" class does
Hi
instead of repeatedly calling the "peak" function again and again inside the tokenize function. why don't you just create a local variable above the while loop to store peak's result and do all the stuff that way?
I mean it's not that big of a deal but it seems more managed to me
“This will be inline too just because I don’t like writing things in separate files” I don’t think that keyword means what you are implying it means 🤔
It allows me to not have to put the declaration in a header file and implementation in a cpp file. The inline keyword allows you to include the header file (with function implementations) in multiple cpp files without the compiler complaining about a "multiple definition" error
@@pixeled-yt You don't need to use the inline keyword if you implement methods inside the class in a header file. They will be inline implicitly, though whether they're actually inlined is still up to the compiler.
The sequence of includes is a higher intelligence, trying to tell you, that you could as well have a Tokenizer instance in your parser and you tokenize lazily instead of using that std::vector.
As a Common Lisp fan, the classical "Dragon Book" approach to building compilers looks just wrong. If you have a homoiconic syntax, you do not need to change the grammar, the parser and the lexer each time you add a new idiom to your language. Maybe one day in the future, you will find it useful to try it the Lisp way...
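A minimal sketch of the lazy approach suggested above: the parser owns the lexer and pulls one token at a time through a one-token lookahead buffer, instead of tokenizing everything into a `std::vector` first. The `exit <int>;` grammar and all names are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

enum class Kind { ident, int_lit, semi, eof };

struct Token {
    Kind kind;
    std::string text;
};

// Produces one token per call instead of filling a std::vector up front.
class Lexer {
public:
    explicit Lexer(std::string src) : m_src(std::move(src)) {}

    Token next() {
        while (m_pos < m_src.size() && std::isspace(uc(m_src[m_pos]))) ++m_pos;
        if (m_pos >= m_src.size()) return {Kind::eof, ""};
        const std::size_t start = m_pos;
        if (std::isalpha(uc(m_src[m_pos]))) {
            while (m_pos < m_src.size() && std::isalnum(uc(m_src[m_pos]))) ++m_pos;
            return {Kind::ident, m_src.substr(start, m_pos - start)};
        }
        if (std::isdigit(uc(m_src[m_pos]))) {
            while (m_pos < m_src.size() && std::isdigit(uc(m_src[m_pos]))) ++m_pos;
            return {Kind::int_lit, m_src.substr(start, m_pos - start)};
        }
        if (m_src[m_pos] == ';') {
            ++m_pos;
            return {Kind::semi, ";"};
        }
        throw std::runtime_error("unexpected character");
    }

private:
    static unsigned char uc(char c) { return static_cast<unsigned char>(c); }
    std::string m_src;
    std::size_t m_pos = 0;
};

// The parser owns the lexer and buffers at most one token of lookahead.
class Parser {
public:
    explicit Parser(std::string src) : m_lexer(std::move(src)) {}

    const Token& peek() {
        if (!m_ahead) m_ahead = m_lexer.next();  // lex only when asked
        return *m_ahead;
    }

    Token consume() {
        Token t = peek();
        m_ahead.reset();
        return t;
    }

    // Parses "exit <int>;" and returns the code, or nullopt on a mismatch.
    std::optional<int> parse_exit() {
        if (peek().kind != Kind::ident || peek().text != "exit") return std::nullopt;
        consume();
        if (peek().kind != Kind::int_lit) return std::nullopt;
        const int code = std::stoi(consume().text);
        if (peek().kind != Kind::semi) return std::nullopt;
        consume();
        return code;
    }

private:
    Lexer m_lexer;
    std::optional<Token> m_ahead;
};
```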
9:05
Nice series but code is sluggish at best and painful to watch, hope this improves cause the subject is really complex..
That's why we watch it
Great video, but I noticed how you moved values AFTER already passing them by copy, which obviously defeats the purpose of moving
No, it doesn't. You're exactly wrong on this.
Haven't watched the whole thing, as long as he moves into the constructor, and in the constructor body, then both are moves and no copy
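A sketch supporting the replies above, using a hypothetical `Tracked` type that counts copies and moves. Taking a parameter by value and then moving it into the member costs at most one copy (only when the caller passes an lvalue); the `std::move` in the constructor is not a second copy. Requires C++17 for the `static inline` counters.

```cpp
#include <string>
#include <utility>

// Counts copies and moves so the cost of each call style is visible.
struct Tracked {
    static inline int copies = 0;
    static inline int moves = 0;
    std::string data;

    explicit Tracked(std::string d) : data(std::move(d)) {}
    Tracked(const Tracked& other) : data(other.data) { ++copies; }
    Tracked(Tracked&& other) noexcept : data(std::move(other.data)) { ++moves; }
};

// "Take by value, then move" (the sink-parameter idiom).
class Holder {
public:
    explicit Holder(Tracked t) : m_value(std::move(t)) {}
    const Tracked& value() const { return m_value; }

private:
    Tracked m_value;
};
```

Passing a temporary costs zero copies and one move; passing a named variable costs one copy (into the parameter) and one move.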
consoome()