saying "If yo have written C you've already used it" is misleading. most people use GCC on Linux and MSVC on windows. clang is only the default on macos
Even if you're on linux chances are you still have clang compiled software as some projects prefer and recomend it for compiling e.g firefox, chromium etc and of course rust uses LLVM
@@blehbleh9283 name me one distro that uses clang for the kernel. I don't think theres any except for maybe chromium OS or android if you count them as linux distros
It should be pointed out that GCC doesn't use LLVM. A shocking amount of new languages do, and it's not always for the better, and also complicates the build process. I've had to write my own makefiles for certain newer languages because they use LLVM and cmake sucks for versioning. However, while SSA is a useful tool, there are alternative methods. In several of my compilers I basically run through the code like an interpreter and have condensed nodes where the result can be computed at compile time and placed directly into the machine language output.
Seems similar to a coding style I've been advocating to make 'variables' const almost always. Side benefit is that each variable has actually one and only one well-defined meaning, unit, etc. Since C++11, this can be applied to loops too by using range-based for().
SSA, in many situations, also increases readability and maintainability of code even before compilation. The same way compilers struggle to determine which value was last assigned and which value we refer to, programmers struggle too. Thats why in most cases it's preferable to use constants/immutables wherever possible (like rust does by default or javascript with const instead of let).
Just a quick note on dead code elimination (1:24). A variable not being read (or, in LLVM term's, having no users) is a necessary, but insufficient condition to label it as dead code. Take as an example the following fragment of C-inspired pseudocode: int f() { int x = g(); return 0; } The variable x clearly has no users. However, it may not be dead code. If g looks something like: int g() { a++; //global variable return a; } Then eliminating the call associated with the variable x would be incorrect, and would make the semantics of the optimized program different from those of the unoptimized one. All in all, this is a great video that highlights the importance of SSA. Keep it up :)
SSA is just a way to represent binary trees with instructions at nodes and data at edges. It’s like saying “trees are awesome” - that’s true but you aren’t exactly breaking new ground here
How are loops written in SSA? Also, having written an optimizing C compiler that didn’t use SSA, I can tell you that a variety of optimizations like constant folding and dead code elimination are actually also really easy even without SSA - probably even easier actually since you just run through the generated instructions backwards while tracking what gets used, and if outputs aren’t used then remove the instruction. General analysis, especially when it comes to code branches, would be simpler with SSA.
You have two (broadly speaking) approaches, llvm uses so called phi nodes. A compiler called cranelift takes another approach and models them as so called basic block parameters.
@ Right, for llvm, phi nodes would be involved since there are conditional jumps in (most) loops. My question is more about how SSA works when the body of a loop runs more than once. The iteration variable is assigned to multiple values but would need to be referenced as the same symbol since the body of the loop isn’t always fully unrolled. It truly only use phi nodes in this context? Can it still be considered SSA even when the body of a loop runs more than once (which would assign a variable multiple times across multiple instances of the loop body)?
@@SethPentolope The basic block represents the operation that happens inside the loop abstractly. It _is_ ran multiple times if the loop repeats, but the single assignment principle only matters _within_ a basic block. (Also, off-topic, I like to joke that phi nodes are 'comefrom' instructions, taken from INTERCAL. It's funny, but kinda true)
@@SethPentolope the value is assigned multiple times but only in one place, which is fine. If this is still guaranteed for the whole program, you can perform all the optimizations you otherwise would like constant folding and common subexpression elimination. They still just work and loops aren't special, in that regard.
There were tons of compilers not based on LLVM. It's a pretty young project in computer history,although the most popular right now, by far... But something that time teaches you is that everything changes, all the time.
Phi function as you described would not be faster unless the two variables were lazily evaluated. This description fits circuit programming style for constant time in branching, used in some cryptographic libraries, but otherwise is very rare.
Thanks for this video, but regarding the compiler subject, I remember also compiler generators like Lex and Yacc, can you explain how they work, I think it'll be a subject for a good video.
These are parser generators and not compiler generators, they generate a ready-to-use (and in any language) parse tree out of a grammar file describing a language.
0:00 Not really a given that "you've already used it". Non-LLVM compilers like GCC or whatever the hell Visual Studio uses these days exist. If you use Visual Studio, you have to actively choose and setup LLVM/Clang as your compiler. So many people use VS and most of them will just go with the defaults. If you're on some kind of Linux, GCC is more often the "default". I've used LLVM unintentionally I'm pretty sure since one of my machines runs Gentoo which is a source based distro but most package managers just download binaries.
@TheCodingGopher Oops, sorry about that then. It's just that your voice sounds a bit robotic. The video was very interesting, though, so while I didn't mean for that comment to be a compliment, you do have my compliment for addressing this topic.
For a second I thought you were coder gopher; who is also from Canada. I was like “Holy Cannoli Casserolee! He’s back!” Cool video though! Funny enough, I was reading about Chris Lattner yesterday.
Why should you not explain why it is better? "X is better" without explanation is just a claim without substance, and thus at best a statement of opinion, at worst useless.
As others have pointed out, GCC doesn't use LLVM. It has its own IR and implements its own SSA.
saying "If yo have written C you've already used it" is misleading. most people use GCC on Linux and MSVC on windows. clang is only the default on macos
Even if you're on linux chances are you still have clang compiled software as some projects prefer and recomend it for compiling e.g firefox, chromium etc and of course rust uses LLVM
A fair amount of distros build with clang to ship their kernels and binary packages
i use Clang btw
@@blehbleh9283 name me one distro that uses clang for the kernel. I don't think theres any except for maybe chromium OS or android if you count them as linux distros
@@zyxiwAlpine linux uses LLVM and musl for a lot of stuff (probably because of licensing) so maybe they build linux with it as well?
This isn’t an llvm feature. It’s general in computer science. Java runtime compilers were using this 25 years ago.
And it's also related to continuation passing style in functional programming.
It should be pointed out that GCC doesn't use LLVM. A shocking amount of new languages do, and it's not always for the better, and also complicates the build process. I've had to write my own makefiles for certain newer languages because they use LLVM and cmake sucks for versioning. However, while SSA is a useful tool, there are alternative methods. In several of my compilers I basically run through the code like an interpreter and have condensed nodes where the result can be computed at compile time and placed directly into the machine language output.
This is sooo old. SSA is used for decades. I thought they found an improvement for SSA.
I know right, it's not like some people came not knowing what it is
Thank you for this explanation. Very clearly explained. Please more like this.
I'm sorry but GCC is wildly popular and almost always just has the edge on performance, and a VAST majority of C devs dont use llvm
This is an old invention from IBM, and yes, it's one of the many, many tricks one should be familiar with when designing an optimising compiler.
Seems similar to a coding style I've been advocating to make 'variables' const almost always. Side benefit is that each variable has actually one and only one well-defined meaning, unit, etc.
Since C++11, this can be applied to loops too by using range-based for().
SSA, in many situations, also increases readability and maintainability of code even before compilation. The same way compilers struggle to determine which value was last assigned and which value we refer to, programmers struggle too. Thats why in most cases it's preferable to use constants/immutables wherever possible (like rust does by default or javascript with const instead of let).
Just a quick note on dead code elimination (1:24). A variable not being read (or, in LLVM term's, having no users) is a necessary, but insufficient condition to label it as dead code. Take as an example the following fragment of C-inspired pseudocode:
int f() {
int x = g();
return 0;
}
The variable x clearly has no users. However, it may not be dead code. If g looks something like:
int g() {
a++; //global variable
return a;
}
Then eliminating the call associated with the variable x would be incorrect, and would make the semantics of the optimized program different from those of the unoptimized one.
All in all, this is a great video that highlights the importance of SSA. Keep it up :)
Very good point!
Thanks for watching
SSA is just a way to represent binary trees with instructions at nodes and data at edges. It’s like saying “trees are awesome” - that’s true but you aren’t exactly breaking new ground here
How are loops written in SSA?
Also, having written an optimizing C compiler that didn’t use SSA, I can tell you that a variety of optimizations like constant folding and dead code elimination are actually also really easy even without SSA - probably even easier actually since you just run through the generated instructions backwards while tracking what gets used, and if outputs aren’t used then remove the instruction. General analysis, especially when it comes to code branches, would be simpler with SSA.
You have two (broadly speaking) approaches, llvm uses so called phi nodes. A compiler called cranelift takes another approach and models them as so called basic block parameters.
@ Right, for llvm, phi nodes would be involved since there are conditional jumps in (most) loops. My question is more about how SSA works when the body of a loop runs more than once. The iteration variable is assigned to multiple values but would need to be referenced as the same symbol since the body of the loop isn’t always fully unrolled. It truly only use phi nodes in this context? Can it still be considered SSA even when the body of a loop runs more than once (which would assign a variable multiple times across multiple instances of the loop body)?
I imagine it's something like "%i = x" where I is the loop iteration and x is the value that gets set during the loop
@@SethPentolope The basic block represents the operation that happens inside the loop abstractly. It _is_ ran multiple times if the loop repeats, but the single assignment principle only matters _within_ a basic block.
(Also, off-topic, I like to joke that phi nodes are 'comefrom' instructions, taken from INTERCAL. It's funny, but kinda true)
@@SethPentolope the value is assigned multiple times but only in one place, which is fine. If this is still guaranteed for the whole program, you can perform all the optimizations you otherwise would like constant folding and common subexpression elimination. They still just work and loops aren't special, in that regard.
There were tons of compilers not based on LLVM. It's a pretty young project in computer history,although the most popular right now, by far... But something that time teaches you is that everything changes, all the time.
4:34 well don't leave us hanging! What are the commands to have LLVM help me here? Because I'm REALLY sick of reading thread sanitizer outputs :D
another amazing LLMV video!! you always outdo urself with every LLMV vid. can’t wait to see the next one
Thank you so much!
@@TheCodingGopher yeah just don't turn into react youtuber
LLMV: Low-Level Machine Virtual
xD
Everyday learning new things😮
Gcc, while it doesn't use LLVM, does use SSA in the form of GIMPLE.
Anyone can make the simple look complicated... this video succeeds in making the complicated look simple :) Awesome Video.
Cheers! 🥂
About ten years ago one guy proposed cpu sporting SSA using something called belt - the mill cpu
been enjoying your channel, you're covering some interesting topics that don't get enough attention :D
Thanks for the support! :)
Phi function as you described would not be faster unless the two variables were lazily evaluated. This description fits circuit programming style for constant time in branching, used in some cryptographic libraries, but otherwise is very rare.
Thanks for this video, but regarding the compiler subject, I remember also compiler generators like Lex and Yacc, can you explain how they work, I think it'll be a subject for a good video.
Thanks for the suggestion!
I have added it to my video ideas backlog. For priority review, consider getting my membership :)
These are parser generators and not compiler generators, they generate a ready-to-use (and in any language) parse tree out of a grammar file describing a language.
These programs generate parser programs that take text and convert it into AST. After the syntax tree, real compilation is still left.
Surprisingly I didn't learn anything new because this crazy interesting idea we learned in university
thats a very smart idea
Where can I read more about this?
Here's a good link: mapping-high-level-constructs-to-llvm-ir.readthedocs.io/en/latest/control-structures/ssa-phi.html
cool and interesting info thanks
That's liveness analysis, isn't it?
This is an underrated channel 🔥
0:00 Not really a given that "you've already used it". Non-LLVM compilers like GCC or whatever the hell Visual Studio uses these days exist. If you use Visual Studio, you have to actively choose and setup LLVM/Clang as your compiler. So many people use VS and most of them will just go with the defaults. If you're on some kind of Linux, GCC is more often the "default". I've used LLVM unintentionally I'm pretty sure since one of my machines runs Gentoo which is a source based distro but most package managers just download binaries.
What AI did you use to voice this video? 11labs?
This is my voice. No AI is used in the video. I will take that as a compliment though
@TheCodingGopher Oops, sorry about that then. It's just that your voice sounds a bit robotic.
The video was very interesting, though, so while I didn't mean for that comment to be a compliment, you do have my compliment for addressing this topic.
@@SrIgort Ah no worries! In that case, I will take it as constructive criticism :)
For a second I thought you were coder gopher; who is also from Canada. I was like “Holy Cannoli Casserolee! He’s back!” Cool video though! Funny enough, I was reading about Chris Lattner yesterday.
TH-cam is always listening 😂
Any compiled language... except Golang
Is it chatgpt text? "X is better because it makes Y easier" is a common sign of gpt - just say it is better, not explain why even a little.
Why should you not explain why it is better? "X is better" without explanation is just a claim without substance, and thus at best a statement of opinion, at worst useless.
Just pay someone on fiver to rrad your script this ai voice is unbearable
It's my voice - not AI