There was a feature in the very first version of FORTRAN, intended to mollify the Assembler users of the 1950s about “efficiency,” which allowed the programmer to “assist” the compiler in compiling efficient code for nested loops. Basically, it consisted of a FREQUENCY and an EQUIVALENCE statement. The programmer would specify an estimated number of inner loop executions per outer loop execution with FREQUENCY, and the pattern of memory reusability with EQUIVALENCE. Combined with other information collected during the compilation, FORTRAN could optimize when subscripts could be kept as the equivalent values in an index register, when they could be kept as index values but stored during a loop, or when they actually had to be kept in memory as subscripts (eg when the subscript value had to be printed out). These statements allowed the compiler to generate code closer in efficiency to what an Assembler programmer could produce, while wasting less time in the compiler doing the optimizing. These statements were always optional, and were often added after test runs to produce a production version of a program. Later compilers which required less optimization (because computers were faster and had more storage) retained the ability to accept these statements (for compatibility), but ignored them. These features helped overcome the bias of Assembler programmers against high level languages (known as “automatic” programming at the time), so that FORTRAN and other high level languages became successful.
The most ironic thing of all things about compilers is SSA - single statement assigment. It turns your damned mutable imperative program into a pure functional data-flow program ! its beautiful. Because that's the only way to optimize.
@@pranaypallavtripathi2460 yep I never would know that what's essentially PGO - profile guided optimization. was possible in Fortran. Ahead of its time in all senses. C on the other hand, designed by comitee, but hacked down, literally from BCPL. Ambiguous parsing syntax, because why not, right?
@@monad_tcp Actually it wasn’t possible in the FORTRAN language (ie the program being written in FORTRAN), but the COMPILER, written in Assembler (for the machine in question, initially the IBM 701) could optimize the generated code with these “hints” from the programmer. New terminology appeared as time went on, but the essence remained. Speaking of ambiguous parsing syntax, only two hard examples of ambiguity existed in FORTRAN that I could find: one in the DO statement and one in the FORMAT statement, and they existed in all versions of the language because, for compatibility, users would not allow the “space” character to have any delimiting function. Except as part of the text of an alphanumeric item in the FORMAT statement (and later in CHAR form constants), the spacing between actual language elements was DISREGARDED, and the syntax was defined so that the separation between elements did not affect the compiler. Example 1, the DO statement, used to create an iterative loop: the general form is DO = , [ , ] where = label number of a statement, value 1 through 32767, which ends the range of the loop, = the name of a simple integer variable, consisting of one letter (I-N) followed by zero through 5 characters which can be letters (A-Z) or digits, And , , and can be integer constants [-]#…#, subscripted integer variables (subscript[s]) or expressions. But the arithmetic statement format is [(subscript[s]) = expression. Therefore the DO statement “DO 35 I = 1 , 100” written with no spaces (which is how the compiler sees it) as DO35I=1,100 could also be an arithmetic statement up until the first comma, after which if DO35I can be divided into “DO 35 I = 1,” where 35 is a valid statement number and I is a non-conflicting integer variable name, it is compiled as a DO statement; otherwise it is an arithmetic statement, which may contain errors. Likewise, the FORMAT statement, used in I/O operations, is defined as “FORMAT()” But if “FORMAT” is the name of a variable, which is valid (FORTRAN has no “reserved” words), and it has subscripts, the compiler will not be sure this is not intended to be an arithmetic statement until it reaches the matching “)” and sees no “=“ sign after it! Amazing what they were able to do with so little computing power and math theory!
That's maybe the best explanation why Python, Java and other canonically interpreted langs can differ in performance so much depending on the interpreter/standard/environment.
@@igorthelight Almost no language is just interpreted (except for batch files or shell scripts, maybe?). Most interpreters first compile to a simplified internal format (like bytecode). It doesn't make sense to call Java "compiled to bytecode" and Python "just interpreted" when Python does the same thing.
@@pierrefley5000 Well, you can say "compiled to bytecode and then interpreted", if you want to. But since bytecode is just a different, more machine readable representation of the written code, that's not so big distinction. And it's more of a translation than compilation; and it has been there from the very early 8-bit microcomputers and their BASIC - nobody is saying that they weren't "just interpreted" because they were translated into byte-code first. But yeah, batch and shell scripts are, AFAIK and I'm really confident about this, parsed and interpreted everytime it runs, there's no translation to byte-code involved at all. Rest of the interpreted languages, I don't know if there's a single one that doesn't have some kind of internal byte-code representation that the written code is first translated/compiled into, but if there is then it has to be a fringe case really ;) It just doesn't make sense for interpreter to parse the written representation of the code every single time that piece of code is ran.
@@robsku1 Well, there was no translation to bytecode for early 8-bit BASIC interpreters. All they did was parse the source code and perform its behavior directly.
The downside of JIT is the final behavior is not locked down and determinant in a way that can be practically tested with certainty, so runtime behavior can't be effectively verified for critical applications. eg Some fool at Boeing tried to use Java for a critical embedded control, worked fine most of the time but about 1% of the tests would have failed activation because the JVM was busy doing stuff when the signal came in. If a game drops a frame here or there or misdraws a few pixels, it's a minor annoyance if it is even noticed, If a jet doesn't retract its landing gear or an automated defibrillator(medical) glitches, it will be noticed and more than an annoyance.
@@daruiraikage When the Java Virtual Machine de-fragments and frees memory (garbage collection) it will sometimes pause the program while it does so. Garbage Collection is inevitable in a long-running program. You can delay it giving it a larger memory buffer. However, giving it a larger buffer will make the pauses longer when it finally decides to do so. Most of the time it's difficult to effectively control when garbage collection happens. If you somehow disable garbage collection during execution of a critical section, but you use more memory than is available in the buffer, your program will crash. Essentially, languages with Garbage Collection (most JIT languages) are il-suited for real-time applications. i.e. Java, JavaScript, Python, C#, Golang, Ruby, Haskell, Php, Perl should not be used in real-time or critical environments.
A major use for JIT compilation: emulators. Huge performance boosts were gained by using JIT instead of straightforward interpretation for translating the machine code of the emulated system into the host's machine code. Especially important for emulating more recent systems. Of course, what constitutes "recent" depends on when the post was written. When I first heard of it being used, it was the Playstation 2 and Gamecube that it was being applied to (possibly the Wii as well), and there wasn't any way of emulating the PS3, or Xbox360 yet. With the last couple of crops of consoles using x86 processors, emulating one might have more in common with running Windows in a virtual machine instead.
The compiler is removing the loop because the “a” variable is not being used after the loop! If you print the “a” variable at the end the “time” will be different.
Yes, additionally he just adds a constant, 5, up to 1e7 times, which is just a multiplication that fits perfectly fine in a single 32-bit number multiplication instruction, so the speed should also be constant.
I'm not sure how Pypy works since I don't use Python but it should still be possible to more or less optimize the loop away. The function call inside the loop can be abstracted to a constant number instead (since it runs the same function with the same inputs) and from there you can just multiply that constant by the number of loops and add that total value all at once. Of course, this isn't to say that Pypy would actually do this, just that it's possible. I don't know whether it actually would in this case or not.
assuming a was used and loop haven't been removed, would the jit optimize the f for the int values or optimize it just for f(2,3) (since it is only used for that particular values) ? assuming jit optimizing methods for particular values how does it know when to stop optimizing? does it a have storage limit for "optimized" methods etc. it feels like it might take away the performance if there are too many variations
I looove computerphile. i work in construction but everytime i watch one of your videos I realize what i really wanna do in my life... thx you guys so much!
It would be great if you could talk about the "scary how clever they've gotten", it's exciting to hear a statement like that so would love to know more
@@ralf391 That's a genuinely interesting way of putting it! Makes it much easier to see how out-of-order cores can be considered JIT compilers in a way
A decent optimizing compiler will detect that that loop has no side effects (including output) and that the result of summation isn't used anywhere, and will simply throw the whole thing out. To prevent that outputting the final sum would suffice.
The example in the video was a silly toy example. The idea is that JIT works, even in situations where static optimization would have no idea what's happening.
I'm fairly sure it's impossible to have a fully-static implementation of a javascript compiler that also completely implements the ECMAScript spec. This is true for most "JIT/interpreted languages" For example: Runtime reflection on aribtrary data types is often a key feature in JIT/interpreted languages and this is in general incompatible with static compilation. Another example is runtime evaluation of generated source code. A compiler could embed itself into the compiled program and invoke itself for these cases but then it's no longer statically compiled and you effectively just have a JIT compiler.
as someone who works on the v8 js engine in chromium I quite enjoyed this video. people often think jits are a lot scarier than they really are and education in this area is always appreciated! also shout-out for using a framework laptop :)
Hi, he said that javascript can be compiled, but in the past I was googling and it's not possible to make binary file out of js code. So is it possible or not? And if not, why it's not a thing?
@@daliborin He meant that it's possible to compile any language, or interpret any language. That means that a compiler could theoretically be created for JS, but I don't know if anyone has done it.
Hey Brady, please ask Dr. Tratt to change his computer password; the video shows him typing it in at 3:18. While the risk is low (camera angle is pretty flat), a dedicated attacker might be able to figure out his password from his finger movements and use it. Better safe than sorry. Excellent video btw!
I was about to say... typing in a password on camera, bold move. Hope he changed it before the video to something specifically for the purpose of the video
While this guy obviously understands the subject well, the example they chose is quite silly. A statically compiled language like C will also optimize out such a loop; it doesn't really show off how JITs can excel so much as it shows how bad the more naive approach of interpreters is.
Yeah. He demonstrated the difference between interpreted vs compiled, but not necessarily JIT. To show the advantage of a JIT he really needed to show a case where the program execution isn't known until runtime. Like a dynamic function call where a static compiler can not deduce the target so must generate a procedure call, where as a JIT can inline the target. Thats critical to make Java run fast when every method is virtual and most are very tiny, so inlining is crucial. Its the reason why static compiled Java performs badly. It needs a JIT to work well. C++ on thr other hand, with templates and virtual calls being relatively rare, much more information is known at compile time so it doesnt have the problem.
I love this channel but wow this was a terrible way to demonstrate the advantages of JIT. All he did was demonstrate the problem with purely interpreted languages, not why JIT is useful.
I disagree. Yes, this kind of problem is something a traditionally compiled language can optimize out. But JIT is primarily used for accelerating languages intended to be interpreted, which he demonstrates perfectly fine here. I'm not sure what else you were expecting to be shown.
@@benloud8740 imo his example didn't really demonstrate the difference between interpreted vs compiled, either. that was the difference between the translator making optimisations, vs making no optimisations. an interpreter can choose to implement optimisations in the way that it chooses instructions in the host language, just as well as a compiler can choose to implement optimisations in the way that it chooses instructions for the target language. the only important/qualitative difference is whether those optimisations are done statically or dynamically. you could absolutely make an interpreter that creates & remembers program-dependent optimisations ahead-of-time / statically, and then uses them upon actually evaluating the guest-language program. a compiler just has the advantage that it can target a less abstract language than the language the compiler itself is written in - but even then, compilers can't translate to CPU hardware microcode! that means that the CPU is effectively an interpreter (host lang: microcode. guest lang: instruction set) which is performing optimisations
This video just explained something I've known about (and leveraged) for _many_ years now, but never truly understood, in a simple bite-sized way. Thank you!
Tratt, I've been a fan of your writing (and some of your research) for years! The work you did on the (missing) warm up of JITs with the PyPy guys is downright amazing :) (The stuff I'm not a fan of? Mostly the style of language you chose for your Converge language -- I agree with pretty much all your goals for it, of course. De gustibus non est disputandum...)
Static compilers can have just as much info as JIT when given runtime info. This is often done using a technique called PGO and usually leads to the absolute fastest type of programs when combined with other static compilation optimizations (LTO, binary optimization [bolt], etc...)
You can also use JIT in some parts in a program compiled with and AOT.
2 ปีที่แล้ว +2
finally someone uses one of the two editors and it's even the one, that can be used without surgical alteration of the editing hands...I think that's a first for computerphile.
Another advantage of JIT over statically pre-compiled files that are then distributed (apps, games, etc.) is that the optimization can be specifically tailored to exactly your machine. The exact CPU features, cache sizes, memory size/bandwidth/latency, etc. Statically pre-compiled on the other end must be compatible with a lot of different systems and configurations (some machines are 32 bits, some 64 bits, some have SSEx extension, some don't, etc.) This makes the programs both bigger and slower. There are detection, tricks, and such available to mitigate this problem but not eliminate it. JIT rules in that aspect.
In the modern C toolchains, there is the first part, compiling into the object file and there happens the first layer of optimization, the second level of "static" optimization when all object files are put into one "bucket", until this point i fully agree that the JIT can be more efficient. However, there is also the LTO (Link Time Optimization), this also has multiple stages, and this will look from the linker perspective and you can consider it as "dynamic" optimization. Off course you can seriously hinder it with improper pointer handling, especially function pointers and recastings, as those will break the path of data and function accesses and in this area JIT might be more efficient than the modern C compiled binary. My background is more C focused, happy to learn more about the higher level solutions like JIT.
The whole idea and power of JIT compilation is that it enables code written by plebs without exactly that knowledge of the inner working to execute much faster. But it also scales so a very skilled programmer can spend more of their time on the interesting stuff instead of optimising for performance everywhere.
Even LTO is completely unaware what data or parameters the program will be run with. PGO (profile-guided optimization) is what you should be touting as the C competitor to JIT; but the problem is that PGO requires you to generate profiles, which is a manual step the developer has to go through. Whereas, as @Hamachingo points out, JIT can see things that PGO can see but LTO can't, but it also just works "for free".
I always wondered why the JVM for example is not able to dump the current compilation state into a file. So next time you load the same program you don't have to start the JIT process from scratch. If you have a big webapplication in java it takes quite a long time from startup to a point where your application runs a same speed it did last time.
I also wondered the same thing, but in the end you can't just dump and reload pieces of compiled code. JVM initializes everything lazily and initialization order affects application correctness so every time you start your Java app, the JVM must ensure it's initialized in correct order, even if something has changed (e.g. environment variables or whatever). OTOH, if you invest enough engineering effort then you can actually reuse some compiled code if you check initialization order (and other important stuff) first, so your app reaches peak performance much quicker. Azul ReadyNow! implements that and it's a paid technology. OpenJDK authors are also working on improving JVM startup as part of Project Leyden.
You can disable the E-Cores in bios if you wanted to have more comparable results. For the best comparison, disable all but 1 P-Core + hyperthreading and set it to a fixed frequency.
@@simon7719 What I said has nothing to do with JIT, it's just an explanation of how to set up the hardware to it shows a consistent duration (Time) requirement for any testing. Skip to 4:44 and listen to what he says about his laptop. I'm not sure if you're familiar with the last few generations of Intel CPUs, but most of the variants of the last 2 generations have "E-Cores" (E for Efficiency) and "P-Cores" (P for Performance). These are both a power saving method for doing background tasks as well as a way to make the core count look "better" to consumers to compete with AMD's core/thread counts.
@@simon7719 "more than likely" part is debatable. Especially when CPUs with efficiency cores draw more power than previous gen and the thing that is supposed to choose between E/P cores for your load is Windows.
maybe I'm being too skeptical, but the typing at @3:20 is a bit risky? They showed the keystrokes of the password in the video. somebody with enough expertise in this area could potentially figure out what his password is based on the fingers he used for the keystrokes! I can already see he used a "shift" key for an uppercase letter in there!
I appreciate the concern, but there's no need to worry. What you can't see (since its output wasn't used) is that there was a second camera pointing at my keyboard. I rotated my password after the recording, as I tend to do in such situations anyway.
*a compiler takes a file and produces another file.* the language in the input file is called source code and the language in the output file is called target. the source and target can be the same language. some people call this a specific name like compiler-compiler or transpiler but they don't need to because is just a compiler. *an interpreter takes a file and produces actions described in the file.* the JIT can compile to a bytecode or to assembly in both cases produces faster execution however if the source programming language isn't designed with zero cost features then the speed the JIT produces won't be same as a zero cost programming language. There are very small number of usecases like inside a loop in which a JIT can produce code faster than a zero cost language.
The most insane form of optimization I've seen in any language is APL, if anyone's curious just search for "Dyalog 18: The interpretive advantage". They dive into how they can shave off nanoseconds just based on how they optimize, it's crazy stuff.
I've heard many times that JIT compilers can possibly optimize the code during runtime to make it faster than statically-linked code, and I believe that. However I've yet to see a real-world example of that happening, even in frequent loops. I'd really appreciate it if someone could point me to an example where the JIT code is indeed faster (without the linked code being artificially hampered).
I have some code that fits billions of curves to n datapoints each (once per pixel in a large image). n is unknown before runtime, but it is constant once it is read in. We wrote it in C and in Python using numba as the JIT. The python version is slightly faster. The reason is that the python version knows the size of all the used arrays at compile time (which is at runtime) and the C compiler does not and cannot know this. One option to make the AOT version faster is to compile many versions, one or each possible value of n. Obviously that is quite limited.
Very interesting. Would love to hear more details on how those optimizations work, i.e. why exactly makes that much difference between the two compilers. What does the optimized program look like?
I used Numba for a big data analysis project in particle physics, very math heavy. With the near-trivial extension to distribute the work (built-in feature prange() ), I managed to get >100x speedup and >10x faster than existing C++-code (though to be fair, it was not terribly optimized). Point is, it maintained readability perfectly and cut execution time from several minutes to seconds!
It's worth noting that if you create a generic function that can take an integer or a string, compilers will automatically create two machine code versions and just use the right one depending on what datatype you are using. That really has nothing to do with JIT. Also, JIT engines generally use a LOT of memory, so while they can be fast, they aren't magic and don't suit all applications. Always use the right tool for the job. 8)
I mean, I hate Python, but I have to point out that a Python function can take 20 arguments; whereas we're not going to make a C++ generic function that takes 20 generic types and have the compiler generate all 2^20 machine code versions. So comparing JIT to generics like that seems weird to me.
@@TheHuesSciTech You're very unlikely to call that generic function 2^20 different ways, though. The template types are known at compile time, so they don't depend on user input. The programmer would have to code up every possibility of static types.
python 3.11 actually uses a similar strategy to a JIT by doing an optimized lookup for the addition operator based on what types that function is frequently called with.
Tratt's student cfbolz once answered this question "we add lots of ifs and prey". If you're fortunate, all of those ifs can be hoisted outside of the hot loop, but there are obviously times they must remain. There are a lot of approaches to reducing the number of things you need to check, too.
Using the same example from the video. They have a function "f" in Python, whilst being just 1 line of code we see it doing 2 very different things (addition of numbers, or string concatenation) At runtime, there's a big loop constantly calling the function f with just numbers, we know it's always numbers because the numbers were hardcoded. The JIT can spot this, create and compile a function at runtime in machine code that just deals with numbers (let's call this new function f_numbers) and replaces the code in the loop to call this new f_numbers function. Elsewhere the code is left as-is using the original interpreted function, f. This ensures that other code paths that aren't guaranteed to be using numbers continue to work It's not the best example because this could have been spotted without running the program but hopefully that makes sense. The main thing to realise is that when JIT compiles a method, it doesn't completely replace the original, but creates an optimised one for particular use cases alongside it, and the new compiled one will only be used where it knows it's safe to do so
It will add "guards" that check that all the assumptions are still valid, and if they are it continues to the optimised fast path, otherwise it will jump to a "deoptimise" function that will invalidate the compiled code and fall back to running it in an interpreter So this way it can assume things that are probably true based on the behaviour it has seen so far, but not break if that changes, which a static compiler can't do because it can't see how the program will behave at runtime
There is no way anyone cam guess the password from that footage. Maybe length and rough positions for some keys but I don’t think it reduces the search field considerably
@@Gvozd111 Would you put money on that? Before you answer, maybe you should get off your phone and watch it again in 1/4 speed on a 4K monitor. I can clearly see all the right hand letters, and I can tell what keyboard row and finger he's using for each letter on the left hand. This is slightly more difficult because he's a 4 finger typist and moves his hands along the keyboard, but it narrows the left hand characters down to just a few possibilities. And that's assuming the password isn't so simple that you don't just guess the left hand letters based solely on the known right hand ones. 😁
"big team in most cases" - funny to hear from someone showing pypy (where work on jit was done in quite small team) as an example :) same with amazing luajit2 which afair was done mostly by a single person?
In C# and .Net languages in general, the JIT compiler compiles the IL code once before running. So. My understanding is that the IL code can be run cross platform as the compilation happens on the target platform. What you described seems to be a runtime optimizer or something. In any case it's a great time to be developing software.
I came to write a similar comment about C# and .NET. Also, it's very interesting to hear the word "compiler" for an interpreted language such as javascript. I don't know why he uses that word.
@@XtroTheArctic "interpreted language such as javascript" watch the video again. there's NO SUCH a thing as "interpreted language". Languages are statically typed or dynamically typed. They are not compiled/interpreted, that's environment.
No, what he described is a JIT compiler, that's how they work actually. JIT compilers are runtime optimizers. Your misconception is that it is not the JIT compiler that produce the IL on the dotnet machine, its the C# compiler itself that does that. The JIT compiler only works with IL. Also, the System.Expression has a lambda compiler that's separated and also is able to make IL code. (so is the mono.cecil library) The F# compiler also uses an entire different mechanism to produce optimized IL for functional code. The F# compiler is written in F#. The JIT compiler doesn't produce IL, it consumes IL and produces machine code, at RUNTIME. But it might also do AOT if you use Roslyn compiler, Roslyn is an hybrid compiler, it can do both, JIT/AOT and PGO, which is profile guided optimization. You can have 98% of the performance of C++ for 25% of the cost. I'm in the middle of porting everything I have in C++ to dotnet7. The performance of the new Dotnet7 (with stack types and spans) is so amazing that it beats the Java Azul compiler, the best JIT for years. But not anymore. Exciting days for the dotnet community.
@@XtroTheArctic No, I don't need to research this, I work on this research field. I have a MSc in compiler development. Your teacher is wrong. There's no compiler/interpreter languages. There never was. The first Fortran was interpreted, then a compiler was made. The very first compiler. This is a simplification some teachers used to teach kids.
I would agree that the code isn't the best. In the example, will the optimizer notice that the a(variable) is not used after the loop and remove the variable and loop? I have seen in C, C++ and C#, the optimizer sometime removes "wait" loops. Will the results be the same if line 10 "print (a)" is added?
@@cobaltno51 I think they were hoping for an example where the optimization was based on analysing the runtime behaviour of the program - i.e. the kind of optimization only a JIT would perform.
I understand citing as examples for JIT compilation Java and Javascript engines, but I think a bit more visibility and research should also have LuaJIT, it's pretty astouning the progress which has been made in that project.
What i have notices about me playing minecraft, is that after i have started playing for a while, it does seem to get a bit faster after ive played for about an hour or so. I wonder if this is what im seeing in action.
My instinct is that it seems weird that it would take a whole hour before it took effect; I'd expect the JIT gains to be observed much sooner than that. But I have no specific evidence to give to you on that.
Yeah I don't think it would take Java an hour to figure out how to run its code. It's probably just that you've got all the chunks around you loaded, and you're getting more cache hits on the entities that have to update. I found that garbage collection would still make the game stutter every few seconds though.
Case and point that languages can be both compiled and interpreted: Java native images. There the graalvm native image compiler takes java bytecode (so Java or Kotlin or something else compiled for the JVM) and then compiles it into a static binary, which is then much smaller, faster to start up, but (especially without PGO) it can be quite a bit slower.
I wonder if it is "realistically" possible to make some of the dynamically typed languages as compiled machine code without just resorting to intermediate representations + runtime (like how Python does?). Would it not create way too many permutations of the code if the data structure is complicated and could hold any number of possible types associated to it? If things are statically typed, I think being interpreted or compiled is just a matter of design decisions (both can be a feature of the language), but if some program can take in any type of data and permutate the actual operation during runtime depending on the input, is there a point at which it really becomes non-beneficial to completely compile down to machine code?
It is absolutely the case that it can be quite detrimental to performance to have a function that could have one of many different interpretations, with no way to know which one to use until it is called. However, static compilation generally still will improve performance, since a static compiler will be able to find a way to pack all the possible implementations of that function into a dispatch table, at which point each of those implementations would be able to be optimized as normal.
I wander if someone has done some study on how large the the dispatch table will become to accomodate all permutations of a dynamically typed language (of course it will change depending on how the program is written etc.).
@@Howtheheckarehandleswit or we could analyze the parts where they're being used and which operators/functions they're used with and massively reduce the number of permutations
@@ashwinalagiri-rajan1180 This is a hypothetical compiler for a dynamically typed language. It’s not possible to see all of the types that are actually used at compile time.
There is a thing called Generic Types in almost all modern langs, and the ahead of time compiler can optmize more efficially by using it. and, JIT can not do much more as AOT, since it needs detect the runtime infos, that is why JIT has 3 gens is C#.(or PGO, too)
How about a discussion on JITs versus compiler code instrumentation and if there are still advantages that JITs hold that are out of reach for what the compiler can get from repeated runtime statistics?
JIT just happens "for free", whereas PGO (which is what I assume you're referring to) requires manual effort from the developer. I feel like that's the biggest difference, and that's coming from a PGO fanboy like me!
A _"compiled PL"_ is just an abbreviation for _"a PL designed for effective compilation",_ while "interpreted PL:s" are languages without this design feature. Besides, for any computer scientist, it should be obvious that Python is very bad for compilation to effective code, so therefore it is a language most suitable for interpreters.
And now CPython (the official and reference implementation of Python) has a JIT compiler, still preliminary but still they expect to do a 50% performance increase for 3.13 over 3.12, and another 50% for 3.14 over 3.13.
I think they use one of many possible workarounds like checking every so often if there is a new compiled version of a function available and only then replacing the function pointer. Remember that JIT is not just a theory, its in use for a long time, like ~25 Years of Java for example.
This video should be titled with "How Just In Time (JIT) Compilers Optimize" or "Something about Profile Guided Optimization". For example, transcoding bytecode into machine code during runtime should also be JIT; and then we can use the technique in the video to optimize the machine code during runtime (which is also JIT). As long as the "translation/compilation" happened during runtime, it is JIT, with or without optimization. -by some extend, self modify code can also be JIT-
the loop in this example is pretty much static and only executed once per run. does this mean the jit is doing some optimizations before the first run?
@@syntro6607 This however is not something only JIT can do, an AOT-compiler would be to do that just as well. Perhaps JIT even optimizes it before the loop is executed, although that is unlikely given that such optimizing is resource-intensive and should rather be done while the method is being executed
@@syntro6607 With java, there is a command-line parameter -XX:CompileThreshold "By default, in the server JVM, the JIT compiler performs 10,000 interpreted method invocations to gather information for efficient compilation. For the client JVM, the default setting is 1,500 invocations."
1:00 I would be a bit critical of these statements, for example it's impossible to write a Python Compiler which does not include a python compiler or interpreter in it's result(You can do something like exec(input(">>>")), so a real compilation of python or ECMA Script is impossible. A C Compiler does not need to do this.
If you had a non deterministic function, but most often returns a value, say def func(): if random.random() > 0.999: return 4 return 5 Would the JIT optimize the randomness away?
No. "Just-In-Time debugging is a feature that launches the Visual Studio debugger automatically when a program, running outside Visual Studio, encounters a fatal error. Just-In-Time debugging allows you to examine the error before the application is terminated by the operating system." That has nothing to do with having several goes at compiling code based on observed usage patterns, which was what was talked about in this video.
Surely a compiler would find the same optimizations as the JIT for the example shown. What are some examples of programs where JIT is able to find optimizations that compilers can't?
When an input to a function is dynamically computed or fetched during runtime, making it impossible to optimise for during compilation. Take a game engine responding to keyboard/controller input or a program running over a bunch of files on a user's machine, and the subsequent function calls that take that information as input. The compiler can make _assumptions_ about the inputs a particular function or section of code uses, but it cannot _guarantee_ that those are the _only_ inputs that'll be provided. Conversely, a JIT compiler is able to _observe_ the inputs and how they're used during runtime so that it can make an increasingly more accurate decision on how it should optimise the code.
never, maybe if someone wrote bad code in C in better code in other language, if you are actually using power of C with full pointers and cache control, doesn't matter the magic nothing can beat C..
It can be faster, by putting heavily used code together so it can all reside in the icache. And, as in the toy example in the video, for languages like Python with generic functions, it can decide to have the JIT just compile to specialized versions (a static compiler can, of course, do the same thing, but if you have binary functions like + with N cases, you wind up with N-squared specializations, which can take a lot of room).
@@Amish_Avenger It's not just theory. It's simply that there are optimizations that are always possible in C, but that doesn't mean any C developer will do them. It's like trying to say that assembly is always faster than C if done properly. It's true, but it would require you to always do optimal code for it.
PHP got a JIT a couple of years ago. I'm not sure if it was worth it though, lots of people say it might not be. Part of the problem is that a typical PHP program just starts up, takes a web request, gets some info out of a database and generates a response, then shuts down. So there isn't much time for the JIT to optimize it.
Couldn't static compilers do the same thing by running the program one or multiple times after the first compilation? If that's not possible, what prevents them from doing so? If this is possible, could it eliminate the worm up problem?
I don't c a technical feasibility challenge to recompile machine code (for compiled languages) @ run time, except for the overhead in performance. PS: Since static compilers (C, C++) convert it to machine code before execution they lose out on the benefit of JIT. This is preferable for low powered embedded systems, where power utilisation is imp. However, Java is an example of a language that converts to bytecode (intermediate representation) & has a JIT compiler to optimize the code @ run time.
The advantage of JIT(not well demonstarted in the video) is that the runtime input is not known until runtime. Different types of data inputs or outputs each time the program is used. The downside of JIT is the final behavior is not verifiable for critical applications.
Static compilers don't know the input the program will take or even whether the program will ever terminate. But you could let the compiler observe some human guided runs and gain further knowledge about the program. This knowledge, however, is nothing but assumptions and might be wrong in later, actual runs. The compiler could perform some optimizations based on these assumptions, though, and may be able to make the compiled program faster.
That exists and as Daniel pointed out it's called profile guided optimization or pgo. The main drawbacks are overfitting and slow compilation times. casual static compilation makes assumptions about your code which got hard coded by some compiler engineers. Those assumptions might very well be wrong for your use case but how do you know the stuff that pgo just measured is representative of future executions? Well, that's kind of hard since you don't really know what pgo measured exactly.
@@capturedflame this. C/C++ needs more design beforehand, too. Javascript and Python are perfect for free-flow adhoc dev. This reflects in the way you optimize it. “Just do it for me” versus “show me what I should focus on to improve.
So the JIT looked at that loop going to 10 mill and thought. "Why am I counting? I'm not showing him the work, just just return 10mil. that will make him happy."
are most Javascript engines actually using some code from java's vm? I didn't think so, but I might be wrong. also, that example shown here is pretty absurd, even the cheapest C compiler would optimize that loop out, unless explicitly told to not do it (which is usually difficult to ask the compiler to be more stupid in favor of better debugging capabilities)
pls correct me if wrong,, @10:00 I thought Java's data types are checked during compilation and therefore all data types should be confirmed during runtime?
Class loading still happens at runtime, so you can still get type errors. Also, you might find that once the program is running, you get the one subclass 95% of the time and it's worth making that assumption.
Veery interesting👍 thanks! I would have liked to see what's happening under the hood a little. So, what it compiles to compared to without the jit maybe
While the JIT compilers will look at your for loop sum try to optimize based on how it's run, AOT compilers with enough optimization flags would just look at that for loop and say "huh i can guess the end result at compile time" and then just remove your loop and place the result in there, that's also why most of the time when i want to test speed of things in C i resort to writing the value myself on terminal so the compiler can't know the value thus can't optimize the end result and entirely remove my calculation
Do Just In Time Compilers come with Almost Too Late Debuggers?
underrated. xD
great comment 🙂
never too late debuggers
Well, Java's debugger is also just in time, since it can attach over the network and talk to a running server... :p
I think you misspelled "people"
There was a feature in the very first version of FORTRAN, intended to mollify the Assembler users of the 1950s about “efficiency,” which allowed the programmer to “assist” the compiler in compiling efficient code for nested loops. Basically, it consisted of a FREQUENCY and an EQUIVALENCE statement. The programmer would specify an estimated number of inner loop executions per outer loop execution with FREQUENCY, and the pattern of memory reusability with EQUIVALENCE. Combined with other information collected during the compilation, FORTRAN could optimize when subscripts could be kept as the equivalent values in an index register, when they could be kept as index values but stored during a loop, or when they actually had to be kept in memory as subscripts (eg when the subscript value had to be printed out). These statements allowed the compiler to generate code closer in efficiency to what an Assembler programmer could produce, while wasting less time in the compiler doing the optimizing.
These statements were always optional, and were often added after test runs to produce a production version of a program. Later compilers which required less optimization (because computers were faster and had more storage) retained the ability to accept these statements (for compatibility), but ignored them.
These features helped overcome the bias of Assembler programmers against high level languages (known as “automatic” programming at the time), so that FORTRAN and other high level languages became successful.
The most ironic thing of all things about compilers is SSA - single statement assigment. It turns your damned mutable imperative program into a pure functional data-flow program ! its beautiful.
Because that's the only way to optimize.
@Allan Richardson that was some very nice to read piece of history 👍
@@pranaypallavtripathi2460 yep I never would know that what's essentially PGO - profile guided optimization. was possible in Fortran. Ahead of its time in all senses.
C on the other hand, designed by comitee, but hacked down, literally from BCPL.
Ambiguous parsing syntax, because why not, right?
@@monad_tcp Actually it wasn’t possible in the FORTRAN language (ie the program being written in FORTRAN), but the COMPILER, written in Assembler (for the machine in question, initially the IBM 701) could optimize the generated code with these “hints” from the programmer. New terminology appeared as time went on, but the essence remained.
Speaking of ambiguous parsing syntax, only two hard examples of ambiguity existed in FORTRAN that I could find: one in the DO statement and one in the FORMAT statement, and they existed in all versions of the language because, for compatibility, users would not allow the “space” character to have any delimiting function. Except as part of the text of an alphanumeric item in the FORMAT statement (and later in CHAR form constants), the spacing between actual language elements was DISREGARDED, and the syntax was defined so that the separation between elements did not affect the compiler.
Example 1, the DO statement, used to create an iterative loop: the general form is DO = , [ , ]
where = label number of a statement, value 1 through 32767,
which ends the range of the loop,
= the name of a simple integer variable, consisting of one letter (I-N) followed by zero through 5 characters which can be letters (A-Z) or digits,
And , , and can be integer constants [-]#…#, subscripted integer variables (subscript[s]) or expressions. But the arithmetic statement format is [(subscript[s]) = expression. Therefore the DO statement “DO 35 I = 1 , 100” written with no spaces (which is how the compiler sees it) as DO35I=1,100 could also be an arithmetic statement up until the first comma, after which if DO35I can be divided into “DO 35 I = 1,” where 35 is a valid statement number and I is a non-conflicting integer variable name, it is compiled as a DO statement; otherwise it is an arithmetic statement, which may contain errors.
Likewise, the FORMAT statement, used in I/O operations, is defined as “FORMAT()”
But if “FORMAT” is the name of a variable, which is valid (FORTRAN has no “reserved” words), and it has subscripts, the compiler will not be sure this is not intended to be an arithmetic statement until it reaches the matching “)” and sees no “=“ sign after it!
Amazing what they were able to do with so little computing power and math theory!
That's maybe the best explanation why Python, Java and other canonically interpreted langs can differ in performance so much depending on the interpreter/standard/environment.
Java is compiled to bytecode than JIT'ed. JavaScript is interpreted and then JIT'ed
Python is just interpreted. It can be JIT'ed using PyPy.
@@igorthelight Almost no language is just interpreted (except for batch files or shell scripts, maybe?). Most interpreters first compile to a simplified internal format (like bytecode). It doesn't make sense to call Java "compiled to bytecode" and Python "just interpreted" when Python does the same thing.
@@pierrefley5000 Fair point!
@@pierrefley5000 Well, you can say "compiled to bytecode and then interpreted", if you want to. But since bytecode is just a different, more machine readable representation of the written code, that's not so big distinction. And it's more of a translation than compilation; and it has been there from the very early 8-bit microcomputers and their BASIC - nobody is saying that they weren't "just interpreted" because they were translated into byte-code first.
But yeah, batch and shell scripts are, AFAIK and I'm really confident about this, parsed and interpreted everytime it runs, there's no translation to byte-code involved at all. Rest of the interpreted languages, I don't know if there's a single one that doesn't have some kind of internal byte-code representation that the written code is first translated/compiled into, but if there is then it has to be a fringe case really ;) It just doesn't make sense for interpreter to parse the written representation of the code every single time that piece of code is ran.
@@robsku1 Well, there was no translation to bytecode for early 8-bit BASIC interpreters. All they did was parse the source code and perform its behavior directly.
The downside of JIT is the final behavior is not locked down and determinant in a way that can be practically tested with certainty, so runtime behavior can't be effectively verified for critical applications.
eg Some fool at Boeing tried to use Java for a critical embedded control, worked fine most of the time but about 1% of the tests would have failed activation because the JVM was busy doing stuff when the signal came in.
If a game drops a frame here or there or misdraws a few pixels, it's a minor annoyance if it is even noticed, If a jet doesn't retract its landing gear or an automated defibrillator(medical) glitches, it will be noticed and more than an annoyance.
This is also why you don't use a hammer as a scalpel.
Oof
@@mage1over137 First one must know the difference between the hammer and scalple.
I dont get it, so all java programs that run on JIT compilation are prone to failure and unexpected behaviour at runtime for 1%(or less) cases?
@@daruiraikage When the Java Virtual Machine de-fragments and frees memory (garbage collection) it will sometimes pause the program while it does so. Garbage Collection is inevitable in a long-running program. You can delay it giving it a larger memory buffer. However, giving it a larger buffer will make the pauses longer when it finally decides to do so. Most of the time it's difficult to effectively control when garbage collection happens. If you somehow disable garbage collection during execution of a critical section, but you use more memory than is available in the buffer, your program will crash.
Essentially, languages with Garbage Collection (most JIT languages) are il-suited for real-time applications. i.e. Java, JavaScript, Python, C#, Golang, Ruby, Haskell, Php, Perl should not be used in real-time or critical environments.
A major use for JIT compilation: emulators. Huge performance boosts were gained by using JIT instead of straightforward interpretation for translating the machine code of the emulated system into the host's machine code. Especially important for emulating more recent systems. Of course, what constitutes "recent" depends on when the post was written. When I first heard of it being used, it was the Playstation 2 and Gamecube that it was being applied to (possibly the Wii as well), and there wasn't any way of emulating the PS3, or Xbox360 yet. With the last couple of crops of consoles using x86 processors, emulating one might have more in common with running Windows in a virtual machine instead.
gamecube and wii have the same processor architecture and so use the same emulator (dolphin)
@@octopirate-bak Yeah, I knew Dolphin can do both. I just wasn't sure when it had Wii emulation capability added.
Switch already has great emulators, running at faster than the original hardware on many games. It's great.
@@hosono3918 well, they both use JIT compilers too for ARM -> x86 translation
"and there wasn't any way of emulating the PS3, or Xbox360 yet" - so what about RPCS3 and Xenia?
The compiler is removing the loop because the “a” variable is not being used after the loop! If you print the “a” variable at the end the “time” will be different.
Yes, additionally he just adds a constant, 5, up to 1e7 times, which is just a multiplication that fits perfectly fine in a single 32-bit number multiplication instruction, so the speed should also be constant.
Yes and I think he also implied that with saying the loop just got compiled away. But it could have been pointed out better.
I'm not sure how Pypy works since I don't use Python but it should still be possible to more or less optimize the loop away. The function call inside the loop can be abstracted to a constant number instead (since it runs the same function with the same inputs) and from there you can just multiply that constant by the number of loops and add that total value all at once. Of course, this isn't to say that Pypy would actually do this, just that it's possible. I don't know whether it actually would in this case or not.
assuming a was used and loop haven't been removed, would the jit optimize the f for the int values or optimize it just for f(2,3) (since it is only used for that particular values) ? assuming jit optimizing methods for particular values how does it know when to stop optimizing? does it a have storage limit for "optimized" methods etc. it feels like it might take away the performance if there are too many variations
Absolutely. Benchmarking JIT compilers is tricky.
I looove computerphile. i work in construction but everytime i watch one of your videos I realize what i really wanna do in my life... thx you guys so much!
It would be great if you could talk about the "scary how clever they've gotten", it's exciting to hear a statement like that so would love to know more
I'm pretty sure they already have videos about, for example, the Spectre and Meltdown vulnerabilities in CPUs.
The comparison between Just In Time compilation and out of order pipelines is brilliant. Never though about it before.
CPU cores are hardware interpreters for their ISA. Their µcode looks very different from their machine code. Been like this since system/360.
@@ralf391 That's a genuinely interesting way of putting it! Makes it much easier to see how out-of-order cores can be considered JIT compilers in a way
A decent optimizing compiler will detect that that loop has no side effects (including output) and that the result of summation isn't used anywhere, and will simply throw the whole thing out. To prevent that outputting the final sum would suffice.
The example in the video was a silly toy example. The idea is that JIT works, even in situations where static optimization would have no idea what's happening.
@@TheHuesSciTech Yes, yes, I know.
I'm fairly sure it's impossible to have a fully-static implementation of a javascript compiler that also completely implements the ECMAScript spec. This is true for most "JIT/interpreted languages"
For example: Runtime reflection on aribtrary data types is often a key feature in JIT/interpreted languages and this is in general incompatible with static compilation.
Another example is runtime evaluation of generated source code. A compiler could embed itself into the compiled program and invoke itself for these cases but then it's no longer statically compiled and you effectively just have a JIT compiler.
as someone who works on the v8 js engine in chromium I quite enjoyed this video. people often think jits are a lot scarier than they really are and education in this area is always appreciated! also shout-out for using a framework laptop :)
Hi, he said that javascript can be compiled, but in the past I was googling and it's not possible to make binary file out of js code. So is it possible or not? And if not, why it's not a thing?
@@daliborin a machine only ever runs machine code. eventuality, *something* is compiled.
@@daliborin quickjs
Snek
Reminder: protogens go beep boop
@@daliborin He meant that it's possible to compile any language, or interpret any language. That means that a compiler could theoretically be created for JS, but I don't know if anyone has done it.
Hey Brady, please ask Dr. Tratt to change his computer password; the video shows him typing it in at 3:18. While the risk is low (camera angle is pretty flat), a dedicated attacker might be able to figure out his password from his finger movements and use it. Better safe than sorry.
Excellent video btw!
+
I was about to say... typing in a password on camera, bold move. Hope he changed it before the video to something specifically for the purpose of the video
He commented in another thread he rotated his password (and that doing so is a habit after interviews like this).
+
Came here to say the same!
While this guy obviously understands the subject well, the example they chose is quite silly. A statically compiled language like C will also optimize out such a loop; it doesn't really show off how JITs can excel so much as it shows how bad the more naive approach of interpreters is.
Yeah. He demonstrated the difference between interpreted vs compiled, but not necessarily JIT. To show the advantage of a JIT he really needed to show a case where the program execution isn't known until runtime. Like a dynamic function call where a static compiler can not deduce the target so must generate a procedure call, where as a JIT can inline the target. Thats critical to make Java run fast when every method is virtual and most are very tiny, so inlining is crucial. Its the reason why static compiled Java performs badly. It needs a JIT to work well. C++ on thr other hand, with templates and virtual calls being relatively rare, much more information is known at compile time so it doesnt have the problem.
I love this channel but wow this was a terrible way to demonstrate the advantages of JIT. All he did was demonstrate the problem with purely interpreted languages, not why JIT is useful.
I disagree. Yes, this kind of problem is something a traditionally compiled language can optimize out. But JIT is primarily used for accelerating languages intended to be interpreted, which he demonstrates perfectly fine here. I'm not sure what else you were expecting to be shown.
@@benloud8740 imo his example didn't really demonstrate the difference between interpreted vs compiled, either. that was the difference between the translator making optimisations, vs making no optimisations. an interpreter can choose to implement optimisations in the way that it chooses instructions in the host language, just as well as a compiler can choose to implement optimisations in the way that it chooses instructions for the target language. the only important/qualitative difference is whether those optimisations are done statically or dynamically. you could absolutely make an interpreter that creates & remembers program-dependent optimisations ahead-of-time / statically, and then uses them upon actually evaluating the guest-language program. a compiler just has the advantage that it can target a less abstract language than the language the compiler itself is written in - but even then, compilers can't translate to CPU hardware microcode! that means that the CPU is effectively an interpreter (host lang: microcode. guest lang: instruction set) which is performing optimisations
I was hoping to see some coin flips to choose data types -- at dev time vs at runtime before the loop vs inside the loop.
We need a video of Dr. Tratt explaining how scary the processors have got as soon as possible!
Somewhere in my office sits Intel's five thousand page SDM. I may not know if a processor is scary but this, this is.
This video just explained something I've known about (and leveraged) for _many_ years now, but never truly understood, in a simple bite-sized way. Thank you!
Surely a byte-sized way? I'll see myself out 😉😁
@@adamuk73 👍
Tratt, I've been a fan of your writing (and some of your research) for years!
The work you did on the (missing) warm up of JITs with the PyPy guys is downright amazing :)
(The stuff I'm not a fan of? Mostly the style of language you chose for your Converge language -- I agree with pretty much all your goals for it, of course. De gustibus non est disputandum...)
Static compilers can have just as much info as JIT when given runtime info. This is often done using a technique called PGO and usually leads to the absolute fastest type of programs when combined with other static compilation optimizations (LTO, binary optimization [bolt], etc...)
Very cool!
You can also use JIT in some parts in a program compiled with and AOT.
finally someone uses one of the two editors and it's even the one, that can be used without surgical alteration of the editing hands...I think that's a first for computerphile.
Emacs has many packages that have modal editing. No need for that surgery anymore! Just install evil, Meow, boon, etc.
Another advantage of JIT over statically pre-compiled files that are then distributed (apps, games, etc.) is that the optimization can be specifically tailored to exactly your machine. The exact CPU features, cache sizes, memory size/bandwidth/latency, etc.
Statically pre-compiled on the other end must be compatible with a lot of different systems and configurations (some machines are 32 bits, some 64 bits, some have SSEx extension, some don't, etc.) This makes the programs both bigger and slower. There are detection, tricks, and such available to mitigate this problem but not eliminate it. JIT rules in that aspect.
Some of this is also possible with static compilation, look up Profile Guides Optimisation (PGO)
Great and full of insights as always! Just gave a lecture about JIT yesterday, posted this video to students.
In the modern C toolchains, there is the first part, compiling into the object file and there happens the first layer of optimization, the second level of "static" optimization when all object files are put into one "bucket", until this point i fully agree that the JIT can be more efficient. However, there is also the LTO (Link Time Optimization), this also has multiple stages, and this will look from the linker perspective and you can consider it as "dynamic" optimization. Off course you can seriously hinder it with improper pointer handling, especially function pointers and recastings, as those will break the path of data and function accesses and in this area JIT might be more efficient than the modern C compiled binary. My background is more C focused, happy to learn more about the higher level solutions like JIT.
The whole idea and power of JIT compilation is that it enables code written by plebs without exactly that knowledge of the inner working to execute much faster. But it also scales so a very skilled programmer can spend more of their time on the interesting stuff instead of optimising for performance everywhere.
Even LTO is completely unaware what data or parameters the program will be run with. PGO (profile-guided optimization) is what you should be touting as the C competitor to JIT; but the problem is that PGO requires you to generate profiles, which is a manual step the developer has to go through. Whereas, as @Hamachingo points out, JIT can see things that PGO can see but LTO can't, but it also just works "for free".
@@TheHuesSciTech Thanks for this, i was not aware of the PGO, i learned something today.
I always wondered why the JVM for example is not able to dump the current compilation state into a file. So next time you load the same program you don't have to start the JIT process from scratch. If you have a big webapplication in java it takes quite a long time from startup to a point where your application runs a same speed it did last time.
I also wondered the same thing, but in the end you can't just dump and reload pieces of compiled code. JVM initializes everything lazily and initialization order affects application correctness so every time you start your Java app, the JVM must ensure it's initialized in correct order, even if something has changed (e.g. environment variables or whatever). OTOH, if you invest enough engineering effort then you can actually reuse some compiled code if you check initialization order (and other important stuff) first, so your app reaches peak performance much quicker. Azul ReadyNow! implements that and it's a paid technology. OpenJDK authors are also working on improving JVM startup as part of Project Leyden.
There are some languages out there that can do this. Check out Pharo and Glamorous Toolkit (and other Smalltalk family languages, like Self).
There is actually an official Java fork that does just this, look up CRAC (Coordinated Restore At Checkpoint)
@@theshermantanker7043 Nice 🙂
@@theshermantanker7043 Nice 🙂
You can disable the E-Cores in bios if you wanted to have more comparable results. For the best comparison, disable all but 1 P-Core + hyperthreading and set it to a fixed frequency.
not to buy them at all is even easer.
@@Mr.Leeroy you can't avoid them even on x86 now, so good luck with RISC-V
Or just accept that it's more than likely the future and therefore downright relevant for a language/compiler researcher to have.
@@simon7719 What I said has nothing to do with JIT, it's just an explanation of how to set up the hardware to it shows a consistent duration (Time) requirement for any testing.
Skip to 4:44 and listen to what he says about his laptop. I'm not sure if you're familiar with the last few generations of Intel CPUs, but most of the variants of the last 2 generations have "E-Cores" (E for Efficiency) and "P-Cores" (P for Performance). These are both a power saving method for doing background tasks as well as a way to make the core count look "better" to consumers to compete with AMD's core/thread counts.
@@simon7719 "more than likely" part is debatable. Especially when CPUs with efficiency cores draw more power than previous gen and the thing that is supposed to choose between E/P cores for your load is Windows.
maybe I'm being too skeptical, but the typing at @3:20 is a bit risky? They showed the keystrokes of the password in the video. somebody with enough expertise in this area could potentially figure out what his password is based on the fingers he used for the keystrokes! I can already see he used a "shift" key for an uppercase letter in there!
I also thought that I would have requested a black box over it just to be safe
Pretty sure I heard that someone was successful in decoding keyboard keystrokes by sound alone.
So better put on you tin foil hats..
I came to the comments to see if anyone else had picked up on this
Hunter2
I appreciate the concern, but there's no need to worry. What you can't see (since its output wasn't used) is that there was a second camera pointing at my keyboard. I rotated my password after the recording, as I tend to do in such situations anyway.
The video quality is superb! It's really noticeable how clear the image is
They did up their game, it looks great!
Yeah. It is 4k.
*a compiler takes a file and produces another file.* the language in the input file is called source code and the language in the output file is called target. the source and target can be the same language. some people call this a specific name like compiler-compiler or transpiler but they don't need to because is just a compiler.
*an interpreter takes a file and produces actions described in the file.* the JIT can compile to a bytecode or to assembly in both cases produces faster execution however if the source programming language isn't designed with zero cost features then the speed the JIT produces won't be same as a zero cost programming language. There are very small number of usecases like inside a loop in which a JIT can produce code faster than a zero cost language.
Was confused about this for a long time. This video cleared most of my doubts.
This channel is an unknown treasure 🙌
No it is not. It has 2.2 million subscribers.
I'd say they’re pretty well established
Trace-based compilation has a separate lineage starting from the HP "Dynamo" compiler. That was revolutionary at the time.
The most insane form of optimization I've seen in any language is APL, if anyone's curious just search for "Dyalog 18: The interpretive advantage". They dive into how they can shave off nanoseconds just based on how they optimize, it's crazy stuff.
Thanks!
First video I've seen with Laurence Tratt. He's great!
Loved him, really great!
3:20 who will be the first to tell the password
I've heard many times that JIT compilers can possibly optimize the code during runtime to make it faster than statically-linked code, and I believe that. However I've yet to see a real-world example of that happening, even in frequent loops. I'd really appreciate it if someone could point me to an example where the JIT code is indeed faster (without the linked code being artificially hampered).
I have some code that fits billions of curves to n datapoints each (once per pixel in a large image). n is unknown before runtime, but it is constant once it is read in. We wrote it in C and in Python using numba as the JIT. The python version is slightly faster. The reason is that the python version knows the size of all the used arrays at compile time (which is at runtime) and the C compiler does not and cannot know this. One option to make the AOT version faster is to compile many versions, one or each possible value of n. Obviously that is quite limited.
@@jensrenders4994 Thank you for the example! Does the source happen to be available so I can play around with it?
@@mmsbludhound873 This is part of unpublished research so cannot share it immediately. It is supposed to be published at some point though.
@@jensrenders4994 Ahh gotcha.
Try searching for some math scripts and just run them with both an interpreter and a JIT compiler
Rocking the Framework laptop ✊️⚙️🛠
Very interesting. Would love to hear more details on how those optimizations work, i.e. why exactly makes that much difference between the two compilers. What does the optimized program look like?
Brave of him to type his password on camera.
I see you're using a Framework! I love it!
Nice Framework laptop!
Nice framework laptop! so tempted to buy one.
That was the first thing I noticed lol
CPython does have a JIT compiler, Numba uses one. Sure it's a bit limited, but you can get 13x speed on math heavy code.
gotta crunch the numba's innit?
I used Numba for a big data analysis project in particle physics, very math heavy. With the near-trivial extension to distribute the work (built-in feature prange() ), I managed to get >100x speedup and >10x faster than existing C++-code (though to be fair, it was not terribly optimized).
Point is, it maintained readability perfectly and cut execution time from several minutes to seconds!
It's worth noting that if you create a generic function that can take an integer or a string, compilers will automatically create two machine code versions and just use the right one depending on what datatype you are using. That really has nothing to do with JIT.
Also, JIT engines generally use a LOT of memory, so while they can be fast, they aren't magic and don't suit all applications. Always use the right tool for the job. 8)
I mean, I hate Python, but I have to point out that a Python function can take 20 arguments; whereas we're not going to make a C++ generic function that takes 20 generic types and have the compiler generate all 2^20 machine code versions. So comparing JIT to generics like that seems weird to me.
@@TheHuesSciTech You're very unlikely to call that generic function 2^20 different ways, though. The template types are known at compile time, so they don't depend on user input. The programmer would have to code up every possibility of static types.
Thanks for another great video that masterfully balances on the gap between technical depth and superficial overview :)
Best regards
python 3.11 actually uses a similar strategy to a JIT by doing an optimized lookup for the addition operator based on what types that function is frequently called with.
It was nice learning a few new things about the JVM, as it’s one of my main languages.
The "dynamic analysis" can't possibly enumerate all valid code paths, so how can JIT make sure its optimizations are non-breaking?
Tratt's student cfbolz once answered this question "we add lots of ifs and prey". If you're fortunate, all of those ifs can be hoisted outside of the hot loop, but there are obviously times they must remain. There are a lot of approaches to reducing the number of things you need to check, too.
Using the same example from the video. They have a function "f" in Python, whilst being just 1 line of code we see it doing 2 very different things (addition of numbers, or string concatenation)
At runtime, there's a big loop constantly calling the function f with just numbers, we know it's always numbers because the numbers were hardcoded.
The JIT can spot this, create and compile a function at runtime in machine code that just deals with numbers (let's call this new function f_numbers) and replaces the code in the loop to call this new f_numbers function.
Elsewhere the code is left as-is using the original interpreted function, f. This ensures that other code paths that aren't guaranteed to be using numbers continue to work
It's not the best example because this could have been spotted without running the program but hopefully that makes sense.
The main thing to realise is that when JIT compiles a method, it doesn't completely replace the original, but creates an optimised one for particular use cases alongside it, and the new compiled one will only be used where it knows it's safe to do so
It will add "guards" that check that all the assumptions are still valid, and if they are it continues to the optimised fast path, otherwise it will jump to a "deoptimise" function that will invalidate the compiled code and fall back to running it in an interpreter
So this way it can assume things that are probably true based on the behaviour it has seen so far, but not break if that changes, which a static compiler can't do because it can't see how the program will behave at runtime
Nice that you indicated what I already said long ago: JIT actually is JTL.
Maybe you want to blur the part of the video where Dr. Laurence Tratt logs in to his pc...
There is no way anyone cam guess the password from that footage. Maybe length and rough positions for some keys but I don’t think it reduces the search field considerably
@@Gvozd111 Would you put money on that? Before you answer, maybe you should get off your phone and watch it again in 1/4 speed on a 4K monitor. I can clearly see all the right hand letters, and I can tell what keyboard row and finger he's using for each letter on the left hand. This is slightly more difficult because he's a 4 finger typist and moves his hands along the keyboard, but it narrows the left hand characters down to just a few possibilities. And that's assuming the password isn't so simple that you don't just guess the left hand letters based solely on the known right hand ones. 😁
"big team in most cases" - funny to hear from someone showing pypy (where work on jit was done in quite small team) as an example :)
same with amazing luajit2 which afair was done mostly by a single person?
But how much time those small team needed to create their first working version? ;-)
In C# and .Net languages in general, the JIT compiler compiles the IL code once before running. So. My understanding is that the IL code can be run cross platform as the compilation happens on the target platform.
What you described seems to be a runtime optimizer or something. In any case it's a great time to be developing software.
I came to write a similar comment about C# and .NET. Also, it's very interesting to hear the word "compiler" for an interpreted language such as javascript. I don't know why he uses that word.
@@XtroTheArctic "interpreted language such as javascript"
watch the video again.
there's NO SUCH a thing as "interpreted language".
Languages are statically typed or dynamically typed. They are not compiled/interpreted, that's environment.
No, what he described is a JIT compiler, that's how they work actually.
JIT compilers are runtime optimizers.
Your misconception is that it is not the JIT compiler that produce the IL on the dotnet machine, its the C# compiler itself that does that.
The JIT compiler only works with IL.
Also, the System.Expression has a lambda compiler that's separated and also is able to make IL code. (so is the mono.cecil library)
The F# compiler also uses an entire different mechanism to produce optimized IL for functional code. The F# compiler is written in F#.
The JIT compiler doesn't produce IL, it consumes IL and produces machine code, at RUNTIME.
But it might also do AOT if you use Roslyn compiler, Roslyn is an hybrid compiler, it can do both, JIT/AOT and PGO, which is profile guided optimization.
You can have 98% of the performance of C++ for 25% of the cost.
I'm in the middle of porting everything I have in C++ to dotnet7.
The performance of the new Dotnet7 (with stack types and spans) is so amazing that it beats the Java Azul compiler, the best JIT for years. But not anymore.
Exciting days for the dotnet community.
@@monad_tcp I kindly suggest you to research this to learn more, if you want. Have a nice day.
@@XtroTheArctic No, I don't need to research this, I work on this research field.
I have a MSc in compiler development.
Your teacher is wrong.
There's no compiler/interpreter languages.
There never was. The first Fortran was interpreted, then a compiler was made. The very first compiler.
This is a simplification some teachers used to teach kids.
I would agree that the code isn't the best. In the example, will the optimizer notice that the a(variable) is not used after the loop and remove the variable and loop? I have seen in C, C++ and C#, the optimizer sometime removes "wait" loops. Will the results be the same if line 10 "print (a)" is added?
...but that is the whole point of optimization? Btw, I think he mentioned this happening, but I admit, that he could have been clearer.
@@cobaltno51 I think they were hoping for an example where the optimization was based on analysing the runtime behaviour of the program - i.e. the kind of optimization only a JIT would perform.
I believe .NET also has a JIT compiler.
It does!
I understand citing as examples for JIT compilation Java and Javascript engines, but I think a bit more visibility and research should also have LuaJIT, it's pretty astouning the progress which has been made in that project.
Just discovered this channel, thanks to the Tom Scott association.
What i have notices about me playing minecraft, is that after i have started playing for a while, it does seem to get a bit faster after ive played for about an hour or so. I wonder if this is what im seeing in action.
My instinct is that it seems weird that it would take a whole hour before it took effect; I'd expect the JIT gains to be observed much sooner than that. But I have no specific evidence to give to you on that.
Yeah I don't think it would take Java an hour to figure out how to run its code. It's probably just that you've got all the chunks around you loaded, and you're getting more cache hits on the entities that have to update. I found that garbage collection would still make the game stutter every few seconds though.
Please cover Ahead of Time compilation.
The framework laptop. Nice.
tnx Dr Laurence Tratt. you have answers 10 questions of mine I was searching in a 10 mins video
PGO is basically JIT but for static compilation :)
Case and point that languages can be both compiled and interpreted: Java native images. There the graalvm native image compiler takes java bytecode (so Java or Kotlin or something else compiled for the JVM) and then compiles it into a static binary, which is then much smaller, faster to start up, but (especially without PGO) it can be quite a bit slower.
This video is LeJIT!
JVM optimizations come from the Strongtalk system which was a strong typed Smalltalk implementation because Self's optimizations weren't enough.
is he using a framework laptop?
He is
I wonder if it is "realistically" possible to make some of the dynamically typed languages as compiled machine code without just resorting to intermediate representations + runtime (like how Python does?). Would it not create way too many permutations of the code if the data structure is complicated and could hold any number of possible types associated to it?
If things are statically typed, I think being interpreted or compiled is just a matter of design decisions (both can be a feature of the language), but if some program can take in any type of data and permutate the actual operation during runtime depending on the input, is there a point at which it really becomes non-beneficial to completely compile down to machine code?
It is absolutely the case that it can be quite detrimental to performance to have a function that could have one of many different interpretations, with no way to know which one to use until it is called. However, static compilation generally still will improve performance, since a static compiler will be able to find a way to pack all the possible implementations of that function into a dispatch table, at which point each of those implementations would be able to be optimized as normal.
I wander if someone has done some study on how large the the dispatch table will become to accomodate all permutations of a dynamically typed language (of course it will change depending on how the program is written etc.).
@@aoi_mizma I imagine it would just be the total number of combinations of possible input types
@@Howtheheckarehandleswit or we could analyze the parts where they're being used and which operators/functions they're used with and massively reduce the number of permutations
@@ashwinalagiri-rajan1180 This is a hypothetical compiler for a dynamically typed language. It’s not possible to see all of the types that are actually used at compile time.
There is a thing called Generic Types in almost all modern langs, and the ahead of time compiler can optmize more efficially by using it. and, JIT can not do much more as AOT, since it needs detect the runtime infos, that is why JIT has 3 gens is C#.(or PGO, too)
How about a discussion on JITs versus compiler code instrumentation and if there are still advantages that JITs hold that are out of reach for what the compiler can get from repeated runtime statistics?
JIT just happens "for free", whereas PGO (which is what I assume you're referring to) requires manual effort from the developer. I feel like that's the biggest difference, and that's coming from a PGO fanboy like me!
Great video. Another contributor I'd like to see more of.
A _"compiled PL"_ is just an abbreviation for _"a PL designed for effective compilation",_ while "interpreted PL:s" are languages without this design feature. Besides, for any computer scientist, it should be obvious that Python is very bad for compilation to effective code, so therefore it is a language most suitable for interpreters.
And now CPython (the official and reference implementation of Python) has a JIT compiler, still preliminary but still they expect to do a 50% performance increase for 3.13 over 3.12, and another 50% for 3.14 over 3.13.
My man uses the one correct text editor.
Aren't there reentrancy concerns with JIT? - I.e., does JIT have to be treated like self-modifing code?
I think they use one of many possible workarounds like checking every so often if there is a new compiled version of a function available and only then replacing the function pointer.
Remember that JIT is not just a theory, its in use for a long time, like ~25 Years of Java for example.
This video should be titled with "How Just In Time (JIT) Compilers Optimize" or "Something about Profile Guided Optimization".
For example, transcoding bytecode into machine code during runtime should also be JIT; and then we can use the technique in the video to optimize the machine code during runtime (which is also JIT). As long as the "translation/compilation" happened during runtime, it is JIT, with or without optimization.
-by some extend, self modify code can also be JIT-
the loop in this example is pretty much static and only executed once per run. does this mean the jit is doing some optimizations before the first run?
If I understand correctly, the JIT is doing optimizations within the loop (as the loop is being iterated through)
@@syntro6607 This however is not something only JIT can do, an AOT-compiler would be to do that just as well. Perhaps JIT even optimizes it before the loop is executed, although that is unlikely given that such optimizing is resource-intensive and should rather be done while the method is being executed
@@syntro6607 With java, there is a command-line parameter -XX:CompileThreshold
"By default, in the server JVM, the JIT compiler performs 10,000 interpreted method invocations to gather information for efficient compilation. For the client JVM, the default setting is 1,500 invocations."
Oh, man, guy is dropping little hints to interesting bits and then leaves us hanging.
Oh, framework laptop, a man of taste I see.
You can unroll loops and exploit better branch prediction behavior.
3:44 thanks a LOT for increasing font size - it became watchable at 240p on a smartphone display :)
1:00 I would be a bit critical of these statements, for example it's impossible to write a Python Compiler which does not include a python compiler or interpreter in it's result(You can do something like exec(input(">>>")), so a real compilation of python or ECMA Script is impossible. A C Compiler does not need to do this.
If you had a non deterministic function, but most often returns a value, say
def func():
if random.random() > 0.999:
return 4
return 5
Would the JIT optimize the randomness away?
That's pseudo-random in most implementations, so it theoretically could do that and get the correct result but it probably won't.
Nice Framework laptop
I've heard of "Just in time debuggers" too, is that the same basic idea?
No. "Just-In-Time debugging is a feature that launches the Visual Studio debugger automatically when a program, running outside Visual Studio, encounters a fatal error. Just-In-Time debugging allows you to examine the error before the application is terminated by the operating system." That has nothing to do with having several goes at compiling code based on observed usage patterns, which was what was talked about in this video.
King Terry's compiler just in time and ahead of time rip terry davis
Bruv. Stop reading my mind, bruv. Was just searching this up not long ago to learn more and u come out with a vid on it?
Surely a compiler would find the same optimizations as the JIT for the example shown.
What are some examples of programs where JIT is able to find optimizations that compilers can't?
When an input to a function is dynamically computed or fetched during runtime, making it impossible to optimise for during compilation. Take a game engine responding to keyboard/controller input or a program running over a bunch of files on a user's machine, and the subsequent function calls that take that information as input. The compiler can make _assumptions_ about the inputs a particular function or section of code uses, but it cannot _guarantee_ that those are the _only_ inputs that'll be provided. Conversely, a JIT compiler is able to _observe_ the inputs and how they're used during runtime so that it can make an increasingly more accurate decision on how it should optimise the code.
When has a JIT compiled program ever been faster than a statically compiled program from something like C?
never, maybe if someone wrote bad code in C in better code in other language, if you are actually using power of C with full pointers and cache control, doesn't matter the magic nothing can beat C..
It can be faster, by putting heavily used code together so it can all reside in the icache. And, as in the toy example in the video, for languages like Python with generic functions, it can decide to have the JIT just compile to specialized versions (a static compiler can, of course, do the same thing, but if you have binary functions like + with N cases, you wind up with N-squared specializations, which can take a lot of room).
@@hardlyb I'm not trying to be annoying but my question wasn't can. I'd like to hear of a real world application and not just theory.
@@Amish_Avenger It's not just theory. It's simply that there are optimizations that are always possible in C, but that doesn't mean any C developer will do them.
It's like trying to say that assembly is always faster than C if done properly. It's true, but it would require you to always do optimal code for it.
@@user-xl5kd6il6c This still doesn't answer my question.
PHP got a JIT a couple of years ago. I'm not sure if it was worth it though, lots of people say it might not be. Part of the problem is that a typical PHP program just starts up, takes a web request, gets some info out of a database and generates a response, then shuts down. So there isn't much time for the JIT to optimize it.
I very much appreciate the VIM here with Airline/Powerline
Couldn't static compilers do the same thing by running the program one or multiple times after the first compilation? If that's not possible, what prevents them from doing so? If this is possible, could it eliminate the worm up problem?
I don't c a technical feasibility challenge to recompile machine code (for compiled languages) @ run time, except for the overhead in performance.
PS:
Since static compilers (C, C++) convert it to machine code before execution they lose out on the benefit of JIT. This is preferable for low powered embedded systems, where power utilisation is imp.
However, Java is an example of a language that converts to bytecode (intermediate representation) & has a JIT compiler to optimize the code @ run time.
The advantage of JIT(not well demonstarted in the video) is that the runtime input is not known until runtime. Different types of data inputs or outputs each time the program is used.
The downside of JIT is the final behavior is not verifiable for critical applications.
Static compilers don't know the input the program will take or even whether the program will ever terminate. But you could let the compiler observe some human guided runs and gain further knowledge about the program. This knowledge, however, is nothing but assumptions and might be wrong in later, actual runs. The compiler could perform some optimizations based on these assumptions, though, and may be able to make the compiled program faster.
That exists and as Daniel pointed out it's called profile guided optimization or pgo. The main drawbacks are overfitting and slow compilation times.
casual static compilation makes assumptions about your code which got hard coded by some compiler engineers.
Those assumptions might very well be wrong for your use case but how do you know the stuff that pgo just measured is representative of future executions?
Well, that's kind of hard since you don't really know what pgo measured exactly.
@@capturedflame this.
C/C++ needs more design beforehand, too.
Javascript and Python are perfect for free-flow adhoc dev.
This reflects in the way you optimize it. “Just do it for me” versus “show me what I should focus on to improve.
Super explanation. Thanks!
yes - absolutely fascinating and very well explained!!
So the JIT looked at that loop going to 10 mill and thought. "Why am I counting? I'm not showing him the work, just just return 10mil. that will make him happy."
Basically, yeah. Obviously it's a contrived example but the same idea can apply to more real-world examples.
Except in golang. No matter the code size it statically compiles in milliseconds.
are most Javascript engines actually using some code from java's vm? I didn't think so, but I might be wrong. also, that example shown here is pretty absurd, even the cheapest C compiler would optimize that loop out, unless explicitly told to not do it (which is usually difficult to ask the compiler to be more stupid in favor of better debugging capabilities)
pls correct me if wrong,, @10:00 I thought Java's data types are checked during compilation and therefore all data types should be confirmed during runtime?
Class loading still happens at runtime, so you can still get type errors. Also, you might find that once the program is running, you get the one subclass 95% of the time and it's worth making that assumption.
@@capability-snob thx for replying, yea that's right, also another example I just came up with is the type erasure of genetics.
BASIC is interpreted.
Love to see a Framework laptop
Veery interesting👍 thanks!
I would have liked to see what's happening under the hood a little. So, what it compiles to compared to without the jit maybe
While the JIT compilers will look at your for loop sum try to optimize based on how it's run, AOT compilers with enough optimization flags would just look at that for loop and say "huh i can guess the end result at compile time" and then just remove your loop and place the result in there, that's also why most of the time when i want to test speed of things in C i resort to writing the value myself on terminal so the compiler can't know the value thus can't optimize the end result and entirely remove my calculation
Are there actual static compilers for the whole of javascript (not just a subset)? That compile it down to a binary?
what about AOT (Ahead-Of-Time) Compilers?
nice presentation in question :)
What's the name of the programming font ?