How to Build a Virtual Machine

  • Published on 23 Nov 2024

Comments • 58

  • @ryanlunger8215
    @ryanlunger8215 10 years ago +9

    I watched this video mainly because I admire Mr. Parr for his work. I enjoy the particulars of language design and implementation, so I knew starting out that this was going to cover pretty basic stuff. But I really appreciate and enjoy presentations given in this very informal way. My first C++ class was very much like this. There was a handful of us in the class, and with our instructor's involvement we would democratically discuss and design the solution to some particular assignment. A great deal of the more advanced stuff really sank in during those times.

  • @philipp7732
    @philipp7732 9 years ago +25

    Awesome video, had some initial problems, but after debugging I found the issue and now I am a proud creator of a VM!

    • @unlockwithjsr
      @unlockwithjsr 4 years ago

      Wow, how? Did you open-source it, or did you build it on your own?

    • @chrisparker9672
      @chrisparker9672 4 years ago +9

      @@unlockwithjsr I mean, he literally builds one during the video.

  • @mmille10
    @mmille10 4 years ago +3

    I started trying to read the (last) implementation chapter of "Smalltalk-80: The Language and its Implementation," and got lost rather quickly. I got here after reading Chap. 7 of "Squeak: Open Personal Computing and Multimedia," where it talked about the VM implementation. Now, some basics are coming together in my head. Great talk!

  • @avwie132
    @avwie132 4 years ago +26

    When you close your eyes it sounds like Tom Hanks is talking

    • @snorman1911
      @snorman1911 6 months ago

      Wilsonnnnn!

  • @xN811x
    @xN811x 8 years ago +11

    Immediately ordered his book

    • @pililoabc123
      @pililoabc123 8 years ago

      title of book?

    • @xN811x
      @xN811x 8 years ago +4

      Terence Parr, "Language Implementation Patterns"

    • @pililoabc123
      @pililoabc123 8 years ago +1

      Thank you!

  • @ivand8393
    @ivand8393 5 years ago +7

    It is worth mentioning that the first one was probably the FORTH virtual machine.

    • @daschewie
      @daschewie 4 years ago +4

      The first VM was O-code for BCPL in 1966 followed by Forth and Pascal in 1970.

  • @ThaerRazeq
    @ThaerRazeq 9 years ago +11

    The fastest way of doing this, other than generating native code manually, is to generate C code, compile it with LLVM, and run the result at runtime. This is similar to how UnrealScript or Valve's Half-Life SDK do it. But I think iOS and Mac don't allow this for security reasons, because of the potential for malicious modification.

    • @SimGunther
      @SimGunther 2 years ago

      To get around that Mac walled garden restriction, those generated operations must be built into the program and made indexable via a table.
      The technique has been formalized as "building a weird machine" and QEMU makes good use of this tech. Just ask Kate Temkin, as she did a !!con talk on this sort of thing in 2021.

  • @kovertopz
    @kovertopz 10 years ago +4

    I've written a tree interpreter for a small DSL. Traversing the tree isn't that bad. Creating the tree is the difficult part for me, since you have to create one that represents the order of operations. My grammar was very restricted and I should have gone with something like a Pratt parser.
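
    A minimal sketch of how operator precedence can shape the tree, using precedence climbing (a close relative of the Pratt parser for binary infix operators). The class name, the `tree` helper, and the parenthesized-string output are assumptions made for illustration, not anything from the talk; input is assumed to be well-formed digits, `+ - * /`, and parentheses.

    ```java
    class PrecedenceDemo {
        private final String src;
        private int pos = 0;

        PrecedenceDemo(String src) { this.src = src.replace(" ", ""); }

        // Binding power of each binary operator; -1 means "not an operator".
        static int prec(char op) {
            switch (op) {
                case '+': case '-': return 1;
                case '*': case '/': return 2;
                default: return -1;
            }
        }

        // Parses operators that bind at least as tightly as minPrec and
        // returns a fully parenthesized string that shows the tree shape.
        String parse(int minPrec) {
            String left = atom();
            while (pos < src.length() && prec(src.charAt(pos)) >= minPrec) {
                char op = src.charAt(pos++);
                // prec + 1 keeps same-precedence operators left-associative
                String right = parse(prec(op) + 1);
                left = "(" + left + " " + op + " " + right + ")";
            }
            return left;
        }

        // A number or a parenthesized sub-expression.
        String atom() {
            if (src.charAt(pos) == '(') {
                pos++;                      // skip '('
                String inner = parse(1);
                pos++;                      // skip ')'
                return inner;
            }
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return src.substring(start, pos);
        }

        static String tree(String s) { return new PrecedenceDemo(s).parse(1); }

        public static void main(String[] args) {
            System.out.println(tree("1+2*3")); // prints (1 + (2 * 3))
            System.out.println(tree("8-3-2")); // prints ((8 - 3) - 2)
        }
    }
    ```

    A real parser would build node objects instead of strings and report errors on malformed input; the recursion structure stays the same.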

  • @JackPurdon
    @JackPurdon 3 years ago +1

    Awesome presentation

  • @petercheung63
    @petercheung63 7 years ago +6

    It would be better to have English subtitles on this wonderful video for non-native English speakers.

    • @Speaks4itself
      @Speaks4itself 2 years ago

      Use the auto-generated captions

  • @blenderpanzi
    @blenderpanzi 10 years ago +1

    A tree-based interpreter has one application: when you have a throw-away expression that is only executed once. E.g. an input field where a user can enter a value via an expression that gets immediately evaluated and replaced with its value. (Blender 3D has these, IIRC. But I think it simply uses Python for that.) Also, I don't find tree-based interpreters harder than others, but then I haven't built complex interpreters. Only very basic stuff (simple mathematical functions with only one data type (number)).
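
    A minimal sketch of such a one-shot tree evaluator with a single numeric type. The `Node` interface and the `num`/`add`/`mul` helpers are hypothetical names chosen for this example; the tree is built by hand here, where a real input field would get it from a parser.

    ```java
    class TreeEvalDemo {
        // Every node of the expression tree knows how to evaluate itself.
        interface Node { double eval(); }

        static Node num(double v)       { return () -> v; }
        static Node add(Node l, Node r) { return () -> l.eval() + r.eval(); }
        static Node mul(Node l, Node r) { return () -> l.eval() * r.eval(); }

        static double demo() {
            // Tree for 2 + 3 * 4: the multiplication sits below the addition.
            Node tree = add(num(2), mul(num(3), num(4)));
            return tree.eval();
        }

        public static void main(String[] args) {
            System.out.println(demo()); // prints 14.0
        }
    }
    ```

    Evaluation is a single recursive walk, which is why this approach is reasonable when the expression runs once and is then discarded.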

    • @falc0knights496
      @falc0knights496 9 years ago +3

      fuck outta here,jk you should write a book bruh many books and i will sql force ly way to the domain of the book and DDOSS and analyze the book using sharkwire

    • @HumanBeingSpawn
      @HumanBeingSpawn 7 years ago

      adrian venegas Wireshark? lol

  • @danielsmith5626
    @danielsmith5626 3 years ago +4

    14:24 I'm still in shock that universities exist that don't introduce assembly language.

  • @karlmin8471
    @karlmin8471 7 years ago +3

    He looks like Harrison Wells in The Flash, even his modal.

  • @solaxun
    @solaxun 2 years ago

    Great video, helped clear up many of the questions I had about how bytecode VMs work (along with his book).
    One thing I do not understand though, is how compiled function code shares the same array as the rest of the code (see 1:23:05 discussing function calls). I can see how jumping to the address of a function's code works during the interpretation of "call" instruction, but how do we avoid running into sections of compiled function code while just normally stepping through the code array (as we increment the instruction pointer)? Is a portion of the code array effectively blocked off for function use exclusively, and the compiler handles this separation during compilation?

    • @leandroaraujo4201
      @leandroaraujo4201 2 years ago

      You could certainly step into other code if you do a bad jump, but in the case of a function this is handled by the RET instruction. Ret cleans up the stack, sets the return value, and jumps back to the caller address. Now if a function doesn't call RET, that's likely to happen, since you don't jump back.

    • @solaxun
      @solaxun 2 years ago

      @@leandroaraujo4201 That's related to, but not quite the same as the question I'm asking. It's tough to ask this one online, but I'll try. Imagine you have some pseudocode like this:
      ```
      x = 10
      define somefunc(x,y){dostuff}
      y = 9
      ...more code ...
      ```
      Since in the example VM all bytecode gets compiled into the same array, wouldn't you end up with compiled function bytecode in the middle of those variable assignments, which you would then "run into" while incrementing the program counter?
      For example, maybe the above code compiles to something like:
      ICONST 10
      GSTORE xaddr
      **compiled function code**
      ICONST 9
      GSTORE yaddr
      You start with the PC at "ICONST 10", and after running that and the following GSTORE, you increment the PC and step into the compiled function code, even though it hasn't been called, so there would have been no jump to that location. I could see this working if function code compiled to its own "region" of the array (like before or after everything else), but that part wasn't covered in the video.

    • @leandroaraujo4201
      @leandroaraujo4201 2 years ago

      @@solaxun Oh, that. Yes, that would happen; and I guess the only way to avoid it is structuring the code and setting the program counter correctly. See in the factorial example, how the main function is last, and the program counter is set to point to it. As far as I know, the same thing happens with assembly (with some minor changes).
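
      A minimal sketch of the layout described above: function code comes first in the code array, main comes last, and the interpreter starts with the instruction pointer at main's address. The opcodes, the `run` method, and the calling convention (return address pushed on the operand stack) are assumptions made for this example, not the instruction set from the talk.

      ```java
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.List;

      class EntryPointDemo {
          // Hypothetical opcodes for this sketch.
          static final int ICONST = 1, CALL = 2, RET = 3, PRINT = 4, HALT = 5;

          static final int[] PROGRAM = {
              ICONST, 42, RET,        // address 0: function body, returns 42
              ICONST, 10, PRINT,      // address 3: main starts here
              CALL, 0, PRINT,
              ICONST, 9, PRINT,
              HALT
          };
          static final int MAIN = 3;  // entry point chosen by the compiler

          static List<Integer> run(int[] code, int entry) {
              List<Integer> out = new ArrayList<>();
              int[] stack = new int[64];
              int sp = -1;
              int ip = entry;         // start at main, not at address 0
              while (true) {
                  int op = code[ip++];
                  switch (op) {
                      case ICONST: stack[++sp] = code[ip++]; break;
                      case CALL: {    // push return address, jump to target
                          int target = code[ip++];
                          stack[++sp] = ip;
                          ip = target;
                          break;
                      }
                      case RET: {     // result is on top, return address below it
                          int result = stack[sp--];
                          ip = stack[sp--];
                          stack[++sp] = result;
                          break;
                      }
                      case PRINT: out.add(stack[sp--]); break;
                      case HALT: return out;
                      default: throw new IllegalStateException("bad opcode " + op);
                  }
              }
          }

          public static void main(String[] args) {
              System.out.println(run(PROGRAM, MAIN)); // prints [10, 42, 9]
          }
      }
      ```

      Because execution starts at `MAIN` and the function body ends in `RET`, the instruction pointer only reaches address 0 through `CALL`. An alternative with the same effect is for the compiler to emit an unconditional jump over each function body.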

  • @carnelyve866
    @carnelyve866 10 years ago +2

    bookmarked and subscribed....

  • @guilhermesaraiva3846
    @guilhermesaraiva3846 7 months ago

    Is there any book on the subject this guy is talking about, building a VM step by step? I did not find one.

  • @kennethcarvalho3684
    @kennethcarvalho3684 1 year ago

    How can one get java source shown in this presentation?

  • @BryanChance
    @BryanChance 3 years ago +1

    How does the CPU do anything in the first place? That's what I want to know. Too low level?

    • @OttoFazzl
      @OttoFazzl 6 months ago

      Check out the free course nand2tetris (part 1). In it, you build a CPU from scratch (only from NAND blocks), then you build a compiler, a bytecode VM similar to the one explained here (even more complex), then a high-level language and an OS for the computer. After that course, there will be no more "magic" in how the CPU works and how it's all connected.

  • @peterfireflylund
    @peterfireflylund 7 years ago

    Is it just me or is the 6502 memcpy code he shows in the beginning buggy?
    I think it decrements Y (CNT.L) outside the inner loop when it shouldn't.
    I also think it is wrong to increment SRC.H and DST.H after the inner loop because it will only have copied 256 bytes if CNT.L started out as 0.

  • @jeanclaudescandale
    @jeanclaudescandale 5 years ago

    Hi. Why not use a data stack and a return stack? Each instruction, besides push, could pop its arguments from the stack only, and the compiler could use macros to avoid the hassle of too many pushes and pops.
    For example, jmpz could be inlined:
    push addr
    push 0
    sub
    jmpz
    What do you think?

    • @jeanclaudescandale
      @jeanclaudescandale 5 years ago

      or call (fun (arg1 arg2)) :
      push arg2
      push arg1
      push 2 (or even nothing)
      push addr on top of ds and ip+1 onto the return stack
      call
      and the VM could use stack pointers instead of a linked list to manage the stacks.
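
      A minimal sketch of the two-stack idea, assuming array-backed stacks indexed by stack pointers: operands live on a data stack, return addresses on a separate return stack, and `CALL` pops its target address from the data stack as proposed above. The opcodes and names are hypothetical.

      ```java
      class TwoStackDemo {
          // Hypothetical opcodes for this sketch.
          static final int PUSH = 1, DUP = 2, ADD = 3, CALL = 4, RET = 5, HALT = 6;

          static final int[] PROGRAM = {
              DUP, ADD, RET,   // address 0: double(x) leaves x + x on the data stack
              PUSH, 21,        // address 3: main pushes the argument
              PUSH, 0,         // then the address of double
              CALL,
              HALT
          };
          static final int MAIN = 3;

          static int run(int[] code, int entry) {
              int[] data = new int[64]; int dsp = -1;  // data stack and its pointer
              int[] ret  = new int[64]; int rsp = -1;  // return stack and its pointer
              int ip = entry;
              while (true) {
                  int op = code[ip++];
                  switch (op) {
                      case PUSH: data[++dsp] = code[ip++]; break;
                      case DUP:  data[dsp + 1] = data[dsp]; dsp++; break;
                      case ADD: {
                          int b = data[dsp--];
                          int a = data[dsp--];
                          data[++dsp] = a + b;
                          break;
                      }
                      case CALL: {           // target comes from the data stack
                          int target = data[dsp--];
                          ret[++rsp] = ip;   // return address goes on the return stack
                          ip = target;
                          break;
                      }
                      case RET:  ip = ret[rsp--]; break;
                      case HALT: return data[dsp];
                      default: throw new IllegalStateException("bad opcode " + op);
                  }
              }
          }

          public static void main(String[] args) {
              System.out.println(run(PROGRAM, MAIN)); // prints 42
          }
      }
      ```

      With return addresses kept off the data stack, `RET` needs no shuffling of the result, which is the usual argument for this Forth-style design.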

  • @xinyuliu7346
    @xinyuliu7346 8 years ago +1

    this guy is great.

    • @Kitulous
      @Kitulous 6 years ago

      Xinyu Liu he is an Apple fanboy that doesn't understand that phones are just a tool, not the way to show off or something.

  • @ihnwtpu
    @ihnwtpu 9 years ago +1

    I felt so smart when I was able to answer his question before he said the correct answer :D

  • @HumanBeingSpawn
    @HumanBeingSpawn 8 years ago +1

    Name of book please

  • @jovaha
    @jovaha 8 years ago

    In most of the instructions he uses the value stack[sp] and then decrements sp, or increments sp and then sets stack[sp], which corresponds to push/pop. But in the LOAD/STORE instructions he reads and sets the value stack[fp+offset]. To me that seems like something you couldn't do with a stack. I would like to implement this with an actual stack-like data structure. Is there any way of doing LOAD/STORE instructions with only pop/push?

    • @OttoFazzl
      @OttoFazzl 6 months ago

      My understanding, based on this lecture and another course I took on this topic, is that the implementation of a stack virtual machine itself doesn't have to be stack-based. It purely depends on what language you are using to implement the machine. So there is nothing wrong with accessing the stack by offset while implementing LOAD/STORE. He also used offsets when implementing CALL, for example. A stack virtual machine is a stack-based abstraction, but the implementation doesn't have to be.

    • @OttoFazzl
      @OttoFazzl 6 months ago

      Even if you were implementing the stack VM in pure assembly, you can have offset-based memory access in assembly by calculating offset memory address and storing it in a register.
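
      A minimal sketch of the point made in this thread, assuming an array-backed operand stack: push/pop move `sp`, while LOAD/STORE index the same array relative to `fp`. The field names follow the comment above; the frame layout (locals starting at `fp`) is an assumption for this example and may differ from the one in the talk.

      ```java
      class FrameAccessDemo {
          final int[] stack = new int[32];
          int sp = -1;  // index of the top element
          int fp = 0;   // base of the current frame (assumed layout)

          void push(int v) { stack[++sp] = v; }
          int pop()        { return stack[sp--]; }

          // LOAD: copy the slot at fp + offset onto the top of the stack.
          void load(int offset)  { push(stack[fp + offset]); }

          // STORE: pop the top of the stack into the slot at fp + offset.
          void store(int offset) { stack[fp + offset] = pop(); }

          static int demo() {
              FrameAccessDemo vm = new FrameAccessDemo();
              vm.push(7);                   // local 0
              vm.push(5);                   // local 1
              vm.load(0);
              vm.load(1);
              int b = vm.pop();
              int a = vm.pop();
              vm.push(a + b);               // ADD
              vm.store(0);                  // local 0 = 12
              vm.load(0);
              return vm.pop();
          }

          public static void main(String[] args) {
              System.out.println(demo()); // prints 12
          }
      }
      ```

      With a strict push/pop-only container, reaching a slot below the top means popping everything above it into a temporary and pushing it back, which is why VM implementations usually keep the stack in an indexable array.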

  • @alessandromeyer4888
    @alessandromeyer4888 10 years ago +1

    Really great! If u'd have used scala u'd have been twice as fast writing the code. :-)

  • @LarsHarmsen1337
    @LarsHarmsen1337 10 years ago +4

    When he was spitting on Android users I wanted to stop the video. And then, instead of naming the register "program counter", he gave it some funny name, because he's an Apple fanboy. Say what?
    But I kept watching until the end. And I guess it was worth it. I wasn't aware you can label things and break out of those specific parts in Java.
    A little criticism: I couldn't understand the audience's questions.
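
    A minimal sketch of the labeled break mentioned above, in the setting where it tends to come up in an interpreter: a `switch` nested inside the fetch loop. A plain `break` only leaves the `switch`, while `break loop` leaves the labeled `while`. The opcodes and the accumulator are hypothetical and only serve the example.

    ```java
    class LabeledBreakDemo {
        // Hypothetical opcodes for this sketch.
        static final int INC = 1, HALT = 2;

        static int run(int[] code) {
            int ip = 0;
            int acc = 0;
            loop:
            while (ip < code.length) {
                switch (code[ip++]) {
                    case INC:
                        acc++;
                        break;        // leaves only the switch
                    case HALT:
                        break loop;   // leaves the labeled while loop
                    default:
                        throw new IllegalStateException("bad opcode");
                }
            }
            return acc;
        }

        public static void main(String[] args) {
            // The INC after HALT is never executed.
            System.out.println(run(new int[] { INC, INC, HALT, INC })); // prints 2
        }
    }
    ```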

    • @HansUhlig
      @HansUhlig 10 years ago +10

      Are you referring to the instruction pointer? The names are synonymous. See en.wikipedia.org/wiki/Program_counter

    • @blenderpanzi
      @blenderpanzi 10 years ago +6

      Indeed. I learned both names in a lecture held by someone that is definitely not an Apple fanboy (Linux user).

    • @asuasuasu
      @asuasuasu 6 years ago +2

      x86 uses the instruction pointer name. the x86-64 register is named r**ip**.

    • @asuasuasu
      @asuasuasu 6 years ago +1

      oh i'm 3 years late ok

  • @HumanBeingSpawn
    @HumanBeingSpawn 8 years ago

    Did you know Microsoft uses the *STDCALL* convention in their Win32 API? lol
    It seems like you despise everything associated with Microsoft, or anything not associated with Linux.

  • @FacebookIL
    @FacebookIL 9 years ago

    the stack is growing downwards and not upwards.

    • @gonkula
      @gonkula 9 years ago +1

      +NoPTic S (Dersus) That's actually architecture dependent (and true for x86/x86_64, amongst others), but there are architectures where it's not true; HP's PA-RISC springs to mind.

    • @FacebookIL
      @FacebookIL 9 years ago +1

      I know, but most of them grow downwards.
      Anyway, nice to see some people who do understand some ASM!

    • @kilswitchengaged
      @kilswitchengaged 8 years ago +3

      +Alexandru Pană It's not irrelevant if you are working at the OS level generating machine code. You need to be aware of how much memory you have for the stack and whether you are overwriting something else.