Introduction to Tokenization | Writing a Custom Language Parser in Golang

  • Published on 23 Nov 2024

Comments • 32

  • @WilhelmDrake
    @WilhelmDrake several months ago +2

    @34:00 Isn't it kind of inefficient to search the source string for every pattern while parsing each token? Am I missing something?
    I'm sorry if this is a dumb question, I don't know Go and I'm not a professional programmer.

    • @tylerlaceby
      @tylerlaceby several months ago +1

      No, this is a great observation. Yes, this is a real performance consideration. With this type of Lexer, the order in which you try the regex patterns really matters for performance.
      Writing your regex patterns carefully, and making use of the efficiencies the language offers, is important here too. There are some really bad practices with regex which I won't name here, but if you use them in your patterns, they can add linear or even exponential time to each match. As you noted, since this can happen for many or all tokens, the total cost can become polynomial, which can be really bad for large inputs.
      The Lexer for this series is going to be more than performant enough for most languages. However, if you were building a language whose compiler needs a small memory footprint, and the Lexer were taking too much time, you could always hand-write the Lexer as a basic state machine.
      This is typically done with a single while loop where you check the current character and, based on that value, go from there. If it's a single-character token, then you already know what to do. If it's a character that can be part of a multi-character token, then you look ahead and repeat until you reach a decision node. I have some examples of this in my TypeScript series videos.
      Anyway, I hope you enjoy the rest of the series and learn a lot from it. Again, great observation, and if you have any other questions feel free to ask here or on Discord.
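
To make the hand-written approach described above concrete, here is a minimal sketch of a single-loop Lexer in Go. It is not the Lexer from the series: the `TokenKind` values, the `Token` struct, and the `Tokenize` function are hypothetical names chosen for illustration, and the sketch assumes ASCII-only input.

```go
package main

import "fmt"

// TokenKind, Token and the kinds below are hypothetical names for this
// sketch; they are not taken from the series.
type TokenKind string

const (
	Number     TokenKind = "number"
	Identifier TokenKind = "identifier"
	Assign     TokenKind = "="
	Equals     TokenKind = "=="
	Plus       TokenKind = "+"
	OpenParen  TokenKind = "("
	CloseParen TokenKind = ")"
	EOF        TokenKind = "eof"
)

type Token struct {
	Kind  TokenKind
	Value string
}

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

func isLetter(c byte) bool {
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// Tokenize walks the source once. Each iteration inspects the current
// character and decides what to do, looking ahead only when a character
// could start a longer token. Assumes ASCII input.
func Tokenize(src string) ([]Token, error) {
	var tokens []Token
	pos := 0
	for pos < len(src) {
		c := src[pos]
		switch {
		case c == ' ' || c == '\t' || c == '\n' || c == '\r':
			pos++ // skip whitespace
		case c == '+':
			tokens = append(tokens, Token{Plus, "+"})
			pos++
		case c == '(':
			tokens = append(tokens, Token{OpenParen, "("})
			pos++
		case c == ')':
			tokens = append(tokens, Token{CloseParen, ")"})
			pos++
		case c == '=':
			// '=' may be the start of '==', so peek at the next character.
			if pos+1 < len(src) && src[pos+1] == '=' {
				tokens = append(tokens, Token{Equals, "=="})
				pos += 2
			} else {
				tokens = append(tokens, Token{Assign, "="})
				pos++
			}
		case isDigit(c):
			start := pos
			for pos < len(src) && isDigit(src[pos]) {
				pos++
			}
			tokens = append(tokens, Token{Number, src[start:pos]})
		case isLetter(c):
			start := pos
			for pos < len(src) && (isLetter(src[pos]) || isDigit(src[pos])) {
				pos++
			}
			tokens = append(tokens, Token{Identifier, src[start:pos]})
		default:
			return nil, fmt.Errorf("unexpected character %q at offset %d", c, pos)
		}
	}
	tokens = append(tokens, Token{EOF, ""})
	return tokens, nil
}

func main() {
	tokens, err := Tokenize("x == (y + 42)")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range tokens {
		fmt.Printf("%s %q\n", t.Kind, t.Value)
	}
}
```

Every character is visited a bounded number of times, so the work grows linearly with the input length, and no pattern list has to be tried at each position.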

    • @WilhelmDrake
      @WilhelmDrake several months ago

      @tylerlaceby Thank you for the response! Really means a lot! I'm enjoying the series! Your work is greatly appreciated.

  • @modley_the_m_guy
    @modley_the_m_guy 7 months ago +6

    Really hyped! I first watched your "How to make an interpreted language" about 3 weeks ago, and after rewriting my codebase in C#, I published it on GitHub! It was originally supposed to be a functional language, so I didn't add variables. I'm working on that as well as static typing! Now you can type anything, the only thing I have left to do before I do this GIGANTIC new commit is to add static typing to function return types!
    ...What am I yapping about? Great video, man! Can't wait for episode 3!

  • @rakinar2
    @rakinar2 7 months ago +4

    Loved the video! And yes, I'd love to see new videos about interpreting and compilation!

    • @joseph0x45
      @joseph0x45 7 months ago

      sameeeee

    • @martinvacheron3839
      @martinvacheron3839 7 months ago

      Same here! But mostly on an interpreter

  • @riebeck1986
    @riebeck1986 3 months ago +4

    Extremely clean and thorough explanation of a Lexer. Thanks a lot for making this video.

  • @Raaampage
    @Raaampage 7 months ago +1

    Nice! I'm looking forward to the parser videos 😊

  • @DoubleDotStudio
    @DoubleDotStudio 7 months ago +1

    Your videos are always great. They are super easy to follow and learn from. 😃

  • @MultiMarcsOfficialChannel
    @MultiMarcsOfficialChannel 7 months ago +1

    Super excited to continue learning with this series! I just got finished with exams and wanted to learn more about how LSPs work. Thank you for this!

  • @vitamingo
    @vitamingo 4 months ago +1

    I'm writing my own SQL parser, thanks for your video!

    • @tylerlaceby
      @tylerlaceby 4 months ago

      Awesome. Hope this helps 😄

  • @tmanley1985
    @tmanley1985 7 months ago +1

    I'm learning to write DSLs right now and this is immensely helpful. I started building an interpreter for a YAML-based DSL so that I could learn that portion. But now I'm focusing on the lexing and parsing portion, which scared the bejesus out of me. Unemployed at the moment, so I'm gonna take as much time as I need to learn to do this. Thanks for the videos!

    • @tylerlaceby
      @tylerlaceby 7 months ago +1

      Hopefully this series will treat you well. Thanks for the kind words. Yeah, luckily parsing a YAML-like syntax is much simpler than the one we cover in the series.
      Best of luck, and if you need any assistance feel free to reach out on my Discord.

    • @tmanley1985
      @tmanley1985 7 months ago

      @tylerlaceby I appreciate it! I'm gonna suffer for a while with it but I'm sure this will be a great help.

  • @MaixPeriyon
    @MaixPeriyon 4 months ago

    I have really loved your series and am really looking forward to the rest of it.

  • @NathanWienand
    @NathanWienand 7 months ago +1

    You could make: type TokenKind string ... then define the values directly in the const declaration block, so you don't need a function to determine the value :D Good luck. Very cool video and well explained.
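
A minimal sketch of the suggestion above, assuming a string-backed `TokenKind`. The particular kinds and the `Token` struct are illustrative and not taken from the video.

```go
package main

import "fmt"

// TokenKind is backed by a string, so each constant carries its own
// printable name and no separate kind-to-string function is required.
// The kinds listed here are illustrative only.
type TokenKind string

const (
	EOF        TokenKind = "eof"
	Number     TokenKind = "number"
	Identifier TokenKind = "identifier"
	Plus       TokenKind = "plus"
)

// Token is a hypothetical token struct for this sketch.
type Token struct {
	Kind  TokenKind
	Value string
}

func main() {
	tok := Token{Kind: Number, Value: "42"}
	// %s prints the underlying string of the kind directly.
	fmt.Printf("%s(%s)\n", tok.Kind, tok.Value) // prints: number(42)
}
```

The trade-off is that kinds are compared as strings rather than integers, which is usually fine for a teaching Lexer but may matter in a hot loop.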

  • @Pi7on
    @Pi7on 7 months ago +1

    Yooo, I literally discovered you a few days ago because I was researching around for my toy language project!
    Loved your TS series, I'll definitely be sticking around!

    • @tylerlaceby
      @tylerlaceby 7 months ago +2

      Happy to have been of help. New videos coming soon for this series.

  • @minma02262
    @minma02262 5 months ago +1

    Really great. Liked.

  • @anashe5417
    @anashe5417 4 months ago +1

    Brothaaa, I was waiting for this but I didn’t get any notification. Let me enjoy my novel!

  • @uynilo9
    @uynilo9 7 months ago +2

    why bro read my mind
    i was just looking into making a language in golang the other day

  • @Dviih
    @Dviih 7 months ago

    Hi, do you have plans to integrate it with a compiler backend like LLVM? Your language could really take advantage of what LLVM already has: not only performance but also a bunch of cross-language libraries.

  • @drynianme
    @drynianme 2 months ago

    what extensions do u use? code highlighting, auto import, etc.?

    • @tylerlaceby
      @tylerlaceby 2 months ago +1

      Just the Go extension and the Go language server. Go's default formatting is applied on save, and Ayu Mirage is the theme I like to use.

    • @drynianme
      @drynianme 2 months ago

      @tylerlaceby thanks!

  • @kr_24
    @kr_24 6 months ago

    compiling to machine code would be awesome

  • @dedladxd4011
    @dedladxd4011 3 months ago

    hey, should i add a separate token for types like i32, u32, ...?

    • @tylerlaceby
      @tylerlaceby 3 months ago

      Maybe for your primitive types. But the Lexer should be able to group the rest as symbols. I personally like making type names a symbol and not having separate tokens. But some languages do it the other way, with a token for each primitive.
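
The two options in this reply can be sketched as follows. This is only an illustration: `ClassifyWord`, the `separatePrimitives` flag, and the `primitiveKinds` table are hypothetical names, and "symbol" is assumed here to mean a plain identifier token.

```go
package main

import "fmt"

type TokenKind string

// Hypothetical kinds for this sketch.
const (
	Identifier TokenKind = "identifier"
	I32        TokenKind = "i32"
	U32        TokenKind = "u32"
)

// primitiveKinds is only consulted for the "one token per primitive" option.
var primitiveKinds = map[string]TokenKind{
	"i32": I32,
	"u32": U32,
}

// ClassifyWord returns the token kind for a word the Lexer has already
// scanned. With separatePrimitives set to false, every word stays an
// Identifier and type names are resolved in a later stage. With it set to
// true, known primitive names get their own kinds.
func ClassifyWord(word string, separatePrimitives bool) TokenKind {
	if separatePrimitives {
		if kind, ok := primitiveKinds[word]; ok {
			return kind
		}
	}
	return Identifier
}

func main() {
	fmt.Println(ClassifyWord("i32", false))  // identifier
	fmt.Println(ClassifyWord("i32", true))   // i32
	fmt.Println(ClassifyWord("Point", true)) // identifier
}
```

Keeping type names as identifiers keeps the Lexer smaller, at the cost of resolving them later; separate tokens move that knowledge into the Lexer.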

  • @_slier
    @_slier 5 months ago

    not just compiling/interpreting, how about adding simple std libs too?