How regexes got catastrophic

แชร์
ฝัง
  • เผยแพร่เมื่อ 27 ธ.ค. 2024

ความคิดเห็น • 186

  • @focando-lol
    @focando-lol หลายเดือนก่อน +215

    i understand nothing but lowkey enjoying what is happening

    • @Musikvidedo
      @Musikvidedo หลายเดือนก่อน +10

      if you keep at it you'll understand most of it pretty soon

    • @NithinJune
      @NithinJune หลายเดือนก่อน +6

      kay is so pretty how am i supposed to focus 😵‍💫

    • @JordanManfrey
      @JordanManfrey หลายเดือนก่อน +4

      That’s most developers experience using RegEx in a nutshell

    • @v0id_d3m0n
      @v0id_d3m0n 16 วันที่ผ่านมา

      It makes more sense if you've watched the first two videos :)

  • @LeeDanielCrocker
    @LeeDanielCrocker 21 วันที่ผ่านมา +59

    FYI, Rust's standard regex library uses an O(mn) algorithm without backreferences.

    • @GregWhiteley
      @GregWhiteley 13 วันที่ผ่านมา +5

      Puzzled by this. Rust has _no_ standard regex library, and rust-lang discussion groups trumpets this as a feature, not a bug. Or has something changed.

    • @albertweber1617
      @albertweber1617 13 วันที่ผ่านมา +5

      The de-facto default regex crate "regex" does not implement look-around or backreferences

    • @LeeDanielCrocker
      @LeeDanielCrocker 12 วันที่ผ่านมา +3

      @@GregWhiteley The regex crate is not de jure standard, but de facto the one everyone uses.

  • @yaboi1525
    @yaboi1525 หลายเดือนก่อน +136

    Please don't stop uploading. I found this channel by chance and it has to be one of the best things happening this year

    • @moosehunter248
      @moosehunter248 หลายเดือนก่อน +3

      Sounds like you should consider sponsoring the channel so it doesn’t happen 😊

  • @Julienraptor01
    @Julienraptor01 หลายเดือนก่อน +124

    I now understand why i'm good at regex where most people fail
    I never use backreferences
    Like idk why but they never felt right logically
    Even when i made crazy things like a youtube URL parser to clean those in regex, i've found ways to just do it without backreference when i could have used some
    And it's kinda cause when i'm building the regex, i'm running it in my mind and backreference just makes that impossible
    Like tracking what it does become too complex
    So big thanks for this vid, very informative and great !

    • @Eunakria
      @Eunakria หลายเดือนก่อน +16

      for most CFGs I've always found it easiest to just build a tokenizer with regex and a parser with other techniques
      leaving the abstract computer science world and entering practical software engineering world - it is possible to express any CFG as just that, a parser, but the tokenizer/parser distinction often yields faster, more readable code with advisable properties for most practical applications of parsers (e.g. DSLs such as SQL or serialization formats such as JSON)

    • @JordanManfrey
      @JordanManfrey หลายเดือนก่อน +1

      I’ve never used them because when I learned them in school they would bring up stuff like that as being dangerous bullshit

    • @Eunakria
      @Eunakria หลายเดือนก่อน

      @@JordanManfrey I've had teachers and profs like this before. totally disconnected from the real world. they treat regexes like the devil, while pretending that memory safety errors in languages like C/C++ are a non-issue.
      regexes are an incredibly powerful tool if you know how and where to use them. definitely, part of "how and where" is knowing when you as a programmer, or your team, may be too error-prone to use the tool in confidence. I definitely would avoid using regex, or at least seriously consider and verify my use of it, in any high-availability or high-reliability context; but it's not so bad, all things said and done

    • @AySz88
      @AySz88 หลายเดือนก่อน +5

      ​​@@EunakriaBizarrely enough, I've more often encountered new learners going the other direction - they "in practice" want to abuse regex features, because it's built in, and looks powerful, so it must be fast. (Think "I built a calculator in Minecraft" style shenanigans, and Parker's usual stuff for that matter.) And parsing and tokenizing is more of an abstraction for them, especially if they never got a sense for how to use the toolkit of functional programming.

    • @Noname-67
      @Noname-67 23 วันที่ผ่านมา +1

      Maybe your brain is simulating a finite automaton when doing regex.

  • @JasonWalter
    @JasonWalter 15 วันที่ผ่านมา +6

    Reminded of what JWZ said: "Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems."

  • @wojciechostrowicz
    @wojciechostrowicz 23 วันที่ผ่านมา +20

    YT algorithms just decided that my constant work with regex deserves this video. Thank you algorithm.
    It was very pleasant to watch.

  • @erik_with_a_kay
    @erik_with_a_kay 17 วันที่ผ่านมา +5

    Where [regex] blood makes [regex] blood unclean! First-time watcher and this was great.

  • @rkvkydqf
    @rkvkydqf 20 วันที่ผ่านมา +4

    I can't believe this video has only 31K views. All this work the amazing visualizations, the quality of the explanations, the lined exercises in the description. I truly hope all this work would get rewarded some day. Thank you so much.

  • @9_-_-_-_-_swo
    @9_-_-_-_-_swo หลายเดือนก่อน +43

    i got a (.*?) tattoo when i barely knew what regexes were lol. thank you so much for this series, it has done wonders for my ability to live with that decision (and is also some of the best comp sci content I've ever seen on this platform

    • @NickiRusin
      @NickiRusin หลายเดือนก่อน +1

      that's a really good idea for a tattoo

    • @TVDaJa
      @TVDaJa 29 วันที่ผ่านมา

      Someone please explain for my lil brain

    • @SimonBuchanNz
      @SimonBuchanNz 28 วันที่ผ่านมา +18

      ​@@TVDaJaaccepts anything, but makes everything after it backtrack as much as possible if it fails. An incredibly efficient way to be as inefficient as possible!

    • @9_-_-_-_-_swo
      @9_-_-_-_-_swo 25 วันที่ผ่านมา +2

      @@SimonBuchanNz u can just put it between stuff

    • @SimonBuchanNz
      @SimonBuchanNz 25 วันที่ผ่านมา

      @@9_-_-_-_-_swo yeah, it's fine for input you know is going to be a reasonable size.

  • @AstonishedByTheLackOfCake
    @AstonishedByTheLackOfCake 21 วันที่ผ่านมา +16

    I've always had a sneaking suspicion that regexes were too versatile and did too many things to actually be performant, turns out I was right
    though I did not expect back references to break the entire concept of a language being regular
    will probably use Google's regex engine in the future since I barely ever use backreferences anyway

  • @Niki1A_
    @Niki1A_ 29 วันที่ผ่านมา +39

    Why not use the lockstep algorithm, when the regex has no backreference? It would be easy to just store a boolean with the automaton, that indicates whether it has backreferences, and pick the algorithm accordingly. This would limit catastrophic cases to regexes that actually use backreferences, which could be taught as something to be avoided.

    • @Rock6Sixes
      @Rock6Sixes 28 วันที่ผ่านมา +9

      wanted to comment this, but looked through the comments first. Seems very obvious, but I can't think of the reason why. Maybe the automata structures get reused when possible and you can't know where to use which engine.

    • @rngwrldngnr
      @rngwrldngnr 28 วันที่ผ่านมา +10

      It's possible some systems do, these days. The problematic cases all have backreferences, so it wouldn't actually help in those edge cases.

    • @Ziraya0
      @Ziraya0 23 วันที่ผ่านมา +1

      The only reason I can imagine is that with and without backreferences would be two literally different algorithms and it's going to be hard to make the algorithms otherwise have exactly the same outputs for the same patterns & inputs. I've never used a backreference before, I can't really conceptualize why I would, but in part that's because I already know that regex can't parse html or email addresses, so I've never tried to do a thing you would use them for with regex; the obvious answer to me is basically what you said, but instead of a flag, the language has two literally distinct regex implementations that you're responsible for picking between, and writing regex that produces the result you want for the one you're using. One is fast, one has this weird trick.
      I do a lot of manual regexing stuff in text editors, spreadsheets and microsoft's PowerRenamer, so I'm already used to tailoring patterns and methodology to the environment I'm in. PowerRenamer in particular has an alternate engine it can use that's supposedly faster, but they're not 100% congruent in features and behaviors. In particular, PowerRenamer's default, a wrapping of .Net regex does something wrong with lookbacks so if you want lookbacks to work, you have to use the other one.

    • @PizzaRollExpert
      @PizzaRollExpert 22 วันที่ผ่านมา +3

      This might arguably be the best solution, but it's still far from ideal. Adding a backreference would completely change the performance characteristics of the regex, which is pretty unintuitive. It's better to have lockstep by default and force the programmer to use some special syntax for regexes with backtracking to make it clear that they have different performance characteristics. You might also want to use the ones with backtracking in some special cases anyway, since they can have a better best-case performance on certain input strings.

    • @ThomasWilde
      @ThomasWilde 21 วันที่ผ่านมา

      They can and do. That boolean flags the difference between O(n) and O(2^n). That's the problem.

  • @4l6ag3
    @4l6ag3 15 วันที่ผ่านมา

    What a great overview of this! Great refresher of things I haven't though much about since college, and explained more concisely than any of my professors managed to.

  • @nolanmccarthy3335
    @nolanmccarthy3335 16 วันที่ผ่านมา +2

    I'm really drunk and I don't know what you're talking about but I'm enjoying it

    • @mmmhorsesteaks
      @mmmhorsesteaks 15 วันที่ผ่านมา +1

      Best way to enjoy regexes, really.

  • @somdudewillson
    @somdudewillson 29 วันที่ผ่านมา +8

    My personal position is that once one has spent a few hours trying to convince various HTML parsing libraries to _only_ parse the input string and stop reformatting it, parsing with a regex starts looking pretty good.

  • @dr.strangelove5622
    @dr.strangelove5622 หลายเดือนก่อน +18

    This is amazing!! I read about Thompson's algorithm last week when I was studying non-deterministic automata and the fact that regex engines in most modern and/or popular programming languages are slower than it and suffer from exponential blowup for longer expressions (if I remember correctly). The visualizations of algos was great and helpful in understanding them. Thumbs up for that!!
    All this increases my respect for these giants: programming all these using ed, the standard editor, on a teletype connected to a computer which was much slower than our present day handheld gadgets.

  • @faldarith
    @faldarith 17 วันที่ผ่านมา +2

    Thank you for sharing the stackoverflow answer, I’ve never felt so seen

  • @JamesChurchill
    @JamesChurchill 29 วันที่ผ่านมา +7

    I use a regex to finite state machine code generation tool occasionally for tricky problems, and it's always violently obvious when I've accidentally added nondeterminism to my input regex - the state machine that comes out of the other side blows up spectacularly.

  • @fugoogle_was_already_taken
    @fugoogle_was_already_taken หลายเดือนก่อน +30

    Chomsky hirearchy is not a guideline. If you adhere to proper definitions, it is provable. My formal languages teacher would tear you apart for this 😂

  • @stephenJpollei
    @stephenJpollei 29 วันที่ผ่านมา +1

    This was a fantastic video and I did the first exercise and found strings that caused issues that made them go over 5 seconds. Most of the strings were well under 80 characters and I think the max was around 300. I likely wasn't efficient with the 300 one and could have made a shorter one.
    I think that this really made me understand the issue much better, but I will have to look into some more.
    Thank you, for making such an excellent set of resources.

  • @ouroya
    @ouroya 24 วันที่ผ่านมา +1

    watching this video is making me very glad i've spent the past few months reimplementing my friend's 32-line sed across multiple hundred json files into a program that actually parses and understands json properly. who would've guessed that this works infinitely better and is infinitely more capable?

  • @Tumbolisu
    @Tumbolisu 26 วันที่ผ่านมา +6

    I first learned about the concept of regular expressions at university. (yeah, my interests are not in parsing text, so i literally never saw them until then.) I was taught about the whole hierarchy, with context-free grammars and all that. a year later, i am working on a python project made by other students across the years, and it uses regular expressions all over the place. i sit down and properly learn about them, immediately notice that almost all of the code can be improved (such as some regexes potentially matching the wrong things), and of course, notice that the entire concept of matching groups and back references just... doesn't make sense. it's called a regular expression, but it actually isn't? the fact that every regex engine is forced to use the backtracking algorithm Precisely Because modern every-day regex isn't a regular language is just the cherry on top.
    this video mentions that people want to use regex to match html tags... just write a parser!? html is dead simple, it's the easiest thing on the planet to parse. i have manually written parsers for more complex things for fun.
    besides, i never liked the way regex strings look. they are just this ugly mess of punctuation and backslashes. i know some people who exclusively let ChatGPT write regex strings for them because they trust an LLM more with that mess than themselves.

  • @bntegor
    @bntegor หลายเดือนก่อน +6

    10:25 this transition cracked me up haha great video!

  • @websiteuser7926
    @websiteuser7926 หลายเดือนก่อน +2

    great video, i read that russ cox page years ago and it was fascinating. its surprising that so many languages still use slow regex algorithms

  • @NithinJune
    @NithinJune หลายเดือนก่อน +10

    yikes i think i missed some prerequisites

  • @yash1152
    @yash1152 29 วันที่ผ่านมา +5

    0:40 lemme guess, backtracking is that least efficient method most Rgex engines use to support the backreference matching, with exception of RE2 (in c++) made by google's engineer with doesnt support that feature.
    Fun fact, backref matching is not a regex capability in first place.

  • @drelephanttube
    @drelephanttube หลายเดือนก่อน +17

    Haven't even watched this yet but a new Kay Lack video is automatic thumbs up.

  • @bjorgemeulemeester1398
    @bjorgemeulemeester1398 27 วันที่ผ่านมา +3

    Stumbled here somehow. Quality of production and depth of content are absolutely unmatched.

  • @PatFarrellKTM
    @PatFarrellKTM 15 วันที่ผ่านมา +1

    I want to meet Kay Lack Turing Award Winner. I'll wait a bit. So much fun watching early 'go' videos about how Russ Cox, Rob Pike, Ken Thompson and Rob Griessimer used their decades of experience. Plus lots more refugees from Bell Labs....

  • @MichaelLouard
    @MichaelLouard 27 วันที่ผ่านมา +1

    I can only validate you by experience and not the full level of theory you know, but I’ve had to refactor SEVERAL notebooks containing regex once I learned about what you’re talking about (albeit not as well as you’ve explained it here)
    please take this as a sign you are not just teaching people who know nothing, but also people who know some, but need to be better (people like me!)
    Thanks!

  • @chrispy2117
    @chrispy2117 16 วันที่ผ่านมา +1

    Do i know enough to understand this video? No, absolutely not. Did your voice and explanations make this a really calming video? Yes, 100%. So despite the fact I'm probably not your target audience, I'm absolutely dropping a sub 😄

  • @Jalae
    @Jalae หลายเดือนก่อน +6

    seems like plan9 is the branch of reality where ken thompson and ross cox fixed their regexps. I was confused why it was always greedy and didn't have back references.

  • @checkmate080
    @checkmate080 16 วันที่ผ่านมา +1

    video got me to start using this option in C#:
    [GeneratedRegex("[^a-zA-Z0-9]", RegexOptions.NonBacktracking)]
    from microsoft docs:
    Enable matching using an approach that avoids backtracking and guarantees linear-time processing in the length of the input.

  • @BlueIsLeet
    @BlueIsLeet หลายเดือนก่อน +20

    Use the KISS approach when creating your regexes and you'll be fine. For anything extremely complicated, use a parser library.

    • @mage3690
      @mage3690 25 วันที่ผ่านมา +5

      Or just parse. It's not that hard, really.

    • @AnimeGIFfy
      @AnimeGIFfy 11 วันที่ผ่านมา

      Using regex is like the opposite of KISS. Parsing is as simple as you can get

    • @bensalemi7783
      @bensalemi7783 5 วันที่ผ่านมา

      Some things are problems that need solutions. Some things are solutions that are looking for a problem.
      RegExes are problems trying to convince you they are a solution.

  • @gronki1
    @gronki1 หลายเดือนก่อน +7

    I love your channel! Just recently wrote my first parser in Fortran, language everyone tells me that I should not waste my life on, but it was possible!

    • @boombeet678
      @boombeet678 หลายเดือนก่อน

      Actually a really good idea by the way, many people would recommend more modern alternatives like Python or Julia in terms of languages (even though they have enough differences not to necessarily be considered straight up replacements) for it but considering the time in which it came out, I would say FORTRAN is more akin to Lisp than COBOL in terms of it's use cases. Alike Haskell, probably wonderful to look at not necessarily for it being a swiss army knife (sometimes even getting outclassed in terms of specialization when it comes to which language is "the right tool for the job") but for what it does theoretically and technically, as a construct that holds up throughout time in that fashion.
      I'm wishing I also get to using it sooner or later alike Python and Julia, although the latter two from what I've gathered are probably going to come first for me. Doesn't change the fact that it's a language to take notes from nonetheless.

  • @yoctometric
    @yoctometric หลายเดือนก่อน +2

    Thank you so much for producing this kind of content! Seconding those who enjoyed the transition to the html srackoverflow post. You explain things very well

  • @JackSlinger10
    @JackSlinger10 หลายเดือนก่อน +2

    Really interesting stuff, thank you for sharing.

  • @MaxG628
    @MaxG628 7 วันที่ผ่านมา

    Setting aside the maintenance of two code paths, it seems like a regex library should be able to detect if back references are used, and if not, use the more efficient algorithm. Are there any of that do that?

  • @brendand3765
    @brendand3765 13 วันที่ผ่านมา

    Is there a way to analyze regexp to use the backtracking algorithm only when back references are used ? It doesn't to seem to be used that often.

  • @brandonyong301
    @brandonyong301 หลายเดือนก่อน

    Cool gem of a channel

  • @fugoogle_was_already_taken
    @fugoogle_was_already_taken หลายเดือนก่อน +4

    Why are nondetermenistic automata not compiled to deterministic ones? Wouldnt it solve the algorithmic complexity?

    • @imacds
      @imacds หลายเดือนก่อน +4

      I haven't done the math so I am not sure about this, but my guess would be that the resulting deterministic automata would be the same as traversing with backtracking through the nondeterministic one. The memory consumption of a deterministic automata can also grow quite quickly, so it would likely let you write OutOfMemory dos attacks.

    • @Vaaaaadim
      @Vaaaaadim หลายเดือนก่อน +10

      Given an NFA you can make a DFA that is equivalent, however the number of states in the DFA may be exponentially large.

    • @schwingedeshaehers
      @schwingedeshaehers หลายเดือนก่อน

      how can it grow quickly? you just need to store the state and the rest string​@@imacds

    • @JordanManfrey
      @JordanManfrey หลายเดือนก่อน

      @@Vaaaaadimyeah this, it just sets off the “I’m creating a monster” alarm bells so you don’t do it

    • @moarjank
      @moarjank 16 วันที่ผ่านมา

      It's mathematically impossible, as far as we're aware

  • @benarcher372
    @benarcher372 หลายเดือนก่อน +1

    Great video! I liked very much "... from somebody called Rob Pike..." 🙂I'm sure he doesn't mind!

  • @peterc-s6423
    @peterc-s6423 16 วันที่ผ่านมา

    1. use a regex that matches if a regex causes catastrophic backtracking
    2. if it doesnt, match to see if it uses backreferences
    3. if it uses backreferences, use backtracking
    4. if it doesnt use backreferences, use lockstep

  • @maxmustermann5590
    @maxmustermann5590 14 วันที่ผ่านมา +2

    I love you kay

  • @nik90001
    @nik90001 22 วันที่ผ่านมา

    I think it's pretty common in search stuff to evaluate regexes as finite automata. This lets them intersect the automata with the search index to iterate all marching strings. That's nice because you can OR together the list of matching documents for all strings and get the list of matching documents for the whole regex.
    Also interesting, there is another sort of attack. Usually these algorithms want a DFA but regex make an NFA. Converting between the two *can* use a ton of memory. I accidentally crashed search on Wikipedia using it. That was exciting.

  • @blacklistnr1
    @blacklistnr1 6 วันที่ผ่านมา

    regex defaults always seemed weird to me, the usual case is N serial chunks of: start parsing when x, stop parsing when y, check that what is captured along the way follows some rules.
    I feel like this idea could be expressed so much better if regex didn't try to be so smart, it would also make the runtime faster while avoiding the troublesome edge cases.
    I'd be really interested to find out if there's a legitimate production use case for the smarter regexes (even the 'a repeated pattern from start to end' one you mentioned seems far-fetched to me)

  • @neilsvedberg
    @neilsvedberg 21 วันที่ผ่านมา

    I think raku's approach to more verbose regexes is a step in the right direction.

  • @hugmynutus
    @hugmynutus หลายเดือนก่อน +2

    14:30 appreciate this story. I never realized it was Stream Ed, which is cool as I use sed a lot.

  • @hanpham5790
    @hanpham5790 หลายเดือนก่อน

    I love your videos and contents so much Kay!! :^)

  • @Sandromatic
    @Sandromatic 15 วันที่ผ่านมา

    It seems like a good idea to do lockstep by default and then go to backtracking if a back reference is included in the regex? Most of the time you don't use back references, but if you do it's very useful, but obviously then causes the inability to use the efficient algorithm. Since regexes are usually compiled ahead of time, the algorithm can be chosen ahead of time depending on if back references are included. Just include a warning in your documentation that backrefs can cause major slowdown.

  • @Leadbraw
    @Leadbraw 26 วันที่ผ่านมา

    keep up the great work

  • @NillKitty
    @NillKitty หลายเดือนก่อน +1

    I love your hair and your accent. But to echo others, how do you do your animations?

  • @Lucas-pj9ns
    @Lucas-pj9ns 24 วันที่ผ่านมา

    thanks for the interesting video! i used to hate regex cause it looked like arcane magic but the theory is so interesting. I've been trying to figure out how to implement capturing groups and I'm a bit stuck. I have some ideas about storing more state and I've optimized it so that it only takes O(capture groups^2) more memory but I don't know if that's good or not. I skimmed Russ Cox's papers you linked and I can't seem to find the part about implementing capture groups. Could you point me to which section talks about capture groups or some other link about that?

    • @neoeno4242
      @neoeno4242  23 วันที่ผ่านมา

      Hi Lucas! This is the one in which he talks about it: swtch.com/~rsc/regexp/regexp2.html - roughly the idea is to treat each active state as a ‘thread’ and then maintain a list of all the unique threads active. You don’t have to go the whole way implementing a virtual machine to get something to work but it’s an interesting exercise. If that doesn’t help though drop me an email at hello@0de5.net and I can share some code.

  • @Il_Dilettante
    @Il_Dilettante 29 วันที่ผ่านมา +1

    I'm a feynman bro except instead of feynman it's regex
    Thank God for RegEx

  • @sommanker
    @sommanker 19 วันที่ผ่านมา

    Ehat about lookaheads and lookbehinds?

  • @Lantalia
    @Lantalia 22 วันที่ผ่านมา

    The presence of backreferences are a signpost that you have moved beyond the lexer and should be writing a parser

  • @dungeonrobot
    @dungeonrobot หลายเดือนก่อน

    Great video! Not sure how it’s only at 1k views.
    Very disappointed to discover after this video that the game engine I was using only supports PCRE2

  • @ms-fk6eb
    @ms-fk6eb หลายเดือนก่อน +18

    I hate this, the kind of "yes, but actually no" type stuff that, like the SO answer, consumes all that is pure and beautiful

    • @neoeno4242
      @neoeno4242  หลายเดือนก่อน +3

      would you mind saying a little more about this @ms-fk6eb?

    • @logickedmazimoon6001
      @logickedmazimoon6001 หลายเดือนก่อน +3

      nothing is black and white, nothing is known for sure, our concept of objective is how repeatable something is in one or more cases but there will always be one or more cases that repeatable thing is not... repeatable. When we say "yes" in response to a "does this happen?" actually means, "More than likely yes"

    • @Criz454
      @Criz454 29 วันที่ผ่านมา +2

      @@logickedmazimoon6001 things literally are known for sure but ok

    • @logickedmazimoon6001
      @logickedmazimoon6001 29 วันที่ผ่านมา +1

      @@Criz454 In what sense?

    • @wumi2419
      @wumi2419 15 วันที่ผ่านมา +1

      ​@@Criz454no. Everything has it's limits of applicability. Any physical model breaks apart or is too complicated to be computed. Sometimes both.

  • @HerbieBancock
    @HerbieBancock 9 วันที่ผ่านมา +1

    "I object to binary searches because I'm non-binary" -Pasty Zoomer

  • @rogo7330
    @rogo7330 19 วันที่ผ่านมา +1

    I just avoid regexes as first thing when I parse something. If you ever thought to parse XML *or* HTML (they are quite not the same and I hate it) with regexes, consider to read specs on them, realise that people who wrote all that crap are just sociopaths and hate you, and go back to just splitting file and writing your automata. At least in that case you will find a bug right away or find it more easielly, almost always you will know what parts of the spec you are ignored, and in best case it will be much simpler and faster since you can say «nagh, I know I parsing this xml from that program, I just gonna be sure to preserve information that program produces and just skip parsing of the XML all-together».

  • @02052645
    @02052645 หลายเดือนก่อน +1

    OK, but I need to know: can you parse HTML with a regex?

    • @stephenJpollei
      @stephenJpollei 29 วันที่ผ่านมา +3

      I think at best you can do a poor-man's tokenizer somewhat sanely using regexps. The whole grammar includes full recursion etc so no you can't properly parse arbitrary html just using a regexp. You can scrape stuff from html that follows known pattern or template but that can be extremely fragile when even minor changes to the format is done.

  • @viniciusrolandcrisci272
    @viniciusrolandcrisci272 หลายเดือนก่อน +6

    how do you create your animations?

  • @studogYT
    @studogYT 14 วันที่ผ่านมา +2

    Perl predates Linux by 3 or 4 years.

  • @JohnSmall314
    @JohnSmall314 29 วันที่ผ่านมา

    Has anyone done the quantum computation version of this?

  • @fugoogle_was_already_taken
    @fugoogle_was_already_taken หลายเดือนก่อน

    Did not know about redos, thx

  • @Flourish38
    @Flourish38 15 วันที่ผ่านมา

    Being unable to negate generation in the same way you can negate recognition is reminiscent of how SAT is in NP, but UNSAT is in EXP because there’s no way to read a sufficient proof in non-exponential time.

  • @i_am_lambda
    @i_am_lambda 28 วันที่ผ่านมา

    Why not scan the regex for back references and use lockstep if none exist?

    • @rngwrldngnr
      @rngwrldngnr 28 วันที่ผ่านมา +2

      Some tools probably do, but the specific website-taking-down cases were all backreferences related, so it wouldn't help with the really icky edge cases.

  • @mysteryman7877
    @mysteryman7877 21 วันที่ผ่านมา

    Wait, do you issue regex licenses?

  • @zweitekonto9654
    @zweitekonto9654 หลายเดือนก่อน +5

    what's a regular language tho. I don't think you ever got into the definitions in the series.

    • @drdca8263
      @drdca8263 22 วันที่ผ่านมา

      A regular language is one that can be recognized by a finite state automaton

  • @bensalemi7783
    @bensalemi7783 5 วันที่ผ่านมา

    If I wanted to right unreadable code, I'd enter an obfuscated C contest.
    I don't care what the theory is. I don't care how efficient you can make a regEx implementation. If you want to make things as efficient as possible you can always write in assembler, but people don't do that unless they absolutely have to (or for fun) and the reasons are obvious. For those same reasons, try to avoid regEx and realize they are from another time and solved a different problem than the ones we need to solve today.

  • @v0id_d3m0n
    @v0id_d3m0n 16 วันที่ผ่านมา

    That stackoverflow answer is hilarious. Thanks for sharing xD

  • @jammin023
    @jammin023 หลายเดือนก่อน +1

    You keep saying "in big-O notation..." but then giving the time complexity's *name* rather than its big-O notation... e.g. quadratic time would be O(n²)

  • @rich_in_paradise
    @rich_in_paradise 10 วันที่ผ่านมา

    Maybe because I learnt compilers (Dragon book, yay!) before I programmed in a language (or used an OS) that had regexes, I always thought people coverted their NFAs into DFAs before recognition.

  • @Klosterhasi
    @Klosterhasi 17 วันที่ผ่านมา

    10:25
    no way its actually been there the whole time holy shit

  • @caniggiasyabil470
    @caniggiasyabil470 22 วันที่ผ่านมา +2

    bro you are GORGEOUS omg

  • @ricardojunior4334
    @ricardojunior4334 6 วันที่ผ่านมา

    "Someone called Rob Pike"

  • @gasun1274
    @gasun1274 15 วันที่ผ่านมา +1

    or do you just like looking like a medieval english lord

  • @omegahaxors9-11
    @omegahaxors9-11 หลายเดือนก่อน

    I still don't know what a regex is.

    • @williamdrum9899
      @williamdrum9899 10 วันที่ผ่านมา

      Regular Expression. It's a computer program that lets you tell if a piece if text fits a common format, such as an email address, a phone number, etc. Theoretically it works with any kind of text pattern. The downside is that it looks nonsensical to a human reader

  • @yxyk-fr
    @yxyk-fr หลายเดือนก่อน +1

    0:35 yeah let's hack !!!

  • @gasun1274
    @gasun1274 15 วันที่ผ่านมา +1

    r u enby

  • @Tobiky
    @Tobiky 26 วันที่ผ่านมา

    banger

  • @killerwrox
    @killerwrox 21 วันที่ผ่านมา

    Why do you know so much about regex?

  • @stupidteous
    @stupidteous หลายเดือนก่อน

    u look like finlay christie

  • @ducemano
    @ducemano หลายเดือนก่อน +2

    Love me some diversity in the mathmetics field

  • @Sub0x-x40
    @Sub0x-x40 27 วันที่ผ่านมา

    thank the Gods for gpt doing all my regex when I need it

  • @leeschumacher8285
    @leeschumacher8285 29 วันที่ผ่านมา

    G … N … U ?!?

  • @randomchannel-px6ho
    @randomchannel-px6ho หลายเดือนก่อน

    How they got catastrophic xD

  • @SolaceCafe
    @SolaceCafe หลายเดือนก่อน

    This video, while I would love to learn about regex efficiency as I use regex a lot, kinda lost me in talking a lot about different Algs and backtracking and such and not really relating it back to regex.
    I basically just went "ok, so just don't use lookahead or lookbehind probably?"
    Edit: I just noticed that every graph you show past like, halfway into the video, is showing what regex expression it is. Now I feel dumb.

  • @randomchannel-px6ho
    @randomchannel-px6ho หลายเดือนก่อน

    Yo draw a line between 1 and 5 and 3 and 7 and flip 1:45 90 degrees clockwise. If you recognize it, wtf

    • @spaghettiking653
      @spaghettiking653 หลายเดือนก่อน

      What, it looks like a cube?

    • @JadeVanadiumResearch
      @JadeVanadiumResearch หลายเดือนก่อน

      Is this Tunic?????????????????

    • @volbla
      @volbla 29 วันที่ผ่านมา +1

      I'm guessing the tree of life. It turns out judaism was a finite state machine all along!

    • @JadeVanadiumResearch
      @JadeVanadiumResearch 4 วันที่ผ่านมา +1

      @@volbla Yeah I guess that's more likely than Tunic... Good game though.

  • @tommihommi1
    @tommihommi1 21 วันที่ผ่านมา +1

    most things done with regexes should instead be done with plain, readable, testable, debuggable code.

  • @JOHNSMITH-ve3rq
    @JOHNSMITH-ve3rq หลายเดือนก่อน

    Ugh. Can’t do it.

  • @RifazNahiyanFWS
    @RifazNahiyanFWS 22 วันที่ผ่านมา

    I *have* to ask a question that's just on my mind. Are you a trans? It's just your voice and how you look I cannot add up.

    • @humanfactorsio
      @humanfactorsio 19 วันที่ผ่านมา +1

      Why does it matter at all? Stop transvestigating her. She's a woman, you don't need to know more than that. It's just you satiating your curiosity about the privacy of her life. This is the problem with society, you say we shove it down your throats - but you bring it up every opportunity you can and treat it like a dumb detective act. Let people have privacy.

    • @RifazNahiyanFWS
      @RifazNahiyanFWS 19 วันที่ผ่านมา +1

      @@humanfactorsio honestly if you can't answer that question but also are showing your appearance, you're not doing privacy right.
      I think it's a natural question to ask when you can look at her but then after hearing her you can't make the match.

    • @AnimeGIFfy
      @AnimeGIFfy 11 วันที่ผ่านมา +1

      You lookin for a date or something lmao

  • @ccriztoff
    @ccriztoff 29 วันที่ผ่านมา +2

    what a handsome young lady

    • @jvo1464
      @jvo1464 24 วันที่ผ่านมา

      😂

    • @humanfactorsio
      @humanfactorsio 19 วันที่ผ่านมา

      Why do you need to an an absolute bigot on here? Show some respect FFS.

  • @dadisuperman3472
    @dadisuperman3472 หลายเดือนก่อน +6

    What are you???
    Man or woman???

    • @flyingzeppo
      @flyingzeppo หลายเดือนก่อน +39

      Why does it matter? You're focusing on the wrong thing in this video.

    • @dinkc64
      @dinkc64 หลายเดือนก่อน +5

      Isn't it obvious? 😞

    • @alysphacker
      @alysphacker หลายเดือนก่อน +1

      @@flyingzeppo he wants to know obviously if he asked

    • @robinforbes9197
      @robinforbes9197 หลายเดือนก่อน +15

      M8 we're just trying to live. Can you leave her alone?

    • @alphago9397
      @alphago9397 หลายเดือนก่อน +2

      I honestly would have preferred the video a lot more if it were just narrated.

  • @ЕвгенийРезвей
    @ЕвгенийРезвей หลายเดือนก่อน

    зачем он вырядился как баба?

    • @JordanManfrey
      @JordanManfrey หลายเดือนก่อน +6

      Because they wont be jailed for it unlike in your country

  •  หลายเดือนก่อน

    If You're so smart, just write better and be the hero. Huh? I guess not a chance, right? The complexity and unefficiency probably has its purpose. Well. But You just needed views, right?

    • @rngwrldngnr
      @rngwrldngnr หลายเดือนก่อน +12

      The video literally explains the use of backtracking, and how removing it could lead to safer, faster code, admitting that doing so would involve removing back references and their expressive power. Besides, it's comparing the performance between two algorithms, both of which were written by the same person, so it's hardly a callout.

    • @JordanManfrey
      @JordanManfrey หลายเดือนก่อน +1

      Let me guess, self-taught

  • @humanfactorsio
    @humanfactorsio 20 วันที่ผ่านมา

    You are genuinely one of the greatest video creators I've seen in so long (also you are goals in general girl 🥹💕). I have a massive holiday watchload now. Thank you so so so so much for your hard work