Regular Expressions - Computerphile

แชร์
ฝัง
  • เผยแพร่เมื่อ 17 พ.ค. 2024
  • Professor Brailsford on one of our most requested topics.
    Playlist of Videos the Prof mentioned: • Chomsky Language Levels
    EXTRA BITS: • EXTRA BITS RegEx & Kle...
    / computerphile
    / computer_phile
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottscomputer
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

ความคิดเห็น • 375

  • @ke9tv
    @ke9tv 4 ปีที่แล้ว +1088

    'Some people, when confronted with a problem, think, “I know, I'll use regular expressions.” Now they have two problems.' - J. Zawinski

    • @BlazingSun46
      @BlazingSun46 4 ปีที่แล้ว +31

      Kevin Kenny Oh I know! I will just use regex. wait, what was the token for that thing again?

    • @IshayuG
      @IshayuG 4 ปีที่แล้ว +19

      Put RegEx into APL for ultimate confusion (and compactness)

    • @recklessroges
      @recklessroges 4 ปีที่แล้ว +34

      booo "bad Regexp meme is bad". #Regexp4life

    • @limmeh7881
      @limmeh7881 4 ปีที่แล้ว +2

      You just earned your 69th like my man. Regex is the most confusing thing for me lol.

    • @sugarfrosted2005
      @sugarfrosted2005 4 ปีที่แล้ว +27

      It's funny because regular expressions regularly make my life easier because I use them constantly. A rare use of them is worse than using them often and now using them.

  • @ChrisSeltzer
    @ChrisSeltzer 4 ปีที่แล้ว +285

    Just an anecdote about regular expressions from working as a software engineer. I once worked in a codebase that made lots of use of regular expressions. I suggested that for maintainability we should have a rule that above the regular expression you add a comment stating what it was intended to match. The pushback I received was that "the regular expression, by definition, says what it's intended to match."

    • @adamdyson3977
      @adamdyson3977 4 ปีที่แล้ว +96

      How dare you not instantly know how to read regex 😂

    • @Roxor128
      @Roxor128 4 ปีที่แล้ว +50

      [groans] A prime example of being correct on the wrong level.

    • @lawrencedoliveiro9104
      @lawrencedoliveiro9104 4 ปีที่แล้ว +35

      My attitude to comments is that they shouldn’t say just what the code does (the code itself does that), but _why_ the code is doing what it’s doing. Perhaps a comment on the regular expression could be expressed on the same basis.

    • @evangrove2131
      @evangrove2131 4 ปีที่แล้ว +48

      It's all fun and games until \b\w+\b\s*(?=(\[.*\]\w*)?([\(]|([ ](?![,+\-*/=!^&<>|?]|and|as|div|in|is|isnot|mod|notin|of|or|shl|shr|xor))))

    • @iainp84
      @iainp84 4 ปีที่แล้ว +19

      Evan Grove please could you add a comment describing what that regex is intended to match?.....

  • @josko50
    @josko50 4 ปีที่แล้ว +16

    All of the contributors on this channel are great, but every Professor Brailsford video always has my undivided attention. I appreciate his particular style of making these complex concepts unfold like a story.

  • @davidgillies620
    @davidgillies620 4 ปีที่แล้ว +8

    It is hard to overstate the enlightenment that comes from realising that a regular expression is a state machine.

    • @profdaveb6384
      @profdaveb6384 4 ปีที่แล้ว

      Yes !

    • @musashi939
      @musashi939 4 ปีที่แล้ว

      Which you know if you study computer science.
      Regular language -> deterministic, finite state machine -> regular expression

  • @jaimecristalino
    @jaimecristalino 4 ปีที่แล้ว +33

    His voice is so nice to hear...

    •  2 ปีที่แล้ว +1

      Finally someone speaks about the important things. I would have visited his lectures just for relaxation 😌

  • @realandar
    @realandar 2 ปีที่แล้ว +8

    I'm a huge fan of Prof Brailsford. I was extremely fortunate to start my computing career, while in secondary school in about 1975, learning on a pdp/8E. I learned so much, including TECO a text editor that used regexp. Those were the days that truly grounded my understanding of computers, hardware, firmware, and software. Watching this series brings back great memories. Thank you.

  • @samipah
    @samipah 4 ปีที่แล้ว +98

    More info in:
    Mastering Regular Expressions, 3rd Edition
    By Jeffrey Friedl
    Pages: 544

    • @Ceelvain
      @Ceelvain 4 ปีที่แล้ว +4

      The famous book by the man who disregard and direspect the underlying math.
      This book will mostly teach you how to best use regular expressions given the current widespread implementations. Not what they are nor how regex engine actually work. It's targeted at casual programmers and not at computer scientists. (Gosh, I don't even remember this book talking about algorithmic complexity.)
      And most of everything he says about regex engines is at best incomplete and misuse the terminology, at worst it's wrong and misleading.
      I would recommend the blog of Russ Cox to dive into how regex are actually executed. I honestly loved every word.

    • @thomaskember4628
      @thomaskember4628 4 ปีที่แล้ว +1

      Ceelvain I am in the middle of reading Larry Wall’s book on Perl. In it he says Friedl’s book is the place to go to for more information about regular expressions. Are you saying that this is not so and Friedl’s book is rubbish?
      I looked it up on Amazon but haven’t decided to go ahead and buy it.

  • @sejinmajnaric2884
    @sejinmajnaric2884 3 ปีที่แล้ว +15

    Thank you so much, Professor Brailsford! Hope you are well and healthy! I absolutely love listening to your explanation, it's so soothing to my ears.

  • @swagatochatterjee7104
    @swagatochatterjee7104 4 ปีที่แล้ว +34

    12:40 not only Ken Thompson. This problem was also solved on the other side of the Iron Curtain by Vladimir Glushkov. Unfortunately we are not taught much about Glushkov's construction algorithm.

  • @asaadb1
    @asaadb1 4 ปีที่แล้ว +4

    I faced a really pesky regex problem the other day, took me 2 days to solve it (had to trace back data to a reduced set), i never knew you could use a diagram for regex - i just scribbled it down in like 20 mins, got the same solution. genius!

  • @BeCurieUs
    @BeCurieUs 4 ปีที่แล้ว +3

    As far as pedagogy goes, Professor Brailsford is unrivaled. Def reignited my computer science flame

  • @hahahatall09
    @hahahatall09 4 ปีที่แล้ว +335

    It's terribly difficult to explain regex to people who don't use it!

    • @CottonInDerTube
      @CottonInDerTube 4 ปีที่แล้ว +59

      I use it from time to time. I understood nothing from the video.
      And thats the first time on computerphile videos.

    • @pdrg
      @pdrg 4 ปีที่แล้ว +8

      It's terribly difficult to explain regex to people who don't use it ;-)

    • @FireWyvern870
      @FireWyvern870 4 ปีที่แล้ว

      @@pdrg you can use it -like this- you know

    • @a.yashwanth
      @a.yashwanth 4 ปีที่แล้ว

      @@FireWyvern870 how

    • @RustyTube
      @RustyTube 4 ปีที่แล้ว +2

      @@a.yashwanth As he said, -like this- (precede and follow it by a dash).

  • @tedchirvasiu
    @tedchirvasiu 4 ปีที่แล้ว +63

    0:43 Professor Brailsford is a real-life Yoda

  • @davidwise1302
    @davidwise1302 4 ปีที่แล้ว +2

    I am retired now, but for decades of my software engineering career I have used C switch statements to implement state machines. No programming book that I have ever read has ever described such a method, but I learned it on-the-job (OTJ, a very common military acronymn since that is how you learn almost everything). In the last two decades of my software engineering career, I was repeatedly tasked with communicating with GPS receivers over their serial communications. We were never taught those skills in school, but we learned them immediately in the real world.

    • @lawrencedoliveiro9104
      @lawrencedoliveiro9104 4 ปีที่แล้ว

      Yup, hand-coding state machines to handle parsing various kinds of data is an extremely common thing. Done it lots myself.

    • @lawrencedoliveiro9104
      @lawrencedoliveiro9104 4 ปีที่แล้ว

      Yup, hand-coding state machines to handle parsing various kinds of data is an extremely common thing. Done it lots myself.

  • @BTHobbies
    @BTHobbies 4 ปีที่แล้ว +12

    This video was the first time the link between regular expressions and finite state automata clicked in my brain. I've been comfortable with both for years, and knew they were equivalent, but they just never came together for me before now. Thanks!

    •  2 ปีที่แล้ว +1

      The NFA-to-GNFA-to-Regex algorithm is really helpful in making that connection too

  • @fouzaialaa7962
    @fouzaialaa7962 4 ปีที่แล้ว +5

    i was taught this stuff without explanation never thought it could be used to solve a problem ..... i genuinely believe this channel is better that teachers i pay to teach me stuff

    • @lydianlights
      @lydianlights 4 ปีที่แล้ว +2

      Lol I was the opposite... I got taught how to solve problems using it but not really any of the theory behind it.

  • @LemonadetvYT
    @LemonadetvYT ปีที่แล้ว +1

    You can’t lie he is the best professor to explain regular expressions

  • @banderfargoyl
    @banderfargoyl 4 ปีที่แล้ว +11

    Love the history. Ken and Dennis certainly deserve the recognition for their accomplishments.

    • @josephgaviota
      @josephgaviota 2 ปีที่แล้ว

      💯 agree! True GENIUS.

  • @noswonky
    @noswonky 4 ปีที่แล้ว +39

    If I didn't already know what regular expressions are, I still wouldn't.

  • @Relemsis
    @Relemsis 4 ปีที่แล้ว +1

    this man is one of my favorite youtubers

  • @roberthunter5059
    @roberthunter5059 4 ปีที่แล้ว +91

    This was a very "computer science" explanation. I'm not sure how well anyone who hasn't worked with regexes will understand it.

    • @daniel.lupton
      @daniel.lupton 4 ปีที่แล้ว +6

      If you get deep into regex, it helps to understand FSA. But for basic stuff you don't need to understand the mechanics, just the syntax.

    •  4 ปีที่แล้ว +12

      I think there should have been a real-life example like matching a telephone number or zip code or something. I had a course on formal languages so I understood the video in full and enjoyed the context provided, but there is no way this is digestible by common folk, which is a shame because this might end up showing up in the youtube search for regular expressions…

    • @zacharieetienne5784
      @zacharieetienne5784 4 ปีที่แล้ว +1

      yes i understood some of those words

  • @aasutossh
    @aasutossh 4 ปีที่แล้ว +5

    The story telling is amazing.

  • @Sei783
    @Sei783 4 ปีที่แล้ว +2

    The wealth of knowledge contained within this man is mind blowing. To me, all this is trivial information that I'd never remember. These days, I can't remember what I did five minutes ago. But it's all so fascinating.

  • @rnoro
    @rnoro 2 ปีที่แล้ว +1

    Definitely the best video about regular expression I've ever seen!

  • @thattimestampguy
    @thattimestampguy ปีที่แล้ว +4

    1:32 012 ac string
    3:15 abc
    3:57 Ab*c
    B* = 0 or more B’s
    5:03 Quite complicated but locally simple
    5:34 Chomsky Grammer approach
    7:27 complete equivalence
    Automaton
    Regular expression
    Chomsky Expression
    Start, Automation, Finish
    8:29 Snag in equivalence
    9:46 2 alternatives, 2 completely different directions.
    11:03 What are you gonna do?
    11:28 Nondeterminist -> Determinist
    13:03 Ken Thompson said
    13:57 Regular Expression recognizer.

  • @ColinBroderickMaths
    @ColinBroderickMaths 4 ปีที่แล้ว +213

    Having watched this I feel like I learned absolutely nothing about regular expressions.

    • @billigerfusel
      @billigerfusel 4 ปีที่แล้ว +6

      Same, lol

    • @SteS
      @SteS 4 ปีที่แล้ว +67

      I've known it for ten years and his explanation greatly overcomplicates it and never really gives you an idea of how it works, it was more a history lesson. He does this a lot.

    • @idjles
      @idjles 4 ปีที่แล้ว +23

      @@SteS i came for the history - i know the rest.

    • @SteS
      @SteS 4 ปีที่แล้ว +10

      @@idjles I was replying to Colin as he thought this was an explanation when it wasn't. The closest he got was explaining *b** was matching b 0 times-infinity.

    • @servenToGo
      @servenToGo 4 ปีที่แล้ว +3

      Well, it is about the idea of regex

  • @MikeMMCC
    @MikeMMCC 4 ปีที่แล้ว +4

    I loved how this video tied together regex, automata and formal languages. Watching this I felt the formal definition for regex is really simple, but in real world use things can get messy. Would've liked some mention of the extended symbols like the Kleene plus, or use of subexpressions etc. Anyway great job.

    • @profdaveb6384
      @profdaveb6384 4 ปีที่แล้ว +2

      Hope to do more actual regex examples and more extended regex constructs in a follow-up video

  • @chaoslab
    @chaoslab 4 ปีที่แล้ว +2

    Quite happy to regularly express that I think Prof Brailsford is a nice chap!

  • @henry-js
    @henry-js 4 ปีที่แล้ว

    I've been waiting for this video for so long!!!

  • @marco201ful
    @marco201ful 4 ปีที่แล้ว

    People wanting a more practical video do not get that this channel is more focused on the mathematical aspects of computer science.

  • @stephenm3874
    @stephenm3874 4 ปีที่แล้ว +1

    Light bulb lit up @ 7:25. Used regex extensively in early 2000's writing real-time flat file parsers for AIG. Very challenging and enjoyable.

  • @lawrencedoliveiro9104
    @lawrencedoliveiro9104 4 ปีที่แล้ว +20

    7:04 Notice the only kind of recursion allowed is tail recursion-there is nothing after the B that occurs in the substitution for B itself. This can be implemented with simple iteration, without a stack. Otherwise, it would no longer be a Type 3 grammar.

    • @profdaveb6384
      @profdaveb6384 4 ปีที่แล้ว +9

      Just so - many thanks! I'm hoping to get on to this in a later video.

    • @noe6673
      @noe6673 2 ปีที่แล้ว

      Thank you for pointing this out. I was actually quite confused when he said Regular Expressions are equivalent to what I thought were Context Free Grammars. It's a detail, but a very important one! : )

  • @mubafaw
    @mubafaw 2 ปีที่แล้ว

    Sir. These explanations are crystal clear. You're a legend. Thank you so much :)

  • @devttyUSB0
    @devttyUSB0 4 ปีที่แล้ว +42

    That epic 'how to parse HTML with regexps'-question on stack overflow.

    • @NoNameAtAll2
      @NoNameAtAll2 4 ปีที่แล้ว +2

      Worst answers on all that website

    • @Furiends
      @Furiends 4 ปีที่แล้ว

      Thats an awful use case for regex.

    • @NoNameAtAll2
      @NoNameAtAll2 4 ปีที่แล้ว +1

      @@Furiends
      Why?
      StOve question was about finding closing tags in simplest html files
      Seems alright place for regex

    • @lawrencedoliveiro9104
      @lawrencedoliveiro9104 4 ปีที่แล้ว +6

      @@NoNameAtAll2 It can’t work, because opening/closing tags can be nested inside other opening/closing tags, to arbitrary levels. Regexes cannot handle that.

    • @lawrencedoliveiro9104
      @lawrencedoliveiro9104 4 ปีที่แล้ว

      @@NoNameAtAll2
      It can’t work, because opening/closing tags can be nested inside other
      opening/closing tags, to arbitrary levels. Regexes cannot handle that.

  • @samsepiol6069
    @samsepiol6069 4 ปีที่แล้ว

    I wish this video was up when I was studying towards my theory exam! Great video

  • @hatoline4731
    @hatoline4731 3 ปีที่แล้ว +1

    The idea that I can describe state machines with regex is somehow mindblowing.

  • @stephenarnsparger5433
    @stephenarnsparger5433 4 ปีที่แล้ว +3

    Awesome video! RegEx are super useful!

  • @programmingguy6081
    @programmingguy6081 4 ปีที่แล้ว +1

    I love regular expressions and I include them in all coding projects. It saves me from having to write hundreds of lines of code to capture the same data with a single regular expression.

  • @AcornElectron
    @AcornElectron 4 ปีที่แล้ว +7

    Go on professor B, nail it!

  • @torarinvik4920
    @torarinvik4920 8 หลายเดือนก่อน

    The guy who designed APL was deeply inspired by the obfuscation of Regex. If you can express a program in 4 lines, then squeeze it into one! The incredible amount of disk space a few extra lines requires can not be understated!

  • @thomaskember4628
    @thomaskember4628 4 ปีที่แล้ว

    I have often wondered how the name Kleene should be pronounced since I had only seen it written. This pronunciation is at least something I’ve learned from this video.

  • @nikodemus7900
    @nikodemus7900 4 ปีที่แล้ว +7

    I can't imagine a person sounding more British than this gentleman

  • @ianmcnaney6528
    @ianmcnaney6528 4 ปีที่แล้ว

    It still boggles my mind that people know what regular expressions are but don’t understand them, and are so proud of that fact that they’ll brag about it in TH-cam comments. If you thought you had one problem before someone used a regex, and then had two problems, well now you have three. You’re unemployable.

  • @Redentor92
    @Redentor92 4 ปีที่แล้ว +7

    Finally a subject I know! There's a lot to be said about Automata and Regex, and personally I think it's a great debt of this channel.
    One thing is about the Sakoda Sipser problem, which asks about the amount of states involved in the change from non deterministic to deterministic automata of basic operations of (regular) languages like union, intersection, complement and so.
    Other is the Černy problem, but this is more abstract.
    Automata are a very interesting subject hidden from the general public.

    • @Furiends
      @Furiends 4 ปีที่แล้ว

      Yeah they could have explained why the not so regular expressions are called regular and in fact knowing why its called that and understanding what regular means also tells you when you should and should not use regex.

    • @petrmalecik5661
      @petrmalecik5661 4 ปีที่แล้ว

      I agree also It would be great to do something about push-down automata, deterministic and non-deterministic and say something about relation between automatas and languages in chomsky's hierarchy

  • @doom87er
    @doom87er 4 ปีที่แล้ว

    Regex is my bane. Anything but the most simple expressions are like writing minified code outright!
    But they’re so good!

    • @Furiends
      @Furiends 4 ปีที่แล้ว +1

      ^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$
      I feel like part of the purpose of regex is lost in its short handedness. If the above looks like an alien language then you should either get really familiar with regex or use an expanded notation thats intuitive.
      Long English notation:
      ^ = From start
      \X = literal X
      ? = zero or one time
      ( = capture
      ) = end capture
      [X] = any of X
      X - Y = from X to Y
      {X} = X (range) times
      $ = to end
      From start, literal (, zero or one time, capture (, any of, from 0 to 9, 3 times, literal ), zero or one time, ) end capture, any of - or ., zero or one time,
      capture (, any of, from 0 to 9, 3 times, ) end capture, any of - or ., zero or one time,
      capture (, any of, from 0 to 9, 4 times, ) end capture, to end.

  • @danielroder830
    @danielroder830 4 ปีที่แล้ว +1

    The way people mostly use regular expression in programming or at least me is to use it as a language that describes search patterns. One example would be an expression that looks for a date in a specific format like 2020-01-09 in a given text. For example you would write an expression that states: A four digit number, starting with a 1 or 2 followed by a "-" then a number between 01 and 12 one more "-" and then a number between 01 and 31. But you have to write it in that language in a compact form. And then your expression will only find that specific format and you can extend it to other formats, like allowing 2020-1-9 as a first step or 2020/01/09 and so on and then formats from other countries, with names of months in different languages and totally go nuts until your regular expression fills multiple pages. But its and incredibly powerful tool and very hard to master. If you have to make one yourself for a more complicated case you will find yourself always tweaking it a bit to include more matches and have less mismatches.

  • @TheCookieMuncher96
    @TheCookieMuncher96 4 ปีที่แล้ว +4

    This is one of the topics we have covered in discrete maths on my computer science degree, could have done with this video a couple of weeks ago lmao, great video though

  • @suicidalbanananana
    @suicidalbanananana 4 ปีที่แล้ว +5

    Regular expressions are the sole reason we have websites like stackoverflow etc, so you can ask somebody else to write your regex :P

  • @safebox36
    @safebox36 4 ปีที่แล้ว

    I'm doing a lunch and learn on RegEx at work in the coming weeks...this is conveniently timed.

  • @NESMASTER14
    @NESMASTER14 4 ปีที่แล้ว

    On my lunch break and here I am watching a video on Regex...But it's by Professor Brailsford so it'll be interesting!

  • @photobigero8055
    @photobigero8055 ปีที่แล้ว

    Thank you, for the explanation , but especially for the historical perspective on things that are used.
    I was intrigued by describing graphic diagrams over the phone , that I admit you can't really do that if you only have a microphone and a speaker.
    The older I get the more I try to learn and use things with a histiorical background and not just use them mindlessly.

  • @swagatochatterjee7104
    @swagatochatterjee7104 4 ปีที่แล้ว +12

    I never knew it was Kleene(Cleany) I always called him Kleene(Clean). Yes I called the operator a "clean star".

  • @perladeluzes
    @perladeluzes 3 ปีที่แล้ว +1

    Professor Brailsford, I loveeee you!!!!

  • @DFX2KX
    @DFX2KX 4 ปีที่แล้ว +2

    Regex is incredibly useful for searching for and filtering text. It's a feature of pretty much every MUD client for that reason. One long sentence worth of Regex can do with 40 lines of code for a plugin can do. I've used them to the extent that my client's memory allocation for that system seems to have run out and they don't always fire now, lol.

    • @JNCressey
      @JNCressey 4 ปีที่แล้ว +1

      Useful as long as you don't let it become Zed's hair code.
      > These yellow pets
      > are called Zeds.
      > They have one hair
      > up on their heads.
      > Their hair grows fast...
      > so fast, they say,
      > they need a hair cut
      > every day.
      > - Dr. Seuss, One Fish Two Fish Red Fish Blue Fish
      When code is crunched into a single complex line when many simple lines would be clearer, I call that Zed's hair.
      Eg, if you're trying capturing all email addresses and phone numbers from a string. Split it into two separate searches, one for emails, one for numbers. Don't crunch it into one line by creating a single RegEx that matches both email addresses and phone numbers.
      Also, as you mentioned, regex can be inefficient. Crunching down line numbers when it sacrifices both clarity and efficiency is incredibly dumb.
      That said, simple regular expressions can be much clearer than the equivalent multiple lines of code.

    • @lydianlights
      @lydianlights 4 ปีที่แล้ว

      @@JNCressey Haha that's a great expression. I'm stealing it.

  • @harshp146
    @harshp146 4 ปีที่แล้ว

    I definitely learned thus while learning Haskell and prolog

  • @davidusnazarus1700
    @davidusnazarus1700 3 ปีที่แล้ว

    I could'nt comprehend a single word in this entire video, let alone understand regular expressions.

  • @raglandasir6885
    @raglandasir6885 ปีที่แล้ว

    There's the Compiler Dragon book in the background there which explains & uses the principles mentioned in this video

  • @BenMcKenn
    @BenMcKenn 4 ปีที่แล้ว

    I thought regex was just a syntax for searching for patterns in strings, but this video makes it seem like it's at the heart of programming and computer science... what am I missing?

  • @bextract0
    @bextract0 3 ปีที่แล้ว

    Thanks, this helped me understand a lot.

  • @jamesedwards6173
    @jamesedwards6173 ปีที่แล้ว

    Regular expressions are awesome, and in my own experience everyone who's spoken ill of them has implicitly announced their own lack of skill with them rather than establishing anything compellingly negative about them.

  • @FrankHarwald
    @FrankHarwald 4 ปีที่แล้ว

    For those interested on how to convert a non-deterministic regex (& hence equally a non-deterministic finite automata or non-deterministic left-regular grammar) into a deterministic ones, what you'll have to do is a closure of each (initial state + input state tuple) with each possible subsequent states which in the worst case will mathematically end up being a full power set construction of the initial non-deterministic automata which is why a given non-deterministic regex of length ~n may end up being as big as O(k*2^n) for a positive non-vanishing constant k in the worst case.
    So yes, succinctness as well as computational complexity of deterministic vs non-deterministic regexes differ exponentially!
    Another thing: all this video talks about is about regexes, finite automata & grammars without much output besides a singular boolean accepting or non accepting, that is, pure acceptors - while in practice regexes, finite automata & grammars often specify a corresponding output which is why they are called a transducers - & there's an entire zoo of different kinds of transducers - even for just regexes - different forms of output, multiple outputs... you name it!

  • @sevilnatas
    @sevilnatas 4 ปีที่แล้ว +1

    My question is, do you still have use for that green bar paper or are you using it as note pad paper because you have it left over from back in the day? As a kid, I had reams of green bar paper from my uncle that was a software engineer at Rockwell International, programming avionics in C and who knows what other language before that. That green bar paper was, literally, an unending joy to draw on as a kid.
    P.S. - I think RegEx is one of the most brilliant, yet least mastered languages in computer science.

  • @iphobley
    @iphobley 4 ปีที่แล้ว

    I recognise the ‘Dragon’ book in the background, but couldn’t find my copy. But I did find Bornat’s book on compiler writing.

  • @poddus
    @poddus 4 ปีที่แล้ว +2

    I have the same mouse. excellent taste professor brailsford

  • @BaronSamedi1959
    @BaronSamedi1959 4 ปีที่แล้ว +50

    And then Larry Wall put regexes into Perl and the world was saved!

    • @TheUglyGnome
      @TheUglyGnome 4 ปีที่แล้ว +11

      Perl is the best programming language ever.

    • @MSStuckwisch
      @MSStuckwisch 4 ปีที่แล้ว +4

      You should see what he did with it in Raku (née Perl 6). Allows for full parsing of something like JSON in a crazy small amount of code (and READABLE code at that). Take a look at the JSON::Tiny module's code. It's actually kinda beautiful

    • @vitalspark6288
      @vitalspark6288 4 ปีที่แล้ว +5

      Ken Thompson's `grep` was also a pretty important in-between step.

    • @DCFusor
      @DCFusor 4 ปีที่แล้ว +3

      Glad to see there are a few of us, still.

    • @xaviarnl
      @xaviarnl 4 ปีที่แล้ว +4

      And you appreciate Perl regexes even more when you see the weird regex implementations in other languages (I'm looking at you, Java).

  • @scaredyfish
    @scaredyfish 4 ปีที่แล้ว +2

    Was not expecting Noam Chomsky to come up in a Computerphile video!

    • @Roxor128
      @Roxor128 4 ปีที่แล้ว

      His hierarchy of languages has come up in a couple of past videos. I suspect that among this channel's audience, there's a decently large percentage who only know of him for that.

  • @ChunkyChest
    @ChunkyChest 4 ปีที่แล้ว +2

    Omg finally I asked for this 2 years ago

  • @hkmp5k
    @hkmp5k 4 ปีที่แล้ว

    Probably 2/3rds of my SQL or PL/SQL coding nowadays involves using the regexp_????? functions (regexp_replace and regexp_like in particular). They are so useful.

  • @carl-marvin
    @carl-marvin 4 ปีที่แล้ว

    I just tried to search a not completly readable serial code with google today, but regex seems not to be supported.
    So thank you for substituting my missed oportunity to revise/remember regular expressions!

  • @talideon
    @talideon 4 ปีที่แล้ว +1

    There are a fantastic collection of articles on regexes and implementing them efficiently on Russ Cox's website. They're a bit more in depth than this, but still pretty accessible, though they come with the assumption that you can read C.

    • @urbaniv
      @urbaniv 4 ปีที่แล้ว

      You need to understand C ... Yeah very accessible ;-)

    • @kallewirsch2263
      @kallewirsch2263 4 ปีที่แล้ว

      @Steven Tsakiris
      :-)
      C seperates the man from the quiche
      eaters.

  • @vectrex28
    @vectrex28 4 ปีที่แล้ว +3

    I used to struggle with regex until I took a formal language class :P

  • @zizzu549
    @zizzu549 4 ปีที่แล้ว +1

    Regular expressions are hard to read but on *unix you can't live without.

  • @1100000011110
    @1100000011110 4 ปีที่แล้ว +37

    A history lesson is fine and all, but I was hoping for a more practical video

    • @profdaveb6384
      @profdaveb6384 4 ปีที่แล้ว +17

      Hope to do a couple of case studies in detail in the next video

    • @adamdry9423
      @adamdry9423 4 ปีที่แล้ว +2

      @@profdaveb6384 it was a great video but must admit I expected a practical video too.

    • @arthurlevitsky3347
      @arthurlevitsky3347 2 ปีที่แล้ว +1

      If you think this isnt practical then you should see my professor explain this stuff

  • @AstAMoore
    @AstAMoore 4 ปีที่แล้ว

    Oh, boy. Writing my Z80 assembly syntax highlighter in RegEx for TextMate was a hoot and a half. And a nightmare. _Mostly_ a nightmare.

  • @tiagoandrade7945
    @tiagoandrade7945 ปีที่แล้ว

    amazing

  • @EgoShredder
    @EgoShredder 4 ปีที่แล้ว

    I know you should never put people on a pedestal, but I have to make an exception for Professor Brailsford!

  • @davidechiappetta
    @davidechiappetta 11 หลายเดือนก่อน +1

    at 6:42 it seems you place more emphasis on regular expressions (type 3) than the more powerful context free grammar (type 2), regular expressions are only limited in recognizing strings in text (pattern matching), and for lexers, or tools like grep (only 2000/3000 lines of code), awk etc..., but they can't do that much, I see you have the dragon book on your desk, and you value regular expressions more than cfg, the real geniuses were Chomsky and then McClure with his TMG, that Ken Tompson was so impressed that he used his principles also in the B language which later became C, same principles but greatly improved by Donald Knuth's inventing the most powerful LR parser followed by Stephen C. Johnson inventing Bison, I'm a parser fanatic, worked on a preprocessor C, contributed to add improvements in binutils and gcc source (bfd for the section elf and coff and dwarf for symbol debugger) and make version 4.x for variable expansions

  • @caver1
    @caver1 4 ปีที่แล้ว +2

    ab(a|b)b
    or for a single line
    ^ab(a|b)b$
    I bet there's other approaches that would work just fine too.

  • @brightshadow9480
    @brightshadow9480 4 ปีที่แล้ว +3

    If you can master Regex, you can add tremendous power to app for little effort. Seriously, if you need to do text parsing and validation, you are cheating yourself if you don't look into using them.

  • @Omnifarious0
    @Omnifarious0 4 ปีที่แล้ว +1

    Of course, the grammar way of doing things allows you to specify things that cannot be specified with a finite automata or a regular expression (namely nested constructs like parenthesis). :-) Just, the particular set of grammar statements he chose express exactly what's being expressed in the other two forms.

    • @MSStuckwisch
      @MSStuckwisch 4 ปีที่แล้ว

      I was about to comment the same thing. A formal grammar allows recursive structures, but most regex implementations don't have a way to do that. Thankfully there are some now coming out that do, and man does it make creating parsers so much easier.

    • @Omnifarious0
      @Omnifarious0 4 ปีที่แล้ว +1

      @@MSStuckwisch - But, it will make the regex handling slower. There's a reason for the very restrictive nature of regexes. It enables a very simple model for the computation. There is, in fact, a C++ library that will generate the C++ code to match a DFA at compile time for super-fast regex handling. Making it as fast as it is if you had to do full LALR or recursive descent parsing would be impossible I think.
      Regexes definitely have their place. And parsers have their place too. That's one of the reasons that a lot of compilers are written with a 'lexing' phase that uses a regex to tokenize the input, then uses a parser to parse the code.

    • @MSStuckwisch
      @MSStuckwisch 4 ปีที่แล้ว

      @@Omnifarious0 True, though the speed will really depend on the nature of the grammar. A fully unambiguous CFG (to eliminate any type of backingtracking) should be able to optimize to virtually the same speed. Of course, virtually no real-world grammars are written like that, and most of the time we'd want to do more than just a yay/nay match.
      To me it comes down to ease of writing / maintaining. Most of the time the slower parsing using regex grammars is worth it, but if something starts to become truly speed critical, then I'd look to coding it manually for speed. (An interesting speed test for optimizations can be the JSON::Tiny (regex) and JSON::Fast (imperative) modules in Raku. Overtime the Tiny has been catching up a little bit, although the Fast still runs in about 25% Tiny's time

  • @misode
    @misode 4 ปีที่แล้ว +10

    While this explains the history and background of Regex very well, it's not a video for regex beginners who want to use them.

  • @geraldbustos
    @geraldbustos 4 ปีที่แล้ว +7

    Please allow the automatic captions

    • @profdaveb6384
      @profdaveb6384 4 ปีที่แล้ว +1

      subs have been sent to Sean so shouldn't be too long now ...

  • @sp10sn
    @sp10sn 4 ปีที่แล้ว +2

    What the heck is determinate (deterministic) vs non? I comes up too often and in too many ways for me to infer and I don't understand the somewhat philosophical references that I find.

  • @zacharieetienne5784
    @zacharieetienne5784 4 ปีที่แล้ว +1

    yes i understood some of those words

  • @AntjeWeh
    @AntjeWeh 4 ปีที่แล้ว

    Brilliant and in a very effective way helpful, especially at work with people forcing one to demonstrate coolness

  • @lawrencedoliveiro9104
    @lawrencedoliveiro9104 4 ปีที่แล้ว

    10:46 Just noticed, that’s not a widescreen monitor. Is that a 4:3 ratio? I have both an 8:5 and 5:4 monitor within arm’s reach, and no matter how I lean to the side, I can’t make either of them look quite like that shape.

  • @dandan7884
    @dandan7884 4 ปีที่แล้ว +6

    ill never get what people dont get about regular expressions

    • @josephgaviota
      @josephgaviota 2 ปีที่แล้ว +1

      Pattern recognition is the essence of intelligence.
      Some people get regex, some don't.
      Therein lies the rub.

  • @leonhardeuler9839
    @leonhardeuler9839 4 ปีที่แล้ว +11

    1:14 it’s not a good idea to write zero inside a circle, professor Brailsford.

    • @Roxor128
      @Roxor128 4 ปีที่แล้ว

      Only if you forget to put a stroke through it.

  • @phsopher
    @phsopher 2 ปีที่แล้ว

    If this was the no-previous-experience-needed version I'd hate to see the one requiring it.

  • @sleblanc
    @sleblanc 4 ปีที่แล้ว

    In languages that permit it, parser combinators allow more verbose, but still highly readable regular expression syntax:
    myParser = do
    char 'a'
    many (char 'b')
    char 'c'

    • @sleblanc
      @sleblanc 4 ปีที่แล้ว

      An instance of (char x) is of type (Parser Char). When sequenced, the whole function effectively becomes type (Parser [Char]) which is equal to (Parser String).

  • @swagatochatterjee7104
    @swagatochatterjee7104 4 ปีที่แล้ว

    Few more videos and we will have Prof. Brailsford's Computerphile course on compilers.

  • @Simbosan
    @Simbosan 4 ปีที่แล้ว +1

    Should have shown the official regexp for an email address. It's enormous

  • @doougle
    @doougle 4 ปีที่แล้ว

    I've googled my way through using some regular expressions to scrape websites for data using perl and PHP. After seeing this I think I understand them both more and less than I did before.

    • @EduardoGonzalez-bm1mk
      @EduardoGonzalez-bm1mk 4 ปีที่แล้ว

      Doug Sarbon they re just a finite state machine , you input a string and the regex expresión ; them the regex compiler create an algorithm that tela if your string matches the expresión or not.

  • @DrDress
    @DrDress 4 ปีที่แล้ว

    I'm watching a series on computation theory on another channel and funnily enough he just covered some of this stuff. But I'm not a computer scientist and I still need someone tell me how this actually relates to real programming or machine design. Unless I truly wanted to build a machine that accept or reject strings of characters. But it does feel like that when I code.

  • @curiousmonkey7765
    @curiousmonkey7765 3 ปีที่แล้ว

    pattern matched:
    print('wow, i have the same exact shirt as professor's.)

  • @Dong_Harvey
    @Dong_Harvey 4 ปีที่แล้ว

    That shirt is awesome

  • @bretdavis8097
    @bretdavis8097 4 ปีที่แล้ว

    PEG is a better solution for complex parsing, but regex is fantastic for simpler cases. All about using the right tool for the job

  • @nathanrcoe1132
    @nathanrcoe1132 4 ปีที่แล้ว +1

    regex are entities that transform into a write-only language :P

  • @JoshBlasy
    @JoshBlasy 4 ปีที่แล้ว

    Having used regexes plenty of times for web scraping applications, I have no clue what this video is about. for the abab vs abbb why would you not just do r"ab[ab]b" or something?

    • @SnoopJeDi
      @SnoopJeDi 4 ปีที่แล้ว

      That expression corresponds to the deterministic automaton that he shows at the end of that example. When matching [ab], you are in the state that has exit paths for either letter.

  • @artit91
    @artit91 4 ปีที่แล้ว

    Regexp is the shortest way(often the worst) to define state transitions.