Thanks for helping to de-mystify some of the confusion around regex - very insightful video.
Funny enough, I did not use regular expressions at all during my solutions. That tells me that I need to keep my mind more open to another paradigm. Always love these videos. It shows me how much more I need to learn. I also really appreciated your description on word/non-word boundaries. Time to start learning a few more regex tricks. See you tomorrow on next week's problem!
Thanks so much for the lovely comment - really encouraging to read! Hope you like the latest video!
loads of good info! thank you. makes me want to learn AWK
Yes! Definitely my reaction too :)
The use of a state machine for this problem was so clever! I still struggle to identify opportunities to implement state machines, other than the popular "automatic change machine" problem.
Yeah, I really enjoyed seeing that. Interesting to consider that that's how it's probably implemented under the hood too!
Wow, thanks for this video! For me, regex is a hard topic, but you have helped me learn and understand it more!
Awesome. That's great to hear. So glad it was useful :)
The reason your regex does not match the Unicode mark is that you are only matching [a-zA-Z'] where it is not on a word boundary.
Note that the boundary is tested first, then the letter itself.
Therefore, for "\u{41 300}la \u{E0}mour":
the first character, 'A', does not match because it is on a boundary,
the second character, the combining mark, does not match because it is not in [a-zA-Z'],
the first letter of the second word, a Unicode 'a' with an accent, does not match because it is on a boundary.
Conclusion
It is harder to understand because it works by negation: you match the things you do not want to keep in order to remove them.
Also, a mark in the middle of a word would not be removed either.
Having thought about the edge cases, a simple solution came to me.
/(?
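For anyone who wants to try the boundary behaviour described in this comment, here is a minimal Python sketch. The exact pattern from the video is not quoted anywhere in this thread, so `\B[a-zA-Z']` is only an assumption based on the description ("matching [a-zA-Z'] which is not on a word boundary"), and `strip_non_initial` is a made-up helper name. It is not a completion of the truncated pattern above.

```python
import re

# Assumed pattern (not quoted in the thread): an ASCII letter or apostrophe
# whose preceding position is NOT a word boundary (\B is the negation of \b).
NON_INITIAL = re.compile(r"\B[a-zA-Z']")


def strip_non_initial(text: str) -> str:
    """Remove every ASCII letter/apostrophe that is not at a word boundary."""
    return NON_INITIAL.sub("", text)


# 'A' + U+0300 (combining grave accent) + "la", then U+00E0 (precomposed
# 'a' with grave) + "mour", mirroring the sample string in the comment.
sample = "A\u0300la \u00e0mour"
result = strip_non_initial(sample)

# Print the surviving code points so the invisible combining mark shows up.
print([f"U+{ord(ch):04X}" for ch in result])
# → ['U+0041', 'U+0300', 'U+006C', 'U+0020', 'U+00E0']
```

The 'A', the combining mark and the accented 'a' all survive, as the comment predicts. One engine-specific detail: Python's `re` does not count the combining mark as a word character, so the 'l' right after it also sits on a boundary and is kept. Engines whose `\w` includes combining marks may treat that 'l' differently.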
@kristianwhittick Lovely! Thanks for the explanation.
Next time, smaller faces, bigger code window
Yeah, it's always a balance to try and strike. Smaller faces would make the code window wider, but it wouldn't actually mean we fit more code on the screen unless we increase font-size which also shows fewer LOCs so less context. But thank you for the feedback - always helpful to know what people think.