thank u very much , i learnt so much from this for my project
It blows my mind how much computer science has evolved
Synthwave '84? You're based. Love seeing this theme out in the wild.
im here: 1:51
k-nearest neighbors
it has to be this
Omg could this be a channel about algorithms? 🤩
It took 20 years to solve the Edit Distance problem for the first time, but they want us to solve it in 1 hour of interview.
honestly this does open some interesting philosophical ideas about how genius solutions and algorithms come to be. the best ideas are those that, even though they took a while to come up with, are comparatively easy to teach after they've been discovered.
you’d better consider yourself lucky if you get edit distance asked in an interview. It’s popular, intuitive and fairly moderate in complexity. I mean, it’s solving a real world problem and I’m all for it. There can be many DP problems that are just bad for interviews.
Theres a difference between inventing and solving something already famous.
1:00 "sill" is not a misspelled word.
This video would've been super helpful 3 years ago in college. A professor had us make a spellchecker. It did not go well
serious question:
when you gotta make a program or a piece of code, whatever.
how "original" does it need to be before it is acceptable to you?
i mean, how many lines of code can you copy without having a guilty conscience? (not literally, but you get it, i think)
also keeping in mind that you don't know that specific algorithm but you have to do it anyway
@herrerapedro.mp4 It really does depend on what you are trying to achieve
If it's for learning purposes, why would you use someone else's solution to a problem, why not make it yourself? that implies that by copying you mean literally copying the code line by line, but if by copying you mean that someone just has the idea of the solution to it
you solve x by doing z thing and y thing
you still have to code z and y thing even though you know in what way you should. i think these are what you call patents
@herrerapedro.mp4 I would say that if you are programming something and you copy code because you know it but don’t want to type it all out then it’s fine. Alternatively, you could also copy code to try and pick it apart and learn it better. There are no rules though, so do what you think is best for your situation
it wouldn't help you at all... you can't do basic research
@krolmuch bro what's with the attack? I'm a visual learner. I struggle to read. Video education is just easier for me to understand. It was mostly a joke anyway.
Jaro & Winkler sitting on the corner:
Two points: it’s actually the Damerau-Levenshtein algorithm; and the implementation given is O(n^2), which is unnecessary. You can use a moving window into the grid that is a diagonal stripe wide enough to hold the maximum acceptable edit distance. That makes the algorithm O(k·n) for a maximum distance k, which is effectively O(n) when k is a small constant.
I meant that the commonly used algorithm is Damerau-Levenshtein.
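For anyone wondering what the Damerau-Levenshtein variant changes: a minimal Python sketch, assuming the restricted form usually called "optimal string alignment", where swapping two adjacent letters counts as one edit. The function name `osa_distance` is made up for illustration and is not from the video.

```python
def osa_distance(a: str, b: str) -> int:
    """Levenshtein distance extended with adjacent transpositions
    (the restricted 'optimal string alignment' variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Extra case: the last two characters are swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


print(osa_distance("teh", "the"))  # plain Levenshtein would need 2 edits here
```

Note this restricted variant never edits the same substring twice, so it can differ from the unrestricted Damerau-Levenshtein distance on some inputs.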
u r amazing bro, u directly helping me do my PhD
6:36 The Wagner-Fischer algorithm looks a lot like the Needleman-Wunsch algorithm (it is also a dynamic programming algorithm, used for aligning nucleotide, protein and other genetic sequences). It’s possibly the same algorithm but repurposed for alignment of genetic sequences.
"But the algorithm could be improved!"
...
Creates an N^2 algorithm.
While edit distance is still common, newer spellcheckers involve other techniques and have moved on to using neural networks for a lot of the heavy lifting.
You can create a data structure, sorting words by letter
great video!
1:00 sill is a word..
Amazing video extremely interesting, simple and high quality
1:10 sill is a totally valid word! E.g. the window sill
Huh, good point. I did not even think about that 😂
he forgot to add in his word list
[till] is also a valid word...!
had to make a levenshtein distance calculator (in python) for college about 2 months ago, had no idea it would be helpful for spellcheckers though! super interesting stuff
Edit distance is a famous problem asked in software engineering interviews!
Werk werk! Angelicaaa! Werk werk! Eliizaa! AND PEGGY!
Excellent explanation. Modern spell checkers also use other techniques. One is transposition because that is one of teh most common spelling mistakes. Another is nearness of letters on the keyboard because people can mistype letters that are clise to each other.
excellent showcasing of tranpsosition and nearnesd
it seems like the modern ones bridged the difference between actual spelling errors and what we might call typos
I see whay yuo did there
@arandomguy9669 hwat a mitzure
Great video! Thanks for the excellent explanation. I found it really friendly and easy to understand.
Your explanation helped me a lot! But I think I spotted a small mismatch in your explanation: I think that when m[0][j] == m[i][0] we should copy the value in m[i-1][j-1] instead of selecting the minimum value of the three neighboring positions. In some tests your method works, but sometimes it fails. Sorry for my English...
10:02 Shouldn't this be the diagonal, and not the minimum?
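On the diagonal-versus-minimum question above: a minimal Python sketch of the Wagner-Fischer table, assuming the usual layout where row i covers the first i letters of the source and column j the first j letters of the target. The function name is illustrative and this is not claimed to be the video's code. When the two letters match, the cell copies the diagonal; otherwise it takes 1 plus the minimum of the three neighbors.

```python
def wagner_fischer(source: str, target: str) -> int:
    m, n = len(source), len(target)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        table[i][0] = i  # i deletions turn a prefix into the empty string
    for j in range(n + 1):
        table[0][j] = j  # j insertions build a prefix from the empty string

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if source[i - 1] == target[j - 1]:
                # Letters match: no edit needed, copy the diagonal.
                table[i][j] = table[i - 1][j - 1]
            else:
                # Letters differ: one edit on top of the cheapest neighbor.
                table[i][j] = 1 + min(
                    table[i - 1][j],      # deletion
                    table[i][j - 1],      # insertion
                    table[i - 1][j - 1],  # substitution
                )
    return table[m][n]


print(wagner_fischer("boats", "float"))  # 3
```

Taking the plain minimum of the three neighbors without adding 1 on a match would be wrong, because the cells above and to the left always represent one more insertion or deletion. The equivalent one-liner is min(up + 1, left + 1, diagonal + cost) with cost 0 on a match.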
00:01 Spellcheckers rely on a sophisticated algorithm for accuracy
01:33 The Lenin distance algorithm was crucial for enhancing spell checkers.
03:10 The algorithm follows guard clauses and recursive comparisons.
04:54 Lenin distance algorithm is not practical due to its recursive nature
06:37 Wagner-Fischer algorithm uses dynamic programming for efficient spell checking.
08:23 Explanation of operations involved in transforming strings.
10:02 Wagner-Fischer approach calculates edit distance efficiently
11:42 Spell checkers use edit distance to suggest correct words.
Crafted by Merlin AI.
not only is he a communist, he's also a computer scientist!
crafted by a meatbag
Oh no, communists are back to destroy computer science with the Lenin algorithm 😂
interesting, my quip about the Lenin distance is deleted? Did I offend a communist?
@MikeTheSapien3 I guess YouTube took it personally lol
now this is awesome
Is this the windows spellchecker? "Hat?! Hmpff, I think you mean 'can'"
Spell checker, autocorrect, suggestions. . The bane of my existence
isnt this the diff algorithm too?
This algorithm actually paved the way for a lot of modern bioinformatics algorithms used to align two DNA sequences together, some of the most famous being Smith-Waterman and Needleman-Wunsch! It’s so cool to see the overlap!
do you know where i could find more about bioinformatics algorithms?
@ohaswintry look up the terms that @ScienceSuds commented in Google Scholar, as well as terms such as "Sequence Alignment". There's really a ton of work in this field!
@ohaswin search for the FASTA and BLAST methods
Oh yeah don’t biologists check for mutations and differences in a genome by pasting it into word and spellchecking it when the original is in the spellcheckers dictionary?
It is so cool to see the overlap! pun intended
That mf who auto correct "is" to "us" and make a nontypo..a typo
Now I understand why cat, vat and pat are treated as equal by spell checkers. They don't know the key layout and don't understand how unlikely it is that I would accidentally press P when I want C.
Wow such a great explanation, I am definitely trying out writing a spellchecker
can we have a keyboard/setup tour
omg I had to solve this exact problem to implement a feature similar to git diff
I presume this Wagner-Fischer algorithm is also what is behind the edit distance (file diffing) in git
how is the operation decided in the Wagner-Fischer algo
Here’s my spell checker
If word == ‘fucking’
Return ‘ducking’
Uh, that's just perfectly good Dutch, you know
It was extremely wonderful. Thanks for your great explanations 😍
I'd like to know why the suggestion section will have the word I meant to type but the auto correct picks the wrong one to use.
So frustrating!
The content is amazing. The sound volume is too low, hard to hear.
This is very interesting! Great video.
I'd love a deep dive series into such algorithms, slowly building up to NLPs
This is my thought. Correct me if I am wrong. So every word we type is looked up in a dataset to see whether it is spelled correctly; if not, then an insertion, deletion or substitution is performed, and after every such operation the word is checked against the dataset again???
And in the example BOATS to FLOAT, since boats is a valid word, why would it call the spell check algo??
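One common way to picture the flow asked about above, as a rough Python sketch: look the word up first, and only if it is missing, rank dictionary words by edit distance. The tiny `WORDS` set, the `suggest` function and its `max_distance` / `limit` parameters are all made up for illustration; real spellcheckers use much larger word lists and smarter candidate selection.

```python
def edit_distance(a: str, b: str) -> int:
    # Row-by-row Wagner-Fischer, keeping only the previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match / substitution
        prev = cur
    return prev[-1]


# Hypothetical toy dictionary.
WORDS = {"work", "word", "world", "boats", "float", "sill", "still"}


def suggest(word, dictionary=WORDS, max_distance=2, limit=3):
    if word in dictionary:
        return []  # spelled correctly: no distance computation needed
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance][:limit]


print(suggest("werk"))   # ['work', 'word']
print(suggest("boats"))  # []
```

So a valid word like "boats" would normally never reach the distance step; pairs like BOATS and FLOAT are just convenient examples for showing how the distance itself is computed.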
I always thought spellcheckers would incorporate the keyboard layout into their suggestions, as in correcting "worls" to "world", because s is one key away from d
I'm sure some do.
Same! I’m always like “why can’t you tell that I just missed one letter!!!”
Keep in mind many different keyboard layouts exist. You could also have a case where a written file is OCR'd in which case that wouldn't be relevant.
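A rough Python sketch of the keyboard idea in this thread, assuming a QWERTY layout and a deliberately simplified notion of "nearby" (only the keys directly left and right on the same row). The half-price cost of 0.5 and all names here are assumptions for illustration, not how any particular spellchecker does it. The layout is a parameter because, as noted above, other layouts exist.

```python
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbors(rows):
    """Map each key to the keys directly left and right of it on its row."""
    neighbors = {}
    for row in rows:
        for i, ch in enumerate(row):
            adjacent = set()
            if i > 0:
                adjacent.add(row[i - 1])
            if i < len(row) - 1:
                adjacent.add(row[i + 1])
            neighbors[ch] = adjacent
    return neighbors


NEIGHBORS = build_neighbors(QWERTY_ROWS)


def substitution_cost(x, y):
    if x == y:
        return 0.0
    if y in NEIGHBORS.get(x, ()):
        return 0.5  # assumed discount for a slip onto a neighboring key
    return 1.0


def keyboard_distance(a, b):
    # Same row-by-row table as plain edit distance, with weighted substitutions.
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [float(i)]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1.0,
                           cur[j - 1] + 1.0,
                           prev[j - 1] + substitution_cost(ca, cb)))
        prev = cur
    return prev[-1]


print(keyboard_distance("worls", "world"))  # 0.5, s sits next to d
print(keyboard_distance("pat", "cat"))      # 1.0, p is far from c
```

With weights like these, "vat" ranks closer to "cat" than "pat" does, which plain edit distance cannot express.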
This and parts of speech.
Maybe track the most common errors based on vocab and document length, like the YouTube algorithm recommending videos based on age, gender, etc.
Of course it does now. There is a video from Enrico Tartarotti released recently "The LIES That Make Your Tech ACTUALLY Work" where you can learn more about your idea and how it is implemented!
I needed more explanation on how Google recommends only one or two words instead of a list of words
Let us know about Google's algorithm
my head hurts, crazy sudoku
Algorithms like Levenshtein distance and Jaro Winkler are amazingly important and have impacted the development of a lot of sophisticated systems.
this was not what I had to email you about, but it is key info
when typing the word "good," I have occasionally typed the word "goo."
On subsequent review, I have found that allowing the substitution to go through, can be a great deal of fun.
As in, "Hey, Bob, last night, we went out and had some goo Chinese."
Just sharing
Because it is important that people know.
Not clear: is the asymptotic memory consumption O(n) or O(n²) ?
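On the memory question: storing the whole table takes O(m·n) cells, but if only the final distance is needed, each row depends only on the row above it, so two rows are enough and memory drops to O(min(m, n)). A minimal Python sketch under that assumption; the function name is illustrative and this is not claimed to be the video's exact code.

```python
def levenshtein_two_rows(a: str, b: str) -> int:
    # Keep the shorter string along the row so each row is as small as possible.
    if len(b) > len(a):
        a, b = b, a
    previous_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current_row = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current_row.append(min(
                previous_row[j] + 1,         # deletion
                current_row[j - 1] + 1,      # insertion
                previous_row[j - 1] + cost,  # match or substitution
            ))
        previous_row = current_row  # the older row is no longer needed
    return previous_row[-1]


print(levenshtein_two_rows("kitten", "sitting"))  # 3
```

The trade-off is that the list of edits can no longer be recovered, since that needs the full table. Running time stays O(m·n) either way.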
i was hoping to learn about the modern algorithms, but well, now i know the history behind it. hope to see a part 2
The thumbnail is just in Liverpool accent 😂
Whenever I see matrices, I think GPU. GPU accelerated spell checker?
People will slap AI being a tool like that for investment
whats the font? looks good
overall nice video
Thank you for this! The visuals are great!
Best way to teach Dynamic Programming is just simple hashmap memoization of the recursive function, and only teaching the 2D matrix after solving multiple DP problems with memoization.
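To make the memoization suggestion concrete, here is a minimal Python sketch: the plain recursive definition, plus a dictionary (the "hashmap") keyed by the two positions so each subproblem is solved once. Names are illustrative.

```python
def lev_memo(a: str, b: str) -> int:
    memo = {}  # (i, j) -> distance between a[i:] and b[j:]

    def go(i: int, j: int) -> int:
        if i == len(a):
            return len(b) - j  # insert whatever is left of b
        if j == len(b):
            return len(a) - i  # delete whatever is left of a
        if (i, j) in memo:
            return memo[(i, j)]
        if a[i] == b[j]:
            result = go(i + 1, j + 1)
        else:
            result = 1 + min(
                go(i + 1, j),      # deletion
                go(i, j + 1),      # insertion
                go(i + 1, j + 1),  # substitution
            )
        memo[(i, j)] = result
        return result

    return go(0, 0)


print(lev_memo("kitten", "sitting"))  # 3
```

There are only (len(a) + 1) × (len(b) + 1) distinct (i, j) pairs, which is exactly the table the 2D version fills in. Very long strings would hit Python's recursion limit, which is one practical reason to move on to the table afterwards.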
super informative .. thanksss
Can anyone tell which vs code theme is he using
Are there algorithms that take 10-finger typing into consideration? It most likely isn't a swap between the positions of r and v, because they are typed by the same hand, but it's more likely that a swap between r and o happened.
Dew knot trussed yore spell chequer two fined awl miss steaks.
But hat and can are perfectly cromulent words already.
At 9:25 why does substituting make us count diagonally? Is it because a substitution is a "sum" of a deletion and an insertion?
that was the best explanation of Dynamic programming I've ever heard
Would have loved YouTube 30 yrs ago. In my day (yeah, I’m old) I had a class in which the last assignment was an assembly program for the Intel 8086 that implemented a spellchecker. Prof said it would take 40 hrs if we knew what we were doing. No mention of Levenshtein, Gorin, or any known algos. I took a 0, as I was behind in other things.
That's insane
one of the students names was probably fischer or wagner
Dynamic Programming is one of the coolest design techniques in computer science. First time I learned it I was amazed. Kudos to Richard Bellman, who first developed the idea for it
Loved all of it! Smooth editing and voiceover, and a nicely paced explanation! Great work ✌
You should absolutely make more videos like this! You're extremely good at explaining things and this video was genuinely so interesting. Well done :)
Just amazing explanation
Fun fact: werk is work in dutch! So it isn't a typo! 😂
What is an interesting addition to the algorithm is actually providing a list of the changes between the two, like for a typewriter.
I read this like 7 times and I can’t tell what you are trying to say
@Maker0824 not sure what they meant w the typewriter mention but i read this to mean implementing something that provides a diff-like output (it being character-by-character instead of by line tho)
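If the diff-like reading is right, one way to get a list of changes is to fill the full table and then walk back from the bottom-right corner, recording which neighbor each cell came from. A Python sketch under that assumption; the operation names and tuple shapes are invented for illustration.

```python
def edit_script(a: str, b: str):
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)

    # Walk back from the bottom-right cell to the top-left cell.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and d[i][j] == d[i - 1][j - 1]:
            ops.append(("keep", a[i - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            ops.append(("substitute", a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", a[i - 1]))
            i -= 1
        else:
            ops.append(("insert", b[j - 1]))
            j -= 1
    ops.reverse()
    return ops


print(edit_script("cat", "cart"))
# [('keep', 'c'), ('keep', 'a'), ('insert', 'r'), ('keep', 't')]
```

There can be several equally short edit scripts; this walk just picks one, preferring keep, then substitute, then delete, then insert.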
9:30 why so? is this equivalent to the square bracket in the levenshtein formula? if yes, which box stands for which formula in the square bracket?
or perhaps this is left as an exercise for the reader lmao. im a bit lazy ill look over it one more time😅
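For what it's worth, here is how the three boxes are usually matched to the bracket in the formula, assuming the rows follow the source word a and the columns follow the target word b (the video's table may be drawn the other way round, in which case "above" and "left" swap roles). Writing D for the table:

```latex
\[
D_{i,j} =
\begin{cases}
j & \text{if } i = 0 \\
i & \text{if } j = 0 \\
D_{i-1,\,j-1} & \text{if } a_i = b_j \\
1 + \min
\begin{cases}
D_{i-1,\,j}   & \text{(box above: deletion)} \\
D_{i,\,j-1}   & \text{(box to the left: insertion)} \\
D_{i-1,\,j-1} & \text{(diagonal box: substitution)}
\end{cases}
& \text{otherwise}
\end{cases}
\]
```

The diagonal box is the one where both words have already been shortened by one letter, which is why a substitution (or a free match) steps diagonally.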
Thank
the autoguesser in my phone changes "for" to Fortnite. I think i only said that word once ever. remarkably annoying when these systems fail us
Nice video 👍👍❤
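I wrote this version of the Levenshtein formula in C just now. It's recursive, however I optimized the two length checks so they only happen once, and then we just decrement the length value as we increment the string pointer.
/* Levenshtein distance formula (plain recursion, so exponential time) */
#include <string.h> /* strlen */
#define min(a,b) (((a) < (b)) ? (a) : (b))
static int lev_rec(const char *s1, int sl1, const char *s2, int sl2);
int lev(const char *s1, const char *s2)
{
    /* Measure each string once; the recursion tracks the remaining lengths. */
    int sl1 = (int)strlen(s1);
    int sl2 = (int)strlen(s2);
    return lev_rec(s1, sl1, s2, sl2);
}
static int lev_rec(const char *s1, int sl1, const char *s2, int sl2)
{
    if (sl2 == 0)
        return sl1; /* delete everything left in s1 */
    if (sl1 == 0)
        return sl2; /* insert everything left in s2 */
    if (s1[0] == s2[0]) /* first letters match: no edit needed */
        return lev_rec(s1 + 1, sl1 - 1, s2 + 1, sl2 - 1);
    int a = lev_rec(s1 + 1, sl1 - 1, s2, sl2);         /* deletion */
    int b = lev_rec(s1, sl1, s2 + 1, sl2 - 1);         /* insertion */
    int c = lev_rec(s1 + 1, sl1 - 1, s2 + 1, sl2 - 1); /* substitution */
    return 1 + min(min(a, b), c);
}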
Algorithms with historical context videos are the best.
Isn’t sill an actual word though?
As in window sill
This is awesome, I learned a lot, thank you! I heard that some spell checkers use tries (prefix trees) for better auto-completion. I'd love to see a video on those as well, I adore your way of explaining!!
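Since tries came up: a minimal Python sketch of a prefix tree used for auto-completion, assuming a plain dictionary of children per node. The class and method names are made up for illustration, and real implementations usually also rank the completions, which is left out here.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # True if a word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix: str):
        """Return every stored word that starts with prefix, sorted."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored word has this prefix
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


trie = Trie(["spell", "spelling", "speed", "sill"])
print(trie.complete("spe"))  # ['speed', 'spell', 'spelling']
```

Finding the prefix node costs one step per typed letter regardless of dictionary size, which is the appeal for completion as you type.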
levenshtein distances is basically a pathfinding algorithm.
what??? its not even remotely close to that
@Zaary i agree
Yes, its working out the unknown path (there could be more than one) from one word to another, thats true.
For anyone confused, he's saying that the set of all strings together with the Levenshtein distance forms a "metric space". That basically means that if you imagine all strings as points in space, the Levenshtein distance behaves much like distance in real space (it is symmetric and obeys the triangle inequality).
It sounds kind of meaningless at first, but if you use it that way it actually unlocks certain properties of strings that enable some other clever algorithms for searching text.
pathfinding makes it possible to backtrack, this does not. it has only 1 thing in common with pathfinding, finding the shortest path. this algorithm however works completely differently from pathfinding and has nothing else in common with it.
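One example of the "clever algorithms" the metric-space comment alludes to is the BK-tree, which uses the triangle inequality to skip most of the dictionary during a fuzzy search. A compact Python sketch, assuming plain Levenshtein distance as the metric; the tuple-based node layout and all names are just for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class BKTree:
    def __init__(self, distance=edit_distance):
        self.distance = distance
        self.root = None  # each node is (word, {distance_to_child: child_node})

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query: str, max_distance: int):
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_distance:
                results.append((d, word))
            # Triangle inequality: a match can only sit under an edge whose
            # label lies within max_distance of d, so other subtrees are skipped.
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]:
    tree.add(w)
print(tree.search("bool", 1))  # [(1, 'boo'), (1, 'book'), (1, 'boon')]
```

The pruning step is only valid because the distance is a true metric; with an arbitrary similarity score it could silently drop matches.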
Awesome 😍
What theme do you use for VSCode? It looks so good
Quite an interesting history.
Why can't universities explain things like this in short concise ways? So many people struggle with algos bc they still teach it like it's the 70s
Hey! Your mic volume feels too low, just came from another video and had to raise the volume from 5 to like 8 haha. Other than that, cool video!
Fischer and Wagner did not invent the bicycle, Levenshtein invented an algorithm, it is up to you how you would implement it
English: Spellcheckers are great
Chinese: 😳
Werk still means work in dutch. So task failed successfully?
You sound exactly like a dude I used to work with
Which tool are you using for the slides and transitions?
This was awesome!! I think I just found an awesome new channel :D
Spellcheckers on my phone have gotten worse in the past decade. They keep replacing my CORRECTLY SPELLED words with "words that make more sense". No! It's spelled right don't fuck with it!
That sounds like an iOS issue.
@cmyk8964 android
How do you do these animations? Are you using some kind of library like Manim?
Nice video, very pedagogical. If you ever get tempted to make a follow-up, there is an optimization where instead of computing the entire matrix you only compute the distance below a threshold d; this corresponds to computing a wide diagonal in the middle of the dynamic programming matrix.
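A Python sketch of the threshold idea described above, assuming a caller-supplied `max_distance` and returning None when the true distance exceeds it. Only cells within `max_distance` of the main diagonal are computed. For readability this version still allocates full-length rows, so it illustrates the band rather than being a tuned implementation; the names are hypothetical.

```python
def bounded_distance(a: str, b: str, max_distance: int):
    """Edit distance if it is <= max_distance, otherwise None."""
    if abs(len(a) - len(b)) > max_distance:
        return None  # the length difference alone already needs too many edits
    too_far = max_distance + 1  # stands in for "more than the threshold"
    prev = [j if j <= max_distance else too_far for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        lo = max(1, i - max_distance)       # left edge of the diagonal band
        hi = min(len(b), i + max_distance)  # right edge of the diagonal band
        cur = [too_far] * (len(b) + 1)
        if i <= max_distance:
            cur[0] = i
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost,  # match or substitution
                         too_far)             # cap values outside the threshold
        if min(cur) > max_distance:
            return None  # every path already exceeds the threshold
        prev = cur
    return prev[len(b)] if prev[len(b)] <= max_distance else None


print(bounded_distance("kitten", "sitting", 3))  # 3
print(bounded_distance("kitten", "sitting", 2))  # None
```

A cell (i, j) can never hold a value smaller than |i - j|, which is why everything outside the band can safely be treated as "too far". For spellchecking, where only candidates within one or two edits matter, this rejects most dictionary words early.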
11:29
In your wagner_fischer implementation, why are you incrementing change? (line 17) If "previous_row[j-1]" was guaranteed to always be the smallest value, and none others shared that value, maybe it would work. Why not choose the minimum first and then add 1 to it after checking if the two letters are not the same? Or am I misunderstanding something?
The matrix is similar to an action table used to determine the symmetry of a group with respect to an operation. Basically the math of DP, which if you think about it is fractal
Werk is afrikaans for work
I was like what's wrong here, and then i realized you shouldn't use 2 languages interchangeably
😂😂😂