Calling in sick on Monday to watch this 🙏
I was wanting a sort of crash course on Rust and watching you work actually taught me so much more! I love the way you break down problems and it’s not too difficult to follow.
Keep it up man. You have a new follower!
Hello, I am a Korean web developer who is studying while listening to your lecture.
First of all, I want to say thank you. Thanks to you, I am learning a lot about programming itself because I can see the coding process of top programmers as well as the Rust language. Thank you.
Can I ask you a favor?
I want you to turn on YouTube automatic English subtitles so that users from non-English-speaking countries, including myself, can easily follow the lectures.
Recent lectures have automatic English subtitles, but the early ones you uploaded do not have them on YouTube.
Please turn on the YouTube automatic English subtitles for people who, like me, are not familiar with English or have difficulty hearing.
Thank you always. From your big fan in South Korea.
Hi there! I'm glad you've found these videos helpful!
Automatic subtitles are actually on for all the older videos too, but YouTube used to (and maybe still does) not add automatic subtitles to videos longer than a certain length, so I suspect that's what you're seeing :)
I don't know how to explain why, but this video is extremely good. I would want to watch several of them in a row.
These are my new favourite streams, not too complex, but really useful for everyday work :)
For me, this video represents the best form of education. Accessible and with the highest intellectual quality ever.
The insight around 2:21:12 (reusing the correctness computation) is that if you plug a candidate word in as the expected answer and it yields a different mask against the previous guess (different from the one you actually got), then it *cannot* be the correct answer. If it were the correct answer, it would have to yield exactly the same mask against the previous guess as the game gave you.
So you can reject every remaining word that does not yield the same mask when treated as the target answer. And since you don’t know *anything else* about the answer except for the mask it returns, you cannot reject any word that yields the same mask as the true answer.
That's why you can just reuse the game logic: plug your candidate in as the expected answer and compare the mask it produces against the mask the actual game returned previously.
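A minimal sketch of that filtering step, assuming the stream's `Correctness::compute(answer, guess)` function and a `Guess { word, mask }` struct (the exact names and signatures are approximations, not the actual code):

```rust
// Hedged sketch; Correctness::compute and the Guess fields are assumptions.
impl Guess {
    fn matches(&self, candidate: &str) -> bool {
        // If `candidate` were the answer, this earlier guess would have had to
        // produce exactly the mask we actually observed.
        Correctness::compute(candidate, &self.word) == self.mask
    }
}

// Usage: after each guess, keep only the words that could still be the answer.
// remaining.retain(|word| last_guess.matches(word));
```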
Right. Basically you know that the correct answer gives that mask, so if a word does not give that mask it cannot be the correct answer. Makes sense to me.
Also, I’m not sure if you change that later, but so far (around 3 h in) you seem to be only considering remaining candidates as valid guesses. I believe the original algorithm considered all words accepted by Wordle (even if they’re already eliminated by the masks returned from previous guesses) - obviously you know they won’t be the right answer, but one of them still might be optimal for information gain.
For example, you might guess a word which has a letter X repeated in the same place where it was already yellow previously (thus not a correct answer), but which has such a combination of other letters that the rest of the returned mask will further narrow down the candidate list (and would do so better than any of the remaining candidates, which might e.g. give you insight only if they are the right answer, but wouldn't narrow down anything if they're not - so you might want to trade off the chance of winning on this step to gain more info and a better chance on the next step).
That’s about what I did with my implementation. Maybe he’s only going for hard mode?
I think I learned more from this one video than I learned in my entire first year of college. This was an amazing stream. Would love to see more of these!
Maybe you should've listened better in college
As someone who's just discovered Rust and decided his first project would be the Wordle game, you wouldn't believe how much fun it is to see you make the same assumptions with regards to compute(). It was all so very easy until I wrote tests. 😄
After watching this I'm exhausted! I had almost no grey-outs this episode, while previous videos felt like they were in Norwegian. 😀
26 minutes into the video (feels like 2) and he says, "...and now this is a good place to start." I've found my perfect video! ❤
“Fn main? That seems good” 😂😂😂
During the correctness check: if you loop over the answer letters you do not need the `let mut used = [false; 5];` array. If the answer letter equals the guess letter at that position you can mark it as Correct, and if not, loop over the guess array and mark the first matching position that is still set to Wrong as Misplaced (if I understood the problem correctly).
(At minute 55)
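A rough sketch of that idea, assuming the stream's `Correctness` enum (and that it derives `Copy` and `PartialEq`); greens are marked in a first pass so a later green can't be claimed as Misplaced by an earlier answer letter:

```rust
// Hedged sketch; the Correctness enum and its variants are assumed from the stream.
fn compute(answer: &str, guess: &str) -> [Correctness; 5] {
    let answer: Vec<char> = answer.chars().collect();
    let guess: Vec<char> = guess.chars().collect();
    let mut c = [Correctness::Wrong; 5];

    // Pass 1: exact (green) matches.
    for i in 0..5 {
        if guess[i] == answer[i] {
            c[i] = Correctness::Correct;
        }
    }
    // Pass 2: each answer letter not consumed by a green marks at most one
    // still-Wrong guess position as Misplaced; the mask itself doubles as the
    // "used" bookkeeping.
    for i in 0..5 {
        if guess[i] == answer[i] {
            continue;
        }
        if let Some(j) = (0..5).find(|&j| guess[j] == answer[i] && c[j] == Correctness::Wrong) {
            c[j] = Correctness::Misplaced;
        }
    }
    c
}
```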
So awesome to watch the process end to end of identifying slow points in the code. Keep up the great work!
Some more insight relevant to 1:41:21 and 2:23:38:
I ended up implementing the unoptimized algorithm on my own (before watching the relevant part of the video) and comparing it in retrospect. For the functionality of Guess.matches (described at 1:41:21), I simply used Correctness::check (reached later, at 2:23:38), then verified that the returned mask is equal to the mask inside &self.
Here's the logic:
I was thinking along the lines of "I cannot possibly retain a word that, had the roles been swapped (answer and guessed word), would not produce the same mask".
It's useful to think of the mask as a bilateral non-directional relation between two words. It does not care which is the answer, and which is the guessed word, but provides information that relates the letters of both words to each other.
ORDER=guess
CARES=answer
WMWCW=mask
CARES=guess
ORDER=answer
WWMCW=mask
The mask very much cares about order...
Watching this now and tried Wordle; the word was "Rusty"... what a weird coincidence 😂
Would be interesting to also look at cache utilization of this program and your changes
I have tried to come up with a good explanation for my "Use Correctness::calculate() in matches()" idea. The best I've come up with is this (I don't remember the argument order to the functions):
matches(word, guess, pattern) wants to ask the question "Given that guessing guess yielded pattern, could word be the answer?" The line Correctness::calculate(word, guess) == pattern; asks "If word were the answer, would guessing guess yield pattern?" These questions are equivalent, under the assumption that a given answer and guess always yields the same pattern.
Given some guess, the possible patterns partition the (remaining) dictionary into subdictionaries. Each word is in exactly one of these subdictionaries. Our main goal in the loop where we call matches() is basically to filter down to the partition that the current pattern corresponds to.
I had to leave the stream early, and I won't have time to see the rest for a while, so I don't know whether you did what I'm about to suggest now, but here it is:
We want to partition the dictionary. The way it is done (at least at 3:30:00, the time where you use my correctness calculation suggestion), we loop over the patterns, and filter the entire dictionary against each pattern. It seems like it would be faster to go through the entire dictionary once and calculate the correctness against guess, then construct the partitions from there. Something like
let mut parts: HashMap<[Correctness; 5], Vec<&str>> = HashMap::new(); // assuming Correctness derives Hash + Eq so the mask can be a key
for word in dict {
    parts.entry(Correctness::calculate(word, guess)).or_default().push(word);
}
The entry API takes care of initialising each empty partition here. Apart from that, I think this should be better.
Nitpick: I thought the same thing you did at first, but upon some thought I realized that your comment at 3:40:00 is actually wrong. If G gives mask C against A, A doesn't have to give mask C against G. Imagine: G: "awxyz", A: "bacde" -> G|A = "MWWWW", A|G = "WMWWW". However, the code is still correct, as explained by others here.
I guess the biggest impact would be to calculate the probability of a mask in a different way: you could generate a mask for every word against your potential guess, then increment counters in a hashmap in which the masks are keys. At the end you divide each count by the total number of words; this would eliminate one big loop. I did this and could run the entire thing in 100ms (including the first guess being calculated).
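A minimal sketch of that counting approach, assuming a `Correctness` enum that derives `Hash`/`Eq` and a `Correctness::compute(answer, guess)` function like the one written in the stream (names are approximations):

```rust
use std::collections::HashMap;

// Hedged sketch: compute one mask per remaining word, count how often each
// mask occurs, then turn the counts into probabilities in a single pass.
fn mask_probabilities(remaining: &[&str], guess: &str) -> HashMap<[Correctness; 5], f64> {
    let mut counts: HashMap<[Correctness; 5], usize> = HashMap::new();
    for &answer in remaining {
        *counts.entry(Correctness::compute(answer, guess)).or_insert(0) += 1;
    }
    let total = remaining.len() as f64;
    counts
        .into_iter()
        .map(|(mask, n)| (mask, n as f64 / total))
        .collect()
}
```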
3:56:45 I recently learned the name of this phenomenon. It's called semantic satiation!
I especially appreciated your impersonation of a grumpy old man ^^
That was pretty hilarious.
And the coding part was of course amazing as well!
Great to see collaboration between channels, especially these two great ones!
I love you Jon.
Ahh, so cool, looking forward to getting through this one. I learn so much from your streams, thank you so much!
I learned a lot about rust and useful tools. Thanks for the interesting stream!
I'd suggest using fzf to search your command-line history. I noticed you're using the fish shell, but the configuration examples that enable fuzzy search in your shell are mostly for bash/zsh (I didn't check whether fish has examples as well). You can take a look and see if that suits you : )
Great session!
You were wondering why `pop` would be so expensive -- the callgrind output showed that it was called 79M times! That means matches() was being called way too often. Each subsequent guess should only consider those words in the dictionary that exactly match the most recent mask. Among those candidates, you could make the next guess among the most frequent words, OR you could choose a word that will be most likely to give the next mask with the most information. Some combination of both approaches will probably yield the best scores.
Ah, so, I wasn't so much confused that "pop" was taking up so many cycles. I was confused about why _that_ pop was showing up in so many samples, but _not_ the other pops around it. What made that pop special?
I would simply attribute this to sampling error; remember, perf is not 100% accurate.
Old video (C++) but still gives a good explanation: th-cam.com/video/nXaxk27zwlk/w-d-xo.html The video in total is still worth a look
@@TTrippleT it's true that perf isn't 100% accurate, but the fact that the other pops get some samples, yet the skew seems so significant, it does feel like something else is going on too.
@@jonhoo Yeah, you're right: assuming it's the accumulated cost of stuff happening before (it could also be something different), it should probably be spread across the pop instructions. Definitely really strange behavior; I also have no idea why this is happening. Blaming it on perf accuracy was just my scapegoat :)
@@jonhoo It was the last pop, so it’s a clue that the very common execution path has an early return. In assembly, this is a jump to the last pop. That’s the only pop needed to correct the stack.
The look up at 1:10:08 is some magic I need to learn. I'd love to be able to search specific documentation with a ! macro in the web browser like that, but no idea how that's done.
I also really want to know what the keystrokes at 1:24:48 were
@@michaeletzkorn The keystroke combination was alternating 'n' and '.'
In (neo)vim, 'n' goes to the next search result (you can see the search pattern a few seconds earlier, he's searching for lines that don't end with a number).
'.' repeats the last command (in this case, it was something along the lines of 'A 1' which Appends ' 1' to the end of the line)
Custom lookups are built into most browsers, e.g. you can configure them in Firefox in Settings->Search->Search Shortcuts, or in Brave it's under Settings->Search Engine->Manage search engines.
The ! search is actually just a built-in feature of DuckDuckGo :)
@Josh very cool did not know about '.'; this works the same in vim. I mostly use vs code and only vim here and there when doing something quickly in a terminal, so I'm easily impressed by (n)vim inputs like this.
@Jon - thanks for the response :) I was already considering the switch and ! is definitely all the more reason to!
I really enjoy these videos. But it can be frustrating to follow along with all the random single-letter variable names. `match m ..... if g == w` only makes sense if I scroll back up and re-read what's going on, whereas verbose variable names would make it read better.
I have a ray-tracing project that is running too slow compared to alternatives. I assume it has to do with Computer Science stuff and not necessarily the algorithms themselves (although they could be improved). If you want to use it as an example, I'd love to connect!
I want this too! Literally, I also have a Rust raytracer that isn't running as fast as I would like. I'll get in line. 😅 A "code review" Rust stream would be amazing (from anyone, but Jon specifically, though his plate is probably too full).
Followed this and bought your book... after some effort, I managed to reduce runtime by 25%. Still not the fastest, but a big change.
Go get his book.
Based on playing the game, one should not limit the fixed choice to just the opening word. It should be the opening two words; basically we should build a tree for the solution based on the word. I am typing this as I am watching the video.
5:50:29 I wonder how many iterations you were doing before and after the pruning. It must have reduced the search space by a very large amount
Do you have your init vim available somewhere? That’s a really cool theme and status bar
Very nice video! I found it very entertaining and enlightening. I love 3blue1brown, so I thoroughly enjoyed the topic.
I do want to note that the explanation about the correctness computation giving the answer to the matches query given around 3:40:20 is not quite accurate. If for instance the guess is abbbb for an answer of cccca, the correctness mask is [M W W W W], because the a in the first position is misplaced and all the b's are wrong, but if the situation is reversed and you guess cccca for the answer abbbb, the correctness mask is [W W W W M], because it is the a in the fifth position that is misplaced this time.
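A hedged illustration of that asymmetry as a test, assuming the stream's `Correctness::compute(answer, guess) -> [Correctness; 5]` with `Correct`/`Misplaced`/`Wrong` variants deriving `Debug` and `PartialEq` (the function name and argument order are assumptions):

```rust
#[test]
fn mask_is_not_symmetric() {
    use Correctness::{Misplaced as M, Wrong as W};
    // Guessing "abbbb" against the answer "cccca": the leading 'a' is misplaced.
    assert_eq!(Correctness::compute("cccca", "abbbb"), [M, W, W, W, W]);
    // Roles swapped: guessing "cccca" against the answer "abbbb": the trailing 'a' is misplaced.
    assert_eq!(Correctness::compute("abbbb", "cccca"), [W, W, W, W, M]);
}
```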
Also, I'm not sure if I understood the pattern pruning optimization correctly. As I understand it, each time you play a game, you start with the full set of valid correctness patterns (3^5 = 243), and while deciding on each guess (after the first, which is fixed and does no computation), you iterate through all potential answers, and for each answer you calculate its goodness by iterating through all remaining correctness patterns and checking how many of the potential answers would yield that pattern if you were to guess the word you are evaluating. If no answers would yield that pattern, then you remove it from consideration for the rest of the game. If that is indeed how it is to be understood, then I don't think this is a valid operation, since the set of valid patterns of course depends on the word that is guessed. If, for instance, the set of possible answers has been so reduced that the only place a 'q' appears is at the beginning of a word (but the letter 'q' has not been confirmed or denied so far), then when a word beginning with 'q' is considered, all patterns that have yellow (misplaced) in the first position will be pruned, because 'q' can't be misplaced in the first position according to the remaining answer list. But surely there are still valid patterns with yellow in the first position when guessing some different word that doesn't begin with 'q' (like guesses beginning with the second letter of the answer).
Yep, you're totally right on both counts! See github.com/jonhoo/roget/pull/9 for more discussion and for the eventual alternative we landed on.
Admittedly I'm not through the 6 hours yet, but the naive guesser could have started by marking all yellows and then overridden them with green (the same way that starting with incorrect is overridden by marking yellow). By switching the order you wouldn't have to keep track of which letters were previously used, right?
I've been starving for those videos ❤️
Excellent video as always 🔥🔥
42:24 actually `n..` *is* an infinite loop, with the caveat that in debug builds it will panic when reaching usize::MAX (in release it will wrap)
I just checked the Google dataset and occurrences are not cumulative in the 2020 dataset so the awk command should be changed to `for (i=7;i
Oh interesting! That's definitely something to fix then. Want to send a PR?
40:15 That loop isn't infinite though, is it? Surely it'll end once it has run with i == i32::MAX?
It doesn't just walk through the list of words; it scrambles them in a certain predictable way so you can't just look up the day's answer without going through the indexing algorithm.
Shouldn't you prune the patterns only after you know that the candidate you're testing will be the actual guess? Like, an empty pattern for a candidate might not be empty for another one, no?
Yes indeed! See github.com/jonhoo/roget/pull/9
Just came to YouTube for some Swedish Death Metal… but here I am… to stay lol
At around 1:55:00, wouldn't it be easier to just search for an unused instance of the letter in the word for the Correctness::Misplaced case?
What's with the different amounts of red and white squares next to your rust-analyzer(?) errors? What do they mean?
Let's see you make a Nerdle solver now
1:20:30 just cast it:
(|_history: &[Guess]| "moved".to_string()) as (fn(history: &[Guess]) -> String)
Yep - the error is just really unhelpful. See also github.com/rust-lang/rust/issues/86654#issuecomment-1060013901
To be honest I don't understand what the popularity of words has to do with the solver. My understanding is that there are
1. a set of words that can be guessed and
2. a set of words that might be the answer (strict subset of 1)
and the answer is uniformly picked from (2). Is that incorrect? Because if that's how it works I can't see how the real-world usage frequency of words has any bearing on the problem.
Ah, no, Wordle has a set of games that is a subset of the dictionary of words that Wordle allows. The set of game words is curated, and contains mostly fairly common words. Hence the bias.
@@jonhoo I see, thank you for the explanation.
I was thinking that the pruning might change the results, because maybe a pruned word might give more bits of information than one that was not pruned
Like, it is sure that it is not that word, but it checks useful letters.
That's basically the difference between easy and hard mode - in hard mode you're (mostly) not allowed to guess words that you have clues indicating _cannot_ be the answer. See also github.com/jonhoo/roget/pull/9 and github.com/jonhoo/roget/commit/4482ad13db3c556703565f81419223f365c4f96b
@jonhoo, you should check out the AUR helper Paru! It's written in Rust! :)
Good stuff! I'm following along on a Macbook from the very beginning. "awk '{print $2}' 5-letters-lc-sorted-combined.txt | paste -sd+ | bc" wouldn't run. I'm not sure why, but I skipped that single command. The rest is awesome. I appreciate your content immensely.
I believe on macOS you have to write `paste -s -d+ -` for it to work - try that!
Is there some way to look at your vim configuration?
github.com/jonhoo/configs
Great video! I would love to look at your vim config. What language server are you using for Rust?
github.com/jonhoo/configs/blob/master/editor/.config/nvim/init.vim :)
Octordle solver when?
1:20:27 Isn't the lifetime error occurring because the guess function is declared after the 'w' variable?
No, it's because the types _actually_ don't match up. See github.com/rust-lang/rust/issues/86654#issuecomment-1060013901
What's your terminal colorscheme? It looks so pretty. Awesome video man, all the best.
It's in his desktop and environment video
@@noorhu2555 Dope, thanks!
Which operating system do you use?
Hi, what is your vim color scheme?
how did you change the menu bar position of your firefox?
github.com/jonhoo/configs/blob/master/gui/.mozilla/firefox/chrome/userChrome.css :)
Post a link to that guy's videos.
Wow!
what theme is this?
base16-gruvbox-dark-hard
@@jonhoo Hey, sorry to disturb you, but do you have a GitHub link to it?
Why does he create an algorithms.rs then an algorithms directory, instead of algorithms/mod.rs? Isn't mod.rs the standard?
The file names algorithms.rs and algorithms/mod.rs are equivalent as far as editions 2018+ are concerned. Submodules of algorithms will have to be put in the algorithms directory. And again you can either put a submodule foo file at algorithms/foo.rs, or use algorithms/foo/mod.rs.
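A small sketch of the two equivalent layouts (edition 2018+), using a hypothetical submodule name `naive`:

```rust
// src/lib.rs (or src/main.rs)
mod algorithms; // found at src/algorithms.rs OR src/algorithms/mod.rs

// src/algorithms.rs (or src/algorithms/mod.rs)
pub mod naive; // found at src/algorithms/naive.rs OR src/algorithms/naive/mod.rs
```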
Could you please tell me what you have done to get the status bar on the bottom? By the way, I'm still in the middle of this great tutorial, as it is long :) Thanks!
It looked like the sort was unstable with each run. Did I catch the output of prune changing each time? And isn’t any sort over floating points considered unstable?
1:05:15 I always wonder how, after he yanks some lines, he can just paste them right below the yanked lines without even going there. I really need to use this in my vim. Any idea how, anyone?
He seems to have a custom keybind set up for it so he can do it in one keystroke, but the command sequence after having the block selected would be (line breaks just for readability):
y
'> (go to end of last selection)
p
@@protowalker Thank you so much!
Wow this got super complicated
Hi Jon and everyone! Could anyone point me to a video that details your development environment? :)
@Luna Razzaghipour thank you! 🙌
What terminal are you using?
This is Alacritty running tmux running fish :)
@@jonhoo oh, same except I'm using zsh :)
@@jonhoo thanks
1:04:00 Is there really a reason to write test cases you can prove redundant by analysis? I write tests when it's non-trivial to prove a function's behaviour mathematically, or for edge cases that are non-trivial to consider.
My man's gonna zip two sets onto one another, then enumerate and loop over it and name his variables "i, g, m, w".
Sir you're allowed words in your iterator variables when you've got that many! You're not playing code golf and our monitors exceed 80 characters now
Can you implement something like libp2p in Rust, keeping blockchain in mind, and create a series please?
Cool!!
🤩🤩
As you noted, not all words are tagged with an identifier. Making the identifier pattern optional in ripgrep reveals a much larger list. The pattern,
"^[a-zA-Z]{5}(?:_[A-Z]{3,4})?\t"
leads to just over 4.2M lines.
Curiously, using zgrep with the same pattern (and flags -E for expanded regex, -h to not print the file names) I get even more, just over 4.3M lines. Am I wrong to use the same regex for both? Commands are pasted below.
Ripgrep: ```rg -Iz "^[a-zA-Z]{5}(?:_[A-Z]{3,4})?\t" 1-*-of-00024.gz > 5-letters.txt```
zgrep: ```zgrep -Eh "^[a-zA-Z]{5}(?:_[A-Z]{3,4})?\t" 1-*-of-00024.gz > 5-letters.txt```
Hmm, it could be that they have slightly different regex semantics, I'm not sure. In any case I've since updated the way the dictionary is constructed to be entirely in Rust: github.com/jonhoo/roget/commit/4e57548bee971cd627df88cd52ac0f61c4d233a4
Not that it really matters, but I'm pretty sure that the "x_NOUN" type records are all counted under the "x" record. So the canonical record is just the word immediately followed by a tab.
'^[a-zA-Z]{5}\t'